Mol Phylogenet Evol: Shi Tao's group at Wuhan Botanical Garden reports progress in phylogenomics research
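The international journal Molecular Phylogenetics and Evolution recently published online a research paper by Shi Tao, assistant researcher in the Aquatic Plant Genomics and Genetic Breeding group at the Wuhan Botanical Garden, Chinese Academy of Sciences. Using genome sequences from multiple angiosperms, the study shows how gene families with different evolutionary histories affect the accuracy with which evolutionary relationships among species can be reconstructed.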
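Earlier molecular phylogenetic studies were usually based on the sequences of one or a few genes. In the genomic era, genome-wide information is now routinely used to study evolutionary relationships among taxa. Because they differ in function, different genes and gene families sometimes have different evolutionary histories, so species relationships inferred from different gene families may differ in accuracy. For example, plant R-genes, which are involved in stress responses and disease resistance, are relatively poorly conserved and have undergone independent duplications and losses in different lineages, whereas most housekeeping genes do not undergo duplication during evolution and are relatively conserved in sequence.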
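Phylogenomic methods, exemplified by gene tree parsimony (GTP), can build species phylogenies from gene families with complex duplication histories. GTP searches for the optimal species tree, namely the one that explains the observed gene trees with the fewest evolutionary events (gene duplications). However, how accurate this phylogenomic approach is for gene families with different evolutionary patterns has remained unclear.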
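To make the GTP criterion concrete, here is a minimal Python sketch. It is not the author's code or any published GTP program: it assumes rooted trees written as nested tuples, counts duplications with the standard LCA (least common ancestor) reconciliation of a gene tree against a candidate species tree, and simply compares a small, hand-picked set of candidate species trees instead of running the heuristic tree search that real GTP software performs. All function names and the three-species toy data are hypothetical.

```python
def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def clusters(species_tree):
    """Return the leaf set (cluster) below every node of the species tree."""
    out = [frozenset(leaves(species_tree))]
    if not isinstance(species_tree, str):
        for child in species_tree:
            out.extend(clusters(child))
    return out


def lca_map(species_set, species_clusters):
    """LCA in the species tree = smallest cluster containing all given species."""
    return min((c for c in species_clusters if species_set <= c), key=len)


def count_duplications(gene_tree, gene_to_species, species_clusters):
    """Count gene-tree nodes inferred as duplications under LCA reconciliation."""
    def visit(node):
        # Returns (species below node, species-tree node it maps to, duplications below).
        if isinstance(node, str):
            sp = frozenset({gene_to_species[node]})
            return sp, lca_map(sp, species_clusters), 0
        results = [visit(child) for child in node]
        sp = frozenset().union(*(r[0] for r in results))
        mapped = lca_map(sp, species_clusters)
        dups = sum(r[2] for r in results)
        # A node is a duplication if it maps to the same species node as a child.
        if any(r[1] == mapped for r in results):
            dups += 1
        return sp, mapped, dups
    return visit(gene_tree)[2]


def gtp_best_tree(candidate_species_trees, gene_trees, gene_to_species):
    """Return the candidate with the fewest total duplications, plus all scores."""
    scores = {}
    for st in candidate_species_trees:
        cl = clusters(st)
        scores[st] = sum(count_duplications(gt, gene_to_species, cl) for gt in gene_trees)
    return min(scores, key=scores.get), scores


# Hypothetical toy data: three species, one single-copy and one two-copy family.
candidates = [(("A", "B"), "C"), (("A", "C"), "B"), (("B", "C"), "A")]
gene_trees = [
    (("a1", "b1"), "c1"),
    ((("a1", "b1"), "c1"), (("a2", "b2"), "c2")),
]
gene_to_species = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C", "c2": "C"}
best, scores = gtp_best_tree(candidates, gene_trees, gene_to_species)
# best -> (('A', 'B'), 'C'); the other two candidates each need 4 duplications
```

The two-copy family costs one duplication even on the correct tree, because its root joins two complete copies; on the wrong trees every copy adds an extra inferred duplication, which is how GTP discriminates among candidates.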
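Using genome sequences from multiple angiosperms, Shi Tao examined how these differences in gene family history affect species tree inference. The results show that gene family size and lineage-specific expansion and contraction of a gene family strongly influence the accuracy of GTP species tree inference, following a binomial-curve relationship: accuracy is highest for families with very small or very large gene content and lowest for families of intermediate size. Based on how strongly each evolutionary pattern affects phylogenetic accuracy, the biological cost of each duplication event in GTP could be quantified, which may improve the accuracy of species tree reconstruction. The study was published in Molecular Phylogenetics and Evolution.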
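Figure: Gene families differ in overall size and in the number of members they have in different taxa, and these differences directly affect the accuracy of the phylogenetic tree inferred for those taxa.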
Original article:
Impact of gene family evolutionary histories on phylogenetic species tree inference by gene tree parsimony
Abstract:
Complicated history of gene duplication and loss brings challenge to molecular phylogenetic inference, especially in deep phylogenies. However, phylogenomic approaches, such as gene tree parsimony (GTP), show an advantage over some other approaches in their ability to use gene families with duplications. GTP searches the ‘optimal’ species tree by minimizing the total cost of biological events such as duplications, but accuracy of GTP and phylogenetic signal in the context of different gene families with distinct histories of duplication and loss are unclear. To evaluate how different evolutionary properties of different gene families can impact species tree inference, 3900 gene families from seven angiosperms encompassing a wide range of gene content, lineage-specific expansions and contractions were analyzed. It was found that the gene content and total duplication number in a gene family strongly influence species tree inference accuracy, with the highest accuracy achieved at either very low or very high gene content (or duplication number) and lowest accuracy centered in intermediate gene content (or duplication number), as the relationship can fit a binomial regression. Besides, for gene families of similar level of average gene content, those with relatively higher lineage-specific expansion or duplication rates tend to show lower accuracy. Additional correlation tests support that high accuracy for those gene families with large gene content may rely on abundant ancestral copies to provide many subtrees to resolve conflicts, whereas high accuracy for single- or low-copy gene families is just subject to sequence substitution per se. Very low accuracy reached by gene families of intermediate gene content or duplication number can be due to insufficient subtrees to resolve the conflicts from loss of alternative copies.
As these evolutionary properties can significantly influence species tree accuracy, I discussed the potential weighting of the duplication cost by evolutionary properties of gene families in future GTP analyses.
doi:10.1016/j.ympev.2015.12.002
Author: Shi Tao