Evolutionary convergences are observed at all levels, from phenotype to DNA and protein sequences, and changes at these different levels tend to be highly correlated. Notably, convergent and parallel mutations can lead to convergent changes in phenotype, such as changes in metabolism, drug resistance, and other adaptations to changing environments.
We propose a two-step approach to detect mutations under convergent evolution in protein alignments. We first select mutations that emerge more often than expected under neutral evolution, and then test whether their emergences correlate with the convergent phenotype under study. When no phenotype is available, as is often the case with microorganisms, the first step can be used alone. For this first step, a phylogeny is inferred from the data and used to simulate the evolution of each alignment position. These simulations give the expected number of mutations under neutral conditions, which is then compared with the number observed in the data. In the second step, we use a comparative phylogenetic approach to measure whether the presence of mutations that occur more often than expected correlates with the convergent phenotype.
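The two steps can be illustrated with a minimal Python sketch. It is not ConDor's implementation. It assumes an equal-rates (Jukes-Cantor-like) amino acid model, a uniform root state, a hypothetical hard-coded toy tree (`TOY_TREE`), and an observed emergence count that, on real data, would come from ancestral reconstruction. For the second step it swaps in a naive one-sided Fisher exact test over leaves, which, unlike the comparative phylogenetic approach described above, ignores the non-independence of related sequences. All names are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
K = len(AMINO_ACIDS)

# Hypothetical toy tree, listed parents-before-children:
# (node, parent, branch length in expected substitutions per site).
TOY_TREE = [
    ("root", None, 0.0),
    ("n1", "root", 0.3), ("n2", "root", 0.3),
    ("s1", "n1", 0.2), ("s2", "n1", 0.2),
    ("s3", "n2", 0.2), ("s4", "n2", 0.2),
]


def evolve(state, t, rng):
    """Draw the child state after a branch of length t under an
    equal-rates (Jukes-Cantor-like) model with K amino acids."""
    p_same = 1 / K + (1 - 1 / K) * math.exp(-K * t / (K - 1))
    if rng.random() < p_same:
        return state
    return rng.choice([aa for aa in AMINO_ACIDS if aa != state])


def simulate_emergences(tree, target, rng):
    """Simulate one alignment column and count branches whose parent
    state is not `target` but whose child state is (an emergence)."""
    states = {}
    emergences = 0
    for node, parent, length in tree:
        if parent is None:
            states[node] = rng.choice(AMINO_ACIDS)  # assumed uniform root
            continue
        states[node] = evolve(states[parent], length, rng)
        if states[parent] != target and states[node] == target:
            emergences += 1
    return emergences


def emergence_p_value(tree, target, observed, n_sims=1000, seed=0):
    """Step 1: empirical p-value of seeing at least `observed`
    emergences of `target` under neutral simulations."""
    rng = random.Random(seed)
    hits = sum(
        simulate_emergences(tree, target, rng) >= observed
        for _ in range(n_sims)
    )
    return (hits + 1) / (n_sims + 1)


def fisher_enrichment_p(a, b, c, d):
    """Step 2 stand-in: one-sided Fisher exact test on a 2x2 table.
    a = mutant & phenotype, b = mutant & no phenotype,
    c = non-mutant & phenotype, d = non-mutant & no phenotype.
    Returns P(X >= a) under the hypergeometric null."""
    n, mutants, pheno = a + b + c + d, a + b, a + c
    total = math.comb(n, mutants)
    return sum(
        math.comb(pheno, x) * math.comb(n - pheno, mutants - x)
        for x in range(a, min(mutants, pheno) + 1)
    ) / total


if __name__ == "__main__":
    # Hypothetical observation: 3 emergences of lysine (K) on the toy tree.
    print(emergence_p_value(TOY_TREE, target="K", observed=3))
    # Hypothetical leaf counts: both mutant leaves carry the phenotype.
    print(fisher_enrichment_p(a=2, b=0, c=0, d=2))
```

Adding 1 to both the hit count and the number of simulations keeps the empirical p-value strictly positive, which matters when many positions are tested and the p-values are later corrected for multiple testing.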
Our method, ConDor, was applied to three real-world datasets: sedge PEPC proteins, HIV reverse transcriptase and fish rhodopsin. The results show that the two components of ConDor complement each other, and that its overall accuracy compares favorably with other available tools, especially on large datasets.
To learn more about how to use ConDor, please read our help page.
ConDor source code is available on GitHub.
We need your help to improve this web service. Please send your comments and/or suggestions to: marie[dot]morel[at]pasteur[dot]fr, frederic[dot]lemoine[at]pasteur[dot]fr and olivier[dot]gascuel[at]mnhn[dot]fr
Evolutionary Bioinformatics unit, Institut Pasteur, Paris, France
Morel, M., Lemoine, F., Zhukova, A. and Gascuel, O. "Accurate detection of Convergent Mutations in Large Protein Alignments with ConDor". doi: https://doi.org/10.1101/2021.06.30.450558
Try it now
You need several files to run ConDor:
- An amino acid alignment in FASTA format, including the outgroup sequences;
- A phylogenetic tree in Newick format, including the outgroup sequences;
- A text file listing the outgroup sequence names;
- A text file listing the names of the sequences that carry the convergent phenotype.
- To infer a tree from your aligned sequences, you can use NGPhylogeny.fr.
- An example analysis is available here. It uses the dataset from (Besnard et al., 2009) that was also analyzed in the PCOC paper (Rey et al., 2018): 79 sequences of the PEPC protein in sedges (plant species at the C3/C4 transition) and the corresponding tree. A brief analysis of these results is available on our help page.
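Before uploading, a quick consistency check of the input files listed above can help avoid a failed run. The Python sketch below is a hypothetical helper, not part of ConDor. It assumes that sequence names are the first word of each FASTA header, that the name files contain one name per line, and that tree leaves, alignment names, outgroup names and phenotype names must match exactly. Its Newick parsing is deliberately simplified (no quoted labels or comments).

```python
import re


def read_fasta(path):
    """Return {name: sequence}; the name is the first word of the header."""
    seqs, name = {}, None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:].split()[0]
                seqs[name] = []
            elif name is not None:
                seqs[name].append(line)
    return {n: "".join(parts) for n, parts in seqs.items()}


def read_names(path):
    """Return the set of non-empty lines (one sequence name per line)."""
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip()}


def newick_leaves(newick):
    """Leaf labels are the labels that directly follow '(' or ','.
    Internal node labels (after ')') are ignored."""
    return set(re.findall(r"[(,]\s*([^(),:;\s]+)", newick))


def check_inputs(alignment, tree_text, outgroups, phenotype):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    if len({len(s) for s in alignment.values()}) > 1:
        problems.append("sequences have different lengths")
    names = set(alignment)
    leaves = newick_leaves(tree_text)
    if leaves != names:
        problems.append(f"tree/alignment mismatch: {sorted(leaves ^ names)}")
    for label, group in (("outgroup", outgroups), ("phenotype", phenotype)):
        missing = group - names
        if missing:
            problems.append(f"{label} names not in alignment: {sorted(missing)}")
    return problems
```

With hypothetical file names, a typical call would be `check_inputs(read_fasta("alignment.fasta"), open("tree.nwk").read(), read_names("outgroups.txt"), read_names("phenotype.txt"))`; an empty list means no inconsistency was detected by these simple checks.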