RIASSUNTO
Background
The thousands of species of closely related cichlid fishes in the great lakes of East Africa are a powerful model for understanding speciation and the genetic basis of trait variation. Recently, the genomes of five species of African cichlids representing five distinct lineages were sequenced and used to predict protein products at a genome-wide level. Here we characterize the evolutionary relationship of each cichlid protein to previously sequenced animal species.
Results
We used the Treefam database, a set of preexisting protein phylogenies built using 109 previously sequenced genomes, to identify Treefam families for each protein annotated from four cichlid species: Metriaclima zebra, Astatotilapia burtoni, Pundamilia nyererei and Neolamporologus brichardi. For each of these Treefam families, we built new protein phylogenies containing each of the cichlid protein hits. Using these new phylogenies we identified the evolutionary relationship of each cichlid protein to its nearest human and zebrafish protein. This data is available either through download or through a webserver we have implemented.
Conclusion
These phylogenies will be useful for any cichlid researchers trying to predict biological and protein function for a given cichlid gene, understanding the evolutionary history of a given cichlid gene, identifying recently duplicated cichlid genes, or performing genome-wide analysis in cichlids that relies on using databases generated from other species.
Electronic supplementary material
The online version of this article (10.1186/s12862-017-1072-2) contains supplementary material, which is available to authorized users.