Fig. 3 | BMC Evolutionary Biology

Fig. 3

From: Phylogenomics provides a robust topology of the major cnidarian lineages and insights on the origins of key organismal traits

Relationship between sparse data representation and the retention of contaminated sequences in phylogenomic data matrices, as illustrated by myxozoan species. We conducted BLAST similarity searches against a metazoan genome database for all myxozoan sequences present in both the AG_62tx and OF-PTP_62tx matrices. In addition, we noted how many myxozoan species were present in each partition. Myxozoans are internal parasites of teleost fishes, and we noted significant contamination in transcriptome data from these host species. The Agalma pipeline produces a large but sparse matrix compared with OF-PTP (Fig. 4). In cases where contamination is common, as with myxozoan data, sparse data matrices have high numbers of partitions with a single species represented per clade, which in turn are enriched for contaminant sequences. Partitions with more than one myxozoan species present have a lower potential to include contamination. The OF-PTP pipeline produces a denser data matrix, which makes it inherently less prone to selecting contaminants.
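
The comparison described in the caption can be sketched as a simple tally: group partitions by how many myxozoan species they contain, then compute the fraction of sequences flagged as contaminants in each group. This is a minimal illustration only, not the authors' code. The function name, the record layout, and the toy data are all hypothetical, and the contaminant flag is assumed to come from a separate BLAST-based screening step.

```python
from collections import defaultdict


def contamination_by_occupancy(records):
    """Fraction of flagged sequences, grouped by species count per partition.

    `records` is an iterable of (partition_id, species, is_contaminant)
    tuples. This layout is an assumption made for illustration.
    """
    species_per_partition = defaultdict(set)
    flags_per_partition = defaultdict(list)
    for partition_id, species, is_contaminant in records:
        species_per_partition[partition_id].add(species)
        flags_per_partition[partition_id].append(bool(is_contaminant))

    # Map: number of species in a partition -> [flagged count, total count]
    totals = defaultdict(lambda: [0, 0])
    for partition_id, species in species_per_partition.items():
        flags = flags_per_partition[partition_id]
        bucket = totals[len(species)]
        bucket[0] += sum(flags)
        bucket[1] += len(flags)

    return {n: flagged / total for n, (flagged, total) in sorted(totals.items())}


# Toy input with placeholder names; these values are invented, not real data.
toy_records = [
    ("p1", "myxo_sp_A", True),
    ("p2", "myxo_sp_A", False),
    ("p2", "myxo_sp_B", False),
    ("p3", "myxo_sp_B", True),
    ("p4", "myxo_sp_B", False),
]
rates = contamination_by_occupancy(toy_records)
```

Here `rates` maps the number of species in a partition to the proportion of flagged sequences among partitions of that size, which is the quantity one would compare between a sparse and a dense matrix.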
