Data preparation prior to phylogenetic analyses
A list with details on the 21 species with newly sequenced transcriptome data is provided in Additional file 2: Table S1. Accession numbers for all species are given in Additional file 2: Table S2. Higher species affiliation follows the World Register of Marine Species (WoRMS), with the exception of the clade of all aeolids, which we call Aeolidida.
Transcriptome sequencing and data processing
Paired-end sequencing resulted in approximately 7.5 Gbases of raw data per sample. For the newly generated transcriptomes, the number of complete read pairs ranged from 20,266,817 in Calmella cavolini to 43,524,035 in Facelina rubrovittata with a median of 24,882,673 (Hancockia cf. uncinata). After trimming of possible adapter sequences and sequence regions of low quality, the average read length of complete read pairs ranged from 118.1 bp in Hermissenda emurai to 139.6 bp in Doto sp. with a median of 133.8 bp in Polycera quadrilineata (Additional file 2: Table S3). Details on sequence processing are provided in Additional file 1. Transcriptome assembly using six different de novo assemblers per data set resulted in a total number of 366 assemblies, i.e., six assemblies for each of the 61 transcriptomic data sets (see Additional file 1 and Additional file 2: Table S4).
Evaluation of transcriptome assemblies, orthology prediction, and alignment procedures
Evaluation of assembled transcriptomes and subsequently applying BUSCO version 3.0.0 [49] with the Metazoa set including 978 orthologs revealed a median of 731 (75%) complete BUSCO genes per sample (maximum: 943 complete BUSCO genes [27 fragmented, 8 missing] in Caloria elegans; minimum: 158 complete BUSCO genes [123 fragmented, 697 missing] in Doris kerguelenensis). All quality assessment results of the transcriptomes using BUSCO are summarised in Additional file 2: Table S5.
We additionally evaluated the quality of all transcriptomes separately for each assembly method based on the results of orthology prediction and identified single-copy protein-coding genes with our custom-made ortholog set comprising 1992 orthologs (see “Methods” and Additional file 1). Results were ranked based on the cumulative length of transcripts that were successfully assigned to the reference genes used to identify single-copy orthologs (OGs) in the transcriptomes (see Additional file 2: Table S6). The cumulative lengths ranged from 82,409 bp in Pseudobornella orientalis (the genus was recently resurrected by Korshunova and colleagues [50]) (IDBA-Tran, 458 genes successfully assigned) to 784,043 bp in Caloria elegans (Shannon, 1904 genes successfully assigned). The median was 472,305 bp for the cumulative length and 1577 for the number of successfully assigned genes. The best assembly (according to the largest cumulative length) out of the six available per sample was selected as the representative transcriptome for the respective species and was submitted to NCBI. Transcriptomes accepted by NCBI after removal of possible foreign sequence contamination were used for all further downstream analyses (see Additional file 2: Tables S2 and S7). In order to reduce the amount of missing data in subsequent analyses we excluded three samples for which less than 60% of OGs included in the search had been identified: Pseudobornella orientalis, Dermatobranchus sp., and Tritoniopsis frydis. Furthermore, we only kept OGs for which at least 50% of the investigated 58 species had a positive hit. This resulted in 1767 OGs that we subsequently used to generate multiple sequence alignments (MSAs) on amino acid level. Checking the MSAs for outlier sequences (i.e., putatively misaligned or misassigned amino acid sequences), we identified 897 sequences in 112 MSAs that were subsequently removed. Outliers were found in sequences from all remaining 58 species with the highest number of 30 outlier sequences in Limenandra confusa and the lowest number of eight outlier sequences in Doris kerguelenensis (median: 15 outliers, all details are provided in Additional file 1 and Additional file 2: Table S8).
Alignment masking resulted in masking of alignment sites in 1,519 out of 1,767 genes (Additional file 1) leaving ~ 71% of aligned unmasked sites for subsequent analyses.
Compilation, evaluation and optimization of data sets
Analysing the concatenated supermatrix using MARE v. 1.2-rc [51], AliStat v. 1.6 [52] for information content and data coverage, and SymTest v. 2.0.47 [53] for putative violation of stationary, (time-)reversible and homogenous (SRH) model conditions [54, 55] using the implemented Bowker’s matched pairs test of symmetry [56] led to the results shown in Additional file 3: Figs. S1 and S2.
With respect to the amount and distribution of missing data we initially compiled two data sets as described in the methods section. The data set allowing for the highest amount of missing data, termed “original unreduced data set”, was not further reduced after concatenation and comprised 58 species, 771,739 aligned amino acid positions and 1767 gene partitions. The second data set with a full gene coverage for all 58 species (termed “original reduced data set”) comprised 143,859 aligned amino acid positions and 364 gene partitions. Analysing both data sets for violation of SRH model conditions with SymTest revealed that two species strongly violated the SRH conditions: Calmella cavolini and Doris kerguelenensis (Additional file 3: Fig. S2). Therefore, the sequences belonging to these two species were removed entirely from all MSAs from the original unreduced data set. This newly created data set (termed “unreduced data set”) spanned a superalignment length of 771,706 amino acid positions including 1767 gene partitions.
To reduce the amount of missing data, we compiled an "intermediate” data set featuring only those gene partitions for which at least one representative of the selected taxa was present (see “Methods”, Additional file 1, and Additional file 2: Table S9). This data set (termed “intermediate data set”) spanned a superalignment length of 271,732 amino acid positions and included 667 gene partitions. The third and most strict data set with full gene coverage for each of the 56 species (termed “strict data set”) had a superalignment length of 170,140 amino acid positions and included 446 gene partitions. This data set was further examined in an additional analysis using mixture models (i.e., without partition information). Finally, we compiled a fourth data subset with increased overall information content discarding less informative genes (hereafter called “strict SOS data set”) with all 56 taxa and an even further reduced number of 126,094 aligned sites and 335 gene partitions. Details on data matrix diagnostics are provided in Additional file 1, Additional file 2: Tables S10 and S11, and Additional file 3: Figs. S3–S8.
Phylogenetic relationships of sea slug taxa
All analyses irrespective of the data set indicate maximum statistical support for all major splits and phylogenetic relationships at the family level (Fig. 1, Additional file 3: Figs. S9–S17). Notably, low statistical support was inferred with regard to the phylogenetic position of the genus Embletonia. In the following, we discuss taxa relationships using the names according to the latest changes [57] that are implemented in World Register of Marine Species [23, 58], although we disagree with several assignments as discussed below.
Phylogenetic relationships of major taxa and sea slug families
Out of the seven accepted superfamilies of Cladobranchia, we were able to include members of six superfamilies, whereas a representative of the rare Doridoxoidea, recently confirmed as sister to the Arminoidea [26], was not available to us. We inferred Aeolidida, Aeolidioidea (sensu WoRMS), Proctonotoidea, and Dendronotoidea, with representatives of various families and genera, as being monophyletic. This was fully supported by the quartet scores [48] for Aeolidida, Aeolidioidea, and Proctonotoidea, and strongly supported for Dendronotoidea (see Quartet Sampling scores, splits 1–3 and 8 in Fig. 1 and Additional file 2: Table S12). Arminoidea and Tritonioidea are only represented by one genus each. Thus, former results on monophyly [26] still need to be tested in future genomic analyses.
Our analyses revealed the following ambiguities: Flabellina affinis (type species of Flabellinidae), which is currently regarded as a representative of Fionoidea [17], is inferred as sister taxon to Aeolidioidea with maximal statistical support. Quartet sampling, on the other hand, showed only medium support (split 4 in Fig. 1, Additional file 2: Table S12) with the large majority of quartets (67%) supporting the focal branch (Aeolidioidea + Flabellina affinis), but the strong skew in discordance [quartet differential (QD) = 0] indicates the possibility of a single different evolutionary history supported by all remaining quartets. Although the type species was included in our analyses, additional data of further members of the family should be included in future studies to address a possible taxonomic revision.
The family Flabellinopsidae is currently listed as a member of the Aeolidioidea in WoRMS [58] with Flabellinopsis iodinea (Flabellinopsidae) being sister to all remaining taxa in this large clade, confirming previous results [17, 33, 36, 37]. Again, this position is statistically maximally supported by classical support values (BS, aLRT, aBayes) in our study and quartet sampling scores confirmed this position (split 5, Fig. 1) with strong support (94% of the non-uncertain quartets). Although a strong skew in discordance (QD = 0) indicates the possible presence of an alternative quartet relationship, this result is rather less meaningful due to the low number of discordant trees (5% of the non-uncertain quartets). Thus, our results on Flabellinidae and Flabellinopsidae partly contradict recent analyses and subsequent systematic assignments.
The majority of the family Facelinidae is inferred as being monophyletic, but the facelinid species Noumeaella rubrofasciata groups with Myrrhinidae in published analyses [17, 37] as well as in our study with nearly maximal classical statistical support. However, quartet sampling only shows weak support for this relationship (38% of the non-uncertain quartets; see split 6 in Fig. 1 and Additional file 2: Table S12). In fact, the quartet frequencies show no clear signal since all three quartet topologies are roughly equally supported (27% of the non-uncertain quartets support the second possible quartet topology, 36% support the third; QD = 0.85). Thus, the assignment of this species to Facelinidae [57] or Myrrhinidae (our results) should be reconsidered in future studies. Interestingly, this species did not cluster with several other Noumeaella species in a three-gene analysis of Aeolidida by Schillo and colleagues [59].
Within Aeolidioidea, the families Myrrhinidae—excluding Noumeaella rubrofasciata—and Aeolidiidae form a monophyletic sister group relationship in our study, thus confirming the results of [17, 37], and [32]. This is also consistent with recent morphological and molecular analyses using all acknowledged families of Aeolidida [60].
Fionoidea in the sense of Bouchet and colleagues [57] is polyphyletic, mainly due to the position of Flabellina affinis and Embletonia pulchra, the latter is discussed below. Within Fionoidea, the family Trinchesiidae, represented here with three genera, is monophyletic. Fionidae and Eubranchidae are related to Trinchesiidae. Unidentia is sister to these three families. A recent analysis [60] inferred Unidentiidae as sister taxon to Embletoniidae and a clade Embletoniidae + Unidentiidae as sister to all other clades of Aeolidida. Previous analyses did not reveal a robust placement of Unidentiidae and inferred this clade as sister to members of the Aeolidioidea (Facelinidae, Babakinidae, and Aeolidiidae), which in turn are sister to a larger group of Coryphellidae, Flabellinidae, Flabellinopsidae, and Paracoryphellidae [33]. In a subsequent study [61], Unidentiidae was inferred as sister taxon to a clade of Aeolidiidae, Babakinidae, Coryphellidae, Facelinidae, Flabellinidae, Flabellinopsidae, and Paracoryphellidae. In contrast, our quartet sampling analyses do not unambiguously support the relationship of the Unidentiidae as sister to other Fionoidea with a rather weak quartet support (52% of the non-uncertain quartets). The support for the other two possible quartet topologies is almost similar (QD = 0.99), which indicates that no alternative history is favoured (see split 7 in Fig. 1 and Additional file 2: Table S12). In this context, the results of Goodheart and colleagues [17] are quite noteworthy, because in their study, Unidentiidae is the sister taxon of Embletonia while the clade Embletonia + Unidentiidae is sister to all remaining Fionoidea. Results by Martynov and colleagues [60] suggest a sister group relationship to other aeolidoidean families, which is in part compatible with our results (see below).
The family Samlidae, represented by Luisella babai, is considered as being part of Fionoidea [17, 33]. In our study, however, it is inferred as sister to all remaining Aeolidida in all analyses with maximum classical statistical support as well as very strong quartet support (see split 8 in Fig. 1 and Additional file 2: Table S12): About 98% of the quartets support this relationship without evidence for alternative quartet topologies (QD = 1). Including more taxa representing more families of the Aeolidida is necessary to address and hopefully solve this incongruence of our data in comparison to published results.
With regard to Proctonotoidea, Tritonioidea, and Dendronotoidea, our results confirm the findings published by Goodheart and colleagues [17] with the family Embletoniidae being the only exception, as we will discuss below.
The phylogenetic position of Embletoniidae remains ambiguous
The monogeneric family Embletoniidae, which currently only comprises two recognized species, Embletonia pulchra and E. gracilis, has experienced a rich history since the first description of the genus Embletonia by Alder and Hancock [62], with Pterochilus pulcher Alder and Hancock, 1844 as type species. The authors considered this species as a link between cladobranch aeolids and panpulmonate sacoglossans, two taxa that are not closely related to each other, but show many convergent characters. Pruvot-Fol [43], who named the family for the first time, included members of Trinchesiidae, but assigned the whole clade as a “section” to the dendronotoid family Dotidae. The two recognized members of Embletonia share some characters with members of Fionoidea or Aeolidioidea, e.g., the reduction of the lateral teeth, the absence of rhinophoral sheaths [64], and the presence of a cnidosac at the end of the cerata, a synapomorphy of Aeolidida [18], which additionally favours a position within this clade. However, Martin and colleagues [15] and Goodheart and colleagues [17] have shown that this cnidosac differs to a great extent from the typical aeolidid cnidosac by lacking a proper sac-like structure with musculature around it, as well as a connection to the digestive gland, which is necessary for taking up sequestered cnidocysts. Nevertheless, cnidocysts were found in the structures investigated by Goodheart and colleagues [17]. The authors explain this atypical situation with a loss of characters or as constituting a transitional form in the evolution of the cnidosac. Most recently, Martynov and colleagues [60] provided evidence for paedomorphic processes (e.g., reduction of oral tentacles), which would explain a regressive evolution of Embletoniidae within Aeolidida. This phenomenon is known from various unrelated taxa inhabiting soft-bottom interstitial environments [60]. Embletonia feeds on hydrozoans, which is a typical food source of many aeolidids, but also of some dendronotoids. Unique to this genus are the cerata, which show bi- to quadrifid apices. Highly structured cerata are not known from any aeolidids. However, the digestive gland reaches far into these cerata, a character less pronounced in Proctonotoidea, and only present in one further non-aeolidid group, the genus Hancockia [65].
Embletonia also shares traits that are characteristic for non-aeolidid groups, a reason why Pruvot-Fol [43] included the genus into the family Dotidae (Dendronotoidea). This assignment to Dotidae, as well as grouping with Trinchesiidae was, however, rejected later by Schmekel [44], and the closer relationship to Dendronotoidea was emphasized by Miller and Willan [66]. The primary connecting character is the lack of oral tentacles, which are considered to be a synapomorphy of the traditional dendronotaceans [18, 26]. Furthermore, their oral gland ducts do not open into the oral tube by two separate ducts, but fuse into one common duct, which is described for Proctonotoidea. Proctonotoidea mainly feed on bryozoans, however, a few members also rely on hydrozoan prey, similar to Embletonia.
Few studies addressed the phylogenetic relationship of Embletoniidae using molecular data [17, 31, 60]. Bleidissel [31] focussed on Aeolidida and included Embletonia, because of its putative assignment to this group. Bleidissel’s analyses, based on three genes, inferred a sister group relationship of Embletoniidae with Notaeolidiidae, with the latter again being sister to all remaining Aeolidida. In the only study based on a large data set, Embletonia was inferred, along with Unidentia, within Aeolidida as sister to the remaining Fionoidea, thus excluding a closer relationship with Notaeolidia [17]. Martin and colleagues [15] included characters of the cnidosac into the morphological data matrix published by Wägele and Willan [18], and their analysis resulted in an assignment of Embletonia to Aeolidida (tree not shown in the publication). Likewise, our unpublished morphological analyses render Embletonia as a sister taxon to Aeolidida. However, it is more likely the lack of data that constrains the position than apomorphic characters of high phylogenetic information.
In our analyses comprising the unreduced and strict data set, Embletonia pulchra is inferred as sister to Proctonotoidea with full support in the unreduced data set (100 BS, 100 aLRT, 1 aBayes), but with negligible support in the strict data set (65 BS, 50.1 aLRT, 1 aBayes). When assuming that Embletonia is a sister taxon of the Proctonotoidea (see split 9 in Fig. 1 and position i in Fig. 2) and taking into consideration the studies on the evolution of prey preferences [37] and cnidocyst incorporation [17], we have to conclude that (1) feeding on Hydrozoa is an ancestral trait within Cladobranchia and has not changed in Embletonia (in contrast to Proctonotoidea) and (2) the evolution of the cnidosac might have started in the stemline of the clade Aeolidida/Proctonotoidea/Embletoniidae, with Janolus and Dirona probably representing a condition where the ability to store cnidocysts was lost due to a food switch to bryozoan prey. Both an independent evolution of cnidosacs and cnidocyst storage (in the genus Hancockia) as well as a loss or strong reduction of these complex structures has occurred within Dendronotoidea [17] and Aeolidioidea [34, 60].
In our results from the intermediate data set, the unpartitioned strict data set analysed using a mixture model approach, and the strict SOS data set, Embletonia is a sister group to all remaining Aeolidida, but again with ambiguous support (intermediate data set: 51 BS, 33.1 aLRT, 1 aBayes; strict data set analysed with a mixture model approach (unpartitioned): 88 BS, 85.9 aLRT, 1 aBayes; strict SOS data set: 88 BS, 92.3 aLRT, 1 aBayes). Considering this relationship as a possible evolutionary scenario (Figs. 1, 2, position ii) means that the evolution of the cnidosac would have had to start in the stemline of Embletoniidae/Aeolidida, while the typical character of Dendronotoidea and Tritonioidea, the rhinophoral sheaths, had already been lost and oral tentacles had not yet evolved.
However, both discussed possibilities (see Figs. 1, 2, positions i and ii) are neither supported statistically by classical bootstrap values nor by our quartet analyses: Frequencies of the three possible quartet topologies are almost equal (33% vs. 35% vs. 31% of all non-uncertain quartets, split 9 in Fig. 1 and Additional file 2: Table S12), which indicates a highly complex evolution or rapid radiation.
Morphological analyses of important characters, like the positions of the anus, jaws, and radula also contradict both relationships discussed above with apomorphic features lacking for both hypotheses [60]. Instead, Embletoniidae shows an uniserial radula with central teeth more similar to various aeolidids [60].
Evaluation of alternative positions of Embletoniidae and possible confounding signal
To gain more insights into one of the obtained positions of Embletonia and to investigate alternative positions (see Fig. 2), further analyses were conducted. Note, that we consider the strict data set as most reliable, since it has full gene coverage for all species, following the rationale of Dell’Ampio and colleagues [67] and Misof and colleagues [47], who showed that inferred positions with high statistical support can be simply due to non-phylogenetic signal, e.g., the distribution of missing data. In addition, the strict data set shows the highest completeness scores of all data sets (see Additional file 2: Table S11). However, we also performed some of the analyses on the intermediate data set, the unpartitioned strict data set, and the strict SOS data set as described in the following.
We applied approximately unbiased (AU) tests [45] for alternative positions of Embletonia on the intermediate data set, both strict data sets (partitioned and unpartitioned), and the strict SOS data set. An AU test always takes the complete tree topology into account and not only single splits. Further, it does not test whether or not confounding signal is inherent in the data set, e.g., due to non-randomly distributed data and/or among-lineage heterogeneity violating SRH conditions. We therefore also applied Four-cluster Likelihood Mapping (FcLM) [46] along with a permutation approach on the strict (partitioned) data set. By testing all three possible quartet topologies around Embletonia we evaluated whether or not there was an alternative signal. Further, we checked for any sign of confounding signal (see [47]). To this end, we defined four groups (Additional file 2: Table S13) considering group 4 as an outgroup. We performed separate analyses for two outgroup variations: first, with 19 species including Anthobranchia and Pleurobranchomorpha and second, only with the 15 remaining cladobranch species. We drew quartets on the original data set and on three artificial data sets, from which any existing phylogenetic signal was removed in three different ways (see “Methods”, Additional file 1, and [47]): (a) by destroying the phylogenetic signal but leaving the distribution of missing data and the compositional heterogeneity, which can lead to violating SRH conditions, untouched; (b) by leaving the distribution of missing data untouched but making the data set completely homogenous (no SRH model violation possible), and (c) by randomizing the missing data distribution and making the data set completely homogenous. For all details see Additional file 1.
Interestingly, the results of the phylogenetic trees and the results of the FcLM (Additional file 2: Table S14) and AU tests (Additional file 2: Table S15) were quite contradicting:
(i) Although the best ML trees of the unreduced data set and strict (partitioned) data set suggest that Embletonia is sister to Proctonotoidea and although the AU test was unable to reject this topology (p > 0.05), it received the lowest proportion of quartets (< 20%) in the FcLM approach. Thus, this relationship can only be explained by confounding signal (see original and permutation results in Additional file 2: Table S14).
(ii) Although the best ML trees of the intermediate data set, the unpartitioned strict data set analysed with a mixture model approach, and the strict SOS data set suggest Embletonia to be sister to all remaining Aeolidida, a position that is not rejected by the AU test (p > 0.05), the FcLM results indicate only minimal support for such a relationship: the proportion of supporting quartets, excluding those that can be explained by confounding signal, was only around 3%. This also implies that AU tests, irrespective of whether or not a topology for the data set is significantly rejected, should not only be used to check if the signal is confounding.
(iii) A sister group relationship of Embletonia to a clade Aeolidida + Proctonotoidea, which received strongest support in the FcLM analyses (8–16% of all quartets after excluding the proportion of supporting quartets that can be explained by confounding signal, see Additional file 2: Table S14), was equally rejected by all AU tests.
There is only very little signal that is not confounding (around 3–8%, compare quartets of original with permuted approaches, Additional file 2: Table S14), which would support either Embletonia + Aeolidida (position ii in Fig. 2) or Embletonia as sister to a clade Aeolidida + Proctonotoidea (position iii in Fig. 2). Thus, these results clearly indicate that the position of Embletonia as a sister taxon of Proctonotoidea is most likely not due to informative phylogenetic signal, but mainly due to confounding signal in our data set, and again leaves the phylogenetic position of Embletonia as an enigma.
In order to analyse further possibilities of putative relationships of Embletonia, we tested four alternative positions (iv–vii, see Fig. 2) of Embletonia, which have been discussed in the literature before, by applying the AU test on the (partitioned) strict data set (see Fig. 2 and see below). Note that none of these positions were inferred in any of our ML analyses.
(iv) Since Embletonia exhibits characters, which are shared with the Dendronotoidea, we analysed a putative sister group relationship with this superfamily.
(v) Although an assignment to Tritonioidea is very unlikely, because Embletonia does not share all the characters special for this superfamily, the position of the Arminoidea is variable within the various published phylogenies [17, 68, 69] when including this superfamily. Nevertheless, we tested this possibility.
The last two tests imply a closer relationship of Embletonia with Fionoidea, a relationship that was assumed in former times and reflects the current systematics [57]. Therefore, we tested (vi) a position of Embletonia as sister to Fionoidea and (vii) Embletonia as sister to Unidentiidae and this clade being again sister to the remaining Fionoidea in restricted sense [17, 60].
AU tests significantly rejected (p < 0.05) all four alternative positions (iv–vii, see Fig. 2) of Embletonia (see Additional file 2: Table S15).
Despite our extensive molecular data sets and tests, we still cannot unambiguously assign Embletonia to one of the superfamilies in our tree. Beyond only small putative phylogenetic signal as indicated by our FcLM analyses, which is also in line with the negligible support considering classical statistical support, a reason could be the lack of relevant taxa in our data set that could positively influence the position of Embletoniidae in the cladobranch tree (e.g., Doridomorpha, Curnonidae, Notaeolidiidae). Achieving congruence between morphological and molecular data within the Cladobranchia is a task for future research.