A robust unrooted phylogeny of the Archaea
We retrieved homologues of the 81 protein families from the study by Raymann et al. (2015) from 435 archaeal proteomes representative of current archaeal diversity. Nine protein families presented complex evolutionary patterns combining multiple horizontal gene transfers (HGT), gene duplications, and gene losses, leading to the non-monophyly of known archaeal orders, and were therefore removed from further analysis. Finally, to avoid taxonomic bias, we selected 218 out of the 435 archaeal genomes by keeping at least three representatives for each archaeal order when possible and only one strain per species. The remaining 72 protein families were concatenated into a supermatrix gathering 16,006 amino acid positions (A supermatrix). The BI and ML analyses of the A supermatrix provided consistent and robust unrooted trees of the archaeal domain (Fig. 1, Additional file 1: Figs. S1 and S2). Notably, they recover a number of clades for which we proposed placeholder names [6, 26]. These names are not meant to identify taxonomic ranks but will help scientific communication about these clades. Remarkably, neither of the two trees recovered the monophyly of Euryarchaeota; instead, both showed a clear distinction between Cluster I and Cluster II archaea. In fact, Methanomada (i.e., Methanopyri, Methanobacteria, and Methanococci) and Acherontia (i.e., Thermococci, Theionarchaea, and Methanofastidiosa) do not form a monophyletic group with the other Euryarchaeota (i.e., Diaforarchaea, Archaeoglobi, Methanonatronarchaeia, and Stenosarchaea) and instead branch with Cluster I lineages (Fig. 1).
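The concatenation step described above can be illustrated with a minimal Python sketch. It assumes that each protein family is already available as an aligned set of sequences keyed by taxon name, and that taxa missing from a family are padded with gaps. The function name `concatenate` and the toy data are hypothetical and are not taken from the original pipeline.

```python
def concatenate(family_alignments, taxa):
    """Concatenate per-family alignments into one supermatrix.

    family_alignments: list of dicts mapping taxon name -> aligned sequence
                       (all sequences within one family have equal length).
    taxa: ordered list of taxon names to include in the supermatrix.
    Taxa absent from a family are filled with gap characters ('-').
    """
    parts = {taxon: [] for taxon in taxa}
    for alignment in family_alignments:
        # Alignment length of this family, read from any of its sequences.
        length = len(next(iter(alignment.values())))
        for taxon in taxa:
            parts[taxon].append(alignment.get(taxon, "-" * length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy example: two families, taxon "y" is missing from the second one.
families = [{"x": "AC", "y": "AG"}, {"x": "DEF"}]
supermatrix = concatenate(families, ["x", "y"])
```

In this toy case, taxon `y` receives three gap characters for the family in which it has no homologue, so both rows of the supermatrix keep the same length.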
A detailed comparison of the support for these clades and their relationships across the BI and ML analyses and the different supermatrices is presented in Fig. 2A. More precisely, all major lineages (orders, classes, superclasses, phyla, and superphyla) and their relationships were strongly supported (most posterior probabilities (PP) = 1 and bootstrap values (BV) > 90%) (green dots in Fig. 2A), in agreement with some recent reports [6, 9, 18]. Regarding Cluster I, both trees support the early emergence of Korarchaeota within the TACK superphylum, the clustering of Crenarchaeota with Verstraetearchaeota, the sisterhood of Bathyarchaeota with Thaumarchaeota and Aigarchaeota, and the branching of Stygia in the stem leading to the TACK and Asgard. Concerning the DPANN, both trees inferred a specific relationship with the Altiarchaea, as proposed elsewhere [33]. Within Cluster II, both trees strongly support the close relationship between Halobacteria and Methanomicrobiales, while Methanocellales and Methanosarcinales formed two sister-lineages, in agreement with a recent report [27]. Furthermore, Methanonatronarchaeia branched in the stem of Methanotecta, in agreement with two recent studies showing that Halobacteria and Methanonatronarchaeia do not represent two sister-lineages [29, 30], as initially proposed [28]. Within DPANN, we recovered placements consistent with previous works: (i) a basal branching of the Diapherotrites, (ii) the sistership of the Aenigmarchaeota and the Nanohaloarchaeota, and (iii) the grouping of the Pacearchaeota, the Woesearchaeota, and the Nanoarchaeota [31, 33]. The only areas of the archaeal tree that remain unresolved are the internal branching within Diaforarchaea, the base of Cluster I, and the relationships within the Asgard (light green and orange dots in Fig. 2A).
We investigated whether the relationships within Archaea are affected by tree-reconstruction artefacts by applying the desaturation method to the A supermatrix. The progressive removal of the fastest evolving sites did not reveal major changes in topology or branch support, even when excluding the DPANN (Fig. 2B, C and Additional file 1: Figs. S3 and S4). In particular, the monophyly of Euryarchaeota was never observed in these analyses due to the branching of Altiarchaea and/or DPANN between Cluster I and Cluster II (Fig. 2B and Additional file 1: Fig. S3) or within Cluster I, as the sister-lineage of Methanomada (purple stars, Fig. 2C and Additional file 1: Fig. S4).
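The principle of progressively removing the fastest evolving sites can be sketched as follows. This is only an illustration under stated assumptions: per-site evolutionary rates are taken as given (in practice they would be estimated beforehand with a phylogenetic program), and the function name `desaturate`, the removal fraction, and the toy data are hypothetical rather than the settings used in this study.

```python
def desaturate(alignment, site_rates, fraction):
    """Drop the given fraction of fastest-evolving sites from an alignment.

    alignment: dict mapping taxon name -> aligned sequence.
    site_rates: list with one (pre-estimated) rate per alignment column.
    fraction: proportion of columns to remove, between 0 and 1.
    """
    n_sites = len(site_rates)
    n_remove = int(n_sites * fraction)
    # Rank column indices from fastest to slowest.
    ranked = sorted(range(n_sites), key=lambda i: site_rates[i], reverse=True)
    removed = set(ranked[:n_remove])
    # Keep the remaining columns in their original order.
    kept = [i for i in range(n_sites) if i not in removed]
    return {name: "".join(seq[i] for i in kept)
            for name, seq in alignment.items()}


# Toy example: four columns, the two fastest (indices 1 and 3) are removed.
toy_alignment = {"a": "ACDE", "b": "AQDK"}
toy_rates = [0.1, 2.0, 0.3, 3.0]
trimmed = desaturate(toy_alignment, toy_rates, 0.5)
```

Repeating the call with increasing values of `fraction` and re-inferring a tree at each step would mimic the progressive design, so that topology and branch support can be compared across increasingly conservative datasets.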
Branching of Eucarya with respect to Archaea
To investigate further the branching of Eucarya with respect to Archaea, we analyzed a supermatrix containing the 64 protein families shared by Archaea and Eucarya (AE supermatrix, 13,468 amino acid positions). We obtained consistent and well-resolved unrooted phylogenies (Fig. 3, and Additional file 1: Figs. S5 and S6). The internal archaeal topology previously inferred from the A supermatrix was largely supported by the AE supermatrix (green dots in Fig. 2A), indicating that adding eukaryotes neither introduces tree reconstruction artifacts nor blurs the phylogenetic signal contained in the data. In both trees, the Eucarya displayed a very long stem and branched as sister to the Asgard superphylum (PP = 1, BV = 86%, Fig. 3, and Additional file 1: Figs. S5 and S6). To investigate this relationship in more detail and avoid potential artifacts due to the presence of very distantly related lineages or very long branches, such as the ones leading to Halobacteria or DPANN, we subsampled the original dataset to include only eukaryotes and their closest archaeal relatives. More precisely, the TACK, the Asgard, the Eucarya, and the Stygia represented the ingroup, while the Acherontia were used as the outgroup. The resulting tree is fully resolved and consistently supports the Asgard as the sister-lineage of Eucarya (PP = 1, Fig. 4). A sister-grouping of Asgard and Eucarya was also observed when applying two different amino acid recoding schemes (Dayhoff4 and Dayhoff6, Additional file 1: Figs. S7 and S8). These results suggest that the sistership of Eucarya and Asgard is robust and confirm their close relationships with TACK and Stygia within Cluster I.
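As an illustration of amino acid recoding, the sketch below maps each residue to a representative of its Dayhoff category. The groupings follow the commonly used definitions (Dayhoff6: AGPST, DENQ, HKR, FWY, ILMV, C; Dayhoff4: AGPST, DENQ, HKR, FWYILMV, with cysteine treated as missing data). Whether these exact groupings and this treatment of cysteine match the implementation used for the analyses reported here is an assumption, and the names `recode`, `DAYHOFF4`, and `DAYHOFF6` are hypothetical.

```python
# Each key lists the residues of one category; the value is its recoded state.
DAYHOFF6 = {"AGPST": "A", "DENQ": "D", "HKR": "H",
            "FWY": "F", "ILMV": "I", "C": "C"}
# Assumption: in the 4-state scheme, cysteine has no category and becomes '?'.
DAYHOFF4 = {"AGPST": "A", "DENQ": "D", "HKR": "H", "FWYILMV": "F"}


def recode(sequence, scheme):
    """Recode an aligned protein sequence into reduced-alphabet states.

    Gaps ('-') are preserved; residues absent from the scheme
    (ambiguity codes, or cysteine under DAYHOFF4) are written as '?'.
    """
    table = {aa: state for members, state in scheme.items() for aa in members}
    return "".join(ch if ch == "-" else table.get(ch.upper(), "?")
                   for ch in sequence)


six_state = recode("MKC-W", DAYHOFF6)
four_state = recode("MKC-W", DAYHOFF4)
```

Recoding of this kind reduces the 20 amino acid states to a few biochemical categories, which is generally used to lessen the impact of substitutional saturation and compositional heterogeneity at the cost of some phylogenetic signal.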
Deep divergences of the Archaea inferred by using Bacteria as outgroup
We next sought to investigate the deepest divergences within Archaea, notably by resolving the branching of new lineages (Asgard, Verstraetearchaeota, Stygia, Methanonatronarchaeia, and Altiarchaea) that were not analyzed in Raymann et al. (2015). We therefore rooted the archaeal tree with a bacterial outgroup, using the 41 protein families shared by Archaea and Bacteria (AB supermatrix, 7,853 amino acid positions). The inferred ML and BI trees were overall robust and consistent (Fig. 5, Additional file 1: Figs. S9 and S10). In particular, the monophyly of major bacterial groups such as the Chlorobi / Bacteroidetes, Spirochaetes, Chloroflexi, Deinococcus-Thermus, Cyanobacteria, Actinobacteria, Firmicutes, and Thermotogae, as well as that of Proteobacteria and PVC, which are particularly difficult to recover, was robustly supported (most PP = 1, BV > 95%). Furthermore, the BI tree recovered the deep split between Terrabacteria (Deinococcus-Thermus, Chloroflexi, Cyanobacteria, Actinobacteria, and Firmicutes) and Thermotogae on the one hand, and other major bacterial lineages on the other hand (PP = 0.93, Fig. 5 and Additional file 1: Fig. S9), consistent with previous reports [16, 34, 35]. The ML tree is overall consistent with the BI tree, although relationships among bacterial lineages are less well supported (Additional file 1: Fig. S10). In particular, the position of Deinococcus-Thermus was unresolved in the ML tree.
Most interestingly, the two trees are again incompatible with the monophyly of Euryarchaeota. In fact, both trees supported the monophyly of Cluster I (PP = 0.91 and BV = 92%) and Cluster II (PP = 1 and BV = 100%). More precisely, the Methanomada represent the deepest branches of Cluster I (Figs. 2A and 5, and Additional file 1: Figs. S9 and S10), whereas Acherontia appear to be the sister-lineage of the large clade encompassing the TACK, the Asgard, and the Stygia (PP = 1 and BV = 100%, Fig. 5). The divergence between Cluster I and Cluster II lies deep in the BI and ML trees of Archaea, meaning that these two lineages are ancient.
It is worth noting that the ML and the BI trees are inconsistent regarding the phylogenetic position of DPANN. According to the BI tree, the root is located between Cluster I and Cluster II, while the DPANN and Altiarchaea branch within Cluster I, albeit with non-significant support (PP = 0.8, Fig. 5). In contrast, according to the ML tree, the root of Archaea is located within the large clade encompassing DPANN and Altiarchaea, meaning that this group is paraphyletic (BV = 99%, Additional file 1: Fig. S10). More precisely, the clade grouping Aenigmarchaeota, Nanohaloarchaeota, Nanoarchaeota, Woesearchaeota, and Pacearchaeota represents the very first diverging lineage, while Diapherotrites and Altiarchaea emerge later. Even if the relationships among DPANN are not fully resolved, the ML tree suggests that the DPANN superphylum diverged before the divergence of Clusters I and II (BV = 100%). Because the BI and ML trees differ in the placement of the root with respect to DPANN, and because the inclusion of the bacterial outgroup impacts the position of DPANN, we wondered whether the DPANN lineages impact the placement of the root itself. To address this question, we removed the DPANN and Altiarchaea lineages from the AB supermatrix. The BI tree inferred with this new AB dataset is fully consistent with previous ML and BI trees and supports Cluster I and Cluster II (Additional file 1: Figs. S9-S11). More precisely, the bacterial outgroup branches between Cluster I and Cluster II (PP = 0.89), and more importantly, the grouping of Acherontia with TACK, Asgard, and Stygia taxa is again strongly supported (PP = 1, Additional file 1: Fig. S11).
Based on these rooted archaeal trees, the placement of other recently described lineages can be robustly inferred. For example, the Verstraetearchaeota, the Asgard, and the Stygia reliably branch with Cluster I lineages (all PP = 1 and BV > 95%), while the Methanonatronarchaeia belong to Cluster II (PP = 1 and BV = 100%), in agreement with recent reports [28, 29].
Altogether, these results indicate that the root of Archaea can be confidently excluded from within Cluster I or II, although the precise phylogenetic position of DPANN lineages and their relationship with Altiarchaea remain to be assessed by future analyses including additional sampling from these lineages.