Identification of newly cloned C6ast genes in basal lineage fishes
We conducted a comprehensive search of the genome databases (Additional file 2: Table S1) and identified each relevant gene. For the first time, we found C6ast genes other than HE in the genomes of basal ray-finned fishes (gar, sturgeon, and reedfish), a lobe-finned fish (coelacanth), and cartilaginous fishes (Fig. 2). As shown in Fig. 2A, all of the C6ast genes in coelacanth, sturgeon, gar, reedfish, and ghost shark retained the six Cys residues and the active-site consensus sequences of astacin-superfamily metalloproteases. The exon-intron structures of these genes were similar to those of teleost and tetrapod C6ast genes; only the teleost HE genes lacked introns, because they arose through mRNA-mediated duplication (retrocopy). Thus, all of the newly identified genes retained the primary-sequence characteristics of previously identified C6ast subfamily genes.
We constructed a phylogenetic tree of representative astacin-superfamily genes, including BMP1, ovastacin, and C6ast genes (Additional file 1: Fig. S1). The six-cysteine astacins were broadly divided into C6ast4 and C6ast5 (C6ast4/5), hatching enzyme (HE), nephrosin (npsn), pactacin (pac), and patristacin (pastn). HE formed a monophyletic clade with the other C6ast genes, indicating that the HE genes are members of the C6ast subfamily. The C6ast4/5 clade included the gar, sturgeon, and reedfish C6ast4/5 genes newly identified in this study. As in our previous study [10], the pactacin genes formed three subclades, which, together with unknown C6ast genes (unknown-C6ast) from Otocephala and Protacanthopterygii, comprised a single large clade supported by high bootstrap values. However, because the bootstrap value at the node between unknown-C6ast and pac was low, we could not determine whether these unknown-C6ast genes are pac genes. Among the newly identified genes, the C6ast genes of gar, sturgeon, and coelacanth were placed as an outgroup to npsn, pac, and pastn, whereas the C6ast genes of cartilaginous fishes were placed near C6ast4/5.
We also characterized the newly identified genes by analyzing their domain structures (Fig. 2B). The domain structures of HEs and of the other C6ast genes differ: in bony fishes other than teleosts (basal ray-finned fishes and tetrapods), HE genes possess one or two CUB domains at the C-terminus, whereas none of the other C6ast subfamily genes possess any C-terminal domain [3, 8]. Consistent with this feature, the novel gar and coelacanth C6ast genes had no C-terminal CUB domain. In contrast, the C6ast genes of cartilaginous fishes did have a C-terminal CUB domain.
In bony fishes, therefore, the characteristics of the novel C6ast genes (conserved consensus sequences, phylogenetic position, and domain structure) were consistent with those of previously identified C6ast genes [3, 9]. By contrast, the C6ast genes of cartilaginous fishes retained a CUB domain, unlike the non-HE C6ast genes of other vertebrates. We also analyzed the expression of the newly identified gar genes by RT-PCR using RNA extracted from various adult tissues and found that these genes were expressed in several tissues (Additional file 1: Fig. S2). This expression pattern resembles that of the other C6ast genes, which are expressed in various adult organs, rather than that of the HE genes, which are expressed only in embryos. These results suggest that the C6ast subfamily originated no later than the common ancestor of jawed vertebrates. They may also indicate that the ancestral C6ast subfamily gene had one or more CUB domains.
Analysis of genomic synteny
We conducted a genomic synteny analysis of the C6ast subfamily genes (Fig. 3). In Neoteleostei, the pastn genes were located in the vicinity of c1qtnf4, ndufs3, and ptpmt1, and this syntenic gene set, which we named the “Pastn-synteny-set,” was well conserved (red triangles in Fig. 3). Thus, pastn genes could be clearly distinguished from other C6ast subfamily genes by their genomic location as well as by phylogenetic analysis. Lin et al. and Small et al. proposed that pastn is related to internal egg brooding, based on the high copy number of patristacin genes in Syngnathiformes and ovoviviparous Cyprinodontiformes and on the change in its expression level in the pipefish brood pouch before and after pregnancy [13, 14]. However, we found that oviparous Cyprinodontiformes also have a multi-copy pastn gene family (Additional file 1: Fig. S1). Because these pastn genes in Cyprinodontiformes are arranged in tandem, they were probably generated by tandem gene duplication.
The genomic synteny of npsn and pac (except for pac3) was highly conserved among Neoteleostei (blue triangles in Fig. 3). This “Npsn&Pac-synteny-set” consisted of mtch2, plekho2, fadd, and ano1 arranged in tandem. The Npsn&Pac-synteny-set and the Pastn-synteny-set were located on the same chromosome, at least in species whose genomes have been assembled at the chromosome level. Exceptionally, pac3 was located on a different chromosome, but its surrounding synteny was conserved among Euteleostei, which include Neoteleostei as well as pike and trout (Additional file 1: Fig. S3).
In non-neoteleostean fishes (arowana, zebrafish, trout, and pike), some non-HE C6ast subfamily genes were clustered close together (< 30 kbp apart in zebrafish) in the region between the Pastn- and Npsn&Pac-synteny-sets (Fig. 3). The basal bony fishes (arowana, gar, sturgeon, and coelacanth) possessed a single-copy C6ast gene at this locus. In reedfish (Polypteriformes), no C6ast genes other than HE were found here. Exceptionally, sturgeon retained one C6ast gene on each of two chromosomes (chr. 23 and 26), and the surrounding synteny was also highly conserved. Because sturgeon underwent a lineage-specific whole-genome duplication (WGD) [18], this gene pair probably arose through that WGD. By contrast, we found no trace of C6ast gene pairs derived from the teleost-specific WGD. These results suggest that the multiple copies of C6ast genes in teleosts were derived from a single C6ast gene. The C6ast genes of cartilaginous fishes are discussed later.
The C6ast4/5 genes were located in tandem between anksf1 and tra2b, an arrangement conserved throughout ray-finned fishes, including the novel C6ast4/5 genes of basal ray-finned fishes (gar, sturgeon, and reedfish) (Additional file 1: Fig. S4). However, no C6ast4/5 genes were found in the genomes of cartilaginous fishes, coelacanth, or tetrapods. Because C6ast4 and C6ast5 were clearly separated in the phylogenetic analysis (99% bootstrap support), these results indicate that C6ast4 and C6ast5 had already diverged in the common ancestor of ray-finned fishes.
We previously predicted the sequences of sarcopterygian HE genes from genomic sequences [3]. The synteny analysis in this study revealed that, in Sarcopterygii, the HE genes were located in the vicinity of syt13, prdm11, tp53i11, tspan18, and cd82, a gene set we named the “HE-synteny-set” (Fig. 3). Thus, the genomic synteny of the HE genes was reasonably well conserved among sarcopterygian species (green triangles in Fig. 3), unlike that of the teleostean HE genes, which were translocated within the genome during evolution [4]. Although most sarcopterygian HE genes were clustered on the same chromosome, their copy numbers varied among species, indicating that this variation arose through independent, lineage-specific gene duplications.
No HE gene was found in mammals, including the oviparous platypus, even though the HE-synteny-set was well conserved. These results suggest that the HE gene was lost in the common ancestor of mammals. Because the surrounding synteny is conserved, this loss was probably a single-gene loss. In addition, the novel C6ast genes of cartilaginous fishes showed a syntenic relationship consistent with this HE-synteny-set.
Expression analysis of duplicated HE genes in frogs
Teleostean HE genes form multi-copy gene families in several species. Some sarcopterygian species also had multiple HE copies, and a particularly large number was detected in a frog, the Western clawed frog (Xenopus tropicalis) (Fig. 3). To determine whether this large-scale duplication of HE genes is specific to the Western clawed frog, we searched the genomic data of three frog species (Western clawed frog; African clawed frog, X. laevis; and High Himalaya frog, Nanorana parkeri) and one limbless amphibian (two-lined caecilian, Rhinatrema bivittatum), and performed a molecular phylogenetic analysis. We found that each species had 5 to 14 duplicated HE genes (Fig. 4A). Further synteny analysis revealed that, in the two-lined caecilian, which is phylogenetically an outgroup to frogs, all HE genes were located within the HE-synteny-set (Additional file 1: Fig. S5A), whereas in frogs some HE genes were also located on other chromosomes (Additional file 1: Fig. S5B). Within the genus Xenopus, the African clawed frog is an allotetraploid that arose through WGD, whereas the Western clawed frog is diploid, making this pair of species a model for examining the effects of WGD on molecular evolution [19]. However, the Western clawed frog (19 genes), which did not experience WGD, retained more HE genes than the African clawed frog (15 genes; Additional file 1: Fig. S6). Although this seems inconsistent with an expansion driven by WGD, the African clawed frog retained some traces of gene pairs generated by the genome doubling (Chr. 4L and 4S in Additional file 1: Fig. S5A), whereas the Western clawed frog increased its copy number through tandem gene duplication.
We also investigated whether all of these amphibian HE genes have six Cys residues in the protease domain. Vertebrate HE genes reported so far retain the six Cys residues, with one exception: African clawed frog HE1 lacks the two Cys residues on the N-terminal side of the protease domain [20]. Among the amphibian HE genes examined in this study, most retained all six Cys residues, but some had lost the N-terminal Cys residues (Additional file 1: Fig. S6). These results indicate that several HE genes, including African clawed frog HE1, lost these Cys residues independently. In African clawed frog, both HE2, which retains these Cys residues [21], and HE1, which has lost them [20], are known to function as hatching enzymes. In the future, comparing how these HEs interact with the egg membrane may clarify the role of the two Cys residues that characterize the C6ast genes.
We then analyzed the expression of the diversified HE genes in frogs (Fig. 4B–H). In African clawed frog, XlHE1 (also known as UVS.2) [20] and XlHE2 [21] have already been identified as HE genes, and their spatial mRNA expression patterns have been reported. Our analysis likewise detected expression of both XlHE1 and XlHE2 in hatching gland cells, which are arranged in an inverted-Y shape in the dorsal head region before the hatching stage (NF stage 36; Fig. 4B). Although we failed to detect the expression of some HE-like genes, XlHE5, 6, 9, and 11 showed similar expression patterns (Fig. 4B). In contrast, XlHE13 was expressed in the posterior fin after the hatching stage (NF stage 40; Fig. 4C). Phylogenetic and synteny analyses showed that XlHE13 is homologous to XtHE14 (X. tropicalis HE14), and single-cell transcriptome (scRNAseq) data from Western clawed frog confirmed expression in the posterior fin (Fig. 4E–H). After hatching, embryos are no longer covered by the egg membrane, which is the substrate of HEs. Because XlHE13 is expressed after hatching and in the epidermal tissue layer rather than in the head, where the hatching gland cells are localized, XlHE13 seems to have a role different from that of the HEs. These results indicate that this HE-derived duplicate gene has acquired a new function. Cell lineage analysis using scRNAseq data from Western clawed frog showed that hatching gland cells and posterior fin cells follow a similar lineage derived from non-neural ectoderm (Fig. 4I). Gene duplication may therefore have been followed by subtle changes in transcriptional regulation, leading to the acquisition of a new function by a frog HE gene.