摘要
Report the first chromosome level genome of myriapod Scolopendra mutilans. Reveal gene expansions for importance to adapt. Annotate nine Hox cluster genes in this genome. Arthropoda, a diverse phylum encompassing Myriapoda, Crustacea, Chelicerata, and Insecta, constitutes more than 80% of the total number of documented species (Qu et al. 2020). Within these groups, centipedes (Chilopoda) hold a unique position as one of the oldest terrestrial venomous groups, with a fossil history spanning 430 million years. This lineage encompasses five orders and includes more than 3500 species (Undheim & King 2011). Centipedes, characterized by having a pair of legs per segment, have drawn significant attention in the biomedical field (Undheim et al. 2015) due to their production of venoms in specialized venom glands that have modified the first pair of trunk legs (Giribet & Edgecombe 2019). Centipedes are carnivorous, soil-dwelling invertebrates with a predatory nature, playing a pivotal role as indicators of soil biodiversity and serving as vital tools for evaluating ecosystem health (Dugon 2017; Halpin et al. 2021). Despite their ecological value, the evolutionary ecology of myriapods has received relatively less attention compared to other arthropods. However, recent advances have been made, with nonchromosomal-level genomes of five millipedes and four centipedes having been published, shedding more light on their evolutionary history (Chipman et al. 2014; Kenny et al. 2015; Qu et al. 2020; So et al. 2022). The genus Scolopendra, comprising a diverse range of scolopendromorph centipedes, inhabits various regions spanning from tropical to warmer temperate areas worldwide (Lewis 1981). This genus comprises more than 90 documented species (Bonato et al. 2016), exhibiting a wide range of coloration, segment variations, and sizes. Scolopendra species primarily prey on insects and other invertebrates, playing a crucial role in soil ecosystems (Siriwut et al. 2016). Notably, female Scolopendra exhibit parental care, an intriguing behavior within the genus (Cabanillas et al. 2019). The venom of Scolopendra is rich in peptides, contributing to their effectiveness as predators and their ecological success (Yoo et al. 2014; Hakim et al. 2015). Through peptidomics and cDNA library analysis, 79 distinct peptide toxins were identified in Scolopendra subspinipes mutilans (Rong et al. 2015), thereby contributing to the enrichment of centipede venom diversity (Yang et al. 2012). Among these toxins, 29 were identified as neurotoxins that target Na+, K+, and Ca2+ channels (Yang et al. 2012; Rong et al. 2015). These peptide toxins were systematically classified into 17 families based on their sequences and cysteine arrangement (Luo et al. 2022). In a comparative proteomic analysis of different body tissues, a total of 1639 proteins were identified in S. s. mutilans employing proteotranscriptomics. This included 1204 unique proteins in the torso tissue and 165 unique proteins in the venom gland, with 100 unique proteins overlapping between the two tissues (Zhao et al. 2018). However, due to the lack of a genome for this species, studies on the biosynthesis of its venom have been limited. Prior to assembling the genome of Scolopendra mutilans using sequencing data, estimates of its characteristics were conducted. The genome size of S. mutilans was estimated to be approximately 1036.6 Mb, based on 196.2 Gb of sequencing data at 189.27 fold coverage. Heterozygosity was 0.57% and GC content at 30.86%, using k-mer (k = 19) methods (Fig. S1 and Table S1-1, Supporting Information). Additionally, flow cytometry estimation showed the genome size of S. mutilans from 1018.75 to 1023.57 Mb (Fig. S2 and Table S1-2, Supporting Information). The subsequent genome assembly produced a genome of 903.91 Mb, containing 352 contigs with a high contig N50 of 24.13 Mb and longest contig of 69.05 Mb, derived from 163.80 Gb of clean bases at 181.21 fold coverage (Tables S1-4,S2 and Fig. S3, Supporting Information). Notably, the contig N50 achieved in this assembly surpassed that of other myriapods (Chipman et al. 2014; Kenny et al. 2015; Qu et al. 2020; So et al. 2022). By combining data from the Illumina, ONT, and Hi-C technologies, we obtained a total of 170.26 Gb of clean data with a 188.36-fold coverage (Tables S1-3,S1-4, Supporting Information). This resulted in 903.93 Mb of genome sequences anchored into 14 chromosomes, with an N50 of 62.8 Mb (Fig. 1a,b). This outcome aligns with previous chromosome karyotype studies, which reported a chromosome count of 2n = 28 (Ogawa 1953). The anchored sequences accounted for 95.64% of the total contig length (Fig. 1a; Table S1-4, Supporting Information). A heatmap generated from Hi-C data supported the division of all data bins into 14 chromosomes (Fig. 1b). The estimated genome size of S. mutilans falls within the range typical for centipedes, which ranges from 176.2 Mb to 3.2 Gb (So et al. 2022). According to the Benchmarking Universal Single-Copy Orthologs (BUSCO) assessment, an impressive 97.0% of genes were identified as complete, comprising 96.3% single-copies genes and 0.7% duplicated genes, while 2.1% were classified as fragmented genes in S. mutilans. Remarkably, this achievement represents the highest value among all sequenced genomes of Myriapoda species (Table S2, Supporting Information) (So et al. 2022). The combination of a lengthy contig N50 and the high BUSCO score suggests that the current assembly represents a high-quality genome for S. mutilans. Centromere location and length were accurately predicted (Fig. S4a and Table S3, Supporting Information). The analysis revealed the presence of nine acrocentric chromosomes and four submetacentric chromosomes (Table S3, Supporting Information), consistent with the previous findings (Ogawa 1953). Remarkably, these results corroborate the earlier karyotype study, indicating that the chromosomes in S. mutilans exhibit a straightforward rod-shaped structure. Based on a combined strategy using ab initio, homology-based, and RNA sequence-based prediction, we successfully identified 20 153 protein-coding genes in S. mutilans (Table S4, Supporting Information). These genes exhibited an average gene length of 18423.87 bp, with an average of 7.89 exons per gene (Tables S4-1,S5, Supporting Information). Among the 20 153 predicted genes, 17 823 genes (88.44%) were annotated in the Kyoto Encyclopedia of Genes and Genomes (KEGG, 23.95%), KEGG pathway (17.60%), NR (73.91%), Uniport (75.37%), gene ontology (GO, 56.88%), Pfam (61.40%), and Interproscan (84.07%) databases, respectively (Table S6 and Fig. S4b, Supporting Information). As previously reported, transposable elements (TEs) significantly contribute to the genome size of arthropods (So et al. 2022). In the case of S. mutilans, 424.98 Mb, constituting 47.01% of the genome, consists of repeat sequences (Table S7-1, Supporting Information). The most abundant categories of TEs include DNA transposons (14.25%), long interspersed nuclear elements (5.38%), and long terminal repeats (5.36%) (Table S7-1 and Fig. S4c, Supporting Information). It is noteworthy that for myriapods, the length of TEs exhibited a positive correlation with genome size, except for short interspersed elements (SINEs) (Fig. S3, Supporting Information). In addition to protein-coding genes, non-coding genes were also identified, which include 83 microRNAs (miRNAs), 955 rRNAs, 5054 tRNAs, and 1207 snRNAs (Table S7-2, Supporting Information). A comparative genomic analysis was carried out using genomic data from 13 sequenced arthropod species and S. mutilans. With Naja naja as an outgroup, a genome-wide phylogenetic analysis was conducted on 293 single-copy genes from 14 arthropods (Fig. 1c). The results revealed that S. mutilans forms a sister group with Rhysida immarginata within Scolopendridae (Scolopendromorpha) and their divergence occurred at approximately 204 Mya. Surprisingly, S. maritima (Geophilomorpha) was found to be a sister group with Lithobius niger (Lithobiomorpha), with their divergence occurring around 300 Mya. This topological structure is consistent with a previous study based on the mitochondrial genomes, which indicated that Lithobiomorpha and Geophilomorpha are sister groups (Hu et al. 2020; Ding et al. 2022; Yang et al. 2023). This finding contrasts with the traditional view that Geophilomorpha and Scolopendromorpha are sister groups based on morphological similarities (Fernández et al. 2016). Within Chilopoda, a divergence between the Notostigmophora clade and the Pleurostigmophora clade occurred approximately 423 Mya, closely aligning with the fossil record (Undheim & King 2011). Gene family clustering analysis of the genome of S. mutilans revealed 1177 single-copy genes, 1727 multiple-copy genes, 4139 unique genes, and 13 110 other genes (Fig. 1d). S. mutilans shares 7916 (64.97%) gene families with Thereuonema tuberculata, 7228 (59.32%) gene families with Anaulaciulus tonginus, and 6791 (55.73%) gene families with Oedothorax gibbosus, respectively (Fig. 1e). Moreover, 5962 gene families are common to all four arthropods (Fig. 1e). Interestingly, the number of shared gene families exhibits a positive correlation with their phylogenetic relationships. The genome synteny analysis reveals a high level of synteny relationship between T. tuberculata and S. mutilans (Fig. 1f), supporting previous work that demonstrates significant genome collinearity among myriapod (So et al. 2022). The gene family evolutionary analysis unveiled that 255 gene families (comprising 1466 genes) had expanded, accounting for only 2.09% (255 out of 12 185) of all families, while 1388 gene families (1553 genes) had contracted in the S. mutilans genome compared to its most recent common ancestor (Fig. 1c; Table S4, Supporting Information). The expanded gene families of S. mutilans exhibited significant enrichments in 23 GO terms and 14 KEGG pathways (Tables S8,S9, Supporting Information). For the unique gene families of S. mutilans, these enrichments predominantly involved catalytic activity (including GTPase activity, hydrolase activity, and O-acyltransferase activity), transporter activity (as channel activity, hexosyltransferase activity, and transmembrane transporter activity), binding (consisting ATP binding, DNA binding, metal ion binding, and protein dimerization activity) (Fig. S5a and Table S10, Supporting Information), and Biosynthesis of unsaturated fatty acids, Fatty acid elongation, Fatty acid metabolism, and Dorso-ventral axis formation (Table S11, Supporting Information). Additionally, these expanded gene families displayed enrichments in pathways such as Steroid biosynthesis, Toll and Imd signaling pathway, Taurine and hypotaurine metabolism, Ubiquitin medicated proteolysis, and Dorso-ventral axis formation (Fig. S5b and Table S8, Supporting Information). In terms of biological processes, genes from significantly expanded gene families were enriched in the metabolism process, such as lipid catabolic process, glutathione catabolic process, and DNA integration (Fig. S5b and Table S9, Supporting Information). Considering the enrichment of orthologs in several of these pathways related to lipid metabolism, immunity, and signal transduction, it is plausible that these genetic adaptations may have contributed to sensory and locomotory adaptations, facilitating the ecological shift to predation in S. mutilans, similar to other centipedes (So et al. 2022). The results are similar to the findings in Mesobuthus martensii (Cao et al. 2013) and Whitmania pigra (Tong et al. 2022). In the branch model, S. mutilans was assigned as the foreground branch, with other myriapods serving as the background branches. A significant difference was observed in the evolution rate of 38 genes between S. mutilans and other myriapods. These genes demonstrated significant enrichments in four GO terms (Table S12, Supporting Information), specifically carbohydrate metabolic process, nucleotide-excision repair, regulation of transcription, DNA-templated, and cell periphery. In animals, the ancestral Hox homologs have been confirmed in arthropods, including labial (lab), proboscipedia (pb), deformed (Dfd), sex combs reduced (Scr), fushi tarazu (ftz), antennapedia (Antp), ultrabithorax (Ubx), abdominal-A (Abd-A), abdominal-B (Abd-B), and zerknüllt (zen) (Akam 1998). Within S. mutilans, we identified nine Hox cluster proteins, including the antennapedia complex (lab, pb, Dfd, Scr, ftz, and Antp) (Calhoun et al. 2002) and the bithorax complex (Ubx, Abd-A, and Abd-B) (Maeda & Karch 2006). Notably, all the Hox cluster genes are exclusively present on a single chromosome and exhibit a conserved order in centipedes (Figs S6,S7 and Table S13, Supporting Information). These structures play a key role in maintaining the relationship between phenotype and genomes (Holland 2013; So et al. 2022). Similar to L. niger, R. immarginata, and T. tuberculata, S. mutilans lacks Hox3, but the Hox3 family has dynamic changes in different myriapod lineages (So et al. 2022). This work was jointly supported by the Hubei Provincial Natural Science Foundation and Innovative Development of Chinese Medicine—of China (2023AFD144) and the Funds for Distinguished Young Scholars of Hubei University of Chinese Medicine (2022ZZXJ1003) to L.Z. The authors declare no conflict of interest. Table S1-1 The result of k-mer analysis Table S1-2 Estimated genome size by flow cytometry Table S1-3 Sequencing and quality filtering statistics Table S1-4 Scolopendra subspinipes mutilans genome assembly statistics and assembly completeness Table S4-1 Basic statistical results of gene prediction Table S4-2 Statistics results of gene fiamly cluster Table S7-1 Statistical results for transposable elements Table S7-2 Non-coding RNA content of the S. mutilans genome Figure S1 GenomeScope k-mer profile plot for the genome of S. mutilans, based on 19-mers in Illumina reads. Figure S2 Flow cytometry analysis for DNA contents of S. mutilans. Figure S3 The relationship between genome size and TE length in myriapod. Figure S4 Genome phylogeny and genome evolution. Figure S5 Enrichment of unique (A) and expanded (B) gene families. Figure S6 Synteny relationship. Figure S7 The phylogenetic trees for the Hox cluster. Please note: The publisher is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.