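Biology
Environmental DNA
Biodiversity
Latent Dirichlet allocation
Robustness (evolution)
Ecology
Topic model
Artificial intelligence
Computer science
Biochemistry
Gene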
Authors
Guilhem Sommeria‐Klein, Silvia G. Acinas, Éric Coissac, Amaia Iribar, Heidy Schimann, Pierre Taberlet, Jérôme Chave
Identifier
DOI:10.1111/1755-0998.13109
Abstract
High-throughput sequencing of amplicons from environmental DNA samples permits rapid, standardized and comprehensive biodiversity assessments. However, retrieving and interpreting the structure of such data sets requires efficient methods for dimensionality reduction. Latent Dirichlet Allocation (LDA) can be used to decompose environmental DNA samples into overlapping assemblages of co-occurring taxa. It is a flexible model-based method adapted to uneven sample sizes and to large and sparse data sets. Here, we compare LDA performance on abundance and occurrence data, and we quantify the robustness of the LDA decomposition by measuring its stability with respect to the algorithm's initialization. We then apply LDA to a survey of 1,131 soil DNA samples that were collected in a 12-ha plot of primary tropical forest and amplified using standard primers for bacteria, protists, fungi and metazoans. The analysis reveals that bacteria, protists and fungi exhibit a strong spatial structure, which matches the topographical features of the plot, while metazoans do not, confirming that microbial diversity is primarily controlled by environmental variation at the studied scale. We conclude that LDA is a sensitive, robust and computationally efficient method to detect and interpret the structure of large DNA-based biodiversity data sets. We finally discuss the possible future applications of this approach for the study of biodiversity.
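To make the decomposition described in the abstract concrete, here is a minimal sketch of LDA applied to a sample-by-taxon read-count table. It is an illustration under stated assumptions, not the authors' implementation: it uses a textbook collapsed Gibbs sampler (the abstract does not say which inference algorithm was used), and the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy count matrix are all hypothetical. Samples play the role of documents, taxa the role of words, and assemblages the role of topics.

```python
import random


def lda_gibbs(counts, n_assemblages, alpha=0.1, beta=0.1, n_iter=100, seed=0):
    """Collapsed Gibbs sampler for LDA on a sample-by-taxon count matrix.

    counts[s][t] is the number of reads of taxon t in sample s.
    Returns (theta, phi): theta[s][k] is the estimated share of assemblage k
    in sample s, phi[k][t] the estimated share of taxon t in assemblage k.
    """
    rng = random.Random(seed)  # the seed controls the random initialization
    n_samples, n_taxa = len(counts), len(counts[0])
    labels = list(range(n_assemblages))

    # Expand the count matrix into one (sample, taxon) pair per read.
    reads = [(s, t)
             for s in range(n_samples)
             for t in range(n_taxa)
             for _ in range(counts[s][t])]

    # Random initial assignment of each read to an assemblage.
    z = [rng.randrange(n_assemblages) for _ in reads]
    n_sk = [[0] * n_assemblages for _ in range(n_samples)]  # reads per sample and assemblage
    n_kt = [[0] * n_taxa for _ in range(n_assemblages)]     # reads per assemblage and taxon
    n_k = [0] * n_assemblages                               # reads per assemblage
    for (s, t), k in zip(reads, z):
        n_sk[s][k] += 1
        n_kt[k][t] += 1
        n_k[k] += 1

    for _ in range(n_iter):
        for i, (s, t) in enumerate(reads):
            # Remove the read's current assignment from the counts.
            k = z[i]
            n_sk[s][k] -= 1
            n_kt[k][t] -= 1
            n_k[k] -= 1
            # Resample its assemblage from the conditional distribution.
            weights = [(n_sk[s][j] + alpha) * (n_kt[j][t] + beta)
                       / (n_k[j] + n_taxa * beta) for j in labels]
            k = rng.choices(labels, weights=weights)[0]
            z[i] = k
            n_sk[s][k] += 1
            n_kt[k][t] += 1
            n_k[k] += 1

    # Smoothed point estimates from the final state of the sampler.
    theta = [[(n_sk[s][k] + alpha) / (sum(n_sk[s]) + n_assemblages * alpha)
              for k in labels] for s in range(n_samples)]
    phi = [[(n_kt[k][t] + beta) / (n_k[k] + n_taxa * beta)
            for t in range(n_taxa)] for k in labels]
    return theta, phi


# Hypothetical toy data: 6 samples x 6 taxa, with two groups of co-occurring
# taxa and one sample (the last) that mixes both groups.
toy_counts = [
    [9, 7, 8, 0, 0, 0],
    [8, 9, 6, 0, 0, 0],
    [7, 8, 9, 1, 0, 0],
    [0, 0, 0, 9, 8, 7],
    [0, 0, 1, 7, 9, 8],
    [4, 4, 4, 4, 4, 4],
]

# Two runs that differ only in their random initialization.
theta_a, phi_a = lda_gibbs(toy_counts, n_assemblages=2, seed=1)
theta_b, phi_b = lda_gibbs(toy_counts, n_assemblages=2, seed=2)

for row_a, row_b in zip(theta_a, theta_b):
    print([round(x, 2) for x in row_a], [round(x, 2) for x in row_b])
```

Each row of `theta` sums to 1, so a sample can be split between several assemblages, which is what "overlapping assemblages" means in practice. Assemblage labels are arbitrary and may be permuted between runs, so any comparison across initializations has to match assemblages first (for example by comparing rows of `phi`) before judging how stable the decomposition is.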