Intricacies of single-cell multi-omics data integration

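Subject tags: Biology, Modalities, Abstraction, Computational biology, Cell, Ground truth, Variation (astronomy), Biological data, Data integration, Computer science, Data science, Bioinformatics, Artificial intelligence, Data mining, Genetics, Physics, Philosophy, Sociology, Epistemology, Astrophysics, Social science
Authors
Pia Rautenstrauch, Anna Hendrika Cornelia Vlot, Sepideh Saran, Uwe Ohler
Source
Journal: Trends in Genetics [Elsevier BV]
Volume (issue): 38 (2): 128-139; Times cited: 33
Identifier
DOI: 10.1016/j.tig.2021.08.012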
Highlights

Identifying cell-to-cell correspondences between unpaired datasets from different single-cell protocols promises to provide a more comprehensive view of cellular states.
Integrating unpaired data from multiple modalities is more complicated than single-omics integration because there is no feature correspondence across modalities and no ground-truth information about the biological differences between modalities.
Retaining biological variation during multi-omics data integration has so far received insufficient attention, but it is essential for leveraging complementary information from different omics layers.
New paired multi-omics assays can now provide ground-truth data. These data will inform robust associations between features of different modalities and reveal modality-specific biological patterns, which may also help improve methods for the multimodal integration of unpaired data.

Abstract

A wealth of single-cell protocols makes it possible to characterize different molecular layers at unprecedented resolution. Integrating the resulting multimodal single-cell data to find cell-to-cell correspondences remains a challenge. We argue that data integration needs to happen at a meaningful biological level of abstraction and that the inherent discrepancies between modalities must be considered to strike a balance between biological discovery and noise removal. A survey of current methods reveals that a distinction between technical and biological origins of presumed unwanted variation between datasets is not yet commonly considered. The increasing availability of paired multimodal data will aid the development of improved methods by providing a ground truth for cell-to-cell matches.

Glossary

Embedding: a low-dimensional representation of high-dimensional data.
Feature: a quantifiable characteristic of a cell. For example, in scRNA-seq the expression level of each gene is a feature; in scATAC-seq the features are the accessibilities of defined genomic regions.
Feature correspondence: features from two or more datasets refer to the same entities (e.g., genes).
Gene activity matrix: a matrix that aggregates quantitative genome-level data (e.g., chromatin accessibility or DNA methylation data) to the gene level.
Gene expression: the process by which the information encoded in genes is transformed into functional gene products, such as proteins or functional RNA molecules. In single-cell analysis, the term often refers to the steady-state mRNA levels in the cell measured by scRNA-seq (i.e., an intermediate step of the gene expression process).
Hyperparameter: a parameter that specifies part of a method's settings and often needs to be selected by the user.
Integration: combining data from different sources into a unified view.
Label transfer: transferring cell or cluster labels to a different dataset based on similarities to the source dataset.
Manifold: a topological space that preserves the neighborhood structure of a dataset. A manifold can be used to represent high-dimensional biological data in a lower-dimensional space that is easier to analyze while maintaining the information in the original dataset.
Modality: a mode in which the cell exists (e.g., gene expression space or chromatin accessibility space). The term is often used to refer to the different data types that assay these modes.
Molecular layer: a specific aspect of the cell's molecular biology that is represented by a set of biomolecules or their state. Examples include chromatin state, gene expression levels, and protein levels.
Multimodal: involving information from two or more modalities. See also modality.
Paired data: data in which different modalities are measured in the same single cell.
Principal component analysis (PCA): a linear dimensionality reduction technique that reduces the number of features of a dataset while preserving most of the variation in the original dataset.
Semi-supervised learning: a machine learning training strategy that requires at least a small amount of labeled data.
[term missing]: a function that provides a similarity measure between vectors (e.g., the gene expression vectors of two cells).
scRNA-seq: methods that profile the transcriptome-wide gene expression of individual cells.
scATAC-seq: methods for the genome-wide profiling of open chromatin regions in individual cells.
TF-IDF (term frequency-inverse document frequency): a statistic that describes feature importance for a specific sample (i.e., how important a particular open chromatin region is for a specific cell).
[term missing]: a nonlinear dimensionality reduction technique that reduces the number of features of a dataset while preserving the similarity between data points in the original dataset.
Unimodal: involving information from a single modality. See also modality.
Unpaired data: data in which different modalities are measured in distinct cells.
Unsupervised learning: a machine learning training strategy that uses only unlabeled data.
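The matrix described above, which aggregates genome-level data such as chromatin accessibility to the gene level (a gene activity matrix), can be made concrete with a small sketch. It assumes one common heuristic, not necessarily the one used by any method the paper surveys: a gene's activity in a cell is the sum of that cell's counts over peaks overlapping the gene body extended by a fixed window upstream of the transcription start site. The `Region`/`Gene` classes, the `gene_activity` function, the 0-based half-open coordinates, and the 2 kb default window are hypothetical choices for illustration. The result has gene-level features, which is one way to create a feature correspondence with scRNA-seq data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A genomic interval; coordinates are assumed 0-based and end-exclusive."""
    chrom: str
    start: int
    end: int


@dataclass(frozen=True)
class Gene:
    name: str
    region: Region  # gene body
    strand: str     # "+" or "-"


def overlaps(a: Region, b: Region) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def regulatory_window(gene: Gene, upstream: int) -> Region:
    """Gene body extended upstream of the TSS; which side is 'upstream' depends on strand."""
    r = gene.region
    if gene.strand == "+":
        return Region(r.chrom, max(0, r.start - upstream), r.end)
    return Region(r.chrom, r.start, r.end + upstream)


def gene_activity(
    counts: list[list[int]],  # cells x peaks
    peaks: list[Region],
    genes: list[Gene],
    upstream: int = 2000,     # hypothetical default window size
) -> list[list[int]]:
    """Aggregate peak-level counts into a cells x genes activity matrix."""
    windows = [regulatory_window(g, upstream) for g in genes]
    # Peaks overlapping each gene's window (quadratic scan; real tools use interval indexes).
    peaks_per_gene = [
        [p for p, peak in enumerate(peaks) if overlaps(peak, w)] for w in windows
    ]
    return [[sum(row[p] for p in idx) for idx in peaks_per_gene] for row in counts]


# Hypothetical toy data: three peaks, two genes, two cells.
peaks = [Region("chr1", 100, 200), Region("chr1", 5000, 5100), Region("chr1", 9000, 9100)]
genes = [
    Gene("geneA", Region("chr1", 3000, 6000), "+"),
    Gene("geneB", Region("chr1", 7000, 8000), "-"),
]
counts = [[1, 2, 3], [0, 1, 0]]
print(gene_activity(counts, peaks, genes))  # [[2, 3], [1, 0]]
```

The window size matters: with `upstream=0`, the peak at chr1:9000-9100 no longer falls in geneB's window, so geneB's activity drops to zero in both cells.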
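Label transfer, as defined above, relies on a function that measures similarity between cells. The sketch below assumes that reference and query cells are already described by the same features (for example, gene expression in the reference and gene activity in the query), uses cosine similarity as the similarity function, and gives each query cell the majority label among its k most similar reference cells. The function names, `k`, the labels, and the toy vectors are hypothetical; published methods typically compare cells in a reduced or learned shared space rather than on raw features.

```python
import math
from collections import Counter
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def transfer_labels(
    reference: Sequence[Sequence[float]],
    ref_labels: Sequence[str],
    query: Sequence[Sequence[float]],
    k: int = 3,
) -> list[str]:
    """Majority vote over the k most similar reference cells for each query cell."""
    predicted = []
    for q in query:
        ranked = sorted(
            zip((cosine_similarity(q, r) for r in reference), ref_labels),
            key=lambda pair: pair[0],
            reverse=True,
        )
        top_labels = [label for _, label in ranked[:k]]
        # most_common keeps first-seen order on ties, so ties go to the closest neighbour's label.
        predicted.append(Counter(top_labels).most_common(1)[0][0])
    return predicted


# Hypothetical toy data: two features, two labelled populations in the reference.
reference = [[5.0, 0.0], [4.0, 1.0], [0.0, 5.0], [1.0, 4.0]]
labels = ["T cell", "T cell", "B cell", "B cell"]
query = [[3.0, 0.0], [0.0, 2.0]]
print(transfer_labels(reference, labels, query))  # ['T cell', 'B cell']
```

Cosine similarity ignores overall vector magnitude, which is one reason it is a convenient choice when the two modalities are on different scales (counts versus aggregated accessibility).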
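The linear dimensionality reduction technique described above (principal component analysis, PCA) can be illustrated with a dependency-free sketch that finds only the first principal component by power iteration on the sample covariance matrix. This is a deliberately simple stand-in so the example runs without numerical libraries; in practice one would use an SVD routine from a linear algebra package. The function name, iteration count, and random starting vector are assumptions, and power iteration presumes the leading eigenvalue is well separated from the next.

```python
import math
import random


def first_principal_component(
    data: list[list[float]], iterations: int = 500, seed: int = 0
) -> tuple[list[float], list[float]]:
    """Return (loading vector, per-cell scores) for the first principal component.

    Rows of `data` are cells and columns are features. The direction is found by
    power iteration on the sample covariance matrix.
    """
    n, d = len(data), len(data[0])
    if n < 2:
        raise ValueError("need at least two cells")
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    rng = random.Random(seed)  # random start, so it is unlikely to be orthogonal to PC1
    v = [rng.uniform(0.1, 1.0) for _ in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("data have zero variance")
        v = [x / norm for x in w]
    # Eigenvectors are defined only up to sign; make the largest-magnitude loading positive.
    pivot = max(range(d), key=lambda j: abs(v[j]))
    if v[pivot] < 0:
        v = [-x for x in v]
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, scores


# Hypothetical toy data: points on the line y = 2x, so PC1 points along (1, 2).
loadings, scores = first_principal_component([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
print(loadings)
```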
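The statistic described above, which captures how important a particular open chromatin region is for a specific cell, is commonly computed as TF-IDF. The sketch below uses one plain textbook formulation on a small cells x peaks matrix: term frequency is a peak's count divided by the cell's total count, and inverse document frequency is log(number of cells / number of cells in which the peak is accessible). Single-cell tools use several variants (different log transforms, scaling factors, or smoothing), so treat this as illustrative; the function name and toy matrix are hypothetical.

```python
import math


def tf_idf(counts: list[list[int]]) -> list[list[float]]:
    """Plain TF-IDF weighting of a cells x peaks count matrix.

    tf(c, p) = counts[c][p] / total counts of cell c
    idf(p)   = log(n_cells / number of cells with counts[c][p] > 0)
    Peaks never observed as accessible get weight 0.
    """
    n_cells = len(counts)
    n_peaks = len(counts[0]) if counts else 0
    cells_with_peak = [sum(1 for row in counts if row[p] > 0) for p in range(n_peaks)]
    # Peaks open in every cell get idf = log(1) = 0: they do not distinguish cells.
    idf = [math.log(n_cells / df) if df > 0 else 0.0 for df in cells_with_peak]
    weighted = []
    for row in counts:
        total = sum(row)  # per-cell depth, normalised away by the term frequency
        tf = [x / total if total > 0 else 0.0 for x in row]
        weighted.append([t * w for t, w in zip(tf, idf)])
    return weighted


# Hypothetical toy matrix: 3 cells x 3 peaks; peak 0 is open in every cell.
print(tf_idf([[1, 1, 0], [1, 0, 1], [2, 0, 0]]))
```

Peak 0, accessible in all three cells, receives zero weight everywhere, while the peaks seen in only one cell are up-weighted in that cell.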
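The abstract notes that paired multimodal data provide a ground truth for cell-to-cell matches. One metric used in the integration literature to score an embedding against such a ground truth is FOSCTTM, the fraction of samples closer than the true match. The sketch below assumes both modalities of a paired dataset have already been projected into a shared embedding by some integration method, with row i of each embedding belonging to the same cell; it uses Euclidean distance and averages both matching directions, which is one common formulation. Function names and toy coordinates are illustrative.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def _fraction_closer(x: Sequence[Vector], y: Sequence[Vector]) -> list[float]:
    """For each cell i, the fraction of other cells in y strictly closer to x[i] than its partner y[i]."""
    n = len(x)
    fractions = []
    for i in range(n):
        true_dist = euclidean(x[i], y[i])
        closer = sum(
            1 for j in range(n) if j != i and euclidean(x[i], y[j]) < true_dist
        )
        fractions.append(closer / (n - 1))
    return fractions


def foscttm(x: Sequence[Vector], y: Sequence[Vector]) -> float:
    """Mean fraction of samples closer than the true match, averaged over x->y and y->x.

    Assumes x[i] and y[i] are the two modalities of the same cell (paired data).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired embeddings with the same number (>= 2) of cells")
    scores = _fraction_closer(x, y) + _fraction_closer(y, x)
    return sum(scores) / len(scores)


# Hypothetical toy embeddings of three paired cells.
rna_embedding = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
atac_embedding = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]  # cells 0 and 1 swapped
print(foscttm(rna_embedding, rna_embedding))   # 0.0
print(foscttm(rna_embedding, atac_embedding))  # 0.333...
```

Lower is better: 0 means no other cell lies closer than each cell's true partner, while unrelated embeddings give values around 0.5.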