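Breast cancer
Proportional hazards model
Computer science
Computational biology
Autoencoder
Artificial intelligence
Deep learning
Machine learning
Bioinformatics
Cancer
Medicine
Biology
Internal medicine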
Authors
Jong Ho Jhee,Min‐Young Song,Byung Gon Kim,Hyunjung Shin,Soo‐Youn Lee
Identifier
DOI:10.1109/bigcomp57234.2023.00033
Abstract
Deep learning approaches applied to large multiomics datasets from cancer patients are increasingly used to identify biomarkers across diverse cancer types. Because multiomics data are typically high-dimensional relative to the small number of patient samples, this imbalance is a recognized bottleneck for exploiting the integrated characteristics of multiomics data in cancer research. Among dimensionality reduction techniques, deep learning-based approaches such as autoencoders are known to handle high-dimensional data with few samples well. However, their black-box nature makes it difficult to explain which genes are essential. In this study, we develop CGCD (a transformer-based representative Central tendency Gene score considering Central Dogma process information), a model that predicts optimized potential therapeutic target genes against breast cancer. CGCD is based on a unified representation built from compressed features that a Transformer learns from the multiomics data of 105 breast cancer patients in The Cancer Genome Atlas (TCGA). Unlike autoencoder-based models, CGCD can derive gene scores from the self-attention mechanism of the transformer model. Significant encoding genes were selected by computing a p-value for each gene from its scores across all patients. To verify the ability of the CGCD score to predict target genes, we estimated the hazard ratio and p-value of each gene through survival analysis with the Cox proportional hazards model, calculated the area under the curve (AUC) using the CGCD score and the per-patient p-value, and performed biological functional analyses including Gene Set Enrichment Analysis (GSEA). As the CGCD score increased, the retention rate of breast cancer marker genes and pathways showed a pronounced upward trend. From this perspective, the CGCD score, which reflects the harmony of multiomics data within a gene, is considered a suitable criterion for predicting cancer diagnostic markers.
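The abstract states that CGCD derives gene scores from the transformer's self-attention mechanism but does not give the scoring rule. The sketch below is only an illustration of one common way such scores can be read off an attention layer, not the authors' method. It assumes each gene is a token described by a few hypothetical omics features, computes a standard scaled dot-product attention matrix softmax(QK^T / sqrt(d_k)), and scores each gene by the mean attention it receives from all genes. The names `attention_weights` and `gene_scores`, the toy inputs, and the identity projection weights are all hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of an (n x m) and an (m x p) matrix."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(x: Matrix, w_q: Matrix, w_k: Matrix) -> Matrix:
    """Row-stochastic self-attention matrix A = softmax(Q K^T / sqrt(d_k)).

    x has one row per gene (token) and one column per input feature.
    """
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    d_k = len(w_k[0])
    scores = matmul(q, transpose(k))
    return [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]


def gene_scores(attn: Matrix) -> List[float]:
    """Hypothetical gene score: mean attention each gene (column) receives.

    Because every row of attn sums to 1, these scores also sum to 1.
    """
    n = len(attn)
    return [sum(attn[i][j] for i in range(n)) / n for j in range(n)]


# Toy example: 3 genes, each with 2 hypothetical omics features
# (for instance an expression value and a methylation value).
x = [[1.0, 0.2], [0.5, 0.9], [0.1, 0.4]]
identity = [[1.0, 0.0], [0.0, 1.0]]
attn = attention_weights(x, identity, identity)
scores = gene_scores(attn)
```

In a per-patient setting such as the one the abstract describes, a score vector like this would be computed for each patient, and the resulting per-gene score distributions across patients would then feed the per-gene significance test. The exact test and any normalization are not specified in the abstract.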