Authors
Yu-Hsuan Tseng, Chia-Hsin Wu, Chia-Yu Sung, Huang Kevin Chih Yang, Mong-Hsun Tsai, Liang-Chuan Lai, Tzu-Pin Lu, K. S. Clifford Chao, Eric Y. Chuang, Chien-Yueh Lee
Abstract
Tumor neoantigens are highly immunogenic. Two types of neoantigens have been reported to be shared among patients. One is the mutated tumor-specific antigen (mTSA), derived from somatic mutations in tumor cells. The other is the aberrantly expressed TSA (aeTSA), which is influenced by epigenetic changes or abnormal RNA splicing. Tumor neoantigens can bind to the major histocompatibility complex (MHC) and be recognized by the T cell receptor (TCR), and can thereby trigger the immune system to attack cancer cells. Although several neoantigen databases are available, they mostly focus on Western populations and primarily contain peptides derived from mTSAs. In contrast, our database provides peptides from both mTSAs and aeTSAs, with a strong emphasis on the Taiwanese population. We first obtained public raw sequencing data from the NCBI database and analyzed them with a neoantigen pipeline to identify potential neoantigens. The data collection criteria required samples from Taiwanese individuals with paired DNA-seq or RNA-seq data from both normal and tumor tissues of the same patient. RNA sequencing datasets were used to identify aeTSAs and mTSAs, whereas DNA sequencing datasets were used for mTSA identification. Additionally, human leukocyte antigen (HLA) genotyping was performed for every sample. The identified peptides were then compared with previously validated data available in the Immune Epitope Database (IEDB). These data were used to develop a web-based database with several functionalities. Users can search for specific peptides and download relevant data from the website. The website also includes data cross-referenced and validated against IEDB. In addition, we incorporated clinically validated peptides capable of stimulating T cells to release cytotoxins or interferons. A LightGBM machine-learning model was trained to predict the immunogenicity of these peptides, using a series of cross-validations based on random data splitting.
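As a rough illustration of cross-validation based on random data splitting, the following minimal sketch shuffles the samples once, cuts them into k folds, and reports the mean training and testing AUC. It is not the authors' pipeline: the abstract gives no fold count, features, or hyperparameters, so `roc_auc`, `random_kfold`, `cross_validate`, the one-feature toy scorer standing in for LightGBM, and the synthetic data are all assumptions made for illustration.

```python
import random
from statistics import mean


def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores and give each the average 1-based rank.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined unless both classes are present")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def random_kfold(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k folds after one random shuffle."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, folds[f]


def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Return (mean train AUC, mean test AUC) over k random folds."""
    train_aucs, test_aucs = [], []
    for train, test in random_kfold(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        train_aucs.append(roc_auc([y[i] for i in train],
                                  [predict(model, X[i]) for i in train]))
        test_aucs.append(roc_auc([y[i] for i in test],
                                 [predict(model, X[i]) for i in test]))
    return mean(train_aucs), mean(test_aucs)


# Hypothetical stand-in model: learns only the direction of a single feature.
def fit_sign(X, y):
    pos = mean(x for x, label in zip(X, y) if label == 1)
    neg = mean(x for x, label in zip(X, y) if label == 0)
    return 1.0 if pos >= neg else -1.0


def predict_sign(model, x):
    return model * x


# Synthetic data (not from the database): immunogenic = 1, non-immunogenic = 0.
rng = random.Random(42)
y = [1] * 100 + [0] * 100
X = [rng.gauss(1.0 if label else 0.0, 1.0) for label in y]
train_auc, test_auc = cross_validate(X, y, fit_sign, predict_sign, k=5)
print(f"mean train AUC={train_auc:.2f}, mean test AUC={test_auc:.2f}")
```

Reporting the training and testing AUC side by side makes any gap between them visible, which is the usual way to gauge how much a model overfits under random splitting.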
Users can access comprehensive information on tumor-specific peptides in the online database. We collected sequencing data from 243 patients spanning five types of cancer. The predominant HLA genotype is HLA-A*11:01, a common allele in the Taiwanese population. Peptide characteristics such as hydrophobicity, binding affinity, and binding stability have been calculated and stored in the database. The LightGBM model predicted immunogenicity with an AUC of 0.95 on the training dataset and 0.8 on the testing dataset. The model is implemented in the online database, allowing users to score their own candidate peptides. The database serves as a comprehensive platform for gathering Taiwanese-specific neoantigens, contributing to the advancement of personalized cancer vaccines and immunotherapies.
Citation Format: Yu-Hsuan Tseng, Chia-Hsin Wu, Chia-Yu Sung, Huang Kevin Chih Yang, Mong-Hsun Tsai, Liang-Chuan Lai, Tzu-Pin Lu, K.S. Clifford Chao, Eric Y. Chuang, Chien-Yueh Lee. TWNeoDB: A web-based database for tumor neoantigens in the Taiwanese population [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 3543.
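Note on peptide hydrophobicity: the abstract does not state which hydrophobicity scale is stored in the database. As one common possibility, the sketch below computes the GRAVY score, the mean Kyte-Doolittle hydropathy over all residues of a peptide. The choice of scale and the function name `gravy` are assumptions for illustration, not a description of how TWNeoDB computes the value.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(peptide: str) -> float:
    """Mean Kyte-Doolittle hydropathy of a peptide (higher = more hydrophobic)."""
    peptide = peptide.strip().upper()
    if not peptide:
        raise ValueError("peptide sequence is empty")
    unknown = sorted(set(peptide) - KYTE_DOOLITTLE.keys())
    if unknown:
        raise ValueError(f"non-standard residues: {''.join(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


# Illustrative 9-mer only; not a peptide taken from the database.
print(round(gravy("LLFGYPVYV"), 3))
```

A per-peptide score of this kind can be stored as one column next to binding affinity and binding stability, and used as an input feature for an immunogenicity classifier.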