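Computer science
Hash function
Transcriptome
String (physics)
Data mining
Biology
Gene
Gene expression
Genetics
Mathematics
Mathematical physics
Computer security
Identifier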
DOI:10.1016/j.compbiomed.2019.103539
Abstract
Accurate and efficient read alignment is one of the fundamental challenges in RNA-seq analysis. Because RNA-seq experiments generate an increasingly large number of reads, read alignment is a time-consuming task. Many mappers adopt various strategies to find potential alignment locations for reads in a tolerable time and to provide adequate information for downstream analysis. In some transcript analysis tasks, however, such as transcriptome quantification, knowing the transcripts and positions to which reads map is sufficient. The original alignment problem can therefore be simplified to a string searching problem, since the reads can be mapped contiguously to the transcriptome. Some models for transcript analysis adopt more efficient strategies to solve this simplified problem, but their efficiency is still restricted by handling RNA-seq data in the original read space. We propose a method, bit-mapping, based on a learning-to-hash algorithm for mapping reads to the transcriptome. It learns hash functions from the transcriptome and generates binary hash codes of the sequences, then maps reads to the transcriptome according to their hash codes. Bit-mapping accelerates mapping problems in RNA-seq analysis by reducing the dimension of the read. We evaluate the performance of bit-mapping on simulated and real data, and compare it with other popular and state-of-the-art methods: STAR, RapMap, Bowtie 2 and HISAT 2. The comparative results on simulated and real data show that our method is competitive with the existing mappers in accuracy and mapping efficiency, especially for longer reads (> 100 bp).
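The idea of mapping reads through binary hash codes can be illustrated with a small sketch. This is not the bit-mapping implementation: the abstract does not specify how the hash functions are learned, so the sketch substitutes random hyperplane projections (a standard locality-sensitive hashing technique) for learned ones. All names here (`one_hot`, `make_hyperplanes`, `hash_code`, `build_index`, `map_read`) and the toy transcripts are hypothetical, chosen only for illustration.

```python
import random

BASES = "ACGT"


def one_hot(seq):
    """Flatten a DNA string into a 4 * len(seq) vector of 0.0/1.0 values."""
    vec = []
    for base in seq:
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (assumption: stand-in for learned hash functions)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(seq, planes):
    """Project the one-hot vector onto each hyperplane and keep the sign as one bit."""
    vec = one_hot(seq)
    code = 0
    for plane in planes:
        dot = sum(w * x for w, x in zip(plane, vec))
        code = (code << 1) | (1 if dot > 0 else 0)
    return code


def build_index(transcripts, read_len, planes):
    """Hash every read-length window of every transcript into a code -> locations table."""
    index = {}
    for name, seq in transcripts.items():
        for pos in range(len(seq) - read_len + 1):
            code = hash_code(seq[pos:pos + read_len], planes)
            index.setdefault(code, []).append((name, pos))
    return index


def map_read(read, index, planes):
    """Return candidate (transcript, position) pairs whose code equals the read's code."""
    return index.get(hash_code(read, planes), [])


# Toy data, invented for this example.
transcripts = {
    "tx1": "ACGTACGGTCAGTTCAGGAT",
    "tx2": "TTGACCGTAGGCTAACGTCA",
}
read_len = 8
planes = make_hyperplanes(dim=4 * read_len, n_bits=16)
index = build_index(transcripts, read_len, planes)

read = transcripts["tx1"][3:3 + read_len]
hits = map_read(read, index, planes)
print(hits)
```

Because identical sequences always produce identical codes, an error-free read is guaranteed to retrieve its true location among the candidates. Reads with mismatches would need a search over codes within a small Hamming distance, and candidates that collide by chance would need verification against the sequence; both are left out of this sketch.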