计算机科学
序列(生物学)
算法
编码
信使核糖核酸
数学
生物
遗传学
基因
作者
He Zhang,Liang Zhang,Ziyu Li,Kaibo Liu,Boxiang Liu,David H. Mathews,Liang Huang
出处
期刊:Cornell University - arXiv
日期:2020-04-21
被引量:8
摘要
A messenger RNA (mRNA) vaccine has emerged as a promising direction to combat the current COVID-19 pandemic. This requires an mRNA sequence that is stable and highly productive in protein expression, features which have been shown to benefit from greater mRNA secondary structure folding stability and optimal codon usage. However, sequence design remains a hard problem due to the exponentially many synonymous mRNA sequences that encode the same protein. We show that this design problem can be reduced to a classical problem in formal language theory and computational linguistics that can be solved in O(n^3) time, where n is the mRNA sequence length. This algorithm could still be too slow for large n (e.g., n = 3, 822 nucleotides for the spike protein of SARS-CoV-2), so we further developed a linear-time approximate version, LinearDesign, inspired by our recent work, LinearFold. This algorithm, LinearDesign, can compute the approximate minimum free energy mRNA sequence for this spike protein in just 11 minutes using beam size b = 1, 000, with only 0.6% loss in free energy change compared to exact search (i.e., b = +infinity, which costs 1 hour). We also develop two algorithms for incorporating the codon optimality into the design, one based on k-best parsing to find alternative sequences and one directly incorporating codon optimality into the dynamic programming. Our work provides efficient computational tools to speed up and improve mRNA vaccine development.
科研通智能强力驱动
Strongly Powered by AbleSci AI