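Arithmetic coding
Coding (social sciences)
Arithmetic
Computer science
Variable-length code
Encoding (memory)
Context-adaptive binary arithmetic coding
Adaptive coding
Mathematics
Algorithm
Data compression
Artificial intelligence
Statistics
Lossless compression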
Authors
Yingxin Hu, Yanjun Liu, Yuefei Yang
Identifier
DOI:10.1089/cmb.2024.0697
Abstract
With the rapid advancement of big data and artificial intelligence technologies, the limitations of traditional storage media in accommodating vast amounts of data have become increasingly evident. DNA storage is an innovative approach that uses DNA and other biomolecules as storage media, offering large capacity, high density, low energy requirements, and long-term durability. Central to efficient DNA storage is DNA coding, the process by which digital information is converted into sequences of DNA bases. A novel encoding method based on adaptive arithmetic coding (AAC) is introduced that divides the encoding process into three phases: compression, error correction, and mapping. In the compression phase, Prediction by Partial Matching (PPM)-based AAC compresses the data to increase storage density. The error correction phase then uses an octal Hamming code to correct errors and safeguard data integrity. The mapping phase employs a "3-2 code" mapping relationship to ensure adherence to biochemical constraints. The proposed method was verified by encoding files of different formats, including text, pictures, and audio. The results indicated that the average coding density of bases can be up to 3.25 per nucleotide, the GC content (the proportion of guanine [G] and cytosine [C] bases) can be stabilized at 50%, and the homopolymer length is restricted to no more than 2. Simulation results corroborate the method's efficacy in preserving data integrity during both reading and writing operations, increasing storage density, and providing robust error correction.
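The two biochemical constraints named in the abstract (GC content near 50% and homopolymer runs of at most 2) can be checked on any candidate base sequence. The following is a minimal sketch of such a check, not the authors' implementation: the function names, the `gc_tolerance` parameter, and the example sequences are all assumptions made for illustration.

```python
from itertools import groupby


def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def max_homopolymer_run(seq: str) -> int:
    """Return the length of the longest run of identical consecutive bases."""
    return max((len(list(group)) for _, group in groupby(seq.upper())), default=0)


def satisfies_constraints(seq: str, gc_target: float = 0.5,
                          gc_tolerance: float = 0.0, max_run: int = 2) -> bool:
    """Check a sequence against the GC-content and homopolymer limits.

    gc_tolerance is a hypothetical knob; the abstract only states that
    GC content "can be stabilized at 50%".
    """
    gc_ok = abs(gc_content(seq) - gc_target) <= gc_tolerance
    run_ok = max_homopolymer_run(seq) <= max_run
    return gc_ok and run_ok


if __name__ == "__main__":
    for candidate in ("AACCGGTT", "AAACGGCT"):
        print(candidate, gc_content(candidate),
              max_homopolymer_run(candidate), satisfies_constraints(candidate))
```

A check like this only validates an output sequence; it says nothing about how the "3-2 code" mapping itself guarantees the constraints, which the abstract does not specify.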