AGEseq: Analysis of Genome Editing by Sequencing

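Biology, Computational biology, Genome, DNA sequencing, Genome editing, Genetics, Evolutionary biology, Gene
Authors
Liang-Jiao Xue, Chung-Jui Tsai
Source
Journal: Molecular Plant [Elsevier]
Volume (issue): 8 (9): 1428-1430 Cited by: 45
Identifier
DOI:10.1016/j.molp.2015.06.001
Abstract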

Knockout experiments are critical for the evaluation of gene function. Researchers have increasingly relied on genome editing technologies for precise mutagenesis at loci of interest, using engineered nucleases such as zinc finger nucleases, transcription activator-like effector nucleases (TALENs), and CRISPR (clustered regularly interspaced short palindromic repeats)-associated proteins. Sequence-specific targeting and cleavage by these systems generate double-stranded breaks and trigger endogenous repair machineries, resulting in small indels that can disrupt reading frames and gene function. These methods have been successfully applied to plants; the CRISPR system is particularly powerful for non-model species (Belhaj et al., 2013; Lozano-Juste and Cutler, 2014). Several tools, such as TALENT (Cermak et al., 2011) and CRISPR-P (Lei et al., 2014), have been developed to facilitate the design of genome editing experiments. However, few tools are available to evaluate the outcome of genome editing.
Amplicon sequencing is commonly employed for genome editing analysis, where genomic sequences that span the target loci are amplified, sometimes cloned, and sequenced. A number of programs have been developed to decode heterozygous chromatograms from direct sequencing of PCR products for identification of sequence polymorphisms (Crowe, 2005; Dmitriev and Rakitov, 2008; Ma et al., 2015). However, the throughput of Sanger sequencing, even without cloning, is not amenable to screening large numbers of transgenic lines, especially with increasingly sophisticated multiplex targeting (Xie et al., 2015). No open-source programs are currently available for analysis of amplicon-sequencing data from high-throughput sequencers. After quality-control filtering and demultiplexing, amplicon sequence analysis usually involves alignment with target/reference sequences and detection of editing events, such as indels or single nucleotide polymorphisms (SNPs). Much bioinformatic effort is required, unless commercial software is available.
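The demultiplexing step mentioned above can be illustrated with a minimal Python sketch. It is not part of AGEseq; the barcode layout (a fixed barcode at the 5' end of each read), the sample names, and the `demultiplex` helper are all assumptions made for illustration.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by their 5' barcode and trim the barcode off.

    reads:    iterable of read sequences (strings)
    barcodes: dict mapping barcode sequence -> sample name (hypothetical layout)
    """
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                # Keep only the amplicon part of the read.
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched: set the read aside instead of guessing.
            by_sample["unassigned"].append(read)
    return dict(by_sample)


# Toy data: two barcoded samples and one read with an unreadable barcode.
barcodes = {"ACGT": "line_1", "TGCA": "line_2"}
reads = ["ACGTGGATCC", "TGCAGGATCC", "ACGTGGTCC", "NNNNGGATCC"]
demultiplexed = demultiplex(reads, barcodes)
```

Exact prefix matching is the simplest possible rule; real pipelines often tolerate one barcode mismatch, which this sketch deliberately leaves out.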
A web-based tool for amplicon-sequencing data analysis was recently reported (Guell et al., 2014). However, only one reference sequence is accepted at a time, which makes application to large datasets cumbersome. Here, we report a versatile and user-friendly tool, Analysis of Genome Editing by Sequencing (AGEseq), to address this limitation. AGEseq is available from AspenDB (http://aspendb.uga.edu) as a standalone program or a Galaxy-based (Goecks et al., 2010) web tool. AGEseq supports both Sanger and deep-sequencing reads. For deep sequencing, degenerate primers can be designed to amplify both alleles of the target gene as well as closely related gene(s). Amplicons from unrelated genes or across samples are then barcoded and pooled for sequencing (Figure 1A). For data analysis, AGEseq requires a design file and a directory of read files as inputs. The design file describes the reference sequences, usually containing 30–40 bp flanking regions of the target editing site(s) (Figure 1B). The read files are stored in a directory named “reads” by default, and multiple file formats are accepted (Figure 1A). AGEseq uses BLAT to align reference and read sequences. Aligned reads are assigned to the best hit among the reference sequences provided in the design file, and matching regions are extracted for indel or SNP calling. The output file reports the aligned (target and read) sequences and detection frequency for each editing event (Figure 1C and 1D).
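The workflow described above (assign each read to its best-matching reference, then call and count events) can be sketched in a few lines of Python. This is an illustration only, not the AGEseq implementation: Python's `difflib.SequenceMatcher` stands in for BLAT, and the reference names, sequences, and helper functions are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def best_reference(read, references):
    """Return the name of the reference most similar to the read.

    SequenceMatcher.ratio() is a stand-in for a BLAT alignment score.
    """
    def score(name):
        return SequenceMatcher(None, references[name], read, autojunk=False).ratio()
    return max(references, key=score)


def call_events(ref, read):
    """List differences as (type, 1-based ref position, ref bases, read bases)."""
    events = []
    matcher = SequenceMatcher(None, ref, read, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            events.append(("deletion", i1 + 1, ref[i1:i2], ""))
        elif tag == "insert":
            events.append(("insertion", i1 + 1, "", read[j1:j2]))
        elif tag == "replace":
            events.append(("substitution", i1 + 1, ref[i1:i2], read[j1:j2]))
    return events


def summarize(reads, references):
    """Count reads per (reference, events) pattern; an empty tuple means WT-like."""
    counts = Counter()
    for read in reads:
        name = best_reference(read, references)
        counts[(name, tuple(call_events(references[name], read)))] += 1
    return counts


# Hypothetical 30-bp references: geneB differs from geneA by three SNPs.
references = {
    "geneA": "ATGGCTAGCTTGACCGATCGTACGGATTCA",
    "geneB": "ATGGCAAGCTTGACTGATCGTACCGATTCA",
}
edited = "ATGGCTAGCTTGAATCGTACGGATTCA"  # geneA with a 3-nt deletion (CCG)
reads = [edited] * 3 + [references["geneA"], references["geneB"]]
summary = summarize(reads, references)
```

Dividing each count by `len(reads)` gives the per-event detection frequency. Keying the summary on the reference name is what lets reads from a paralog or a second allele be reported separately from the intended target.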
Our laboratory has recently applied CRISPR-based genome editing to lignin biosynthesis perturbations in Populus. A gene-specific guide RNA (gRNA) was designed to target 4-coumarate:CoA ligase 1 (4CL1), but not the paralogous 4CL5 (Zhou et al., 2015). Degenerate primers were designed to amplify both 4CL1 (target) and 4CL5 (off-target) sequences from independent transgenic lines to assess editing specificity. AGEseq successfully distinguished the duplicates as well as their alleles (Figure 1B), and confirmed biallelic mutations in all transgenic lines examined, with no off-target cleavage of 4CL5 (Figure 1E). In support of a null 4CL1, all primary transformants exhibited a reddish-brown wood discoloration (Figure 1F) known to be associated with lignin modification (Zhou et al., 2015). As a further test, AGEseq was applied to amplicon data of soybean with DDM1 (Decrease in DNA Methylation) editing in one or two homoeologous loci as described in Jacobs et al. (2015). The editing patterns detected by AGEseq were consistent with those obtained by Geneious R7 (Biomatters Ltd.) used in that study, ranging from small indels (<5 nt) to large deletions (>10 nt), with varying (1–98%) editing efficiencies (Supplemental Table 1) (Jacobs et al., 2015). AGEseq flags events with a long stretch of indels and/or mismatches as “strange events” that require manual examination, and three such cases were identified. Manual inspection confirmed a large (44 nt) deletion in one case, while the other two were found by Jacobs et al. (2015) to harbor unusual insertions from the Agrobacterium rhizogenes root-inducing plasmid after additional cloning and sequencing. These results demonstrate the versatility of AGEseq in detecting or flagging genome editing patterns across a wide range of data scenarios.
Detailed instructions on AGEseq are provided for all operating systems (Supplemental Text). The analysis sensitivity can be adjusted by two user-configurable parameters: mismatch allowance (default at 10%) and minimum read coverage (default at 0). Systematic errors introduced by PCR during amplicon library preparation and sequencing, or by base-calling algorithms, are common in deep-sequencing data, and they appear as “SNPs” in the AGEseq report (Figure 1C and Supplemental Text). For this reason, AGEseq considers only indels as potential genome editing events by default, although SNPs are also reported. If SNPs are of interest, setting a minimum read coverage is recommended to reduce random errors.
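How the two parameters affect a per-sample summary can be sketched in Python. This is one plausible reading, not the AGEseq code: mismatch allowance is assumed to cap the fraction of mismatched bases in an aligned read group, and minimum read coverage is assumed to be the number of reads needed to keep an event. The `EventCount` record and `summarize_sample` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EventCount:
    kind: str             # "wt", "indel", or "snp"
    label: str            # free-text description (hypothetical)
    reads: int            # reads supporting this pattern
    mismatch_rate: float  # mismatched bases / aligned length


def summarize_sample(events, mismatch_allowance=0.10, min_read_coverage=0):
    """Apply both thresholds, then count only indels as editing events."""
    kept = [
        e for e in events
        if e.mismatch_rate <= mismatch_allowance and e.reads >= min_read_coverage
    ]
    total = sum(e.reads for e in kept)
    indel = sum(e.reads for e in kept if e.kind == "indel")
    snp = sum(e.reads for e in kept if e.kind == "snp")
    return {
        "total_reads": total,
        "indel_reads": indel,
        "snp_reads": snp,  # reported, but not counted as edits
        "editing_efficiency": indel / total if total else 0.0,
    }


# Toy sample: low-count SNP patterns mimic PCR or base-calling errors.
events = [
    EventCount("wt", "WT-like", 50, 0.00),
    EventCount("indel", "1-nt deletion", 45, 0.00),
    EventCount("snp", "A>G", 3, 0.03),
    EventCount("snp", "C>T", 2, 0.02),
    EventCount("indel", "poorly aligned read group", 10, 0.25),
]
default_summary = summarize_sample(events)
strict_summary = summarize_sample(events, min_read_coverage=5)
```

With the defaults, only the poorly aligned group is dropped and the low-count SNP patterns are still reported. Raising the hypothetical `min_read_coverage` to 5 removes them, which matches the recommendation to set a coverage threshold when SNPs are of interest.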
A known limitation of BLAT and similar aligners is their inconsistent gap handling in homonucleotide runs, as shown for both 4CL1 alleles in Figure 1C (red boxes, 1-nt deletion at position 56 or 57). AGEseq does not consider these differences and reports, by default, the sum of all indel reads as well as wild-type (WT)-like (non-edited) reads from each sample in the summary (Figure 1D). User inspection is therefore recommended. As mentioned, AGEseq also facilitates identification of unusual events that require manual inspection, and sometimes follow-up experiments to confirm the editing patterns. The ability of AGEseq to effectively discriminate allelic sequences of duplicated genes suggests that it can support analysis with polyploid genomes. When only one reference sequence is provided, the AGEseq output can be mined for allelic variations, if any, in the target region.
As standalone software, AGEseq is (1) easy to use; no command line or programming skill is required for Windows or Mac users; (2) versatile; multiple sequencing platforms and file types are supported for assessing genome editing, allelic variation and/or off-target cleavage; and (3) extensible; the Perl script can easily be exported to other bioinformatics pipelines. As an example, we adapted AGEseq as a utility in the Galaxy platform (Goecks et al., 2010) to support web-based analysis. It is accessible at AspenDB (http://aspendb.uga.edu/ageseq) or through the Galaxy Tool Shed (https://toolshed.g2.bx.psu.edu) for installation in local instances. A limitation of the web tool is that only one sequence read file can be processed at a time.
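The homonucleotide gap ambiguity noted above (a 1-nt deletion reported at position 56 or 57) can be reproduced with a short Python sketch. Deleting either base of a two-base run yields the same read, so an aligner may place the gap at either position. The sequence and both helper functions are hypothetical, and left-alignment is shown as one common way to merge such calls, not as something AGEseq does.

```python
from collections import Counter


def delete_base(ref, pos):
    """Return ref with the base at 1-based position `pos` removed."""
    return ref[:pos - 1] + ref[pos:]


def left_align_deletion(ref, pos, length=1):
    """Shift a deletion (1-based start) as far left as possible.

    A one-base left shift leaves the resulting read unchanged only when
    the base entering the deleted window equals the base leaving it.
    """
    start = pos - 1
    while start > 0 and ref[start - 1] == ref[start + length - 1]:
        start -= 1
    return start + 1


ref = "GATTACAAGCT"  # toy reference; positions 7 and 8 form an AA run

# Both deletions produce the same read sequence.
same_read = delete_base(ref, 7) == delete_base(ref, 8)

# Hypothetical aligner output: one deletion split across two positions.
raw_calls = Counter({7: 12, 8: 9})
merged = Counter()
for pos, n_reads in raw_calls.items():
    merged[left_align_deletion(ref, pos)] += n_reads
```

After normalization, the 12 and 9 reads collapse into a single 21-read deletion at position 7. Summing all indel reads per sample reaches the same total without deciding which position is correct.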
For a multiplexed dataset with a large number of samples, the use of the standalone AGEseq program is recommended. Although developed for genome editing analysis, AGEseq can be adapted for SNP genotyping, metagenomic analysis, or other amplicon-sequencing applications.
References
Belhaj K. Chaparro-Garcia A. Kamoun S. Nekrasov V. Plant genome editing made easy: targeted mutagenesis in model and crop plants using the CRISPR/Cas system. Plant Methods. 2013; 9: 39
Cermak T. Doyle E.L. Christian M. Wang L. Zhang Y. Schmidt C. Baller J.A. Somia N.V. Bogdanove A.J. Voytas D.F. Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting. Nucleic Acids Res. 2011; 39: e82
Crowe M.L. SeqDoC: rapid SNP and mutation detection by direct comparison of DNA sequence chromatograms. BMC Bioinformatics. 2005; 6: 133
Dmitriev D.A. Rakitov R.A. Decoding of superimposed traces produced by direct sequencing of heterozygous indels. PLoS Comput. Biol. 2008; 4: e1000113
Goecks J. Nekrutenko A. Taylor J. Team T.G. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010; 11: R86
Guell M. Yang L.H. Church G.M. Genome editing assessment using CRISPR Genome Analyzer (CRISPR-GA). Bioinformatics. 2014; 30: 2968-2970
Jacobs T.B. LaFayette P.R. Schmitz R.J. Parrott W.A. Targeted genome modifications in soybean with CRISPR/Cas9. BMC Biotechnol. 2015; 15: 16
Lei Y. Lu L. Liu H.Y. Li S. Xing F. Chen L.L. CRISPR-P: a web tool for synthetic single-guide RNA design of CRISPR-system in plants. Mol. Plant. 2014; 7: 1494-1496
Lozano-Juste J. Cutler S.R. Plant genome engineering in full bloom. Trends Plant Sci. 2014; 19: 284-287
Ma X. Chen L. Zhu Q. Chen Y. Liu Y.-G. Rapid decoding of sequence-specific nuclease-induced heterozygous and biallelic mutations by direct sequencing of PCR products. Mol. Plant. 2015; https://doi.org/10.1016/j.molp.2015.02.012
Xie K.B. Minkenberg B. Yang Y.N. Boosting CRISPR/Cas9 multiplex editing capability with the endogenous tRNA-processing system. Proc. Natl. Acad. Sci. USA. 2015; 112: 3570-3575
Zhou X. Jacobs T.B. Xue L.-J. Harding S.A. Tsai C.-J. Exploiting SNPs for biallelic CRISPR mutations in the outcrossing woody perennial Populus reveals 4-coumarate:CoA ligase specificity and redundancy. New Phytol. 2015; https://doi.org/10.1111/nph.13470