Abstract
The clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated protein 9 (CRISPR-Cas9) system has emerged in recent years as a versatile molecular tool for genome editing in a wide range of organisms (Tsai and Joung, 2016). In this system, the Cas9 endonuclease is directed to its DNA target by a synthetic single guide RNA (sgRNA). The Cas9-sgRNA ribonucleoprotein complex recognizes a DNA sequence that is complementary to the 5′-end guide sequence of the sgRNA (also referred to as the spacer), provided that a protospacer adjacent motif (PAM) is present immediately 3′ of the target site. This simple RNA-guided DNA targeting system has enabled many innovative applications in genome engineering. A major concern about this technology, however, is the off-target activity of Cas9. Recent studies have also shown that different guide sequences differ considerably in genome-editing efficiency (Fu et al., 2014; Doench et al., 2016; Liang et al., 2016). Therefore, choosing the target site, which determines the guide sequence of the sgRNA, is a critical step in applying CRISPR-Cas9 technology. To date, dozens of bioinformatic tools have been developed to optimize sgRNA design for various organisms (Ding et al., 2016) (https://omictools.com/crispr-cas9-category), including CRISPR-P, which we previously developed for sgRNA design in plants (http://cbi.hzau.edu.cn/crispr/) (Lei et al., 2014).
In this study, we introduce CRISPR-P version 2.0 (CRISPR-P 2.0), which was updated in light of recent developments in CRISPR-Cas9 technology and of feedback from users over the past 2 years. CRISPR-P 2.0 provides web services for the computer-aided design of sgRNAs with minimal off-target potential. It has an interface similar to that of the previous version but includes many new features for guide sequence analysis, which together make it a robust bioinformatic platform for the various applications of CRISPR-Cas9 in plants.
The main features added in CRISPR-P 2.0 are as follows. (1) It supports sgRNA design for 49 plant genomes, covering almost all plant species with well-assembled genomes available so far (Supplemental Table 1). We will continue to update the tool to include high-quality genomes of more plant species as they become available. (2) It uses a modified scoring system, based on the latest studies of SpCas9 specificity and efficiency in genome editing, to rate the off-target potential and on-target efficiency of sgRNAs for Streptococcus pyogenes Cas9 (SpCas9), the most widely used Cas9 protein to date. (3) It supports the design of guide sequences for various CRISPR-Cas systems, including Cpf1 (Zetsche et al., 2015) and various Cas9 endonucleases. (4) It provides a comprehensive analysis of each guide sequence, including GC content, restriction endonuclease sites, microhomology sequences flanking the target site (microhomology score), and the secondary structure of the sgRNA. (5) It identifies sgRNAs in custom sequences: if the user's genome or sequence is not among the selectable genomes, users can upload their own sequences to identify sgRNAs. With these updates, CRISPR-P 2.0 offers a more efficient and straightforward bioinformatic tool for designing CRISPR-Cas9 genome editing in plants.
The overall procedure of sgRNA design in CRISPR-P 2.0 is shown in Figure 1A (see the online manual for details). For the 49 plant genomes, CRISPR-P 2.0 accepts a gene locus tag, a genomic position, or a genomic sequence as input. Compared with CRISPR-P, version 2.0 adds 23 plant genomes sequenced in recent years: Arachis duranensis (v1.0), Arachis ipaensis (v1.0), Brassica napus (v4.1), Brassica oleracea (v1.0), Capsella rubella (v1.0), Citrullus lanatus (v1.0), Coffea canephora, Cucumis melo (v3.5), Fragaria vesca (v2.0.a1), Gossypium hirsutum (v1.1), Lentinula edodes (W1-26), Lentinula edodes (B17), Lotus japonicus (v3.0), Marchantia polymorpha (v3.1), Nicotiana benthamiana (v0.4.4), Panicum virgatum (v1.1), Ricinus communis, Utricularia gibba, Zea mays (AGPv4), Oryza sativa subsp. japonica cv. Nipponbare (IRGSP-1.0 pseudomolecules), O. sativa subsp. indica cultivars Zhenshan 97 and Minghui 63, and another indica rice cultivar, Kasalath (Supplemental Table 1). These genomes were added according to user feedback, and all genomes have been updated to their latest versions. Several optional parameters allow users to customize guide sequence analysis, including the guide sequence length, the PAM sequence, and the small nucleolar RNA (snoRNA) promoter used to express the sgRNA.
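Rating off-target potential starts from finding genomic sites that resemble the guide sequence. The Python sketch below illustrates only that first step for SpCas9: it slides a window along a sequence, keeps windows followed by an NGG PAM, and counts mismatches against the guide. It is a simplified stand-in, not the CRISPR-P 2.0 implementation: mismatches are not weighted by position or type, only the forward strand is scanned, and a genome-scale search would use an indexed aligner rather than a linear scan. The function `find_off_targets` and its `max_mismatches` parameter are hypothetical names.

```python
from typing import List, Tuple


def find_off_targets(guide: str, genome: str,
                     max_mismatches: int = 3) -> List[Tuple[int, str, int]]:
    """Return (offset, site, mismatches) for forward-strand sites followed by NGG.

    Simplified sketch (assumption): mismatches are simply counted with no
    position weighting, only the forward strand is scanned, and the perfect
    on-target match is reported as well.
    """
    guide, genome = guide.upper(), genome.upper()
    n = len(guide)
    hits = []
    # Slide a window of guide length; the 3-nt PAM must follow it directly.
    for start in range(len(genome) - n - 3 + 1):
        # NGG: any base at start+n, then G, G at start+n+1 and start+n+2.
        if genome[start + n + 1:start + n + 3] != "GG":
            continue
        site = genome[start:start + n]
        mismatches = sum(a != b for a, b in zip(guide, site))
        if mismatches <= max_mismatches:
            hits.append((start, site, mismatches))
    return hits
```

A hit with 0 mismatches is the intended target; the remaining hits are the kind of candidate off-target sites that a scoring model would then rank.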
The guide sequence length can affect targeting efficiency and off-target potential; for example, sgRNAs with a truncated guide sequence (<20 nt) can improve Cas9 specificity (Fu et al., 2014). CRISPR-P 2.0 therefore allows the guide sequence length to be set anywhere from 15 to 22 nt. The PAM is a short sequence immediately adjacent to the guide-complementary region of the target DNA (3′ of it for Cas9 and 5′ of it for Cpf1). The PAM is essential for target DNA recognition by the Cas9-sgRNA complex, and its exact sequence depends on the Cas nuclease ortholog. CRISPR-P 2.0 supports guide sequence design for the different CRISPR-Cas PAM sequences characterized in recent years (Leenay and Beisel, 2016; Leenay et al., 2016), including SpCas9 from Streptococcus pyogenes (NGG), SpCas9 from S. pyogenes (NRG), StCas9 from Streptococcus thermophilus (NNAGAAW), NmCas9 from Neisseria meningitidis (NNNNGMTT), SaCas9 from Staphylococcus aureus (NNGRRT), AsCpf1 from Acidaminococcus (TTTN), LbCpf1 from Lachnospiraceae (TTTN), FnCpf1 from Francisella (TTN), YCN from Pyrococcus furiosus, CCW from Clostridium difficile, YYC from Bacillus halodurans, AWG from Escherichia coli, CC from Pseudomonas aeruginosa, and MMA from Pyrococcus furiosus (N = A, C, G, T; R = A, G; W = A, T; M = A, C; Y = C, T). These settings in CRISPR-P 2.0 can facilitate gRNA design for various CRISPR-Cas systems in plant genome editing.
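The guide-length and PAM options can be illustrated with a short Python sketch that translates an IUPAC PAM (such as NGG, NRG, or TTTN) into a regular expression and lists every candidate guide on both strands. The function `find_guides`, its arguments, and its coordinate convention are hypothetical; the sketch assumes a 3′ PAM for Cas9-type nucleases and a 5′ PAM for Cpf1-type nucleases and does not reproduce the CRISPR-P 2.0 code.

```python
import re
from typing import List, Tuple

# IUPAC nucleotide codes translated into regular-expression character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "W": "[AT]", "S": "[CG]", "M": "[AC]", "K": "[GT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_guides(seq: str, pam: str, guide_len: int = 20,
                pam_3prime: bool = True) -> List[Tuple[str, int, str, str]]:
    """Return (strand, offset, guide, pam) for every guide+PAM window.

    offset is the 0-based start of the guide+PAM window on the scanned strand;
    for "-" hits it refers to the reverse-complement sequence.
    """
    if not 15 <= guide_len <= 22:
        raise ValueError("guide length must be 15-22 nt")
    pam_re = "".join(IUPAC[base] for base in pam.upper())
    guide_re = f"[ACGT]{{{guide_len}}}"
    # A zero-width lookahead lets finditer report overlapping windows.
    if pam_3prime:  # Cas9-type layout (assumed): 5'-guide-PAM-3'
        pattern = re.compile(f"(?=(?P<guide>{guide_re})(?P<pam>{pam_re}))")
    else:           # Cpf1-type layout (assumed): 5'-PAM-guide-3'
        pattern = re.compile(f"(?=(?P<pam>{pam_re})(?P<guide>{guide_re}))")
    hits = []
    forward = seq.upper()
    for strand, s in (("+", forward), ("-", revcomp(forward))):
        for m in pattern.finditer(s):
            hits.append((strand, m.start(), m.group("guide"), m.group("pam")))
    return hits
```

Writing the PAM as an IUPAC string means one scanner covers NGG, NRG, NNGRRT, or TTTN; only the `pam_3prime` flag changes between Cas9-type and Cpf1-type nucleases.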
After a job is submitted, the CRISPR-P 2.0 server processes the gene sequence and analyzes all CRISPR-Cas9 targetable sites in the input chromosomal segment. A typical result is shown in Figure 1B–1D. All targetable sites in the DNA segment are shown in a genome browser, and the details of each guide sequence are displayed, including its position, on-target score, off-target score, and alignments with potential off-target sequences (Figure 1B). The on- and off-target scoring system is adapted from high-throughput analyses of sgRNA editing efficiency in mammalian cells (Doench et al., 2014; Doench et al., 2016), whose models we employ to predict the on- and off-target activities of sgRNAs in CRISPR-P 2.0. CRISPR-P 2.0 also displays the GC content of the guide sequence (Ren et al., 2014; Liang et al., 2016), restriction enzyme sites in the target region, and the synthetic DNA oligos needed to make the sgRNA construct (Figure 1B).
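The per-guide annotations listed above (GC content, restriction sites, cloning oligos) are simple string operations, and the Python sketch below shows one way to compute them. The enzyme table, the function names, and the example overhangs `GATT`/`AAAC` are hypothetical; the correct overhangs depend on the cloning vector, and the transcription-start rule (G for U6, A for U3 promoters) is a common convention that should be checked for the promoter actually used. The restriction check would normally be run on the region around the cut site.

```python
from typing import Dict, List, Tuple

# Hypothetical enzyme table for illustration; extend it with the enzymes you use.
ENZYMES: Dict[str, str] = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "BsaI": "GGTCTC",  # non-palindromic, so both strands must be checked
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(guide: str) -> float:
    """Percentage of G + C in the guide sequence."""
    guide = guide.upper()
    return 100.0 * (guide.count("G") + guide.count("C")) / len(guide)


def restriction_sites(region: str) -> List[str]:
    """Enzymes whose recognition site occurs on either strand of region."""
    region = region.upper()
    return [name for name, site in ENZYMES.items()
            if site in region or revcomp(site) in region]


def cloning_oligos(guide: str, top_overhang: str, bottom_overhang: str,
                   first_nt: str = "") -> Tuple[str, str]:
    """Top/bottom oligos whose annealed duplex carries the given 5' overhangs.

    first_nt is the base the promoter needs at the transcription start
    (commonly G for U6 and A for U3); it is prepended only when missing.
    """
    guide = guide.upper()
    if first_nt and not guide.startswith(first_nt):
        guide = first_nt + guide
    return top_overhang + guide, bottom_overhang + revcomp(guide)
```

For example, `cloning_oligos(guide, "GATT", "AAAC", first_nt="G")` returns a top oligo starting with the hypothetical `GATT` overhang and a bottom oligo that anneals to it, leaving `AAAC` as the other sticky end.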
These data on guide sequences can facilitate sgRNA design for plant genome editing with CRISPR-Cas systems. CRISPR-P 2.0 also provides the microhomology score and the secondary structure of each sgRNA for assessing sgRNA efficiency (Figure 1C and 1D). Microhomologous sequences flanking the Cas9 cut site can influence the outcome and efficiency of DNA repair (Bae et al., 2014); we therefore added a microhomology scoring module to CRISPR-P 2.0. Binding between Cas9 and the sgRNA depends on the stem-loop structure of the sgRNA (Nishimasu et al., 2014), and some guide sequences may disrupt proper sgRNA folding. We therefore integrated RNAfold (Lorenz et al., 2016) into CRISPR-P 2.0 to predict the secondary structure of every sgRNA. These features should enable better sgRNA design in practice. Because the CRISPR-Cas9 system is also widely used in many organisms whose whole-genome sequences are not yet available, CRISPR-P 2.0 includes a simple program that extracts guide sequences and PAMs from user-supplied sequences. Besides identifying and analyzing guide sequences, this tool also estimates the on-target score of each sgRNA. On the Design page, users can enter custom sequences in FASTA format, and the server returns the potential sgRNAs. In summary, we present CRISPR-P 2.0, a robust and straightforward bioinformatic tool for plant genome editing with CRISPR-Cas9, which is freely available at http://cbi.hzau.edu.cn/CRISPR2/.
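The microhomology score can be sketched as follows, based on our reading of the scheme of Bae et al. (2014): every pair of identical sequences (at least 2 nt) lying on opposite sides of the cut site predicts a deletion of length d running from one copy to the other, and each pair contributes a pattern score of 100 × exp(−d/20) × (2 × GC count + AT count). The microhomology score is the sum of these pattern scores, and the out-of-frame score is the percentage of that sum coming from deletions whose length is not a multiple of 3. The nested-pair filter, the function name `microhomology_scores`, and the absence of rounding are assumptions; this is not the CRISPR-P 2.0 code.

```python
import math
from typing import List, Tuple

LENGTH_WEIGHT = 20.0  # decay constant for deletion length in the pattern score


def microhomology_scores(seq: str, cut: int, min_len: int = 2) -> Tuple[float, float]:
    """Return (microhomology_score, out_of_frame_score) for a cut before seq[cut].

    Assumes seq contains only A, C, G, T.
    """
    seq = seq.upper()
    left, right = seq[:cut], seq[cut:]
    found: List[Tuple[int, int, int]] = []  # (i, j, k): seq[i:i+k] == seq[j:j+k]
    total, out_of_frame = 0.0, 0.0
    # Longest microhomologies first, so shorter nested copies can be filtered.
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        for i in range(0, cut - k + 1):              # copy left of the cut
            for j in range(cut, len(seq) - k + 1):   # copy right of the cut
                if seq[i:i + k] != seq[j:j + k]:
                    continue
                deletion = j - i
                # Assumed filter: skip pairs nested inside a longer pair
                # that predicts the same deletion.
                if any(j2 - i2 == deletion and i2 <= i and i + k <= i2 + k2
                       for i2, j2, k2 in found):
                    continue
                found.append((i, j, k))
                mh = seq[i:i + k]
                gc = mh.count("G") + mh.count("C")
                score = 100.0 * math.exp(-deletion / LENGTH_WEIGHT) * (2 * gc + (k - gc))
                total += score
                if deletion % 3:  # frameshift deletion
                    out_of_frame += score
    oof = 100.0 * out_of_frame / total if total else 0.0
    return total, oof
```

Longer, GC-rich microhomologies close to the cut site dominate the sum, which is why the score is read as a hint about which deletion outcomes are likely.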
This work was supported by the National Key Research and Development Program of China (2016YFD0100904) and the National Natural Science Foundation of China (31571351 and 31571374).