Abstract
The complete nucleotide sequence of bacteriophage T7 DNA, 39,936 base-pairs, has been determined by the techniques of Maxam & Gilbert. All previously known T7 genes and several unsuspected genes have been identified in the sequence. T7 DNA carries genetic information very efficiently: the coding sequences of 50 genes are close-packed but essentially not overlapping, and occupy almost 92% of the nucleotide sequence. This arrangement strongly suggests that all 50 of these close-packed genes are expressed, although there is as yet evidence for expression of only 38 of them. In addition, five potential overlapping genes have been identified, and there is preliminary evidence that one of them is expressed. Where gaps between coding sequences are found, they are usually less than 100 base-pairs long, and usually contain one or more transcription signals, RNAase III cleavage sites, or origins of replication. Transcription signals in the T7 DNA include the three strong early promoters and the early termination site for Escherichia coli RNA polymerase, and 17 promoters and one termination site for T7 RNA polymerase. Ten RNAase III cleavage sites have been located, five in the early region and five in the late region. The primary transcripts are processed at these sites to provide the messenger RNAs observed in vivo. Almost all of the T7 messenger RNAs are polycistronic, but there are few polar effects at the level of transcription or translation, and most T7 proteins seem to be initiated independently, each from its own ribosome-binding and initiation site. The initiation codon for most T7 proteins is AUG, but a few proteins are predicted to begin at GUG. Certain T7 genes specify pairs of overlapping proteins. The two proteins specified by gene 4 are made in about equal amounts, beginning at two different ribosome-binding and initiation sites in the same reading frame and ending at a common termination codon.
The two proteins specified by gene 10 are made in very different amounts. They begin at the same initiation site, but the minor gene 10 protein appears to be produced by a shift in translational reading frame just ahead of the normal termination codon, thereby adding 53 amino acids to the COOH-terminal end of the major protein. Gene 10 specifies the major capsid protein of the phage particle, and both the major and minor gene 10 proteins are incorporated into the phage particle. One or two other T7 genes appear to utilize translational frameshifting to produce unequal amounts of proteins that differ at their COOH-terminal ends. The amino acid sequences and compositions predicted for all of the T7 proteins (except the proteins produced by frameshifting) are given. T7 DNA begins and ends with a perfect direct repeat of 160 base-pairs. Immediately adjacent to this terminal repetition, at both ends of the mature DNA, lie very similar, regular arrays of 12 imperfect copies of a seven-base sequence. These arrays occupy about 160 base-pairs, starting about 15 base-pairs from the terminal repetition. In the concatemeric form of T7 DNA, a single copy of the terminal repetition is flanked by these two arrays of repeated sequences, and it seems likely that this arrangement is involved somehow in formation of the ends of mature T7 DNA.
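As an illustration of what a perfect direct terminal repeat means for a linear sequence, the following minimal Python sketch finds the longest stretch that occurs identically at both ends of a sequence string. The function name `terminal_repeat_length` and the toy sequence are hypothetical, chosen only for this example; this is not the procedure used to analyse the T7 sequence, and it does not use real T7 data.

```python
def terminal_repeat_length(seq: str) -> int:
    """Return the length of the longest perfect direct repeat shared by
    the two ends of a linear sequence (0 if there is none).

    Illustrative assumption: the two copies must not overlap, so the
    repeat can be at most half the sequence length.
    """
    seq = seq.upper()
    n = len(seq)
    # Try the longest candidate first and shrink until both ends match.
    for k in range(n // 2, 0, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


# Toy example (not T7 data): an 8-base repeat flanking a unique middle.
toy_repeat = "ACGTTGCA"
toy_genome = toy_repeat + "GGGTTTCCCAAA" + toy_repeat
toy_length = terminal_repeat_length(toy_genome)
```

Applied to a sequence organised as the abstract describes, a function of this kind would be expected to report a repeat of 160 base-pairs, although that has not been run here against the actual sequence.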