Abstract
We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts and transcripts that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and to features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Collectively, these studies define a path toward a more comprehensive characterization of human genome function.

The ENCODE project (ENCyclopedia Of DNA Elements) has set out to identify all the functional elements in the human genome. With the genome sequence now established, the next challenge is to discover how the cell actually uses it as an instruction manual. The ENCODE consortium has completed the 'proof-of-principle' pilot phase of the project, an analysis of functional elements in a targeted 1% of the human genome.
The results, published this week, suggest that most bases in the genome are found in primary transcripts, including non-protein-coding transcripts and transcripts that overlap one another. Examination of transcriptional regulation has yielded new understanding about transcription start sites, and a more sophisticated view of chromatin structure has emerged. Integration of these data, in particular with respect to mammalian evolution, reveals new insights into how the information coded in the DNA blueprint is turned into functioning systems in the living cell.

The next step after sequencing a genome is to figure out how the cell actually uses it as an instruction manual. A large international consortium has examined 1% of the genome to determine which parts are transcribed, where proteins are bound, what the chromatin structure looks like, and how the sequence compares with that of other organisms.
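The claim that most bases lie in primary transcripts, even though many transcripts overlap one another, implies that coverage has to be counted per base rather than summed per transcript; otherwise shared bases are double-counted. The following is a minimal sketch of that arithmetic, not the consortium's pipeline. It assumes half-open [start, end) coordinates, and the function names and toy intervals are hypothetical, not ENCODE data.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # hypothetical half-open [start, end) base coordinates


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Union overlapping or abutting intervals so each base is counted once."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def fraction_transcribed(region: Interval, transcripts: List[Interval]) -> float:
    """Fraction of bases in `region` covered by at least one transcript."""
    r_start, r_end = region
    # Clip each transcript to the region, dropping those entirely outside it.
    clipped = [
        (max(s, r_start), min(e, r_end))
        for s, e in transcripts
        if s < r_end and e > r_start
    ]
    covered = sum(e - s for s, e in merge_intervals(clipped))
    return covered / (r_end - r_start)


# Hypothetical toy data: three transcripts, the first two overlapping.
transcripts = [(100, 600), (400, 900), (1200, 1500)]
print(fraction_transcribed((0, 2000), transcripts))  # → 0.55
```

Naively summing transcript lengths here would give 1300 of 2000 bases (0.65); merging the overlap first gives 1100 bases (0.55), which is the quantity a "fraction of bases transcribed" statement refers to.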