The ENCODE (Encyclopedia of DNA Elements) Project aims to create a detailed catalog of all sequence features in the human genome. Initiated with lessons from the Human Genome Project, the approach comprises three phases: starting with a pilot project using existing technologies, developing new methodologies for lesser-studied elements, and soliciting applications for expanded research. The project concentrates on identifying functional elements, enhancing computational methods, and integrating experimental data to annotate the human genome thoroughly. Future planning explores scalable technologies and standards for genome completeness.
ENCODE: Encyclopedia of DNA Elements
GENCODE Gene Finding Workshop, February 7, 2005, Sanger Institute, Hinxton, UK
Challenge
Compile a comprehensive encyclopedia of all of the sequence features in the human genome.
Approach:
• Apply lessons learned from the success of the Human Genome Project
• Start with well-defined pilot project
• Develop and test high-throughput technologies
Phased Approach to ENCODE
Phase 1: Pilot Project using Existing Technologies
Research Consortium focused on identification of known functional elements with existing technologies
Phase 2: Technology Development
Focus on less well-studied functional elements and novel technologies
Phase 3: Expanded Pilot Project
Solicit new applications (currently being planned)
ENCODE Consortium Goals
• Test and compare existing and new methods for exhaustive identification and verification of functional sequence elements
• Identify gaps in ability to annotate genome
• Set a clear path for scaling this effort to the entire human genome
ENCODE Consortium Goals
• Iterative cycle of computational predictions and experimental validations
• Improve computational methods of identifying functional elements
• Integration of both experimental and computational data to expand the annotation of the human genome
Selection of ENCODE Targets
• 30 Mb (1%) of the human genome
• Pick multiple regions to sample different genome landscapes
• Half are selected manually: targets with a lot of baseline information
• Half are selected randomly
• Select targets with different gene density and conservation
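The random half of the selection, spread across different gene density and conservation levels, can be illustrated with a stratified draw. The sketch below is only an illustration under stated assumptions, not the consortium's actual procedure: the window records, the tercile binning, the function names `tercile_labels` and `pick_stratified`, and the toy scores are all hypothetical.

```python
import random
from collections import defaultdict


def tercile_labels(values):
    """Label each value 0 (low), 1 (medium) or 2 (high) by rank tercile."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = min(2, rank * 3 // len(values))
    return labels


def pick_stratified(windows, per_stratum, seed=0):
    """Draw up to `per_stratum` windows at random from each
    (gene density tercile, conservation tercile) stratum."""
    rng = random.Random(seed)
    density = tercile_labels([w["gene_density"] for w in windows])
    conservation = tercile_labels([w["conservation"] for w in windows])

    # Group candidate windows by their pair of tercile labels.
    strata = defaultdict(list)
    for window, d, c in zip(windows, density, conservation):
        strata[(d, c)].append(window)

    # Sample within each stratum so every landscape type is represented.
    picks = {}
    for key in sorted(strata):
        members = strata[key]
        picks[key] = rng.sample(members, min(per_stratum, len(members)))
    return picks


# Hypothetical candidate windows with made-up scores (not real genome data).
toy_rng = random.Random(42)
windows = [
    {"name": f"w{i:03d}",
     "gene_density": toy_rng.random(),
     "conservation": toy_rng.random()}
    for i in range(300)
]
picks = pick_stratified(windows, per_stratum=2)
for stratum, chosen in picks.items():
    print(stratum, [w["name"] for w in chosen])
```

Sampling within strata, rather than uniformly across the genome, is one way to make sure gene-poor or weakly conserved regions are not crowded out by more common landscape types.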
[Figure: chromosome map (chromosomes 1-22, X, Y) showing the locations of the ENCODE target regions: manually selected regions m001-m014 and randomly selected regions r111-r334]
• DNase Hypersensitive Sites
• DNA Replication
• Epigenetic
• Genes and Transcripts
• Cis-regulatory elements (promoters, transcription factor binding sites)
• Long-range regulatory elements (enhancers, repressors/silencers, insulators)
ENCODE Future Planning
• What technologies can efficiently scale to the whole genome?
• Are there “Gold Standards” to determine specificity and sensitivity?
• If not, what is a measure for “completeness”?
• Model Organism ENCODE
  • Drosophila, Worm, Yeast
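Where a gold-standard annotation exists, sensitivity and specificity can be computed by comparing predicted positions against it. The following is a minimal sketch using the standard statistical definitions; the base-level comparison, the function names, and the toy coordinates are assumptions made for illustration. Some gene-prediction evaluations report TP / (TP + FP) under the name "specificity", so that quantity is included separately as `precision`.

```python
def confusion_counts(predicted, gold, universe_size):
    """Count true/false positives and negatives at the base level.

    `predicted` and `gold` are sets of base positions; `universe_size`
    is the total number of bases in the region being evaluated.
    """
    tp = len(predicted & gold)   # predicted and in the gold standard
    fp = len(predicted - gold)   # predicted but not in the gold standard
    fn = len(gold - predicted)   # in the gold standard but missed
    tn = universe_size - tp - fp - fn
    return tp, fp, fn, tn


def sensitivity(tp, fn):
    """Fraction of gold-standard bases that were predicted."""
    return tp / (tp + fn) if (tp + fn) else float("nan")


def specificity(tn, fp):
    """Fraction of non-gold-standard bases correctly left unpredicted."""
    return tn / (tn + fp) if (tn + fp) else float("nan")


def precision(tp, fp):
    """Fraction of predicted bases that are in the gold standard."""
    return tp / (tp + fp) if (tp + fp) else float("nan")


# Toy example: a 1,000-base region with one annotated and one predicted element.
gold = set(range(100, 200))
predicted = set(range(150, 250))
tp, fp, fn, tn = confusion_counts(predicted, gold, universe_size=1000)

print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
print("precision:  ", precision(tp, fp))
```

Without a gold standard the false-negative count is unknown, which is why the slide asks for an alternative measure of "completeness".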
Project Description Paper Published in 2004
The ENCODE (ENCyclopedia Of DNA Elements) Project. The ENCODE Consortium. Science, Vol. 306, pp. 636-640, 22 October 2004.