The Human Genome: Some interesting facts
Biological system overview • Genes vary, and this variability contributes to differences in phenotype • Proteins and RNAs interact in pathways and networks: ~8 interactions per protein • Genes need to be expressed at the right time and in the right place: ~5k–10k genes per tissue • Genes encode proteins, which may be processed or modified: ~100k–500k proteins
The human genome • Genome size: ~3,200 Mbp • 24 distinct chromosomes (22 autosomes plus X and Y) + the mitochondrial genome http://www.ensembl.org
Sequencing the genome • In 1953 James Watson and Francis Crick discovered the double-helix structure of DNA, the molecule that carries the genetic instructions for life on Earth • 50 years later, the human genome was sequenced by hierarchical shotgun sequencing
Sequencing the genome • The human genome was sequenced by: • The International Human Genome Sequencing Consortium • Celera Genomics • Technique: hierarchical shotgun sequencing (the Consortium); Celera used whole-genome shotgun sequencing • Draft sequences were released in early 2001, but ~10% of the euchromatin was missing and there were ~150,000 gaps! • After finishing, the sequence was re-released in 2004 with 341 gaps, covering ~99% of the euchromatic genome
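The gap counts above reflect how shotgun assembly depends on sequencing depth. As a rough illustration only, the classic Lander-Waterman model (not something these slides describe) estimates coverage and the expected number of contigs when reads land uniformly at random along the genome. The 500 bp read length and the 4x/8x depths below are assumptions chosen for illustration; real assemblies do worse than this idealized model because of repeats and cloning bias.

```python
import math


def lander_waterman(genome_size: int, read_length: int, num_reads: int):
    """Idealized Lander-Waterman estimates for random shotgun sequencing.

    Assumes reads start uniformly at random and that any overlap, however
    small, is enough to join two reads into one contig.
    """
    coverage = num_reads * read_length / genome_size  # c = N * L / G
    uncovered_fraction = math.exp(-coverage)  # Poisson P(base has 0 reads) = e^-c
    expected_contigs = num_reads * math.exp(-coverage)  # N * e^-c contigs
    return coverage, uncovered_fraction, expected_contigs


GENOME_BP = 3_200_000_000  # ~3,200 Mbp, as quoted earlier in these slides
READ_BP = 500  # assumed Sanger-style read length (illustrative)

for depth in (4, 8):
    n_reads = depth * GENOME_BP // READ_BP  # reads needed for this depth
    c, gap_frac, contigs = lander_waterman(GENOME_BP, READ_BP, n_reads)
    print(f"{c:.0f}x: {gap_frac:.3%} of bases uncovered, ~{contigs:,.0f} contigs")
```

Doubling the depth from 4x to 8x cuts the expected contig count by far more than half, because the gap term decays exponentially with coverage.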
Sequencing time period • The first human genome took ~5 years and cost ~$3 billion • Now a genome can be sequenced in a few weeks for ~$5,000 • BUT: this does not include the cost and time of data analysis! International Human Genome Sequencing Consortium 2001. Nature 409, 860 – 921.
Size of the genome • There are 100 trillion (100,000,000,000,000) cells in your body. • There are three billion (3,000,000,000) base pairs in one copy of the genome; most cells carry two copies. • The genome requires more than 3 gigabytes of computer storage space. • Storing a full genome sequenced by NGS costs about $100 per genome per year. http://www.pbs.org/wgbh/nova/genome/facts.html
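The "more than 3 gigabytes" figure assumes one byte (one text character) per base. With only four bases, each fits in 2 bits, so a packed encoding needs roughly a quarter of the space. The sketch below is a hypothetical minimal packer for A/C/G/T only; real formats such as UCSC's .2bit also record runs of N and masked regions separately, which this sketch does not handle.

```python
def bytes_needed(num_bases: int, bits_per_base: int) -> int:
    """Storage in bytes, rounded up to a whole byte."""
    return (num_bases * bits_per_base + 7) // 8


CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack_2bit(seq: str) -> bytes:
    """Pack an A/C/G/T string into 2 bits per base (4 bases per byte)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq.upper()):
        out[i // 4] |= CODE[base] << (2 * (i % 4))  # raises KeyError on N or other symbols
    return bytes(out)


def unpack_2bit(data: bytes, length: int) -> str:
    """Inverse of pack_2bit; `length` is needed because the last byte may be partial."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length))


genome = 3_000_000_000  # base pairs, from the slide
print(bytes_needed(genome, 8) / 1e9, "GB as plain text (1 byte per base)")
print(bytes_needed(genome, 2) / 1e9, "GB packed at 2 bits per base")
```

Packing trades a little CPU work for a fourfold saving, which is why 2-bit layouts are common for reference genomes.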
Interesting facts • If all the DNA in your body were put end to end, it would reach to the Sun and back over 600 times (100 trillion × six feet ÷ (2 × 92 million miles)). • If unwound and tied together, the strands of DNA in one cell would stretch almost six feet but would be only about 2 nanometers (less than a ten-millionth of an inch) wide. • It would take a person typing 60 words per minute, eight hours a day, around 50 years to type the human genome. • If all three billion letters in the human genome were stacked one millimeter apart, they would reach a height 7,000 times the height of the Empire State Building. http://www.pbs.org/wgbh/nova/genome/facts.html
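Two of these facts can be checked with simple arithmetic. The sketch uses the slide's own inputs; the 5-characters-per-word typing convention and typing every day of the year are assumptions added here.

```python
FEET_PER_MILE = 5280

# DNA laid end to end vs. a round trip to the Sun (inputs from the slide)
cells = 100e12  # 100 trillion cells
dna_per_cell_ft = 6  # ~six feet of DNA per cell
sun_distance_mi = 92e6  # ~92 million miles

total_dna_ft = cells * dna_per_cell_ft
round_trip_ft = 2 * sun_distance_mi * FEET_PER_MILE
trips = total_dna_ft / round_trip_ft
print(f"Sun round trips: {trips:.0f}")

# Typing the genome at 60 words per minute, 8 hours a day
letters = 3e9
chars_per_word = 5  # assumed typing-speed convention
chars_per_day = 60 * chars_per_word * 60 * 8  # wpm * chars/word * min/hour * hours/day
years = letters / chars_per_day / 365  # assumes typing every day of the year
print(f"Years of typing: {years:.0f}")
```

Both results agree with the slide's orders of magnitude: a little over 600 round trips, and several decades of typing.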
Some statistics • Only ~1.5% of the genome is protein-coding • Other non-protein-coding sequence includes other kinds of “genes” (e.g. RNA genes) and “lost genes” (e.g. pseudogenes) • A proportion of our genome is not our own! • ~50% of the genome is repeats, largely derived from transposable elements and ancient viruses! • The single most common protein-coding sequence is the "recipe" for making reverse transcriptase • Any two people's genome sequences are ~99.9% identical
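Percentages of a 3,200 Mbp genome hide how large the absolute numbers are. A quick conversion, using the genome size quoted earlier in these slides:

```python
GENOME_BP = 3_200_000_000  # ~3,200 Mbp, from the genome-size slide


def fraction_to_bp(fraction: float) -> int:
    """Convert a fraction of the genome into an approximate base-pair count."""
    return round(GENOME_BP * fraction)


coding_bp = fraction_to_bp(0.015)  # ~1.5% protein-coding
repeat_bp = fraction_to_bp(0.50)  # ~50% repeats
diff_bp = fraction_to_bp(0.001)  # the ~0.1% that differs between two people

print(f"protein-coding: {coding_bp:,} bp")
print(f"repeats:        {repeat_bp:,} bp")
print(f"differences:    {diff_bp:,} bp")
```

So the protein-coding portion is only ~48 Mbp, while 0.1% still amounts to roughly 3 million positions at which two people's genomes differ.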
Number of human genes • Early estimates ranged from 20,000 to 150,000 genes • Current estimates are between 20,000 and 30,000 genes • The number of different protein molecules is expanded by: • (a) alternative splicing (30 to 50% increase); • (b) post-translational modifications (5- to 10-fold increase) • There could be about 1 million different protein molecules in the human body
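Treating the two expansion mechanisms as independent multipliers on the gene count gives a back-of-the-envelope range. This naive multiplication is an assumption made here for illustration, not how the cited estimates were derived:

```python
gene_range = (20_000, 30_000)  # estimated number of genes
splicing_factor = (1.3, 1.5)  # alternative splicing: +30% to +50%
ptm_factor = (5, 10)  # post-translational modifications: 5- to 10-fold

# Lowest and highest products of the three ranges
low = gene_range[0] * splicing_factor[0] * ptm_factor[0]
high = gene_range[1] * splicing_factor[1] * ptm_factor[1]
print(f"~{low:,.0f} to ~{high:,.0f} distinct protein molecules")
```

The result, roughly 130,000 to 450,000, is in line with the 100k–500k range quoted in the biological system overview.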
Gene numbers [Figure: estimated gene counts for several organisms: 21,000; 14,000; 22,000; 19,000; 2,000–5,000; 6,000; 24,000 genes (organism labels not recoverable)]
Latest genome build • Known protein-coding genes: 20,442 • Novel protein-coding genes: 434 • Pseudogenes: 15,007 • RNA genes: 12,523 • Gene exons: 649,964 • Gene transcripts: 181,744
Protein coding genes • Many genes are alternatively spliced • Human genes have short exons (~50 codons, i.e. ~150 bp) and long introns (~10 kb) • Average gene length is ~3,000 bp; the largest is ~2.4 million bp • We know the function of less than half of all genes
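A gene's exon and intron lengths follow directly from its exon coordinates. In the hypothetical toy gene below (coordinates invented for illustration and chosen to match the typical sizes on this slide: 50 codons × 3 bp = 150 bp exons and ~10 kb introns), most of the gene's span turns out to be intronic:

```python
def exon_intron_lengths(exons):
    """Return (exon lengths, intron lengths) for a list of (start, end) exon
    coordinates on one strand, using 1-based inclusive positions."""
    exons = sorted(exons)
    exon_lengths = [end - start + 1 for start, end in exons]
    # Each intron lies between the end of one exon and the start of the next
    intron_lengths = [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]
    return exon_lengths, intron_lengths


# Hypothetical three-exon gene with ~50-codon exons and ~10 kb introns
toy_exons = [(1_001, 1_150), (11_151, 11_300), (21_301, 21_450)]
exon_len, intron_len = exon_intron_lengths(toy_exons)
span = toy_exons[-1][1] - toy_exons[0][0] + 1
print(exon_len, intron_len)
print(f"exonic fraction of gene span: {sum(exon_len) / span:.1%}")
```

With these sizes only about 2% of the gene's genomic span is exonic, which is why long introns dominate human gene lengths.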
Comparative genomics • Comparing the human genome to others:
Genes in common with other organisms About 75% of human genes have non-human homologues, ~70% match mouse proteins International Human Genome Sequencing Consortium 2001. Nature 409, 860 – 921.
Functional composition • Compared with the other sequenced eukaryotes analysed, humans have more multifunctional genes, and more genes involved in cell-cell communication and signalling International Human Genome Sequencing Consortium 2001. Nature 409, 860 – 921.
Human genome resources • Ensembl • UCSC Genome Browser • OMIM: human genes and inherited disorders • dbSNP: single nucleotide polymorphisms • Genetic Map at NCBI • Etc.