Gencode Mar '10 Meeting: Pseudogene Project Update Mark Gerstein

Gencode Mar '10 Meeting: Pseudogene Project UpdateMark Gerstein Illustration from Gerstein & Zheng (2006). Sci Am.

Overall Approach Overall Pipeline runs at Yale and UCSC, yielding raw pseudogenes Extraction of coherent subsets for further analysis and annotation Passing to Sanger for detailed manual analysis and curation Incorporation into final GENCODE annotation Pipeline modification Chronology of Sets Encode Pilot 1% Ribosomal Protein pseudogenes Unitary pseudogenes (Hard) Glycolytic Pseudogenes Polymorphic Pseudogenes Pseudogenes Associated with SDs Overall Flow:Pipeline Runs, Coherent Sets, Annotation, Transfer to Sanger

Specific Pseudogene Assignments: Glycolytic Pseudogenes (completed)

Number of pseudogenes for each glycolytic enzyme [Liu et al. BMC Genomics ('09)] Large numbers of processed GAPDH pseudogenes in mammals comprise one of the biggest families but numbers not obviously correlated with mRNA abundance. GAPDH Processed/Duplicated GAPDH

Number of pseudogenes for each glycolytic enzyme [Liu et al. BMC Genomics ('09)] Large numbers of processed GAPDH pseudogenes in mammals comprise one of the biggest families but numbers not obviously correlated with mRNA abundance. GAPDH Processed/Duplicated GAPDH 60 Proc/2 Dup

Distribution of human GAPDH pseudogenes Large numbers of processed GAPDH pseudogenes in mammals comprise one of the biggest families but numbers not obviously correlated with mRNA abundance. 60 Proc/2 Dup [Liu et al. BMC Genomics ('09, in press)]

Burst of Retrotran-spositional Activity Aproximate Age of GAPDH pseudogenes Age calculated based on Kimura-2 parameter model of nucleotide substitution [Liu et al. BMC Genomics ('09)]

Synteny of GAPDH pseudogenes Synteny derived based on local gene orthology [Liu et al. BMC Genomics ('09)]

Specific Pseudogene Assignments: Unitary Pseudogenes (completed)

Pseudogenes { Unitary pseudogene • Pseudogenes: nongenic DNA segments with high sequence similarity to functional genes Duplicated pseudogenes Duplication + Processed pseudogenes Transcription Transposition • Unitary pseudogenes: unprocessed pseudogenes with no functional counterparts Unitary pseudogenes In situ pseudogenization

Identification pipeline { Unitary pseudogene ~16k human-mouse orthologs ~23k mouse proteins ~6k mouse proteins without human orthologs HG ~600 candidate human unitary pseudogene loci 76 human unitary pseudogenes [Zhang et al. GenomeBiology (in press, '10)]

Relativity of unitary pseudogenes { Unitary pseudogene [Zhang et al. GenomeBiology (in press, '10)]

Unitary Pseudogene Families

Dating the pseudogenization events { Unitary pseudogene

Specific Pseudogene Assignments: Polymophic Pseudogenes (in process)

11 Polymorphic Pseudogenes

Polymorphic pseudogenes (3 with allele frequency data) [Zhang et al. GenomeBiology (in press, '10)] 3 SNPs not found to be under recent positive selection....

Fst hierarchical clustering for rs4940595 in SERPINB11 ....but population structure at rs4940595—the difference in the allelic frequencies in different populations—could be result of different selective regimes that the same allele at rs4940595 is subjected to in different population subdivisions.

Specific Pseudogene Assignments: SD-associated Pseudogenes (in process)

Segmental duplications (SDs) • Regions of the genome with  90% sequence identity and  1kb in length • Based on neutral divergence correspond to last ~40 million years of human evolution • Comprise ~5-6% of the human genome • Enriched with genes (~18%) and pseudogenes (duplicated ~45%, processed ~22%) Can the study of ψgenes in SDs provide information not obvious from individual dataset ? Bailey et al, Science, 2002

Nucleotide substitutions in ψgenes and SDs containing them Parent gene Duplicated ψgene K2m : Nucleotide substitutions per site computed using Kimura’s two parameter model • Most ψgenes show the same number of substitutions as larger SD region containing them • - Duplication accompanied by disablement • - Followed by neutral rate of evolution

Acknowledgements Z Zhang E Khurana Y J Liu YK LamS Balasubramanian G Fang N Carriero R RobilottoP Cayting M Wilson A Frankish M Diekhans R HarteT HubbardJ Harrow Pseudogene.org

Gencode Mar '10 Meeting: Pseudogene Project Update Mark Gerstein

Gencode Mar '10 Meeting: Pseudogene Project Update Mark Gerstein

Presentation Transcript

WELCOME TO THE 2006 JPA PLANNING COMMITTEE MEETING

IMAGE GENTLY AND NEW JERSEY CT DOSE PROJECT UPDATE

QualiFLY – 2 nd Project Meeting Malta, 13-15 Feb 2006

IU DLP Infrastructure Update

Welcome

The SPCC Update and Implications to Wind Power Generation

2011 IPGA SPRING MEETING

ALCO Update

Leadership group meeting august 6 th , 2013

Sunflower Project Change Agent Network Meeting #8

Purchasing Directors’ Meeting November 9, 2004 3:00 P.M.

TAG Meeting May 18, 2010

Trade Mark Examination

PSFIN V8 Project Quarterly Update Meeting

Project Name Kick Off Meeting

BIOINFORMATICS Sequences #1 (up to P-values)

Portal User Group Meeting

NC FAST and Work Support Strategies Operational Readiness Workshop

Access Project 3