
Functional genomics approaches to disease genomics

Functional genomics approaches to disease genomics. Biological information and organisation. Genomics approaches to identifying disease-relevant enrichment. Candidate gene approaches. Biological information increases rapidly: every day hundreds of articles are published, and we can’t read them all.

tess

Presentation Transcript


  1. Functional genomics approaches to disease genomics • Biological information and organisation • Genomics approaches to identifying disease-relevant enrichment • Candidate gene approaches

  2. Biological information increases rapidly • Every day, hundreds of articles are published • We can’t read them all • We can’t remember them all • Our memories are subjective anyway • To make use of this incredible research output, we need ways to bring this information together and summarise it • If we could make it readable by a computer, our power to use it would increase hugely

  3. OMIM Home Page: http://www.ncbi.nlm.nih.gov/omim/

  4. OMIM • Online Mendelian Inheritance in Man (OMIM) is a catalog of human genes and genetic disorders, with links to literature references, sequence records, maps, and related databases • Annotates 325 genes associated with human disease • 2,710 disorders with a known molecular basis • 1,634 genetic disorders with an unknown basis • The OMIM entries are made by experienced annotators • Even the best annotators are not wholly consistent

  5. What is Ontology? • Dictionary: A branch of metaphysics concerned with the nature and relations of being. • Barry Smith: The science of what is, of the kinds and structures of objects, properties, events, processes and relations in every area of reality. • (Timeline labels on the slide: 1606, 1700s.) Slide from the GO website, www.geneontology.org

  6. Ontologies • Formalising our knowledge into a structured and defined vocabulary is essential for genomics approaches • The benefits from an agreed language enable rapid progress (e.g. Species classification) • Recently, biological research communities have been defining a common language for describing everything from protein function through to phenotype

  7. From a practical view, an ontology is the representation of something we know about. “Ontologies” consist of a representation of things that are detectable or directly observable, and the relationships between those things (for example, “is part of”). Slide taken from GO (www.geneontology.org)

  8. Gene Ontology (GO) • The Gene Ontology project was set up to provide a controlled vocabulary that describes genes and their products (principally the products) • GO describes genes in 3 separate ontologies • Molecular function, biological process and cellular component (location) • Genes can be annotated with many terms in each category

  9. Molecular Function • GO term: Malate dehydrogenase • GO id: GO:0030060 • (S)-malate + NAD(+) = oxaloacetate + NADH. Cellular Component • GO term: mitochondrion • GO id: GO:0005739 • Definition: A semiautonomous, self-replicating organelle that occurs in varying numbers, shapes, and sizes in the cytoplasm of virtually all eukaryotic cells. It is notably the site of tissue respiration. Biological Process • GO term: tricarboxylic acid cycle • Synonym: Krebs cycle • Synonym: citric acid cycle • GO id: GO:0006099

  10. GO Biological Process • Directed Acyclic Graph (DAG) • Allows a child node to have more than one parent • (Figure: is_a hierarchy with the nodes Physiological Process, Metabolism, Primary Metabolism, Biosynthesis, Protein Metabolism and Protein Biosynthesis.)
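The point that a DAG lets a child term have more than one parent can be illustrated with a small sketch. The term names below come from the slide, but the specific is_a edges are an assumption made for illustration (the figure's edges did not survive in the transcript), and `ancestors` is a hypothetical helper, not part of any GO tool.

```python
# Toy is_a graph. Term names are from the slide; the edges are assumed
# for illustration and are not extracted from the real Gene Ontology.
IS_A = {
    "protein biosynthesis": ["protein metabolism", "biosynthesis"],  # two parents
    "protein metabolism": ["primary metabolism"],
    "primary metabolism": ["metabolism"],
    "biosynthesis": ["metabolism"],
    "metabolism": ["physiological process"],
    "physiological process": [],
}


def ancestors(term, is_a=IS_A):
    """Return every term reachable from `term` by following is_a edges."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


if __name__ == "__main__":
    # A gene annotated with the child term is implicitly annotated
    # with every ancestor along both parent paths.
    print(sorted(ancestors("protein biosynthesis")))
```

Because annotations propagate upwards along every path, enrichment tests on a parent term usually count genes annotated to any of its descendants.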

  11. Mammalian Phenotype Ontology (MPO) • Really the mouse phenotype ontology • Annotators take each published mouse gene knock-out experiment and annotate the phenotype with MPO terms

  12. Human Medical Ontologies • Human Phenotype Ontology (HPO), www.human-phenotype-ontology.org • The HPO provides a standardized vocabulary of phenotypic abnormalities encountered in human genetic syndromes • London Dysmorphology Database, www.human-phenotype-ontology.org • (Figure: example term hierarchy with the terms Organ abnormality, Cardiovascular abnormality, Cardiac abnormality, Cardiac malformation, Abn. of the cardiac septa and Abn. of the cardiac atria.)

  13. Model Organisms • Excellent functional genomics resources • The comparison between a human phenotype and a mouse phenotype is often very readily interpretable. • Other useful organisms include the fly, the worm and even yeast • Useful as they have well-curated data for many genes

  14. Kyoto Encyclopaedia of Genes and Genomes (KEGG) • Pathway database • Manually curated information from the literature

  15. High-throughput functional resources • Tissue expression • Where and when genes are expressed may be relevant to the disease • Interactions • Genes that interact may be involved in the same biological process • E.g. protein-protein interactions or genetic interactions (coordinated regulation) • Sequence patterns (coding or regulatory) • Similar sequences can imply common functionality

  16. Different data sources have different types of error • Literature sources (GO, model organism data, etc.) have poor coverage and a lack of true negatives • We publish “A is an X” more than “A is not a Y” • Not all genes have been subject to the same studies • High-throughput sources often have high error rates • False positives are a particular problem for gene/protein interactions when you’re considering all pairs

  17. The value of mouse phenotypic data Ability to predict Human Phenotype Ontology terms

  18. Forming interesting gene sets • If you can’t identify a single gene/locus, maybe you can form a subset of genes likely to contain the gene(s) of interest • Genes in large intervals identified by linkage studies • Genes near SNPs with low, but not genome-wide significant, p-values from GWAS • Genes in de novo or rare CNVs seen in cases • Power is important • Bringing together many similar cases enriches for genes associated with that disease

  19. Testing for enrichments • Compare to the genome • Pulling balls (genes) from a bag (genome) is sampling without replacement, so use the hypergeometric distribution • Compare to controls • If chosen well, may account for biases • Contingency tables, chi-squared (χ²) tests • If controls are unavailable, you can randomise to help address potential biases like gene length and function
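The "balls from a bag" comparison to the genome can be sketched as a one-sided hypergeometric test. This is a minimal illustration using only the standard library; the function name, argument names and the numbers in the usage example are assumptions chosen for illustration and do not come from the slides.

```python
from math import comb


def hypergeom_enrichment_p(genome_size, annotated_in_genome,
                           sample_size, annotated_in_sample):
    """Upper-tail hypergeometric p-value.

    Probability of seeing at least `annotated_in_sample` annotated genes
    when drawing `sample_size` genes without replacement from a genome of
    `genome_size` genes, `annotated_in_genome` of which carry the annotation.
    """
    max_k = min(sample_size, annotated_in_genome)
    total = comb(genome_size, sample_size)
    tail = sum(
        comb(annotated_in_genome, k)
        * comb(genome_size - annotated_in_genome, sample_size - k)
        for k in range(annotated_in_sample, max_k + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 genes, 400 annotated with a phenotype,
    # 100 genes in the intervals of interest, 8 of them annotated.
    print(hypergeom_enrichment_p(20000, 400, 100, 8))
```

This test treats every gene as equally likely to be drawn, which is exactly the assumption that the controls or randomisation mentioned on the slide are meant to relax.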

  20. Rare de novo copy number variant (CNV) associated with learning disability • (Figure: a 2.8 Mb CNV.) • How does this CNV relate to the etiology of the disease? • Which gene(s) underlie the phenotype?

  21. Rare de novo CNVs are frequent in learning disability (LD) • Collect a list of 148 rare de novo CNVs • Rare de novo CNVs > 100 kb are present in ~10% of LD cases • They occur all over the genome • 80% are unique and non-recurrent

  22. CNVs are common in all people • Collect a list of 26,472 benign CNVs (Redon et al., Nature 2006) • Apparently benign, mostly inherited CNVs occur all over the genome

  23. Mutations at different loci can give a similar phenotype (Figure label: symptom/phenotype.)

  24. Method (Flow diagram with the labels: Interesting intervals in patients; Human Genes; Orthology; Mouse Genes; Available mouse KO phenotypes; Significantly over-represented phenotype; Mouse models relevant to the human disorder; Disease phenotype.)

  25. Significant enrichments of genes associated with particular mouse phenotypes within de novo CNVs identified in patients with intellectual disability (Figure: bar charts of % change over expected for the mouse phenotypes Abnormal dopaminergic neuron morphology and Abnormal axon morphology and for the Nervous System category, across the CNV sets Benign CNVs, All LD CNVs, LD CNVs - benign CNVs, Loss LD CNVs and Loss LD CNVs - benign CNVs; * marks FDR < 5%.)

  26. Human brain-specific genes corroborate mouse findings • “Brain-specific” genes are defined as those whose expression in human whole brain is > 4 x the median expression across all other tissues • This defines ~3.75% of human genes as “brain-specific” • (Figure: brain-specific genes in the CNV sets Benign CNVs, All LD CNVs, All LD CNVs minus benign CNVs, Loss LD CNVs and Loss LD CNVs minus benign CNVs, with asterisks on some sets.)
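The "brain-specific" rule on this slide (whole-brain expression more than 4 x the median across all other tissues) can be written down directly. The sketch below assumes expression values are held in a plain dictionary keyed by tissue name; the key `"whole brain"`, the function name and the toy values are assumptions made for illustration.

```python
from statistics import median


def is_brain_specific(expression_by_tissue, brain_key="whole brain", fold=4.0):
    """True if brain expression exceeds `fold` x the median of all other tissues."""
    brain = expression_by_tissue[brain_key]
    others = [value for tissue, value in expression_by_tissue.items()
              if tissue != brain_key]
    return brain > fold * median(others)


if __name__ == "__main__":
    # Toy expression values, not real data.
    gene = {"whole brain": 90.0, "liver": 10.0, "heart": 20.0, "kidney": 30.0}
    print(is_brain_specific(gene))
```

Using the median rather than the mean keeps one unusually high non-brain tissue from masking an otherwise brain-restricted gene.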

  27. Autism Spectrum Disorders – the ‘triad’ of symptoms • Impaired communication • Impaired social interaction • Restrictive, repetitive behaviours and interests (Source: Autism.org.uk)

  28. Behavioural model phenotypes associated with Autism Spectrum Disorder (ASD) de novo CNVs • “Difficulty processing and retaining verbal information” • “Difficulty understanding social language” • “Difficulty coping with changes in routine”

  29. Behavioural model phenotypes associated with Autism Spectrum Disorder (ASD) de novo CNVs • “Difficulty understanding social language” • “Difficulty with empathy and friendships”

  30. Behavioural model phenotypes associated with ASD de novo CNVs • “Restricted and Repetitive Behaviours and Interests” • 60-80% of individuals with ASD exhibit poor motor planning and coordination

  31. Candidate genes • The genes that constitute significant enrichments become candidate disease genes • While the enrichment is significantly associated with the intervals, the individual genes are not, and each requires further proof individually • Experimental follow-up is costly, so the genes taken forward need to be considered carefully

  32. Annotations vary in coverage and specificity (Figure: number of candidate genes, % change over expected and % of CNVs with a candidate gene, compared across the annotation sets Mouse phenotypes Abnormal Axon/Neuron, GO Transcription, Brain-Specific, KEGG Neuro and KEGG Parkinson’s.)

  33. The better the patients are classified, the more power we have to identify enrichments • 6 of 148 LD patients have a cleft palate • (Figure: enrichment for the KO phenotype cleft palate in LD CNVs from the 6 patients with cleft palate and the 142 without, alongside benign CNVs; other panels show the tremor phenotype for patients +/- seizures and the abnormal myelination phenotype for patients +/- brain abnormality.)

  34. Some associations found for the main cohort may be more relevant to associated, or co-occurring, symptoms – ASD

  35. Mutation databases are a rich source of discovery: DECIPHER • DECIPHER is a database that holds genetic information about patients who present with congenital abnormalities • (Figure labels: Proband 1, Proband 2, Proband 3; very similar phenotype; single gene.)

  36. DECIPHER patients are annotated with London Medical Database terms (Figure: term hierarchy with Level 1, Level 2 and Level 3.)

  37. (Flow diagram; example phenotype: Cranium, General abnormalities.) • Form groups of CNVs associated with each human phenotype (CNV counts shown: 7, 121, 18, 132) • Assign ENSEMBL genes to the CNVs (gene counts shown: 692, 3320, 3036) • Remove copy number variable genes observed in healthy individuals (gene counts shown: 633, 3030, 2767)

  38. Many enrichments are readily interpretable • Human Symptom: Short Stature, Prenatal Onset; Mouse Phenotype: Decreased Fetal Size • Human Symptom: Cupid bow shape of mouth; Mouse Phenotype: Abnormal Palate Development • Human Symptom: Malocclusion; Mouse Phenotype: Malocclusion • Human Symptom: Syndactyly of toes; Mouse Phenotype: Syndactyly • Key: Gain, Loss, All; * statistically significant, FDR < 0.05

  39. Others identify less obvious relationships • Human Symptom: Psychotic Behaviour; Mouse Phenotype: Abnormal pre-pulse inhibition • Human Symptom: Complex Partial Seizures; Mouse Phenotype: Abnormal circadian rhythm • Key: Gain, Loss, All; * statistically significant, FDR < 0.05

  40. Mutations can be dissected to identify the contributions of individual genes • Patient id: 248772: ATG7, OXTR and ATP2B2 (intellectual disability/developmental delay candidate genes); FANCD2 (short stature, prenatal onset candidate gene) • Patient id: 785: SNX2 and FBN2, labelled as camptodactyly and mental retardation/developmental delay candidate genes

  41. Gene set enrichment analysis (Subramanian et al., 2005) • Start with some list of ranked genes • Genes ranked by expression in cases vs controls (microarrays) • Genes ranked by nearby SNP p-values • Score genes + or – according to some property • Ask: are genes with this property more concentrated towards the top of this list than I would expect by chance?
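The question on this slide (are the + genes concentrated towards the top of the ranked list?) can be sketched as a running-sum statistic. The version below is a simplified, unweighted Kolmogorov-Smirnov-style score assumed for illustration; it is not the weighted statistic of the published method, and the function name and toy gene names are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.

    Walk down the ranked list, stepping up for genes in `gene_set` and down
    otherwise, and return the running sum with the largest absolute value.
    Scores near +1 mean the set clusters at the top of the list; scores
    near -1 mean it clusters at the bottom.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("need at least one gene inside and one outside the set")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    ranked = ["a", "b", "c", "d", "e", "f"]  # toy ranking, best first
    print(enrichment_score(ranked, {"a", "b"}))
```

To answer "than I would expect by chance", the observed score would normally be compared with scores from many random permutations of the gene labels or sample labels.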

  42. Gene Prioritisation for disease • Given a list of genes, which are most likely to be involved in this disease? • We just want a ranking, not a significant association • Commonly employed approaches involve supervised learning methodologies • Collect data points from one or more sources • Take a “Gold Standard” set of genes for this disease • Train a method using known true positives (and true negatives if known) • Given a list of genes, which ones “look” most similar to the known disease genes?

  43. Linkage networks can infer missing values – “guilt by association”
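One very simple reading of "guilt by association" is to score each candidate gene by the fraction of its network neighbours that are already known disease genes, then rank by that score. The sketch below is an assumed illustration of the idea, not the method used in the cited papers; the function name, the network layout and the gene labels are hypothetical.

```python
def rank_by_association(candidates, network, known_disease_genes):
    """Rank candidates by the fraction of neighbours that are known disease genes.

    `network` maps each gene to the set of genes it is linked to.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    def score(gene):
        neighbours = network.get(gene, set())
        if not neighbours:
            return 0.0
        return len(neighbours & known_disease_genes) / len(neighbours)

    return sorted(candidates, key=lambda gene: (-score(gene), gene))


if __name__ == "__main__":
    # Toy network with made-up gene labels.
    network = {
        "G1": {"D1", "D2"},
        "G2": {"D1", "X"},
        "G3": set(),
    }
    print(rank_by_association(["G3", "G2", "G1"], network, {"D1", "D2"}))
```

As slide 16 notes, well-studied genes tend to have more recorded links, so a score like this inherits the study bias of the underlying network.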

  44. From PubMed ID: 19728866

  45. Linkage network for human disorders using the Human Phenotype ontology (PMID 18950739)

  46. Conserved co-expression of disease genes (Ala et al., PLoS Genetics 2008) • 850 OMIM entries where a phenotype was mapped to a locus but the specific genes were unknown • Used conserved human-mouse co-expression data, as other interaction or pathway data can bias towards well-studied genes • Generated single-species gene co-expression networks • Calculated Pearson’s correlation coefficient between all pairs of gene expression profiles. Formed a network edge if the two genes’ expression correlation was in the top 1% for either gene. • Clustered OMIM phenotypes using MimMiner • A text-mining tool
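The single-species network step described above (Pearson correlation between all pairs, with an edge when a pair is in the top 1% for either gene) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: expression profiles are equal-length lists with non-zero variance, the function names are hypothetical, and the human-mouse conservation step is not shown.

```python
from math import ceil, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpression_edges(profiles, top_fraction=0.01):
    """Link each gene to its most correlated partners.

    For every gene, the other genes are ranked by correlation and the top
    `top_fraction` (at least one) are linked to it. Because edges are
    undirected, a pair is connected if it is in the top set of either gene.
    """
    genes = sorted(profiles)
    top_k = max(1, ceil(top_fraction * (len(genes) - 1)))
    edges = set()
    for gene in genes:
        partners = sorted(
            (other for other in genes if other != gene),
            key=lambda other: pearson(profiles[gene], profiles[other]),
            reverse=True,
        )
        for other in partners[:top_k]:
            edges.add(frozenset((gene, other)))
    return edges


if __name__ == "__main__":
    # Toy expression profiles, not real data.
    profiles = {
        "A": [1.0, 2.0, 3.0, 4.0],
        "B": [2.0, 4.0, 6.0, 8.5],
        "C": [4.0, 3.0, 2.0, 1.0],
    }
    print(coexpression_edges(profiles, top_fraction=0.5))
```

On genome-scale data the all-pairs loop would normally be replaced by a vectorised correlation matrix, but the edge rule stays the same.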

  47. Using this methodology, they were able to predict 321 candidates across 81 disease-associated loci at an FDR of <10%

  48. Human phenome-interactome network for predicting disease candidate genes (Lage et al., Nature Biotech. 2007) • 2 data networks • Phenotypic similarity, based on detecting words that are common to two phenotype descriptions and do not occur frequently among all phenotype descriptions. • Human interactome, consisting of several large human interaction sets and sets transferred from model organisms, weighted according to observation frequency.

  49. • A given positional candidate is queried for high-scoring interaction partners (“virtual pull-down”). The candidate and these interaction partners form the candidate complex. • Proteins known to be involved in disease are identified in the candidate complex, and pairwise scores are assigned for the phenotypic overlap between the diseases of these proteins and the candidate phenotype. • Based on the phenotypes represented in the candidate complex, a Bayesian predictor awards a probability to the candidate in the complex. The score is used to form the ranking.
