Computational tools for disease gene identification - PowerPoint PPT Presentation

Presentation Transcript

  1. Computational tools for disease gene identification • Sonia ABDELHAK, PhD • Molecular Investigation of Genetic Orphan Disorders, Institut Pasteur de Tunis

  2. Summary • How could we identify genes involved in human disorders? • Positional cloning in the pre-genomic era. • Monogenic/multifactorial diseases. • Computational tools: positional cloning in the post-genomic era.

  3. Monogenic versus Complex Diseases: Genes & Environment • [Figure: conditions plotted by genetic component versus environmental effect. Conditions shown: Hemophilia, Cystic Fibrosis, Stroke, Asthma, Lung Cancer, Skin Cancer, Alzheimer’s, Cardiovascular Disease, Motor Vehicle Accident, Schizophrenia, Familial Colon or Breast Cancer, Type 2 Diabetes, Bipolar Disorder] • Source: S.K. Brahmachari, GENOMED-HEALTH meeting

  4. What could we learn from disease gene identification? • Better understanding of the underlying biology of the trait in question • Serve as direct targets for better treatments • Pharmacogenetics • Interventions • Predictions of susceptibility to the disease • Predictions of the course of the disease • Knowledge for treatment or prevention

  5. “SIMPLE” MENDELIAN GENETIC DISEASES • Diseases of Simple Genetic Architecture • Can tell how trait is passed in a family: follows a recognizable pattern (Mendelian disease) • One gene altered per family (exceptions) • Usually quite rare in population (exceptions) • “Causative” gene

  6. Some examples of deleterious mutations • Stop codon creation: CAG (Gln) → TAG (stop)

  7. Modes of inheritance • X linked • Duchenne muscular dystrophy

  8. Autosomal dominant • Huntington disease

  9. Autosomal recessive • Cystic fibrosis

  10. Mitochondrial • Leber optic atrophy

  11. Functional cloning versus positional cloning of genes • Functional cloning: Disease → Function/Protein → Gene → Chromosomal localisation • Positional cloning: Disease → Chromosomal localisation → Gene → Function/Protein

  12. Position-Independent Methods • Gene-specific oligonucleotides: the Factor VIII gene in hemophilia A (the most common form of hemophilia, X-linked) • The clotting factor was purified from pig, and its N-terminal amino acids were sequenced. • This allowed a group of oligonucleotides to be synthesized. • These probes were used for colony hybridization against a cDNA library.

  13. Positional cloning of genes • [Diagram repeated from slide 11, linking Disease, Chromosomal localisation, Function/Protein and Gene] • Positional cloning: Disease → Chromosomal localisation → Gene → Function/Protein

  14. Identification of informative families → Genetic mapping → Physical mapping → Identification of coding sequences (candidate genes) → Mutation screening (normal: ...CCT GAG GAG...; mutated: ...CCT GTG GAG...) → Functional analysis (normal: ...Pro Glu Glu...; mutated: ...Pro Val Glu...)
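The mutation-screening step on this slide can be illustrated with a short sketch. It translates the normal and mutated fragments shown on the slide using a codon table restricted to the three codons involved. The function names are illustrative and do not come from the presentation, and the reported position is the codon index within the fragment, not within the protein.

```python
# Only the codons that appear on the slide (standard genetic code).
CODON_TABLE = {"CCT": "Pro", "GAG": "Glu", "GTG": "Val"}

def translate(dna):
    """Translate an in-frame DNA string into a list of three-letter amino acids."""
    usable = len(dna) - len(dna) % 3          # ignore a trailing partial codon
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TABLE[c] for c in codons]

def find_substitutions(normal, mutated):
    """Return (codon_index, normal_aa, mutated_aa) for each changed residue."""
    pairs = zip(translate(normal), translate(mutated))
    return [(i, a, b) for i, (a, b) in enumerate(pairs, start=1) if a != b]

normal = "CCTGAGGAG"
mutated = "CCTGTGGAG"
print(translate(normal))                    # ['Pro', 'Glu', 'Glu']
print(translate(mutated))                   # ['Pro', 'Val', 'Glu']
print(find_substitutions(normal, mutated))  # [(2, 'Glu', 'Val')]
```

A full screen would use the complete 64-codon table; the three-entry table is enough to show how a single A→T change turns Glu into Val.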

  15. Genetic mapping • Which markers are used for genetic mapping?

  16. Polymorphisms used in Gene Mapping • 1980s – RFLP marker maps • 1990s – microsatellite marker maps

  17. Identification of microsatellite-type polymorphisms by sequence analysis
      • IL-12p35AC (primers F and R), 78.57%:
        tggtggcagaaatcattgtctgaaaagtaattgttttacttttattcttttcgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgcatgtgccagatttcttgtttgaaaggcaatgagcttcatccaagtatcaa
      • IL-12p40AC (primers F and R), 69.23%:
        atttcaggtgtgagccactgtgcctggccagaactttttcaatgaatattcaagataattgtatacacattttatatatatatatatatatacacacacacacacacacacatatgtatacacacattatatatataatccatgttatatacatctctacattatatatatccactatatatattttacttatacatatagattttatttttatgaactaggatcaaattgta
      • [Figure: gel with lanes 1 to 5; size labels 174, 170, 166]
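Finding microsatellites by sequence analysis amounts to scanning for tandem repeats of a short unit. The sketch below is one minimal way to do that for dinucleotide repeats with a regular expression. The function name, the `min_repeats` threshold and the test sequence are assumptions made for illustration, not the procedure used in the presentation.

```python
import re

def find_dinucleotide_repeats(seq, min_repeats=8):
    """Return (start, unit, repeat_count) for each dinucleotide run in seq.

    start is 0-based; homopolymer units such as 'aa' are skipped.
    """
    seq = seq.lower()
    # One 2-base unit followed by at least (min_repeats - 1) copies of itself.
    pattern = re.compile(r"([acgt]{2})\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        if unit[0] == unit[1]:
            continue  # a run of a single base is not a dinucleotide repeat
        hits.append((m.start(), unit, len(m.group(0)) // 2))
    return hits

# Synthetic example, not one of the IL-12 sequences from the slide.
example = "ttc" + "gt" * 12 + "catgcc"
print(find_dinucleotide_repeats(example))  # [(3, 'gt', 12)]
```

The same function can be applied to the two IL-12 flanking sequences to locate their GT and AT/AC runs. Repeat number at such a locus varies between individuals, which is what makes it usable as a marker.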

  18. SNPs in Genetic Analysis • Abundance – lots • Position – throughout genome • Haplotype patterns – groups of SNPs may provide exploitable diversity • Rapid and efficient to genotype • Increased stability over other types of mutation

  19. Gene mapping: Linkage analysis • Do marker alleles co-segregate with the disease by chance, or are they linked to the underlying gene?

  20. Crossing over and Recombination

  21. Recombination Fraction • $\theta = 1/2$: independent assortment (Mendel) • $\theta < 1/2$: linked loci • $\theta = 0$: tightly linked loci (no recombination)

  22. LOD Score Analysis • The likelihood ratio as defined by Morton (1955): $\mathrm{L.R.} = \dfrac{L(\text{pedigree} \mid \theta = x)}{L(\text{pedigree} \mid \theta = 0.50)}$, where $\theta$ represents the recombination fraction and $0 \le x \le 0.49$. • When all meioses are “scorable”, the LR is constructed as: L.R. = [equation not preserved in the transcript] • $z(\theta)$ is the lod score at a particular value of the recombination fraction • $z(\hat{\theta})$ is the maximum lod score, which occurs at the MLE of the recombination fraction $\theta$ • The LOD score ($z$) is $\log_{10}(\mathrm{L.R.})$ • H1: Linkage • H0: Exclusion =0
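As a worked illustration of the LOD score, the sketch below assumes the standard textbook form for phase-known, fully scorable meioses, L.R. = θ^R (1 − θ)^(N − R) / (1/2)^N, with R recombinants among N meioses. The slide's own equation is not in the transcript, so this form, the function names and the example counts are assumptions.

```python
import math

def lod_score(recombinants, meioses, theta):
    """LOD score at recombination fraction theta for phase-known meioses."""
    non_recombinants = meioses - recombinants
    likelihood = (theta ** recombinants) * ((1.0 - theta) ** non_recombinants)
    if likelihood == 0.0:
        # e.g. theta = 0 with at least one observed recombinant
        return float("-inf")
    null_likelihood = 0.5 ** meioses  # theta = 0.50, i.e. no linkage
    return math.log10(likelihood / null_likelihood)

def max_lod(recombinants, meioses):
    """Grid search over theta = 0.00, 0.01, ..., 0.49; returns (theta_hat, z_max)."""
    grid = [i / 100 for i in range(50)]
    return max(((t, lod_score(recombinants, meioses, t)) for t in grid),
               key=lambda pair: pair[1])

# Hypothetical family data: 2 recombinants among 10 scorable meioses.
theta_hat, z_max = max_lod(recombinants=2, meioses=10)
print(theta_hat, round(z_max, 3))
```

With these counts the maximum falls at the observed recombinant proportion R/N. Ten meioses with no recombinant give a LOD of 10·log10(2) ≈ 3.01 at θ = 0.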

  23. 1 to 10 years! • Identification of informative families → Genetic mapping → Physical mapping → Identification of coding sequences (candidate genes) → Mutation screening (normal: ...CCT GAG GAG...; mutated: ...CCT GTG GAG...) → Functional analysis (normal: ...Pro Glu Glu...; mutated: ...Pro Val Glu...) • Additional inputs shown on the slide: cytogenetic anomalies, animal model, functional candidate genes

  24. Branchio-oto-renal syndrome • Clinical features: deafness, renal anomalies, cervical cysts… • Mapped to 8q13. • PAC contig: 11083, 9480, 4405, 10910 • cDNA library screening, cDNA selection and exon trapping

  25. PAC (P1-derived) → Sonication or partial digestion → Subcloning in pBCSK+ → Selection of clones → Sequencing (T7, T3) → Sequence assembly and analysis

  26. The different steps used for sequence analysis • Quality assessment • Elimination of contaminating sequences: Blastn against vector, bacteria, yeast… databases • Assembly using Phred, Phrap, Consed • Identification of candidate genes by blastx and tblastx • Gene prediction tool: GRAIL
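The quality-assessment step relies on per-base quality values. A Phred quality Q corresponds to an estimated base-calling error probability of 10^(−Q/10). The sketch below shows that conversion together with a simple end-trimming rule. The trimming rule, the threshold of 20 and the example read are assumptions for illustration and are not the trimming logic of Phred, Phrap or Consed.

```python
def phred_to_error_probability(q):
    """Phred quality Q corresponds to an error probability of 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

def trim_low_quality_ends(bases, qualities, min_q=20):
    """Drop bases from both ends until a base with quality >= min_q is reached."""
    start, end = 0, len(bases)
    while start < end and qualities[start] < min_q:
        start += 1
    while end > start and qualities[end - 1] < min_q:
        end -= 1
    return bases[start:end], qualities[start:end]

# Hypothetical read with made-up quality values.
read = "AGCTAT"
quals = [8, 15, 30, 35, 22, 9]
print(phred_to_error_probability(20))      # 0.01
print(trim_low_quality_ends(read, quals))  # ('CTA', [30, 35, 22])
```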

  27. PAC contig: 11083 9480 4405 10910
      BLASTX 1.4.7 [19-Dec-94] [Build 07:11:56 Jun 16 1995]
      Query= w1g9t7.Seq (743 letters)
      Translating both strands of query sequence in all 6 reading frames
      Database: ../../databases/fasta/nrprot 244,544 sequences; 71,258,360 total letters.
      Searching..................................................done
      Sequences producing High-scoring Segment Pairs (columns: Reading Frame, High Score, Smallest Sum Probability P(N), N):
      pir|S|A45174 eyes absent (eya) protein (alternatively...   -2   173   5.6e-15   1
      >pir|S|A45174 eyes absent (eya) protein (alternatively spliced) - fruit fly (Drosophila melanogaster) >gp||DRONOEYE_ Length = 760
      Minus Strand HSPs:
      Score = 173 (79.6 bits), Expect = 5.6e-15, P = 5.6e-15
      Identities = 29/36 (80%), Positives = 34/36 (94%), Frame = -2
      Query: 169 LCLPXGVRGGVDWMRKLAFRYRRVKEIYNTYKNNVG 62
                 LCLP GVRGGVDWMRKLAFRYR++K+IYN+Y+ NVG
      Sbjct: 586 LCLPTGVRGGVDWMRKLAFRYRKIKDIYNSYRGNVG 621
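The identity and positive counts reported in this BLASTX hit can be recomputed from the three alignment lines. The sketch below is a small check under the assumption that the alignment is ungapped, as it is here. The function name is illustrative.

```python
query   = "LCLPXGVRGGVDWMRKLAFRYRRVKEIYNTYKNNVG"
midline = "LCLP GVRGGVDWMRKLAFRYR++K+IYN+Y+ NVG"
sbjct   = "LCLPTGVRGGVDWMRKLAFRYRKIKDIYNSYRGNVG"

def alignment_counts(query, midline, sbjct):
    """Count identities and positives in an ungapped BLAST HSP.

    Identities are columns where query and subject carry the same residue;
    positives also include the columns BLAST flagged with '+'.
    """
    assert len(query) == len(midline) == len(sbjct)
    identities = sum(q == s for q, s in zip(query, sbjct))
    positives = identities + midline.count("+")
    return identities, positives, len(query)

ident, pos, length = alignment_counts(query, midline, sbjct)
print(f"Identities = {ident}/{length}, Positives = {pos}/{length}")
# Identities = 29/36, Positives = 34/36
```

The counts agree with the report: 29/36 is 80.6% and 34/36 is 94.4%, which BLAST prints as 80% and 94%.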

  28. EYA1 gene structure • [Figure: exon-intron structure of EYA1 with numbered exons and intron labels in Roman numerals; the labels are not recoverable from the transcript] • Identification of a new gene family: EYA1, EYA2, EYA3, …

  29. COMPLEX (MULTIFACTORIAL) GENETIC DISEASE • Diseases of Complex Genetic Architecture • No clear pattern of inheritance • Moderate to strong evidence of being inherited • Common in population: cancer, heart disease, dementia etc. • Involves many genes and environment • “Susceptibility” genes

  30. Complex disease loci mapping • Linkage Analysis: Large Families, Small Families • Association Studies: Family-Based, Case-Control

  31. Study Designs • Linkage Analysis: Large Families, Small Families • Association Studies: Family-Based, Case-Control

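  32. TDT calculation • $\mathrm{TDT} = \dfrac{(B - C)^2}{B + C}$ • [Figure: table of transmitted versus non-transmitted alleles (alleles 1 and 2), with example genotypes 12, 12 and 11] • With > 5 per cell, this follows a $\chi^2$ distribution with 1 df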
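The TDT statistic is easy to compute once the two discordant cells are known. The sketch below assumes that B and C are the off-diagonal counts of the transmitted versus non-transmitted table (allele 1 transmitted with allele 2 not transmitted, and the reverse). The slide's own cell labels are not recoverable, so this reading is an assumption and the counts are hypothetical. The p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)).

```python
import math

def tdt(b, c):
    """Transmission disequilibrium test from the two discordant transmission counts."""
    if b + c == 0:
        raise ValueError("no informative transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts from heterozygous parents.
chi2, p = tdt(b=30, c=10)
print(chi2, p)
```

Only transmissions from heterozygous parents contribute to B and C, which is why the test is robust to population stratification.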

  33. Examples: Alzheimer’s • Alzheimer’s disease and ApoE: the E4 allele appears to be positively associated with Alzheimer’s disease • Odds Ratio = (58/16)/(33/55) ≈ 6
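The odds ratio on this slide can be reproduced directly from its four counts. The sketch below also adds a Woolf (log-based) 95% confidence interval, which is not on the slide and is included only as a common companion calculation. The transcript does not say which count belongs to which group, so the arguments are left as generic cells a, b, c, d.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio (a/b)/(c/d) with a Woolf (log-based) 95% confidence interval."""
    ratio = (a / b) / (c / d)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(ratio) - 1.96 * se_log)
    high = math.exp(math.log(ratio) + 1.96 * se_log)
    return ratio, low, high

# Counts taken from the slide's expression (58/16)/(33/55).
ratio, low, high = odds_ratio(58, 16, 33, 55)
print(round(ratio, 2))  # 6.04
```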

  34. February 2001 • “Finished” sequence: April 1953 - April 2003

  35. Identification of informative families → Genetic mapping → Physical mapping → Identification of coding sequences (candidate genes) → Mutation screening (normal: ...CCT GAG GAG...; mutated: ...CCT GTG GAG...) → Functional analysis (normal: ...Pro Glu Glu...; mutated: ...Pro Val Glu...)

  36. Past and present tools • Genetic mapping • Physical mapping • Cytogenetic abnormalities • Animal models • Positional and functional candidates • Genome databases and genome browsers • Comparative Genomic Hybridization • Comparative Genomics • Microarray analysis

  37. NCBI genome browser • Visualize all the genes in an interval
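Listing every gene in a mapped interval is, at its core, an interval-overlap query against an annotation table. The sketch below shows that query on a small in-memory list. The `Gene` record, the gene names and all coordinates are hypothetical and do not come from any genome browser.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int

def genes_in_interval(genes, chrom, start, end):
    """Return genes overlapping [start, end] on chrom, sorted by start position."""
    hits = [g for g in genes
            if g.chrom == chrom and g.start <= end and g.end >= start]
    return sorted(hits, key=lambda g: g.start)

# Hypothetical annotation records with made-up coordinates.
annotation = [
    Gene("GENE_C", "8", 40_000, 45_000),
    Gene("GENE_A", "8", 1_000, 5_000),
    Gene("GENE_B", "8", 12_000, 18_000),
    Gene("GENE_D", "17", 2_000, 9_000),
]
candidates = genes_in_interval(annotation, "8", 4_000, 20_000)
print([g.name for g in candidates])  # ['GENE_A', 'GENE_B']
```

In practice the annotation would be exported from one of the browsers shown on these slides, with the interval bounded by the flanking markers from the linkage analysis.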

  38. UCSC genome browser

  39. Ensembl genome browser

  40. NCBI genome browser showing candidate region for EV

  41. How to collect and interpret all the data? • How to choose the best “candidate” gene?

  42. Strategies and adapted tools for gene selection are urgently needed! • Find candidate genes for the trait (time and cost!) • WHAT genes are there? • WHAT do they do? • HOW could they play a role in the disease? • = Data mining and integration!! • Visualization of the whole picture • Global view • Option to zoom into detail
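One simple way to picture data integration for candidate selection is to rank the genes in the interval separately by each evidence source and then combine the ranks. The sketch below uses a naive average rank. It is not the method of any tool named in this presentation, and the evidence sources and gene names are hypothetical.

```python
def average_rank_prioritization(rankings):
    """Combine several per-source rankings (best first) into one ordered gene list.

    A gene missing from a source gets the worst possible rank for that source.
    """
    genes = sorted({g for ranking in rankings.values() for g in ranking})
    scores = {}
    for gene in genes:
        ranks = []
        for ranking in rankings.values():
            ranks.append(ranking.index(gene) + 1 if gene in ranking else len(genes))
        scores[gene] = sum(ranks) / len(ranks)
    # Lower average rank is better; ties are broken alphabetically.
    return sorted(genes, key=lambda g: (scores[g], g))

# Hypothetical evidence sources and gene names.
rankings = {
    "expression": ["G2", "G1", "G3"],
    "literature": ["G2", "G3", "G1"],
    "pathway": ["G1", "G2"],
}
print(average_rank_prioritization(rankings))  # ['G2', 'G1', 'G3']
```

Averaging treats every source as equally reliable, which is a strong simplification. Dedicated prioritization tools use more careful statistics to merge rankings.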

  43. http://www.esat.kuleuven.be/endeavour

  44. Disease Gene Finding (Center for Biological Sequence Analysis) • Combining network theory and phenotype associations in an automated large-scale disease gene finding platform • Networks: deducing functional relationships from network theory • Phenotype association: grouping disorders based on their phenotype