
Comparative Genome Maps

Comparative Genome Maps. CSCI 7000-005: Computational Genomics Debra Goldberg debg@hms.harvard.edu. What is a comparative map?. Why construct comparative maps?. Identify & isolate genes Crops: drought resistance, yield, nutrition... Human: disease genes, drug response,…


Presentation Transcript


  1. Comparative Genome Maps CSCI 7000-005: Computational Genomics Debra Goldberg debg@hms.harvard.edu

  2. What is a comparative map?

  3. Why construct comparative maps? • Identify & isolate genes • Crops: drought resistance, yield, nutrition... • Human: disease genes, drug response,… • Infer ancestral relationships • Discover principles of evolution • Chromosome • Gene family • “key to understanding the human genome”

  4. Why automate? • Time consuming, laborious • Needs to be redone frequently • Codify a common set of principles • Nadeau and Sankoff: warn of “arbitrary nature of comparative map construction”

  5. Definitions • Marker: identifiable chromosomal locus • Homology: genes with a common ancestor • Homeology: chromosomal regions derived from a common ancestral linkage group • Synteny: loci on the same chromosome • Colinearity: syntenic regions with conserved gene order

  6. Input/Output • Input: • genetic maps of 2 species • marker/gene correspondences (homologs) • Output: • a comparative map • homeologies identified

  7. Map construction • Go from this to this (figure: Maize 1 (target) labeled with Rice (base) segments 3S, 8L, 10L, 3L; Wilson et al. Genetics 1999)

  8. Chromosome labeling (figure: Maize 1 labeled with Rice segments 3S, 8L, 10L, 3L; Maize 1 (target), Rice (base); Wilson et al. Genetics 1999)

  9. A natural model? (figure: Maize 1 labeled with Rice segments 3S, 8L, 10L, 3L; Maize 1 (target), Rice (base); Wilson et al. Genetics 1999)

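  10. Scoring • m: penalty for a mismatched marker • s: penalty for a segment boundary (figure: penalties m and s marked on a labeling with segments 10L and 3L)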

  11. Assumptions • Accept published marker order • All linkage groups of base are unique • Simplistic homeology criteria • At least one homeologous region

  12. A natural model?

  13. A natural model?

  14. A natural model?

  15. A natural model?

  16. Dynamic programming • $l_i$ = location of homolog to marker i • $S[i,a]$ = penalty (score) for an optimal labeling of the submap from marker i to the end (marker n), when labeling begins with label a

  17. Recurrence relation • $S[n,a] = m\,\delta(a, l_n)$ • $S[i,a] = m\,\delta(a, l_i) + \min_{b \in L}\bigl(S[i+1,b] + s\,\delta(a,b)\bigr)$ for $i < n$ • Here $\delta(x,y) = 0$ if $x = y$ and 1 otherwise, and L is the set of labels
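The linear-model recurrence on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not the presenter's implementation: the function name `linear_model_score`, the 0-based indexing, and the default penalties m = 1 and s = 2 are assumptions, and the indicator `delta` is taken to be 0 for equal labels and 1 otherwise.

```python
from typing import Hashable, Iterable, Sequence


def linear_model_score(homolog_locs: Sequence[Hashable],
                       labels: Iterable[Hashable],
                       m: int = 1, s: int = 2) -> int:
    """Minimum total penalty of a labeling under the linear model.

    homolog_locs[i] plays the role of l_i (location of the homolog of
    marker i); m is the mismatch penalty, s the segment-boundary penalty.
    """
    labels = list(labels)
    if not homolog_locs or not labels:
        raise ValueError("need at least one marker and one label")

    def delta(x, y):
        # 0 if the two labels agree, 1 otherwise (assumed meaning of delta)
        return 0 if x == y else 1

    # Base case: S[n, a] = m * delta(a, l_n)
    S = {a: m * delta(a, homolog_locs[-1]) for a in labels}

    # Fill S[i, a] from the second-to-last marker back to the first.
    for loc in reversed(homolog_locs[:-1]):
        S = {a: m * delta(a, loc)
                + min(S[b] + s * delta(a, b) for b in labels)
             for a in labels}

    # The best labeling may begin with any label.
    return min(S.values())
```

Only the previous column of the table is kept, so memory is proportional to the number of labels; recovering the labeling itself would need back-pointers, which are omitted here.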

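  18. Problem with linear model (s = 2; the score 3m = 3 implies m = 1) • a-b-c motif: markers a a a b b b c c c, labeled as segments a, b, c; score: 2s = 4 • a-b-a motif: markers a a a b b b a a a, labeled as a single segment a; score: 3m = 3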
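The two scores on this slide can be checked by counting penalties directly. The helper below is a hypothetical sketch (the name `score_linear_labeling` and the string encoding of labels are assumptions): it charges m for every marker whose label differs from its homolog location and s for every change of label between adjacent markers.

```python
def score_linear_labeling(labeling, homolog_locs, m=1, s=2):
    """Score a fixed labeling under the linear model.

    labeling[i] is the label given to marker i; homolog_locs[i] is the
    location of that marker's homolog.
    """
    if len(labeling) != len(homolog_locs):
        raise ValueError("labeling and homolog_locs must have equal length")
    # One mismatch penalty per marker labeled differently from its homolog.
    mismatches = sum(1 for lab, loc in zip(labeling, homolog_locs)
                     if lab != loc)
    # One boundary penalty per change of label between neighbours.
    boundaries = sum(1 for prev, cur in zip(labeling, labeling[1:])
                     if prev != cur)
    return m * mismatches + s * boundaries
```

With m = 1 and s = 2, labeling the a-b-a motif as one segment a costs 3, while labeling it as three segments a, b, a costs 4, so the linear model prefers to treat the b markers as mismatches.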

  19. The stack model • Segment at top of the stack can be: • pushed (remembered), later popped • replaced • Push and replace cost s; pop is free. (figure: stack of segment labels)

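  20. Scoring • Markers, with the location of each homolog: uaz265a (7L), isu136 (2L), isu151 (7L), rz509b (7L), cdo59c (7L), rz698c (9L), bcd1087a (9L), rz206b (9L), bcd1088c (9L), csu40 (3S), cdo786a (9L), csu154 (7L), isu113a (7L), csu17 (7L), cdo337 (3L), rz530a (7L) • Labeling shown: segment 7L, then 9L at cost s, then a “free” pop back to 7L, with three mismatched markers at cost m each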

  21. Dynamic programming • $S[i,j,a]$ = score for an optimal labeling of: • submap from marker i to marker j • when labeling begins with label a, i.e., marker i is labeled a

  22. Recurrence relation • $S[i,i,a] = m\,\delta(a, l_i)$ • $S[i,j,a] = \min$ of: • $m\,\delta(a, l_i) + \min_{b \in L}\bigl(S[i+1,j,b] + s\,\delta(a,b)\bigr)$ • $\min_{i<k<j}\bigl(S[i,k,a] + S[k+1,j,a]\bigr)$
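The stack-model recurrence can be sketched as a memoized interval dynamic program. As before, this is an assumption-laden illustration rather than the presenter's code: the name `stack_model_score`, 0-based indices, and the defaults m = 1, s = 2 are hypothetical, and `delta` is taken to be the 0/1 indicator of label disagreement.

```python
from functools import lru_cache


def stack_model_score(homolog_locs, labels, m=1, s=2):
    """Minimum total penalty of a labeling under the stack model."""
    locs = tuple(homolog_locs)
    labs = tuple(labels)
    if not locs or not labs:
        raise ValueError("need at least one marker and one label")

    def delta(x, y):
        return 0 if x == y else 1

    @lru_cache(maxsize=None)
    def S(i, j, a):
        # Marker i is labeled a; pay m if its homolog lies elsewhere.
        here = m * delta(a, locs[i])
        if i == j:
            return here
        # Case 1: continue with label b at marker i+1 (boundary cost s if b != a).
        best = here + min(S(i + 1, j, b) + s * delta(a, b) for b in labs)
        # Case 2: split at k; marker k+1 returns to label a at no cost (the pop).
        for k in range(i + 1, j):
            best = min(best, S(i, k, a) + S(k + 1, j, a))
        return best

    return min(S(0, len(locs) - 1, a) for a in labs)
```

On the a-b-a motif (markers a a a b b b a a a, m = 1, s = 2) this sketch returns 2: one push of b at cost s and a free pop back to a, compared with 3 under the linear model. The recursion depth grows with the number of markers, so very long maps would call for an iterative table instead.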

  23. Results: infers evolutionary events (figure panels: Stack, Wilson et al.; Maize 1 (target), Rice (base))

  24. Problem: Incomplete input • Gene order not always fully resolved. • Co-located genes can be ordered to give most parsimonious labeling. (figure: co-located genes with homologs on 8p and 19p)

  25. The reordering algorithm • Uses a compression scheme • Within a megalocus, group genes by location of related gene. • Order these groups • First, last groups interact with nearby genes • Any ordering of internal groups is equally parsimonious
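The grouping step ("within a megalocus, group genes by location of related gene") might look like the sketch below. The function name `group_by_homolog_location` and the (marker, location) pair encoding are assumptions made for illustration; ordering the groups is a separate step and is not attempted here.

```python
from collections import defaultdict


def group_by_homolog_location(megalocus):
    """Group co-located markers by the location of their homolog.

    megalocus is a list of (marker_name, homolog_location) pairs whose
    relative order is unresolved.
    """
    groups = defaultdict(list)
    for marker, loc in megalocus:
        groups[loc].append(marker)
    # Plain dict, keyed by homolog location, in order of first appearance.
    return dict(groups)
```

For example, three hypothetical co-located genes g1 (8p), g2 (19p) and g3 (8p) collapse into two groups, one per homolog location.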

  26. The reordering algorithm

  27. The reordering algorithm

  28. Definitions • $\delta$ extended to distance to a set A of labels: $\delta(a, A) = 0$ if $a \in A$, 1 otherwise • S = the set of indices of supernode start elements • For simplicity, refer to a supernode by its start index $i \in S$

  29. Definitions For $i \in S$: • $n_i$ = # markers in i • $n_i(a)$ = # markers in i with a homolog on a • $l_i$ = set of labels matching markers in i • $l_i = \{a \in L \mid n_i(a) \ge 1\}$
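These supernode quantities are simple counts. A minimal sketch, assuming a supernode is given as the list of homolog locations of its markers (the names `supernode_summary` and `delta_set` are hypothetical):

```python
from collections import Counter


def supernode_summary(homolog_locs):
    """Return (n_i, n_i(.), l_i) for one supernode.

    homolog_locs lists the homolog location of every marker in the supernode.
    """
    counts = Counter(homolog_locs)           # counts[a] plays the role of n_i(a)
    n_i = sum(counts.values())               # total number of markers
    l_i = {a for a, k in counts.items() if k >= 1}  # labels with a matching marker
    return n_i, counts, l_i


def delta_set(a, A):
    """delta extended to a set of labels: 0 if a is in A, 1 otherwise."""
    return 0 if a in A else 1
```

A `Counter` returns 0 for labels that never occur, which matches n_i(a) = 0 for labels with no marker in the supernode.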

  30. Definitions • $p_i(c)$ gives mismatched marker and segment boundary penalties for label c • $p_i(c) = s$ if $m\,n_i(c) \ge s$, and $p_i(c) = m\,n_i(c)$ if $m\,n_i(c) < s$; equivalently $p_i(c) = \min\bigl(s,\ m\,n_i(c)\bigr)$
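Read as "take the cheaper of one segment boundary or one mismatch per marker", the penalty for label c reduces to a minimum. The sketch below assumes that reading; the function name `hidden_label_penalty` and the defaults m = 1, s = 2 are illustrative only.

```python
def hidden_label_penalty(n_i_c, m=1, s=2):
    """p_i(c): penalty charged for the n_i(c) hidden markers matching label c.

    Either every such marker is counted as a mismatch (m each), or the group
    gets its own segment at boundary cost s, whichever is cheaper.
    """
    return min(s, m * n_i_c)
```

With m = 1 and s = 2, a single marker matching c is cheapest as a mismatch (penalty 1), while three such markers are cheapest as their own segment (penalty 2).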

  31. Definitions • $p(i,a,b)$ gives the total mismatched marker and segment boundary penalties attributed to “hidden markers” • $p(i,a,b) = \sum_{c \ne a,b} p_i(c) + m\,\varepsilon_i(a,b)$ for $i \in S$, $a \ne b$ • $p(i,a,b) = \sum_{c \ne a} m\,n_i(c) + m\,\varepsilon_i(a,b)$ for $i \in S$, $a = b$ • $p(i,a,b) = 0$ otherwise

  32. Definitions For $i \in S$: • $\eta_i(a,b)$ = # labels in {a,b} without matching marker in i • $\eta_i(a,b) = \delta(a, l_i) + \delta(b, l_i)$ • $\eta_i(a,b) \in \{0,1,2\}$

  33. Definitions • $\varepsilon_i(a,b)$ corrects for mismatched marker penalties assigned twice for the same marker: once in the recurrence and once in $p(i,a,b)$ • For example: • $\varepsilon_i(a,b) = 0$ if $\eta_i(a,b) = 0$ (if a, b are both represented in supernode) • $\varepsilon_i(a,a) = -2$ if $\eta_i(a,a) > 0$ (if a is not represented in supernode)

  34. Recurrence relation • $S[i,i,a] = m\,\delta(a, l_i)$ • $S[i,j,a] = \min$ of: • $m\,\delta(a, l_i) + \min_{b \in L}\bigl(S[i+1,j,b] + s\,\delta(a,b) + p(i,a,b)\bigr)$ • $\min_{i<k<j,\ k \notin S}\bigl(S[i,k,a] + S[k+1,j,a]\bigr)$

  35. Results: Fewer mismatches (figure panels: stack, reordering; Mouse 5 (target), Human (base))

  36. Results: Mismatches placed between segments (figure panels: stack, reordering; Mouse 8 (target), Human (base))

  37. Results: Detects new segments (figure panels: stack, reordering; Mouse 13 (target), Human (base))

  38. Summary • Finds optimal comparative map • Arranges markers in most parsimonious way • First algorithm to use megalocus data • Fast, objective, simple to use • Biologically meaningful results

  39. Summary • Global view • Biologically meaningful results • Provides testable hypotheses • Robust • not species-specific • high/low resolution, genetic/physical maps • stable to errors in marker order

  40. Future Directions • Algorithmic extensions • 3rd species • polyploidy • search for ancient duplications • Deduce history of evolutionary events • makes genome rearrangement measures tractable and robust • infer common ancestor

  41. Future Directions • Block-segmental sequence comparisons • non-local sequence alignment • protein domains • 2D block-segmental comparisons • comparison of regulatory networks • image processing

  42. Acknowledgments • NSF • AAUW • David and Lucile Packard Foundation • USDA • Cooperative State Research Education and Extension Service • ONR • Jon Kleinberg • Susan McCouch • Chris Pelkie • Sandra Harrington • Sam Cartinhour • Dave Schneider
