Rearrangements and Duplications in Tumor Genomes
Chromosomal aberrations • Structural: translocations, inversions, fissions, fusions. • Copy number changes: gain and loss of chromosome arms, segmental duplications/deletions. Tumor Genomes • Mutation and selection • Compromised genome stability
Rearrangements in Tumors • Change gene structure, create novel fusion genes • Gleevec (Novartis 2001) targets BCR-ABL fusion
Rearrangements in Tumors • Alter gene regulation • Burkitt lymphoma translocation (image credit: Gregory Schuler, NCBI, NIH, Bethesda, MD, USA) • Regulatory fusion in prostate cancer (Tomlins et al., Science, Oct. 2005)
Complex Tumor Genomes • What are the detailed architectures of tumor genomes? • Which genes are affected? • What processes produce these architectures? • Can we create custom treatments for tumors based on their mutational spectrum? (e.g. Gleevec)
Common Alterations across Tumors • Mutations activate/repress circuits. • Multiple points of attack. • “Master genes”: e.g. p53, Myc. • Others probably tissue/tumor specific. [Figure: gene circuit with activation and repression links; duplicated and deleted genes marked]
Human Cancer Genome Project • What tumors to sequence? • What to sequence from each tumor? • Whole genome: all alterations • Specific genes: point mutations • Hybrid approach: structural rearrangements etc.
End Sequence Profiling (ESP), C. Collins and S. Volik (UCSF Cancer Center) • 1) Pieces of tumor genome: clones (100-250kb). • 2) Sequence ends of clones (500bp). • 3) Map end sequences to human genome. • Each clone corresponds to a pair of end sequences (ES pair) (x,y). • Retain clones that correspond to a unique ES pair.
Valid ES pairs • l ≤ y – x ≤ L, where l and L are the min and max size of a clone. • Convergent orientation.
Invalid ES pairs • Putative rearrangement in tumor • ES directions point toward breakpoints
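The valid/invalid test above can be sketched in a few lines. This is only an illustration under stated assumptions: the `EndSeq` record and `is_valid` function are hypothetical names, the default bounds l = 100 kb and L = 250 kb are taken from the clone size range on the slide, and "convergent orientation" is read as the left end mapping to the + strand and the right end to the − strand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndSeq:
    """One mapped end sequence (hypothetical record, not from the slides)."""
    chrom: str
    pos: int
    strand: str  # "+" or "-"


def is_valid(a: EndSeq, b: EndSeq, l: int = 100_000, L: int = 250_000) -> bool:
    """Return True if the ES pair satisfies l <= y - x <= L and is convergent."""
    if a.chrom != b.chrom:
        return False  # ends on different chromosomes can never be valid
    left, right = (a, b) if a.pos <= b.pos else (b, a)
    # Assumed reading of "convergent": the two ends point toward each other.
    convergent = left.strand == "+" and right.strand == "-"
    return convergent and l <= right.pos - left.pos <= L
```

Pairs that fail this test are the invalid ES pairs that the later slides cluster and interpret as candidate rearrangements.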
Outline • What does ESP reveal about tumor genomes? • 1. Identify locations of rearrangements. • 2. Reconstruct genome architecture, sequence of rearrangements. • 3. In combination with other genome data (CGH).
ESP Data (Jan. 2006) • Coverage of human genome: ≈ 0.34 for MCF7, BT474 • Table columns: Clones, ES pairs • Samples: breast cancer cell lines (BT474, MCF7, SKBR3); tumors (Brain, Breast1, Breast2, Ovary, Prostate); Normal • Values as extracted (row alignment not recoverable): 5031 9580 5267 7623 19831 9612 4246 1756 9267 3448 7994 3923 5588 12073 6785 3222 7300 1300
1. Rearrangement breakpoints MCF7 breast cancer • Known cancer genes (e.g. ZNF217, BCAS3/4, STAT3) • Novel candidates near breakpoints. • Small-scale scrambling of genome more extensive than expected.
Structural Polymorphisms • Human genetic variation is more than nucleotide substitutions • Short indels/inversions present (Iafrate et al. 2004, Sebat et al. 2004, Tuzun et al. 2005, McCarroll et al. 2006, Conrad et al. 2006 etc.) • ≈ 3% (53/1570) invalid ES pairs explained by known structural variants. [Figure: reference human A B C vs. human variant A -B C, an inversion between s and t; example: 1.6 Mb inversion]
2. Tumor Genome Architecture • What are the detailed architectures of tumor genomes? • What sequence of rearrangements produces these architectures?
ESP Genome Reconstruction Problem • Human genome (known): A B C D E • Unknown sequence of rearrangements • Tumor genome (unknown): A -C E -D B • Map ES pairs to human genome: locations (x1,y1), …, (x5,y5) of ES pairs in human genome are known. • Reconstruct tumor genome.
ESP Genome Reconstruction: Comparative Genomics [Figure: dot plot of the tumor genome (A -C E -D B) against the human genome (A B C D E); ES pairs (x1,y1), (x2,y2), (x3,y3), (x4,y4) are marked on the plot and on the human axis]
ESP Plot • 2D representation of ESP data; both axes are the human genome (segments A B C D E). • Each point is an ES pair, e.g. (x1,y1), (x2,y2), (x3,y3), (x4,y4). • Can we reconstruct the tumor genome from the positions of the ES pairs?
ESP Plot → Tumor Genome • Reconstructed tumor genome: A -C E -D B [Figure: ESP plot with human segments A B C D E on both axes]
Real data noisy and incomplete! • Valid ES pairs satisfy length/direction constraints: l ≤ y – x ≤ L • Invalid ES pairs indicate rearrangements or experimental errors
Computational Approach • Use known genome rearrangement mechanisms • Find simplest explanation for ESP data, given these mechanisms. • Motivation: genome rearrangement studies in phylogeny. [Figure: human vs. tumor genomes for an inversion (A B C → A -B C) and a translocation, with breakpoints s and t]
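ESP Sorting Problem • $G = [0, M]$, unichromosomal genome. • Reversal: $\rho_{s,t}(x) = x$ if $x < s$ or $x > t$; $\rho_{s,t}(x) = t - (x - s)$ otherwise. • Given: ES pairs $(x_1, y_1), \ldots, (x_n, y_n)$ • Find: minimum number of reversals $\rho_{s_1,t_1}, \ldots, \rho_{s_n,t_n}$ such that if $\rho = \rho_{s_1,t_1} \cdots \rho_{s_n,t_n}$ then $(\rho x_1, \rho y_1), \ldots, (\rho x_n, \rho y_n)$ are valid ES pairs. [Figure: genome G = A B C with ES pairs (x1,y1), (x2,y2) and breakpoints s, t; after the reversal, G' = ρG = A -B C]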
[Figure: genome A B C with ES pairs (x1,y1), (x2,y2), (x3,y3); a sequence of reversals with breakpoints s, t is applied, after which all ES pairs are valid.]
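The coordinate map used in the ESP Sorting Problem can be written out directly. The sketch below is a toy illustration, not the authors' code: `reversal` and `apply_to_pairs` are hypothetical names, only coordinates are tracked (the orientation flip of ends inside the reversed interval is ignored), and the numbers in the usage note are made-up toy units.

```python
def reversal(s, t):
    """Return the map rho_{s,t}: fixed outside [s, t], mirrored inside it."""
    def rho(x):
        return x if x < s or x > t else t - (x - s)
    return rho


def apply_to_pairs(rho, pairs):
    """Apply a coordinate map to both ends of every ES pair (x, y)."""
    return [(rho(x), rho(y)) for x, y in pairs]


# Toy example: with l = 100 and L = 250, the pair (1000, 2000) is too long
# to be valid. After the reversal of [1100, 2100] its ends are 200 apart.
rho = reversal(1100, 2100)
moved = apply_to_pairs(rho, [(1000, 2000)])
```

Here `moved` is `[(1000, 1200)]`: the left end lies outside the reversed interval and stays put, while the right end is mirrored to 2100 − (2000 − 1100) = 1200. Applying the same reversal twice returns every coordinate to where it started.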
Filtering Experimental Noise • Rearrangement → cluster of invalid pairs • Chimeric clone → isolated invalid pair [Figure: ESP pipeline (clones from tumor DNA, sequenced ends, end sequences mapped to human DNA) next to an (x, y) plot showing a cluster of invalid pairs and an isolated invalid pair]
Sparse Data Assumptions • 1. Each cluster results from a single inversion. • 2. Each clone contains at most one breakpoint. [Figure: ES pairs (x1,y1), (x2,y2), (x3,y3) shown in the tumor genome and mapped to the human genome]
ESP Genome Reconstruction: Discrete Approximation • Remove isolated invalid pairs (x,y) • Define segments from clusters • ES orientations define links between segment ends [Figure: ESP plot, human genome on both axes, with clusters (x1,y1), (x2,y2), (x3,y3) and breakpoints s, t]
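The first step, separating clustered invalid pairs from isolated ones, might look like the sketch below. The slides do not state the clustering rule, so the criterion here is an assumption: two invalid pairs are linked when both their x and their y coordinates differ by at most `max_gap`, and clusters are the connected groups. The function name and parameter are hypothetical.

```python
def cluster_invalid_pairs(pairs, max_gap):
    """Split invalid ES pairs into clusters (size >= 2) and isolated pairs.

    Assumed rule: pairs i and j are linked if |xi - xj| <= max_gap and
    |yi - yj| <= max_gap; clusters are connected components of these links.
    """
    n = len(pairs)
    parent = list(range(n))  # union-find forest over pair indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            (xi, yi), (xj, yj) = pairs[i], pairs[j]
            if abs(xi - xj) <= max_gap and abs(yi - yj) <= max_gap:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(pairs[i])
    clusters = [g for g in groups.values() if len(g) >= 2]
    isolated = [g[0] for g in groups.values() if len(g) == 1]
    return clusters, isolated
```

The quadratic pairwise loop is fine for a few hundred invalid pairs; the isolated pairs returned second are the ones the slide proposes to remove as likely chimeric clones.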
ESP Graph • Edges: human genome segments; ES pairs • Paths in graph are tumor genome architectures. • Tumor genome (1 -3 -4 2 5) = signed permutation of (1 2 3 4 5) [Figure: ESP plot with segments 1 to 5 on both axes]
Sorting permutations by reversals • $\pi = \pi_1 \pi_2 \ldots \pi_n$ signed permutation (Sankoff et al. 1990) • Reversal $\rho(i,j)$ [inversion]: $\pi_1 \ldots \pi_{i-1}\ {-\pi_j} \ldots {-\pi_i}\ \pi_{j+1} \ldots \pi_n$ • Problem: Given $\pi$, find a sequence of reversals $\rho_1, \ldots, \rho_t$ such that $\pi \cdot \rho_1 \cdot \rho_2 \cdots \rho_t = (1, 2, \ldots, n)$ and $t$ is minimal. • Solution: Analysis of breakpoint graph ← ESP graph • Polynomial time algorithms: $O(n^4)$: Hannenhalli and Pevzner, 1995. $O(n^2)$: Kaplan, Shamir, Tarjan, 1997. $O(n)$ [distance $t$]: Bader, Moret, and Yan, 2001. $O(n^3)$: Bergeron, 2001.
Sorting Permutations • 1 -3 -4 2 5 → 1 -3 -2 4 5 → 1 2 3 4 5
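The two-step sorting shown on this slide can be replayed with a small helper. This is a sketch of the signed reversal operation only, not of any of the cited sorting algorithms; `apply_reversal` is a hypothetical name, and positions are 1-based and inclusive to match the slide notation.

```python
def apply_reversal(perm, i, j):
    """Reverse the order and flip the signs of perm[i..j] (1-based, inclusive)."""
    segment = [-v for v in reversed(perm[i - 1:j])]
    return perm[:i - 1] + segment + perm[j:]


tumor = [1, -3, -4, 2, 5]
step1 = apply_reversal(tumor, 3, 4)   # reverses the block (-4 2)
step2 = apply_reversal(step1, 2, 3)   # reverses the block (-3 -2)
```

`step1` is `[1, -3, -2, 4, 5]` and `step2` is the identity `[1, 2, 3, 4, 5]`, so this tumor permutation can be sorted with two reversals.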
Breakpoint Graph • Black edges: adjacent elements of $\pi$ = 1 -3 -4 2 5 (framed by start and end) • Gray edges: adjacent elements of $i$ = 1 2 3 4 5 (framed by start and end) • Key parameter: black-gray cycles
ESP Graph → Tumor Permutation and Breakpoint Graph • Theorem: The minimum number of reversals $d(\pi)$ needed to transform $\pi$ to the identity permutation $i$ satisfies $d(\pi) \ge n + 1 - c(\pi)$, where $c(\pi)$ = number of gray-black cycles.
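To make the bound concrete, the cycle count can be computed with the standard doubling construction, which the slides do not spell out and which is therefore an assumption here: element +x becomes the vertex pair (2x−1, 2x), element −x becomes (2x, 2x−1), and the sequence is framed by 0 (start) and 2n+1 (end). The function names are hypothetical.

```python
def breakpoint_graph_cycles(perm):
    """Count alternating black-gray cycles for a signed permutation."""
    n = len(perm)
    seq = [0]  # "start" vertex
    for v in perm:
        seq += [2 * v - 1, 2 * v] if v > 0 else [-2 * v, -2 * v - 1]
    seq.append(2 * n + 1)  # "end" vertex

    black, gray = {}, {}
    for k in range(0, 2 * n + 2, 2):
        a, b = seq[k], seq[k + 1]          # neighbours in the permutation
        black[a], black[b] = b, a
        gray[k], gray[k + 1] = k + 1, k    # neighbours in the identity

    seen, cycles = set(), 0
    for start in seq:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:               # follow black, then gray
            seen.add(v)
            w = black[v]
            seen.add(w)
            v = gray[w]
    return cycles


def reversal_lower_bound(perm):
    """Lower bound n + 1 - c(pi) on the reversal distance."""
    return len(perm) + 1 - breakpoint_graph_cycles(perm)
```

For π = 1 -3 -4 2 5 this construction gives c(π) = 4, so the bound is 5 + 1 − 4 = 2, which matches the two reversals used to sort this permutation on the previous slide.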
MCF7 Breast Cancer Cell Line • Low-resolution chromosome painting suggests complex architecture. • Many translocations, inversions.
ESP Data from MCF7 tumor genome • Each point (x,y) is an ES pair; axes are coordinates in the human genome. • 6239 ES pairs (June 2003) • 5856 valid (black) • 383 invalid: 256 isolated (red), 127 form 30 clusters (blue)
MCF7 Genome • Human chromosomes → MCF7 chromosomes: sequence of 5 inversions and 15 translocations • Raphael, Volik, Collins, Pevzner. Bioinformatics 2003.
3. Combining ESP with other genome data Array Comparative Genomic Hybridization (aCGH)
CGH Analysis • Divide genome into segments of equal copy number (segmentation). • Numerous methods (e.g. clustering, Hidden Markov Model, Bayesian, etc.) • No information about: structural rearrangements (inversions, translocations); locations of duplicated material in tumor genome. [Figure: copy number profile, copy number vs. genome coordinate]
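As a deliberately simplified picture of what segmentation produces, the sketch below merges consecutive probes that already carry the same integer copy number. It is not one of the methods named on the slide (clustering, HMM, Bayesian), which work on noisy ratios; the function name and the `(coordinate, copy_number)` input format are assumptions for illustration.

```python
def segment_copy_number(probes):
    """Merge consecutive probes with equal copy number into segments.

    probes: iterable of (coordinate, copy_number) with integer copy numbers.
    Returns a list of (start, end, copy_number) tuples in genome order.
    """
    segments = []
    for pos, cn in sorted(probes):
        if segments and segments[-1][2] == cn:
            start = segments[-1][0]
            segments[-1] = (start, pos, cn)   # extend the current segment
        else:
            segments.append((pos, pos, cn))   # copy number changed: new segment
    return segments


profile = [(0, 2), (10, 2), (20, 5), (30, 5), (40, 3)]
segments = segment_copy_number(profile)
```

For this toy profile the result is `[(0, 10, 2), (20, 30, 5), (40, 40, 3)]`. The boundaries between segments are the CGH breakpoints; as the slide notes, they say nothing about how the segments are arranged in the tumor genome.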
CGH Segmentation • How are the copies of segments linked? • ES pairs link segments. [Figure: copy number (levels 2, 3, 5) vs. genome coordinate, with ES pairs linking segments in the tumor genome]
ESP + CGH • ES near segment boundaries [Figure: copy number (levels 2, 3, 5) vs. genome coordinate, with an ESP breakpoint next to a CGH breakpoint]
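One simple way to look for end sequences near segment boundaries is to pair each ESP breakpoint with the nearest CGH breakpoint inside a tolerance window. This is a sketch under assumptions: the slides give no matching rule, so the `tol` parameter, the function name, and the single-chromosome integer coordinates are all hypothetical.

```python
from bisect import bisect_left


def match_breakpoints(esp_bps, cgh_bps, tol):
    """Map each ESP breakpoint to the nearest CGH breakpoint within tol, else None."""
    cgh = sorted(cgh_bps)
    matches = {}
    for b in esp_bps:
        k = bisect_left(cgh, b)
        # Only the neighbours on either side of b can be the nearest one.
        candidates = cgh[max(k - 1, 0):k + 1]
        nearest = min(candidates, key=lambda c: abs(c - b), default=None)
        if nearest is not None and abs(nearest - b) <= tol:
            matches[b] = nearest
        else:
            matches[b] = None
    return matches


matches = match_breakpoints([1000, 5000], [1100, 9000], tol=200)
```

In this toy call the ESP breakpoint at 1000 is matched to the CGH breakpoint at 1100, while the one at 5000 has no CGH breakpoint within 200 and maps to `None`. A real analysis would choose the tolerance from the clone size and the array resolution.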