
Engineering a Scalable Placement Heuristic for DNA Probe Arrays

Engineering a Scalable Placement Heuristic for DNA Probe Arrays. A.B. Kahng, I.I. Mandoiu, P. Pevzner, S. Reda (all UCSD), A. Zelikovsky (GSU). Outline. DNA probe arrays and unwanted illumination Synchronous array design (2-D placement) Asynchronous array design (3-D placement)


Presentation Transcript


  1. Engineering a Scalable Placement Heuristic for DNA Probe Arrays A.B. Kahng, I.I. Mandoiu, P. Pevzner, S. Reda (all UCSD), A. Zelikovsky (GSU)

  2. Outline • DNA probe arrays and unwanted illumination • Synchronous array design (2-D placement) • Asynchronous array design (3-D placement) • Experimental results • Extensions • Conclusions

  3. Outline • DNA probe arrays and unwanted illumination • Synchronous array design (2-D placement) • Asynchronous array design (3-D placement) • Experimental results • Extensions • Conclusions

  4. DNA Probe Arrays • Used in a wide range of genomic analyses • Gene expression monitoring, SNP mapping, sequencing by hybridization, … • Arrays with up to 1000x1000 probes in commercial use; 10^8 probes envisioned for next-generation arrays • Highly scalable algorithms required for array design

  5. Simplified DNA Array Flow • Input: gene sequences, position of SNPs, etc. • Flow: Probe Selection → Mask Design (Placement & Embedding) → Mask Manufacturing → Array Manufacturing → Hybridization Experiment → Analysis of Hybridization Intensities • Stages are labeled as belonging to the Soft/Computational Domain or the Hard/Biochemistry Domain

  6. Array Manufacturing Process • Very Large-Scale Immobilized Polymer Synthesis: • Step 1: Treat substrate with chemically protected “linker” molecules, creating a rectangular array (site size = approx. 10x10 microns) • Step 2: Selectively expose array sites to light; light deprotects exposed molecules, activating further synthesis • Step 3: Flush chip surface with solution of protected A, C, G, T; binding occurs at previously deprotected sites • Repeat steps 2 and 3 until desired probes are synthesized

  7. Photo-Deprotection Step • Our concern: diffraction → unwanted illumination → yield decrease

  8. Probe Synthesis [Figure: placed probes, the nucleotide deposition sequence ACG, and the corresponding masks M1, M2, M3]

  9. Measuring Unwanted Illumination • Unwanted illumination is measured by border length [Figure: placed probes, the nucleotide deposition sequence ACG, and masks M1, M2, M3, with a mask border marked]

  10. Synchronous vs. Asynchronous Synthesis [Figure, four panels, with a "4-group" label:] (a) Periodic deposition sequence (b) Synchronous embedding of CTG (c) Asynchronous leftmost embedding of CTG (d) Another asynchronous embedding

  11. Outline • DNA probe arrays and unwanted illumination • Synchronous array design (2-D placement) • Asynchronous array design (3-D placement) • Experimental results • Extensions • Conclusions

  12. Problem Formulation (Synchronous Case) • Synchronous Array Design (2-D Placement) Problem: • Minimize placement cost of Hamming graph H (vertices = probes, distance = Hamming) • On 2-dimensional grid graph G2 (N x N array, edges between distance-1 neighbors) [Figure: probes of H mapped to sites of G2]
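The placement cost in this formulation can be illustrated with a minimal sketch. It assumes equal-length probes stored in a list-of-lists grid and counts each pair of adjacent sites once; the names `hamming` and `placement_cost` are illustrative and do not come from the slides.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length probes differ."""
    if len(a) != len(b):
        raise ValueError("probes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def placement_cost(grid: List[List[str]]) -> int:
    """Sum of Hamming distances over all horizontally and vertically
    adjacent sites, each adjacent pair counted once (an assumption)."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # right neighbor
                total += hamming(grid[r][c], grid[r][c + 1])
            if r + 1 < rows:  # neighbor below
                total += hamming(grid[r][c], grid[r + 1][c])
    return total


if __name__ == "__main__":
    print(placement_cost([["AC", "AG"], ["CC", "CG"]]))
```

Any 2-D placement heuristic in the deck can be scored with a function of this kind, which makes it easy to compare placements of the same probe set.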

  13. 2-D Placement Lower Bound • Sum of Hamming distances to 4 closest neighbors minus weight of 4N heaviest arcs [Figure: Hamming graph H and grid graph G2]

  14. TSP + 1-Threading Placement • Hubbell, 1990s: • Find TSP tour/path over given probes w.r.t. Hamming distance • Thread TSP path in the grid row by row • Hannenhalli, Hubbell, Lipshutz, Pevzner ’02: • Place the probes according to 1-Threading • Further decreases total border by 20%

  15. Lexicographical Sorting + 1-Threading • Step 1: Radix-sort the probes in lexicographical order • Step 2: Thread on the chip [Figure: sorted probes threaded onto the chip]
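A rough sketch of the sort-then-thread idea follows. It makes two simplifying assumptions: Python's built-in `sorted` stands in for the radix sort, and a plain row-by-row serpentine stands in for the 1-threading pattern, whose exact shape is not given in the transcript. The function name `sort_and_thread` is hypothetical.

```python
from typing import List


def sort_and_thread(probes: List[str], n_cols: int) -> List[List[str]]:
    """Sort probes lexicographically, then lay them out row by row,
    reversing every other row so consecutive probes stay adjacent."""
    ordered = sorted(probes)  # stand-in for the radix sort on the slide
    rows = [ordered[i:i + n_cols] for i in range(0, len(ordered), n_cols)]
    for r in range(1, len(rows), 2):
        rows[r].reverse()  # serpentine: odd rows run right to left
    return rows


if __name__ == "__main__":
    for row in sort_and_thread(["TG", "AC", "GA", "CT", "AA", "CC"], 3):
        print(row)
```

Lexicographically consecutive probes share long prefixes, so placing them next to each other tends to keep the Hamming distance between neighbors small.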

  16. Matching Based Probe Placement • Select an independent (mutually nonadjacent) set of placed probes • Re-embed using optimal perfect matching • Total cost can only decrease or remain the same • Runtime: roughly proportional to square of independent set size [Figure: independent set of sites before and after matching]
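The matching step can be sketched as follows. Because the selected sites are mutually nonadjacent, the cost of putting a probe on one of them depends only on neighbors that stay fixed, so reassigning probes to sites is an assignment problem. For brevity this sketch enumerates all permutations, which is only practical for tiny sets; a real implementation would use a min-cost perfect matching solver. All function names are illustrative.

```python
from itertools import permutations
from typing import List, Tuple

Site = Tuple[int, int]


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def neighbor_sites(site: Site, rows: int, cols: int) -> List[Site]:
    r, c = site
    cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(rr, cc) for rr, cc in cand if 0 <= rr < rows and 0 <= cc < cols]


def placement_cost(grid: List[List[str]]) -> int:
    """Total Hamming distance over adjacent sites, each pair counted once."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                total += hamming(grid[r][c], grid[r][c + 1])
            if r + 1 < rows:
                total += hamming(grid[r][c], grid[r + 1][c])
    return total


def reassign_independent_set(grid: List[List[str]], sites: List[Site]) -> None:
    """Optimally permute the probes on mutually nonadjacent sites, in place."""
    rows, cols = len(grid), len(grid[0])
    chosen = set(sites)
    for s in sites:
        if any(nb in chosen for nb in neighbor_sites(s, rows, cols)):
            raise ValueError("sites must be mutually nonadjacent")
    probes = [grid[r][c] for r, c in sites]
    # cost[i][j]: conflicts of probe i with the fixed neighbors of site j
    cost = [[sum(hamming(p, grid[rr][cc])
                 for rr, cc in neighbor_sites(s, rows, cols))
             for s in sites] for p in probes]
    k = len(sites)
    # Brute-force stand-in for an optimal perfect matching.
    best = min(permutations(range(k)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(k)))
    for i, j in enumerate(best):
        r, c = sites[j]
        grid[r][c] = probes[i]


if __name__ == "__main__":
    g = [["GG", "AA", "AA"], ["AA", "CC", "GG"], ["AA", "GG", "AA"]]
    before = placement_cost(g)
    reassign_independent_set(g, [(0, 0), (1, 1), (2, 2)])
    print(before, placement_cost(g))
```

The identity permutation is always among the candidates, which is why the total cost can only decrease or stay the same.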

  17. Sliding Window Matching • Iterate SlidingWindowMatching over the chip until improvement drops below 0.1% • There is a trade-off between solution quality and size/overlap of windows

  18. Effect of Window Size on Solution Quality • Increased window size/overlap decreases number of conflicts, but increases runtime

  19. Epitaxial Placement Algorithm • Simulates crystal growth • Start with an arbitrary probe placed at the center • Maintain a best probe candidate (i.e., a probe with the minimum number of conflicts with the already placed neighbors) for each border site • Iteratively fill the border site with minimum increase in border length • Give priority to sites with more neighbors filled
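The crystal-growth idea can be sketched in a few lines. This is a deliberately naive version under stated assumptions: it recomputes the best candidate for every border site in each iteration instead of maintaining candidates incrementally, and it implements the priority rule only as a tie-break in favor of sites with more filled neighbors. The name `epitaxial_placement` is illustrative.

```python
from typing import List, Optional


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def epitaxial_placement(probes: List[str], n: int) -> List[List[Optional[str]]]:
    """Grow a placement outward from the center of an n x n grid."""
    grid: List[List[Optional[str]]] = [[None] * n for _ in range(n)]
    unplaced = list(probes)
    grid[n // 2][n // 2] = unplaced.pop(0)  # arbitrary seed probe at the center

    def filled_neighbors(r: int, c: int) -> List[str]:
        out = []
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < n and 0 <= cc < n and grid[rr][cc] is not None:
                out.append(grid[rr][cc])
        return out

    while unplaced:
        best = None  # (added border length, -filled neighbors, r, c, probe index)
        for r in range(n):
            for c in range(n):
                if grid[r][c] is not None:
                    continue
                nbrs = filled_neighbors(r, c)
                if not nbrs:
                    continue  # not a border site yet
                for idx, p in enumerate(unplaced):
                    cost = sum(hamming(p, q) for q in nbrs)
                    key = (cost, -len(nbrs), r, c, idx)
                    if best is None or key < best:
                        best = key
        if best is None:
            break  # grid is full; remaining probes do not fit
        _, _, r, c, idx = best
        grid[r][c] = unplaced.pop(idx)
    return grid


if __name__ == "__main__":
    for row in epitaxial_placement(
            ["AA", "AC", "CC", "CG", "GG", "GT", "TT", "TA", "AG"], 3):
        print(row)
```

The exhaustive scan per iteration is what makes the plain algorithm slow on large arrays, which motivates tiled and row-based variants.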

  20. Tile- and Row- Epitaxial • Tile-epitaxial • Divide array into 100x100 tiles • Run Epitaxial within each tile • Take into account border of already placed tiles • Row-epitaxial • Place probes by a fast method, e.g., sort+1-thread • Re-place probes row by row, sequentially filling sites within a row • Assign to each site a probe with min number of conflicts among the unplaced probes from following K rows
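The row-epitaxial variant can be sketched as an in-place refinement of an existing placement. The sketch below assumes that "conflicts" are measured against the already finalized left and upper neighbors, and that the candidate pool is the rest of the current row plus the following `k` rows; both readings are assumptions, as are the function names.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def row_epitaxial(grid: List[List[str]], k: int) -> List[List[str]]:
    """Re-place probes row by row, swapping into each site the best
    not-yet-finalized probe found within a lookahead of k rows."""
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            fixed = []  # already finalized neighbors of site (r, c)
            if c > 0:
                fixed.append(grid[r][c - 1])
            if r > 0:
                fixed.append(grid[r - 1][c])
            if not fixed:
                continue
            best_pos = (r, c)
            best_cost = sum(hamming(grid[r][c], q) for q in fixed)
            for rr in range(r, min(rows, r + k + 1)):
                start = c + 1 if rr == r else 0  # skip finalized sites
                for cc in range(start, cols):
                    cost = sum(hamming(grid[rr][cc], q) for q in fixed)
                    if cost < best_cost:
                        best_pos, best_cost = (rr, cc), cost
            br, bc = best_pos
            grid[r][c], grid[br][bc] = grid[br][bc], grid[r][c]
    return grid


if __name__ == "__main__":
    print(row_epitaxial([["AA", "GG", "AA"], ["AA", "GG", "GG"]], 1))
```

Limiting the search to `k` rows bounds the work per site, so `k` trades runtime against placement quality.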

  21. 2-D Placement Algorithm Comparison: Border Conflict

  22. 2-D Placement Algorithm Comparison: Runtime

  23. Outline • DNA probe arrays and unwanted illumination • Synchronous array design (2-D placement) • Asynchronous array design (3-D placement) • Experimental results • Extensions • Conclusions

  24. Problem Formulation (Asynchronous Case) • Asynchronous synthesis: • Periodic nucleotide deposition sequence, e.g., (ACTG)^p • Every probe grows asynchronously → border length = Hamming distance between embedded probes • Asynchronous Array Design (3-D Placement) Problem: • Minimize placement cost of embedded-probe Hamming graph H (vertices = probes, distance = Hamming between embedded probes) • On 2-dimensional grid graph G2 (N x N array, edges between neighbors) [Figure: probes of H mapped to sites of G2]
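To make "Hamming distance between embedded probes" concrete, the sketch below represents an embedding as a 0/1 list over the deposition steps (1 means the probe receives a nucleotide at that step). It assumes the periodic deposition sequence (ACTG)^p from the slide; the function names are illustrative.

```python
from typing import List


def synchronous_embedding(probe: str, period: str = "ACTG") -> List[int]:
    """The i-th nucleotide of the probe is deposited within the i-th period."""
    emb: List[int] = []
    for nuc in probe:
        if nuc not in period:
            raise ValueError("nucleotide not in deposition period")
        emb.extend(1 if a == nuc else 0 for a in period)
    return emb


def leftmost_embedding(probe: str, deposition: str) -> List[int]:
    """Greedy asynchronous embedding: deposit each nucleotide as early as possible."""
    emb = [0] * len(deposition)
    i = 0
    for step, nuc in enumerate(deposition):
        if i < len(probe) and probe[i] == nuc:
            emb[step] = 1
            i += 1
    if i < len(probe):
        raise ValueError("probe cannot be embedded in the deposition sequence")
    return emb


def embedded_hamming(e1: List[int], e2: List[int]) -> int:
    """Number of deposition steps at which exactly one of two probes is exposed."""
    return sum(x != y for x, y in zip(e1, e2))


if __name__ == "__main__":
    sync = synchronous_embedding("CTG")
    left = leftmost_embedding("CTG", "ACTG" * 3)
    print(sync, left, embedded_hamming(sync, left))
```

The same probe can therefore contribute different border lengths depending on how it is embedded, which is what adds the third dimension to the placement problem.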

  25. Lower Bound • Sum of distances to 4 closest neighbors minus weight of 4N heaviest arcs • Distance between two probes of length p = 2p - |Longest Common Subsequence| • Non-tight bound: example with LB = 8 and best placement cost = 10 [Figure: probes AC, GA, CT, TG with nucleotide deposition sequence S = ACTGA, showing the optimum placement]

  26. Optimal Probe Alignment • Find best alignment of probe w.r.t. embedded neighbors • Dynamic Programming: • Source-sink paths correspond to feasible embeddings • Runtime: O[(probe length) x (deposition sequence length)] • Can be extended to simultaneous alignment of two adjacent probes (2x1) with runtime increase by O(probe length) [Figure: DP graph from source to sink over probe and deposition sequence]
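A minimal sketch of such an alignment DP is given below, assuming that neighbor embeddings are 0/1 lists over the deposition steps and that the cost of a step is the number of neighbors whose exposure differs from the probe's at that step. The table has (deposition length + 1) x (probe length + 1) entries, matching the stated runtime. The function name and data layout are illustrative, not the authors' implementation.

```python
from typing import List


def optimal_embedding(probe: str, deposition: str,
                      neighbors: List[List[int]]) -> List[int]:
    """Embed `probe` in `deposition` with minimum total Hamming distance
    to the fixed 0/1 embeddings of its neighbors."""
    p, s = len(probe), len(deposition)
    ones = [sum(e[j] for e in neighbors) for j in range(s)]  # neighbors exposed
    zeros = [len(neighbors) - o for o in ones]               # neighbors masked
    inf = float("inf")
    # cost[j][i]: min conflicts over the first j steps with i nucleotides deposited
    cost = [[inf] * (p + 1) for _ in range(s + 1)]
    took = [[False] * (p + 1) for _ in range(s + 1)]
    cost[0][0] = 0
    for j in range(1, s + 1):
        for i in range(0, min(j, p) + 1):
            skip = cost[j - 1][i] + ones[j - 1]  # probe masked at step j
            take = inf
            if i > 0 and deposition[j - 1] == probe[i - 1]:
                take = cost[j - 1][i - 1] + zeros[j - 1]  # probe exposed
            if take < skip:
                cost[j][i], took[j][i] = take, True
            else:
                cost[j][i] = skip
    if cost[s][p] == inf:
        raise ValueError("probe cannot be embedded in the deposition sequence")
    emb = [0] * s
    i = p
    for j in range(s, 0, -1):  # trace the source-sink path backwards
        if took[j][i]:
            emb[j - 1] = 1
            i -= 1
    return emb


if __name__ == "__main__":
    print(optimal_embedding("CTG", "ACTG" * 2, [[0, 0, 0, 0, 0, 1, 1, 1]]))
```

In-place alignment heuristics can call a routine like this once per probe while keeping all neighbor embeddings fixed.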

  27. 3-D Placement Flows • Simultaneous placement and alignment: • Asynchronous epitaxial (slow and low quality) • Synchronous placement followed by in-place probe alignment (analogous to the standard partitioning of VLSI flows): • Uses the previous DP to do in-place probe alignment • Synchronous placement followed by probe alignment with reshuffle (analogous to feedback loops in VLSI flows): • Asynchronous sliding window matching

  28. Algorithms for In-Place Probe Alignment • Asynchronous re-embedding after 2-dim placement • Greedy Algorithm • While there exist probes to re-embed with gain • Optimally re-embed the probe with the largest gain • Batched greedy: speed-up by avoiding recalculations • Chessboard Algorithm • While there is gain • Re-embed probes in green sites • Re-embed probes in red sites

  29. Comparison of In-Place Probe Alignments • Post-placement LB = sum of distances to adjacent probes • Distance between two probes of length p = 2p - |LCS| • Useful for assessing quality of algorithms that change probe embeddings but do not change probe placement

  30. Outline • DNA probe arrays and unwanted illumination • Synchronous array design (2-D placement) • Asynchronous array design (3-D placement) • Experimental results • Extensions • Conclusions

  31. 3-D vs. 2-D Placement Results

  32. 3-D Placement Algorithm Comparison: Border Conflict

  33. 3-D Placement Algorithm Comparison: Runtime

  34. Outline • DNA probe arrays and unwanted illumination • Synchronous array design (2-D placement) • Asynchronous array design (3-D placement) • Experimental results • Extensions • Conclusions

  35. Practical Extensions • Distance-dependent border conflict weights • Take into account conflicts between 2- and 3-hop neighbors rather than only immediate neighbors • Position-dependent border conflict weights • In the alignment DP for two sequences, take into account the importance of conflicts in the middle of probes: alignment cost has weights on conflicts that depend on conflict position • Polymorphic probes • Chip contains SNPs, e.g., pairs of probes that differ in a single position; they should be placed together, and the alignment DP should align them simultaneously

  36. Alignment DP for 2-SNPs • Optimal embedding of A{C,T}T

  37. Simplified DNA Array Flow • Input: gene sequences, position of SNPs, etc. • Flow: Probe Selection → Mask Design (Placement & Embedding) → Mask Manufacturing → Array Manufacturing → Hybridization Experiment → Analysis of Hybridization Intensities • Stages are labeled as belonging to the Soft/Computational Domain or the Hard/Biochemistry Domain

  38. Enhanced DNA Array Design Flow Probe Selection Mask Design: Placement & Embedding

  39. Enhanced DNA Array Design Flow Probe Selection Probe Pools Mask Design: Placement & Embedding

  40. Enhanced DNA Array Design Flow Probe Selection Probe Pools Deposition Mask Design Mask Design: Placement & Embedding

  41. Enhanced DNA Array Design Flow Probe Selection Design Rules &Parameters Probe Pools Deposition Mask Design Mask Design: Placement & Embedding

  42. Enhanced DNA Array Design Flow Probe Selection Design Rules &Parameters Probe Pools Deposition Mask Design Conflict Map Mask Design: Placement & Embedding

  43. Enhanced DNA Array Design Flow Probe Selection Design Rules &Parameters Probe Pools Deposition Mask Design Test/Control Structure Design Conflict Map Mask Design: Placement & Embedding

  44. Summary • Contributions: • Epitaxial placement: reduces border length by an extra 10% over the previously best known method • Asynchronous placement problem formulation • Post-placement improvement by an extra 15.5-21.8% • Lower bounds • Scalable placements (1000x1000 in 20 min) • Ongoing work: • Comparison on industrial benchmarks • Experiments with algorithms for extended formulations (SNPs, distance-dependent weights, etc.) • Future directions: • Design flow enhancements • Nucleotide deposition sequence design • Partitioning and integration for manufacturing cost reduction

  45. Thank you!
