
Consolidating Software Tools for DNA Microarray Design and Manufacturing



Presentation Transcript


  1. Consolidating Software Tools for DNA Microarray Design and Manufacturing. Mourad Atlas, Nisar Hundewale, Ludmila Perelygina, Alex Zelikovsky

  2. Agenda • Introduction • DNA Array Flow (DAF) • Benchmarks: Herpes B virus • Experiments and Results • Conclusion and Future Work

  3. Motivation • Microarrays provide a tool for answering a wide variety of questions about the dynamics of cells: • In which cells is each gene active? • Under what environmental conditions is each gene active? • How does the activity level of a gene change under different conditions? • Stage of a cell cycle? • Environmental conditions? • Diseases? • What genes seem to be regulated together?

  4. DNA Array Flow [Flow: Genome ID → Reading genomic data → Probe selection → Physical design → Mask and array manufacturing → Hybridization experiment → Analysis of hybridization intensities] • Reading genomic data: download the genome sequence and extract ORFs in FASTA format • Probe selection: for each gene G, find probes that hybridize to G at a given TM but do not hybridize to any other gene at that TM • Physical design: probe placement determines, for each probe, the site on the 2-D array surface where it is placed or synthesized; probe embedding embeds each probe into the deposition sequence • Mask and array manufacturing: photolithographic process used in sequence masking • Hybridization experiment: each probe binds to its target according to the complementarity rules • Analysis of hybridization intensities: the hybridization intensity can be measured by a laser scanner and converted to a quantitative value that can be read

  5. Reading genomic data [Flow: Genome ID → Reading genomic data → Probe selection → Physical design → Mask and array manufacturing → Hybridization experiment → Analysis of hybridization intensities]

  6. Reading Genomic Data • Input the genome ID • Download genome sequence [Flow: Genome ID → Downloading genome sequence from GenBank (Bioperl) → ORF extraction from genome (GeneMark (Borodovsky, GaTech), or ORF Finder) → Extracting extra ORFs: ( ) → ORF Parser: ORFs in FASTA format → Probe selection]

  7. ORF Extraction [Flow: Genome ID → Downloading genome sequence from GenBank (Bioperl) → ORF extraction from genome (GeneMark (Borodovsky, GaTech), or ORF Finder) → Extracting extra ORFs: ( ) → ORF Parser: ORFs in FASTA format → Probe selection]

  8. What is an ORF? • An open reading frame (ORF) is a subsequence of DNA that could potentially be transcribed into messenger RNA (mRNA) and translated into protein • Because of the differences between prokaryotic and eukaryotic transcription systems, there are two types of ORF: • Prokaryotic: start and stop codon • Eukaryotic: stop codon
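A minimal Python sketch of the prokaryotic-style definition above (start codon through stop codon), scanning the three forward reading frames. The function name `find_orfs` and the `min_codons` parameter are illustrative assumptions; this is not how GeneMark or ORF Finder work internally, and the reverse strand is ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for each start-to-stop ORF on the forward strand."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open start codon, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Close the ORF at the first in-frame stop codon (stop included).
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None
    return orfs


print(find_orfs("CCATGAAATAGCC"))
```

Each returned sequence could then be written out as one FASTA record for the probe selection step.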

  9. ORF Parser [Flow: Genome ID → Downloading genome sequence from GenBank (Bioperl) → ORF extraction from genome (GeneMark (Borodovsky, GaTech), or ORF Finder) → Extracting extra ORFs: ( ) → ORF Parser: ORFs in FASTA format → Probe selection]

  10. DNA Array Flow [Flow: Genome ID → Reading genomic data → Probe selection → Physical design → Mask and array manufacturing → Hybridization experiment → Analysis of hybridization intensities]

  11. Probe Selection [Flow: Reading genomic data → Promide (ORF preprocessing; choosing best melting temperature; Ocand: find all candidates for a given temperature) → Pools of probes → Physical design]

  12. Probe Selection Requirements • Homogeneity: • Ensure that each probe can bind to its target at the temperature of the experiment • Sensitivity: • Avoid self-hybridization: ensure that the probes will not form a secondary structure (such a structure would prevent a probe from binding to its target) • Specificity: • Each probe stays unique even after a few bases are changed • Each probe must hybridize to one particular gene: for each gene G, find probes that: • hybridize to G at a given temperature • do not hybridize to any other gene at that temperature • Avoid cross-hybridization
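The specificity requirement can be illustrated with a naive Python sketch. It assumes, as a crude stand-in for hybridization at a given temperature, that a candidate cross-hybridizes when some equally long substring of another gene differs from it in at most `max_mismatches` bases. The names `specific_probes`, `kmers` and `max_mismatches` are hypothetical; Promide relies on suffix arrays and TM-alignment, not on this brute-force scan.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def kmers(seq, k):
    """All length-k substrings of seq, in order of appearance."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def specific_probes(genes, target, k, max_mismatches=1):
    """Candidates from genes[target] that stay unique after max_mismatches changes."""
    background = [s for name, seq in genes.items() if name != target
                  for s in kmers(seq, k)]
    probes = []
    for cand in kmers(genes[target], k):
        # Keep the candidate only if every background k-mer is far enough away.
        if all(hamming(cand, other) > max_mismatches for other in background):
            probes.append(cand)
    return probes


genes = {"g1": "AAAACCCC", "g2": "AAAAGGGG"}  # toy sequences, not real genes
print(specific_probes(genes, "g1", k=4))
```

The scan is quadratic in the total sequence length, which is why index structures such as suffix arrays are used at genome scale.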

  13. Why Promide? • Possible solutions: • Li and Stormo 2001 • Kaderali and Schliep 2002 • Rahmann (Promide) 2003 • They use the same data structure: suffix array • Promide handles truly large-scale datasets in a reasonable amount of time: • Human GeneNest clusters: 50 hours • Neurospora crassa: Promide: a few hours; Li and Stormo: 1 week

  14. ORF preprocessing Classes of Sequences: • A Master sequence is a sequence we wish to design oligos for. • A Background sequence is a sequence against which specificity is checked. • Every Master is also a Background

  15. Choosing best melting temperature • For each candidate oligo (substring) of a Master, do: – Check side constraints – Compute specificity: optimal TM-alignment with every Background collection • Compute matching statistics: mims • Oligo candidate selection: ocand

  16. Mask and array manufacturing [Flow: Genome ID → Reading genomic data → Probe selection → Physical design → Mask and array manufacturing → Hybridization experiment → Analysis of hybridization intensities]

  17. Mask and Array manufacturing • Arrays are synthesized on a wafer • Selectively expose array sites to light • Flush the chip’s surface with a solution of protected A, C, G, T • Repeat the last two steps until the desired probes are synthesized

  18. Mask and Array manufacturing [Figure: A ⇒ Mask 1, exposing the five sites that receive A; a 3×3 array with array probes CG, AC, G, AC, ACG, AG, AG, C, CG; nucleotide deposition sequence ACG]

  19. Array manufacturing [Figure: C ⇒ Mask 2, exposing the sites that receive C; a 3×3 array with array probes CG, AC, G, AC, ACG, AG, AG, C, CG; nucleotide deposition sequence ACG]

  20. Array manufacturing • The nucleotide deposition sequence defines the order of nucleotide deposition • A probe embedding specifies the steps of the nucleotide deposition sequence that the probe uses to get synthesized [Figure: G ⇒ Mask 3, exposing the sites that receive G; a 3×3 array with array probes CG, AC, G, AC, ACG, AG, AG, C, CG; nucleotide deposition sequence ACG]
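The relation between deposition sequence, probe embedding and masks can be sketched in Python. The sketch assumes the nine probes of the 3×3 example are laid out row by row (the layout is not preserved in the transcript) and that each probe takes the earliest possible deposition step for each nucleotide; `embed_asap` and `masks` are illustrative names.

```python
def embed_asap(probe, deposition):
    """Deposition steps used by probe, taking each nucleotide as early as possible."""
    steps, pos = [], 0
    for nt in probe:
        while pos < len(deposition) and deposition[pos] != nt:
            pos += 1
        if pos == len(deposition):
            raise ValueError(f"probe {probe!r} cannot be embedded in {deposition!r}")
        steps.append(pos)
        pos += 1
    return steps


def masks(grid, deposition):
    """For each deposition step, the set of (row, col) sites the mask must expose."""
    result = [set() for _ in deposition]
    for r, row in enumerate(grid):
        for c, probe in enumerate(row):
            for step in embed_asap(probe, deposition):
                result[step].add((r, c))
    return result


# Assumed row-by-row layout of the probes listed on the slides.
grid = [["CG", "AC", "G"],
        ["AC", "ACG", "AG"],
        ["AG", "C", "CG"]]
for nucleotide, mask in zip("ACG", masks(grid, "ACG")):
    print(nucleotide, sorted(mask))
```

With the deposition sequence ACG, Mask 1 exposes the five sites whose probes contain A, which agrees with the five A's shown for Mask 1.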

  21. Border Minimization Challenges • Border reduction: Border = 8 ⇒ Unwanted illumination ⇒ Chip’s yield [Figure: A ⇒ Mask 1 on a 3×3 array with array probes CG, AC, G, AC, ACG, AG, AG, C, CG; nucleotide deposition sequence ACG]

  22. Border Minimization Challenges • Problem: diffraction, internal reflection, scattering, internal illumination • Occurs at sites near intentionally exposed sites • Reduce border ⇒ Increase yield ⇒ Reduce cost • Design objective: minimize the border [Figure: lamp, mask and array, with labels: intentionally exposed sites, border, unwanted illumination]
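One way to make the design objective concrete is to count, for a single mask, the pairs of neighboring sites of which exactly one is exposed. The Python sketch below uses that counting convention as an assumption (the slides do not spell out theirs); `mask_border` is an illustrative name.

```python
def mask_border(exposed, rows, cols):
    """Count adjacent site pairs where exactly one of the two sites is exposed."""
    border = 0
    for r in range(rows):
        for c in range(cols):
            # Look only down and right so that every pair is counted once.
            for nr, nc in ((r + 1, c), (r, c + 1)):
                if nr < rows and nc < cols:
                    if ((r, c) in exposed) != ((nr, nc) in exposed):
                        border += 1
    return border


print(mask_border({(0, 0)}, 2, 2))          # one exposed corner site
print(mask_border({(0, 0), (0, 1)}, 2, 2))  # one fully exposed row
```

Summing this count over all masks gives a total border cost for the chip under the same convention.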

  23. Physical design [Flow: Genome ID → Reading genomic data → Probe selection → Physical design → Mask and array manufacturing → Hybridization experiment → Analysis of hybridization intensities]

  24. Physical Design [Flow: Probe selection → Deposition sequence design → Test control → 2D probe placement → 3D probe embedding → Mask and array manufacturing]

  25. Physical Design • Probe Placement • Similar probes should be placed close together • Constructive placement • Placement improvement operators • Probe Embedding • Degrees of freedom (DOF) in probe embedding • DOF exploitation for border conflict reduction

  26. Border Reduction with Probe Placement • Probe placement: similar probes should be placed close together [Figure: probes embedded in the deposition sequence, before (Border = 8) and after optimizing the placement (Border = 4)]

  27. Border Reduction in Probe Embedding • Synchronous embedding: deposit one nucleotide in each group of “ACGT” • Asynchronous embedding: no restriction [Figure: probes embedded in the deposition sequence, with Border = 4 and Border = 2]

  28. Physical Design Problem • Given: n² probes • Find: placement of the probes in n x n sites, and embedding of the probes • Minimize: total border cost

  29. Problem formulation for placement • 2-dim (synchronous) Array Design Problem: • Minimize placement cost of Hamming graph H • (vertices = probes, distance = Hamming) • on 2-dim grid graph G2 (N x N array, edges b/w neighbors) • Hamming distance (P1, P2) = number of positions at which P1 and P2 have different nucleotides = border (synchronous embedding) [Figure: probes of H assigned to sites of G2]
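The placement cost in this formulation can be written down directly: sum the Hamming distance over every pair of probes placed on neighboring sites of the grid. A small Python sketch follows; the names `hamming` and `placement_cost` and the toy probes are illustrative.

```python
def hamming(p1, p2):
    """Number of positions at which two equal-length probes differ."""
    if len(p1) != len(p2):
        raise ValueError("probes must have equal length")
    return sum(a != b for a, b in zip(p1, p2))


def placement_cost(grid):
    """Sum of Hamming distances over all horizontally and vertically adjacent sites."""
    rows, cols = len(grid), len(grid[0])
    cost = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                cost += hamming(grid[r][c], grid[r][c + 1])
            if r + 1 < rows:
                cost += hamming(grid[r][c], grid[r + 1][c])
    return cost


print(placement_cost([["AC", "AG"], ["TC", "TG"]]))
print(placement_cost([["AC", "TG"], ["AG", "TC"]]))
```

The two calls place the same four probes differently, and the first layout, which keeps similar probes adjacent, has the lower cost.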

  30. Placement Objective: Minimize Border • Sort the probes in lexicographical order • Sorting the probes reduces discrepancies between adjacent probes • Problem: how to place the 1-D ordering of probes onto the 2-D chip? [Figure: Probes 1 to 5 listed in sorted order]

  31. TSP + 1-Threading Placement • Hubbell, 1990s: • Find a TSP tour/path over the given probes with Hamming distance • Place the probes in the grid following the TSP path • Adjacent probes are similar • Hannenhalli, Hubbell, Lipshutz, Pevzner ’02: • Place the probes according to 1-threading • This further decreases the total border by 20%

  32. Placement By Threading [Figure: Probes 1 to 5 in sorted order, threaded onto the chip]
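A simplified Python sketch of sorting followed by threading: the probes are sorted lexicographically and laid onto the grid row by row, reversing every other row so that consecutive probes in the 1-D order stay adjacent on the chip. This plain row-by-row snake is an assumption standing in for the 1-threading pattern, which is not specified in the transcript; `thread_placement` is an illustrative name.

```python
def thread_placement(probes, n):
    """Place n*n probes on an n x n grid along a row-by-row snake path."""
    if len(probes) != n * n:
        raise ValueError("expected exactly n*n probes")
    ordered = sorted(probes)  # lexicographic 1-D ordering
    grid = []
    for r in range(n):
        row = ordered[r * n:(r + 1) * n]
        if r % 2 == 1:
            row.reverse()  # odd rows run right to left to keep neighbors adjacent
        grid.append(row)
    return grid


for row in thread_placement(["TT", "AA", "CA", "AC"], 2):
    print(row)
```

A TSP path over the probes could replace `sorted(probes)` as the 1-D ordering without changing the threading step.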

  33. Row-Epitaxial Placement Improvement • For each site position (i, j): • Find the best probe, i.e. the one that minimizes the border • Switch: move the best probe to (i, j) and lock it in this position • Row placement = sort + thread + row-epitaxial
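The improvement loop can be sketched in Python under simplifying assumptions: sites are visited in row-major order, the border of a candidate is measured as its Hamming distance to the already locked left and upper neighbors, and every not-yet-locked probe is a candidate (a practical implementation would probably restrict the search to a few rows ahead). `row_epitaxial` is an illustrative name.

```python
def hamming(a, b):
    """Number of positions at which two equal-length probes differ."""
    return sum(x != y for x, y in zip(a, b))


def row_epitaxial(grid):
    """Greedy pass: for each site, swap in the unlocked probe with the least border."""
    rows, cols = len(grid), len(grid[0])
    grid = [row[:] for row in grid]  # work on a copy
    sites = [(r, c) for r in range(rows) for c in range(cols)]
    for idx, (r, c) in enumerate(sites):
        locked = []  # probes already fixed next to site (r, c)
        if c > 0:
            locked.append(grid[r][c - 1])
        if r > 0:
            locked.append(grid[r - 1][c])
        if not locked:
            continue  # the first site has no locked neighbor to match

        def cost(site):
            probe = grid[site[0]][site[1]]
            return sum(hamming(probe, q) for q in locked)

        br, bc = min(sites[idx:], key=cost)  # candidates: this site and all later ones
        grid[r][c], grid[br][bc] = grid[br][bc], grid[r][c]  # switch, then lock
    return grid


print(row_epitaxial([["AA", "TT", "AC", "TG"]]))
```

On this one-row example the pass moves AC next to AA, and the summed Hamming distance between neighbors drops from 6 to 4. Being greedy, the pass does not guarantee an improvement on every input.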

  34. Probe Embedding [Figure: a hypothetical probe embedded in the deposition sequence (groups of nucleotides) in three ways: synchronous embedding, asynchronous embedding, another embedding]

  35. Embedding Determines Border Conflicts [Figure: probes embedded in the deposition sequence, comparing synchronous embedding with ASAP embedding]
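A small Python sketch of how the embedding changes the conflicts between two neighboring probes. It assumes a periodic deposition sequence made of repeated "ACGT" groups and counts a conflict at every deposition step where exactly one of the two probes is exposed; the function names are illustrative.

```python
def embed_synchronous(probe, alphabet="ACGT"):
    """The i-th nucleotide of the probe is deposited in the i-th 'ACGT' group."""
    return [i * len(alphabet) + alphabet.index(nt) for i, nt in enumerate(probe)]


def embed_asap(probe, deposition):
    """Each nucleotide takes the earliest deposition step still available."""
    steps, pos = [], 0
    for nt in probe:
        while pos < len(deposition) and deposition[pos] != nt:
            pos += 1
        if pos == len(deposition):
            raise ValueError(f"probe {probe!r} cannot be embedded")
        steps.append(pos)
        pos += 1
    return steps


def conflicts(steps_a, steps_b):
    """Deposition steps at which exactly one of two neighboring probes is exposed."""
    return len(set(steps_a) ^ set(steps_b))


deposition = "ACGT" * 2
sync = conflicts(embed_synchronous("AC"), embed_synchronous("CA"))
asap = conflicts(embed_asap("AC", deposition), embed_asap("CA", deposition))
print(sync, asap)
```

For the probes AC and CA, the synchronous embedding uses steps [0, 5] and [1, 4], which share nothing, while the ASAP embedding uses [0, 1] and [1, 4], which share step 1, so the conflict count falls from 4 to 2. Under this counting each mismatched position of a synchronous embedding contributes two steps.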

  36. Problem formulation • 2-dim (synchronous) Array Design Problem: • Minimize placement cost of Hamming graph H • (vertices=probes, distance = Hamming) • on 2-dim grid graph G2 (N x N array, edges b/w neighbors) • 3-dim (asynchronous) Array Design Problem: • Minimize cost of placement and embedding of Hamming graph H’ • (vertices=probes, distance = Hamming b/w embedded probes) • on 2-dim grid graph G2 (N x N array, edges b/w neighbors)

  37. Post-placement Optimization Methods • Asynchronous re-embedding after 2-dim placement • Greedy Algorithm • While there exist probes to re-embed with gain • Optimally re-embed the probe with the largest gain • Batched greedy: speed-up by avoiding recalculations • Chessboard Algorithm • While there is gain • Re-embed probes in red sites • Re-embed probes in green sites

  38. Analysis of hybridization intensities [Flow: Genome ID → Reading genomic data → Probe selection → Physical design → Mask and array manufacturing → Hybridization experiment → Analysis of hybridization intensities]

  39. Experimental Study • In our experiments we considered the following parameters and measured the results for different values of each. • Melting temperature: • We chose 60 °C and 65 °C as the best melting temperatures for our DNA probe array. • Number of candidates: • We experimented with two values of K (the number of candidates in each pool of probes): 1 and 2. • Chip size: • We ran our experiments with two chip sizes: 50x50 and 60x60. • We give the number of conflicts and the runtime of each algorithm for the Herpes B virus and for simulated data

  40. Experiments Outline [Flow: Genome ID → Bioperl → Sequence in FASTA format → GeneMark ORF extraction → ORF Parser → ORFs in FASTA format → Promide → Select probes: pool of probes → Probe Parser → Pools of probes in chip format → Read Pool / Genpool → Chip → Placements: TSP, sorting, row placement → Embedding: Chessboard → Number of conflicts and CPU time for all algorithms]

  41. Conclusion and Future Work • Conclusion: • Our experiments show: • The genomic data follow the pattern predicted by simulated data • For Herpes B virus, as for simulated data, increasing the number of candidates per probe (k) decreases the number of border conflicts produced by the probe placement algorithms • The number of border conflicts is several times smaller than for simulated data • There is a trade-off between the number of border conflicts and the CPU time taken by the various physical design algorithms • We give a consolidated software solution for the entire DNA array flow • We cover all steps in a single automated suite of software tools • Future work: • Make the entire software suite available through web services • Users will enter the name or ID of an organism, optionally set the required parameters, and the suite will produce the DNA probe microarray chip layout

  42. Thank you
