Sequencing By Hybridization – A Simulation Study of Performance on Genomic Sequences

Failure mechanism of SBH Methods Interleaved repeats: • Simulation on genomic data was done as follows: • Sequences were taken from random positions in the genomic sequence. • Unless specified otherwise, genomic origin was chromosome II of S. cerevisiae. • Random data was generated by one of the following methods: • Uniform Independent • A Markov Model whose parameters were derived from the genomic data. • All signatures are assumed to be perfect. a α b β c α d β e Triple repeat: a α b α c α d Conclusions Purpose • Simulation of SBH using uniformly drawn sequences does not represent the genomic reality. • Using Markov Models, even as simple as order 0, greatly improves the relevance of simulation studies designed to evaluate SBH and other measurement technologies. • Quantify performance of SBH on genomic sequences. • Propose and evaluate Markov Models for simulating data. Consecutive repeats: Repeats cause failure a α* b β* c Genomic data is more repetitive than random data and therefore will fail more often. k=8 k=10 k=12 Frequency of at least at least one interleaved repeat or one triple repeat in a sequence, comparing genomic (yeast) and randomly generated sequences. The bias in the frequency of triple repeats in genomic sequences may be attributed to regulatory motifs. Rate of unique reconstruction by complete k-mer arrays comparing sequences of various genomic origins to sequences derived randomly. k=8 k=10 k=12 Rate of unique reconstruction by complete k-mer arrays comparing genomic sequences to sequences derived randomly using Markov Models of various degrees. Estimated average number of consistent sequences reconstructed from complete k-mer arrays, comparing genomic and random sequences. Sequencing By Hybridization – A Simulation Study of Performance on Genomic Sequences Doron Lipson, Ziv Nevo, Ari Frank, Dolev Dotan, Zohar Yakhini Computer Science Department, Technion, Haifa, Israel GATACAACGTTAAGATTTAGTATTG…CACCTTAGATTTTTGTGAGTACTC…ATTAGCTTCGATTAAAAGTCGT…

Sequencing By Hybridization – A Simulation Study of Performance on Genomic Sequences

Sequencing By Hybridization – A Simulation Study of Performance on Genomic Sequences

Presentation Transcript

Introduction to Genomic Sequencing Assembly Annotation Diego Martinez

Finding approximate palindromes in genomic sequences.

Finding Regulatory Signals in Genomic Sequences

DNA sequencing: methods

DNA Sequencing

Comparison of Comparative Genomic Hybridization Technologies Across Microarray Platforms

Instructional Sequencing

V8 – Genomics data

Locus Reference Genomic (LRG) Sequences

DNA Sequencing

Southern Blotting Lab Results

Sequencing the MTP of Ae . tauschii

Interval Scores for Quality Annotated CGH Data

The University of Texas at Austin, Genomic Sequencing and Analysis Facility or for short

Update on the genomic mapping and sequencing initiative for banana and plantain

Genomic DNA Cloning

Progress in sequencing chromosome 6

Inferring Genomic Sequences

Sigma and Pi Bonding

Rapid Global Alignments

Detection of Genomic Rearrangements in K562 cells using Paired End Sequencing