
Presented by: Deepti Malhotra Biological Sequence Analysis

Selection of optimal oligonucleotide probes for microarrays using multiple criteria, global alignment and parameter estimation. Xingyuan Li, Zhili He and Jizhong Zhou. Nucleic Acids Research, 2005, Vol. 33, No. 19, 6114–6123.


Presentation Transcript


  1. Selection of optimal oligonucleotide probes for microarrays using multiple criteria, global alignment and parameter estimation. Xingyuan Li, Zhili He and Jizhong Zhou. Nucleic Acids Research, 2005, Vol. 33, No. 19, 6114–6123. Presented by: Deepti Malhotra, Biological Sequence Analysis

  2. MICROARRAY - What is it? Analysis, in a single experiment, of the relative expression level of hundreds or thousands of genes simultaneously by determining the amount of messenger RNA (mRNA) that is present. Figure labels: labeled target; probe (gene of interest); matrix.

  3. cDNA Microarray: NIEHS Tox Chip Nuwaysir E, et al., Molecular Carcinogenesis 24:153-159 (1999)

  4. GeneChip® Probe Arrays
     • Each probe cell or feature (24 µm) contains millions of copies of a specific oligonucleotide probe
     • The array (1.28 cm) carries over 200,000 different probes complementary to genetic information of interest
     Figure labels: GeneChip Probe Array; oligonucleotide probe; single-stranded, fluorescently labeled DNA target; hybridized probe cell; image of hybridized probe array. Courtesy: Affymetrix

  5. * * * * * GeneChipProbe Arrays GeneChipProbe Array Probe Pair Probe Set PM MM Hybridized Probe Cell Probe Cell (feature) Image of Hybridized Probe Array

  6. Multiple Specific Probe Pairs per Gene (25-mers). Source: Nature Genetics supplement, volume 21, January 1999

  7. What’s the complexity?
     • More genes
     • More information per experiment

     Feature size   Features/chip   Genes/chip*
     100 µm         16,384          409
     50 µm          65,536          1,638
     20 µm          409,600         10,240
     10 µm          1,638,400       40,960

     * Using 20 probe pairs per gene
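The genes-per-chip figures on this slide can be checked with a little arithmetic. The sketch below assumes a square array 1.28 cm on a side (the dimension given on the GeneChip slide) and two features per probe pair (PM and MM); the function names are illustrative only. Under these assumptions a 50 µm feature size gives 256 × 256 = 65,536 features.

```python
# Assumptions (not stated on this slide): square array, 1.28 cm per side,
# and each probe pair occupies two features (PM + MM).
ARRAY_SIDE_UM = 12_800          # 1.28 cm expressed in micrometres
PROBE_PAIRS_PER_GENE = 20       # from the slide footnote
FEATURES_PER_PAIR = 2           # perfect match + mismatch


def features_per_chip(feature_size_um: int) -> int:
    """Number of square features that fit on the array."""
    per_side = ARRAY_SIDE_UM // feature_size_um
    return per_side * per_side


def genes_per_chip(feature_size_um: int) -> int:
    """Whole genes that can be covered at 20 probe pairs per gene."""
    features_per_gene = PROBE_PAIRS_PER_GENE * FEATURES_PER_PAIR
    return features_per_chip(feature_size_um) // features_per_gene


if __name__ == "__main__":
    for size in (100, 50, 20, 10):
        print(size, features_per_chip(size), genes_per_chip(size))
```

Halving the feature size quadruples the number of features, which is why the gene count grows so quickly from one row to the next.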

  8. Why So Many Probe Pairs?
     • Point mutations, deletions, or insertions will not affect the detection of the gene of interest.
     • The bioinformatics algorithm accounts for expression across 11 different probe pairs to calculate the expression of the gene.
     Figure labels: probe pairs; gene of interest.

  9. Redundancy of probe synthesis
     • Multiple indicators for the same gene ensure:
       • Quantitative accuracy
       • High sensitivity
     • Indicators of oligonucleotide specificity:
       • Sequence identity to non-targets
       • Continuous stretch of matches to non-targets
       • Free energy of binding to the non-targets
     All three criteria are important for the selection of optimal probes.
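Two of the three specificity indicators can be illustrated in a few lines. This is a minimal sketch, assuming the probe and the non-target region have already been aligned to equal length without gaps; the function names and the caller-supplied thresholds are hypothetical and are not taken from CommOligo. Free energy of binding is left out because it needs thermodynamic parameters that the slides do not give.

```python
def identity_and_stretch(probe: str, aligned_nontarget: str) -> tuple[float, int]:
    """Return (fraction of matching positions, longest run of consecutive matches).

    Assumes both strings are already aligned and have the same length.
    """
    if len(probe) != len(aligned_nontarget):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    longest = 0
    run = 0
    for a, b in zip(probe, aligned_nontarget):
        if a == b:
            matches += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    identity = matches / len(probe) if probe else 0.0
    return identity, longest


def passes_filters(probe: str, aligned_nontarget: str,
                   max_identity: float, max_stretch: int) -> bool:
    """Hypothetical filter: a probe is kept only if it stays at or below both cut-offs."""
    identity, stretch = identity_and_stretch(probe, aligned_nontarget)
    return identity <= max_identity and stretch <= max_stretch
```

For example, `identity_and_stretch("ACGTACGTAC", "ACGTTCGTAC")` gives an identity of 0.9 and a longest stretch of 5, so a probe can share high overall identity with a non-target while its longest uninterrupted match is much shorter, which is why the two indicators are checked separately.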

  10. Problems with probe synthesis – addressed by CommOligo
     • Representation of each sequence in a genome-wide search
     • Liberal cut-offs and fewer non-specifics
     • Existing tools generally use BLAST for local alignment or suffix arrays for exact string search
     • Homologous sequence studies versus whole-genome arrays -> applicability to experiments
     • Experimental threshold determination
     • Inherent variability

  11. CommOligo workflow
     • Series of filters checking oligos
     • Cut-offs based on CommOligo_PE
     • Parameters and thresholds are user-adjustable
     • Iterative probe optimization
     • All 3 criteria included

  12. Sequence alignment strategy: dynamic programming matrix
     • Uses bit scores from Myers' algorithm during identity calculation
     • An alignment corresponds to a path from the bottom row, where the identity/score is high, to the top row
     • Traverse path / last path
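To make the idea of a path through a dynamic programming matrix concrete, here is a minimal sketch of global alignment. It uses the plain Needleman-Wunsch recurrence rather than the bit-vector (Myers) approach mentioned on the slide, and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen only for illustration.

```python
def global_alignment_identity(a: str, b: str, match: int = 1,
                              mismatch: int = -1, gap: int = -1) -> tuple[int, float]:
    """Return (alignment score, identity) for a global alignment of a and b.

    Identity is matching columns divided by total alignment columns.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace one optimal path back from the last cell to the origin.
    i, j = n, m
    matches = 0
    columns = 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            matches += a[i - 1] == b[j - 1]
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    identity = matches / columns if columns else 0.0
    return score[n][m], identity
```

Aligning `"ACGT"` with `"AGT"` yields a score of 2 and an identity of 0.75 (three matches over four alignment columns). Unlike a local alignment, every base of both sequences contributes to the identity, which matters when the probe is short.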

  13. Best alignment path search

  14. Final optimization and scoring
     • Quality score is calculated as:
     • CommOligo_PE is used to determine the thresholds, and the probes are optimized for maximum coverage and correctness by calculating:
     • The goal is to maximize NPV and C
     • Cross-validation: the data are randomly divided into 10 subsets, one subset is used as the test set, and the calibration is run 10 times.
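The cross-validation procedure can be sketched as follows. The transcript does not preserve the formulas for the quality score, NPV or C, so this sketch assumes NPV is the standard negative predictive value, TN / (TN + FN), and does not model C at all; the function names and the fixed seed are illustrative.

```python
import random


def negative_predictive_value(true_negatives: int, false_negatives: int) -> float:
    """Standard NPV: fraction of predicted negatives that are truly negative."""
    total = true_negatives + false_negatives
    return true_negatives / total if total else 0.0


def ten_fold_splits(items, k: int = 10, seed: int = 0):
    """Yield (train, test) pairs; each of the k random subsets is the test set once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for test_index in range(k):
        test = folds[test_index]
        train = [x for j, fold in enumerate(folds) if j != test_index for x in fold]
        yield train, test
```

In each of the 10 rounds, thresholds would be calibrated on `train` and then scored on `test`, so every data point is used for testing exactly once.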

  15. Results Training sets:

  16. Genome wide analysis Homologous sequence searches

  17. Take-home message
     • CommOligo works well with homologous sequences -> 3 stringent criteria -> cDNA
     • Still works well at the same thresholds for genome-wide searches -> Oligochip
     • Actual hybridization data is used
     • Better identity and minimum energy filters
     • The optimal Tm for the hybridization reaction is based on the oligos selected after passing all the filters, not on all possible oligos
     • Iterative threshold optimization
