
Genomic Signal Processing

Genomic Signal Processing. Dr. C.Q. Chang, Dept. of EEE. Outline: Basic Genomics; Signal Processing for Genomic Sequences; Signal Processing for Gene Expression; Resources and Co-operations; Challenges and Future Work.


Presentation Transcript


  1. Genomic Signal Processing. Dr. C.Q. Chang, Dept. of EEE

  2. Outline
     • Basic Genomics
     • Signal Processing for Genomic Sequences
     • Signal Processing for Gene Expression
     • Resources and Co-operations
     • Challenges and Future Work

  3. Basic Genomics

  4. Genome
     • Nearly every human cell contains about 6 feet of double-stranded (ds) DNA
     • This DNA has about 3,000,000,000 base pairs; current estimates put the number of protein-coding genes at roughly 20,000 (earlier estimates ranged from 50,000 to 100,000)
     • This DNA contains our complete genetic code, or genome
     • DNA regulates all cell functions, including response to disease, aging and development
     • Gene expression pattern: a snapshot of gene activity in a cell
     • Gene expression profile: how expression changes over time, e.g. with DNA mutation or polymorphism
     • Genetic pathways: changes in the genetic code that accompany metabolic and functional changes, e.g. disease or aging

  5. DNA → (transcription) → mRNA → (translation) → Protein
     • Gene: protein-coding DNA
     • DNA: CCTGAGCCAACTATTGATGAA
     • mRNA: CCUGAGCCAACUAUUGAUGAA
     • Protein: PEPTIDE
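
The example on slide 5 can be reproduced with a short sketch. It assumes the DNA shown is the coding strand (so transcription is just T → U) and uses the standard genetic code; the function names `transcribe` and `translate` are illustrative and do not come from the slides.

```python
# Standard genetic code, enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each DNA codon (e.g. "CCT") to a one-letter amino acid; "*" marks a stop.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna):
    """Coding-strand DNA to mRNA: replace every T with U (assumption: coding strand)."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA codon by codon from position 0, stopping at the first stop codon.

    A trailing partial codon (1 or 2 bases) is ignored.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # table is keyed by DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("CCTGAGCCAACTATTGATGAA")
print(mrna)             # CCUGAGCCAACUAUUGAUGAA
print(translate(mrna))  # PEPTIDE
```

Starting the translation at offset 1 or 2 instead of 0 would read a different set of codons, which is why predicting the proper reading frame matters later in the talk.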

  6. In more detail (color ~state)

  7. Signal Processing for Genomic Sequences

  8. The Data Set

  9. The Problem
     • Genomic information is digital: a string of the letters A, T, C and G
     • Signal processing deals with numerical sequences, so character strings have to be mapped into one or more numerical sequences
     • Identification of protein-coding regions
     • Prediction of whether or not a given DNA segment is part of a protein-coding region
     • Prediction of the proper reading frame
     • Compared with traditional methods, signal processing methods are much quicker and can be even more accurate in some cases

  10. Sequence to signal mapping
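
The slide does not say which mapping is used. One common choice, assumed here purely for illustration, is a set of four binary indicator sequences, one per base: each sequence is 1 where its base occurs and 0 elsewhere. The function name `indicator_sequences` is hypothetical.

```python
def indicator_sequences(dna):
    """Map a DNA string to four 0/1 sequences, one per base (assumed mapping)."""
    dna = dna.upper()
    return {base: [1 if ch == base else 0 for ch in dna] for base in "ACGT"}


signals = indicator_sequences("ATGCA")
for base, sequence in signals.items():
    print(base, sequence)
# A [1, 0, 0, 0, 1]
# C [0, 0, 0, 1, 0]
# G [0, 0, 1, 0, 0]
# T [0, 1, 0, 0, 0]
```

At every position exactly one of the four sequences is 1, so the mapping loses no information and each sequence can be handed to ordinary numerical tools.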

  11. Signal Analysis
     • Spectral analysis (Fourier transform, periodogram)
     • Spectrogram
     • Wavelet analysis
     • HMT: wavelet-based Hidden Markov Tree
     • Spectral envelope (using an optimal mapping from string to numerical values)
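
As a minimal sketch of the first item (spectral analysis), the code below sums the DFT power of the four base-indicator sequences at each frequency index. The indicator mapping is an assumption, not something the slides specify, and the helper names are illustrative. Protein-coding regions are commonly reported to show a peak at frequency index N/3 (a period of three bases); the toy sequence here is strictly periodic, so the peak is exact.

```python
import cmath


def dft_power(signal, k):
    """Power |X[k]|^2 of the length-N DFT of `signal` at frequency index k."""
    n = len(signal)
    x_k = sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
    return abs(x_k) ** 2


def total_power(dna, k):
    """Sum of DFT power over the four base-indicator sequences (assumed mapping)."""
    dna = dna.upper()
    return sum(
        dft_power([1 if ch == base else 0 for ch in dna], k) for base in "ACGT"
    )


def power_spectrum(dna):
    """Total power for k = 1 .. N//2 (k = 0 is skipped: it only counts bases)."""
    return {k: total_power(dna, k) for k in range(1, len(dna) // 2 + 1)}


# Toy input with an exact period of 3; N = 90, so the period-3 index is N/3 = 30.
toy = "ATG" * 30
spectrum = power_spectrum(toy)
peak_k = max(spectrum, key=spectrum.get)
print(peak_k)  # 30
```

Evaluating `total_power` at k = N/3 over a sliding window gives a simple spectrogram-style track along the sequence; N should be a multiple of 3 so that N/3 is an integer index.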

  12. Spectral envelope of the BNRF1 gene from the Epstein-Barr virus
     • (a) 1st section (1000 bp), (b) 2nd section (1000 bp), (c) 3rd section (1000 bp), (d) 4th section (954 bp)
     • Conjecture: the 4th quarter is actually non-coding

  13. Signal Processing for Gene Expression

  14. Microarray Life Cycle: Biological Question → Sample Preparation → Microarray Reaction → Microarray Detection → Data Analysis & Modeling (taken from Schena & Davis)

  15. [Figure: cDNA microarray workflow. cDNA clones (probes): PCR product amplification, purification, printing (0.1 nl/spot). Hybridise the mRNA target to the microarray. Scanning: excitation with laser 1 and laser 2, emission. Overlay images and normalise. Microarray analysis.]
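
The "overlay images and normalise" step on slide 15 is not described further. A common approach, assumed here for illustration only, is to take the log2 ratio of the two channel intensities per spot and subtract the median log ratio so that the typical spot sits at zero. The names `red`, `green` and `normalised_log_ratios` are hypothetical.

```python
import math
from statistics import median


def normalised_log_ratios(red, green):
    """Per-spot log2(red/green), centred by subtracting the median log ratio.

    Assumes background-corrected, strictly positive intensities in both channels.
    """
    log_ratios = [math.log2(r / g) for r, g in zip(red, green)]
    centre = median(log_ratios)
    return [value - centre for value in log_ratios]


# Toy intensities for three spots scanned in the two laser channels.
red = [200.0, 400.0, 800.0]
green = [100.0, 100.0, 100.0]
print(normalised_log_ratios(red, green))
```

Median centring is only one of several normalisation schemes; it rests on the assumption that most genes are not differentially expressed between the two samples.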

  16. Image Segmentation
     • Simple way: fixed circle method
     • Advanced: fast marching level set segmentation
     [Figure panels: Advanced; Fixed circle]
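
A minimal sketch of the fixed circle method, under the assumption that each spot is measured by averaging the pixels inside a circle of fixed radius around a known spot centre, with the remaining pixels of the patch treated as background. The function name and the toy patch are illustrative, not taken from the slides.

```python
def fixed_circle_intensity(patch, cx, cy, radius):
    """Return (mean foreground, mean background) for one spot patch.

    `patch` is a list of pixel rows; pixels within `radius` of (cx, cy) count
    as foreground, all other pixels in the patch count as background.
    """
    inside, outside = [], []
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                inside.append(value)
            else:
                outside.append(value)
    foreground = sum(inside) / len(inside)
    background = sum(outside) / len(outside) if outside else 0.0
    return foreground, background


# Toy 5x5 patch: background of 10 with a bright plus-shaped spot of 100.
patch = [[10] * 5 for _ in range(5)]
for x, y in [(2, 2), (1, 2), (3, 2), (2, 1), (2, 3)]:
    patch[y][x] = 100

print(fixed_circle_intensity(patch, cx=2, cy=2, radius=1))  # (100.0, 10.0)
```

The weakness of this method is visible in the sketch: the radius and centre are fixed in advance, so an irregular or off-centre spot is measured badly, which is the motivation for adaptive methods such as level set segmentation.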

  17. Clustering and filtering methods
     Principal approaches:
     • Hierarchical clustering (kdb trees, CART, gene shaving)
     • K-means clustering
     • Self-organizing (Kohonen) maps
     • Support vector machines
     • Gene filtering via multiobjective optimization
     • Independent Component Analysis (ICA)
     Validation approaches:
     • Significance analysis of microarrays (SAM)
     • Bootstrapping cluster analysis
     • Leave-one-out cross-validation
     • Replication (additional gene chip experiments, quantitative PCR)
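
Of the approaches listed on slide 17, k-means is the simplest to sketch. The version below is a plain Lloyd-style iteration with caller-supplied starting centroids, so the result is deterministic; the toy "expression profiles" (two measurements per gene) and all names are made up for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, centroids, iterations=10):
    """Alternate assignment and centroid update for a fixed number of rounds.

    Returns the final centroids and the clusters from the last assignment step.
    An empty cluster keeps its previous centroid.
    """
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]),
            )
            clusters[nearest].append(p)
        centroids = [
            mean_point(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Toy profiles: three low-expression genes and three high-expression genes.
profiles = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
centroids, clusters = kmeans(profiles, centroids=[(0.0, 0.0), (6.0, 6.0)])
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

In practice the number of clusters and the starting centroids have to be chosen, and different choices can give different groupings, which is one reason the slide pairs clustering with validation approaches such as bootstrapping.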

  18. ICA for B-cell lymphoma data
     • Data: 96 samples of normal and malignant lymphocytes
     • Results: scatter plots of 12 independent components
     • Comparison: closely related to the results of hierarchical clustering

  19. Resources and Co-operations
     Resources: databases on the internet, such as
     • GenBank
     • ProteinBank
     • Some small databases of microarray data
     Co-operation needed:
     • First-hand microarray data
     • Biological experiments for validation

  20. Challenges and Future Work
     • Genomic signal processing opens a new signal processing frontier
     • Sequence analysis: the signal is symbolic or categorical, so classical signal processing methods are not directly applicable
     • The increasingly high dimensionality of genetic data sets, and the complexity involved, call for fast, high-throughput implementations of genomic signal processing algorithms
     • Future work: spectral analysis of DNA sequences and clustering of microarray data; modify classical signal processing methods and develop new ones
