Gene Expression and DNA Chips

Gene Expression and DNA Chips Based on slides by Ron Shamir http://www.bio.davidson.edu/courses/genomics/chip/chip.html

Monitoring Gene Expression • Goal: Simultaneous measurement of expression levels of all genes in one experiment. • 2 fundamental biological assumptions: • Transcription level indicates genes’ regulation. • Only genes that contribute to organism fitness are expressed. => Detecting changes in a gene’s expression level provides clues on the function of its product

Factors controlling expression Post-translational modifications Alternative splicing RNA interference / degradation Chromatin remodeling Pre-mRNA Mature mRNA DNA protein transcription splicing translation

TGAGGC | | | | | | ACTCCG Hybridization • DNA double strands form by “gluing” of complementary single strands • Complementarity rule: A-T, G-C Use probe to identify if target contains a particular sequence

DNA chips / Microarrays • Perform thousands of hybridizations in a single experiment • Variants: • Oligonucleotide arrays • cDNA microarrays • Another distinction • Single channel • Dual channel • Allow global view of cellular processes: Monitor transcription levels of numerous/all genes simultaneously.

Oligonucleotide Arrays

A single feature on the chip http://www.affymetrix.com/corporate/media/image_library/

Hybridization on chip

Reading off a chip

Manufacturing a GeneChip Array

Many probes for a single gene

cDNA Microarrays

For Flash animation of the technology, see http://www.bio.davidson.edu/Courses/genomics/chip/chip.html

cDNA Microarrays (2)

Affymetrix oligo arrays vs cDNA microarrays • Short oligos • Low specificity • High density • Many probes per gene • Synthetic oligos • Absolute exp values • Yield problems • “turnkey” solutions • Price: +++ • Long oligos • High specificity • Lower density • One probe per gene • Probes: cDNAs • Relative exp values • Spotting problems • Custom solutions • Price : ++

… and other technologies • Agilent: • In situ synthesized arrays using ink-jet technology • 60-mer arrays: more specific than Affy’s • Allows custom design without expensive masks • Differential measurements: target vs reference • Nimblegen • Illumina

Comparative genomic hybridization (CGH) microarrays Cells of Interest Known DNA sequences Isolate genomicDNA Glass slide Reference sample Flourescently labeled (almost identical to gene expression arrays, but genomic DNA is hybridized instead of mRNA)

Chromosomes with varying copy number fluctuations from analysis of the tumor cell line SK-BR-3 as compared with the normal reference Robert Lucito et al. Genome Res. 2003; 13: 2291-2305

Single nucleotide polymorphism (SNP) detection SNP: single base sequence variation A/G Target sequence: GCCATGCANGAGTTACTACAGTAGC CGGTACGTTCTCAATGATGTCATCG PM + 4 Allele A MM +4 Allele A CGGTACGTTCTCTATGATGTCATCG CGGTACGTCCTCAATGATGTCATCG PM +4 Allele B CGGTACGTCCTCTATGATGTCATCG MM + 4 Allele B (Affymetrix Human Mapping 500K Array)

Remember Gene Transcription? Transcription Factors (proteins) RNA polymerase (protein) C T A A T G T . . . 5’ 3’ 3’ 5’ G A T T A C A . . . Transcription factors recognize transcription factor binding sites and bind to them, forming a complex. RNA polymerase binds the complex. (eukaryotes)

Using microarrays to measure protein-DNA interactions ChIP-chip: Chromatin immunoprecipitation chip (microarray) (antibodies bind transcription factor of interest) (TF-bound sequences hybridized to microarray) Simon et al., Cell 2001

Mapping transcription factor binding sites in yeast with ChIP-chip Harbison C., Gordon B., et al. Nature 2004

Dynamic role of transcription factors Harbison C., Gordon B., et al. Nature 2004

yfg1D yfg2D yfg3D Other microarray applications:Competitive growth assays Barcode CTAACTC TCGCGCA TCATAAT … Deletion Strain: Growth 6hrs in minimal media Rich media Harvest and label genomic DNA

Measuring relative fitness with a barcode microarray Oligo barcodes matching each strain are also spotted on a DNA microarray

Protein Microarrays • Protein microarrays are lagging behind DNA microarrays • Same idea but immobilized elements are proteins instead of nucleic acids • Number of elements (proteins) on current protein microarrays are limited (approx. 500) • Antibodies for high density microarrays have limitations (cross-reactivities) • Aptamers or engineered antibodies/proteins may be viable alternatives (Aptamers:RNAs that bind proteins with high specificity and affinity)

Applications Screening for: • Small molecule targets • Post-translational modifications • Protein-protein interactions • Protein-DNA interactions • Enzyme assays • Epitope mapping

Label all Proteins in Mixture High-throughput proteomic analysis Haab et al. Genome Biology 2000;1:1-22

MIX IL-1  IL-6 IL-10 VEGF marker protein cytokine Cytokine Specific Microarray (Microarray version of ELISA) Detection system BIOTINYLATED MAb ANTIGEN CAPTURE MAb

Tissue Microarrays • Printing on a slide tiny amounts of tissue • Array many patients in one slide (e.g. 500) • Process all at once (e.g. immunohistochemistry) • Works with archival tissue (paraffin blocks)

Tissue Microarray Alizadeh et al. J Pathol 2001;195:41-52

Normalization is important!! How Gene Expression Data Looks • Entries of the Raw Data matrix: • Ratio values • Absolute values • Distributions… • Row = gene’s expression pattern / • fingerprint vector • Column = experiment/condition’s • profile conditions Expression levels, “Raw Data” genes

From the Raw Data matrix we compute the similarity matrix S. Sij reflects the similarity of the expression patterns of gene i and gene j. Data Preprocessing conditions Expression levels, “Raw Data” genes • Input: Real-valued raw data matrix. • Compute the similarity matrix (dot product/correlation/…)

DNA chips: Applications • Deducing functions of unknown genes • (similar expression pattern similar function) • Identifying disease profiles • Deciphering regulatory mechanisms • (co-expression co-regulation). • Classification of biological conditions • Genotyping • Drug development • … Analysis requires clustering of genes/conditions.

Pearson Correlation Coefficient, r. Values are in [-1,1] interval • Gene expression over d experiments is a vector in Rd, e.g. for gene C: (0, 3, 3.58, 4, 3.58, 3) • Given two vectors X and Y that contain N elements, we calculate r as follows: Cho & Won, 2003

Intuition for Pearson Correlation Coefficient r(v1,v2) close to 1: v1, v2 highly correlated. r(v1,v2) close to -1: v1, v2 anti correlated. r(v1,v2) close to 0: v1, v2 not correlated.

Pearson Correlation and p-Values When entries in v1,v2 are distributed according to normal distribution, can assign (and efficiently compute) p-Values for a given result. These p-Values are determined by the Pearson correlation coefficient, r, and the dimension, d, of the vectors. For same r, vectors of higher dimension will be assigned more significant (smaller) p-Value.

Spearman Rank Order Coefficient(a close relative of Pearson, non parametric) • Replace each entry xi by its rank in vector x. • Then compute Pearson correlation coefficients of rank vectors. • Example: X = Gene C = (0, 3.00, 3.41, 4, 3.58, 3.01) Y = Gene D = (0, 1.51, 2.00, 2.32, 1.58, 1) • Ranks(X)= (1,2,4,6,5,3) • Ranks(Y)= (1,3,5,6,4,2) • Ties should be taken care of, but: (1) rare (2) can randomize (small effect)

From Pearson Correlation Coefficients to a Gene Network • Compute correlation coefficient for all pairs of genes (what about missing data?) • Choose p-Value threshold. • Put an edge between gene i and gene j iff p-Value exceeds threshold.

Clustering: Objective • Group elements (genes) to clusters satisfying: • Homogeneity: Elements inside a cluster are highly similar to each other. • Separation: Elements from different clusters have low similarity to each other.

The Clustering Bazaar

Hierarchical clustering

An Alternative View • Form a tree-hierarchy of the input elements satisfying: • More similar elements are placed closer along the tree. • Or: Tree distances reflect element similarity • Note: No explicit partition into clusters.

Hierarchical Representations

Hierarchical Representations (2) Dendrogram: rooted tree, usually binary, and all root-leaf distances are equal 5.0 4.5 2.8 1 2 3 4 1 2 3 4

Neighbor Joining Algorithm Saitou & Nei, 87 • Input: Distance matrix Dij; Initially eachelement is a cluster. • Find min element Drs in D; merge clusters r,s • Delete elts. r,s, add new elt. t with Dit=Dti=(Dir+ Dis – Drs)/2 • Repeat • Present the hierarchy as a tree with similar elements near each other

Hierarchical Clustering: Average LinkageSokal & Michener 58, Lance & Williams 67 • Input: Distance matrix Dij; Initially eachelement is a cluster. nr- size of cluster r • Find min element Drs in D; merge clusters r,s • Delete elts. r,s, add new elt. t with Dit=Dti=nr/(nr+ns)•Dir+ ns/(nr+ns) •Dis • Repeat

A General FrameworkLance & Williams 67 • Input: Distance matrix Dij; Initially eachelement is a cluster. • Find min element Drs in D, merge clusters r,s • Delete elts. r,s, add new elt. t with Dit=Dti=rDir+ sDis + |Dir-Dis|

Gene Expression and DNA Chips

Gene Expression and DNA Chips

Presentation Transcript

DNA and Gene Expression

DNA, RNA and Gene Expression

DNA, Gene Expression, and Biotechnology

“Chunky and Chips” Gene Expression Group

Chapter 5: DNA, Gene Expression, and Biotechnology

Chapter 5: DNA, Gene Expression, and Biotechnology

Introduction to Gene Chips and Microarray Expression Data

Dna , Protein Synthesis, and gene expression

DNA Chips

Introduction to Gene Chips and Microarray Expression Data

Protein/DNA Interactions and Gene Expression

Gene Expression, Inheritance Patterns, and DNA Technology

Gene expression and DNA microarrays

Chapter 5: DNA, Gene Expression, and Biotechnology

Chapter 5: DNA, Gene Expression, and Biotechnology

Gene expression DNA  RNA  Protein

DNA and Gene Expression

Intelligent DNA Chips: Logical Operation of Gene Expression Profiles on DNA Computers

Intelligent DNA Chips: Logical Operation of Gene Expression Profiles on DNA Computers

DNA chips