HCS 825 Class Project

HCS 825 Class Project By: Jianyang Liu

Temporal gene expression mapping • Data analysis of large-scale temporal gene expression mapping of central nervous system development

Many kinds of gene expression data: • RT-PCR (used for temporal gene expression of the CNS, the central nervous system) • … • cDNA microarray

Data Description • The time points could equally be any other conditions…

Basic Idea • Genes with similar functions should have similar expression profiles. • Sometimes the expression profile can tell us about function • Many approaches to clustering data…

Euclidean distance • A measure of the difference between gene expression patterns. • $D(A,B) = \sum_{i=1}^{N} (A_i - B_i)^2$ (this is the squared Euclidean distance; the Euclidean distance itself is its square root) • + Easy to calculate, intuitive • - Affected by amplitude
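To make the amplitude effect concrete, here is a minimal Python sketch of the squared distance defined above. The three profiles are hypothetical values chosen for illustration, not data from the CNS study.

```python
def squared_euclidean(a, b):
    """Sum of squared differences between two expression profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same time points")
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical profiles over four time points (illustration only).
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]  # same shape as gene_a, twice the amplitude
gene_c = [1.0, 2.0, 3.0, 5.0]  # different shape, similar amplitude

d_ab = squared_euclidean(gene_a, gene_b)
d_ac = squared_euclidean(gene_a, gene_c)
print(d_ab, d_ac)
```

Although gene_b follows exactly the same pattern as gene_a, it ends up farther away than gene_c, which is what "affected by amplitude" means in practice.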

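Correlation 1) Pearson's correlation • Based on the actual values and designed to detect linear relationships • $r = SSP / \sqrt{SSX \cdot SSY}$, where SSP is the sum of cross-products of the deviations from the means and SSX, SSY are the sums of squared deviations • + Detects both positive and negative relationships; scale invariant • - Sensitive to outliers, not intuitive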

2) Spearman's correlation • Based on the ranks of the items rather than on their actual values. • $R = 1 - \frac{6 \sum D^2}{N (N^2 - 1)}$, where R = rank correlation coefficient, D = difference between the ranks of the two items, N = number of observations. • + Non-parametric • - Less sensitive

Comparison of Pearson and Spearman correlation • r = Pearson's correlation coefficient for {(x_i, y_i)} = 0.249 • r_S = Spearman's correlation = Pearson's correlation coefficient for the ranks {(a_i, b_i)} = 0.786
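The relationship between the two coefficients can be sketched in a few lines of Python: Spearman's coefficient is Pearson's coefficient applied to ranks, and without ties it equals the shortcut formula with the rank differences D. The function names and the small data sets below are illustrative assumptions; they are not the data behind the 0.249 and 0.786 values.

```python
import math

def pearson(x, y):
    """r = SSP / sqrt(SSX * SSY)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    ssp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return ssp / math.sqrt(ssx * ssy)

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Pearson's coefficient computed on the ranks."""
    return pearson(ranks(x), ranks(y))

def spearman_shortcut(x, y):
    """R = 1 - 6*sum(D^2) / (N*(N^2 - 1)); valid only without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical data: monotonic relationship with one extreme value.
x = [1, 2, 3, 4, 5]
y_outlier = [1, 2, 3, 4, 100]
r_outlier = pearson(x, y_outlier)      # pulled below 1 by the outlier
rs_outlier = spearman(x, y_outlier)    # ranks agree perfectly

# Hypothetical data with two swapped pairs and no ties.
y_swapped = [2, 1, 4, 3, 5]
rs_ranks = spearman(x, y_swapped)
rs_formula = spearman_shortcut(x, y_swapped)
print(r_outlier, rs_outlier, rs_ranks, rs_formula)
```

The outlier case shows why Pearson's coefficient is listed as sensitive to outliers while the rank-based coefficient is not.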

Before clustering… • These correlation coefficients need to be converted to distances, d = 1 - r, because clustering is based on a distance matrix. After conversion… • A typical agglomerative algorithm joins nearest neighbors: • Start with each gene in its own cluster • Pick the two closest clusters and join them • Repeat until only one cluster remains
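The conversion and the three joining steps can be sketched as follows. This is a deliberately naive illustration, assuming average distance between clusters; the gene names and correlation values are hypothetical, and it is not the SAS procedure used in the project.

```python
def correlation_to_distance(corr):
    """Convert a square correlation matrix to distances d = 1 - r."""
    return [[1.0 - r for r in row] for row in corr]

def agglomerate(names, dist):
    """Repeatedly join the two closest clusters until one remains.

    Returns the merges in order as (cluster_a, cluster_b, distance).
    Distance between clusters is the average over all gene pairs.
    """
    index = {name: i for i, name in enumerate(names)}
    clusters = [(name,) for name in names]  # each gene starts alone

    def between(c1, c2):
        pairs = [dist[index[a]][index[b]] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = between(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        joined = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(joined)
    return merges

# Hypothetical correlations between three genes (illustration only).
names = ["g1", "g2", "g3"]
corr = [
    [1.0, 0.9, -0.5],
    [0.9, 1.0, -0.4],
    [-0.5, -0.4, 1.0],
]
merges = agglomerate(names, correlation_to_distance(corr))
for a, b, d in merges:
    print(a, b, round(d, 3))
```

Note that d = 1 - r treats strongly negative correlations as the largest distances, so anti-correlated genes end up far apart.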

Hierarchical Classification • There are three typical algorithms to decide on the distance between two clusters: • Choose the shortest distance between pairs of genes in the two clusters (nearest neighbor, single linkage) • Choose the average distance (UPGMA) • Choose the longest distance (complete linkage) • Where to go? (D_xy < D_xz, D_xy < D_yz) • [Diagram: X and Y joined at a node, with Z still to be placed]

Comparison of Distance Computing Approaches • Single Linkage: can identify long, thin clusters; can be subject to "chaining" • Complete Linkage: identifies tight, spherical clusters • Average Linkage: compromise between single and complete linkage; less sensitive to outliers
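A small Python sketch of the three rules, applied to the same pair of clusters, shows how much the chosen rule changes the answer. The one-dimensional points are hypothetical and stand in for genes; the function name is an assumption made for this illustration.

```python
def linkage_distance(cluster_1, cluster_2, rule):
    """Distance between two clusters of 1-D points under a linkage rule."""
    pairs = [abs(p - q) for p in cluster_1 for q in cluster_2]
    if rule == "single":      # shortest pairwise distance
        return min(pairs)
    if rule == "complete":    # longest pairwise distance
        return max(pairs)
    if rule == "average":     # mean pairwise distance (UPGMA)
        return sum(pairs) / len(pairs)
    raise ValueError("unknown linkage rule: " + rule)

# Hypothetical clusters: an elongated chain and a single nearby point.
chain = [0.0, 1.0, 2.0]
point = [3.0]

d_single = linkage_distance(chain, point, "single")
d_complete = linkage_distance(chain, point, "complete")
d_average = linkage_distance(chain, point, "average")
print(d_single, d_complete, d_average)
```

Single linkage only sees the nearest end of the chain, so it keeps extending long, thin clusters one neighbor at a time, which is the "chaining" behavior. Complete linkage judges the pair by its far end, and average linkage falls between the two.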

What the Somogyi group got: Fitch clustering graph

Wave 1, Wave 2, Wave 3, Wave 4, Constant

SAS code for Euclidean Distances
data one;
  title 'Cluster with Euclidean Distances';
  infile 'c:\one.csv' delimiter=',' firstobs=2;
  input E11 E13 E15 E18 E21 P0 P7 P14 A gene $;
run;
/* method=ave is average linkage; change to sin or com for single or complete linkage. */
/* PROC CLUSTER computes Euclidean distances from coordinate data by default. */
/* If the input is already a distance matrix, type=distance is needed. */
proc cluster data=one method=ave outtree=two /* nosquare */;
  id gene;
  var E11 E13 E15 E18 E21 P0 P7 P14 A;
run;
goptions htext=.2 fontres=presentation htitle=2;
proc tree data=two horizontal vpages=2 hpages=3 maxh=2.5;
  id gene;
run;
quit;

No graphic option modified

Part of Euclidean Distance (squared) Graph

Pearson and Spearman correlation analysis
data one;
  infile 'c:\one.csv' delimiter=',' firstobs=2;
  input E11 E13 E15 E18 E21 P0 P7 P14 A gene $;
run;
/* Transpose so that each gene becomes a variable. */
proc transpose data=one out=two;
  id gene;
run;
/* outp= writes Pearson correlations; use outs= for Spearman correlations. */
proc corr data=two noprint outp=twop;
  var keratin--DD632;
run;
data three (drop=_type_ _name_ type=distance);
  set twop;
  gene = _name_;            /* retains the gene name on the graph */
  if _type_ eq 'CORR';      /* keep only the correlation rows */
  array numbs keratin--DD632;
  do over numbs;
    numbs = 1 - numbs;      /* use 1 - r as the distance */
  end;
run;
/* type=distance tells PROC CLUSTER that the input is a distance matrix. */
proc cluster data=three(type=distance) method=ave outtree=four;
  var keratin--DD632;
  id gene;
run;
proc tree data=four;
  id gene;
  title 'Tree graph of Pearson correlation';
run;
quit;

Graph with and without id gene

Pearson correlation cluster

Spearman correlation cluster

Why principal components? • Measuring more variables allows for a more exact model, but makes the correct model exponentially harder to find. • Theory: The goal of a PC analysis is to explain the variance-covariance structure of the variables through linear combinations of the variables.

Proposed Methodology • From the linear combinations, the factor loadings for each component help explain which variables contribute the most to the variance. • Hypothesis: If a gene has different expression levels for each class, then that gene will have a moderate to high degree of variability. Therefore, I'm interested in those genes with high factor loadings for each component.
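As a rough illustration of what a component and its loadings are, the sketch below finds the first principal component of a tiny data set by power iteration on the covariance matrix. It is a simplification under stated assumptions: the data are hypothetical, only the leading component is extracted, and the covariance matrix is used for brevity, whereas PROC PRINCOMP analyzes the correlation matrix unless the COV option is given.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of the columns of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]

def leading_component(cov, iterations=200):
    """Largest eigenvalue and unit eigenvector of cov by power iteration.

    Assumes the start vector is not orthogonal to the leading eigenvector.
    """
    p = len(cov)
    v = [1.0] + [0.0] * (p - 1)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cv = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(a * b for a, b in zip(v, cv))
    return eigenvalue, v

# Hypothetical observations on two variables (illustration only).
rows = [[1, 1], [-1, -1], [2, 2], [-2, -2], [1, -1], [-1, 1]]
cov = covariance_matrix(rows)
eigenvalue, loadings = leading_component(cov)
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = eigenvalue / total_variance
print(eigenvalue, loadings, explained)
```

Here both variables receive the same loading, so they contribute equally to the first component, which accounts for most of the total variance. In the project's setting, variables with large loadings on a component are the ones driving that component's variance.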

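Principal component analysis SAS code
data one;
  title 'Example of Proc princomp with Expression Data';
  infile 'c:\one.csv' delimiter=',' firstobs=2;
  input E11 E13 E15 E18 E21 P0 P7 P14 A gene $;
run;
/* out=prin stores the component scores prin1, prin2, ... for each gene. */
proc princomp data=one out=prin;
  var E11 E13 E15 E18 E21 P0 P7 P14 A;
run;
/* Plot the scores, labeling each point with its gene name. */
proc plot data=prin;
  plot prin2*prin1 $ gene / vpos=28;
  plot prin2*prin3 $ gene / vpos=28;
run;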

Eigenvalues matrix

Prin1 by Prin2 graph • Different genes such as actin, NFL, NFM, NMDA1, etc. are detected as a distant group. • [Line-printer scatter plot of Prin2 (vertical axis, -4 to 4) against Prin1 (horizontal axis, -5 to 15), points labeled by gene. Labels plotted away from the central cluster include cellubr, actin, nestin, GAT1, NFL, ACHE, GFAP, NMDA1, NFM and GRg1.]

Prin2 by Prin3 graph • Similar genes to those in the previous graph are detected as a distant group. • [Line-printer scatter plot of Prin2 (vertical axis, -4 to 4) against Prin3 (horizontal axis, -5.0 to 7.5), points labeled by gene. Labels plotted away from the central cluster include actin, cellubr, nestin, NFL, NMDA1, GFAP, NFM and GRg1.]

Part of Euclidean Clustering Graph (Squared) • Deviating genes such as actin, NFL, NFM, NMDA1, etc. are also detected as a distant group.

Acknowledgement • Many thanks to: • Dr. Francis SHELIA J • Prof. BISHOP, BERT LUDVIG

The End Thank You!
