Introduction to Microarray Data Analysis

Presentation Transcript


  1. Introduction to Microarray Data Analysis

  2. Outline
     • Introduction
     • Microarrays Technology
     • Types and Uses of Microarrays
     • Microarrays for the Study of Gene Expression
     • Fabrication
       1. Spotted microarrays
       2. Oligonucleotide microarrays
     • Experiments with Microarrays
     • Flowchart of an experiment with microarrays
     • Software for Microarray Data Analysis

  3. Introduction (1)
     Brief review of molecular biology...
     • Most life forms are made of cells. Each individual has a very large, indefinite number of cells.
     • Each cell contains chromosomes (e.g. human cells contain 23 pairs of chromosomes). These organized structures of DNA and proteins are the carriers of inherited information.
     • A chromosome is a single piece of coiled DNA containing many genes, regulatory elements and other nucleotide sequences.

  4. Introduction (2)
     What else?
     • The genome of an organism is inscribed in DNA, or in RNA in some viruses.
     • A gene is the basic unit of heredity in a living organism and is the portion of the DNA that codes for a protein or an RNA.
     • Each protein-coding gene is transcribed into mRNA molecules, and the mRNA is in turn translated into at least one protein in some cells.

  5. Introduction (3)
     The Central Dogma of Molecular Biology
     • Information flow from DNA to RNA to protein occurs in four stages:
       – Replication: The DNA replicates its information in a process that involves many enzymes.
       – Transcription: The DNA codes for the production of messenger RNA (mRNA).
       – Splicing: In eukaryotic cells, the mRNA is processed and migrates from the nucleus to the cytoplasm.
       – Translation: Messenger RNA carries coded information to ribosomes. The ribosomes read this information and use it for protein synthesis.

  6. Introduction (4)
     Techniques in Molecular Biology
     • Molecular biology has developed multiple techniques to measure levels of RNA, DNA, proteins or metabolites, such as
       – Southern Blot
       – Northern Blot
       – Differential display
       – SAGE
       – …
     • The post-genomics era is characterized by its capability to perform large-scale experiments and to analyze their data sets simultaneously

  7. Introduction (5)
     The paradigm shift
     • With the same resources we obtain a picture with lower resolution but with a view of the whole context
     Based on “The paradigm shift” slide from J. Dopazo (CNIO)

  8. Introduction (6)
     To draw an analogy with the pre-genomics era
     • Biology used to “spy” on genes in depth and individually (i.e. gene by gene)

  9. Introduction (7)
     To draw an analogy with the post-genomics era
     • Nowadays, a lot of genes can be “spied” on at the same time... but...
     • …How can we separate the wheat from the chaff?

  10. Microarrays Technology (1)
      Broadly speaking...
      • Microarrays are a variety of platforms in which high-density assays are performed in parallel on a solid support.
      • This technology has changed the way biologists approach problems and introduces new challenges for statisticians because of the huge quantity of data generated in each experiment.
      • They have been used for all kinds of biological problems.
      • The literature contains almost 8000 papers with the word "microarray" in the title.
      [Figure: bar chart of publications in PubMed with the word "microarray" in the title; x-axis: year (1998 to 2010), y-axis: number of publications]

  11. Microarrays Technology (2)
      The biological principle involved in microarrays...
      • It is the same one that allows DNA double helices to provide the basis for heredity
      • Sequences of DNA or RNA molecules containing complementary base pairs have a natural tendency to bind together.
        ...AAAAAGCTAGTCGATGCTAG...
        ...TTTTTCGATCAGCTACGATC...
      • If we know the mRNA sequence, we can build a probe for it using the complementary sequence.
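
The base-pairing rule on this slide can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the names `DNA_COMPLEMENT`, `complement` and `reverse_complement` are hypothetical, only the four DNA letters are handled (no U for RNA, no ambiguity codes), and real probe design considers much more than complementarity.

```python
# Watson-Crick pairing for the four DNA bases (assumption: DNA alphabet only).
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(sequence):
    """Return the base-by-base complement of a DNA sequence."""
    return "".join(DNA_COMPLEMENT[base] for base in sequence.upper())


def reverse_complement(sequence):
    """Return the complement read in the opposite direction."""
    return complement(sequence)[::-1]


# Fragment shown on the slide and the strand that pairs with it.
target = "AAAAAGCTAGTCGATGCTAG"
probe = complement(target)
print(probe)
```

For the fragment on the slide, `complement` reproduces the second strand shown there; `reverse_complement` gives the same strand written in the conventional 5'-to-3' direction.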

  12. Microarrays Technology (3)
      But... What is a microarray?
      • It consists of a large set (thousands to tens of thousands) of specific sequences (known as probes or features) attached in order (array) to microscopic spots on a solid support (nylon or silicon glass, ...).
      • A probe (that can be a gene, a protein, a metabolite, ...) is used to hybridize a molecule of a nucleic acid sample (called target) under high-stringency conditions.
      • Probe-target hybridization is used to determine the relative abundance of nucleic acid sequences in the targets (e.g. to determine sequences, to detect variations in gene sequences, expression levels, gene mapping, ...).
      [Figure: schematic of a microarray; molecule samples 1 to r hybridize to probes 1 to k (e.g. gene 1 to gene 4) arranged as spots on the array]

  13. Types and Uses of Microarrays (1)
      Types of microarrays
      • Microarrays spatially arranged on a solid surface are most widely used.
      • Bead arrays are created by
        – either impregnating beads with different concentrations of fluorescent dye,
        – or some type of barcoding technology.
      • The beads are addressable and used to identify specific binding events that occur on their surface.

  14. Types and Uses of Microarrays (2)
      Uses of Microarrays (1)
      • Expression analysis
        – The process of measuring gene expression via RNA (or cDNA after reverse transcription) is called expression analysis or expression profiling.
        – In these experiments the expression levels of thousands of genes are simultaneously monitored to study the effects of certain treatments, diseases, and developmental stages on gene expression.
      • Comparative Genomic Hybridization
        – Comparative genomic hybridization (CGH) or Chromosomal Microarray Analysis (CMA) is used for the analysis of copy number changes (increases or decreases) of the important chromosomal fragments harboring genes involved in diseases.
      • Mutation analysis
        – A single base difference between two sequences is known as a Single Nucleotide Polymorphism (SNP), and detecting them is known as SNP detection.
        – With gDNA, this kind of array tries to detect genes that might differ from each other by as little as a single nucleotide base.
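
To make the idea of a single-base difference concrete, the sketch below compares two already aligned sequences of equal length and lists the positions where they differ. It is a toy illustration, not how SNP arrays work internally (they infer the base from hybridization signals); the function name and the example sequences are hypothetical.

```python
def single_base_differences(seq_a, seq_b):
    """List (position, base_in_a, base_in_b) for every mismatching position.

    Assumes the two sequences are already aligned and have equal length.
    Positions are 0-based.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


# Two made-up fragments that differ in exactly one base.
reference = "AAGCTAGT"
variant = "AAGCTCGT"
differences = single_base_differences(reference, variant)
print(differences)
```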

  15. Types and Uses of Microarrays (2)
      Uses of Microarrays (2)
      • Protein Array
      • Tissue Array
      • CGH Arrays
      • SNP Array (Affymetrix)
      • CNV Array (Illumina)
      • Expression Arrays
        – cDNA Nylon Membrane Array
        – GeneChip Affymetrix Array
        – cDNA Agilent Array

  16. Types and Uses of Microarrays (3)
      Application of Microarrays
      • Gene discovery
        – Identification of new genes, and knowledge about their functioning and expression levels under different conditions.
      • Molecular classification of complex diseases
        – To classify the types of cancer on the basis of the patterns of gene activity in the tumor cells, to develop more effective drugs.
      • Drug discovery
        – Comparative analysis of the genes from a diseased and a normal cell helps identify the biochemical constitution of the proteins synthesized by the diseased genes. This information can be used to synthesize drugs that combat these proteins and reduce their effect.
      • Toxicological research
        – Microarray technology provides a robust platform for research into the impact of toxins on cells and their passing on to the progeny.

  17. Microarrays for the Study of Gene Expression (1)
      What is gene expression?
      • Gene expression is the presence of the products of a gene, in the form of mRNA (or protein), in a cell
      • To put it straight: since cells contain the same genetic information, what makes brain cells different from heart cells is gene expression.

  18. Microarrays for the Study of Gene Expression (2)
      Finding Differentially Expressed Genes (DEG)
      • To find genes that display a large difference in gene expression between two conditions and are homogeneous within them
        – Typically statistical tests (t-test, Wilcoxon test) are used
      • If there are more than two conditions, or if conditions are nested, the appropriate statistical method is ANOVA
      • p-values from these tests have to be corrected for multiple testing
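
The slide states that p-values must be corrected for multiple testing but does not name a procedure. As one possible illustration, the sketch below applies the Benjamini-Hochberg adjustment, which is an assumption here and not a method taken from the slides. The function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# One made-up raw p-value per gene, e.g. from a per-gene t-test.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

Genes whose adjusted value falls below the chosen threshold would then be reported as differentially expressed. A stricter alternative is the Bonferroni correction, which multiplies each p-value by the number of tests.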

  19. Microarrays for the Study of Gene Expression (3)
      Exploratory data analysis (1)
      • To find groups that are not defined yet (e.g. novel disease subtypes)
      • Methods
        – from this field were the first to be used for microarray data
        – should be used only if no prior knowledge exists that could be incorporated
        – find patterns in the data, but any patterns, whether they are meaningful or not
        – include
          • Clustering (hierarchical and partitioning)
          • Projection (PCA, MDS)
      Alizadeh et al. Nature 403:503–511 (2000)

  20. Microarrays for the Study of Gene Expression (4)
      Time series, partitioning clustering and correlation
      • Usually used to find patterns of co-expressed genes
      • The meaning of time series is different for
        – Biologists: 2-10 time points
        – Statisticians: >200 time points
      • “Non-optimal” solution: to use clustering methods to find such patterns
        – Note that they are by no means exhaustive, and that no significance measure can be attached to them
        – In contrast to Estimation of Distribution Methods (EDA), partitioning cluster methods are more popular (e.g. K-means or Self-organizing maps)
      • To seek genes whose expression profile is similar to that of a paradigmatic gene, correlations can be calculated and the genes sorted by them. There is no need for clustering.
      • Special methods exist for periodic changes (⇒ cell cycle), e.g. Fourier analysis
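
The correlation-based search mentioned on this slide can be sketched as follows. The example uses Pearson correlation, which is an assumption (the slide only says "correlations"), and made-up expression profiles. The names `pearson` and `rank_by_correlation` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_by_correlation(profiles, reference_gene):
    """Sort all other genes by correlation with the reference gene."""
    reference = profiles[reference_gene]
    scores = {
        gene: pearson(reference, profile)
        for gene, profile in profiles.items()
        if gene != reference_gene
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up expression profiles over four time points.
profiles = {
    "ref": [1.0, 2.0, 3.0, 4.0],
    "up": [2.0, 4.0, 6.0, 8.0],
    "mixed": [1.0, 3.0, 2.0, 4.0],
    "down": [4.0, 3.0, 2.0, 1.0],
}
ranked = rank_by_correlation(profiles, "ref")
print(ranked)
```

Genes at the top of the ranking follow the reference profile most closely, and genes at the bottom are anti-correlated with it. No cluster structure is assumed.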

  21. Microarrays for the Study of Gene Expression (5)
      Classification
      • When information about the grouping of the samples is available, it can (and should) be used to get improved results
      • Groupings may be:
        – Treatment and Control
        – Disease and Normal
        – Disease stage 1, 2, 3
        – Mutant and Wild Type
        – Good and Poor Outcome, Therapy success or failure
        – ...
      • One learns characteristic patterns from a training set and evaluates them by predicting the classes of a test set
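
The train-then-test idea can be illustrated with a very small classifier. The sketch below uses a nearest-centroid rule, chosen only because it is short. The slides do not name a classification method, so this choice, the function names and the two-gene toy data are all assumptions.

```python
def train_centroids(samples, labels):
    """Average the expression vectors of each class (the 'pattern' learned)."""
    sums, counts = {}, {}
    for vector, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(vector))
        for j, value in enumerate(vector):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [value / counts[label] for value in acc]
        for label, acc in sums.items()
    }


def predict(centroids, vector):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vector, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def accuracy(centroids, samples, labels):
    """Fraction of samples whose predicted class matches the true class."""
    hits = sum(predict(centroids, v) == y for v, y in zip(samples, labels))
    return hits / len(labels)


# Toy data: each sample holds the expression of two hypothetical genes.
train_x = [[1.0, 5.0], [1.2, 4.8], [5.0, 1.0], [4.8, 1.1]]
train_y = ["control", "control", "treatment", "treatment"]
test_x = [[1.1, 5.2], [5.1, 0.9]]
test_y = ["control", "treatment"]

model = train_centroids(train_x, train_y)
test_accuracy = accuracy(model, test_x, test_y)
print(test_accuracy)
```

The important point is the separation of roles: the centroids are estimated from the training samples only, and performance is judged on samples the model has not seen.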

  22. Microarrays for the Study of Gene Expression (6)
      Survival Analysis
      • To find patterns that are associated with prolonged patient survival time
      • Instead of treating outcome as a binary variable, one can use
        – the overall survival time or
        – the event-free survival time
        as continuous variables, and try to estimate them by regression
      • Since the risk of suffering a relapse decreases with time, linear regression models are almost always inappropriate; specialized models would be better
        – Cox regression
        – Regression trees

  23. Microarrays for the Study of Gene Expression (7)
      Pharmacogenomics
      • To find molecular predictors that tell about the probable success (or failure) of a certain therapy, e.g.
        – estrogen receptor status for tamoxifen (antihormone) therapy
        – HER2/NEU status for herceptin therapy in breast cancer
      • One may regard treatment outcome as a discrete variable and use classification methods
      • Sometimes it is convenient not to wait for the final endpoint (which may be years away), but to use surrogate variables, e.g.
        – the drop in the blood level of a certain protein
        – reduction in tumor volume

  24. Fabrication
      Two main technologies
      • There are many types of technologies, but the principles are the same
      • The most used are spotted arrays and in situ arrays
      • Spotted arrays (aka cDNA arrays or Stanford arrays)
        – Previously synthesized cDNAs or oligonucleotides are deposited on the chip
        – Based on “printing-like” technologies
      • In situ arrays (aka oligo arrays or Affy arrays)
        – Probes are synthesized directly on the chip
        – Based on photolithographic techniques
        – Affymetrix arrays are the best-known... but not the only ones!

  25. Spotted Arrays (1)
      From the chips to the images
      1. Chip Design and Production
      2. Sample Preparation
      3. Hybridization
      4. Scanning and Capturing Images
      5. Image Analysis
      6. Quantification

  26. Spotted Arrays (2)
      Chip design and construction
      • Production begins with the selection of the "probes" to be printed on the array
      • In general: chosen from
        – GenBank (http://www.ncbi.nlm.nih.gov/)
        – dbEST (http://www.ncbi.nlm.nih.gov/UniGene/index.html)
      • cDNAs are printed on the array
      • Each spot can contain unique sequences
      • "Printing" means adhering sequences to the spots
      A movie of the printing process is available here

  27. Spotted Arrays (3)
      Sample preparation
      1. RNA is extracted from the samples
      2. This RNA is converted to fluorescently labeled cDNA by reverse transcription in the presence of fluorescently labeled nucleotide precursors
      3. RNA from each sample is labelled with a different fluorescent dye (e.g. Cy-3, Cy-5) to allow direct comparison
      4. After labeling, they are mixed and hybridized with the sequences on the array (probes)

  28. Spotted Arrays (4)
      Hybridization with probes
      • Targets labeled and combined
      A movie of the hybridization process is available here

  29. Spotted Arrays (5)
      Scanning and capturing image
      • After hybridization each DNA spot is illuminated and fluorescence measures are taken for each dye separately
      • These measurements will be used, after the appropriate quality controls, to determine the relative abundance of the sequence of each specific gene in the two mRNA or DNA samples

  30. Spotted Arrays (6)
      Image analysis (1)
      • TIFF images are processed by image analysis programs
        – SPOT
        – GenePix
        – ...
        to acquire intensity values for each spot
      • These measures will be used, after the appropriate quality controls, to determine the relative abundance of the sequence of each specific gene in the two mRNA or DNA samples

  31. Spotted Arrays (7)
      Image analysis (2)
      • Steps in Image Processing
        1. Addressing: Estimate location of spot centers
        2. Segmentation: Classify each spot as
           ● foreground (signal)
           ● background (noise)
        3. Information extraction (quantification)
      • For each spot on the array, and each dye, obtain
        – Signal measurements ($R_g$, $G_g$)
        – Background measurements ($bgR_g$, $bgG_g$)
        – Quality indicators

  32. Spotted Arrays (8)
      Quantification
      ● Gene expression is measured from the intensity measures as the relative (corrected) intensity of one dye vs the (corrected) intensity of the other:
        $M = \frac{R_g}{G_g}$, $M_{\text{Corrected}} = \frac{R_g - bgR_g}{G_g - bgG_g}$
      ● Background correction may or may not be needed, depending on the array quality
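
The two ratios on this slide can be computed as in the following sketch. The function name `intensity_ratio` and the numeric values are hypothetical. Returning `None` when the corrected green intensity is not positive is a design choice made here for illustration, not something the slides prescribe.

```python
def intensity_ratio(red, green, bg_red=0.0, bg_green=0.0):
    """Ratio of (optionally background-corrected) red and green intensities.

    With the default backgrounds of 0 this is the uncorrected ratio R/G.
    Returns None when the corrected green intensity is not positive,
    because the ratio is then undefined or meaningless.
    """
    corrected_green = green - bg_green
    if corrected_green <= 0:
        return None
    return (red - bg_red) / corrected_green


# Made-up measurements for one spot.
m_raw = intensity_ratio(1200.0, 300.0)
m_corrected = intensity_ratio(1200.0, 300.0, bg_red=200.0, bg_green=100.0)
print(m_raw, m_corrected)
```

In practice such ratios are often analysed on a log scale so that up- and down-regulation are symmetric around zero. That transformation is a common convention and is not stated on the slide.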

  33. Spotted Arrays (8)
      Overview of the process
      A movie of the whole process is available here

  34. In situ Chips (1)
      From the chips to the images
      1. Main Concepts
      2. Synthesis of Oligos on the Chip
      3. Sample Preparation
      4. Hybridization Process
      5. Scanning Images
      6. Output Images
      7. Quantification and Expression Measures

  35. In situ Chips (1)
      Main concepts (1)
      • More advanced design than spotted cDNA arrays
        – They are NOT based on competitive hybridization. That is, one chip, one sample
        – Probes are NOT added to the chip after being synthesized in vitro
      • Main idea: Probes are synthesized in situ (on the chip)
      • Sequences are built up on the chip surface by sequentially elongating a growing chain with a single nucleotide using photolithography
      • Chemical yield of the stepwise elongation is limited
        – Sequences can NOT grow to more than 25-mer length (oligo)
        – Need 16-20 different 25-mer sequences to uniquely characterize a gene
          • Probe = Individual 25-mer sequence
          • Probe set = Set of 25-mers corresponding to a particular gene/EST

  36. In situ Chips (2)
      Main concepts (2)
      • Affymetrix (http://www.affymetrix.com) is the leading company for these kinds of chips. They call them GeneChips
      • Each gene is represented by a set of short sequences
      • Some of these chips contain whole genomes, that is, >50,000 probe sets
      • A probe set (usually written probeset) is used to measure the mRNA levels of a unique gene
      • Each probe set is made up of multiple probe cells
        – with millions of copies of one oligo (25 bp)
        – organized in probe pairs with
          • a Perfect Match (PM): matches perfectly with a piece of a gene
          • a Mismatch (MM): the same as the PM but with the central nucleotide changed to its complement
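
The relation between a PM probe and its MM partner can be written down directly. The sketch below follows the definition on the slide (complement the central base of the oligo). The function name `mismatch_probe` and the example 25-mer are made up for illustration.

```python
# Watson-Crick complements for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(perfect_match):
    """Return the MM probe: the PM probe with its central base complemented.

    For a 25-mer the central base is the 13th one (0-based index 12).
    """
    middle = len(perfect_match) // 2
    return (
        perfect_match[:middle]
        + COMPLEMENT[perfect_match[middle]]
        + perfect_match[middle + 1:]
    )


# A made-up 25-mer used only to exercise the function.
pm = "ACGT" * 6 + "A"
mm = mismatch_probe(pm)
print(pm)
print(mm)
```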

  38. In situ Chips (4)
      GeneChip® expression array design

  39. In situ Chips (5)
      One gene, one probe set
      • Probes are selected to be specific to the represented gene
      • They must have good hybridization properties
      [Figure: probes of a probe set positioned along the gene sequence]

  40. In situ Chips (6)
      Synthesis of oligos on the chip (1)
      • GeneChip® probe arrays are manufactured through a unique and robust process, a combination of photolithography and combinatorial chemistry
      Image courtesy of Affymetrix

  41. In situ Chips (7)
      Synthesis of oligos on the chip (2)
      [Figure: successive photolithographic masks controlling which nucleotides are added to the growing oligos on a GeneChip]
      Image from a course by Dan Nettleton

  42. In situ Chips (8)
      Synthesis of oligos on the chip (3)
      • Several copies of a single feature are deposited in each cell
      Image courtesy of Affymetrix

  43. In situ Chips (9)
      Sample preparation

  44. In situ Chips (8)
      Hybridization process
      • Once the oligos have been synthesized, hybridization is performed by adding mRNA from the tissue to be analyzed onto the chip
      Image courtesy of Affymetrix

  45. In situ Chips (9)
      Scanning Images
      • Scanning of tagged and un-tagged probes on an Affymetrix GeneChip® microarray
      Image courtesy of Affymetrix

  46. In situ Chips (10)
      Output Image
      • Data from an experiment showing the expression of thousands of genes on a single GeneChip® probe array
      Image courtesy of Affymetrix

  47. In situ Chips (11)
      Quantification
      • Intensities from each element are extracted
      • Quantitative analysis of the hybridization results is performed by analyzing the hybridization pattern of the set of PM and MM probes of every gene
      • In contrast with spotted chips, the expression measures used here are absolute ones. That is, each chip is hybridized with only one tissue at a time

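  48. In situ Chips (12)
      Absolute expression measures
      • Measures to determine the quantitative RNA abundance, i.e. the expression level, based on the average of the differences PM minus MM for each probe family:
        $\text{Avg.Diff} = \frac{1}{|A|} \sum_{j \in A} (PM_j - MM_j)$
        where $A$ is the set of probe pairs in the probe family
      • Many alternatives have been introduced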
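
As a small worked example of the average-difference measure, the sketch below computes it for one probe set. The function name `average_difference` and the intensity values are hypothetical, and the set A is taken to be all probe pairs passed in (no outlier trimming).

```python
def average_difference(pm_intensities, mm_intensities):
    """Mean of PM minus MM over the probe pairs of one probe set."""
    if not pm_intensities or len(pm_intensities) != len(mm_intensities):
        raise ValueError("need the same non-zero number of PM and MM values")
    differences = [pm - mm for pm, mm in zip(pm_intensities, mm_intensities)]
    return sum(differences) / len(differences)


# Made-up intensities for a probe set with four probe pairs.
pm_values = [1200.0, 900.0, 1500.0, 1100.0]
mm_values = [300.0, 400.0, 500.0, 200.0]
avg_diff = average_difference(pm_values, mm_values)
print(avg_diff)
```

Here the four differences are 900, 500, 1000 and 900, so the measure is 3300 / 4 = 825. Note that the value can become negative when MM signals exceed PM signals, which is one reason alternatives were proposed.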

  49. Spotted vs In situ Arrays
      PROs and CONs
      • cDNA microarrays
        – PROs
          • Cheaper
          • Flexibility with the experimental design
          • High signal intensity (large sequences)
        – CONs
          • Low reproducibility
          • Cross-hybridization (low specificity)
          • High manipulation (possibility of contamination)
      • Oligo microarrays
        – PROs
          • Quick manufacture (automated)
          • High reproducibility
          • High specificity
          • A lot of probes/genes
        – CONs
          • Requires more specialized equipment
          • Expensive
          • Low flexibility

  50. In situ Chips (13)
      Overview of the process
      A movie of the whole process is available here
      Image courtesy of Affymetrix