Study of Arabidopsis’ Copper Regulation by High Throughput Sequence Data Analysis
Steven A. Cardenas, SoCal BSI
Dr. Pellegrini, PI, UCLA
Dr. Casero Diaz-Cano, Post Doc, UCLA
Objective of Project
• Analyze Sets of Differentially Expressed Genes in Plus and Minus Copper Conditions for Arabidopsis WT
• Identify Spl7 Regulated Genes
• Identify Potential Upstream Motifs That Regulate the Genes
Project Significance
• To Further the Development of Techniques Used in High Throughput Analysis
• The Study of Copper Regulation in Arabidopsis
• These Data Could Be Used to Help Increase Our Understanding of Copper Regulation in the Human Body
Outline of Presentation
• Arabidopsis thaliana
• Tools Used
• Solexa Sequencing
• Low Level Data Analysis
• Downstream Data Analysis
• Future Work
Arabidopsis thaliana
• A Small Flowering Plant Related to Cabbage and Mustard
• Found in Europe, Asia, and Northwestern Africa
• First Plant Genome to Be Sequenced, and It Is Well Annotated
Image: http://www.steve.gb.com/images/science/arabidopsis_thaliana.jpg
Tools Used
• TAIR (www.arabidopsis.org)
• SOAP (http://soap.genomics.org.cn)
• MATLAB (www.mathworks.com)
• Excel (www.microsoft.com)
Also credited on the slide: www.pythonwin.org
Solexa Sequencing
1. Prepare Genomic DNA Sample
2. Attach DNA to Flow Cell Surface
3. Amplification
4. Determine First Base
5. Image First Base
6. Determine Second Base
7. Sequence Reads Over Multiple Chemistry Cycles
Source: http://seqanswers.com/forums/showthread.php?t=21
Illumina mRNA Sample Preparation by Whole Transcriptome Analysis (WTA)
1. mRNA (AAAA)
2. Metal Catalyzed Fragmentation (60 – 200 nt)
3. Random Hexamer Primed 1st Strand cDNA Synthesis
4. 2nd Strand cDNA Synthesis
5. End Repair and Adaptor Ligation
6. Size Selection (200 bp)
7. PCR
8. Sequence (> 250 – 500 Mb; 33 nt Sequence)
Experimental Conditions of Analyzed Data
• Arabidopsis Wild Type: +Cu and -Cu; Root Cell and Shoot Cell
• Arabidopsis Spl7 Mutant: +Cu and -Cu; Root Cell and Shoot Cell
Data Analysis
1. Inputs: Solexa Data and TAIR Refseq
2. SOAP: Align Data
3. MATLAB:
  • Calculate Hits per Gene
  • Normalize
  • Regularize
  • Check for Reproducibility
  • Differentially Expressed Gene Statistical Analysis
  • Spl7 Motif Statistical Analysis
4. Excel: Spreadsheet of Results
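The "Calculate Hits per Gene", "Normalize", and "Regularize" steps above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the slides do not show the actual MATLAB code, normalization is assumed to mean scaling to alignment hits per million (the unit used on the reproducibility slide), and "regularize" is assumed here to mean adding a small pseudocount. The gene names and counts are made up.

```python
def hits_per_million(hits_per_gene):
    """Scale raw alignment hits so that the sample sums to one million."""
    total = sum(hits_per_gene.values())
    if total == 0:
        raise ValueError("sample has no aligned hits")
    return {gene: 1e6 * n / total for gene, n in hits_per_gene.items()}


def regularize(normalized, pseudocount=1.0):
    """Add a pseudocount so genes with zero hits stay usable in ratios and logs.

    Assumption: the slides do not define "Regularize"; a pseudocount is one
    common choice, not necessarily the one used in the project.
    """
    return {gene: value + pseudocount for gene, value in normalized.items()}


# Hypothetical raw hit counts for one sample.
raw = {"geneA": 150, "geneB": 50, "geneC": 0, "geneD": 800}
hpm = hits_per_million(raw)
reg = regularize(hpm)
```

Scaling each sample to the same total makes hits comparable across lanes with different sequencing depth, which the later replicate comparison depends on.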
Data Reproducibility
[Figure: Arabidopsis WT Root Cell, Minus Copper Condition. Axes: Replicate 1 (Alignment Hits per Million) and Replicate 2 (Alignment Hits per Million)]
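One simple way to put a number on the replicate comparison above is a Pearson correlation of log-transformed hits per million. This is a sketch only: the slides do not say which reproducibility measure was used, and the pseudocount, function names, and values below are assumptions for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


def log_correlation(rep1, rep2, pseudocount=1.0):
    """Correlate log2 hits per million over genes present in both replicates."""
    genes = sorted(set(rep1) & set(rep2))
    a = [math.log2(rep1[g] + pseudocount) for g in genes]
    b = [math.log2(rep2[g] + pseudocount) for g in genes]
    return pearson(a, b)


# Hypothetical hits-per-million values for two replicates.
rep1 = {"geneA": 120.0, "geneB": 15.5, "geneC": 0.0, "geneD": 980.0}
rep2 = {"geneA": 131.0, "geneB": 12.0, "geneC": 0.5, "geneD": 1010.0}
r = log_correlation(rep1, rep2)
```

Working on a log scale keeps a handful of very highly expressed genes from dominating the correlation, which matters when the data span a large dynamic range.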
Statistical Analysis for Differential Expression
• Differential Expression of Genes in Plus Copper vs. Minus Copper
• Statistical Problems
  • Only Two Replicates
  • Large Dynamic Range of Data
Statistical Analysis for Differential Expression
• Student’s T-test
  • Fails With Large Dynamic Range
• Bayesian T-test
  • Makes Use of Genes With Similar Expression Levels
  • Still Fails With Large Dynamic Range
• Binomial Test
  • Combined Replicates
  • Fails When Reproducibility Is Bad
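The binomial test on combined replicates can be sketched as follows. The exact formulation used in the project is not given on the slides; this version assumes that, under the null hypothesis, a gene's hits split between the plus and minus copper samples in proportion to the total hits in each sample. All function names and counts are hypothetical.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials (0 < p < 1)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def binomial_two_sided(k, n, p):
    """Exact two-sided p-value: total probability of outcomes no more likely
    than the observed one."""
    log_obs = log_binom_pmf(k, n, p)
    total = 0.0
    for i in range(n + 1):
        log_pi = log_binom_pmf(i, n, p)
        if log_pi <= log_obs + 1e-9:  # small tolerance for rounding error
            total += math.exp(log_pi)
    return min(1.0, total)


def differential_expression_pvalue(hits_plus, hits_minus, total_plus, total_minus):
    """Test one gene, given its combined-replicate hits in each condition and
    the total hits of each condition (assumed null: split follows totals)."""
    p_null = total_plus / (total_plus + total_minus)
    return binomial_two_sided(hits_plus, hits_plus + hits_minus, p_null)


# Hypothetical gene: 30 hits in +Cu, 90 hits in -Cu, equal sequencing depth.
p_value = differential_expression_pvalue(30, 90, 1_000_000, 1_000_000)
```

Because the replicates are pooled before testing, this approach has no view of replicate-to-replicate variation, which is consistent with the slide's note that it fails when reproducibility is bad. For very highly expressed genes the p-value can also underflow to 0 in floating point, so its logarithm becomes -inf.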
Top Differentially Expressed Genes with Binomial Test
Min:
• Bayesian: -13.87
• Binomial: -inf
• Student T-test: -5.63
Motifs Analysis: The First Approach
1. Select Potential Targets of Transcription Factor SPL7
2. Retrieve Promoter Sequences From the Genome
3. Calculate Word Count for SPL7 Motif
4. Statistical Test: Background Distribution Derived From Word Counts in the Whole Genome
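The word-count approach above might be sketched like this. The slides do not name the motif or the statistical test, so two assumptions are made and should be read as such: the word GTAC is used as an illustrative SPL7 motif, and the background distribution is approximated by repeatedly sampling random gene sets of the same size from all promoters. The promoter sequences and gene names are made up.

```python
import random


def count_word(sequence, word):
    """Count occurrences of word in sequence, including overlapping ones."""
    sequence, word = sequence.upper(), word.upper()
    width = len(word)
    return sum(1 for i in range(len(sequence) - width + 1)
               if sequence[i:i + width] == word)


def empirical_enrichment(promoters, targets, word, n_samples=2000, seed=0):
    """Compare the word count in target promoters with random gene sets.

    Returns the observed total count and an empirical p-value: the fraction
    of random sets (same size as targets) whose count is at least as large.
    """
    counts = {gene: count_word(seq, word) for gene, seq in promoters.items()}
    observed = sum(counts[gene] for gene in targets)
    rng = random.Random(seed)
    genes = sorted(counts)
    at_least = 0
    for _ in range(n_samples):
        sample = rng.sample(genes, len(targets))
        if sum(counts[gene] for gene in sample) >= observed:
            at_least += 1
    # Add one to numerator and denominator so the p-value is never exactly 0.
    return observed, (at_least + 1) / (n_samples + 1)


# Hypothetical promoters: three targets carry the motif, the rest do not.
promoters = {f"gene{i}": "ATATATATAT" for i in range(20)}
for name in ("gene0", "gene1", "gene2"):
    promoters[name] = "GTACAAGTACTTGTAC"
targets = ["gene0", "gene1", "gene2"]
observed, p_value = empirical_enrichment(promoters, targets, "GTAC")
```

Sampling gene sets from the whole genome gives a background that automatically reflects how common the word is in real promoters, at the cost of a p-value resolution limited by the number of samples.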
Future Work
• Research New Statistical Methods to Better Identify Differentially Expressed Genes
• Use of Non-Fixed Window for Bayesian T-test
• Finish Analysis of Motifs That Regulate the Differentially Expressed Genes
• Identify Transcribed Non-Coding RNAs (e.g. microRNAs)
Acknowledgements
• UCLA and the Pellegrini Lab (www.ucla.edu)
  • Dr. Matteo Pellegrini
  • Dr. David Casero Díaz-Cano
  • Dr. Shawn Cokus
• Collaborators
  • Ute Krämer, University of Heidelberg, Germany
  • Sabeeha Merchant, University of California Los Angeles
• SoCalBSI Instructors and Fellow Researchers (http://instructional1.calstatela.edu/jmomand2/index.html)
  • Dr. Jamil Momand
  • Dr. Sandy Sharp
  • Dr. Nancy Warter-Perez
  • Dr. Wendie Johnston
  • Dr. Beverly Krilowicz
  • Dr. Silvia Heubach
  • Dr. Jennifer Faust
• Funding
  • National Institutes of Health
  • National Science Foundation
  • Economic & Workforce Development
  • The Department of Energy