
Study of Arabidopsis’ Copper Regulation by High Throughput Sequence Data Analysis

Steven A. Cardenas, SoCal BSI. Dr. Pellegrini, PI, UCLA. Dr. Casero Diaz-Cano, Post Doc, UCLA.


Presentation Transcript


  1. Study of Arabidopsis’ Copper Regulation by High Throughput Sequence Data Analysis
     Steven A. Cardenas, SoCal BSI
     Dr. Pellegrini, PI, UCLA
     Dr. Casero Diaz-Cano, Post Doc, UCLA

  2. Objective of Project
     • Analysis of Sets of Differentially Expressed Genes in Plus and Minus Copper Conditions For Arabidopsis WT
     • Identify Spl7 Regulated Genes
     • Potential Upstream Motifs That Regulate the Genes

  3. Project Significance
     • To Further the Development of Techniques Used in High Throughput Analysis.
     • The Study of Copper Regulation in Arabidopsis.
     • This Data Could Be Used to Help Increase Our Understanding of Copper Regulation in the Human Body.

  4. Outline of Presentation
     • Arabidopsis Thaliana
     • Tools Used
     • Solexa Sequencing
     • Low Level Data Analysis
     • Downstream Data Analysis
     • Future Work

  5. Arabidopsis Thaliana
     • A Small Flowering Plant Related to Cabbage and Mustard
     • Found in Europe, Asia, and Northwestern Africa
     • First Plant Genome to be Sequenced and it is Well Annotated
     Image: http://www.steve.gb.com/images/science/arabidopsis_thaliana.jpg

  6. Tools Used
     • TAIR (www.arabidopsis.org)
     • SOAP (http://soap.genomics.org.cn)
     • MATLAB (www.mathworks.com)
     • Python (www.pythonwin.org)
     • Excel (www.microsoft.com)

  7. Solexa Sequencing
     1. Prepare Genomic DNA Sample
     2. Attach DNA to Flow Cell Surface
     3. Amplification
     4. Determine First Base
     5. Image First Base
     6. Determine Second Base
     7. Sequence Reads Over Multiple Chemistry Cycles
     Source: http://seqanswers.com/forums/showthread.php?t=21

  8. Illumina mRNA Sample Preparation by Whole Transcriptome Analysis (WTA)
     • mRNA (AAAA)
     • Metal Catalyzed Fragmentation (60 – 200 nt)
     • Random Hexamer Primed 1st Strand cDNA Synthesis
     • 2nd Strand cDNA Synthesis
     • End Repair and Adaptor Ligation
     • Size Selection (200 bp)
     • PCR
     • Sequence (> 250 – 500 Mb, 33 nt sequence)

  9. Experimental Conditions of Analyzed Data
     • Arabidopsis Wild Type
       • +Cu and -Cu
       • Root Cell, Shoot Cell
     • Arabidopsis Spl7 Mutant
       • +Cu and -Cu
       • Root Cell, Shoot Cell

  10. Data Analysis
      • Solexa Data
      • SOAP: Align Data (TAIR Refseq)
      • MATLAB
        • Calculate Hits per Gene
        • Normalize
        • Regularize
        • Check For Reproducibility
        • Differentially Expressed Gene Statistical Analysis
        • Spl7 Motif Statistical Analysis
      • Excel: Spreadsheet of Results
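The "Calculate Hits per Gene" and "Normalize" steps on slide 10 can be sketched in Python as follows. This is only an illustration: the slides do not show the MATLAB implementation, so the function names, the `(read_id, gene_id)` input format, the gene labels, and the `pseudocount` parameter (a possible stand-in for the "Regularize" step) are all assumptions.

```python
from collections import Counter


def hits_per_gene(alignments):
    """Count aligned reads per gene from (read_id, gene_id) pairs."""
    return Counter(gene_id for _read_id, gene_id in alignments)


def hits_per_million(counts, pseudocount=0.0):
    """Scale raw per-gene counts to hits per million aligned reads.

    pseudocount is a hypothetical regularization term added to every
    gene's count before scaling; the slides do not define "Regularize".
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no aligned reads")
    return {gene: (n + pseudocount) * 1e6 / total for gene, n in counts.items()}


# Toy alignment output with placeholder read and gene labels.
alignments = [("r1", "geneA"), ("r2", "geneA"), ("r3", "geneB"), ("r4", "geneA")]
counts = hits_per_gene(alignments)
hpm = hits_per_million(counts)
print(hpm)
```

Normalizing to hits per million makes samples with different sequencing depths comparable, which is the unit used on the reproducibility slide.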

  11. Data Reproducibility
      Arabidopsis WT Root Cell, Minus Copper Condition
      [Figure: Replicate 2 (Alignment Hits per Million) vs. Replicate 1 (Alignment Hits per Million)]
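One common way to put a number on the reproducibility shown on slide 11 is a Pearson correlation of log-transformed hits per million between the two replicates. The sketch below assumes that approach; the slides do not say which measure was used, and the function name, the `pseudocount`, and the toy values are hypothetical.

```python
import math


def log_pearson(rep1, rep2, pseudocount=1.0):
    """Pearson correlation of log2(hits per million + pseudocount).

    rep1 and rep2 map gene -> hits per million; only genes present in
    both replicates are compared.
    """
    genes = sorted(set(rep1) & set(rep2))
    if len(genes) < 2:
        raise ValueError("need at least two shared genes")
    x = [math.log2(rep1[g] + pseudocount) for g in genes]
    y = [math.log2(rep2[g] + pseudocount) for g in genes]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a replicate has constant expression")
    return sxy / math.sqrt(sxx * syy)


# Toy values only, not data from the presentation.
replicate_1 = {"geneA": 10.0, "geneB": 100.0, "geneC": 1000.0}
replicate_2 = {"geneA": 12.0, "geneB": 95.0, "geneC": 1100.0}
r = log_pearson(replicate_1, replicate_2)
print(r)
```

The log transform keeps a few highly expressed genes from dominating the correlation, which matters given the large dynamic range noted on the following slides.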

  12. Statistical Analysis for Differential Expression
      • Differential Expression of Genes in Plus Copper vs. Minus Copper
      • Statistical Problems
        • Only Two Replicates
        • Large Dynamic Range of Data

  13. Statistical Analysis for Differential Expression
      • Student’s T-test
        • Fails With Large Dynamic Range
      • Bayesian T-test
        • Makes Use of Genes With Similar Expression Levels
        • Currently Still Fails With Large Dynamic Range
      • Binomial Test
        • Combined Replicates
        • Fails When Reproducibility is Bad
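A minimal sketch of a binomial test on combined replicates, as named on slide 13. It assumes the usual formulation: of all reads hitting a gene in both conditions, the number from the plus-copper sample follows a binomial distribution whose success probability is that sample's share of the total aligned reads. The slides do not give the exact formulation, so the function names and arguments here are hypothetical.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the binomial probability mass function, via lgamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def binomial_test(count_plus, count_minus, total_plus, total_minus):
    """Two-sided exact binomial p-value for one gene.

    count_*: reads for the gene in each condition (replicates summed).
    total_*: total aligned reads in each condition (both must be > 0).
    """
    n = count_plus + count_minus
    if n == 0:
        return 1.0
    p = total_plus / (total_plus + total_minus)
    log_observed = log_binom_pmf(count_plus, n, p)
    p_value = 0.0
    for k in range(n + 1):
        log_pmf = log_binom_pmf(k, n, p)
        # Sum every outcome at most as likely as the observed one.
        if log_pmf <= log_observed + 1e-9:
            p_value += math.exp(log_pmf)
    return min(1.0, p_value)


print(binomial_test(10, 0, 1000, 1000))
```

For genes with very large counts the p-value can underflow to 0.0 in floating point, so it is common to report log p-values instead. Summing replicates also hides any disagreement between them, which is consistent with the slide's warning that the test fails when reproducibility is bad.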

  14. Top Differentially Expressed Genes with Binomial Test
      Min:
      • Bayesian: -13.87
      • Binomial: -inf
      • Student T-test: -5.63

  15. Motifs Analysis: The First Approach
      • Select Potential Targets of Transcription Factor SPL7
      • Retrieve Promoter Sequences From the Genome
      • Calculate Word Count For SPL7 Motif
      • Statistical Test (Background Distribution Derived From Word Counts in the Whole Genome)
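The word-count approach on slide 15 can be sketched as below. This is one possible reading, not the presenters' method: it counts a motif word in the target promoters and compares the total with totals from random promoter sets drawn from the whole genome. The word `GTAC` is a placeholder (the slides do not state the motif), and the function names, resampling test, and toy sequences are assumptions.

```python
import random


def count_word(seq, word):
    """Count overlapping occurrences of word in seq on the given strand."""
    seq, word = seq.upper(), word.upper()
    return sum(1 for i in range(len(seq) - len(word) + 1)
               if seq.startswith(word, i))


def empirical_pvalue(target_promoters, genome_promoters, word,
                     n_samples=1000, seed=0):
    """Fraction of random promoter sets with at least the observed count.

    Each random set has the same size as the target set and is drawn
    without replacement from the genome-wide promoters, so the target
    set must not be larger than the genome-wide set.
    """
    rng = random.Random(seed)
    observed = sum(count_word(s, word) for s in target_promoters)
    background = [count_word(s, word) for s in genome_promoters]
    hits = 0
    for _ in range(n_samples):
        if sum(rng.sample(background, len(target_promoters))) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_samples + 1)


# Toy sequences only, not real promoters.
genome = ["GTACGTAC", "AAAAAAAA", "CCCCGGGG", "TTTTGTAC", "ACGTACGT", "GGGGGGGG"]
targets = ["GTACGTACGTAC", "GTACAAGTAC"]
p_value = empirical_pvalue(targets, genome, "GTAC")
print(p_value)
```

Only one strand is scanned here; a non-palindromic word would also need its reverse complement counted.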

  16. Future Work
      • Research New Statistical Methods to Better Identify Differentially Expressed Genes
      • Use of Non Fixed Window For Bayesian T-test
      • Finish Analysis of Motifs That Regulate the Differentially Expressed Genes
      • Identify Transcribed Non Coding RNAs (e.g. microRNAs)

  17. Acknowledgements
      • UCLA and the Pellegrini Lab (www.ucla.edu)
        • Dr. Matteo Pellegrini
        • Dr. David Casero Díaz-Cano
        • Dr. Shawn Cokus
      • Collaborators
        • Ute Krämer, University of Heidelberg, Germany
        • Sabeeha Merchant, University of California Los Angeles
      • SoCalBSI Instructors and Fellow Researchers (http://instructional1.calstatela.edu/jmomand2/index.html)
        • Dr. Jamil Momand
        • Dr. Sandy Sharp
        • Dr. Nancy Warter-Perez
        • Dr. Wendie Johnston
        • Dr. Beverly Krilowicz
        • Dr. Silvia Heubach
        • Dr. Jennifer Faust
      • Funding
        • National Institutes of Health
        • National Science Foundation
        • Economic & Workforce Development
        • The Department of Energy
