
Carnegie Institution for Science, Department of Plant Biology


Presentation Transcript


  1. Carnegie Institution for Science, Department of Plant Biology

  2. Putting TAIR to work for you: Tips and Techniques for Accessing Arabidopsis Data for Plant Biology Research Kate Dreher TAIR, AraCyc, PMN Carnegie Institution for Science

  3. Overview • Part I: Presentation (with exercises) • Finding a specific gene of interest in TAIR • Looking at the data on the locus, gene model, and protein pages • Getting to know GBrowse • Creating and enhancing customized data sets • Tips for working with Arabidopsis • Part II: Practice problems and individual help • Hand-outs with practice problems to work on • Questions from participants • Individual help • All documents are available in electronic form: • Resource guide • Questions, answers, and practice data • “Bienvenidos a TAIR” presentation and this presentation

  4. What is TAIR? • The Arabidopsis Information Resource (TAIR) maintains a database of genetic and molecular biology data for the model plant Arabidopsis • www.arabidopsis.org • Curators and programmers at TAIR: • Collect, store, and organize Arabidopsis data • Attach functional information to genes • Improve gene structures • Provide tools to analyze data • Work with the ABRC to provide seeds and clones

  5. Tips and Techniques for Accessing Arabidopsis Data • Finding the gene you want • Case 1: You have a non-Arabidopsis gene and want to find its homolog • http://www.ncbi.nlm.nih.gov/nuccore/148189857?report=genbank • Case 2: You know exactly what Arabidopsis gene you want • You know the AGI locus code (e.g. At2g46990) • You know the gene symbol (e.g. PhyA)
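When a list of candidate identifiers mixes AGI locus codes and gene symbols, a quick format check can tell you which search to run. The sketch below is only an illustration: it assumes the usual AGI layout of "AT", a chromosome label (1 to 5, or C/M for the chloroplast and mitochondrial genomes), "G", and five digits. The function name `normalize_agi` is hypothetical and not part of any TAIR tool.

```python
import re

# Assumed AGI locus layout: AT + chromosome (1-5, C, M) + G + five digits.
AGI_LOCUS = re.compile(r"^AT([1-5CM])G(\d{5})$", re.IGNORECASE)


def normalize_agi(identifier):
    """Return the upper-case locus code, or None if the text is not an AGI code."""
    match = AGI_LOCUS.match(identifier.strip())
    if match is None:
        return None  # probably a gene symbol such as PhyA; search by name instead
    chromosome, number = match.groups()
    return f"AT{chromosome.upper()}G{number}"


# Both examples come from the slide above.
locus_example = normalize_agi("At2g46990")
symbol_example = normalize_agi("PhyA")
```

Anything that returns `None` here is best treated as a gene symbol and looked up through the name search rather than the locus search.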

  6. Finding a gene: practice problems • You are reading a paper about an interesting phenotype caused by a mutation in the AN gene. • Find the AGI locus code of this gene • You find an EST that is expressed at high levels in the seed of your Phaseolus vulgaris variety: GenBank: AB304457 • (To find gene in GenBank – google “NCBI” and you should find the page) • Find the AGI locus codes of the top three hits in TAIR using BLAST • Is it the same if you BLAST with the transcript or the protein? • Based on the transcript • Based on the protein

  7. Finding a gene: practice problems • You are reading a paper about an interesting phenotype caused by a mutation in the AN gene. • Find the AGI locus code of this gene • AT1G01510 (a.k.a. ANGUSTIFOLIA) • You find an EST that is expressed at high levels in the seed of your Phaseolus vulgaris variety: GenBank: AB304457 • Find the AGI locus codes of the top three hits in TAIR using BLAST • Is it the same if you BLAST with the transcript or the protein? • Based on the transcript • AT1G14920.1 | Symbols: GAI, RGA2 | GAI (GIBBERELLIC ACID IN... 62 3e-08 • AT3G03450.1 | Symbols: RGL2 | RGL2 (RGA-LIKE 2); transcript... 44 0.007 • AT2G01570.1 | Symbols: RGA1, RGA | RGA1 (REPRESSOR OF GA1-3... 44 0.007 • Based on the protein • AT1G14920.1 | Symbols: GAI, RGA2 | GAI (GIBBERELLIC ACID IN... 647 0.0 • AT2G01570.1 | Symbols: RGA1, RGA | RGA1 (REPRESSOR OF GA1-3... 632 0.0 • AT1G66350.1 Symbols: RGL1, RGL | RGL1 (RGA-LIKE 1
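To compare the transcript and protein results without reading the hit lists by eye, the summary lines can be split into fields. This is a minimal sketch that assumes each line has the shape shown on the slide (gene model, symbols, truncated description, then score and E-value as the last two tokens). The helper name `parse_hit` is made up for this example. The third protein hit is cut off in the slide, so it is left out rather than guessed.

```python
def parse_hit(line):
    """Split one BLAST summary line (format assumed from the slide) into fields."""
    model_field, symbols_field, rest = line.split("|")
    gene_model = model_field.strip()
    symbols = [s.strip() for s in symbols_field.replace("Symbols:", "").split(",")]
    # The last two whitespace-separated tokens are the bit score and the E-value.
    *_, score, evalue = rest.split()
    return {
        "gene_model": gene_model,
        "locus": gene_model.split(".")[0],
        "symbols": symbols,
        "score": float(score),
        "evalue": float(evalue),
    }


# Hit lines copied from the slide.
transcript_hits = [
    "AT1G14920.1 | Symbols: GAI, RGA2 | GAI (GIBBERELLIC ACID IN... 62 3e-08",
    "AT3G03450.1 | Symbols: RGL2 | RGL2 (RGA-LIKE 2); transcript... 44 0.007",
    "AT2G01570.1 | Symbols: RGA1, RGA | RGA1 (REPRESSOR OF GA1-3... 44 0.007",
]
protein_hits = [
    "AT1G14920.1 | Symbols: GAI, RGA2 | GAI (GIBBERELLIC ACID IN... 647 0.0",
    "AT2G01570.1 | Symbols: RGA1, RGA | RGA1 (REPRESSOR OF GA1-3... 632 0.0",
]

transcript_loci = {parse_hit(h)["locus"] for h in transcript_hits}
protein_loci = {parse_hit(h)["locus"] for h in protein_hits}
shared_loci = transcript_loci & protein_loci
```

Here `shared_loci` holds the loci reported by both searches, which makes it easy to see that the top hit is the same while the remaining ranks differ.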

  8. Choosing the proper search result • Gene Model • Protein • Locus

  9. The Locus page: Lots of information

  10. Looking at the Locus page: practice problems 1 • You’re interested in learning more about a gene called: • PMR2 (Powdery Mildew Resistant 2) • What is its AGI locus code? • How many splice variants does it have? • Which one has the shorter coding region? • What is another name for this gene? • What is the evidence for it being involved in the “defense response to fungus, incompatible interaction?” • How many total loci are annotated to this term? • Which paper provides experimental evidence that PMR2 is located in the plasma membrane? • What is the title of that paper?

  11. Looking at the locus page: practice problems 1 • You’re interested in learning more about a gene called PMR2: • Powdery Mildew Resistant 2 • What is its AGI locus code? • At1g11310 • How many splice variants does it have? • 2 • Which one has the shorter coding region? • At1g11310.2 • What is another name for this gene? • Mildew Resistant Locus 2 (MLO2) • What is the evidence for it being involved in the “defense response to fungus, incompatible interaction?” • Inferred from Mutant Phenotype; analysis of visible trait; Consonni 2005 • How many total loci are annotated to this term? • 44 • Which paper provides experimental evidence that PMR2 is located in the plasma membrane? • Benschop 2007 • What is the title of that paper? • Quantitative phospho-proteomics of early elicitor signalling in Arabidopsis.
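The splice-variant answer relies on the naming convention visible in the answers above: a gene model name is the locus code followed by a dot and a variant number (At1g11310.1, At1g11310.2). As a rough sketch, assuming every name in a list follows that convention, gene models can be grouped by locus to count variants. The function name `group_by_locus` is hypothetical.

```python
from collections import defaultdict


def group_by_locus(gene_models):
    """Map each locus code to the list of variant numbers seen for it."""
    variants = defaultdict(list)
    for model in gene_models:
        # "At1g11310.2" -> locus "AT1G11310", variant number 2
        locus, _, suffix = model.upper().partition(".")
        variants[locus].append(int(suffix))
    return dict(variants)


# Gene model names taken from the practice-problem slides.
models = ["At1g11310.1", "At1g11310.2", "AT1G14920.1"]
grouped = group_by_locus(models)
variant_count = len(grouped["AT1G11310"])
```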

  12. Looking at the locus page: practice problems 2 • You’re interested in learning more about a gene called: • PMR2 (Powdery Mildew Resistant 2) • How many cDNAs are associated with this locus? • Which are available to order from the ABRC? • What is the length of the full-length coding region? • What is the isoelectric point of the protein? • For the PERL0025782 polymorphism, what is the nucleotide difference between the Col and Bor-4 ecotypes?

  13. Looking at the locus page: practice problems 2 • You’re interested in learning more about a gene called: • PMR2 (Powdery Mildew Resistant 2) • How many cDNAs are associated with this locus? • 3 • Which are available to order? • none • What is the length of the full-length coding region? • 1722 bp • What is the isoelectric point of the protein? • 9.8492 • For the PERL0025782 polymorphism, what is the nucleotide difference between the Col-0 and Bor-4 ecotypes? • Col

  14. Looking at the locus page: practice problems 3 • You’re interested in learning more about a gene called: • PMR2 (Powdery Mildew Resistant 2) • Does the pmr2-1 mutant form lesions in response to powdery mildew attack? • What is the putative location of the T-DNA insertion in mlo2-6? • What is the ecotype of SAIL_878_H12? • How many publications are available for this gene for 2007? • Which paper also mentions the PMR3 gene? • How many papers mention the “mlo2” allele/ mutant when you do a Textpresso search?

  15. Looking at the locus page: practice problems 3 • You’re interested in learning more about a gene called: • PMR2 (Powdery Mildew Resistant 2) • Does the pmr2-1 mutant form lesions in response to powdery mildew attack? • no • What is the putative location of the T-DNA insertion in mlo2-6? • intron • What is the ecotype of SAIL_878_H12? • Col-0 • How many articles and how many abstracts are available for this gene for 2007? • 2 abstracts, 1 article • Which paper also mentions the PMR3 gene? • Isolation and characterization of powdery mildew-resistant Arabidopsis mutants • PNAS 2000 • How many papers mention the “mlo2” allele/ mutant when you do a Textpresso search? • 8

  16. Locus page links: practice problems • You’re interested in learning more about a gene called: • PMR2 (Powdery Mildew Resistant 2) • According to the Genevestigator Gene Atlas, which organ has the highest level of expression? • According to the Genevestigator Response viewer, was the level of PMR2 transcript higher 1 hr or 4 hrs after treatment with the fungal elicitor FL22? • According to the eFP site, are the absolute levels of PMR2 expression higher in the root or the shoot of a seedling, 6 hours after a cold treatment? • In the SUBA database, where does the MS/MS data indicate that this protein is located? • According to InParanoid, how many poplar genes fall into the same group? • On the ATTED-II page, how many genes are directly linked to PMR2 by co-expression analysis, and which has the strongest correlation?

  17. Locus page links: practice problems • You’re interested in learning more about a gene called: • PMR2 (Powdery Mildew Resistant 2) • According to the Genevestigator Gene Atlas, which organ has the highest level of expression? • senescent rosette leaf • According to the Genevestigator Response viewer, was the level of PMR2 transcript higher 1 hr or 4 hrs after treatment with the fungal elicitor FL22? • It is higher 1 hour after treatment • According to the eFP site, are the absolute levels of PMR2 expression higher in the root or the shoot of a seedling, 6 hours after a cold treatment? • They are higher in the root • In the SUBA database, where does the MS/MS data indicate that this protein is located? • plasma membrane • According to InParanoid, how many poplar genes fall into the same group? • 2 • On the ATTED-II page, how many genes are directly linked to PMR2 by co-expression analysis, and which has the strongest correlation? • 5; At2g44180 is the strongest

  18. Do we need anything besides the locus, gene model, and protein pages?

  19. How many Papaya genes are found in the same cluster as PMR2 in Phytozome? • How many Vitis vinifera genes?

  20. Basic navigation and tools in GBrowse • Use controls to zoom and scroll along the chromosome • Enter locus, marker, etc. • Get sequence • Note: many tracks now contain data from the TAIR9 release on Monday, June 22

  21. GBrowse = Gobs of Information

  22. GBrowse: practice problems • How many papaya homologs are displayed from Phytozome? And how many amino acids are in the putative ortholog that has the Mlo domain? • There are two regulatory regions located upstream of this gene. Which one has been linked to a cis element in rice? • Which of the following has a longer transcript assembly aligning with PMR2? • Saccharum officinarum or Triticum aestivum? • Solanum tuberosum or Vitis vinifera? • Are there any experimentally supported phosphorylation sites? • What polymorphism appears to occur in the 5th intron? • Is there peptide support for the third exon? the fourth exon? the fifth exon? And which gene model is supported by peptide evidence? • Which exon structure seems to be better supported by the Brassica cDNA? by the radish clones?

  23. GBrowse: practice problems • How many papaya homologs are displayed from Phytozome? And how many amino acids are in the putative ortholog that has the Mlo domain? • 2; 350 amino acids • There are two regulatory regions located upstream of this gene. Which one has been linked to a cis element in rice? • AtREG417 • Which of the following has a longer transcript assembly aligning with PMR2? • Saccharum officinarum or Triticum aestivum? Triticum aestivum • Solanum tuberosum or Vitis vinifera? Solanum tuberosum • Are there any experimentally supported phosphorylation sites? • Yes, from the motif: SVENYPSSPSPR • What polymorphism appears to occur in the 5th intron? • PERL0025787 • Is there peptide support for the third exon? the fourth exon? the fifth exon? And which gene model is supported by peptide evidence? • third: yes; fourth: no; fifth: yes; the At1g11310.1 model is supported • Which exon structure seems to be better supported by the Brassica cDNA? by the radish clones? • the At1g11310.1 model is better supported by both types of transcripts

  24. Sometimes, one gene isn’t enough . . . • Scientists often want to work with more than one gene or protein that are related through some common feature • TAIR (and the PMN) offer some basic tools to create and/or enhance these customized data sets

  25. Creating customized data sets • Data sets can be based on many different criteria: • Overall sequence alignment (DNA or protein) • Sequence motifs (DNA or protein) • Protein domains and biochemical properties • Gene/Protein “function” • Subcellular location • Molecular function • Biological process • Expression pattern • Biochemical pathway • Mapping region • Phenotype • Gene families • How do you generate these data sets?

  26. Creating data sets: practice problems • How many DNA stocks are associated with NPR1? Do any of them that are available from the ABRC have full-length cDNAs? • How many keywords contain the term “oxalate”? How many of them have been used to annotate Arabidopsis genes? • How many germplasms are associated with a “reduced seed set” phenotype? • How many genes encode proteins that are found in the “chloroplast stroma” based on a “direct assay”? • Try to get the calculated isoelectric points (pIs) for all the “chloroplast stroma” proteins and find the highest and lowest values. • How many proteins have the following domain “Gly-Arg-Ala-Asn-hydrophobic residue” (GRAN[hydrophilic])?

  27. Creating data sets: practice problems • How many DNA stocks are associated with NPR1? Do any of them that are available from the ABRC have full-length cDNAs? • 11; yes, the two stocks available from the ABRC have full-length cDNAs • How many keywords contain the term “oxalate”? How many of them have been used to annotate Arabidopsis genes? • 11 keywords; two have been used for Arabidopsis • How many germplasms are associated with a “reduced seed set” phenotype? • 68 • How many genes encode proteins that are found in the “chloroplast stroma” based on a “direct assay”? • 396 loci • Try to get the calculated isoelectric points (pIs) for all the “chloroplast stroma” proteins and find the highest and lowest values. • 4.25, 12.66 • How many proteins have the following domain “Gly-Arg-Ala-Asn-hydrophobic residue” (GRAN[hydrophilic])? • 32
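The motif question can also be tried offline on a downloaded protein set. The sketch below shows one way to express "Gly-Arg-Ala-Asn-hydrophobic residue" as a regular expression. Note the assumptions: the slide spells out "hydrophobic" but labels the bracket "hydrophilic", and this sketch follows the spelled-out version. The set of one-letter codes treated as hydrophobic is a choice made for this example, not TAIR's definition. The sequences are invented toy strings, not real Arabidopsis proteins.

```python
import re

# Assumption: residues counted as hydrophobic for this example only.
HYDROPHOBIC = "AVLIMFWY"
MOTIF = re.compile(f"GRAN[{HYDROPHOBIC}]")


def find_motif(proteins):
    """Return {name: [0-based start positions]} for sequences containing the motif."""
    hits = {}
    for name, sequence in proteins.items():
        positions = [m.start() for m in MOTIF.finditer(sequence.upper())]
        if positions:
            hits[name] = positions
    return hits


# Toy sequences, invented for illustration.
toy_proteins = {
    "toy_protein_1": "MSGRANLKK",     # GRAN followed by L (hydrophobic here)
    "toy_protein_2": "MSGRANDKK",     # GRAN followed by D (not in the set)
    "toy_protein_3": "GRANVAAGRANI",  # two occurrences
}
motif_hits = find_motif(toy_proteins)
```

Changing `HYDROPHOBIC` to a different residue set is the only edit needed to test another reading of the bracketed class, and the number of keys in `motif_hits` corresponds to the "how many proteins" count asked for in the exercise.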

  28. Putting TAIR to work for you • Use TAIR to find detailed information for a specific gene / protein • Locus page, gene model page, protein page • Many sections, many data types, many external links • GBrowse • Many tracks • Use TAIR to create and enhance customized data sets • Specific and Advanced Search pages • Motif analysis tools • FTP files with large data sets • Use TAIR for data visualization and “analysis” • GO categorization (TAIR) • OMICs viewer (PMN) • If you’re having trouble getting any information you want from TAIR . . .

  29. We are here to help: www.arabidopsis.org • Please use our data • Please use our tools • Please use TAIR to help improve your research on IMPORTANT plants! • Please contact us if we can be of any help! • Make an appointment to meet with me during my visit • (I can try to speak Spanish) curator@arabidopsis.org www.arabidopsis.org

  30. Thank you! TAIR, AraCyc, and the PMN Eva Huala (Director and Co-PI) Sue Rhee (PI and Co-PI) Current Curators: - Tanya Berardini (lead curator – functional annotation) - David Swarbreck (lead curator – structural annotation) - Peifen Zhang (Director and lead curator- metabolism) - A. S. Karthikeyan (curator) - Philippe Lamesch (curator) - Donghui Li (curator) - Rajkumar Sasidharan (curator) Recent Past Contributors: - Debbie Alexander (curator) - Christophe Tissier (curator) - Hartmut Foerster (curator) Tech Team Members: - Bob Muller (Manager) - Larry Ploetz (Sys. Administrator) - Raymond Chetty - Anjo Chi - Vanessa Kirkup - Cynthia Lee - Tom Meyer - Shanker Singh - Chris Wilks Metabolic Pathway Software: - Peter Karp and SRI group

