1 / 18

CS 177 Hands-on lab with databases

CS 177 Hands-on lab with databases. Quiz #1 Summary: Nucleotide and protein databases Sequence formats Lab exercises. Quiz #1 Summary: Nucleotide and protein databases Sequence formats Lab exercises. Quiz #1. Homework #1 Quiz #1 Summary: Nucleotide and protein databases

iden
Télécharger la présentation

CS 177 Hands-on lab with databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS 177 Hands-on lab with databases Quiz #1 Summary: Nucleotide and protein databases Sequence formats Lab exercises Quiz #1 Summary: Nucleotide and protein databases Sequence formats Lab exercises

  2. Quiz #1 Homework #1 Quiz #1 Summary: Nucleotide and protein databases Sequence formats Lab exercises

  3. The International Nucleotide Sequence Database Collaboration NIH Entrez Sequin BankIt ftp NCBI GenBank • Submissions • Updates • Submissions • Updates EMBL DDBJ EBI CIB NIG • Submissions • Updates SRS EMBL getentry

  4. Primary vs. Derivative Databases ACGTGC Curators C C GA ATT GA GA C ATT GA C RefSeq TATAGCCG Sequencing Centers ACGTGC TATAGCCG AGCTCCGATA CCGATGACAA ATTGACTA CGTGA TTGACA Labs TTGACA TTGACA ACGTGC Genome Assembly TATAGCCG ACGTGC TATAGCCG ATTGACTA CGTGA CGTGA ATTGACTA TATAGCCG CGTGA ATTGACTA ATTGACTA TATAGCCG TTGACA ATTGACTA TATAGCCG TATAGCCG TATAGCCG TATAGCCG ATT C GenBank UniGene GA AT C C Algorithms ATT C C GA ATT GA GA ATT GA ATT GA ATT GA C GA C ATT GA C C

  5. The Entrez Databases

  6. The (ever) Expanding Entrez System Journals UniGene Books SNP PubMed UniSTS PubMed Central Nucleotide PopSet Protein ProbeSet Entrez Genome Structure Taxonomy CDD OMIM 3D Domains

  7. Genbank Search and retrieval of sequences Entrez is a retrieval system for searching several linked databases. It provides access to: PubMed; Nucleotide; Protein; Structure; Genome; PopSet; OMIM; Taxonomy and more. BLAST® (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA. Quiz #1 Summary: Nucleotide and protein databases Sequence formats Lab exercises

  8. BLAST selections Quiz #1 Summary: Nucleotide and protein databases Sequence formats Lab exercises

  9. GenBank format

  10. Fasta format

  11. Convertible in ReadSeq (Web based) http://bimas.dcrt.nih.gov/molbio/readseq/ or ForCon (stand-alone application) http://www.hgmp.mrc.ac.uk/embnet.news/vol6_1/ForCon/forcon.html NOTE: • FASTA is a popular sequence format • - it also is a sequence similarity and homology search tool (similar to BLAST) used by EMBL-EBI Sequence formats ASN.1 DNAStrider EMBL Fitch GCG GenBank/GB IG/Stanford MSF NBRF Olsen PAUP/NEXUS Pearson/Fasta Phylip PIR/CODATA Plain/Raw Pretty Zuker Quiz #1 Summary: Nucleotide and protein databases Sequence formats Lab exercises

  12. Lab exercises 1) How many sequences are available in GenBank for Neanderthals? Depends on your search strategy … 2) Go to Entrez nucleotide. Find all sequences for the following terms: neander Neanderthals Neanderthal neanderthal neanderthal* Homo sapiens neanderthalensis 1 0 1 1 6 6 2) Go to Entrez taxonomy. Try to find all sequences for Neanderthals! Quiz #1 Summary: Nucleotide and protein databases Sequence formats Lab exercises 6

  13. Lab exercises 4) How many nucleotide sequences are available for the house mouse Mus musculus? Try both Entrez nucleotides and Entrez taxonomy. How do you explain the difference? Entrez taxonomy Entrez nucleotides 5.403.701 5.458.506 (Mus musculus) 5.393.552 (house mouse) 5.458.527 (Mus musculsus OR house mouse) 5) A man is found murdered in Yellowstone National Park. Few hairs of unidentified origin are recovered on the victim’s clothes. The samples arrive in the lab and DNA is isolated and sequenced: CCATGCATATAAGCATGTACATAATATTATATTCTTACATAGGACATATTAACTCAATCTCATAATTCAT Formulate a hypothesis regarding the origin of the recovered hairs and potential links with the killing! Quiz #1 Summary: Nucleotide and protein databases Sequence formats Lab exercises Canis lupus (Gray Wolf)

  14. The Poliovirus Problem VOL 297, 9 August 2002 Cello, J; Paul, A.V. & Wimmer, E.: Chemical Synthesis of Poliovirus cDNA: Generation of Infectious Virus in the Absence of Natural Template - they generated about 7.7 kilobases of single-stranded RNA genome based on the know genetic map - DNA fragments were synthesized from purified oligo- nucleotides (average length 69: bases) - the cDNA was then transcribed into highly infectious RNA Quiz #1 Summary: Nucleotide and protein databases Sequence formats Lab exercises

  15. The Poliovirus Problem 17 July 2002 Weiss, R.: Mail-Order Molecules Brew a Terrorism Debate - mail-order oligonucleotides can be used to manufacture a deadly virus - because they are so small, most oligos lack a “fingerprint” - call for more control and/or institutional oversight Quiz #1 Summary: Nucleotide and protein databases Sequence formats Lab exercises

  16. The Poliovirus Problem Are these oligos so small that they lack a “fingerprint” ?? • - search in Genbank for nucleotide sequences of the poliovirus • - copy about 100 bp from a sequence of your choice and paste it into the search window of blastn, is the fragment identifiable as poliovirus? • - if so, do a blastn search with a 90 bp, 80 bp, 70 bp … fragment • what is the length of the shortest fragment still identifiable as poliovirus? • is this fragment shorter than the average length of 69 bp used to synthesize the poliovirus? • - do these oligos have a “fingerprint” (i.e. can ‘typical’ oligos with lengths of 20-50 be assigned to a particular organism)? Quiz #1 Summary: Nucleotide and protein databases Sequence formats Lab exercises

  17. Homework assignment lecture #4 • Explain in your own words and in simple termsthe basics of the BLAST tool! • assignment is due on 6 Oct 2003, 3:30 PM • send your assignment as e-mail attachment to mtmtxw@gwumc.edu • (type your name and the term “homework” in the subject line) • - maximum size: 500 words Quiz #1 Summary: Nucleotide and protein databases Sequence formats Lab exercises

More Related