Tools in bioinformatics

Tools in bioinformatics. Fall 2009-10. Goals. Overview. To provide students with practical knowledge of bioinformatics tools and their application in research. Prerequisites. The course “Introduction to bioinformatics”


Presentation Transcript


  1. Tools in bioinformatics Fall 2009-10

  2. Goals Overview • To provide students with practical knowledge of bioinformatics tools and their application in research Prerequisites • The course “Introduction to bioinformatics” • Familiarity with topics in molecular biology (cell biology, biochemistry, and genetics) • Basic familiarity with computers & internet

  3. Administration Course website: http://ibis.tau.ac.il/intro_bioinfo/tools.html

  4. Administration Classes: A class will be given every two weeks. There are three class groups: Sunday 16:00-18:00, Monday 12:00-14:00, Monday 14:00-16:00. Location: Computer classroom Sherman 03

  5. Administration Teachers: • Nimrod Rubinstein rubi@post.tau.ac.il (Sundays) • Daiana Alaluf daianaal@post.tau.ac.il (Mondays I) • Osnat Penn penn@post.tau.ac.il (Mondays II) • Reception hours: email your instructor any question at any time or set up an appointment (Britania 405, 6409245)

  6. Requirements • Assignments – 50% of final grade (compulsory) • Assignments include classwork and homework: • Classwork is planned to be completed during the lesson and handed in at the end of it. It will be checked but not graded. • Homework should be handed in at the following lesson (two weeks after it is handed out). It will be checked and graded. • Final project – 50% of final grade • When emailing your instructor (a question, an assignment, or anything else), please state in the “Subject” field: “Tools in Bioinfo”, IDs, CW/HW number (if relevant)

  7. BIOINFORMATICS DATABASES

  8. What’s in a database? • Sequences – genes, proteins, etc. • Full genomes • Expression data • Structures • Annotation – information about genes/proteins: - function - cellular location - chromosomal location - introns/exons - phenotypes, diseases • Publications

  9. NCBI and Entrez • NCBI is one of the largest and most comprehensive databases; it belongs to the NIH (National Institutes of Health), the primary federal agency for conducting and supporting medical research in the USA • Entrez is the search engine of NCBI • Search for: genes, proteins, genomes, structures, diseases, publications, and more http://www.ncbi.nlm.nih.gov

  10. PubMed: NCBI’s database of biomedical articles Yang X, Kurteva S, Ren X, Lee S, Sodroski J. “Subunit stoichiometry of human immunodeficiency virus type 1 envelope glycoprotein trimers during virus entry into host cells”, J Virol. 2006 May;80(9):4388-95.

  11. Use fields! Yang[AU] AND glycoprotein[TI] AND 2006[DP] AND J virol[TA] For the full list of field tags: go to help -> Search Field Descriptions and Tags
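
The field-tag query above can also be assembled programmatically. The following is a minimal sketch; the helper name `build_query` is hypothetical and not part of any NCBI tool. It only joins value/tag pairs with AND and does not contact PubMed.

```python
def build_query(terms):
    """Join (value, tag) pairs into a field-tagged query string.

    Each pair becomes value[tag]; pairs are combined with AND.
    This helper is illustrative only and does no validation of tags.
    """
    return " AND ".join(f"{value}[{tag}]" for value, tag in terms)


# The example from the slide: author, title word, publication date, journal.
query = build_query([
    ("Yang", "AU"),
    ("glycoprotein", "TI"),
    ("2006", "DP"),
    ("J virol", "TA"),
])
print(query)
```

The resulting string can be pasted into the PubMed search box as is.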

  12. Example • Retrieve all publications in which the first author is Davidovich C and the last author is Yonath A

  13. Using limits Retrieve the publications of Yonath A, in the journals: Nature and Proc Natl Acad Sci U S A., in the last 5 years

  14. Google scholar http://scholar.google.com/

  15. GenBank: NCBI’s gene & protein database • GenBank is an annotated collection of all publicly available DNA sequences (and their amino-acid translations) • Holds ~106.5 billion bases in ~108.5 million sequence records (Oct. 2009)

  16. Searching NCBI for the protein human CD4 Search demonstration

  17. Using field descriptions, qualifiers, and boolean operators • Cd4[GENE] AND human[ORGN] or Cd4[gene name] AND human[organism] • List of field codes: http://www.ncbi.nlm.nih.gov/entrez/query/static/help/Summary_Matrices.html#Search_Fields_and_Qualifiers • Boolean operators: AND, OR, NOT • Note: do not use the field Protein name [PROT], only GENE!
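
The same kind of query can be encoded into a search URL. The sketch below assumes NCBI's E-utilities `esearch` endpoint, which is not mentioned in the slides; the function name `esearch_url` is hypothetical. It only builds the URL and makes no network request.

```python
from urllib.parse import urlencode

# Assumption: the public E-utilities esearch endpoint (not from the slides).
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term):
    """Return an esearch URL for the given database and query string."""
    # urlencode percent-escapes the brackets and encodes spaces as '+'.
    return ESEARCH_BASE + "?" + urlencode({"db": db, "term": term})


url = esearch_url("protein", "Cd4[GENE] AND human[ORGN]")
print(url)
```

Building the URL separately from fetching it keeps the query easy to inspect before it is sent.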

  18. This time we directly search in the protein database

  19. RefSeq • Subcollection of NCBI databases with only non-redundant, highly annotated entries (genomic DNA, transcript (RNA), and protein products)

  20. An explanation on GenBank records

  21. FASTA format • Header (ID/accession and description): > gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens] • Sequence: MNRGVPFRHLLLVLQLALLPAATQGKKVVLGKKGDTVELTCTASQKKSIQFHWKNSNQIKILGNQGSFLTKGPSKLNDRADSRRSLWDQGNFPLIIKNLKIEDSDTYICEVEDQKEEVQLLVFGLTANSDTHLLQGQSLTLTLESPPGSSPSVQCRSPRGKNIQGGKTLSVSQLELQDSGTWTCTVLQNQKKVEFKIDIVVLAFQKASSIVYKKEGEQVEFSFPLAFTVEKLTGSGELWWQAERASSSKSWITFDLKNKEVSVKRVTQDPKLQMGKKLPLHLTLPQALPQYAGSGNLTLALEAKTGKLHQEVNLVVMRATQLQKNLTCEVWGPTSPKLMLSLKLENKEAKVSKREKAVWVLNPEAGMWQCLLSDSGQVLLESNIKVLPTWSTPVQPMALIVLGGVAGLLLFIGLGIFFCVRCRHRRRQAERMSQIKRLLSEKKTCQCPHRFQKTCSPI • Save accession numbers for future use (makes searching quicker): RefSeq accession number: NP_000607.1
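
To make the header layout concrete, here is a minimal sketch that splits an NCBI-style FASTA header of the form shown on the slide into its identifiers, description, and organism. The function name `parse_ncbi_header` is hypothetical, and the code assumes the pipe-separated `tag|value` layout of this particular example; other header styles would need different handling.

```python
def parse_ncbi_header(header):
    """Split a 'gi|...|ref|...| description [organism]' FASTA header."""
    line = header.lstrip(">").strip()
    # Identifiers come first, separated from the description by a space.
    id_part, _, description = line.partition(" ")
    fields = id_part.strip("|").split("|")
    # Fields alternate between a tag (gi, ref) and its value.
    ids = dict(zip(fields[0::2], fields[1::2]))
    organism = None
    if description.endswith("]") and "[" in description:
        description, _, org = description.rpartition("[")
        organism = org.rstrip("]")
    return {
        "ids": ids,
        "description": description.strip(),
        "organism": organism,
    }


record = parse_ncbi_header(
    "> gi|10835167|ref|NP_000607.1| CD4 antigen precursor [Homo sapiens]"
)
print(record["ids"]["ref"])  # the RefSeq accession worth saving
```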

  22. Downloading

  23. Swissprot • A protein sequence database which strives to provide a high level of annotation regarding: * the function of a protein * domain structure * post-translational modifications * variants • One entry for each protein http://www.expasy.ch/sprot

  24. GenBank vs. Swissprot (figure panels: Swiss-Prot results, GenBank results)

  25. PDB: Protein Data Bank • Main database of 3D structures of macromolecules • Includes ~61,000 entries (proteins, nucleic acids, complex assemblies) • Is highly redundant http://www.rcsb.org

  26. Human CD4 in complex with HIV gp120 • PDB ID: 1G9M (figure labels: gp120, CD4)

  27. Accession Numbers

  28. GeneCards • All-in-one database of human genes (a project by the Weizmann institute) • Attempts to integrate as many as possible databases, publications, and all available knowledge http://www.genecards.org

  29. Organism specific databases • Model organisms have independent databases: HIV database http://hiv-web.lanl.gov/content/index

  30. Summary • General and comprehensive databases: NCBI, EMBL • Genome specific databases (to be discussed): UCSC, ENSEMBL • Highly annotated databases: - Human genes: GeneCards - Proteins: Swissprot, RefSeq - Structures: PDB

  31. As important: • Google (or any search engine)

  32. And always remember: • RT(F)M -Read the manual!!! (/help/FAQ)

  33. GO: Gene Ontology

  34. Gene Ontology • Strives to provide consistent descriptions of gene products obtained from different databases • GO annotations include three hierarchical ontologies of gene products: • cellular component(s) – the environment in which the gene product functions • biological process(es) – the biological program/pathway in which the gene product is involved • molecular function(s) – the elemental activities of the gene product • E.g., cytochrome c: • cellular components: mitochondrial matrix and mitochondrial inner membrane • biological processes: oxidative phosphorylation and induction of cell death • molecular functions: oxidoreductase activity

  35. AmiGO: the official GO browser

  37. Through NCBI

  39. Enrichment analysis • Reference set: N genes in total, K of them with function f • Query set: n genes in total, k of them with function f • Is k/n significantly greater than K/N?

  40. Statistical significance testing • Problem formulation: in a group of N genes there are K “special” ones. If we sample n genes out of N (without replacement) and find k “special” ones, could that be considered a random outcome? • Mathematically, we use the hypergeometric distribution to compute the probability of obtaining k or more “special” ones in a sample of n
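
The upper-tail hypergeometric probability described above can be computed directly from binomial coefficients. The following is a minimal sketch using only the standard library; the function name `hypergeom_pvalue` and the toy numbers are illustrative assumptions, not values from the course or from any study.

```python
from math import comb


def hypergeom_pvalue(N, K, n, k):
    """P(X >= k) when drawing n genes without replacement from N,
    of which K are "special" (upper tail of the hypergeometric)."""
    total = comb(N, n)       # all possible samples of size n
    upper = min(n, K)        # cannot draw more special genes than exist
    favorable = sum(
        comb(K, i) * comb(N - K, n - i)   # exactly i special genes
        for i in range(k, upper + 1)
    )
    return favorable / total


# Toy example (made-up numbers): 10 genes, 4 special, sample 3, observe 2.
p = hypergeom_pvalue(N=10, K=4, n=3, k=2)
print(p)
```

A small value of this probability suggests that the function is enriched in the query set relative to the reference set; in practice, testing many functions at once also calls for a multiple-testing correction.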

  41. Materials & Methods 21,121 siRNA knockdown assays, literally covering the entire coding-sequence part of the genome

  42. Results 273 HIV-dependency factors (HDFs) were discovered Biological processes
