
Using Local Tools: BLAST


Presentation Transcript


  1. Using Local Tools: BLAST
      BCHB524 2008 Lecture 11
      BCHB524 - 2008 - Edwards

  2. Outline
      • Install and run blast from NCBI
        • Download
        • Format sequence databases
        • Run by hand
      • Running blast and interpreting results
        • Directly and using BioPython
      • Exercises
        • Lecture 9 exercises

  3. Local Tools
      • Sometimes web-based services don't do it.
      • For blast:
        • Too many query sequences
        • Need to search a novel sequence database
        • Need to change rarely used parameters
        • Web-service is too slow
      • For other tools:
        • No web-service?
        • No interactive web-site?
        • Insufficient back-end computational resources?

  4. Download standalone blast
      • In Windows, make a folder "BLAST" in your "My Documents" folder
      • Google "NCBI Blast"
        • …or go to http://www.ncbi.nlm.nih.gov/BLAST
      • Click on the "Help" tab
      • Under "Other BLAST Information", click on "Download BLAST Software and Databases"
      • From the table under "Executables", find the download link at row "win32-ia32" and column "blast"
      • Right-click on the download link and choose Save As…
      • Put the file in your new "BLAST" folder
      • In Windows, double-click on the downloaded file.

  5. Download standalone blast
      • Folders: bin, data, doc
      • Create folder: db

  6. Download standalone blast
      • Look in doc: Double-click to open web-page documentation
      • Download gunzip.py from course homepage into db

  7. Download BLAST databases
      • Follow the link (above Executables) for the NCBI BLAST database FTP site:
        • ftp://ftp.ncbi.nlm.nih.gov/blast/db/
      • The .tar.gz files contain databases already formatted for BLAST
      • The FASTA directory contains compressed (.gz) FASTA format sequence databases.
      • We'll download yeast.aa.gz and yeast.nt.gz to the db folder

  8. Download BLAST databases

  9. Uncompress FASTA databases
      • Select "Run…" from the "Start" menu
      • In the "Open" dialog box, type "cmd" and click OK

  10. Uncompress FASTA databases
      • cd My Documents
      • cd BLAST
      • cd db
      • dir

  11. Uncompress FASTA databases
      • gunzip.py yeast.*.gz
      • dir
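The course's gunzip.py script is not reproduced in this transcript. As a rough idea of what the uncompress step does, here is a minimal sketch using only Python's standard library. It is written for Python 3 (the slide code is Python 2), and the function names `gunzip_file` and `gunzip_matching` are hypothetical, not taken from the course script.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(gz_path):
    """Decompress 'name.gz' to 'name' in the same folder and return the new path."""
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError("expected a .gz file, got %s" % gz_path)
    out_path = gz_path.with_suffix("")  # yeast.aa.gz -> yeast.aa
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # copy in chunks rather than reading it all at once
    return out_path


def gunzip_matching(folder, pattern):
    """Decompress every file in 'folder' that matches a glob pattern such as 'yeast.*.gz'."""
    return [gunzip_file(path) for path in sorted(Path(folder).glob(pattern))]
```

Under these assumptions, `gunzip_matching("db", "yeast.*.gz")` would play the role of the `gunzip.py yeast.*.gz` command on the slide.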

  12. Format FASTA databases
      • cd ..
      • bin\formatdb.exe -i db\yeast.aa -p T -o T
      • bin\formatdb.exe -i db\yeast.nt -p F -o T
      • dir db
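The two formatdb commands differ only in the input file and the `-p` flag (T for protein, F for nucleotide). If you prefer to drive this step from a script, the sketch below builds the same argument lists in Python 3. The helper names `formatdb_command` and `run_formatdb` are made up for illustration, and the default executable path simply assumes you run from the BLAST folder as on the slide.

```python
import subprocess


def formatdb_command(fasta_path, is_protein, exe=r"bin\formatdb.exe"):
    """Build the formatdb argument list used on the slide.

    -i  input FASTA file
    -p  T for a protein database, F for a nucleotide database
    -o  T to parse sequence identifiers and build an index
    """
    return [exe, "-i", fasta_path, "-p", "T" if is_protein else "F", "-o", "T"]


def run_formatdb(fasta_path, is_protein):
    """Run formatdb; raises CalledProcessError if the program reports failure."""
    subprocess.run(formatdb_command(fasta_path, is_protein), check=True)
```

Passing the arguments as a list avoids quoting problems when a path contains spaces, such as "My Documents".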

  13. Download formatdb databases
      • The .tar.gz files contain databases already formatted for BLAST
      • Download to BLAST\db and use the gunzip.py program to uncompress and unpack
      • For example, download
        • refseq_protein.00.tar.gz and refseq_protein.01.tar.gz
      • Uncompress and unpack
        • gunzip.py refseq_protein.*.tar.gz
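A .tar.gz file is a gzip-compressed tar archive, so "uncompress and unpack" is two operations that Python's `tarfile` module can do in one pass. The following is a minimal Python 3 sketch of that idea, not the course's gunzip.py; `unpack_tar_gz` is a hypothetical name.

```python
import tarfile


def unpack_tar_gz(archive_path, dest_dir):
    """Uncompress and unpack a .tar.gz archive into dest_dir; return the member names."""
    # Mode "r:gz" reads the tar archive through gzip decompression.
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
        archive.extractall(dest_dir)  # only unpack archives from a source you trust
    return names
```

For the example on the slide, this would be called once per downloaded volume, with `dest_dir` set to the db folder.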

  14. Running BLAST from the command-line
      • We need a query sequence to search:
      • Copy and paste this FASTA file into notepad and save as "query.fasta" in the BLAST folder

          >gi|6319267|ref|NP_009350.1| Yal049cp
          MASNQPGKCCFEGVCHDGTPKGRREEIFGLDTYAAGSTSPKEKVIVILTDVYGNKFNNVLLTADKFASAGYMVFVPDILF
          GDAISSDKPIDRDAWFQRHSPEVTKKIVDGFMKLLKLEYDPKFIGVVGYCFGAKFAVQHISGDGGLANAAAIAHPSFVSI
          EEIEAIDSKKPILISAAEEDHIFPANLRHLTEEKLKDNHATYQLDLFSGVAHGFAARGDISIPAVKYAKEKVLLDQIYWF
          NHFSNV
          >gi|6319268|ref|NP_009351.1| Yal048cp
          MTKETIRVVICGDEGVGKSSLIVSLTKAEFIPTIQDVLPPISIPRDFSSSPTYSPKNTVLIDTSDSDLIALDHELKSADV
          IWLVYCDHESYDHVSLFWLPHFRSLGLNIPVILCKNKCDSISNVNANAMVVSENSDDDIDTKVEDEEFIPILMEFKEIDT
          CIKTSAKTQFDLNQAFYLCQRAITHPISPLFDAMVGELKPLAVMALKRIFLLSDLNQDSYLDDNEILGLQKKCFNKSIDV
          NELNFIKDLLLDISKHDQEYINRKLYVPGKGITKDGFLVLNKIYAERGRHETTWAILRTFHYTDSLCINDKILHPRLVVP
          DTSSVELSPKGYRFLVDIFLKFDIDNDGGLNNQELHRLFKCTPGLPKLWTSTNFPFSTVVNNKGCITLQGWLAQWSMTTF
          LNYSTTTAYLVYFGFQEDARLALQVTKPRKMRRRSGKLYRSNINDRKVFNCFVIGKPCCGKSSLLEAFLGRSFSEEYSPT
          IKPRIAVNSLELKGGKQYYLILQELGEQEYAILENKDKLKECDVICLTYDSSDPESFSYLVSLLDKFTHLQDLPLVFVAS
          KADLDKQQQRCQIQPDELADELFVNHPLHISSRWLSSLNELFIKITEAALDPGKNTPGLPEETAAKDVDYRQTALIFGST
          VGFVALCSFTLMKLFKSSKFSK
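Copy-and-paste errors in a FASTA file (a lost ">" or a stray blank line) are easy to miss, so it can help to check the saved file before searching. The reader below is a minimal Python 3 sketch written for this purpose; `read_fasta` is a hypothetical helper, not part of BLAST or BioPython.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-format lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # finish the previous record
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined without separators
    if header is not None:
        yield header, "".join(chunks)
```

Calling `list(read_fasta(open("query.fasta")))` on the file saved from the slide should give two records, one for Yal049cp and one for Yal048cp.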

  15. Running BLAST from the command-line
      • Run the BLAST command:
      • …and check out the result in query.txt.

  16. Interpreting blast results
      • Parsing text-format BLAST results is hard:
        • Use XML format output where possible (-m 7)
        • Use BioPython's BLAST parser

          from Bio.Blast import NCBIXML
          result_handle = open("query.xml")
          for blast_result in NCBIXML.parse(result_handle):
              for alignment in blast_result.alignments:
                  for hsp in alignment.hsps:
                      if hsp.expect < 1e-5:
                          print '****Alignment****'
                          print 'sequence:', alignment.title
                          print 'length:', alignment.length
                          print 'e value:', hsp.expect
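To see why XML output is easier to work with than text, it helps to look at the structure the parser walks. The sketch below applies the same e-value filter using only Python 3's standard `xml.etree.ElementTree`, on a hand-written and heavily trimmed fragment. The element names (`Hit`, `Hit_def`, `Hit_len`, `Hsp`, `Hsp_evalue`) follow NCBI's BLAST XML format, but the hit descriptions and numbers are made up, and real output contains many more fields.

```python
import xml.etree.ElementTree as ET

# Hand-written toy fragment; the values are invented for illustration.
SAMPLE_XML = """<BlastOutput>
 <BlastOutput_iterations>
  <Iteration>
   <Iteration_hits>
    <Hit>
     <Hit_def>toy hit A</Hit_def>
     <Hit_len>246</Hit_len>
     <Hit_hsps>
      <Hsp><Hsp_evalue>1e-50</Hsp_evalue></Hsp>
      <Hsp><Hsp_evalue>0.5</Hsp_evalue></Hsp>
     </Hit_hsps>
    </Hit>
    <Hit>
     <Hit_def>toy hit B</Hit_def>
     <Hit_len>100</Hit_len>
     <Hit_hsps>
      <Hsp><Hsp_evalue>3.2</Hsp_evalue></Hsp>
     </Hit_hsps>
    </Hit>
   </Iteration_hits>
  </Iteration>
 </BlastOutput_iterations>
</BlastOutput>"""


def significant_hsps(xml_text, max_expect=1e-5):
    """Return (hit description, hit length, e-value) for each HSP below max_expect."""
    found = []
    root = ET.fromstring(xml_text)
    for hit in root.iter("Hit"):
        title = hit.findtext("Hit_def")
        length = int(hit.findtext("Hit_len"))
        for hsp in hit.iter("Hsp"):
            expect = float(hsp.findtext("Hsp_evalue"))
            if expect < max_expect:
                found.append((title, length, expect))
    return found
```

BioPython's parser does this walking for you and exposes the fields as attributes, which is why the slide recommends it over hand-rolled parsing.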

  17. Running BLAST from Python
      • Python can run other programs, including blast, and capture the output

          import os
          command = r'bin\blastall.exe -p blastp -i query.fasta -d db\yeast.aa'
          result_handle = os.popen(command)
          for l in result_handle:
              if l.startswith('Query='):
                  print '\n'+l.rstrip()+'\n'
              if l.startswith('ref|'):
                  print l.rstrip()
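The same idea can be written with the `subprocess` module, which is the usual choice in Python 3 and reports a failed run instead of silently returning empty output. This is a sketch under the slide's assumptions (blastall in bin, the yeast.aa database in db, query.fasta in the current folder); the function names are hypothetical, and the filtering is split out so it can be tried on any list of lines.

```python
import subprocess


def summary_lines(lines):
    """Keep the lines the slide prints: 'Query=' headers and hits starting with 'ref|'."""
    kept = []
    for line in lines:
        line = line.rstrip()
        if line.startswith("Query=") or line.startswith("ref|"):
            kept.append(line)
    return kept


def run_blastp(query="query.fasta", db=r"db\yeast.aa", exe=r"bin\blastall.exe"):
    """Run blastall with the slide's options and return its text output as lines."""
    args = [exe, "-p", "blastp", "-i", query, "-d", db]
    # check=True raises CalledProcessError if blastall exits with an error code.
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout.splitlines()
```

With BLAST installed as described, `for line in summary_lines(run_blastp()): print(line)` would print a summary similar to the slide's loop.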

  18. Running BLAST from BioPython
      • Will automatically format results as XML

          from Bio.Blast import NCBIStandalone
          blast_db = r'db\yeast.aa'
          blast_query = r'query.fasta'
          blast_exe = r'bin\blastall.exe'
          result_handle, error_handle = NCBIStandalone.blastall(blast_exe, "blastp", blast_db, blast_query)

  19. NCBI Blast Parsing
      • Results need to be parsed in order to be useful…

          from Bio.Blast import NCBIXML
          for blast_result in NCBIXML.parse(result_handle):
              for alignment in blast_result.alignments:
                  for hsp in alignment.hsps:
                      if hsp.expect < 1e-5:
                          print '****Alignment****'
                          print 'sequence:', alignment.title
                          print 'length:', alignment.length
                          print 'e value:', hsp.expect
                          print hsp.query[0:75] + '...'
                          print hsp.match[0:75] + '...'
                          print hsp.sbjct[0:75] + '...'

  20. NCBI Blast Parsing
      • Each blast result contains multiple alignments of a query sequence to a database sequence
      • Each alignment consists of multiple high-scoring segment pairs (HSPs)
      • Each HSP has stats like expect, score, gaps, and aligned sequence chunks
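The three-level nesting (result, alignment, HSP) can be pictured as plain data classes. The Python 3 sketch below uses stand-in classes, not BioPython's own; the attribute names are borrowed from the slides so that loops over them look like the parsing code, and `best_expect` is a hypothetical helper showing one traversal.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hsp:
    expect: float
    score: float
    gaps: int
    query: str  # aligned chunk of the query sequence
    sbjct: str  # aligned chunk of the database sequence


@dataclass
class Alignment:
    title: str   # database sequence description
    length: int  # database sequence length
    hsps: List[Hsp] = field(default_factory=list)


@dataclass
class BlastResult:
    query: str  # query sequence description
    alignments: List[Alignment] = field(default_factory=list)


def best_expect(result):
    """Smallest e-value over every HSP of every alignment, or None if there are none."""
    values = [hsp.expect for aln in result.alignments for hsp in aln.hsps]
    return min(values) if values else None
```

One query therefore yields one result object, with one alignment per matching database sequence and one or more HSPs inside each alignment.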

  21. NCBI Blast Parsing
      • Blast parsing skeleton

          from Bio.Blast import NCBIXML
          for blast_result in NCBIXML.parse(result_handle):
              # each blast_result corresponds to one query sequence
              # blast_result.query is query description, etc.
              # blast_result.descriptions contains one-line summary of alignments
              for alignment in blast_result.alignments:
                  # each alignment corresponds to one database sequence
                  # alignment.title is database description
                  for hsp in alignment.hsps:
                      # each query/database alignment consists of multiple
                      # high-scoring segment pair alignment "chunks"
                      # HSP statistics are here
                      # hsp.expect, hsp.score, hsp.positives, hsp.gaps
                      pass

  22. Lab exercises
      • Try each of the examples shown in these slides.
      • Read through NCBI's documentation for the standalone tools.
      • Experiment with the different BLAST tools (blastn, tblastx, etc…) and programs included (blastclust, megablast).

  23. Lab exercises
      • Find putative fruit fly / yeast orthologs
        • Download FASTA file drosph.aa.gz from NCBI
        • Download FASTA file yeast.aa.gz from NCBI
        • Uncompress and format each FASTA file for BLAST
        • Search fruit fly proteins against yeast proteins
        • For each fruit fly query, output the best yeast protein with a significant HSP
        • For each yeast query, output the best fruit fly protein with a significant HSP
        • Find fruit fly / yeast protein pairs which are mutual best hits.
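For the last step, "mutual best hits" means protein A's best hit is B and B's best hit is A. Assuming the two searches have already been reduced to dictionaries mapping each query to its single best significant hit, the comparison is short. This Python 3 sketch covers only that final comparison; the function name is hypothetical, and the identifiers in the usage note are invented.

```python
def mutual_best_hits(fly_to_yeast, yeast_to_fly):
    """Return sorted (fly, yeast) pairs in which each protein is the other's best hit."""
    pairs = []
    for fly_id, yeast_id in fly_to_yeast.items():
        # Keep the pair only if the reverse search points back to the same fly protein.
        if yeast_to_fly.get(yeast_id) == fly_id:
            pairs.append((fly_id, yeast_id))
    return sorted(pairs)
```

For example, with `{"flyA": "yeast1", "flyB": "yeast1"}` and `{"yeast1": "flyA"}`, only the pair `("flyA", "yeast1")` is returned, because yeast1's best hit is flyA and not flyB.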
