This guide provides step-by-step instructions for using MS FrontPage and BLAST to create web pages and analyze genomic data specifically for Burkholderia pseudomallei. It covers locating known and unknown genes using GenBank, performing multiple sequence alignment via CLUSTALW, and reporting coding sequence (CDS) locations. This project facilitates the understanding of genetic sequences and assists in research related to bacterial genomes, emphasizing practical applications in bioinformatics.
LSM2104 Project Guidelines 18th February 2003
Outline • Step 0: How to use MS FrontPage to create web page(s) • Go through Step 1 of the project • Use a known gene to BLAST against the B. pseudomallei genome itself • Report the location of the CDS • Go through Step 2 of the project • Use unknown genes from related organisms to BLAST against the B. pseudomallei genome • Find and report the location of the CDS • Use CLUSTALW to do a multiple sequence alignment (MSA)
Step 0: Using MS FrontPage • Inserting an email address • Inserting images and a background • Inserting hyperlinks • Bookmark(s) within the page • Link(s) to other web pages • NOTE: uploading to the web server will be taught in a later session
Step 1: Find and locate a known B. pseudomallei gene on the genome • Go to GenBank/Swiss-Prot: http://www3.ncbi.nlm.nih.gov/ • Type in the keywords: pseudomallei flagellin complete cds • Look at the entry: AF030239 • Extract the protein sequence (in FASTA format) to use as the BLAST query
BLAST Overview • Blastp: an amino acid query sequence against a protein sequence database • Blastn: a nucleotide query sequence against a nucleotide sequence database • Blastx: a nucleotide query sequence translated in all reading frames against a protein sequence database • Tblastn: a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames • Tblastx: the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database
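The overview above amounts to a lookup from (query type, database type) to a BLAST program. A minimal sketch of that lookup follows; the table name `BLAST_PROGRAMS` and the function `choose_blast_programs` are hypothetical names introduced here for illustration, not part of any BLAST tool.

```python
# Hypothetical lookup table built from the five programs listed on the slide.
# Keys are (query type, database type); values are the matching programs.
BLAST_PROGRAMS = {
    ("protein", "protein"): ["blastp"],
    ("nucleotide", "nucleotide"): ["blastn", "tblastx"],
    ("nucleotide", "protein"): ["blastx"],
    ("protein", "nucleotide"): ["tblastn"],
}


def choose_blast_programs(query_type, database_type):
    """Return the BLAST programs that accept this query/database pairing."""
    key = (query_type.lower(), database_type.lower())
    if key not in BLAST_PROGRAMS:
        raise ValueError("types must be 'protein' or 'nucleotide'")
    return BLAST_PROGRAMS[key]


# A protein query against a genome (nucleotide) database, as in this project.
print(choose_blast_programs("protein", "nucleotide"))
```

For a nucleotide query against a nucleotide database the sketch returns two candidates: blastn compares the sequences directly, while tblastx compares their six-frame translations.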
How to BLAST • Perform tBLASTn at http://sf01.bic.nus.edu.sg/blast/blast.html • Output explanation (http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/glossary2.html) • Score • Expectation value • Identities • Positives • Frame • Report the location of CDS
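To make the Identities, Positives, and Frame fields concrete, here is a minimal sketch that pulls them out of one alignment block of plain-text BLAST output. It assumes the classic text layout (`Identities = x/y (p%)`, `Frame = -2`); the course server's output may differ, and the sample text below is invented purely for illustration, not a real result.

```python
import re


def parse_hsp_stats(text):
    """Extract identities, positives and frame from one text alignment block.

    Assumes the classic plain-text BLAST layout; fields that are not
    found are simply left out of the returned dictionary.
    """
    stats = {}
    for name in ("Identities", "Positives"):
        match = re.search(name + r"\s*=\s*(\d+)/(\d+)", text)
        if match:
            # (matching positions, alignment length)
            stats[name.lower()] = (int(match.group(1)), int(match.group(2)))
    match = re.search(r"Frame\s*=\s*([+-]?\d)", text)
    if match:
        stats["frame"] = int(match.group(1))
    return stats


# Invented example text, NOT a real BLAST result.
sample = " Identities = 90/100 (90%), Positives = 95/100 (95%)\n Frame = -2"
stats = parse_hsp_stats(sample)
print(stats)
```

In tBLASTn output a negative frame means the match lies on the reverse strand of the genome, which matters when reporting the location of the CDS.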
Your Assigned Gene for Step 1 • Must include at least this gene: Q9RGS8, non-hemolytic phospholipase C, from Swiss-Prot
Step 2: Find and locate unknown genes in Burkholderia pseudomallei • Minimum: must include at least the 2 assigned genes from the Burkholderia genus (known in related species but unknown in B. pseudomallei) • Extract the protein sequences and run tBLASTn against the B. pseudomallei genome • Find and report the locations of the CDS • Use ClustalW to reconfirm your findings
Reporting location of CDS • [Figure: BLAST results mapped onto the B. pseudomallei genome, showing the query sequence, the matched fragment, and the complete CDS]
Pick a known gene from a related organism in the same genus • For example: N-acyl homoserine lactone synthase (AAK70351) from Burkholderia stabilis
http://sf01.bic.nus.edu.sg/extract/ • Extension of ends • Direction
BpsChrom2 Burkholderia pseudomallei chromosome 2 (1177310-1178131 ) atcgcgccgcgcgcgcgaaacacgagcccctgtctgccgagccgcacgagcggcaggcgt tcggcgaacgacgggaacgcgacggcgatgcgggtttcgccggcatcgaacgtagggagc atcgcgcgaaataccgttgaatggtccacggtgtagaggtctccttgaatgacgaacggc gcggccccgaagcggggcgaccgggcgcgctcaggcggcttcggcgggcggcgcgcacag cagcgggtcgagatcgagcgcggcgagcgtttgcgcgtcgaggtcgatccagcacgcgac gaccatgcgcccgtcgatctgctgcgcgggccccgcccggtgcgcgtgcacgccgatccg gcggaacaggcgctccatgctcagaaacgtcacgccgatcagttgcttcgcgccaagccg cgcggcgcactcgacgacggcggcgagcatcggccgcaccgcccaggccgggttgccgcc cccggccggatcctcggcgttcgcggcgaagcgcgacaattcccagacggcggcggattg cggcaacggcatgtcttgcgcgaccagcgtcgggaacagttccttcagcagatacgggcg ggtcgtcggcagcagccgggcgcagccgcagatttccccgtcgtcgtcgcgggcgaacac atagacggtatcgtcgcgatcgtactgatcccgctcgaacccttcgcttgccgacggcag tttccagccgagctgctcgacgaaaactcggtgccgataaaggcccagatcagccgccaa gtcgctcggcaggcgcccgtcgccatgaacgaaagttcgcat
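The extract above ends in `cat`, which is the reverse complement of the start codon `atg`. That is consistent with a gene lying on the reverse strand, although this reading is an inference and not something the slide states. A minimal sketch of reverse-complementing an extracted fragment, using only the last 12 bases shown above:

```python
# Translation table that maps each base to its complement (both cases).
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Last 12 bases of the extract shown on the slide.
tail = "gaaagttcgcat"
print(reverse_complement(tail))  # atgcgaactttc
```

Reading the reverse complement from its first base gives a sequence that begins with `atg`, so any translation of this region would start from the reverse-complemented strand.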
How to use ClustalW • Input sequences: • Four flagellin genes from different organisms, saved from GenBank into one file in FASTA format: • AJ496283. Legionella longbe...[gi:22553065] • AJ496282. Fluoribacter boze...[gi:22553063] • AF030239. Burkholderia pseu...[gi:3337408] • AF307102. Borrelia parkeri ...[gi:11095317] • Copy and paste the file contents into the ClustalW input window
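Combining several sequences into one FASTA file can be sketched as below. The function name `format_fasta` is hypothetical, and the sequences in the example are short placeholders, not the real flagellin entries, which still have to be downloaded from GenBank.

```python
def format_fasta(records, width=60):
    """Format (header, sequence) pairs as one multi-FASTA string.

    Each record gets a '>' header line followed by the sequence
    wrapped at `width` characters per line.
    """
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


# Placeholder sequences for illustration only, NOT the real GenBank entries.
records = [
    ("AF030239_placeholder", "ATGCTCGGAATCAACAGC"),
    ("AJ496283_placeholder", "ATGGCTCAAGTAATC"),
]
fasta_text = format_fasta(records)
print(fasta_text)
```

The resulting text can be saved to a file or pasted directly into an alignment tool's input box.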
Symbols Found in the Results • '*' indicates positions which have a single, fully conserved residue • ':' indicates that one of the following 'strong' groups is fully conserved: STA NEQK NHQK NDEQ QHRK MILV MILF HY FYW • '.' indicates that one of the following 'weaker' groups is fully conserved: CSA ATV SAG STNK STPA SGND SNDEQK NDEQHK NEQHRK FVLIM HFY
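The rules above can be checked by hand for a single alignment column. The sketch below is a rough illustration that uses the groups exactly as listed; it is not ClustalW's own code, and treating any column that contains a gap as unconserved is an assumption made here.

```python
# Residue groups copied from the slide.
STRONG = "STA NEQK NHQK NDEQ QHRK MILV MILF HY FYW".split()
WEAK = "CSA ATV SAG STNK STPA SGND SNDEQK NDEQHK NEQHRK FVLIM HFY".split()


def column_symbol(column):
    """Return '*', ':', '.' or ' ' for one column of aligned residues."""
    residues = {r.upper() for r in column}
    if "-" in residues:
        return " "  # assumption: a gap means the column is not conserved
    if len(residues) == 1:
        return "*"
    if any(residues <= set(group) for group in STRONG):
        return ":"
    if any(residues <= set(group) for group in WEAK):
        return "."
    return " "


def conservation_line(aligned_seqs):
    """Build the symbol line for equal-length aligned sequences."""
    return "".join(column_symbol(col) for col in zip(*aligned_seqs))


# Toy alignment: column 1 is identical, column 2 fits the strong
# group STA, and column 3 (C, S, W) fits no group.
print(repr(conservation_line(["ASC", "ATS", "AAW"])))  # '*: '
```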
Your Minimum Genes for Step 2 • Include at least these 2 genes from the Burkholderia genus, from GenBank: • AF525414 • AF333004
Step 1 (cont.) • Go to http://sf01.bic.nus.edu.sg/extract/e2.html • Type in the start and end base numbers (e.g. extended by 150 nucleotides) • Start position: (2902870 - 150) • End position: (2904966 + 150) • Explain why and when you need to add or subtract
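The arithmetic for extending the matched region can be sketched as follows. The function name `extend_region` and the optional `genome_length` parameter are hypothetical; sorting the coordinates first is an assumption made here to cope with hits whose start is larger than their end, as can happen when a match lies on the reverse strand.

```python
def extend_region(start, end, flank=150, genome_length=None):
    """Widen a matched region by `flank` bases on each side.

    Coordinates are 1-based and inclusive. The result is clamped to
    position 1 and, if given, to the hypothetical `genome_length`.
    """
    low, high = min(start, end), max(start, end)
    new_start = max(1, low - flank)
    new_end = high + flank
    if genome_length is not None:
        new_end = min(genome_length, new_end)
    return new_start, new_end


# Coordinates from the slide, extended by 150 nucleotides on each side.
print(extend_region(2902870, 2904966))  # (2902720, 2905116)
```

Extending both ends leaves room to find the true start and stop codons, which may sit just outside the fragment that BLAST matched.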
Step 1 (cont.) • Extract the nucleotide sequence from the output • Translate it by pasting the sequence at http://sf01.bic.nus.edu.sg/translate/ • Extract the amino acid sequence • Run BLAST again on the extracted sequence at http://sf01.bic.nus.edu.sg/blast/blast.html
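What the translation step does can be sketched with the standard genetic code. This is a rough illustration, not the course translation tool: the function name `translate` and the choice to mark unrecognised codons with 'X' are assumptions made here.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame=0):
    """Translate a DNA sequence in one forward frame (0, 1 or 2).

    Stop codons become '*'; codons with unknown bases become 'X'.
    A trailing partial codon is ignored.
    """
    seq = seq.upper()
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


print(translate("atgcgaactttc"))  # MRTF
```

To cover all six reading frames, the same function can be applied in frames 0 to 2 to both the sequence and its reverse complement.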