
File formats and conversions


jabir


Presentation Transcript


  1. File formats and conversions

  2. Important formats • How • Fasta • Raw/Peptide • Tab

  3. How • One or more entries • First line: length of the sequence (6 digits, right-aligned), then the name of the sequence • Next lines: the sequence, usually 80 characters per line • Last lines: assignments for the positions in the sequence

  4. How file

        553 ATP0_BOVIN_1E79.C
     MLSVRVAAAVARALPRRAGLVSKNALGSSFIAARNLHASNSRLQKTGTAEVSSILEERILGADTSVDLEETGRVLSIGDG
     IARVHGLRNVQAEEMVEFSSGLKGMSLNLEPDNVGVVVFGNDKLIKEGDIVKRTGAIVDVPVGEELLGRVVDALGNAIDG
     KGPIGSKARRRVGLKAPGIIPRISVREPMQTGIKAVDSLVPIGRGQRELIIGDRQTGKTSIAIDTIINQKRFNDGTDEKK
     KLYCIYVAIGQKRSTVAQLVKRLTDADAMKYTIVVSATASDAAPLQYLAPYSGCSMGEYFRDNGKHALIIYDDLSKQAVA
     YRQMSLLLRRPPGREAYPGDVFYLHSRLLERAAKMNDAFGGGSLTALPVIETQAGDVSAYIPTNVISITDGQIFLETELF
     YKGIRPAINVGLSVSRVGSAAQTRAMKQVAGTMKLELAQYREVAAFAQFGSDLDAATQQLLSRGVRLTELLKQGQYSPMA
     IEEQVAVIYAGVRGYLDKLEPSKITKFENAFLSHVISQHQALLSKIRTDGKISEESDAKLKEIVTNFLAGFEA
     -------------------------------------------------------------...SS.TTTEEEEEEEETT
     EEEEEE.TT.BTTEEEEETTS.EEEEEEE.SS.EEEEESS.GGG..TT.EEEEEEEESEEE.SGGGTT.EE.TTS.B.SS
     S.....S.EEETT.....STTB....SB...S.HHHHHHS..BTT.B.EEEESTTSSHHHHHHHHHHHTHHHHSSS.GGG
     ..EEEEEEES..HHHHHHHHHHHHHHT.GGGEEEEEE.TTS.HHHHHHHHHHHHHHHHHHHHTT.EEEEEEETHHHHHHH
     HHHHHHHTT....GGGS.TTHHHHHHHHHTT..BB.GGGTS.EEEEEEEEE.STT.TTSHHHHHHHTTSSEEEEE.HHHH
     HHT.SS.B.TTT.EESSGGGGS.HHHHHHHTTHHHHHHHHHHHHHHHTT.....HHHHHHHHHHHHHHHHT...SS....
     HHHHHHHHHHHHTSTTTTS.GGGHHHHHHHHHHHHHHH.HHHHHHHHHHTS..HHHHHHHHHHHHHHHHHHH.
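The layout described for the How format (a header with a 6-column right-aligned length and a name, then the sequence, then one assignment character per position) can be read with a short parser. The following is a minimal Python sketch, not the CBS tooling itself; the function names `parse_how` and `_read_chars` are hypothetical, and it assumes that neither sequence nor assignment lines contain whitespace.

```python
def _read_chars(lines, i, length):
    """Collect lines starting at index i until `length` characters are read."""
    chunks = []
    got = 0
    while got < length and i < len(lines):
        chunk = lines[i].strip()
        chunks.append(chunk)
        got += len(chunk)
        i += 1
    return "".join(chunks), i


def parse_how(text):
    """Parse How-formatted text into (name, sequence, assignment) tuples.

    Assumption: the header holds the length in its first 6 columns,
    followed by the entry name.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    entries = []
    i = 0
    while i < len(lines):
        header = lines[i]
        length = int(header[:6])      # right-aligned sequence length
        name = header[6:].strip()     # rest of the header is the name
        i += 1
        sequence, i = _read_chars(lines, i, length)
        assignment, i = _read_chars(lines, i, length)
        entries.append((name, sequence, assignment))
    return entries
```

Because the header states the length, the parser does not need a separator between the sequence block and the assignment block; it simply reads the stated number of characters twice.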

  5. Fasta • One or more entries • First line: the character “>” followed by the name, then an optional description (not read by all readers) • Remaining lines: the sequence, usually 50-80 characters per line
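As an illustration of the Fasta layout, here is a minimal Python sketch of a reader and a writer. The names `parse_fasta` and `format_fasta` are hypothetical, and the default line width of 60 is an assumption chosen from the 50-80 range mentioned in the slide.

```python
def parse_fasta(text):
    """Parse FASTA text into (name, description, sequence) tuples."""
    entries = []
    name, desc, chunks = None, "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous entry, if any.
            if name is not None:
                entries.append((name, desc, "".join(chunks)))
            header = line[1:].split(None, 1)
            name = header[0] if header else ""
            desc = header[1] if len(header) > 1 else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        entries.append((name, desc, "".join(chunks)))
    return entries


def format_fasta(name, sequence, width=60):
    """Format one entry as FASTA, wrapping the sequence at `width` characters."""
    lines = [">" + name]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"
```

Splitting the header on the first whitespace keeps the name separate from the optional description, which matters because some readers ignore everything after the name.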

  6. Raw/peptide • Short sequences • One peptide per line
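Since a raw peptide file carries no names, converting it to Fasta means inventing one name per line. The sketch below shows one way to do this in Python; the function name `raw_to_fasta` and the `pep1`, `pep2`, ... naming scheme are assumptions for illustration and are not necessarily what Makefsa produces.

```python
def raw_to_fasta(text, prefix="pep"):
    """Turn one-peptide-per-line text into FASTA with generated names."""
    out = []
    count = 0
    for line in text.splitlines():
        peptide = line.strip()
        if not peptide:
            continue  # skip blank lines
        count += 1
        # Hypothetical naming scheme: prefix plus a running number.
        out.append(f">{prefix}{count}\n{peptide}\n")
    return "".join(out)
```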

  7. Tab format • One or more entries • One entry per line • Tab delimited fields • Name • Sequence • Assignments/features
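The Tab format maps directly onto a split on the tab character. A minimal Python sketch follows; `parse_tab` and `format_tab` are hypothetical names, and treating the assignments field as optional is an assumption.

```python
def parse_tab(text):
    """Parse tab-delimited text into (name, sequence, assignment) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        name = fields[0]
        sequence = fields[1] if len(fields) > 1 else ""
        assignment = fields[2] if len(fields) > 2 else ""  # assumed optional
        entries.append((name, sequence, assignment))
    return entries


def format_tab(name, sequence, assignment=""):
    """Format one entry as a single tab-delimited line."""
    fields = [name, sequence]
    if assignment:
        fields.append(assignment)
    return "\t".join(fields) + "\n"
```

Keeping one entry per line makes this format convenient for line-oriented tools such as `grep`, `cut`, and `sort`.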

  8. Converters • Saco_convert: converts from/to How, Fasta and Tab • Makefsa: converts raw peptides to fasta peptides

  9. Databases at CBS

  10. Databases - ready for BLAST • SwissProt • PDB • GenBank • nr: non-redundant set of proteins from the above plus TREMBL, PIR and others • sptr_nrdb: non-redundant set of proteins from SwissProt and TREMBL

  11. BLAST routines - single search (aa = amino acid, nt = nucleotide) • blastp: aadb, aaquery • blastn: ntdb, ntquery • blastx: aadb, ntquery • tblastn: ntdb, aaquery • tblastx: ntdb, ntquery
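The pairing of database type and query type with a BLAST program can be written down as a small lookup table. This Python sketch only encodes the combinations listed in the slide; the names `BLAST_PROGRAMS` and `blast_programs` are hypothetical, and the type labels `"aa"` and `"nt"` are an assumed shorthand for amino acid and nucleotide.

```python
# Keys are (database type, query type); values are the matching programs.
BLAST_PROGRAMS = {
    ("aa", "aa"): ["blastp"],
    ("nt", "nt"): ["blastn", "tblastx"],
    ("aa", "nt"): ["blastx"],
    ("nt", "aa"): ["tblastn"],
}


def blast_programs(db_type, query_type):
    """Return the BLAST programs that fit the given database and query types."""
    return BLAST_PROGRAMS.get((db_type, query_type), [])
```

Two programs fit a nucleotide query against a nucleotide database: blastn compares the nucleotides directly, while tblastx compares translations of both query and database.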

  12. Blastpgp - iterative blast • Repetitive searches with an AA query through an AA database • Results in hits plus an optional position-specific scoring matrix

  13. The actual search • Query is a single file in FASTA format • Custom databases need to be initially formatted from sets in FASTA format • Use the setdb program for protein sequence databases (i.e., blastp and blastx) • Use the pressdb program for nucleotide sequence databases (i.e., blastn and tblastn) • Use formatdb for blastpgp (psiblast)

  14. Exercises

  15. Conversion exercise • Convert the file A1.rsee.test to fasta format • Convert the file ss_sub300.how to fasta format

  16. Blast • Take the first entry in ss_sub300.how and blastp it against ss_sub300.how and PDB • Make a position-specific scoring matrix for the entry using psiblast and nr, and save the profile as both a binary and a readable matrix • Use the binary matrix to search against PDB and ss_sub300.how
