File formats and conversions
Important formats • How • Fasta • Raw/Peptide • Tab
How • One or more entries • First line • Length of sequence (6 digits, right aligned) • Name of sequence • Next lines • Sequence, usually 80 characters per line • Last lines • Assignments of the positions in the sequence
How file
   553 ATP0_BOVIN_1E79.C
MLSVRVAAAVARALPRRAGLVSKNALGSSFIAARNLHASNSRLQKTGTAEVSSILEERILGADTSVDLEETGRVLSIGDG
IARVHGLRNVQAEEMVEFSSGLKGMSLNLEPDNVGVVVFGNDKLIKEGDIVKRTGAIVDVPVGEELLGRVVDALGNAIDG
KGPIGSKARRRVGLKAPGIIPRISVREPMQTGIKAVDSLVPIGRGQRELIIGDRQTGKTSIAIDTIINQKRFNDGTDEKK
KLYCIYVAIGQKRSTVAQLVKRLTDADAMKYTIVVSATASDAAPLQYLAPYSGCSMGEYFRDNGKHALIIYDDLSKQAVA
YRQMSLLLRRPPGREAYPGDVFYLHSRLLERAAKMNDAFGGGSLTALPVIETQAGDVSAYIPTNVISITDGQIFLETELF
YKGIRPAINVGLSVSRVGSAAQTRAMKQVAGTMKLELAQYREVAAFAQFGSDLDAATQQLLSRGVRLTELLKQGQYSPMA
IEEQVAVIYAGVRGYLDKLEPSKITKFENAFLSHVISQHQALLSKIRTDGKISEESDAKLKEIVTNFLAGFEA
-------------------------------------------------------------...SS.TTTEEEEEEEETT
EEEEEE.TT.BTTEEEEETTS.EEEEEEE.SS.EEEEESS.GGG..TT.EEEEEEEESEEE.SGGGTT.EE.TTS.B.SS
S.....S.EEETT.....STTB....SB...S.HHHHHHS..BTT.B.EEEESTTSSHHHHHHHHHHHTHHHHSSS.GGG
..EEEEEEES..HHHHHHHHHHHHHHT.GGGEEEEEE.TTS.HHHHHHHHHHHHHHHHHHHHTT.EEEEEEETHHHHHHH
HHHHHHHTT....GGGS.TTHHHHHHHHHTT..BB.GGGTS.EEEEEEEEE.STT.TTSHHHHHHHTTSSEEEEE.HHHH
HHT.SS.B.TTT.EESSGGGGS.HHHHHHHTTHHHHHHHHHHHHHHHTT.....HHHHHHHHHHHHHHHHT...SS....
HHHHHHHHHHHHTSTTTTS.GGGHHHHHHHHHHHHHHH.HHHHHHHHHHTS..HHHHHHHHHHHHHHHHHHH.
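A How entry can be read back by using the length on the first line to decide where the sequence ends and the assignment begins. The following is a minimal sketch in Python, assuming the layout described on the slide (length, name, then wrapped sequence lines followed by wrapped assignment lines). The names `HowEntry` and `parse_how` are made up for illustration and are not part of any tool mentioned here.

```python
from typing import Iterator, List, NamedTuple


class HowEntry(NamedTuple):
    """One How entry (hypothetical container, not from the slides)."""
    name: str
    sequence: str
    assignment: str


def parse_how(lines: List[str]) -> Iterator[HowEntry]:
    """Yield entries from How-formatted lines (layout assumed from the slide)."""
    i = 0
    while i < len(lines):
        header = lines[i].strip()
        i += 1
        if not header:
            continue  # skip blank lines between entries
        # First line: sequence length, then the entry name.
        length_field, name = header.split(None, 1)
        length = int(length_field)
        # Collect wrapped sequence lines until the stated length is reached.
        sequence = ""
        while len(sequence) < length and i < len(lines):
            sequence += lines[i].strip()
            i += 1
        # The same number of positions follows as per-residue assignments.
        assignment = ""
        while len(assignment) < length and i < len(lines):
            assignment += lines[i].strip()
            i += 1
        if len(sequence) != length or len(assignment) != length:
            raise ValueError(f"entry {name!r}: expected {length} positions")
        yield HowEntry(name, sequence, assignment)
```

Because the length is stated up front, the parser does not need to guess which characters belong to the sequence and which to the assignment.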
Fasta • One or more entries • First line • The character “>” • The name • Optional descriptions not read by all readers • Rest of lines • The sequence, usually 50-80 characters per line
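The Fasta rules above can be sketched as a small reader and writer. This is an illustrative Python sketch, not the code of any converter named in these slides; `parse_fasta` and `format_fasta` are hypothetical names, and the default wrap width of 60 is an arbitrary choice inside the 50-80 range.

```python
from typing import Iterator, List, Tuple


def parse_fasta(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, description, sequence) for each Fasta entry."""
    name, description, chunks = None, "", []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            if name is not None:
                yield name, description, "".join(chunks)
            # The first word after ">" is the name, the rest is the description.
            parts = line[1:].split(None, 1)
            name = parts[0] if parts else ""
            description = parts[1] if len(parts) > 1 else ""
            chunks = []
        elif line and name is not None:
            chunks.append(line.strip())
    if name is not None:
        yield name, description, "".join(chunks)


def format_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Return one Fasta entry with the sequence wrapped at `width` characters."""
    rows = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + name] + rows) + "\n"
```

Keeping the description separate from the name mirrors the note that not all readers use the description.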
Raw/peptide • Short sequences • One peptide per line
Tab format • One or more entries • One entry per line • Tab delimited fields • Name • Sequence • Assignments/features
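Since each Tab entry is a single tab-delimited line, reading and writing it is a matter of splitting and joining on the tab character. A minimal Python sketch follows, assuming the field order given on the slide (name, sequence, then any assignment or feature fields). `TabEntry`, `parse_tab_line` and `format_tab_line` are hypothetical names.

```python
from typing import List, NamedTuple


class TabEntry(NamedTuple):
    """One Tab entry (hypothetical container, field order assumed)."""
    name: str
    sequence: str
    features: List[str]


def parse_tab_line(line: str) -> TabEntry:
    """Split one tab-delimited line into name, sequence and feature fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        raise ValueError("expected at least name and sequence fields")
    return TabEntry(fields[0], fields[1], fields[2:])


def format_tab_line(entry: TabEntry) -> str:
    """Join an entry back into a single tab-delimited line."""
    return "\t".join([entry.name, entry.sequence, *entry.features])
```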
Converters • Saco_convert • From/To • How • Fasta • Tab • Makefsa • Raw peptides to fasta peptides
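To illustrate what a raw-to-Fasta conversion involves, here is a short Python sketch that gives every peptide line its own Fasta header. It is not Makefsa itself; the slides do not say how Makefsa names the entries, so the `pep1`, `pep2`, ... naming and the function name `raw_to_fasta` are assumptions made for the example.

```python
from typing import Iterable, List


def raw_to_fasta(peptides: Iterable[str], prefix: str = "pep") -> List[str]:
    """Turn one-peptide-per-line input into Fasta lines.

    Entry names are generated as prefix + running number (an assumption;
    the real tool may name entries differently).
    """
    out: List[str] = []
    count = 0
    for line in peptides:
        peptide = line.strip()
        if not peptide:
            continue  # ignore empty lines
        count += 1
        out.append(f">{prefix}{count}")
        out.append(peptide)
    return out
```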
Databases - ready for BLAST • SwissProt • PDB • GenBank • nr • Non redundant set of proteins from the above plus TREMBL, PIR and others • sptr_nrdb • Non redundant set of proteins from SwissProt and TREMBL
BLAST routines - single search • blastp • aadb aaquery • blastn • ntdb ntquery • blastx • aadb ntquery • tblastn • ntdb aaquery • tblastx • ntdb ntquery
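The list above amounts to a lookup from database type and query type to program. A small Python sketch of that lookup, with `BLAST_PROGRAMS` and `choose_blast` as hypothetical names, might look like this:

```python
from typing import Dict, List, Tuple

# (database type, query type) -> BLAST programs, as listed on the slide.
# "aa" = amino acid, "nt" = nucleotide.
BLAST_PROGRAMS: Dict[Tuple[str, str], List[str]] = {
    ("aa", "aa"): ["blastp"],
    ("nt", "nt"): ["blastn", "tblastx"],
    ("aa", "nt"): ["blastx"],
    ("nt", "aa"): ["tblastn"],
}


def choose_blast(db_type: str, query_type: str) -> List[str]:
    """Return the programs that fit the given database and query types."""
    return BLAST_PROGRAMS[(db_type, query_type)]
```

Two programs share the nucleotide/nucleotide combination: blastn compares the nucleotides directly, while tblastx compares translations of both query and database.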
Blastpgp - iterative blast • Repetitive searches with AA query through an AA database • Results in hits plus an optional position specific scoring matrix
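As a rough illustration, an iterative search that also saves the matrix could be assembled as below. This Python sketch only builds the argument list and does not run anything. The options used (`-i` query, `-d` database, `-j` number of iterations, `-C` binary checkpoint, `-Q` readable matrix) are those of the legacy NCBI blastpgp as far as known, and should be checked against the local installation. The function name and file names are hypothetical.

```python
from typing import List, Optional


def blastpgp_command(query: str, database: str, iterations: int = 3,
                     checkpoint: Optional[str] = None,
                     ascii_matrix: Optional[str] = None) -> List[str]:
    """Build a blastpgp argument list (flags assumed from legacy NCBI BLAST)."""
    cmd = ["blastpgp", "-i", query, "-d", database, "-j", str(iterations)]
    if checkpoint:
        cmd += ["-C", checkpoint]    # binary profile (checkpoint file)
    if ascii_matrix:
        cmd += ["-Q", ascii_matrix]  # human-readable scoring matrix
    return cmd
```

The resulting list can be handed to `subprocess.run` once the options have been verified for the installed version.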
The actual search • Query is a single file in FASTA format • Custom databases need to be initially formatted from sets in FASTA format • Use setdb program for protein sequence databases (i.e., blastp and blastx) • Use pressdb program for nucleotide sequence databases (i.e., blastn and tblastn) • Use formatdb for blastpgp (psiblast)
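For the formatdb step, a command line can be put together as in the Python sketch below. It assumes the legacy NCBI formatdb options `-i` (input FASTA file) and `-p` (T for protein, F for nucleotide). The setdb and pressdb invocations are not shown, and the function name is hypothetical. The code only builds the argument list and does not execute it.

```python
from typing import List


def formatdb_command(fasta_path: str, protein: bool = True) -> List[str]:
    """Build a formatdb argument list (flags assumed from legacy NCBI BLAST)."""
    return ["formatdb", "-i", fasta_path, "-p", "T" if protein else "F"]
```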
Conversion exercise • Convert the file A1.rsee.test to fasta format • Convert the file ss_sub300.how to fasta format
Blast • Take the first entry in ss_sub300.how and blastp it against ss_sub300.how and PDB • Make a position specific scoring matrix for the entry using psiblast and nr and save the profile as binary and readable matrices • Use the binary matrix to search against PDB and ss_sub300.how