Matlab Bioinformatics Toolkit Evaluation

Kanishka Bhutani

Presentation Transcript


  1. Matlab Bioinformatics Toolkit Evaluation Kanishka Bhutani

  2. What I expected
     • Local/global sequence alignments.
     • Multiple sequence alignments.
     • Choice of different scoring matrices (BLOSUM, PAM) for evaluation.
     • Building hidden Markov models.
     • Easy import of sequences from databases (Pfam, PDB, Swiss-Prot).

  3. What I found
     • Most of the expected features.
     • Bonus: microarray normalization tools, plus microarray visualization tools including box plots and heat maps.

  4. Any surprises?
     • No multiple sequence alignment.
     • Average/standard deviation of hydrophobicity and solvent accessibility: which command?
     • "proteinplot": a GUI for protein structure analysis. Import your file to view it, select parameters, and display statistics.

  5. What did I try?
     • Local alignment and global alignment.
     • For short sequences: swalign('seq1','seq2') and nwalign('seq1','seq2'), where seq1 and seq2 are amino-acid (AA) or nucleotide (NT) sequences.
     • For long, imported sequences: convert the sequence into a vector of integer values with nt2int or aa2int.
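The short-sequence calls on this slide can be sketched as follows. This is a minimal illustration, assuming the Bioinformatics Toolbox is installed; the two peptide strings are illustrative values that do not come from the slides, and the 'ScoringMatrix' option is shown only as one way to exercise the BLOSUM/PAM choice listed among the expectations.

```matlab
% Two short amino-acid sequences (illustrative, not from the slides).
seq1 = 'VSPAGMASGYD';
seq2 = 'IPGKASYD';

% Global (Needleman-Wunsch) alignment with the default scoring matrix.
[globalScore, globalAlign] = nwalign(seq1, seq2);

% Local (Smith-Waterman) alignment with the default scoring matrix.
[localScore, localAlign] = swalign(seq1, seq2);

% The same global alignment, scored with PAM250 instead of the default.
[pamScore, pamAlign] = nwalign(seq1, seq2, 'ScoringMatrix', 'pam250');

% Each alignment is a 3-row character array:
% sequence 1, match symbols, sequence 2.
disp(globalAlign)
disp(localAlign)
disp(pamAlign)
```

Comparing globalScore with pamScore gives a quick feel for how much the choice of scoring matrix changes the result for a given pair.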

  6. Pairwise sequence alignment
     • S = getgenbank('NM_00001')
     • M = getgenbank('NM_00002')
     • Output: a header and a sequence.
     • K = nt2int(S.Sequence)
     • B = nt2int(M.Sequence)
     • [sc, align] = nwalign(K, B)
     • sc is the alignment score; align holds the aligned sequences.
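The integer-encoding step can be tried offline before touching GenBank. The sketch below assumes the Bioinformatics Toolbox; the two nucleotide strings are placeholders standing in for S.Sequence and M.Sequence, and passing 'Alphabet','NT' is an addition here (the slide does not show it) so that the integer codes are read as nucleotides rather than amino acids.

```matlab
% Placeholder nucleotide strings standing in for S.Sequence and M.Sequence.
seqS = 'ACGTTGCAAGCT';
seqM = 'ACGTGCATGCT';

% Encode each base as an integer (A=1, C=2, G=3, T=4).
K = nt2int(seqS);
B = nt2int(seqM);

% Align the integer vectors; the alphabet is set explicitly to nucleotide
% (an assumption added for this sketch, not shown on the slide).
[sc, align] = nwalign(K, B, 'Alphabet', 'NT');

disp(sc)      % alignment score
disp(align)   % aligned sequences with the match line between them
```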

  7. Getting sequences: very easy!
     • getgenbank: retrieve sequence information from the GenBank database.
     • getembl: retrieve sequence information from the EMBL database.
     • getgenpept: retrieve sequence information from the GenPept database.
     • gethmmprof: retrieve an HMM profile from the Pfam database.

  8. Experiment
     • hmmodel = gethmmprof('PF00001')

  9. Visualization of the model
     • showhmmprof(hmmodel,'scale','logodds')

  10. Get GPCR sequences
     • S = getgenbank('NM_024531')
     • disp(S.Sequence)

  11. Alignment of the sequences
     • var = gethmmalignment('PF00001','type','seed')
     • disp([char(var.Header) char(var.Sequence)])

  12. For GPCR family C
     • Similarly for different families.
     • Multiple aligned sequences retrieved.

  13. GUI proteinplot
     • User friendly.
     • Average/standard deviation values for:
       hydrophobicity;
       secondary structure propensity (alpha helices or beta strands);
       accessibility (accessible and buried residues).
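For readers who want the hydrophobicity mean and standard deviation from the command line rather than the GUI, one possible sketch is below. It is not how proteinplot computes its statistics: it assumes the Kyte-Doolittle hydropathy scale (hard-coded here), uses aa2int from the Bioinformatics Toolbox to index it, and runs on a made-up four-residue sequence containing only the 20 standard amino acids.

```matlab
% Kyte-Doolittle hydropathy values, ordered to match aa2int codes 1..20
% (A R N D C Q E G H I L K M F P S T W Y V).
kd = [ 1.8 -4.5 -3.5 -3.5  2.5 -3.5 -3.5 -0.4 -3.2  4.5 ...
       3.8 -3.9  1.9  2.8 -1.6 -0.8 -0.7 -0.9 -1.3  4.2];

seq = 'AVIL';            % illustrative sequence, standard residues only
codes = aa2int(seq);     % integer code per residue
values = kd(codes);      % per-residue hydropathy

avgH = mean(values);     % average hydrophobicity
stdH = std(values);      % sample standard deviation
fprintf('mean = %.3f, std = %.3f\n', avgH, stdH);
```

Sequences with ambiguous or non-standard residue codes would need filtering first, since their aa2int codes fall outside the 20-entry table.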

  14. mGluR1 plot (proteinplot)

  15. mGluR1 results

  16. Test a sequence with the HMM
     • Retrieve mGluR1 from GenBank: mgr = getgenbank('NM_012407'); glusequence = mgr.Sequence
     • Test it against the class A HMM model: [a, sglu] = hmmprofalign(modelA, glusequence, 'showscore', true)
     • Score = -203.53
     • Seq =

  17. Log-odds score plot for the best path

  18. Difficulties & questions
     • No multiple sequence alignment.
     • Demos: not very helpful.
     • Difficult to view the sequences, as no "disp" command was found.
     • Bugs: a parsing error when storing huge sequences (GPCR A) in a file; the HMMprofdemo command stops abruptly and gives errors.
     • proteinplot (GUI) often hangs the machine.
     • How to verify the sequences using the HMM models?
     • Regular expression matches, and highlighting those positions?

  19. Suggested experiment
     • Given: an unknown sample dataset of proteins and a known dataset of proteins (with known structural information).
     • Use the BLMT to extract 'over-expressed' 4-grams in a protein sequence, or a group of protein sequences, from the known set.
     • Use the "search for regular expression" function in the Matlab toolkit to look for those 4-grams in the unknown proteins and hence predict their structure.
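A toy version of the suggested experiment can be sketched in base MATLAB. Several things here are stand-ins: a plain frequency count replaces the BLMT step (so "over-expressed" is reduced to "most frequent", with no background correction), the built-in regexp function replaces the toolkit's regular-expression search, and all sequences are invented for illustration.

```matlab
% Hypothetical toy data; real inputs would be the known and unknown sets.
known   = {'MKTAYIAKQR', 'GGAKQRLLAK', 'TTAKQRPPMK'};
unknown = 'SSLAKQRVVAKQRD';
n = 4;

% Collect every overlapping n-gram from the known sequences.
grams = {};
for i = 1:numel(known)
    s = known{i};
    for j = 1:(length(s) - n + 1)
        grams{end+1} = s(j:j+n-1); %#ok<AGROW>
    end
end

% Count occurrences and pick the most frequent n-gram
% (a simple stand-in for the BLMT extraction step).
[uniqueGrams, ~, idx] = unique(grams);
counts = accumarray(idx(:), 1);
[~, best] = max(counts);
topGram = uniqueGrams{best};

% Locate that n-gram in the unknown sequence.
hits = regexp(unknown, topGram, 'start');
fprintf('%s found at positions: %s\n', topGram, mat2str(hits));
```

With this toy data the most frequent 4-gram is AKQR, which occurs at positions 4 and 10 of the unknown sequence. Note that regexp reports non-overlapping matches only, which matters for repetitive motifs.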
