Peptide Sequence Pairwise Alignment using Hidden Markov Model Design

Peptide Sequence Pairwise Alignment using Hidden Markov Model - Design Document Life Science 20001396 IIS Lab 이청재

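Design

Modules
• Query Input
• Query Processing
• Learning Module (Estimation Parameters)
• Sample
• Result Output

Data flow
① Sample Input (Sample to Learning Module)
② Two peptide sequences (Query Input to Query Processing)
③ Request of Parameters (Query Processing to Learning Module)
④ Passing Parameters (Learning Module to Query Processing)
⑤ Score or Similarity, Matching Diagram (Query Processing to Result Output)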

Sample

Sample Peptide Sequences
• GPCR (G protein-coupled receptor) peptide ligand library
• The Phoenixpeptide site lists about 500 peptides.
• Available known sequences: about 100, prepared as an Excel file.
• Source: http://www.phoenixpeptide.com/PeptideLibraries/GPCRLibrary.htm

Estimation
• g : the length of a gap
• d : the gap-open penalty
• e : the gap-extension penalty
• Generally, d > e
• Estimation parameters: transition probabilities and emission probabilities
• Estimation when paths are unknown: Baum-Welch algorithm
• The estimated parameters can then be used to derive the values actually needed for alignment: the score s(a,b), d, and e.
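The last step above, deriving s(a,b), d, and e from the estimated probabilities, can be sketched as follows. This is a minimal sketch that assumes the standard three-state pair HMM described in Durbin et al. (match state M, insert states X and Y) and the affine gap cost d + (g - 1)e for a gap of length g. The function names and the parameter names delta, epsilon, tau, and eta are illustrative and do not come from the design itself.

```python
import math


def affine_gap_cost(g, d, e):
    """Cost of a gap of length g >= 1: open once (d), extend g - 1 times (e)."""
    return d + (g - 1) * e


def hmm_to_alignment_scores(p, q, delta, epsilon, tau, eta):
    """Convert pair-HMM probabilities into log-odds alignment parameters.

    Assumed inputs (hypothetical names):
      p[(a, b)]  joint emission probability of the pair (a, b) in state M
      q[a]       background frequency of residue a
      delta      transition probability M -> X (and M -> Y)
      epsilon    probability of staying in X (or Y)
      tau        transition probability into the End state
      eta        end probability of the random (unaligned) model
    """
    # Constant added to every match score so that transition terms cancel.
    match_term = math.log((1 - 2 * delta - tau) / (1 - eta) ** 2)
    s = {
        (a, b): math.log(p_ab / (q[a] * q[b])) + match_term
        for (a, b), p_ab in p.items()
    }
    d = -math.log(delta * (1 - epsilon - tau) / ((1 - eta) * (1 - 2 * delta - tau)))
    e = -math.log(epsilon / (1 - eta))
    return s, d, e
```

With toy values such as delta = 0.05, epsilon = 0.3, tau = 0.1, and eta = 0.1, the resulting d is larger than e, which matches the "d > e" remark above.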

Query Input

Example queries (passed to Query Data Processing):
>Apelin
LVQPRGSRNGPGPWQGGRRKFRRQRPRLSHKGPMPF
>Ghrelin
GSSFLSPEHQRVQQRKESKKPPAKLQPR

• Data format: FASTA (a ">" line with the peptide name, followed by the sequence)
• Input: the two sequences of interest
• Data preprocessing
  • If the data is in an invalid format, report an error.
  • If a sequence contains a character that is not an amino acid symbol, report an error.
  • If a sequence contains whitespace, remove it.
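The preprocessing rules above could look like the following minimal sketch. It assumes one FASTA record per query and the 20 standard one-letter amino acid codes. The function name parse_fasta_record and the choice to upper-case the sequence are assumptions for illustration, not part of the design.

```python
# The 20 standard amino acid one-letter codes (assumed alphabet).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta_record(text):
    """Parse a single FASTA record and return (name, sequence).

    Raises ValueError for a malformed record or a non-amino-acid symbol.
    Whitespace inside the sequence is removed rather than rejected.
    """
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) < 2 or not lines[0].startswith(">"):
        raise ValueError("invalid format: expected '>name' followed by a sequence")
    name = lines[0][1:].strip()
    # Join all sequence lines, then drop any spaces or tabs inside them.
    sequence = "".join("".join(lines[1:]).split()).upper()
    if not name or not sequence:
        raise ValueError("invalid format: empty name or sequence")
    invalid = sorted(set(sequence) - AMINO_ACIDS)
    if invalid:
        raise ValueError("not amino acid symbols: " + ", ".join(invalid))
    return name, sequence
```

For example, `parse_fasta_record(">Ghrelin\nGSSFLSPEHQ RVQQRKESKKPPAKLQPR")` returns the name "Ghrelin" and the sequence with the space removed.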

Query Processing

[Diagram: Query Processing sends a Request of Parameters to the Learning Module, which estimates parameters from the Sample Input and passes them back (Passing Parameters).]
• Hidden Markov Models: optimal log-odds alignment using the Viterbi algorithm
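A minimal sketch of the Viterbi recurrence in log-odds form is shown below. It assumes a three-state pair HMM (M for an aligned pair, X for a residue of the first sequence against a gap, Y for a residue of the second sequence against a gap) with no direct transitions between X and Y, and it omits the constant termination term. The function name viterbi_align and its signature are hypothetical; s is any scoring function s(a, b), and d and e are the gap-open and gap-extension penalties.

```python
NEG_INF = float("-inf")


def viterbi_align(x, y, s, d, e):
    """Global alignment of x and y; returns (score, aligned_x, aligned_y)."""
    n, m = len(x), len(y)
    # V[k][i][j]: best log-odds score for x[:i], y[:j] ending in state k.
    V = {k: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for k in "MXY"}
    V["M"][0][0] = 0.0
    for i in range(1, n + 1):            # leading gap in y
        V["X"][i][0] = -d - (i - 1) * e
    for j in range(1, m + 1):            # leading gap in x
        V["Y"][0][j] = -d - (j - 1) * e
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V["M"][i][j] = s(x[i - 1], y[j - 1]) + max(
                V[k][i - 1][j - 1] for k in "MXY")
            V["X"][i][j] = max(V["M"][i - 1][j] - d, V["X"][i - 1][j] - e)
            V["Y"][i][j] = max(V["M"][i][j - 1] - d, V["Y"][i][j - 1] - e)

    # Traceback from the best final state, recomputing each choice.
    state = max("MXY", key=lambda k: V[k][n][m])
    best = V[state][n][m]
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if state == "M":
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
            state = max("MXY", key=lambda k: V[k][i][j])
        elif state == "X":
            ax.append(x[i - 1])
            ay.append("-")
            state = "M" if V["M"][i - 1][j] - d >= V["X"][i - 1][j] - e else "X"
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            state = "M" if V["M"][i][j - 1] - d >= V["Y"][i][j - 1] - e else "Y"
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))
```

The three tables take O(nm) time and memory, which is easily affordable for peptides of a few dozen residues such as the Apelin and Ghrelin queries.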

Result
• Visualization
• Example) Two sequences, score or probability matrix, optimal alignment result
• Visualize the desired result in a Web browser or write it to a file.
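One possible plain-text rendering of the alignment result, suitable for writing to a file or embedding in a preformatted block of a Web page, is sketched below. The layout (a score line, then the two aligned sequences with "|" marking identical residues) and the function name render_alignment are assumptions for illustration only.

```python
def render_alignment(name_x, aligned_x, name_y, aligned_y, score, width=60):
    """Return a text block showing two gapped sequences and their score."""
    if len(aligned_x) != len(aligned_y):
        raise ValueError("aligned sequences must have the same length")
    # "|" under identical residues, blank under mismatches and gaps.
    marks = "".join(
        "|" if a == b and a != "-" else " "
        for a, b in zip(aligned_x, aligned_y)
    )
    label = max(len(name_x), len(name_y))
    lines = [f"Score: {score:.2f}"]
    for start in range(0, len(aligned_x), width):
        end = start + width
        lines.append(f"{name_x:<{label}}  {aligned_x[start:end]}")
        lines.append(f"{'':<{label}}  {marks[start:end]}")
        lines.append(f"{name_y:<{label}}  {aligned_y[start:end]}")
        lines.append("")
    return "\n".join(lines).rstrip()
```

The returned string can be passed to `print`, or saved with `open(path, "w").write(...)`.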

References
1. Richard Durbin, Sean R. Eddy, Anders Krogh, Graeme Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (2000). Cambridge University Press.
   • Available lecture notes: http://bi.snu.ac.kr/Courses/bio02/bio02_2.html
2. Website for bioinformatics: http://bioinfo.sarang.net/wiki/BioinfoWiki
3. Programs for biosequence analysis: http://www.dina.dk/~sestoft/bsa.html
4. Dynamic programming: http://no-smok.net/nsmk/DynamicProgramming
