PEAKS: De Novo Sequencing using MS/MS spectra
Bin Ma, U. Western Ontario, Canada • Kaizhong Zhang, U. Western Ontario, Canada • Chengzhi Liang, Bioinformatics Solutions Inc., Canada
Outline • Background • Tandem Mass Spectrometry • De novo sequencing • Problem Definition and Algorithm. • Software implementation – PEAKS • Future work
Background • Humans have about 100,000 different proteins. Because of post-translational modifications, each protein can exist in many different versions. • Diseases are closely related to abnormal proteins or to abnormal protein expression levels. • Given a tissue, identifying the proteins (and their modified versions) in it is a fundamental problem for drug design.
Proteins and Peptides • A protein is a sequence of 20 different types of amino acids. • A protein is a string over an alphabet of size 20. • A peptide is a substring of the protein. • The 20 amino acids have 19 distinct masses. • I and L have the same mass and are difficult to distinguish by MS/MS. • Regard them as the same letter.
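A minimal sketch of the string model above. The monoisotopic residue masses are standard reference values rather than numbers from the slides, and the helper names `canonical` and `is_peptide_of` are hypothetical. It shows that the 20 letters give only 19 distinct masses, and that merging I into L makes substring tests insensitive to the I/L ambiguity.

```python
# Monoisotopic residue masses in daltons (standard values, not from the slides).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def canonical(sequence: str) -> str:
    """Treat I and L as the same letter (hypothetical helper)."""
    return sequence.upper().replace("I", "L")


def is_peptide_of(peptide: str, protein: str) -> bool:
    """A peptide is a substring of the protein, up to the I/L ambiguity."""
    return canonical(peptide) in canonical(protein)


# I and L share a mass, so 20 letters collapse to 19 distinct masses.
distinct_masses = {round(m, 5) for m in RESIDUE_MASS.values()}
```

With this table, `len(RESIDUE_MASS)` is 20 while `len(distinct_masses)` is 19, matching the slide.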
Tandem Mass Spectrometry • MS/MS is the only reliable way for protein identification. • [Figure: tissue → protein (gel fraction) → peptide; example: …VITK | GTDIMNEMR | SMW…]
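[Figure: a peptide (LGSSEVEQVQLVVDGVK) is measured by a tandem mass spectrometer, producing an MS/MS spectrum; the peptide sequence (LGSSEVEQVQLVVDGVK) is recovered from the spectrum either by database search or by de novo sequencing.]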
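How Does a Peptide Fragment? • For a peptide A1A2A3A4, the b-ions (prefix fragments) have masses m(b1)=1+m(A1), m(b2)=1+m(A1)+m(A2), m(b3)=1+m(A1)+m(A2)+m(A3). • The y-ions (suffix fragments) have masses m(y1)=19+m(A4), m(y2)=19+m(A4)+m(A3), m(y3)=19+m(A4)+m(A3)+m(A2).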
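The b- and y-ion formulas can be sketched as two running sums. The offsets 1 and 19 are the slide's nominal values. The integer residue masses are standard nominal masses, and the function names are hypothetical.

```python
# Nominal (integer) residue masses for a few amino acids; standard values,
# chosen here only to keep the arithmetic exact.
NOMINAL_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "L": 113, "K": 128}

B_OFFSET = 1    # offset used on the slide for b-ions
Y_OFFSET = 19   # offset used on the slide for y-ions


def b_ions(peptide: str) -> list[int]:
    """m(b_i) = 1 + mass of the first i residues, for i = 1..n-1."""
    masses, total = [], B_OFFSET
    for residue in peptide[:-1]:
        total += NOMINAL_MASS[residue]
        masses.append(total)
    return masses


def y_ions(peptide: str) -> list[int]:
    """m(y_i) = 19 + mass of the last i residues, for i = 1..n-1."""
    masses, total = [], Y_OFFSET
    for residue in reversed(peptide[1:]):
        total += NOMINAL_MASS[residue]
        masses.append(total)
    return masses
```

A consequence of the two formulas is that complementary ions always sum to the same value: m(b_i) + m(y_(n-i)) = m(P) + 20, where m(P) is the sum of the residue masses.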
De Novo Sequencing • For any peptide P = a1…an, m(P) = Σi m(ai). • Problem: given a spectrum and a mass value m, compute a sequence P such that m(P)=m and the matching score score(P) is maximized.
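To make the problem statement concrete, here is a brute-force sketch that enumerates every sequence of the required mass and keeps the best-scoring one. It is exponential and is not the algorithm presented in these slides. The toy `score` (number of distinct spectrum peaks hit by a b- or y-ion), the five-letter alphabet, and all function names are assumptions made for illustration.

```python
# Toy alphabet with nominal integer residue masses (illustrative subset).
NOMINAL_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99}


def theoretical_ions(peptide: str) -> list[int]:
    """b- and y-ion masses, using the offsets 1 and 19 from the slides."""
    ions, prefix, suffix = [], 1, 19
    n = len(peptide)
    for i in range(n - 1):
        prefix += NOMINAL_MASS[peptide[i]]
        suffix += NOMINAL_MASS[peptide[n - 1 - i]]
        ions.append(prefix)  # b_(i+1)
        ions.append(suffix)  # y_(i+1)
    return ions


def score(peptide: str, peaks: set[int]) -> int:
    """Hypothetical matching score: distinct peaks explained by the peptide."""
    return len(set(theoretical_ions(peptide)) & peaks)


def candidates(mass: int):
    """Yield every sequence P over the toy alphabet with m(P) == mass."""
    if mass == 0:
        yield ""
        return
    for residue, m_res in NOMINAL_MASS.items():
        if m_res <= mass:
            for rest in candidates(mass - m_res):
                yield residue + rest


def de_novo_brute_force(peaks: set[int], mass: int):
    """Return a sequence of the given mass maximizing score, or None."""
    return max(candidates(mass), key=lambda p: score(p, peaks), default=None)
```

For example, calling `de_novo_brute_force(set(theoretical_ions("GASP")), 312)` on the noise-free peaks of GASP recovers that sequence. Real spectra contain noise and missing peaks, which is why the scoring function matters.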
Y-ions Determined By a Suffix • [Figure: y-ions y1, y2, y3 of a suffix, each offset by 19] • score(Q) can be defined for a suffix Q.
Strategies • Consider a prefix R and a suffix Q simultaneously, as a pair. • Consider only those pairs (R,Q) that satisfy a nice property, which we call “chummy”. • Chummy pairs have two useful properties: • The score of a chummy pair can be computed recursively from a smaller chummy pair. • There is a series of chummy pairs that grows to the optimal solution.
Dynamic Programming • Combining Lemmas A and B, we can compute DP(u,v). • Suppose (R,Q) is the pair maximizing DP(u,v) under the condition m(R)+m(Q)+m(a)=m. Then RaQ is the optimal peptide.
Comparison of PEAKS and Lutefisk • [Figure: sequencing results of the two programs; red = correct]
Implementation Particulars • More accurate scoring: sum of the logarithmic intensities; many other ion types; coexisting ions, e.g., x2, y2, z2 • Deconvolution: converting multiply-charged peaks to singly-charged ones • Recalibration: compressing/stretching the spectrum to correct for calibration error • Noise reduction
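The deconvolution step can be illustrated with the standard charge-state conversion. This sketch assumes the charge of each peak is already known (in practice it has to be inferred, for instance from isotope spacing). The function names are hypothetical, and it is not a description of how PEAKS itself implements the step.

```python
PROTON_MASS = 1.007276  # mass of a proton in daltons (standard value)


def to_singly_charged(mz: float, charge: int) -> float:
    """Map the m/z of a peak with the given charge to its 1+ equivalent.

    A fragment of neutral mass M with z protons appears at (M + z*p) / z,
    so its singly-charged position M + p equals z*mz - (z - 1)*p.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mz * charge - (charge - 1) * PROTON_MASS


def deconvolute(peaks):
    """peaks: iterable of (mz, charge, intensity) -> sorted [(mz_1plus, intensity)]."""
    return sorted((to_singly_charged(mz, z), inten) for mz, z, inten in peaks)
```

After this conversion, every peak can be compared against singly-charged theoretical ion masses.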
Acknowledgement • Bin Ma and Kaizhong Zhang were supported by NSERC. • Chengzhi Liang was supported by BSI. • Thanks to the development team at BSI for the software development.
Tandem Mass Spectrometer • [Figure: tandem mass spectrometer. Labels: precursor ions, mass analyzer, fragment, fragment ions, mass analyzer, detector. Example: precursor ions such as MPSER and PAK; PAK fragments into P, PA, AK and K; the resulting spectrum goes to de novo sequencing.]
Algorithm Sandwich
• DP(0,0) = 0; DP(u,v) = -infinity for (u,v) != (0,0);
• for u from 1 to m/2 do
    for v from u-max(a) to u+max(a) do
      for a in Σ do
        if u<v then … else …
• find u, v, a such that u+v+m(a)=m and DP(u,v) is maximized;
• backtracking;
Dynamic Programming • for u from 0 to m • backtracking
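As an illustration of a forward pass over u followed by backtracking, here is a simplified one-index dynamic program over prefix masses. It is not the Sandwich algorithm and does not use chummy pairs. The assumptions are integer nominal masses, a toy alphabet, and a hypothetical gain function that credits each prefix mass u with the peak intensity at its b-ion mass 1+u and at the complementary y-ion mass 19+(m-u).

```python
NEG_INF = float("-inf")
NOMINAL_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99}  # toy alphabet


def simple_dp(peaks: dict[int, float], total: int):
    """Return (sequence, score) for the best peptide of residue mass `total`.

    peaks maps an integer ion mass to its intensity. Each internal prefix
    mass u earns peaks[1 + u] (b-ion) plus peaks[19 + total - u] (y-ion).
    """
    def gain(u: int) -> float:
        return peaks.get(1 + u, 0.0) + peaks.get(19 + total - u, 0.0)

    best = [NEG_INF] * (total + 1)   # best[u]: top score of a prefix of mass u
    back = [None] * (total + 1)      # back[u]: last residue of that prefix
    best[0] = 0.0
    for u in range(1, total + 1):
        for residue, m_res in NOMINAL_MASS.items():
            if m_res <= u and best[u - m_res] > NEG_INF:
                # The full peptide (u == total) is not itself a fragment ion.
                cand = best[u - m_res] + (gain(u) if u < total else 0.0)
                if cand > best[u]:
                    best[u], back[u] = cand, residue

    if best[total] == NEG_INF:
        return None, NEG_INF

    # Backtracking: walk the stored residues from mass `total` down to 0.
    letters, u = [], total
    while u > 0:
        letters.append(back[u])
        u -= NOMINAL_MASS[back[u]]
    return "".join(reversed(letters)), best[total]
```

One known weakness of this simplified form is that it can count the same peak twice when a b-ion and a y-ion of the candidate fall on the same mass, because the forward pass sees only one prefix mass at a time.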
Dynamic Programming • We hope DP(u,v) for u+v=m gives the optimal prefix and suffix. • The optimal solution can be obtained by concatenation of the prefix and suffix.
Chummy Pairs • Two strings Ra and bQ are called a chummy pair iff either of the following two conditions holds: (C1) …; (C2) …. • Examples: (LGE, LVR) satisfies (C2); (LGE, VR) satisfies (C1); (LGE, R) satisfies (C1); (LG, VR) is not chummy.
Chummy pairs • Lemma A – Suppose Ra and bQ are a chummy pair. u=m(Ra), v=m(bQ). If (C1) is true, If (C2) is true,
Chummy Pairs • Lemma B – Let P be the optimal solution. Then there is a chummy pair (R,Q) and a letter a such that P=RaQ. Also, there is a chummy pair series such that