About BoostThreader. Lee, Juyong, 2009. 08. 26. What is BoostThreader? A sequence-structure threading program published by J. Xu’s group, known to be good for hard cases. Does not work... for me... Let’s thread! What you need: a sequence, a protein structure, a scoring function, and an algorithm.
About BoostThreader Lee, Juyong 2009. 08. 26
What is BoostThreader? • A Sequence-Structure threading program • Published by J. Xu’s group • Known to be good for hard cases • Does not work…… for me……
Let’s thread! • What you need: • sequence • protein structure • scoring function • algorithm [Figure: example alignments over elements A to G, with "Match" and "Deletion" marked and the results labeled "BAD" and "Good"]
Three algorithms for alignment! • Dynamic programming: traditional • Hidden Markov model: generative model, not that old • Conditional random field: up to date [Image caption: "I’m Andrei Andreyevich Markov. I’m your father."]
Dynamic programming • Finding the best-scoring path on the alignment matrix • The path is the alignment! [Figure: alignment matrix with the initial and final cells marked and a path drawn between them]
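More about Dynamic Programming • Follow the maximum-scoring path! • Three moves lead into cell F(i+1, j+1) • Match: sequence residue A aligned to structure residue a, with match score f • Deletion: sequence residue A aligned to a gap, with g = gap penalty = -1 • Insertion: a gap aligned to structure residue a, with h = gap penalty = -1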
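The "follow the maximum-scoring path" idea can be sketched as a small matrix fill. This is a minimal illustration of standard global-alignment dynamic programming, not BoostThreader's code: the names `align_score` and `toy_match` are hypothetical, the letter-matching score is a stand-in for a real sequence-structure score, and the mapping of "deletion" and "insertion" to matrix directions is an assumption.

```python
def align_score(seq, struct, match_score, gap=-1):
    """Fill the alignment matrix F and return the best total score."""
    n, m = len(seq), len(struct)
    # F[i][j] = best score for aligning seq[:i] with struct[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + gap      # sequence residues aligned to gaps
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + gap      # structure residues aligned to gaps
    for i in range(n):
        for j in range(m):
            F[i + 1][j + 1] = max(
                F[i][j] + match_score(seq[i], struct[j]),  # match (f)
                F[i][j + 1] + gap,                         # deletion (g)
                F[i + 1][j] + gap,                         # insertion (h)
            )
    return F[n][m]


def toy_match(a, b):
    # Hypothetical similarity: +1 if the letters agree ignoring case, else -1.
    return 1 if a.lower() == b.lower() else -1


best = align_score("ABDE", "abcd", toy_match)
print(best)
```

Recording which of the three moves won in each cell and tracing back from the final cell would recover the path itself, which is the alignment.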
In conventional seq.-str. alignment • Linear sum of similarities of properties (SS: secondary structure, SA: solvent accessibility) • Only functions for the match and gap cases are needed! • F_match = w1 * (predicted SS * real SS) + w2 * (predicted SA * real SA) + w3 * (predicted residue depth * real depth) + … • F_gap = opening penalty + (# of gaps) * extension penalty • Only the next step is considered!
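As a rough illustration of the two conventional scoring functions, the sketch below writes them out directly. The weights, penalty values, and property encodings are made-up numbers for illustration only, and `f_match` and `f_gap` are hypothetical names.

```python
def f_match(pred, real, weights):
    """Weighted linear sum of per-property similarities."""
    return sum(w * pred[name] * real[name] for name, w in weights.items())


def f_gap(n_gaps, opening=-10.0, extension=-1.0):
    """Opening penalty plus a per-gap extension penalty."""
    return opening + n_gaps * extension


# Hypothetical weights w1, w2, w3 and property values.
weights = {"ss": 1.0, "sa": 0.5, "depth": 0.25}
pred = {"ss": 1.0, "sa": 0.8, "depth": 0.5}   # predicted from the sequence
real = {"ss": 1.0, "sa": 0.5, "depth": 1.0}   # observed in the structure

print(f_match(pred, real, weights))
print(f_gap(3))
```

Each property contributes independently here, so an interaction such as "SS agrees and SA agrees" cannot be rewarded beyond the sum of its parts.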
What’s different in BoostThreader? • Scores depend on both the current and the next step! • Nine scoring functions are necessary (one per transition between the three states match, insertion, and deletion: 3 × 3)! • The gap penalty is context-dependent • Trained from reference alignments! • DALI, TM-align, etc. • Regression trees are used as the scoring functions • Not a linear function!
Intermission • Hey Nature, not all flies are Drosophila
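Regression Tree! • Training on 100 used cars • Engine over 1500 cc? • No → Older than 5 years? (No → average 8 million won; Yes → average 5 million won) • Yes → Driven 200,000 km or more? (No → average 15 million won; Yes → average 11 million won)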
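Example in Threading • Sequence: predicted properties • Structure: observed properties • Is the SS the same? • No → Is the SA level the same? (No → probability 0.1, 1 out of 10; Yes → probability 0.3, 3 out of 10) • Yes → Is the SA level the same? (No → probability 0.6, 6 out of 10; Yes → probability 0.9, 9 out of 10) • Estimate probabilities from examples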
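The "estimate probabilities from examples" step can be sketched by counting correct examples in each leaf. This is a simplified illustration: the two questions are fixed by hand rather than learned, the 40 examples are synthetic and chosen to reproduce the numbers on the slide, and `fit_leaves` and `make_examples` are hypothetical helper names.

```python
from collections import defaultdict


def fit_leaves(examples):
    """Estimate P(correct) in each leaf as the fraction of correct examples."""
    counts = defaultdict(lambda: [0, 0])   # leaf -> [n_correct, n_total]
    for ss_same, sa_same, correct in examples:
        leaf = (ss_same, sa_same)          # answers to the SS and SA questions
        counts[leaf][0] += int(correct)
        counts[leaf][1] += 1
    return {leaf: good / total for leaf, (good, total) in counts.items()}


def make_examples(ss_same, sa_same, n_correct, n_total=10):
    # Synthetic data: the first n_correct of n_total examples are correct.
    return [(ss_same, sa_same, k < n_correct) for k in range(n_total)]


examples = (make_examples(False, False, 1) + make_examples(False, True, 3)
            + make_examples(True, False, 6) + make_examples(True, True, 9))
leaf_prob = fit_leaves(examples)
print(leaf_prob[(True, True)])
```

A real regression-tree learner would also choose which question to ask at each node; only the leaf estimates are shown here.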
Advantages of trees • Fast • Interactions between variables are easy to take into account
What’s really happening in BoostThreader? • Initial setting • Set all F0(uv, seq(i), str(j)) = 0, where uv is the transition from state u to state v • P ~ exp(F) • 30 reference (correct) sequence-structure alignments! • Calculate the probability of every possible state transition! • Probabilities of all examples! • Forward-backward algorithm
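To make the forward-backward step concrete, here is a minimal sketch over the alignment matrix. It is a simplification, not BoostThreader's model: each move into a cell gets its own log-weight F through a hypothetical `score(move, i, j)` callback, whereas the slides describe scores that depend on the state transition (current and next state). With all F set to 0, as in the initial setting, every alignment is equally likely.

```python
import math

MOVES = {"match": (1, 1), "deletion": (1, 0), "insertion": (0, 1)}


def forward_backward(n, m, score):
    """Return (Z, marginals) over all alignments of lengths n and m.

    score(move, i, j) is the log-weight F of entering cell (i, j) by `move`.
    """
    fwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    bwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    fwd[0][0] = 1.0
    bwd[n][m] = 1.0
    # Forward pass: total weight of all partial paths ending in each cell.
    for i in range(n + 1):
        for j in range(m + 1):
            for move, (di, dj) in MOVES.items():
                pi, pj = i - di, j - dj
                if pi >= 0 and pj >= 0:
                    fwd[i][j] += fwd[pi][pj] * math.exp(score(move, i, j))
    # Backward pass: total weight of all partial paths starting from each cell.
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            for move, (di, dj) in MOVES.items():
                ni, nj = i + di, j + dj
                if ni <= n and nj <= m:
                    bwd[i][j] += math.exp(score(move, ni, nj)) * bwd[ni][nj]
    Z = fwd[n][m]
    # Marginal probability that an alignment enters cell (i, j) by `move`.
    marginals = {}
    for i in range(n + 1):
        for j in range(m + 1):
            for move, (di, dj) in MOVES.items():
                pi, pj = i - di, j - dj
                if pi >= 0 and pj >= 0:
                    w = math.exp(score(move, i, j))
                    marginals[(move, i, j)] = fwd[pi][pj] * w * bwd[i][j] / Z
    return Z, marginals


Z, marg = forward_backward(1, 1, lambda move, i, j: 0.0)
print(Z, marg[("match", 1, 1)])
```

For one sequence residue and one structure residue there are three alignments (match, deletion then insertion, insertion then deletion), so the match has probability 1/3 when all scores are zero.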
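“All Possible” Transitions? For MM (match → match) • Reference alignment: sequence AB–DE, structure abcd–, states mmimd • Generate examples! • Every pair of consecutive sequence residues (AB, BD, DE) is combined with every pair of consecutive structure residues (ab, bc, cd) • AB/ab, AB/bc, AB/cd • BD/ab, BD/bc, BD/cd • DE/ab, DE/bc, DE/cd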
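Examples (2) For MI (match → insertion) • Reference alignment: sequence AB–DE, structure abcd–, states mmimd • Generate examples! • Every sequence residue followed by a gap (A-, B-, D-, E-) is combined with every pair of consecutive structure residues (ab, bc, cd) • A-/ab, A-/bc, A-/cd • B-/ab, B-/bc, B-/cd • D-/ab, D-/bc, D-/cd • E-/ab, E-/bc, E-/cd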
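The example generation for the MM and MI cases can be sketched as a Cartesian product. This is an illustration of the enumeration shown on the slides only; `consecutive_pairs`, `mm_examples`, and `mi_examples` are hypothetical names, and the inputs are the slide's sequence and structure with the gap columns removed.

```python
from itertools import product


def consecutive_pairs(s):
    """All pairs of neighbouring residues, e.g. 'ABDE' -> ['AB', 'BD', 'DE']."""
    return [s[k:k + 2] for k in range(len(s) - 1)]


def mm_examples(seq, struct):
    """Match -> match: two consecutive sequence residues vs two structure residues."""
    return list(product(consecutive_pairs(seq), consecutive_pairs(struct)))


def mi_examples(seq, struct):
    """Match -> insertion: one sequence residue followed by a gap."""
    return list(product([r + "-" for r in seq], consecutive_pairs(struct)))


seq, struct = "ABDE", "abcd"   # the slide's alignment without gap columns
mm = mm_examples(seq, struct)
mi = mi_examples(seq, struct)
print(len(mm), len(mi))
```

This gives 9 candidate examples for MM and 12 for MI, matching the lists on the slides.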
Inside BoostThreader • Examples and their probabilities • Calculated with the current scoring functions • Modify the scoring functions • If the example is correct, increase F: F1 = F0 + (1 − P) • If the example is wrong, decrease F: F1 = F0 − P • Add trees until prediction quality stops improving • F = F0 + F1 + F2 + F3 + F4 + F5 + ……
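The update rule can be sketched as a toy boosting loop. Several things here are assumptions made for illustration: P is computed with a two-way normalisation exp(F) / (exp(F) + 1) rather than with the forward-backward algorithm, each "tree" is replaced by a lookup table with one leaf per feature key, and the data, the fixed number of rounds, and the names `prob` and `boost` are hypothetical.

```python
import math
from collections import defaultdict


def prob(f):
    # Assumed two-way normalisation of P ~ exp(F): exp(F) / (exp(F) + 1).
    return 1.0 / (1.0 + math.exp(-f))


def boost(examples, rounds=20):
    """examples: list of (feature_key, is_correct). Returns F per feature key."""
    F = defaultdict(float)                 # F0 = 0 everywhere
    for _ in range(rounds):
        sums, counts = defaultdict(float), defaultdict(int)
        for key, is_correct in examples:
            p = prob(F[key])
            # Target: +(1 - P) for a correct example, -P for a wrong one.
            sums[key] += (1.0 - p) if is_correct else -p
            counts[key] += 1
        # Stand-in "tree": one leaf per feature key, predicting the mean target.
        for key in sums:
            F[key] += sums[key] / counts[key]
    return dict(F)


# Synthetic data: 9 of 10 examples correct when SS agrees, 1 of 10 otherwise.
examples = ([("ss_same", True)] * 9 + [("ss_same", False)]
            + [("ss_diff", True)] + [("ss_diff", False)] * 9)
F = boost(examples)
print(F)
```

Each round pushes F up where correct examples dominate and down where wrong ones do, so F ends up positive for `ss_same` and negative for `ss_diff`. The slides stop adding trees once prediction quality no longer improves, where this sketch simply runs a fixed number of rounds.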
Summary • BoostThreader considers both the “current” and the “next” step • The scoring function consists of regression trees • The trees are trained on examples