Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Inferring strengths of protein-protein interactions from experimental data using linear programming Morihiro Hayashida, Nobuhisa Ueda, Tatsuya Akutsu Bioinformatics Center, Kyoto University

Overview • Background • Probabilistic model • Related work • Biological experimental data • Proposed methods • For binary data • For numerical data • Results of computational experiments • Conclusion

Background (1/3) • Understanding protein-protein interactions helps in understanding protein functions. • Transcription factors • Proteins interact with a factor. • Regulate the gene. • Receptors, etc.

Background (2/3) • Various methods have been developed for inference of protein-protein interactions. • Gene fusion/Rosetta stone (Enright et al. and Marcotte et al. 1999) • The number of genes to which the method can be applied is limited. • Molecular dynamics • Long CPU time • Difficult to predict precisely

Background (3/3) • A model based on domain-domain interactions has been proposed. • Uses domains defined by databases such as InterPro or Pfam. • [Figure: domain-domain interaction]

Probabilistic model of interaction (1/2) • Model (Deng et al., 2002) • Two proteins interact if and only if at least one pair of their domains interacts. • Interactions between domains are independent events. • [Figure: proteins P1 and P2 with domains D1, D2, D3, D4]

Probabilistic model of interaction (2/2) • Pij = 1: Proteins Pi and Pj interact • Dmn = 1: Domains Dm and Dn interact • Dmn ∈ Pij: Domain pair (Dm, Dn) is included in protein pair Pi × Pj
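Because domain-domain interactions are treated as independent events and a protein pair interacts when at least one of its domain pairs does, the model gives Pr(Pij=1) = 1 − Π (1 − Pr(Dmn=1)), with the product taken over the domain pairs included in Pi × Pj. A minimal sketch of that calculation follows; the function name and the example probabilities are made up for illustration.

```python
from math import prod


def interaction_probability(domain_pair_probs):
    """Pr(Pij=1) = 1 - product of (1 - Pr(Dmn=1)) over the domain pairs in Pi x Pj."""
    return 1.0 - prod(1.0 - p for p in domain_pair_probs)


# Hypothetical protein pair containing four domain pairs.
example_probability = interaction_probability([0.1, 0.2, 0.0, 0.5])
print(example_probability)
```

A protein pair with no domain pairs gets probability 0 under this sketch, since the empty product is 1.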

Overview • Background • Probabilistic model • Related work • Association method (Sprinzak et al., 2001) • EM method (Deng et al., 2002) • Biological experimental data • Proposed methods • Results of computational experiments • Conclusion

Related work • INPUT: • interacting protein pairs (positive examples) • non-interacting protein pairs (negative examples) • OUTPUT: Pr(Dmn=1) for all domain pairs

Association method (Sprinzak et al., 2001) • Inference of probabilities of domain-domain interactions using ratios of frequencies • Imn: Number of interacting protein pairs that include (Dm, Dn) • Nmn: Number of protein pairs that include (Dm, Dn) • Pr(Dmn=1) = Imn / Nmn
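The ratio of frequencies can be computed with two counters per domain pair: one for protein pairs that include the domain pair, and one for the interacting protein pairs among them. The sketch below assumes that each protein is given as a set of domain names and that domain pairs are unordered; the function name, argument names, and toy data are hypothetical.

```python
from collections import Counter
from itertools import product


def association_probabilities(protein_domains, pairs, interacting):
    """Estimate Pr(Dmn=1) as (interacting pairs with (Dm, Dn)) / (all pairs with (Dm, Dn)).

    protein_domains: dict mapping protein name -> set of domain names
    pairs: list of (Pi, Pj) protein pairs in the data set
    interacting: set of (Pi, Pj) pairs observed to interact
    """
    n_mn = Counter()  # protein pairs that include the domain pair
    i_mn = Counter()  # interacting protein pairs that include the domain pair
    for pair in pairs:
        pi, pj = pair
        # Each unordered domain pair is counted once per protein pair.
        domain_pairs = {
            tuple(sorted((dm, dn)))
            for dm, dn in product(protein_domains[pi], protein_domains[pj])
        }
        for dp in domain_pairs:
            n_mn[dp] += 1
            if pair in interacting:
                i_mn[dp] += 1
    return {dp: i_mn[dp] / n_mn[dp] for dp in n_mn}


# Toy data, not taken from any database.
toy_domains = {"P1": {"A"}, "P2": {"B"}, "P3": {"B", "C"}}
toy_pairs = [("P1", "P2"), ("P1", "P3")]
toy_scores = association_probabilities(toy_domains, toy_pairs, {("P1", "P2")})
print(toy_scores)
```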

EM method (Deng et al., 2002) • Probability (likelihood L) that experimental data {Oij={0,1}} are observed. • Use the EM algorithm to (locally) maximize L. • Estimate Pr(Dmn=1)
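Written out, a likelihood over binary observations is a product of Bernoulli terms, one per protein pair. The form below is the one usually given for the model of Deng et al. (2002); the symbols fp and fn (false-positive and false-negative rates of the experiments) do not appear on the slide and are stated here as an assumption.

```latex
L = \prod_{i,j} \Pr(O_{ij}=1)^{O_{ij}} \,\bigl(1-\Pr(O_{ij}=1)\bigr)^{1-O_{ij}},
\qquad
\Pr(O_{ij}=1) = \Pr(P_{ij}=1)\,(1-fn) + \bigl(1-\Pr(P_{ij}=1)\bigr)\,fp
```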

Biological experimental data • Related methods (Association and EM) use only binary data (interact or not). • Experimental data from yeast two-hybrid experiments • Ito et al. (2000, 2001) • Uetz et al. (2001) • For many protein pairs, different results (Oij={0,1}) were observed. • We developed new methods that use the raw numerical data.

Numerical data • Ito et al. (2000,2001) • For each protein pair, experiments were performed multiple times. • IST (Interaction Sequence Tag) • Number of observed interactions • By using a threshold, we obtain binary data.
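The thresholding step can be sketched as below. The threshold is left as a parameter because the slide does not state the value used; the function name and the example counts are hypothetical.

```python
def binarize(ist_counts, threshold):
    """Map each protein pair to 1 if its observed count reaches the threshold, else 0."""
    return {pair: int(count >= threshold) for pair, count in ist_counts.items()}


# Hypothetical counts of observed interactions per protein pair.
example_counts = {("P1", "P2"): 5, ("P1", "P3"): 1, ("P2", "P3"): 3}
example_binary = binarize(example_counts, threshold=3)
print(example_binary)
```

Pairs with 1 and with 5 observed interactions are both reduced to a single bit, which is the information loss that motivates methods working on the raw counts.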

Proposed methods • It seems difficult to modify the EM method for numerical data. • Linear programming • For binary data • LPBN • Combined methods • LPEM • EMLP • SVM-based method • For numerical data • ASNM • LPNM

LPBN (LP-based method) (1/2) • Transformation into linear inequalities • Pi and Pj interact
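One way to arrive at linear inequalities is to declare that Pi and Pj interact when Pr(Pij=1) reaches a threshold, and then take logarithms of the product in the probabilistic model. The derivation below is a sketch under that assumption; the threshold Θ and the variables γmn = ln(1 − Pr(Dmn=1)) are notation introduced here, and it requires Θ < 1 and Pr(Dmn=1) < 1 so that the logarithms are finite.

```latex
\begin{aligned}
\Pr(P_{ij}=1) \ge \Theta
&\iff 1 - \prod_{D_{mn} \in P_{ij}} \bigl(1 - \Pr(D_{mn}=1)\bigr) \ge \Theta \\
&\iff \prod_{D_{mn} \in P_{ij}} \bigl(1 - \Pr(D_{mn}=1)\bigr) \le 1 - \Theta \\
&\iff \sum_{D_{mn} \in P_{ij}} \ln\bigl(1 - \Pr(D_{mn}=1)\bigr) \le \ln(1 - \Theta) \\
&\iff \sum_{D_{mn} \in P_{ij}} \gamma_{mn} \le \ln(1 - \Theta),
\qquad \gamma_{mn} = \ln\bigl(1 - \Pr(D_{mn}=1)\bigr) \le 0
\end{aligned}
```

The last line is linear in the unknowns γmn, which is what makes a linear programming formulation possible.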

LPBN (LP-based method) (2/2) • Linear programming for inference of protein-protein interactions
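A linear program of this kind could look as follows. This is a plausible formulation rather than a transcription of the slide: γmn = ln(1 − Pr(Dmn=1)) are the unknowns, Θ is an assumed decision threshold, and the slack variables αij and βij (names chosen here) measure by how much an interacting or non-interacting training pair violates its inequality.

```latex
\begin{aligned}
\text{minimize} \quad
& \sum_{(i,j)\ \text{interacting}} \alpha_{ij} \;+ \sum_{(i,j)\ \text{non-interacting}} \beta_{ij} \\
\text{subject to} \quad
& \sum_{D_{mn} \in P_{ij}} \gamma_{mn} \le \ln(1-\Theta) + \alpha_{ij}
&& \text{for interacting pairs } (P_i, P_j), \\
& \sum_{D_{mn} \in P_{ij}} \gamma_{mn} \ge \ln(1-\Theta) - \beta_{ij}
&& \text{for non-interacting pairs } (P_i, P_j), \\
& \gamma_{mn} \le 0, \quad \alpha_{ij} \ge 0, \quad \beta_{ij} \ge 0 .
\end{aligned}
```

After solving, each domain-pair probability is recovered as Pr(Dmn=1) = 1 − exp(γmn). The strict inequality for non-interacting pairs is relaxed to a non-strict one here, since LP solvers work with closed constraints.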

Combination of EM and LPBN • LPEM method • Use the results of LPBN as initial parameter values for EM. • EMLP method • Add the following inequalities as constraints to LPBN so that LP solutions are close to EM solutions.

Simple SVM-based method • Feature vector • Simple linear kernel with • Interacting pairs = Positive examples • Non-interacting pairs = Negative examples
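The slide does not preserve the definition of the feature vector. A natural choice, assumed here, is one binary coordinate per domain pair, set to 1 when the protein pair includes that domain pair. The sketch below builds such a vector and evaluates a linear decision function w·x + b; the weights and all names are hypothetical, and no SVM training is shown.

```python
from itertools import product


def feature_vector(domains_i, domains_j, domain_pair_index):
    """Binary vector with a 1 for every indexed domain pair included in Pi x Pj."""
    x = [0] * len(domain_pair_index)
    for dm, dn in product(domains_i, domains_j):
        key = tuple(sorted((dm, dn)))
        if key in domain_pair_index:
            x[domain_pair_index[key]] = 1
    return x


def linear_decision(w, b, x):
    """Value of a linear classifier; a positive value is read as 'interacting'."""
    return sum(wk * xk for wk, xk in zip(w, x)) + b


# Hypothetical index over three domain pairs and made-up weights.
pair_index = {("A", "B"): 0, ("A", "C"): 1, ("B", "C"): 2}
x_example = feature_vector({"A"}, {"B", "C"}, pair_index)
score_example = linear_decision([1.0, -0.5, 2.0], -0.25, x_example)
print(x_example, score_example)
```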

Strength of protein-protein interaction • For each protein pair, experiments were performed multiple times. • The ratio Kij/Mij can be considered as the strength of the interaction. • Kij: Number of observed interactions for a protein pair (Pi, Pj) • Mij: Number of experiments for (Pi, Pj)
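As a small worked example of the ratio, a pair observed to interact in 3 of 4 experiments has strength 0.75. The helper below is an illustrative sketch; its name and input checks are not from the slides.

```python
def interaction_strength(k_ij, m_ij):
    """Strength of an interaction: observed interactions divided by experiments."""
    if m_ij <= 0:
        raise ValueError("number of experiments must be positive")
    if not 0 <= k_ij <= m_ij:
        raise ValueError("observed interactions must lie between 0 and m_ij")
    return k_ij / m_ij


strength_example = interaction_strength(3, 4)
print(strength_example)
```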

LPNM method (1/2) • Minimize the gap between Pr(Pij=1) and Kij/Mij using LP.

LPNM method (2/2) • Linear programming for inference of strengths of protein-protein interactions
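One way to make the gap linear is to measure it after the logarithmic transformation of the probabilistic model. The program below is a sketch under that assumption and is not taken from the slide: ρij = Kij/Mij is the observed strength, γmn = ln(1 − Pr(Dmn=1)) are the unknowns, and αij, βij are nonnegative deviation variables whose names are chosen here.

```latex
\begin{aligned}
\text{minimize} \quad
& \sum_{i,j} \bigl(\alpha_{ij} + \beta_{ij}\bigr) \\
\text{subject to} \quad
& \sum_{D_{mn} \in P_{ij}} \gamma_{mn} + \alpha_{ij} - \beta_{ij} = \ln(1 - \rho_{ij})
&& \text{for every training pair } (P_i, P_j), \\
& \gamma_{mn} \le 0, \quad \alpha_{ij} \ge 0, \quad \beta_{ij} \ge 0 .
\end{aligned}
```

At an optimum, αij + βij equals the absolute difference between ln(1 − Pr(Pij=1)) and ln(1 − ρij). The right-hand side is undefined when ρij = 1, so in practice ρij would have to be capped slightly below 1.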

ASNM • Modified Association method for numerical data • For binary data (Sprinzak et al., 2001)
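A plausible reading of the modification, assumed here because the formula is not preserved on the slide, is to replace the count of interacting protein pairs by the sum of their strengths Kij/Mij, so that each domain pair receives the average strength of the protein pairs that include it. All names and the toy data below are hypothetical.

```python
from collections import defaultdict
from itertools import product


def asnm_scores(protein_domains, strengths):
    """Average strength of the protein pairs that include each domain pair.

    protein_domains: dict mapping protein name -> set of domain names
    strengths: dict mapping (Pi, Pj) -> strength in [0, 1]
    """
    total = defaultdict(float)  # sum of strengths per domain pair
    count = defaultdict(int)    # number of protein pairs per domain pair
    for (pi, pj), rho in strengths.items():
        domain_pairs = {
            tuple(sorted((dm, dn)))
            for dm, dn in product(protein_domains[pi], protein_domains[pj])
        }
        for dp in domain_pairs:
            total[dp] += rho
            count[dp] += 1
    return {dp: total[dp] / count[dp] for dp in count}


# Toy data, not taken from any database.
asnm_domains = {"P1": {"A"}, "P2": {"B"}, "P3": {"B", "C"}}
asnm_example = asnm_scores(asnm_domains, {("P1", "P2"): 0.5, ("P1", "P3"): 1.0})
print(asnm_example)
```

With strengths restricted to 0 and 1, this reduces to the ratio used by the Association method for binary data.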

Overview • Background • Probabilistic model • Related work • Biological experimental data • Proposed methods • For binary data • For numerical data • Results of computational experiments • Conclusion

Computational experiments for binary data • DIP database (Xenarios et al., 2002) • 1767 protein pairs as positive examples • 2/3 of the pairs for training, 1/3 for testing • Computational environment • Xeon processor 2.8 GHz • LP solver: loqo

Results on training data (binary data) • [Figure: comparison of EM, Association, LPBN, and SVM]

Results on test data (binary data) • [Figure: comparison of EM, EMLP, LPEM, SVM, and Association]

Computational experiments for numerical data • YIP database (Ito et al., 2001, 2002) • IST (Interaction Sequence Tag) • 1586 protein pairs • 4/5 for training, 1/5 for testing • Computational environment • Xeon processor 2.8 GHz • LP solver: lp_solve

Results on test data (numerical data) • [Figure: comparison of ASNM, LPNM, EM, and Association]

Results on test data (numerical data) • LPNM is the best. • EM and Association methods classify Pr(Pij=1) into either 0 or 1.

Conclusion • We have defined a new problem to infer strengths of protein-protein interactions. • We have proposed LP-based methods. • For binary data • LPBN, LPEM, EMLP • SVM-based method • For numerical data • ASNM • LPNM • LPNM outperformed the other methods.

Future work • Improve the methods to avoid overfitting. • Improve the probabilistic model to understand protein-protein interactions more accurately.
