
Sam Danziger Institute For Genomics and Bioinformatics Department of Biomedical Engineering

Choosing where to look next in a mutation sequence space: Active Learning of informative p53 cancer rescue mutants. Sam Danziger Institute For Genomics and Bioinformatics Department of Biomedical Engineering University of California, Irvine www.SamDanziger.com. Jue Zeng


Presentation Transcript


  1. Choosing where to look next in a mutation sequence space: Active Learning of informative p53 cancer rescue mutants. Sam Danziger, Institute for Genomics and Bioinformatics, Department of Biomedical Engineering, University of California, Irvine, www.SamDanziger.com. Jue Zeng, Department of Medicine; Rainer Brachmann, Department of Medicine; Richard Lathrop, Department of Computer Science; University of California, Irvine

  2. Outline • Overview: Computer Guided Discovery • Problem: Cancer and p53 • Results: Best Active Learning • Next: Future Experiments

  3. Computer Guided Discovery of “Active” Mutant Proteins • Starting Point: A biomedically important protein with some known mutants. • Problem: Find novel mutant proteins with an “Active” phenotype. • Naive Solution: Make and test all other possible mutants in the wet lab. [Diagram: Known Mutants versus Other Possible Mutants]

  4. Why Use Computers? How many mutants are there? Assuming up to 5 mutations in 200 residues: ~10^11. Known mutants: ~10^2. [Image: Spiral Galaxy M101, http://hubblesite.org/, ~10^9 stars]

  5. A Better Solution: Active Learning. Pick the best unknown mutants to know. [Diagram: a loop. Train the classifier on the training set of known examples (Example 1 … Example N); choose an example to label from the unknown examples (Example N+1 … Example M); add the new example to the training set; repeat.]
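The loop on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each mutant is reduced to a single hypothetical numeric feature, the "classifier" is a toy threshold halfway between the class means, the selection rule (pick the unknown closest to the threshold) is a stand-in for whichever Active Learning method is used, and `oracle` is a hypothetical function standing in for the wet-lab test.

```python
def train(training_set):
    """Toy classifier: a threshold halfway between the two class means.

    training_set is a list of (feature, is_active) pairs and must contain
    at least one active and one inactive example.
    """
    actives = [x for x, active in training_set if active]
    inactives = [x for x, active in training_set if not active]
    return (sum(actives) / len(actives) + sum(inactives) / len(inactives)) / 2


def active_learning(training_set, unknown, oracle, rounds):
    """Repeat: train, choose an unknown example, label it, add it."""
    training_set = list(training_set)
    unknown = list(unknown)
    for _ in range(min(rounds, len(unknown))):
        threshold = train(training_set)
        # Stand-in selection rule (an assumption): the unknown example
        # the toy classifier is least sure about.
        chosen = min(unknown, key=lambda x: abs(x - threshold))
        unknown.remove(chosen)
        # The oracle plays the role of the wet-lab experiment.
        training_set.append((chosen, oracle(chosen)))
    return training_set, unknown


if __name__ == "__main__":
    known = [(0.0, False), (10.0, True)]
    labelled, remaining = active_learning(
        known, [1.0, 4.0, 9.0], oracle=lambda x: x > 4.5, rounds=2
    )
    print(labelled, remaining)
```

Swapping the `min(...)` line for a different scoring function is all it takes to try another selection method in this sketch.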

  6. An Example of Active Learning: Minimum Marginal Hyperplane. Should unknown Mutant 1 or Mutant 2 be added to the training set? [Diagram: unknown Mutants 1 and 2 plotted against a hyperplane separating the INACTIVE region (known inactive mutants) from the ACTIVE region (known active mutants).] Select Mutant 2.
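A minimal sketch of this kind of selection, assuming a linear classifier and assuming the method picks the unlabeled example that lies closest to the separating hyperplane. The weights, bias, and feature vectors below are hypothetical values chosen for illustration; they are not taken from the slide.

```python
import math


def margin_distance(w, b, x):
    """Distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def select_min_margin(w, b, candidates):
    """Return the name of the candidate closest to the hyperplane."""
    return min(candidates, key=lambda name: margin_distance(w, b, candidates[name]))


if __name__ == "__main__":
    w, b = [1.0, -1.0], 0.0  # hypothetical trained classifier
    unknown = {
        "Mutant 1": [3.0, 0.5],  # far from the boundary
        "Mutant 2": [1.0, 0.8],  # near the boundary
    }
    print(select_min_margin(w, b, unknown))
```

With these made-up numbers the sketch selects "Mutant 2", the candidate the classifier is least certain about.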

  7. Another Example: Maximum Curiosity. Should Mutant 1 or Mutant 2 be added to the training set? Change in the cross-validated correlation coefficient: • Training Set + Mutant 1 (Active): .0411 • Training Set + Mutant 1 (Inactive): -.6014 • Training Set + Mutant 2 (Active): .0309 • Training Set + Mutant 2 (Inactive): .0276 Select Mutant 1.
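The choice on this slide can be reproduced with the four numbers it reports. The scoring rule used here is an assumption: each mutant is ranked by its best-case change over the two assumed labels. That rule agrees with the slide's selection, but the slide does not state the rule itself.

```python
# Changes in cross-validated correlation coefficient, copied from the slide.
changes = {
    "Mutant 1": {"Active": 0.0411, "Inactive": -0.6014},
    "Mutant 2": {"Active": 0.0309, "Inactive": 0.0276},
}


def select_most_curious(changes):
    """Pick the mutant whose best-case label gives the largest improvement.

    Assumption: the best case over the two assumed labels is what counts.
    """
    return max(changes, key=lambda mutant: max(changes[mutant].values()))


if __name__ == "__main__":
    print(select_most_curious(changes))
```

Other rules are conceivable; summing the two changes, for instance, would favour Mutant 2 on these numbers, which is why the best-case rule is assumed here.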

  8. A Third Example: Entropic Tradeoff. [Diagram legend: Known Active, Known Inactive, Unclassified, Selected Unclassified (marked OK); regions labeled INACTIVE and ACTIVE.]

  9. Which is the Best Active Learning Method? TYPE I: Select mutants that most improve the classifier if correctly predicted. • Maximum Curiosity • Composite Classifier • Improved Composite Classifier TYPE II: Select mutants that most improve the classifier. • Additive Curiosity • Additive Bayesian Surprise TYPE III: Common methods taken from the literature. • Minimum Marginal Hyperplane • Maximum Entropy TYPE IV: Variations on methods from the literature. • Maximum Marginal Hyperplane • Minimum Entropy • Entropic Tradeoff TYPE C: Controls • Non-iterated Prediction • Predict All Inactive • Random (30 trials)

  10. Outline • Overview: Computer Guided Discovery • Problem: Cancer and p53 • Results: Best Active Learning • Next: Future Experiments

  11. The Problem: p53 and Cancer. p53 mutations occur in ~50% of human cancers. • Tumor Suppressor Protein. • Receives upstream signals indicating cellular stress. • Acts as a transcription factor in the cancer suppression pathway. [Image: p53 core domain bound to DNA, generated with UCSF Chimera.] Cho, Y., Gorina, S., Jeffrey, P.D., Pavletich, N.P. Crystal structure of a p53 tumor suppressor-DNA complex: understanding tumorigenic mutations. Science, v. 265, pp. 346-355, 1994.

  12. The p53 Cancer Pathway David W. Meek: http://www.dundee.ac.uk/biomedres/meek.htm

  13. The Concept of “Cancer Rescue”: Second-site Suppressor Mutations. [Diagram: p53 from N to C terminus with three domains: Transactivation (1-42), Core domain for DNA binding (102-292), Tetramerization (324-355); marked residues 175, 245, 248, 249, 273, 282, and 235+240.] Cancer mutation prevalence data from the IARC p53 database: http://www-p53.iarc.fr/

  14. Immediate Goal: Find novel p53 Cancer Rescue Mutants. Intermediate Goal: Advance medical practice by revealing p53 mutant functional properties across p53’s mutation sequence space. Ultimate Goal: Inactive p53 Cancer Mutant + Engineered Small Molecule Drug = Functionally Active Rescued p53.

  15. Evaluating Cancer Rescue Mutants in the Wet Lab. INACTIVE: A yeast containing an inactive p53 cancer mutant will not grow. ACTIVE: A yeast containing an active p53 cancer rescue mutant will grow. Baroni, T.E., Wang, T., Qian, H., Dearth, L.R., Truong, L.N., Zeng, J., Denes, A.E., Chen, S.W. and Brachmann, R.K. (2004) A global suppressor motif for p53 cancer mutants. Proc Natl Acad Sci U S A, 101, 4930-5.

  16. In Vitro Phenotype

  17. In a Nutshell. [Cycle diagram: Knowledge, Model, Experiment, centered on Cancer Rescue Mutants.] Build classifiers of putative p53 cancer rescue mutants. Use Active Learning to select the p53 mutants that will be the most informative. Test the predictions in vitro. Goal: Find all p53 cancer rescue mutants.

  18. Outline • Overview: Computer Guided Discovery • Problem: Cancer and p53 • Results: Best Active Learning • Next: Future Experiments

  19. The Active Learning Tradeoff: How Fast Does It Learn?

  20. The Active Learning Tradeoff: How Accurate On The Chosen?

  21. The Tradeoff. [Plot: How Accurate on the Chosen? versus How Fast Does It Learn?, with points for Entropic Tradeoff, Maximum Curiosity, and Minimum Marginal Hyperplane.] How should the two axes be combined? Geometric Distance? Area? (Length * Width) Sum? (Length + Width) Solution: Average Score of All Three Metrics.
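One way to read the three candidate metrics and their average is sketched below. The assumptions are significant: both axes are taken to be already scaled to the range 0 to 1, "Geometric Distance" is taken to mean distance from the origin, and the three raw values are averaged directly. The deck does not say how scores were normalized before averaging, so this is an illustration of the idea only, and the input values are hypothetical.

```python
import math


def combined_score(learn_speed, chosen_accuracy):
    """Average of three ways to combine the two tradeoff axes.

    Both inputs are assumed to lie in [0, 1].
    """
    distance = math.hypot(learn_speed, chosen_accuracy)  # geometric distance
    area = learn_speed * chosen_accuracy                 # length * width
    total = learn_speed + chosen_accuracy                # length + width
    return (distance + area + total) / 3


if __name__ == "__main__":
    # Hypothetical axis values for two unnamed methods.
    print(combined_score(0.6, 0.8))
    print(combined_score(0.9, 0.3))
```

All three metrics reward a method that does well on both axes, but the area penalizes lopsided methods most strongly, which is one reason to average rather than pick a single metric.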

  22. The Overall Best

  23. How Fast Does It Learn? The Three Previous Examples

  24. How Accurate On The Chosen? The Three Previous Examples

  25. Why Does Random Do So Well? Very Few Examples. Tong, S. and D. Koller (2002). "Support vector machine active learning with applications to text classification." The Journal of Machine Learning Research 2: 45-66.

  26. Outline • Overview: Computer Guided Discovery • Problem: Cancer and p53 • Results: Best Active Learning • Next: Future Experiments

  27. Exploring New p53 Regions • Each new p53 region potentially introduces new rescue mechanisms. • New pools of mutants restart the Active Learning problem. [Diagram: p53 Core Domain from N to C terminus, with residues 175, 245, 248, 273, 282 and regions 113-124 and 281-289 marked.]

  28. Most Interesting or Most Interesting Active? Which finds more active cancer rescue mutants? [Diagram: starting from the known mutants, two strategies are compared over Iterations 1 to 3: Select the Most Interesting versus Select the Most Interesting Active.]

  29. Conclusion. [Cycle diagram: Knowledge, Theory, Experiment; goal: Find Cancer Rescue Mutants.]

  30. Acknowledgments. Labs: Baldi Lab, Lathrop Lab, Leuke Lab, Luo Lab, Brachmann Lab. People: Pierre Baldi, Jonathan Chen, Hiroto Saigo, S. Joshua Swamidass, Richard Lathrop, Gabe Moothart, Ying Wang, Ray Luo, Qiang Lu, Rainer Brachmann, Jue Zeng. Funding: National Institutes of Health (p53: CA112560), UCI Office of Research and Graduate Studies, UCI Institute for Genomics and Bioinformatics (BIT: LM007443), US Department of Energy (DOE).

  31. Questions? [Cycle diagram: Knowledge, Theory, Experiment; goal: Find Cancer Rescue Mutants.]

  32. Most Interesting Region • Scan the p53 core domain to find the most interesting region.

  33. Create All Single Point Mutations in a Region in vitro? CODA*: Assemble p53 using thermodynamically optimized oligonucleotides. Allow all possible mutations within a region. Assemble the mutated region with cancer mutants to look for rescue mutants. *http://www.codagenomics.com/

  34. Knowledge Representation: Homology Modeling. Modeling done using Amber™ with zinc ion characteristics tuned by Dr. Qiang Lu working in Dr. Ray Luo’s lab. 1. Take a wild type crystal structure of the protein in question. 2. Substitute one or more amino acids to mutate the protein. 3. Apply simulated physical laws to determine an energy function. 4. Minimize the energy of the new mutant protein.

  35. Knowledge Representation: Features. Simulated Structure -> String of Numbers • 1d: Sequence Mutation Features • s1d: Sequence Similarity Features • 2d: Surface Map Features • 3d: Atomic Position Features • 4d: “Time Dependent” Stability Information

  36. What is Machine Learning? Training: Set the parameters (W) with n features. Testing: Use the parameters (W) to predict unclassified examples
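To make the training and testing steps concrete, here is a minimal sketch using a perceptron. The perceptron is a stand-in chosen for brevity; the slide does not say which classifier the authors used. The feature vectors are hypothetical, with +1 standing for "Active" and -1 for "Inactive".

```python
def train_perceptron(examples, n_features, epochs=20, lr=1.0):
    """Training: set the parameters W (n weights plus a bias term)."""
    w = [0.0] * (n_features + 1)  # the last entry is the bias
    for _ in range(epochs):
        for x, y in examples:  # y is +1 (active) or -1 (inactive)
            activation = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
            if y * activation <= 0:  # misclassified: nudge W toward y
                for i in range(n_features):
                    w[i] += lr * y * x[i]
                w[-1] += lr * y
    return w


def predict(w, x):
    """Testing: use the parameters W to classify an unseen example."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + w[-1] > 0 else -1


if __name__ == "__main__":
    training = [
        ([2.0, 1.0], 1), ([3.0, 2.0], 1),
        ([-1.0, -2.0], -1), ([-2.0, -1.0], -1),
    ]
    w = train_perceptron(training, n_features=2)
    print(w, predict(w, [1.0, 1.0]), predict(w, [-1.0, -1.0]))
```

The split between `train_perceptron` and `predict` mirrors the slide: training fixes W from labeled examples, and testing only reads W.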

  37. Modeling: How To Use It. Biology: Make a protein and test it in vitro. PRO: Real. CON: Slow. Computer Generated Structure: Predict a protein structure in silico. PRO: Fast. CON: Inaccurate; what does it tell us? Machine Learning: Use Homology Modeling to guide biological research.

  38. Maximum Curiosity: Find the Mutants that Most Improve the Training Set. 1. Start with a training set of examples with known classes and an unclassed test set. 2. Choose a mutant from the test set that has not been considered yet. 3. Assume the chosen mutant is “Active” or “Inactive”. 4. Cross-validate the training set with the chosen mutant and record the correlation coefficient. [Cycle diagram: Knowledge, Model, Experiment.]
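The procedure on this slide can be sketched as follows. Several pieces are assumptions made for illustration: each mutant is a single hypothetical numeric feature, the classifier is a toy nearest-class-mean rule, cross-validation is leave-one-out, and the correlation coefficient is computed as the Matthews correlation coefficient. How the recorded changes are turned into a final choice is not spelled out on the slide, so the sketch stops at recording them.

```python
import math


def predict(train, x):
    """Toy classifier: assign x to the class with the nearest mean."""
    act = [v for v, y in train if y]
    ina = [v for v, y in train if not y]
    if not act:
        return False
    if not ina:
        return True
    return abs(x - sum(act) / len(act)) < abs(x - sum(ina) / len(ina))


def correlation(pairs):
    """Matthews correlation coefficient over (truth, prediction) pairs."""
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def cross_validated_cc(data):
    """Leave-one-out cross-validation of the toy classifier."""
    pairs = [(y, predict(data[:i] + data[i + 1:], x))
             for i, (x, y) in enumerate(data)]
    return correlation(pairs)


def curiosity_changes(training, candidates):
    """For each candidate and each assumed label, record the change in
    cross-validated correlation coefficient versus the training set alone."""
    baseline = cross_validated_cc(training)
    return {
        x: {label: cross_validated_cc(training + [(x, label)]) - baseline
            for label in (True, False)}
        for x in candidates
    }


if __name__ == "__main__":
    training = [(1.0, False), (2.0, False), (8.0, True), (9.0, True)]
    print(curiosity_changes(training, [3.0, 5.0]))
```

Each candidate costs two full cross-validation runs, so the work grows with both the size of the candidate pool and the size of the training set.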

  39. Exploring New p53 Regions • Each new p53 region potentially introduces new rescue mechanisms. • New pools of mutants restart the Active Learning problem. [Diagram: p53 Core Domain with regions 113-124 and 281-289 marked.]

  40. Primary Collaborators: Dr. Richard Lathrop, School of Information and Computer Science; Jue Zeng, School of Medicine; Dr. Rainer Brachmann, School of Medicine
