
Rational HIV vaccine design

Nebojsa Jojic and David Heckerman, Machine Learning and Applied Statistics, Microsoft Research. Collaborators: Vladimir Jojic, Microsoft/U Toronto; Carl Kadie, Microsoft; Jennifer Listgarten, Microsoft/U Toronto; Chris Meek, Microsoft

Presentation Transcript


  1. Rational HIV vaccine design. Nebojsa Jojic and David Heckerman, Machine Learning and Applied Statistics, Microsoft Research

  2. Collaborators • Vladimir Jojic, Microsoft/U Toronto • Carl Kadie, Microsoft • Jennifer Listgarten, Microsoft/U Toronto • Chris Meek, Microsoft • Brendan Frey, Microsoft/U Toronto • Bette Korber, Los Alamos National Laboratory • Christian Brander, Harvard/MGH • Nicole Frahm, Harvard/MGH • Simon Mallal, Royal Perth Hospital • Jim Mullins, University of Washington

  3. Epitome as a model of diversity in natural signals [Figure: an input image, a set of image patches, and the resulting epitome]

  4. Compact representation

  5. Compact representation

  6. Using the epitome for recognition: the “smiling point” [Figure: epitome of 295 face images; images with the highest total posterior at the “smiling point”; images with the lowest total posterior at the “smiling point”]

  7. Epitomes may also allow some variability [Figure: epitome e, shown as a mean and variances]
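Slides 6 and 7 describe recognition through the posterior over epitome positions, with each position storing a mean and a variance. The sketch below illustrates that idea in 1-D under assumptions that are ours, not the slides': each patch maps to one contiguous window of the epitome (no wrap-around), the prior over offsets is uniform, and values are independent Gaussians. All function names are hypothetical.

```python
import math

def patch_log_likelihood(patch, mean, var, offset):
    """log N(patch | mean[offset:offset+K], diag(var[offset:offset+K]))."""
    ll = 0.0
    for i, x in enumerate(patch):
        m, v = mean[offset + i], var[offset + i]
        ll += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return ll

def offset_posterior(patch, mean, var):
    """Posterior over epitome offsets for one patch, assuming a uniform prior."""
    n_offsets = len(mean) - len(patch) + 1
    logs = [patch_log_likelihood(patch, mean, var, t) for t in range(n_offsets)]
    top = max(logs)
    weights = [math.exp(l - top) for l in logs]  # subtract the max for stability
    total = sum(weights)
    return [w / total for w in weights]

def total_posterior_at(position, patches, mean, var):
    """Sum over patches of the posterior mass on offsets whose window covers `position`."""
    score = 0.0
    for patch in patches:
        k = len(patch)
        for t, p in enumerate(offset_posterior(patch, mean, var)):
            if t <= position < t + k:
                score += p
    return score
```

A patch that closely matches one window of the mean puts nearly all of its posterior mass on that offset, so the total posterior at any position inside that window approaches 1; ranking images by this score at a chosen position is the kind of query slide 6 illustrates.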

  8. Epitomes can be computed for ordered datasets (e.g., 1-D arrays or 2-D, 3-D, or n-D matrices) with arbitrary measurement types: • Intensities • R, G, B values • Gradient values • Wavelet coefficients • Spectral energies • Nucleotide or amino-acid content … • We even played with text and MIDI files

  9. AIDS 101 • AIDS (acquired immune deficiency syndrome) was first described in the early 1980s • HIV (human immunodeficiency virus), which causes AIDS, was isolated in 1983; 40 million people are now infected • HIV is an RNA virus: protein coat + copying proteins + regulatory proteins + RNA • Copying proteins + RNA enter the cell • RNA is reverse transcribed to DNA • DNA inserts into the cell’s DNA and is transcribed and translated into more HIV protein • The infected cell assembles more copies of HIV • The cell bursts, releasing many new copies of HIV

  10. The map of HIV. From http://www.mcld.co.uk/hiv (a simplified version of the detailed LANL map)

  11. HIV diversity (LANL database) HIV is encoded in an RNA sequence of about 10,000 nucleotides, divided into several genes. NEF is one of the shorter, moderately variable ones. [Figure: the NEF length in the strain; the 73 nucleotides of the NEF gene] Note the insertions, deletions and mutations. A triplet of nucleotides encodes one amino acid. A change in a single amino acid may lower the cellular immunity to the virus in one patient and increase it in another.
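To make the triplet-to-amino-acid step concrete, here is a small sketch that translates a nucleotide sequence with the standard genetic code (NCBI translation table 1). The codon strings in the example are made up for illustration and are not taken from the NEF data on the slide.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (first base varies slowest);
# '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(seq):
    """Translate a nucleotide sequence codon by codon in reading frame 0.
    RNA input is accepted: U is mapped to T. A trailing partial codon is ignored."""
    seq = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))
```

For example, `translate("AGCCUGUAC")` gives `"SLY"`, while the single-nucleotide change to `"AGCCUGUGC"` gives `"SLC"`: one substituted base changes one amino acid of the peptide.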

  12. Immune system response

  13. [Figure: an MHC-I molecule and an epitope]

  14. Known epitopes in a part of HIV’s Gag protein

  15. Epitopes in variable regions. Colors signify different human immune types.

  16. Immunology 101: the “train and kill” mechanism • The immune system sees a virus and trains “killer cells” (T cells) to kill any cell showing a pattern from the virus • Patterns are short peptides (8-11 amino acids long) called epitopes [Figure: 3-D structure of an epitope as presented by an infected cell to the killer cells, with its amino-acid pattern (peptide) SLYNTVATL]
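Since epitopes are short peptides of 8-11 amino acids, every window of those lengths in a protein is a candidate. The sketch below simply enumerates those windows; it is an exhaustive listing, not an epitope predictor, and the function name is hypothetical.

```python
def candidate_peptides(protein, min_len=8, max_len=11):
    """Yield (start, peptide) for every contiguous window of length min_len..max_len."""
    for k in range(min_len, max_len + 1):
        for start in range(len(protein) - k + 1):
            yield start, protein[start:start + k]
```

For the 9-residue SLYNTVATL this yields two 8-mers and the 9-mer itself; for the 19-residue lab peptide NYTSLIYTLIEESQNQQEK quoted on slide 35 it yields 12 + 11 + 10 + 9 = 42 candidates.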

  17. But HIV is variable… The train-and-kill mechanism doesn’t work as well for HIV: the virus adapts through rapid mutation. As soon as the killer cells get the upper hand, the epitopes start changing. Possible solution: • Find epitopes that occur frequently across a *population* of HIV viruses • Compact these epitopes into a small vaccine (small is good: long vaccines are hard to deliver and less likely to be effective)

  18. The epitome of a virus

  19. [Figure: sequence data from different patients (shown in different colors), e.g. VLSGGKLDKWEKIRLRPGGKKKYKLKHIVWASRELERF, LSGGKLDRWEKIRLR, KKKYQLKHIVW, KKKYRLKHIVW, and the resulting epitome]

  20. Machine Learning Approach to Vaccine Design • Use sample HIV strains from multiple patients • Build models that compactly encode as many epitopes (or likely epitopes) as possible • Learning techniques: • Myopic • Split and merge • Expectation Maximization
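As an illustration of what a “myopic” learner could look like, the sketch below assumes it means a greedy construction: take k-mers in order of frequency across strains and attach each one not yet covered to whichever end of the growing sequence gives the larger overlap, within a length budget. This interpretation, the names, and the parameters are our assumptions; split-and-merge and EM are not shown.

```python
from collections import Counter

def kmer_counts(strains, k):
    """Count how many times each k-mer occurs across all strains."""
    counts = Counter()
    for s in strains:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_epitome(strains, k, max_length):
    """Single greedy pass over k-mers, most frequent first: attach each k-mer not
    already contained in the epitome at the end that adds fewer residues,
    skipping k-mers that would push the length past max_length."""
    epitome = ""
    for kmer, _ in kmer_counts(strains, k).most_common():
        if kmer in epitome:
            continue  # already covered, possibly across an earlier junction
        right = epitome + kmer[overlap(epitome, kmer):]
        left = kmer[:len(kmer) - overlap(kmer, epitome)] + epitome
        candidate = right if len(right) <= len(left) else left
        if len(candidate) <= max_length:
            epitome = candidate
    return epitome
```

With strains `["ABCDE", "ABCDE", "BCDEF"]` and `k=3`, a budget of 10 gives `"ABCDEF"`, while a budget of 5 stops at `"ABCDE"` and leaves the rarest k-mer, `DEF`, uncovered. This is the trade-off between vaccine length and coverage.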

  21. Coverage of all 10-aa blocks from 245 Gag proteins (Perth data)
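One simple way to measure coverage of fixed-length blocks is the fraction of block occurrences in the strains that appear verbatim somewhere in the vaccine sequence. The sketch below assumes that definition (exact matches, each occurrence counted once, so common blocks weigh more); the function name is hypothetical.

```python
def kmer_coverage(vaccine, strains, k=10):
    """Fraction of k-mer occurrences in `strains` found verbatim in `vaccine`."""
    vaccine_kmers = {vaccine[i:i + k] for i in range(len(vaccine) - k + 1)}
    total = covered = 0
    for s in strains:
        for i in range(len(s) - k + 1):
            total += 1
            covered += s[i:i + k] in vaccine_kmers  # True counts as 1
    return covered / total if total else 0.0
```

Using the first sequence on slide 19 as the “vaccine” and the three shorter fragments as strains, `kmer_coverage(..., k=9)` returns 0.0: each fragment differs from that sequence at one residue, and every 9-mer of the fragment contains that residue. Covering such variants is why the epitome has to merge them in.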

  22. A Vaccine for HIV/AIDS • Typical vaccines are near copies of the virus being vaccinated against • HIV mutates at a high rate, so traditional techniques can’t be used • Machine learning allows us to build a compact “pseudo-virus” that covers the diversity of HIV (or rather a pseudo-protein that covers the diversity of a particular HIV protein) • This pseudo-protein, which we call the epitome, is much shorter than the concatenation of all strains

  23. Expected (weighted) coverage optimization • p(T), p(S): cleavage, MHC binding, transport. We have algorithms to predict this! • P(X_S|E_T): T-cell cross-reactivity. We have some idea about this, too.
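The slide names the ingredients of the objective but not how they combine. One plausible form, which is our assumption: sum over strain peptides T of p(T) times the probability that at least one vaccine peptide S is presented and cross-reacts with T, with vaccine peptides acting independently (a noisy OR), i.e. sum_T p(T) [1 - prod_S (1 - p(S) P(X_S|E_T))]. The cross-reactivity rule below (1.0 for identical, 0.5 for one mismatch, else 0) is a hypothetical placeholder.

```python
from math import prod

def cross_reactivity(s, t):
    """Hypothetical P(X_S | E_T): 1.0 if identical, 0.5 for one mismatch, else 0.0."""
    if len(s) != len(t):
        return 0.0
    mismatches = sum(a != b for a, b in zip(s, t))
    return {0: 1.0, 1: 0.5}.get(mismatches, 0.0)

def expected_coverage(strain_peptides, vaccine_peptides, xr=cross_reactivity):
    """strain_peptides: list of (T, p(T)); vaccine_peptides: list of (S, p(S)).
    Weighted sum of P(T is covered by at least one S), assuming independence."""
    total = 0.0
    for t, p_t in strain_peptides:
        miss = prod(1.0 - p_s * xr(s, t) for s, p_s in vaccine_peptides)
        total += p_t * (1.0 - miss)
    return total
```

Because the objective is a weighted expectation, a vaccine peptide that is likely to be presented and cross-reacts with many likely strain epitopes contributes more than an exact copy of a rare one.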

  24. Finding epitopes and their MHC-I counterparts [Figure: an MHC-I molecule and a peptide]

  25. Important to find both epitopes and the MHC-I types that can present them • Each patient has six MHC-I types (2 As, 2 Bs, 2 Cs) • Most epitopes can be presented by only a few MHC-I molecules • Different populations (China, India, South Africa, etc.) have different MHC-I frequencies

  26. Finding Epitopes and their MHC-I counterparts Existing methods: • Trial and error in the wet lab • Machine learning Our methods: • More machine learning • Machine learning + physics • Machine learning + wet lab

  27. Machine learning • Examples of “peptide is an epitope for MHC-I type” • Examples of “peptide is NOT an epitope for MHC-I type” • Classifier: • Logistic regression • SVM • Neural net • Etc.
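Of the classifiers listed, logistic regression is the easiest to sketch without libraries. The version below treats each example as a set of named binary features with a 0/1 label and fits the weights by stochastic gradient ascent on the log-likelihood. The learning rate, epoch count, and L2 strength are arbitrary choices for illustration, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights, bias, features):
    """P(label = 1 | features); features never seen in training contribute 0."""
    return sigmoid(bias + sum(weights.get(f, 0.0) for f in features))

def train_logistic_regression(examples, epochs=200, lr=0.1, l2=0.01):
    """examples: list of (feature_set, label), 1 = "peptide is an epitope for
    this MHC-I type", 0 = "is not". L2 shrinkage is applied only to the
    features active in each example."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = label - predict(weights, bias, features)
            bias += lr * error
            for f in features:
                w = weights.get(f, 0.0)
                weights[f] = w + lr * (error - l2 * w)
    return weights, bias
```

Representing features as sets of strings keeps the model sparse: only features that actually occur in an example are touched, which suits the indicator features described on the next slides.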

  28. Issues (from experience) • Amount of data • Feature extraction • Algorithm choice

  29. Simple feature extraction SLYNTVATL, A02 • Amino acid at position 1=S • Amino acid at position 2=L • Amino acid at position 3=Y • … • Amino acid at position 9=L • MHC-I type=A02
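The feature list above maps directly onto a set of indicator features. The string naming scheme (`pos1=S`, `mhc=A02`) is a hypothetical convention chosen for readability.

```python
def simple_features(peptide, mhc_type):
    """One indicator per (position, amino acid) plus one for the MHC-I type."""
    features = {f"pos{i}={aa}" for i, aa in enumerate(peptide, start=1)}
    features.add(f"mhc={mhc_type}")
    return features
```

`simple_features("SLYNTVATL", "A02")` returns ten features, from `"pos1=S"` through `"pos9=L"` plus `"mhc=A02"`, ready to be fed to a classifier such as logistic regression.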

  30. Simple feature extraction (logistic regression)

  31. Better feature extraction SLYNTVATL, A02 • Previously mentioned features • Amino acid at position 1 = S & MHC-I = A02 • Amino acid at position 2 = L & MHC-I = A02 • … • Amino acid at position 9 = L & MHC-I = A02
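The conjunction features above can be generated by pairing every positional feature with the MHC-I feature. The naming scheme is again a hypothetical convention.

```python
def better_features(peptide, mhc_type):
    """Positional features, the MHC-I feature, and every position-by-MHC-I conjunction."""
    mhc = f"mhc={mhc_type}"
    positional = {f"pos{i}={aa}" for i, aa in enumerate(peptide, start=1)}
    conjunctions = {f"{feat}&{mhc}" for feat in positional}
    return positional | {mhc} | conjunctions
```

For SLYNTVATL with A02 this gives 9 + 1 + 9 = 19 features, including `"pos2=L&mhc=A02"`. The conjunctions matter for a linear model such as logistic regression: with only the simple features, the weight for “L at position 2” would have to be the same for every MHC-I type, whereas binding preferences differ between types.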

  32. Better feature extraction

  33. Machine learning + physics, with David Baker and Ora Furman, UW

  34. Machine learning + physics, with David Baker and Ora Furman, UW

  35. Machine learning + wet lab, with Christian Brander & Nicole Frahm, Harvard, and Jennifer Listgarten, U. Toronto • If a patient’s blood reacts with a peptide, then it is very likely that some subsequence of the peptide is an epitope for at least one of the patient’s six MHC-I types • From observations for many patients, tease out the responsible MHC-I type(s) • Find the subsequence in the lab peptide, e.g., NYTSLIYTLIEESQNQQEK … [Figure: patients Pt1, Pt2, Pt3, Pt4, …, PtN]

  36. What makes a good solution for a peptide? • The fewer the responsible MHC-I types the better • An MHC-I type gets “points” for appearing in reacting patients and loses “points” for appearing in non-reacting patients
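The “points” idea can be sketched as a simple tally: each MHC-I type gains one point per reacting patient who carries it and loses one per non-reacting carrier. The unit weights are our assumption; this tally ignores noise, leaks, and explaining away, which the next slide raises.

```python
from collections import Counter

def mhc_scores(patients):
    """patients: list of (set_of_mhc_types, reacted). Returns a Counter of scores:
    +1 per reacting carrier, -1 per non-reacting carrier (unit weights assumed)."""
    scores = Counter()
    for types, reacted in patients:
        for t in types:
            scores[t] += 1 if reacted else -1
    return scores
```

With the toy patients `({"A02", "B07"}, True)`, `({"A02", "B08"}, True)` and `({"A01", "B07"}, False)` (two types each instead of six, for brevity), A02 scores 2, B07 scores 0, and A01 scores -1, so A02 is the leading candidate.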

  37. Not easy… • Lots of noise: p(react | is epitope) ≈ 0.25 • “Leaks”: we may see a reaction even when the peptide is not an epitope for any of the patient’s MHC-I types • “Explaining away”: when a patient has two MHC-I types that could be responsible for a reaction, each gets less credit • We don’t actually know • p(react | is epitope) • leak probabilities • Example solution [Figure: MHC-I types A, B, C among reacting and non-reacting patients]

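  38. Graphical model for a peptide [Figure: MHC-I type nodes A01, A02, A03, B01, B02, B03, C01, C02, C03, …; each patient’s “responsible” nodes (e.g., A02c, A01c, A03c, B01c, B02c, B03c, C01c, C02c, C03c) combine with a leak (probability p0) through an OR into “pt1 reacts”, “pt2 reacts”]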

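  39. (Directed acyclic) graphical models [Figure: DAG over Fuel, Battery, TurnOver, Gauge, Start] $p(F,B,T,G,S) = p(F)\,p(B \mid F)\,p(T \mid F,B)\,p(G \mid F,B,T)\,p(S \mid F,B,T,G) = \prod_{\text{vars}} p(\text{var} \mid \text{parents})$, where the second equality uses the graph: each conditioning set reduces to the variable’s parents.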
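The factorization can be made concrete for binary variables. The slide’s edges did not survive extraction, so the parent sets below (Fuel and Battery as roots, TurnOver depending on Battery, Gauge on Fuel and Battery, Start on Fuel and TurnOver) and all probabilities are assumptions for illustration only.

```python
from itertools import product

# Assumed structure and made-up conditional probability tables.
PARENTS = {"F": (), "B": (), "T": ("B",), "G": ("F", "B"), "S": ("F", "T")}
# P(var = True | parent values), keyed by the tuple of parent values.
CPT = {
    "F": {(): 0.95},
    "B": {(): 0.9},
    "T": {(True,): 0.97, (False,): 0.02},
    "G": {(True, True): 0.95, (True, False): 0.1, (False, True): 0.05, (False, False): 0.01},
    "S": {(True, True): 0.99, (True, False): 0.0, (False, True): 0.0, (False, False): 0.0},
}

def joint(assignment):
    """p(F,B,T,G,S) as the product over variables of p(var | parents(var))."""
    p = 1.0
    for var, parents in PARENTS.items():
        p_true = CPT[var][tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p

def total_probability():
    """Sum of the joint over all 2**5 assignments; should be 1."""
    names = list(PARENTS)
    return sum(joint(dict(zip(names, values)))
               for values in product((True, False), repeat=len(names)))
```

The payoff of the factorization is compactness: a full joint table over five binary variables needs 2^5 - 1 = 31 numbers, while these conditional tables need 1 + 1 + 2 + 4 + 4 = 12.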

  40. Graphical model for a peptide [Figure: MHC-I type nodes A01, A02, A03, B01, B02, B03, C01, C02, C03, …]

  41. Graphical model for a peptide [Figure: MHC-I type nodes A01, A02, A03, B01, B02, B03, C01, C02, C03 with “responsible” nodes A02c, A03c, B01c, B02c, C01c, C03c; edge labeled p]

  42. Graphical model for a peptide [Figure: MHC-I type nodes A01, A02, A03, B01, B02, B03, C01, C02, C03 with “responsible” nodes A02c, A03c, B01c, B02c, C01c, C03c; edges labeled p]

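  43. Graphical model for a peptide [Figure: MHC-I type nodes A01, A02, A03, B01, B02, B03, C01, C02, C03; “responsible” nodes A02c, A03c, B01c, B02c, C01c, C03c and a leak (probability p0) combine through an OR into “pt1 reacts”]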
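The OR node with a leak reads naturally as a noisy OR. In the sketch below, each of the patient’s MHC-I types that is responsible for the peptide independently triggers a reaction with probability p, and the leak fires with probability p0, so P(react) = 1 - (1 - p0)(1 - p)^n, with n the number of responsible types the patient carries. Sharing a single p across all types follows the single p label on the slides but is still our reading; the function names are hypothetical, and p must lie strictly between 0 and 1 for the log-likelihood to stay finite.

```python
import math

def p_react(patient_types, responsible, p, p0):
    """Noisy OR with leak: P(reaction) for one patient and one peptide."""
    n = len(set(patient_types) & set(responsible))
    return 1.0 - (1.0 - p0) * (1.0 - p) ** n

def log_likelihood(patients, responsible, p, p0):
    """patients: list of (mhc_types, reacted). Log-probability of the observed reactions."""
    ll = 0.0
    for types, reacted in patients:
        q = p_react(types, responsible, p, p0)
        ll += math.log(q if reacted else 1.0 - q)
    return ll
```

With p = 0.25 (the noise level quoted on slide 37) and p0 = 0.05, one responsible type gives P(react) = 0.2875 and two give 0.465625. The second type adds less than the first because the OR saturates, which is where the explaining-away effect comes from.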

  44. Graphical model for a peptide [Figure: the full multi-patient model, as on slide 38]

  45. Solving the model • Principle: find the p, p0 and MHC-I assignments that maximize the likelihood of the data • Algorithm: guess p, p0, then iterate: • Use a relaxation method to find the maximum-likelihood MHC-I assignments • Use gradient descent on the negative log-likelihood to find the values of p, p0 that maximize the likelihood
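The alternating scheme can be sketched for a single peptide under the noisy-OR model. Two substitutions should be stated plainly: the assignment step uses coordinate ascent (toggle one MHC-I type in or out of the responsible set while that helps) in place of the relaxation method the slide mentions, and the parameter step runs gradient ascent on the log-likelihood over logit(p) and logit(p0). The starting guesses (p = 0.25, p0 = 0.05), step size, and iteration counts are arbitrary, and all names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_likelihood(patients, responsible, p, p0):
    """Noisy-OR log-likelihood; patients is a list of (set_of_mhc_types, reacted)."""
    ll = 0.0
    for types, reacted in patients:
        miss = (1.0 - p0) * (1.0 - p) ** len(types & responsible)  # P(no reaction)
        ll += math.log(1.0 - miss) if reacted else math.log(miss)
    return ll

def update_assignment(patients, candidates, responsible, p, p0):
    """Coordinate ascent (stand-in for relaxation): toggle single MHC-I types in or
    out of the responsible set while that raises the likelihood."""
    responsible = set(responsible)
    improved = True
    while improved:
        improved = False
        for t in sorted(candidates):
            trial = responsible ^ {t}
            if log_likelihood(patients, trial, p, p0) > log_likelihood(patients, responsible, p, p0) + 1e-12:
                responsible, improved = trial, True
    return responsible

def update_parameters(patients, responsible, p, p0, steps=200, lr=0.05):
    """Gradient ascent over a = logit(p), b = logit(p0), keeping both in (0, 1)."""
    a, b = math.log(p / (1.0 - p)), math.log(p0 / (1.0 - p0))
    for _ in range(steps):
        p, p0 = sigmoid(a), sigmoid(b)
        grad_a = grad_b = 0.0
        for types, reacted in patients:
            n = len(types & responsible)
            miss = (1.0 - p0) * (1.0 - p) ** n
            dll_dq = 1.0 / (1.0 - miss) if reacted else -1.0 / miss
            grad_a += dll_dq * n * p * miss  # dq/da = n * p * miss
            grad_b += dll_dq * p0 * miss     # dq/db = p0 * miss
        a += lr * grad_a
        b += lr * grad_b
    return sigmoid(a), sigmoid(b)

def fit(patients, candidates, p=0.25, p0=0.05, rounds=10):
    """Start from guesses for p and p0, then alternate the two updates."""
    responsible = set()
    for _ in range(rounds):
        responsible = update_assignment(patients, candidates, responsible, p, p0)
        p, p0 = update_parameters(patients, responsible, p, p0)
    return responsible, p, p0
```

On a toy cohort where three of four A02 carriers react and nobody else does, `fit` settles on `{"A02"}` as the responsible type, with p near 0.75 and a small leak p0. Optimizing in logit space avoids having to clip p and p0 back into (0, 1) after each step.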

  46. Status: Most likely assignments have been confirmed

  47. Summary • HIV vaccine design is a data-intensive problem • The data take the form of discrete sequences, which makes them ideal for computer-science/machine-learning analysis • Machine learning approaches are instrumental in finding epitopes and in compressing the vaccine • Work in progress: our vaccine designs are scheduled to be tested in vitro at Mass General this summer
