Visualization of Influenza Protein Segment HA in Manifold Space

Cheng-Yuan Liou* and Wei-Chen Cheng
Department of Computer Science and Information Engineering, National Taiwan University, Republic of China
*cyliou@csie.ntu.edu.tw
March 24, 2010, 11:00-13:00, Hue City, Vietnam

Idea: map data from a space of dimension D, with D >> 2, to a space of dimension 2.

LDIM Energy function
• [...]: distance between gene [...] and gene [...]
• [...]: a distance in the manifold space

LDIM Algorithm
• Initialize the cell set [...].
• Find the minimum and maximum distances among all protein sequences, [...].
• For each epoch t from [...] to [...]
  • Set [...].
  • For every pair of protein sequences [...] and [...], adjust their cell positions by [...].
  • End For
• End For
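The pairwise adjustment loop above can be illustrated with a minimal sketch. The update formula on the slide is not preserved, so this block swaps in a plain squared-error stress, 0.5 * (cell distance - input distance)^2 per pair, and takes one gradient step per pair. The names `pairwise_embed` and `stress`, the learning rate 0.05, and the toy distance matrix are assumptions for illustration, not the authors' LDIM update.

```python
import math
import random

def pairwise_embed(dist, dim=2, epochs=150, rate=0.05, seed=0):
    """Place one cell per item so that cell distances approximate dist[i][j].

    Assumption: each pair is adjusted by a gradient step on
    0.5 * (cell_distance - dist[i][j]) ** 2, not the LDIM energy itself.
    """
    rng = random.Random(seed)
    n = len(dist)
    cells = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
    for _ in range(epochs):
        for i in range(n):
            for j in range(i + 1, n):
                diff = [a - b for a, b in zip(cells[i], cells[j])]
                cur = math.sqrt(sum(d * d for d in diff)) or 1e-12
                # Positive step pulls the pair together, negative pushes it apart.
                step = rate * (cur - dist[i][j]) / cur
                for k in range(dim):
                    cells[i][k] -= step * diff[k]
                    cells[j][k] += step * diff[k]
    return cells

def stress(dist, cells):
    """Sum of squared differences between input and cell distances."""
    n = len(dist)
    return sum((dist[i][j] - math.dist(cells[i], cells[j])) ** 2
               for i in range(n) for j in range(i + 1, n))

# Toy input (hypothetical): three items whose distances form a 3-4-5 triangle.
toy = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
print(stress(toy, pairwise_embed(toy, epochs=0)))    # stress of the random start
print(stress(toy, pairwise_embed(toy, epochs=150)))  # stress after training
```

With real Hamming distances, which can reach hundreds, a much smaller rate would be needed; the slides report a learning rate of 0.00025.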

LDIM Algorithm

Difference between SOM and LDIM

Levenberg–Marquardt method for LDIM algorithm: Batch Mode
1. Set time [...].
2. Set [...].
3. Calculate the error [...] for all data.
4. Calculate the Jacobian matrix [...].
5. Initialize [...].
6. Calculate the update function, [...], to obtain the moving vectors of all data; [...] is a vector with [...] elements.
7. Try to update [...] and rearrange [...] to [...].
8. Recalculate the new error by [...].
9. If the new error is greater than the old error, set [...] and go back to Step 6; otherwise continue to Step 10.
10. Update the real output [...].
11. Shrink the value of [...] by [...].
12. If [...], then [...] and go back to Step 2; otherwise stop the procedure.

Levenberg–Marquardt method for LDIM algorithm: Batch Mode, Jacobian Matrix

Levenberg–Marquardt method for LDIM algorithm: Sequential Mode, in this paper
1. Set time [...].
2. Set [...].
3. For [...] from 1 to [...]
4. Calculate the error for data [...] by [...].
5. Calculate the Jacobian matrix [...].
6. Initialize [...].
7. Calculate the update function, [...], to obtain the moving vector of the data; [...] is a vector with [...] elements.
8. Try to update [...] and rearrange [...] to [...].
9. Recalculate the new error by [...].
10. If the new error is greater than the old error, set [...] and go back to Step 7; otherwise continue to Step 11.
11. Update the real output [...].
12. End For
13. If [...], then [...] and go back to Step 2; otherwise stop the procedure.
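The try, compare, and retry pattern in the steps above is the usual Levenberg–Marquardt damping loop. The sketch below moves a single 2D cell against fixed neighbours. It assumes residuals of the form (cell distance - target distance), a damping parameter `lam` that grows by a factor of 10 after a rejected step and shrinks by 10 after an accepted one, and the hypothetical names `residuals` and `lm_move`. None of these details are taken from the slides, whose formulas are not preserved.

```python
import math

def residuals(y, anchors, targets):
    """Distance from cell y to each fixed cell, minus the target distance."""
    return [math.dist(y, a) - t for a, t in zip(anchors, targets)]

def lm_move(y, anchors, targets, lam=0.01, max_tries=20):
    """One damped move of a 2D cell; returns (new_y, new_error, new_lam)."""
    e = residuals(y, anchors, targets)
    old_err = sum(r * r for r in e)
    # Jacobian rows: derivative of each distance with respect to y.
    J = []
    for a in anchors:
        d = math.dist(y, a) or 1e-12
        J.append(((y[0] - a[0]) / d, (y[1] - a[1]) / d))
    # Entries of J^T J and of the gradient J^T e.
    a11 = sum(jx * jx for jx, _ in J)
    a12 = sum(jx * jy for jx, jy in J)
    a22 = sum(jy * jy for _, jy in J)
    g1 = sum(jx * r for (jx, _), r in zip(J, e))
    g2 = sum(jy * r for (_, jy), r in zip(J, e))
    for _ in range(max_tries):
        # Solve (J^T J + lam * I) delta = -J^T e in closed form (2x2 system).
        det = (a11 + lam) * (a22 + lam) - a12 * a12
        dx = (-(a22 + lam) * g1 + a12 * g2) / det
        dy = (a12 * g1 - (a11 + lam) * g2) / det
        trial = (y[0] + dx, y[1] + dy)
        new_err = sum(r * r for r in residuals(trial, anchors, targets))
        if new_err <= old_err:
            return trial, new_err, lam / 10  # accept and relax the damping
        lam *= 10  # reject and retry with a more cautious step
    return y, old_err, lam

# Toy setup (hypothetical): three fixed cells and target distances 5, 3, 4.
anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
targets = [5.0, 3.0, 4.0]
y, lam = (1.0, 1.0), 0.01
for _ in range(20):
    y, err, lam = lm_move(y, anchors, targets, lam)
print(y, err)
```

A batch variant would stack the residuals of all cells into one vector and solve a larger linear system instead of the 2x2 one used here.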

Levenberg–Marquardt method for LDIM algorithm: Sequential Mode, in this paper, Jacobian Matrix

Computer Simulation for LDIM algorithms
• Input: 100 H1N1 sequences
• Aligned sequence length: 570
• Min length: 561
• Max length: 566
• Learning rate: 0.00025
• Epochs: 150

100 H1N1 sequences

Pandemic (H1N1) 2009 Mortality Statistics. Source: Global Alert and Response, WHO http://www.who.int/csr/disease/swineflu/updates/en/index.html

H5N1 Mortality (C: Cases, D: Deaths) http://www.who.int/csr/disease/avian_influenza/country/cases_table_2010_03_04/en/index.html

Goal: This work presents a visualization method for studying a family of viruses and inferring their common ancestor (their "grandmother"). The manifold reveals the relations among all family members. The ancestral sequences are useful sources for medical antigens. With this manifold, one can monitor the mutation rates and the evolutionary trends of the viruses, and a new viral sample can be placed in the population according to its family relations.

Data Information
• Distance: Hamming distance after performing multiple alignment
• Characters: 20 amino acids
• Data: Influenza A virus subtypes H1N1, H1N2, H2N2, H3N2, H5N1, H7N2, H7N3, H7N7, H9N2
• Data source: NCBI Influenza Virus Resource
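The distance named above is simple to compute once the sequences are aligned to a common length. A minimal sketch follows; the function names and the toy sequences are hypothetical. It also assumes a gap character paired with a residue counts as a mismatch and two gaps count as a match, which the slides do not specify.

```python
def hamming(a, b):
    """Number of alignment columns at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def distance_matrix(seqs):
    """Symmetric matrix of pairwise Hamming distances."""
    n = len(seqs)
    return [[hamming(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]

# Toy aligned fragments (hypothetical, not real HA sequences); '-' is a gap.
seqs = ["MKAIL-VL", "MKANLLVL", "MKAIL-VV"]
for row in distance_matrix(seqs):
    print(row)
```

The resulting matrix is the only input the embedding needs; the sequences themselves are not used after this step.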

Parameters of multiple alignment
• Program: ClustalW2
• European Bioinformatics Institute
• Parameter setting:
  • Penalty for opening a gap: 10.0
  • Penalty for extending a gap: 0.2
  • Gap separation penalty: 4
  • GONNET 250 matrix
Larkin MA, Blackshields G, Brown NP, et al.: Clustal W and Clustal X version 2.0. Bioinformatics (2007), 23(21):2947-8

Protein HA (Hemagglutinin): this protein is responsible for binding the virus to the cell.

Influenza A Segment HA: Timeline

Pandemic (H1N1) 2009 Segment HA in 3D. The color shows the value of the z-axis.

H1N1 Segment HA in 3D Space

Ten H1N1 HA sequences that are closest to the center
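One way to pick the sequences closest to the center is sketched below. The slides do not define "center", so this block assumes it is the centroid (coordinate-wise mean) of all cell positions. The function name `closest_to_center` and the sample coordinates are hypothetical.

```python
import math

def closest_to_center(coords, k=10):
    """Return the k names whose cells lie nearest the centroid of all cells."""
    n = len(coords)
    dim = len(next(iter(coords.values())))
    # Assumption: the "center" is the mean position of all cells.
    center = [sum(p[d] for p in coords.values()) / n for d in range(dim)]
    ranked = sorted(coords, key=lambda name: math.dist(coords[name], center))
    return ranked[:k]

# Hypothetical 3D cell positions keyed by sequence label.
cells = {
    "seq_a": (0.0, 0.0, 0.0),
    "seq_b": (1.0, 0.0, 0.0),
    "seq_c": (10.0, 0.0, 0.0),
    "seq_d": (1.0, 1.0, 0.0),
}
print(closest_to_center(cells, k=2))
```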

Partial Phylogenetic Tree (H1N1)

Avian Flu (H5N1) Segment HA in 3D. The color shows the time.

H5N1 Segment HA in 3D Space

Ten H5N1 HA sequences that are closest to the center

Partial Phylogenetic Tree (H5N1)

H1N1 & H3N2 & H5N1 Hemagglutinin (HA) in 3D

Influenza A Hemagglutinin (HA) in 3D: Overview

Influenza A Hemagglutinin (HA) in 3D: Detailed

Protein NA (Neuraminidase): this protein facilitates the release of progeny viruses from infected cells.

Influenza A Segment NA: Timeline

Pandemic (H1N1) 2009 Segment NA in 3D. The color shows the value of the z-axis.

Avian Flu (H5N1) Segment NA in 3D. The color shows the time.

H1N1 & H3N2 & H5N1 Neuraminidase (NA) in 3D

Influenza A Neuraminidase (NA) in 3D: Overview

Influenza A Neuraminidase (NA) in 3D: Detail

Influenza A Neuraminidase (NA) in 3D: Detail

Comparison of Mutation Rate. References: Manna and Liou (2007); Tutorial link (ppts); Jukes and Cantor (1969); Nei and Gojobori (1986)

Summary
• This manifold is designed for information visualization.
• It uses relative distances among patterns and is invariant under translation, rotation and scaling of the pattern coordinates.
• It has a perfect energy function. We expect that it can preserve the physical meaning among patterns and reveal their various hidden relations maximally.
• The initial setting of the algorithm is flexible. The computation can be parallelized and distributed. The cells can be trained in a sequential mode or a batch mode.
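The invariance claim in the summary can be checked numerically on a small example. The sketch below assumes "relative distances" means pairwise distances divided by the largest pairwise distance; that normalization, the function names, and the sample points are assumptions for illustration only.

```python
import math

def relative_distances(points):
    """Pairwise distances divided by the largest one (assumed normalization)."""
    n = len(points)
    d = [math.dist(points[i], points[j])
         for i in range(n) for j in range(i + 1, n)]
    largest = max(d)
    return [x / largest for x in d]

def transform(points, angle, scale, shift):
    """Rotate by angle (radians), scale uniformly, then translate 2D points."""
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y) + shift[0],
             scale * (s * x + c * y) + shift[1]) for x, y in points]

pts = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (1.0, 1.0)]
moved = transform(pts, angle=0.7, scale=2.5, shift=(10.0, -3.0))
same = all(abs(a - b) < 1e-9
           for a, b in zip(relative_distances(pts), relative_distances(moved)))
print(same)
```

Rotation and translation leave every pairwise distance unchanged, and uniform scaling multiplies all of them by the same factor, so the normalized values agree up to rounding.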

Thanks for your attention.
