
My Research Work and Clustering

Bernard Chen, 2009


Presentation Transcript


  1. My Research Work and Clustering Bernard Chen 2009

  2. Outline • Introduction • Experimental Setup • Clustering • Future Works

  3. Central Dogma of Molecular Biology

  4. Amino Acids, the subunits of proteins

  5. Protein Primary, Secondary, and Tertiary Structure

  6. Protein 3D Structure

  7. Protein Sequence Motif • Although there are 20 amino acids, protein primary structure is not built by choosing among them at random • Sequence motifs: a relatively small number of functionally or structurally conserved sequence patterns that occur repeatedly in a group of related proteins.

  8. Protein Sequence Motif These biologically significant regions or residues are usually: • Enzyme catalytic sites • Prosthetic group attachment sites (heme, pyridoxal-phosphate, biotin…) • Amino acids involved in binding a metal ion • Cysteines involved in disulfide bonds • Regions involved in binding a molecule (ATP/ADP, GDP/GTP, Ca, DNA…)

  9. Goal of our group • The main purpose is to obtain and extract protein sequence motifs that are universally conserved across protein family boundaries. • Discuss the relation between protein primary structure and tertiary structure

  10. Outline • Introduction • Experimental Setup • Clustering • Future Works

  11. Experiment setup: HSSP matrix: 1b25

  12. HSSP matrix: 1b25

  13. Representation of Segment • Sliding window size: 9 • Each window corresponds to a sequence segment, which is represented by a 9 × 20 matrix plus the nine corresponding secondary structure annotations obtained from DSSP. • More than 560,000 segments (413 MB) are generated by this method. • DSSP: source of the secondary structure information
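The segment representation can be sketched as follows. This is a minimal illustration, assuming the HSSP profile is already loaded as a list of 20-value rows (one row per residue) and the DSSP annotation as a string of one-letter codes; the function name `segments` and both input formats are hypothetical, not taken from the slides.

```python
from typing import List, Sequence, Tuple

WINDOW = 9  # sliding window size stated on the slide


def segments(
    profile: Sequence[Sequence[float]],
    secondary: str,
    window: int = WINDOW,
) -> List[Tuple[List[List[float]], str]]:
    """Return one (window x 20 matrix, secondary-structure string) pair
    for every position of the sliding window along the protein."""
    if len(profile) != len(secondary):
        raise ValueError("profile and secondary structure must have equal length")
    out = []
    # A protein with n residues yields n - window + 1 overlapping segments.
    for start in range(len(profile) - window + 1):
        matrix = [list(row) for row in profile[start:start + window]]
        out.append((matrix, secondary[start:start + window]))
    return out
```

A protein with 12 residues, for example, yields 4 overlapping segments, each a 9 × 20 matrix paired with 9 secondary structure codes.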

  14. Outline • Introduction • Experimental Setup • Clustering • Future Works

  15. Clustering Algorithms • We used two clustering algorithms in our approach: • K-means Clustering • Fuzzy C-means Clustering

  16. K-means Clustering

  17. K-means Clustering

  18. K-means Clustering

  19. K-means Clustering

  20. K-means Clustering
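For reference, a minimal sketch of standard K-means (Lloyd's algorithm). It assumes squared Euclidean distance and initial centroids sampled at random from the data; the distance measure and initialization actually used for the 9 × 20 segments are not stated in this transcript, and the function names are illustrative only.

```python
import random
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (an assumed choice of metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Sequence[float]]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k: int, iters: int = 100, seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    rng = random.Random(seed)
    # Initial centroids: k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean(members)
    return labels, centroids
```

Each point ends up with exactly one label, which is the "hard" assignment that Fuzzy C-means relaxes.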

  21. Fuzzy C-means Clustering

  22. Fuzzy C-means Clustering

  23. Fuzzy C-means Clustering

  24. Fuzzy C-means Clustering

  25. Fuzzy C-means Clustering

  26. Fuzzy C-means Clustering

  27. Fuzzy C-means Clustering
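For reference, a minimal sketch of the standard Fuzzy C-means update rules. It assumes Euclidean distance, fuzzifier m = 2, and initial centers sampled from the data; none of these settings are stated in this transcript, and the helper names are illustrative only.

```python
import random


def _dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def _memberships(points, centers, m):
    """u[i][j] = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)); each row sums to 1."""
    power = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        d = [_dist(p, c) for c in centers]
        if 0.0 in d:
            # The point coincides with a center: full membership there.
            hits = [1.0 if x == 0.0 else 0.0 for x in d]
            rows.append([h / sum(hits) for h in hits])
        else:
            rows.append([1.0 / sum((dj / dk) ** power for dk in d) for dj in d])
    return rows


def _centers(points, u, m):
    """Each center is the mean of all points weighted by u[i][j]^m."""
    dims = len(points[0])
    out = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        total = sum(w)
        out.append([sum(wi * p[k] for wi, p in zip(w, points)) / total for k in range(dims)])
    return out


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(list(points), c)]
    u = None
    for _ in range(iters):
        prev, u = u, _memberships(points, centers, m)
        centers = _centers(points, u, m)
        # Stop once no membership value moves by more than tol.
        if prev is not None and max(
            abs(a - b) for ra, rb in zip(prev, u) for a, b in zip(ra, rb)
        ) < tol:
            break
    return u, centers
```

Unlike K-means, every point receives a degree of membership in every cluster, so one segment can contribute to more than one group.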

  28. Outline • Introduction • Experimental Setup • Clustering • Future Works

  29. Granular Computing Model • Original dataset → Fuzzy C-Means Clustering → Information Granule 1 ... Information Granule M • Each information granule → K-means Clustering • Join Information → Final Sequence Motifs
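The flow of the model can be sketched in a few lines. This is an illustration under stated assumptions, not the group's implementation: it assumes a point joins every granule in which its FCM membership reaches a cutoff (`threshold` is a hypothetical parameter), and `cluster_fn` stands in for whatever K-means routine is run inside each granule.

```python
from typing import Callable, List, Sequence, Tuple


def split_into_granules(memberships: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Return one list of point indices per granule.
    Assumption: membership >= threshold puts a point in that granule,
    so a point may belong to several granules."""
    granules = [[] for _ in range(len(memberships[0]))]
    for i, row in enumerate(memberships):
        for g, u in enumerate(row):
            if u >= threshold:
                granules[g].append(i)
    return granules


def granular_clustering(
    points: Sequence[Sequence[float]],
    memberships: Sequence[Sequence[float]],
    threshold: float,
    cluster_fn: Callable[[list], List[int]],
) -> List[Tuple[int, List[int]]]:
    """Cluster each granule separately, then join the results.
    Returns (granule id, member indices) for every cluster found."""
    joined = []
    for g, idx in enumerate(split_into_granules(memberships, threshold)):
        if not idx:
            continue  # nothing to cluster in an empty granule
        labels = cluster_fn([points[i] for i in idx])
        groups = {}
        for i, lab in zip(idx, labels):
            groups.setdefault(lab, []).append(i)
        for lab in sorted(groups):
            joined.append((g, groups[lab]))
    return joined
```

Because each granule is clustered on its own, no single K-means run has to hold the full dataset, and the runs can proceed independently.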

  30. Motivation

  31. Reduce Space Complexity • Table 1: Summary of results obtained by FCM

  32. Reduce Time Complexity • Wei’s method: 1285968 sec (15 days) × 6 = 7715568 sec (90 days) • Granular Model: 154899 sec (FCM execution time) + 231720 sec (2.7 days) × 6 = 1545219 sec (18 days)
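The granular-model total can be re-derived from the slide's own figures. The short check below does that and compares the result with the total reported for Wei's method; the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

fcm_seconds = 154_899            # one FCM run (from the slide)
kmeans_seconds_per_run = 231_720 # K-means over all granules, per run
runs = 6

granular_total = fcm_seconds + kmeans_seconds_per_run * runs
wei_total_reported = 7_715_568   # total as reported on the slide


def to_days(seconds: float) -> float:
    return seconds / SECONDS_PER_DAY


speedup = wei_total_reported / granular_total

print(f"granular model: {granular_total} sec = {to_days(granular_total):.1f} days")
print(f"speedup over reported Wei total: {speedup:.2f}x")
```

On these figures the granular model needs roughly one fifth of the reported time.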

  33. HSSP-BLOSUM62 Measure

  34. Future Works
