I.U. School of Informatics

Capstone Presentation. Motif Discovery from Large Number of Sequences: A Case Study with Disease Resistance Genes in Arabidopsis thaliana by Irfan Gunduz. I.U. School of Informatics. 04/25/04.

Presentation Transcript


  1. Capstone Presentation
     Motif Discovery from Large Number of Sequences: A Case Study with Disease Resistance Genes in Arabidopsis thaliana
     by Irfan Gunduz
     I.U. School of Informatics, 04/25/04

  2. INTRODUCTION
     • Motifs: highly conserved regions across a subset of proteins that share the same function, e.g.:
         >Seq A  YNEDSKH
         >Seq B  YDDDSNH
         >Seq C  YDNDSNH
         >Seq D  YENDSKH
     • Motifs can be used to predict:
         • a molecule's function
         • a structural feature
         • family membership
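The four aligned sites above can be summarized as a PROSITE-style pattern: a fully conserved column becomes a single residue and a variable column becomes the set of residues observed there. The Python sketch below is only an illustration, not part of the presented procedure; the helper names and the scanned sequence are made up, and the regex conversion handles only this simple pattern form (residues and bracketed sets).

```python
import re

# Aligned motif sites from the slide (equal length, no gaps).
MOTIF_SITES = {
    "Seq A": "YNEDSKH",
    "Seq B": "YDDDSNH",
    "Seq C": "YDNDSNH",
    "Seq D": "YENDSKH",
}


def prosite_pattern(sites):
    """Summarize aligned, ungapped sites as a PROSITE-style pattern."""
    elements = []
    for column in zip(*sites):  # one tuple of residues per alignment column
        residues = sorted(set(column))
        if len(residues) == 1:
            elements.append(residues[0])  # fully conserved position
        else:
            elements.append("[" + "".join(residues) + "]")  # allowed residues
    return "-".join(elements)


def pattern_to_regex(pattern):
    """Turn the simple pattern above (only residues and [..] sets) into a regex."""
    return re.compile(pattern.replace("-", ""))


pattern = prosite_pattern(MOTIF_SITES.values())
print(pattern)  # Y-[DEN]-[DEN]-D-S-[KN]-H

regex = pattern_to_regex(pattern)
assert all(regex.fullmatch(site) for site in MOTIF_SITES.values())

# Scan a made-up sequence: the pattern also matches a site not in the input.
hit = regex.search("MKTAYEDDSKHLLQ")
print(hit.group() if hit else None)  # YEDDSKH
```

Note that the pattern generalizes beyond the four input sites (YEDDSKH was never observed), which is exactly what makes such patterns useful for prediction and also what makes them prone to chance matches.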

  3. INTRODUCTION
     • Current motif-finding tools: MEME, PROSITE, PRATT, etc.
     • Do they work with a large number of sequences?
         • Pattern discovery relies on statistical or combinatorial techniques that look for signals.
         • The signal becomes harder to separate from noise as the number of sequences increases.
     • What to do?
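A back-of-the-envelope calculation shows why the signal fades. Assuming a uniform background (each of the 20 amino acids equally likely), the chance that a random window matches a pattern is the product of (allowed residues / 20) over its positions, and the expected number of chance matches is N(L - w + 1) times that probability for N sequences of length L and a pattern of width w. Merging in unrelated sequences also makes the pattern more degenerate, which raises that probability further. The sequence counts 116 and 792 come from slides 12 and 19; the length of 1000 residues and the DEGENERATE pattern are hypothetical.

```python
from fractions import Fraction

ALPHABET_SIZE = 20  # assumption: uniform background over 20 amino acids

# Allowed residues per position. SPECIFIC is the pattern from the four
# sites on slide 2; DEGENERATE is a hypothetical looser version, as might
# result from merging in sequences from other families.
SPECIFIC = ["Y", "DEN", "DEN", "D", "S", "KN", "H"]
DEGENERATE = ["FWY", "DENQ", "DENQST", "DE", "ST", "KNR", "H"]


def match_probability(allowed_sets):
    """Chance that one random window matches the pattern."""
    p = Fraction(1)
    for allowed in allowed_sets:
        p *= Fraction(len(allowed), ALPHABET_SIZE)
    return p


def expected_chance_hits(allowed_sets, n_sequences, seq_length):
    """Expected random matches over n_sequences sequences of seq_length residues."""
    windows_per_sequence = seq_length - len(allowed_sets) + 1
    return n_sequences * windows_per_sequence * match_probability(allowed_sets)


for n in (116, 792):
    for name, allowed in (("specific", SPECIFIC), ("degenerate", DEGENERATE)):
        hits = expected_chance_hits(allowed, n_sequences=n, seq_length=1000)
        print(f"{n:4d} sequences, {name:10s} pattern: {float(hits):.4f} chance hits")
```

Here the degenerate pattern is 48 times as likely to match by chance (864/18 allowed combinations), and going from 116 to 792 sequences multiplies chance matches by about 6.8, while a motif confined to one subfamily does not grow in proportion.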

  4. Objective
     • Develop a computational procedure to find functional motifs from a large number of sequences

  5. COMPUTATIONAL PROCEDURE: Tools
     • BLAST (sequence alignment tool)
     • BAG (sequence clustering package)
     • CLUSTAL W (multiple sequence alignment)
     • HMMERII (HMM-based software)
     • BLOCK MAKER (block/motif finder)
     • LAMA (block comparison tool)
     • Perl

  6. COMPUTATIONAL PROCEDURE, Step 1: Collecting and Clustering Sequences
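Step 1 groups the collected sequences by similarity. The presentation uses BLAST and the BAG clustering package; BAG applies its own graph-based algorithm, so the sketch below substitutes plain single-linkage clustering (connected components of the BLAST hit graph, found with union-find). It reads BLAST tabular output (-outfmt 6 in BLAST+, -m 8 in legacy blastall), where columns 1, 2 and 11 are the query ID, subject ID and E-value. The E-value cutoff, the function names and the example hits are hypothetical.

```python
from collections import defaultdict


def parse_blast_tab(lines, max_evalue=1e-10):
    """Yield (query, subject) pairs from BLAST tabular output.

    Columns 1, 2 and 11 hold the query ID, subject ID and E-value.
    Self-hits and hits weaker than max_evalue are skipped.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if query != subject and evalue <= max_evalue:
            yield query, subject


def single_linkage_clusters(ids, pairs):
    """Connected components of the similarity graph, via union-find."""
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    # Largest clusters first; ties broken by smallest member ID.
    return sorted(groups.values(), key=lambda g: (-len(g), min(g)))


# Hypothetical hits in the 12-column tabular layout.
blast_output = [
    "p1\tp2\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t300",
    "p2\tp3\t60.0\t180\t72\t2\t5\t185\t3\t182\t1e-20\t150",
    "p4\tp5\t40.0\t90\t54\t1\t1\t90\t1\t90\t0.5\t30",
]
clusters = single_linkage_clusters(["p1", "p2", "p3", "p4", "p5"], parse_blast_tab(blast_output))
print([sorted(c) for c in clusters])  # [['p1', 'p2', 'p3'], ['p4'], ['p5']]
```

Single linkage is the simplest choice but can chain unrelated families through one promiscuous hit (for example a shared kinase domain), which is one reason a dedicated clustering method such as BAG is used in practice.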

  7. COMPUTATIONAL PROCEDURE, Step 2: Enrichment
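Step 2 enriches each cluster; according to slide 12, HMM iterations added 20 to 640 sequences per cluster. The presentation uses HMMER for this. As a simpler stand-in, the sketch below builds an ungapped log-odds profile from the current members' sites, scans a database, adds every sequence whose best window scores at or above a threshold, and repeats until no new sequence joins. The seed sites are those from slide 2; the database, threshold, pseudocount and function names are hypothetical, and unlike an HMM this profile does not model insertions or deletions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(sites, pseudocount=1.0):
    """Ungapped log-odds profile (one dict per column) vs. a uniform background."""
    background = 1 / len(AMINO_ACIDS)
    profile = []
    for column in zip(*sites):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_window(profile, sequence):
    """Highest-scoring ungapped window as (score, window)."""
    width = len(profile)
    best = (float("-inf"), "")
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside AMINO_ACIDS (e.g. X) make a window unusable.
        score = sum(col.get(aa, float("-inf")) for col, aa in zip(profile, window))
        best = max(best, (score, window))
    return best


def enrich(seed_sites, database, threshold, max_rounds=10):
    """Add database sequences scoring >= threshold until membership stops changing."""
    members = {}  # sequence ID -> best-scoring window
    sites = list(seed_sites)
    for _ in range(max_rounds):
        profile = build_profile(sites)  # rebuilt from all current sites
        new = {}
        for seq_id, seq in database.items():
            if seq_id in members:
                continue
            score, window = best_window(profile, seq)
            if score >= threshold:
                new[seq_id] = window
        if not new:  # converged: nothing joined this round
            break
        members.update(new)
        sites.extend(new.values())
    return members


seeds = ["YNEDSKH", "YDDDSNH", "YDNDSNH", "YENDSKH"]  # sites from slide 2
database = {  # hypothetical sequences
    "q1": "MKTAYEDDSKHLLQ",
    "q2": "GGGGGGGGGGGG",
    "q3": "PPYDNDSNHAA",
}
print(enrich(seeds, database, threshold=8.0))  # {'q1': 'YEDDSKH', 'q3': 'YDNDSNH'}
```

Rebuilding the profile from every accepted site is what lets later rounds reach sequences that the seeds alone would miss; the same feedback can also pull in false positives, so the threshold and round limit matter.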

  8. COMPUTATIONAL PROCEDURE, Step 3: Refinement; Step 4: Motif Finding
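For Step 4, a "block" in the Block Maker sense is an ungapped, conserved segment shared by a set of related sequences. Block Maker uses its own motif-finding algorithms; the sketch below only illustrates the definition on a multiple alignment (such as CLUSTAL W output) by reporting runs of gap-free columns whose most common residue reaches a minimum frequency. The alignment, both thresholds and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical alignment ("-" marks a gap).
ALIGNMENT = [
    "MA-YDVFLSFRG--KE",
    "MSSYDVFLSFRGEDKE",
    "M--YDVFLTFRGE-RE",
]


def find_blocks(alignment, min_width=5, min_conservation=0.6, gap="-"):
    """Return (start, end, segments) for runs of gap-free, conserved columns.

    A column qualifies when no sequence has a gap there and its most common
    residue occurs in at least min_conservation of the sequences.
    """
    n_seqs = len(alignment)
    width = len(alignment[0])
    blocks, start = [], None
    for col in range(width + 1):  # one extra step closes a trailing run
        ok = False
        if col < width:
            column = [seq[col] for seq in alignment]
            if gap not in column:
                top_count = Counter(column).most_common(1)[0][1]
                ok = top_count / n_seqs >= min_conservation
        if ok and start is None:
            start = col  # a candidate block begins
        elif not ok and start is not None:
            if col - start >= min_width:
                blocks.append((start, col, [seq[start:col] for seq in alignment]))
            start = None
    return blocks


print(find_blocks(ALIGNMENT))
# [(3, 12, ['YDVFLSFRG', 'YDVFLSFRG', 'YDVFLTFRG'])]
```

Columns 0 and 14 to 15 also pass the column test but form runs narrower than min_width, so only the nine-column segment is reported.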

  9. A Case Study with Disease Resistance Genes in Arabidopsis thaliana

  10. Why Disease Resistance Genes?

  11. Background: Disease Resistance Genes
      Domain   Probable function
      TIR
      CC
      KIN
      LRR      Recognition specificity
      NB       ATP and GTP binding

  12. Case Study, Arabidopsis thaliana
      • 116 sequences annotated as disease resistance proteins or disease-resistance-like proteins were extracted from the Arabidopsis thaliana genome
      • These were clustered into 32 groups
      • HMM iterations added 20 to 640 sequences to each cluster
      • After refinement, four clusters were formed for further analysis

  13. Case Study, Arabidopsis thaliana: PFAM Search
      Cluster     Domains
      Cluster 1   NB-ARC, TIR, Kin, LRR
      Cluster 2   NB-ARC, Kin, LRR
      Cluster 3   Ser/Thr Kin
      Cluster 4   Kin

  14. Case Study, Arabidopsis thaliana: Results, Block Maker
      Example block (sequence GI, block segment):
      15218608  YDVFLSFRGVDTRQTIVSHL
      15218618  YDVFLSFRGEDTRKNIVSHL
      15220795  YDVFLSFRGEDTRKTIVSHL
      [Figure: blocks for Cluster 1 and Cluster 2]
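The three block segments above differ at only a few positions. A short Python check of pairwise percent identity and of the variable columns makes this concrete; the identifiers are the ones shown on the slide, while the helper names are illustrative and not part of the presented tools.

```python
from itertools import combinations

# Block segments from the slide, keyed by the identifiers shown there.
BLOCK = {
    "15218608": "YDVFLSFRGVDTRQTIVSHL",
    "15218618": "YDVFLSFRGEDTRKNIVSHL",
    "15220795": "YDVFLSFRGEDTRKTIVSHL",
}


def percent_identity(a, b):
    """Percent of positions with identical residues (ungapped, equal length)."""
    if len(a) != len(b):
        raise ValueError("block segments must have equal length")
    return 100 * sum(x == y for x, y in zip(a, b)) / len(a)


def variable_columns(segments):
    """0-based columns where the segments do not all agree."""
    return [i for i, column in enumerate(zip(*segments)) if len(set(column)) > 1]


for (id_a, a), (id_b, b) in combinations(BLOCK.items(), 2):
    print(f"{id_a} vs {id_b}: {percent_identity(a, b):.0f}% identical")
print("variable columns:", variable_columns(BLOCK.values()))
```

With these segments the pairwise identities are 85%, 90% and 95%, and only columns 9, 13 and 14 (0-based) vary, so 17 of the 20 positions are fully conserved.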

  15. Case Study, Arabidopsis thaliana: Results, LAMA and BAG
      [Figure: clusters at the whole-gene level (Cluster 1, Cluster 2) compared with clusters at the block level (Cluster 1, Cluster 2, Cluster 3)]

  16. Case Study, Arabidopsis thaliana
      [Figure: clusters at the whole-gene level, Cluster 1 (RPS4, RPP1, RPP5) and Cluster 2 (RPP8, RPM1), with motif labels TIR-I, TIR-II, Kin1a, Kin2, NBS-A, NBS-B, NBS-C, GLPL and LRR; clusters at the block level: Cluster 1, Cluster 2, Cluster 3]

  17. Case Study, Arabidopsis thaliana: Number of Disease Resistance Gene Candidates on Each Chromosome
                     CHR-I   CHR-II  CHR-III   CHR-IV    CHR-V
      Cluster 1         16        2        6       16       35
      Cluster 2         20        0        6        4        9

  18. Case Study, Arabidopsis thaliana: New Disease Resistance Gene Candidates (Cluster 1 and Cluster 2)
      GI 15221277, GI 15221280, GI 15217940, GI 15221744, GI 15236505, GI 15242136, GI 15233862

  19. Case Study, Arabidopsis thaliana: Testing the Effectiveness of the Computational Procedure
      • 792 unique sequences were merged and submitted to MEME and PRATT to detect functional motifs
      • Time: more than 9000 minutes (150 hours) on a 1.7 GHz Pentium IV machine running Linux
      • Result: no known disease resistance gene motifs were detected

  20. Case Study, Arabidopsis thaliana: CONCLUSIONS
      • A sensible combination of tools provides an excellent mechanism for motif detection
      • Clustering helps improve the performance of other well-known tools

  21. ACKNOWLEDGEMENT
      "Motif Discovery from Large Number of Sequences: A Case Study with Disease Resistance Genes in Arabidopsis thaliana" by Irfan Gunduz, Sihui Zhao, Mehmet Dalkilic and Sun Kim will be presented at the 2003 International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences.

  22. Case Study, Arabidopsis thaliana

  23. Disease Resistance Mechanism

  24. COMPUTATIONAL PROCEDURE: Refinement
      [Figure: refinement diagram with items labeled A, B, C and D]
