
A discriminative method for protein remote homology detection based on N-Gram

A discriminative method for protein remote homology detection based on N-Gram. Reporter: Xie sifa. Mentor: Zou quan. Outline: Introduction, Method, Improve P&R, Conclusion. Introduction: protein homology detection; detect 10%~30% protein structure.


Presentation Transcript


  1. A discriminative method for protein remote homology detection based on N-Gram. Reporter: Xie sifa. Mentor: Zou quan.

  2. Outline: Introduction, Method, Improve P&R, Conclusion

  3. Introduction

  4. Introduction. Protein homology detection: detect 10%~30% protein structure. Remote homology detection: sequence similarity < 25%, e.g. ...ATTATCCGACGGCCGCCT... versus ...TCATCTGCACGGCCTCAC... Source: "Fundamentals of Bioinformatics" (《生物信息学基础》), Sun Xiao, Lu Zuhong, Xie Jianming.

  5. Process: Data Set, Feature Extraction, Classify

  6. Data Set. Benchmark (Liao and Noble, 2003): 4352 proteins, 54 families, similarity < 10^-25. For each family i: Train Set (same superfamily vs. different family); Test Set (same family vs. different family).

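  7. N-gram. Number of possible N-grams over the 20 amino acids: 1-gram: 20; 2-gram: 400; 3-gram: 8000. Skip-N-gram ("A Closer Look at Skip-gram Modelling", David Guthrie, Ben Allison et al.): for the sentence "I hit the tennis ball", the ordinary tri-grams are "I hit the", "hit the tennis" and "the tennis ball"; the skip-gram "hit the ball" is captured as well.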
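
The N-gram counts on this slide (20, 400, 8000) follow from 20 amino acids raised to the power N. A minimal sketch of how such features and skip-grams might be computed is given below. The names `AMINO_ACIDS`, `ngram_vocabulary`, `ngram_features` and `skip_grams` are hypothetical, the frequency normalisation is an assumption, and the slides do not say how the authors actually built their feature vectors.

```python
from collections import Counter
from itertools import combinations, product

# Assumption: the 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_vocabulary(n):
    """All 20**n possible N-grams, in a fixed order."""
    return ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]


def ngram_features(sequence, n):
    """Relative frequency of every possible N-gram in a protein sequence."""
    counts = Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))
    total = max(sum(counts.values()), 1)  # avoid division by zero
    return [counts[gram] / total for gram in ngram_vocabulary(n)]


def skip_grams(tokens, n, k):
    """N-grams that may skip at most k tokens in total (k=0 gives plain N-grams)."""
    result = []
    for start in range(len(tokens)):
        # The last token may sit at most (n - 1 + k) positions after the first.
        window = range(start + 1, min(start + n + k, len(tokens)))
        for rest in combinations(window, n - 1):
            result.append(tuple(tokens[i] for i in (start,) + rest))
    return result


if __name__ == "__main__":
    print([len(ngram_vocabulary(n)) for n in (1, 2, 3)])
    print(skip_grams("I hit the tennis ball".split(), n=3, k=1))
```

With `k=0` the skip-gram function returns only the three contiguous tri-grams of the example sentence; with `k=1` it additionally yields combinations such as "hit the ball".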

  8. Random Forest Ensemble !!!

  9. Result: the area under the ROC curve up to the first 50 false positives (ROC50)
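
The score named on this slide can be illustrated with a short sketch. The function `roc_n` is a hypothetical helper, not the authors' evaluation code. It assumes the usual definition (for each of the first n false positives, count the true positives ranked above it, then normalise by n times the number of positives). Two further assumptions: ties in score are broken by input order, and when fewer than n negatives exist the normaliser uses the number of negatives instead.

```python
def roc_n(scores, labels, n=50):
    """Area under the ROC curve up to the first n false positives, scaled to [0, 1]."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    num_pos = sum(1 for _, is_positive in ranked if is_positive)
    num_neg = len(ranked) - num_pos
    limit = min(n, num_neg)  # assumption: shrink n if there are too few negatives
    if num_pos == 0 or limit == 0:
        return 0.0

    true_pos = 0
    false_pos = 0
    area = 0
    for _, is_positive in ranked:
        if is_positive:
            true_pos += 1
        else:
            false_pos += 1
            area += true_pos  # positives ranked above this false positive
            if false_pos == limit:
                break
    return area / (limit * num_pos)


if __name__ == "__main__":
    print(roc_n([0.9, 0.8, 0.3, 0.2], [True, True, False, False]))
```

A ranking that places every positive above every negative scores 1.0, and one that places every negative first scores 0.0.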

  10. Result

  11. Result

  12. Improving Recall and Precision: unbalanced data set; trade-off between recall and precision

  13. Improving Recall and Precision: one threshold per family

  14. Improving Recall and Precision. Train set (ranked scores; + positive, - negative): 0.98+ 0.95+ 0.93+ 0.92+ 0.90- 0.87- 0.85+ 0.84- 0.81+ 0.79+ 0.77- 0.75- 0.73- 0.69+ 0.65- 0.62- 0.58- 0.55- 0.53-. F value: 0.88 0.85 0.82 0.79 0.78 0.76 0.75 0.72 0.70 0.68 0.67 0.63 0.60 0.57 0.56 0.54 0.51 0.49 0.48; 0.79. [Figure labels: New test, New train, F value. Note on slide: "no value but position!"]
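
One way to read the "one family one threshold" idea is sketched below, under stated assumptions. The cut in the ranked train-set scores is chosen where the F value is highest, with "F value" taken to be the F1 score (harmonic mean of precision and recall). The slide does not spell out the exact procedure, so `f_value`, `best_cut` and `TRAIN` are hypothetical names and this is not necessarily the authors' method. The data are the labelled train-set scores listed on the slide.

```python
def f_value(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall (assumed meaning of 'F value')."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_cut(scored):
    """Return (position, score, F) of the rank cut that maximises F on the train set.

    Everything ranked at or above the returned position is predicted positive.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    total_pos = sum(1 for _, is_positive in ranked if is_positive)
    best = (0, None, 0.0)
    tp = 0
    for position, (score, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            tp += 1
        f = f_value(tp, position - tp, total_pos - tp)
        if f > best[2]:  # keep the earliest cut on ties
            best = (position, score, f)
    return best


# Train-set scores and labels as listed on the slide (+ is True, - is False).
TRAIN = [
    (0.98, True), (0.95, True), (0.93, True), (0.92, True), (0.90, False),
    (0.87, False), (0.85, True), (0.84, False), (0.81, True), (0.79, True),
    (0.77, False), (0.75, False), (0.73, False), (0.69, True), (0.65, False),
    (0.62, False), (0.58, False), (0.55, False), (0.53, False),
]

if __name__ == "__main__":
    position, score, f = best_cut(TRAIN)
    print(position, score, round(f, 3))
```

Returning the rank position alongside the score leaves open whether the cut is carried over to new data as a score or as a position, which the slide's "no value but position!" remark appears to address.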

  15. Improving Recall and Precision

  16. Conclusion. 1. The N-gram model is successfully used to detect protein remote homology, and the result on the benchmark is satisfactory. 2. A novel method is proposed to improve the recall and precision of positive samples. This method yields a mean recall of 0.86752 and a mean precision of 0.56470.
