Functional characterization of membrane transporters from protein sequences


  1. Functional characterization of membrane transporters from protein sequences Haiquan Li The Samuel Roberts Noble Foundation

  2. Membrane transport proteins (transporters) • Functions • Uptake of nutrients (e.g., nitrogen) • Pump out toxic metabolites • Mediate signal transduction • Maintain ionic and osmotic homeostasis • Classes based on driving energy • Channels (passive diffusion) • Carrier-type facilitators (electrochemical potential-driven, e.g., sodium potential) • Primary active transporters

  3. Characterization of transporters • Small-scale experimental methods • Patch-clamp techniques for channels • Isotope-labeled substrates • Heterologous expression • Mutant complementation • The demand for genome-scale computational methods (transportomics) • Comparative studies: comparison of transporter families across multiple organisms, such as lignin-making and non-lignin-making organisms • Integrative studies with transporter gene expression: exchange of metabolites (e.g., nitrogen) between legumes and rhizobia

  4. An example of transportomics Udvardi & Day, 1997 Day et al., 2001

  5. Transporter resources and classification systems • Manually curated resources • TCDB by Saier et al. • TransportDB by Ren et al.

  6. Computational characterization of transporters • Computational characterization methods: homology search (BLAST, domain), machine learning, empirical rules • False positives caused by gene duplication (paralogs), domain shuffling, or non-transporter domains • Example: the Plant Plasmodesmata (PPD) family (1.A.26) transports hormones or growth factors • Single member: Connexin 32, a gap junction protein

  7. Motivation of our work • Objectives • List all candidate transporters, since low confidence may imply novelty and significance • Reduce curation effort significantly • Methodologies • Use two distinct approaches, machine learning and empirical rules, to enhance annotation confidence • Efficiently and automatically integrate multiple lines of evidence from TCDB, Pfam, GO, SWISS-PROT and transmembrane segments (TMS)

  8. Saport: a semi-automatic transporter annotation system • Input sequences • Machine Learning Module (TransportTP) • Initial classifier from TCDB: BLAST search, HMM search, score integration and initial prediction • Refining classifier: TMS, KNN in TCDB, Pfam domains, GO terms, SwissProt homologs; classification by ensemble of SVMs • Empirical Rule Module • Collect transporter-related evidence • Summarize family-based empirical rules • Interpret rules and generate putative transporters • Score integration and ranking

  9. TransportTP: Two-phase classification [Diagram: a query protein p passes through the initial classifier from TCDB and then the refining classifier, which uses features F1, …, Fi, …, Fm and the nearest-neighbor (NN) transporter. Outcomes: true positives (correctly categorized transporters), false positives (incorrectly predicted transporters), false negatives (missed transporters), true negatives (non-transporters).] Haiquan Li, Vagner A. Benedito, Michael K. Udvardi and Xuechun Zhao. BMC Bioinformatics, under revision. Haiquan Li, Xinbin Dai & Xuechun Zhao. Bioinformatics, 24, 1129-1136, 2008.
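The two-phase idea on slide 9 can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `two_phase_classify`, the `refine` callback and the three output buckets are hypothetical and are not taken from TransportTP itself. The default threshold of 0.1 mirrors the value quoted on the evaluation slides.

```python
from typing import Callable, Dict, List


def two_phase_classify(
    best_evalues: Dict[str, float],
    refine: Callable[[str], bool],
    evalue_threshold: float = 0.1,
) -> Dict[str, List[str]]:
    """Sketch of a two-phase classifier (hypothetical names throughout).

    Phase 1: keep proteins whose best homology-hit e-value passes a loose
    threshold. Phase 2: let a refining classifier confirm or reject them.
    """
    result: Dict[str, List[str]] = {
        "transporters": [],
        "rejected_by_refiner": [],
        "non_transporters": [],
    }
    for protein_id, evalue in best_evalues.items():
        if evalue > evalue_threshold:
            # Phase 1 found no sufficiently similar transporter.
            result["non_transporters"].append(protein_id)
        elif refine(protein_id):
            # Phase 2 confirmed the candidate.
            result["transporters"].append(protein_id)
        else:
            # Phase 2 rejected a likely false positive from phase 1.
            result["rejected_by_refiner"].append(protein_id)
    return result
```

A loose threshold in phase 1 favors recall, and the refining step is then responsible for removing the false positives that the loose threshold lets through.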

  10. Refining features: TMS & KNN

  11. Refining features: Pfam, GO & Swiss-Prot [Diagram: the query protein p is linked to Pfam families, TC families in TCDB, and Swiss-Prot entries through cross-links]

  12. Refining classifier: ensemble of SVMs • Classification labels of training samples • Positives are proteins manually annotated as transporters in TransportDB • All others are negatives [Diagram: samples of the major class and the minor class are used to train SVM1, SVM2, …, SVMk; an unknown protein is labeled by testing pos_weight > neg_weight]
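The slide does not spell out how the ensemble is built, so the following is a sketch of one common way to handle a major/minor class imbalance that is consistent with the diagram labels: split the major class into chunks, pair each chunk with all minor-class samples, train one model per pair, and label an unknown protein by comparing the summed positive and negative vote weights. The function names and the `Model` type are hypothetical; a real SVM would be plugged in as the base learner.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Sequence[float]
# A trained model returns (is_positive, vote_weight) for one sample.
Model = Callable[[Sample], Tuple[bool, float]]


def balanced_training_sets(
    major: List[Sample], minor: List[Sample]
) -> List[Tuple[List[Sample], List[Sample]]]:
    """Split the major class into k chunks of roughly minor-class size
    and pair each chunk with every minor-class sample."""
    if not major or not minor:
        raise ValueError("both classes need at least one sample")
    k = max(1, round(len(major) / len(minor)))
    chunks = [major[i::k] for i in range(k)]
    return [(chunk, list(minor)) for chunk in chunks]


def ensemble_predict(models: List[Model], sample: Sample) -> bool:
    """Label a sample positive when the summed weight of positive votes
    exceeds the summed weight of negative votes."""
    pos_weight = 0.0
    neg_weight = 0.0
    for model in models:
        is_positive, weight = model(sample)
        if is_positive:
            pos_weight += weight
        else:
            neg_weight += weight
    return pos_weight > neg_weight
```

Each training pair is roughly balanced, so no single model is dominated by the major class, while the ensemble as a whole still sees every major-class sample.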

  13. Generation of empirical rules • Manual curation of transporters • Collect transporter-related evidence (universal table of raw evidence) • Categorize the evidence manually • Summarize the rules on transporter families during the curation of plant organisms • Medicago, Lotus, sorghum, poplar, grape, moss, green algae

  14. Representation of rules • Categories of curation • Level 1: every expected feature is present • Level 2: a minor feature is missing • Level 3: a major feature is missing or multiple features conflict • Representation and customization of complicated empirical rules, e.g.: isnull($tcdb_top_evalue); lt($len,$-2/2):=3; lt($len,$-2-$0):+1 up to 3; gt($len,$-1+$0):+1 up to 3

  15. Interpretation of Rules • A simple script language • Flow control: serial ‘&’, otherwise ‘;’ • Variable definition: database field variables and rule column variables • Assignment and arithmetic operations: ‘:’ ‘+’ ‘-’ ‘*’ ‘/’ • Comparison operations: lt, gt, eq, le, ge • String operations: isnull, matched, items, match_items, compatible • Boundary functions: up to, down to • Advanced functions: key, index, gradient, etc. • Nested functions • The interpretation program can stay fixed while the rules are tuned and customized for other kingdoms of organisms • The script language is interpreted using standard programming techniques
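To make the rule syntax more concrete, here is a hedged Python reading of the length clauses in the example rule from slide 14. The slides do not define the rule column variables, so this sketch assumes that `$-2` is a family's minimum expected length, `$-1` its maximum, and `$0` a tolerance; it also assumes that every clause is evaluated in sequence. The `isnull($tcdb_top_evalue)` clause is left out because its effect is not stated. All function and parameter names are hypothetical.

```python
def bounded_increment(level: int, step: int = 1, upper: int = 3) -> int:
    """'+1 up to 3': raise the curation level, but never above the bound."""
    return min(level + step, upper)


def length_rule(
    length: int,
    min_len: float,    # assumed meaning of $-2
    max_len: float,    # assumed meaning of $-1
    tolerance: float,  # assumed meaning of $0
    level: int = 1,
) -> int:
    """Return a curation level (1 = best, 3 = worst) from sequence length."""
    # lt($len,$-2/2):=3  -> far too short: assign level 3 outright.
    if length < min_len / 2:
        level = 3
    # lt($len,$-2-$0):+1 up to 3  -> somewhat short: demote by one level.
    if length < min_len - tolerance:
        level = bounded_increment(level)
    # gt($len,$-1+$0):+1 up to 3  -> too long: demote by one level.
    if length > max_len + tolerance:
        level = bounded_increment(level)
    return level
```

For example, under these assumptions `length_rule(300, 400, 600, 50)` demotes a 300-residue sequence by one level because it falls below 400 - 50, while a 500-residue sequence keeps its starting level.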

  16. Final issues on Saport • Final integration • Final scores are integrated from machine learning scores and empirical categorization • Sequences annotated by either method are accepted; all others are filtered out • Confidence is gained from the mutual support of both methods; further review is needed for conflicting or singly annotated sequences • Tools: filtering, visualization and online curation
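The integration policy on slide 16 can be expressed as a small decision function. This is a minimal sketch, assuming each method reports either a predicted family identifier or nothing; the function name `integrate` and the returned status strings are hypothetical and do not come from Saport.

```python
from typing import Optional


def integrate(ml_family: Optional[str], rule_family: Optional[str]) -> str:
    """Combine the machine learning and empirical-rule annotations.

    Each argument is the family predicted by one method, or None when
    that method did not annotate the sequence as a transporter.
    """
    if ml_family is None and rule_family is None:
        # Neither method annotated the sequence: filter it out.
        return "filtered"
    if ml_family is not None and rule_family is not None:
        # Both methods annotated it: agreement gives confidence,
        # disagreement is flagged for a curator.
        if ml_family == rule_family:
            return "confident"
        return "review: conflict"
    # Only one method annotated the sequence: accept but flag for review.
    return "review: single method"
```

Keeping singly annotated sequences in the output, rather than dropping them, matches the stated objective of listing all candidate transporters.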

  17. Saport (http://bioinfo3.noble.org/saport)

  18. Evaluation of TransportTP module: cross-validation results Yeast was used for training, and the e-value threshold of the initial classifier was set to 0.1

  19. Full results of TransportTP in Leave-one-in cross-validation • Recall/sensitivity: average = 80.2% • Precision: average = 81.9% • The e-value threshold of the initial classifier was set to 0.1

  20. General model versus genome-specific model on the balanced accuracy of TransportTP E-value thresholds of initial classifier
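For reference, the metrics named on slides 19 and 20 can be computed from confusion-matrix counts as follows. The slides do not give the formulas, so this sketch uses the standard definitions, with balanced accuracy taken as the mean of sensitivity and specificity; the function names are illustrative.

```python
def recall(tp: int, fn: int) -> float:
    """Recall (sensitivity): fraction of true transporters that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp: int, fp: int) -> float:
    """Precision: fraction of predicted transporters that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def specificity(tn: int, fp: int) -> float:
    """Specificity: fraction of non-transporters correctly rejected."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def balanced_accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Mean of sensitivity and specificity (standard definition)."""
    return (recall(tp, fn) + specificity(tn, fp)) / 2
```

Balanced accuracy is a reasonable choice when non-transporters greatly outnumber transporters, because plain accuracy would then be dominated by the true negatives.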

  21. Benefit of integrating machine learning with homology search Yeast was used for training, and e-value thresholds from 10 to 1e-50 were tested

  22. The predictive performance of TransportTP on plant organisms • Manually curated: curation with confidence levels 1 and 2 • Potential transporter rate: proportion of predictions that match curation level 3 • Arabidopsis was used for training, and 10 was used as the e-value threshold

  23. Preliminary results of automatic annotation by empirical rules

  24. Consistency between the two modules

  25. Consistency between the two methods (cont’d) Curation results TransportTP Empirical Rules Human Curation 76.74 79.38 69.28 76.74 Machine Learning results 65.01 90.88 Empirical rule results Saport Recall Precision

  26. Comparative study of monolignol transporters • Comparative study • Strengthening predictions versus all potential predictions • Candidate monolignol transporters • 2.A.85 Aromatic Acid Transporters (ArAE) [Diagram: plant cell; organisms compared: higher plants, moss, algae (?), fungi]

  27. Results on nodule transporters Benedito, Li et al. Plant Physiology, under review.

  28. Discussion • Comparison of the two methods • The machine learning method is general, but its black-box nature makes it difficult for biologists to check • Empirical rules are family-based and easy for biologists to check, but may be biased toward the organisms from which they were summarized • Pitfalls of the system • Difficult to distinguish transporters from sensors • Sensitive to partial sequences such as ESTs • Weak at handling transporter complexes • Further work • Integrate gene expression and sub-cellular localization analysis • Integrate phylogenetic analysis: 1) characterize subfamilies or substrates based on SIFTER or TransportDB and 2) comparative study of annotated transporter families from multiple organisms

  29. Summary • Presented a transporter annotation system that effectively integrates homology-based search, machine learning and empirical rules • The system is promising for characterizing eukaryotic transporters with significantly reduced curation effort • Provides a general framework for integrative decision-making, including integration of multiple resources and prior biological knowledge

  30. Acknowledgements • Michael Udvardi • Carolyn Young • Rick Dixon • Patrick Xuechun Zhao • Vagner Benedito • Ranamalie Amarasinghe • Jian Zhao • Xinbin Dai