
Feature selection for characterizing HLA class I peptide motif anchors.




Perry G. Ridge¹, Hernando Escobar¹, Peter E. Jensen¹, Julio C. Delgado¹, David K. Crockett¹,²
¹ ARUP Laboratories, Department of Pathology, University of Utah School of Medicine, Salt Lake City, UT
² Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT 84108

INTRODUCTION

HLA class I peptide motifs have been described by dominant amino acid residues located in primary anchor positions. For example, the reported motif for HLA-A*0201 from the SYFPEITHI database is x-[LM]-x-x-x-x-x-x-[VL] [1]. Variations of this nomenclature are also seen in other HLA class I peptide motif databases such as IMGT/HLA [2]. Patterns of anchor residues have led to the development of software tools and algorithms for predicting peptide binding and for screening target organisms or sequences for a given peptide motif. However, the physical and chemical properties of peptide anchor position residues that confer allele specificity have not been as well described. For this study, supervised feature selection was used to identify the physical and chemical properties that best distinguish A*0201 peptide binders from non-binders.

METHODS

A publicly available data set of A*0201 binding peptides (n=1181) and non-binding peptides (n=1908) was downloaded from the Immune Epitope Database (IEDB) [3]. Amino acid residues of anchor positions (P2 and Pω) were characterized using values of 544 physical, chemical, conformational, or energetic properties (AAindex v9.4) [4]. Properties downloaded from the AAindex (http://www.genome.jp/aaindex/) were each represented numerically (each amino acid had a numerical value for each property). Where there was no value for a particular amino acid/property combination, a value of zero was assigned.

Input files for the next processing step were created using a simple Java program. Each amino acid in the anchor positions was assigned the numerical value given in the reported AAindex properties table. For each anchor position, the Correlation-based Feature Subset Selection algorithm [5], together with the Best First (greedy hill-climbing) search method, was used to identify the subset of properties that best distinguished binders from non-binders. Attribute selection algorithms were implemented using the Weka software package v3.6 [6].

Figure 1. Common HLA-A*0201 motif. Anchor 1 and Anchor 2 were characterized using AAindex properties (v9.4).

RESULTS

Features selected using the full training set for anchor 1 and anchor 2 are summarized in Table 1; results using fivefold cross-validation are reported below.

Using fivefold cross-validation, the amino acid properties of normalized frequency of extended structure (Burgess et al., 1974), parameter of charge transfer capability (Charton-Charton, 1983), and relative preference value at C1 (Richardson-Richardson, 1988) best characterized the residues in anchor 1 (P2). The anchor 2 position (Pω), again using fivefold cross-validation, was best represented by the number of atoms in the side chain labeled 3+1 (Charton-Charton, 1983), parameter of charge transfer donor capability (Charton-Charton, 1983), normalized frequency of C-terminal non helical region (Chou-Suzuki, 1976), information measure for middle turn (Robson-Suzuki, 1976), and amphiphilicity index (Mitaku et al., 2002).

Table 1. Selected attributes for HLA-A*0201 anchor positions 1 and 2.

Anchor Position | AAindex Property (a) | Original Reference
Anchor 1 | A parameter of charge transfer donor capability | Charton, 1983
Anchor 1 | Amino acid composition | Dayhoff, 1978
Anchor 1 | Atom based hydrophobic moment | Eisenberg, 1986
Anchor 1 | Partition coefficient | Garel, 1973
Anchor 1 | Polarity | Grantham, 1974
Anchor 1 | Hydrophilicity value | Hopp-Woods, 1981
Anchor 1 | Normalized frequency value of alpha-helix with weights | Levitt, 1978
Anchor 1 | AA composition of total proteins | Nakashima, 1990
Anchor 1 | Normalized frequency of beta-sheet in all-beta class | Palau, 1981
Anchor 1 | Weights for alpha-helix at the window position of 3 | Qian-Sejnowski, 1988
Anchor 1 | Average relative fractional occurrence in E0(i) | Rackovsky-Scheraga, 1982
Anchor 1 | Relative preference value at C-cap | Richardson, 1988
Anchor 1 | Normalized positional frequency at helix termini N4 | Aurora-Rose, 1998
Anchor 1 | Volumes including crystallographic waters using ProtOr | Tsai, 1999
Anchor 2 | The number of bonds in the longest chain | Charton, 1983
Anchor 2 | Average volume of buried residue | Chothia, 1975
Anchor 2 | Normalized frequency of N-terminal beta-sheet | Chou-Fasman, 1978
Anchor 2 | Conformational preference for parallel beta-strands | Lifson-Sander, 1979
Anchor 2 | AA composition of mt-proteins from fungi and plant | Nakashima, 1990
Anchor 2 | Information measure for C-terminal turn | Robson-Suzuki, 1976
Anchor 2 | Volumes including crystallographic waters using ProtOr | Tsai, 1999

(a) Accessed March 2010 from http://www.genome.jp/aaindex/

CONCLUSIONS

Supervised feature selection was used to characterize prominent physical and chemical properties for anchoring amino acid residues in HLA-A*0201 allele specificity. Ongoing efforts include allele representation and binding prediction algorithms for different HLA class I subtypes.

References:

1. Rammensee, H.G., T. Friede, and S. Stevanović, MHC ligands and peptide motifs: first listing. Immunogenetics, 1995. 41(4): p. 178-228.
2. Robinson, J., et al., IMGT/HLA database--a sequence database for the human major histocompatibility complex. Tissue Antigens, 2000. 55(3): p. 280-7.
3. Peters, B., et al., The immune epitope database and analysis resource: from vision to blueprint. PLoS Biol, 2005. 3(3): p. e91.
4. Kawashima, S. and M. Kanehisa, AAindex: amino acid index database. Nucleic Acids Res, 2000. 28(1): p. 374.
5. Hall, M.A., Correlation-based feature selection of discrete and numeric class machine learning, in Computer Science Working Papers. 2000, University of Waikato, Department of Computer Science: Hamilton, New Zealand.
6. Witten and Frank. Data Mining: Practical machine learning tools and techniques. 2nd ed. 2005, San Francisco: Morgan Kaufmann.
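
Illustrative sketch of the METHODS steps. The short Python example below is one possible reading of the encoding and scoring steps described under METHODS, not the Java program or Weka configuration actually used. It shows (i) assigning each anchor residue one numeric value per property, with zero substituted when a property has no value for that residue, and (ii) the subset merit from Hall's correlation-based feature selection [5], merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. Everything named here is an assumption for illustration: the property tables hold made-up numbers rather than AAindex values, the peptides are invented, and plain Pearson correlation stands in for whatever correlation measure Weka applies internally. The Best First search over subsets is not shown.

```python
import math

# Hypothetical toy property tables: the numbers are placeholders,
# NOT real AAindex values. Residues absent from a table have no value.
TOY_PROPERTIES = {
    "toy_prop_a": {"L": 1.0, "M": 0.8, "V": 0.6, "K": -1.0},
    "toy_prop_b": {"L": 0.2, "V": 0.9, "A": 0.1},
}


def encode_anchor(peptide, position, properties):
    """Return one value per property for the residue at `position`.

    `position` is a 0-based index; use 1 for P2 and -1 for the
    C-terminal anchor. Missing residue/property pairs become 0.0,
    mirroring the zero-fill rule described in METHODS.
    """
    residue = peptide[position]
    return [table.get(residue, 0.0) for table in properties.values()]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(columns, labels):
    """CFS merit of a feature subset (Pearson used as a stand-in).

    `columns` is a list of feature columns, `labels` the class
    column (1 = binder, 0 = non-binder).
    """
    k = len(columns)
    r_cf = sum(abs(pearson(col, labels)) for col in columns) / k
    if k == 1:
        return r_cf  # no feature-feature term for a single feature
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


if __name__ == "__main__":
    # Invented 9-mers with invented labels, for demonstration only.
    peptides = ["ALKDEFGHV", "AMKDEFGHL", "AKKDEFGHA", "AXKDEFGHK"]
    labels = [1, 1, 0, 0]
    rows = [encode_anchor(p, 1, TOY_PROPERTIES) for p in peptides]
    columns = [list(col) for col in zip(*rows)]
    print("P2 feature rows:", rows)
    print("merit of both toy properties:", cfs_merit(columns, labels))
```

A search method such as Best First would call a merit function like `cfs_merit` repeatedly on candidate subsets and keep the highest-scoring one. The merit rewards properties that track the binder/non-binder label while penalizing properties that are redundant with each other.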
