GlycoCT is a standardized format for encoding carbohydrate sequences in glycobioinformatics. This overview covers its structural features, including the controlled vocabulary for monosaccharides and linkages. Strict sorting rules give each carbohydrate a unique representation, which makes exact structure searches possible. Hierarchical algorithms for comparing residues and linkages define that ordering. As a first application, translating the CarbBank monosaccharides into GlycoCT reduced the number of distinct residues by 65%.
GlycoCT: a unifying sequence format for carbohydrates • S. Herget, R. Ranzinger, K. Maass and C.-W. v. d. Lieth • Presented by Yingxin Guo
An overview of the sequence formats used in glycobioinformatics
Uniqueness: a central requirement for encoding carbohydrate sequences • Why • Serves as the primary key in a database • Benefits the implementation of exact structure search • How • Apply strict sorting rules • Define a controlled vocabulary • Support encoding of uncertain linkages and unspecified monosaccharides
Basic residue (RES) entities in GlycoCT • Substituents and other entities
Modeling the topology • Residue entities are modeled in the RES section. • Linkages are modeled in the LIN section. • Atom replacement schema.
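To make the RES/LIN split concrete, here is a minimal parsing sketch for the condensed GlycoCT text form. The sample encodes a GlcNAc (a basetype plus an N-acetyl substituent) carrying a galactose at position 4. The class names, the function `parse_glycoct`, and the regular expressions are illustrative assumptions, not part of any official GlycoCT tooling, and the sketch ignores sections such as REP, UND and ALT.

```python
import re
from dataclasses import dataclass

# Sample condensed encoding: residues in RES, linkages in LIN.
SAMPLE = """\
RES
1b:b-dglc-HEX-1:5
2s:n-acetyl
3b:b-dgal-HEX-1:5
LIN
1:1d(2+1)2n
2:1o(4+1)3d
"""

@dataclass
class Residue:
    index: int
    kind: str        # e.g. 'b' = basetype, 's' = substituent
    name: str

@dataclass
class Linkage:
    index: int
    parent: int
    parent_type: str  # linkage-type letter on the parent side
    parent_pos: str   # kept as text: may be unknown or a set of alternatives
    child_pos: str
    child: int
    child_type: str   # linkage-type letter on the child side

# Hypothetical patterns covering only the simple lines shown above.
RES_RE = re.compile(r"^(\d+)([a-z]):(.+)$")
LIN_RE = re.compile(r"^(\d+):(\d+)([a-z])\(([^+]+)\+([^)]+)\)(\d+)([a-z])$")

def parse_glycoct(text):
    """Return (residues by index, list of linkages) for a RES/LIN-only encoding."""
    residues, linkages = {}, []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line in ("RES", "LIN"):
            section = line
            continue
        if section == "RES":
            m = RES_RE.match(line)
            if m is None:
                raise ValueError(f"bad RES line: {line!r}")
            idx, kind, name = m.groups()
            residues[int(idx)] = Residue(int(idx), kind, name)
        elif section == "LIN":
            m = LIN_RE.match(line)
            if m is None:
                raise ValueError(f"bad LIN line: {line!r}")
            i, p, pt, ppos, cpos, c, ct = m.groups()
            linkages.append(Linkage(int(i), int(p), pt, ppos, cpos, int(c), ct))
        else:
            raise ValueError(f"line outside a known section: {line!r}")
    return residues, linkages

residues, linkages = parse_glycoct(SAMPLE)
for link in linkages:
    print(residues[link.parent].name, f"({link.parent_pos}+{link.child_pos})",
          residues[link.child].name)
```

Positions are stored as strings on purpose, since a linkage position may be unknown or ambiguous rather than a single integer.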
Sorting • Why • One central requirement is to generate a unique representation for all carbohydrates. • Sorting is used to determine the order of appearance of elements. • How • A set of hierarchical rules is used in GlycoCT to define the ordering of residues, linkages and special structural features. • Residue comparison algorithm • Linkage comparison algorithm • Underdetermined subtree comparison algorithm • Alternative subtree comparison algorithm
Residue comparison • Applied when multiple starting points exist. • Rules (applied in order) • Number of child residues. • Length of the longest branch. • Number of terminal residues. • Number of branching points. • Lexical order.
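The rule hierarchy above maps naturally onto a tuple sort key, where a later criterion only matters when all earlier ones tie. The sketch below is one possible reading, not the reference implementation. Two points are assumptions: "number of child residues" is taken to mean all descendants, and larger counts are placed first. The `Node` class and helper names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)

def descendants(n):
    # Assumption: "child residues" counts every residue below n.
    return sum(1 + descendants(c) for c in n.children)

def longest_branch(n):
    # Number of linkages on the longest path from n down to a terminal residue.
    return 0 if not n.children else 1 + max(longest_branch(c) for c in n.children)

def terminals(n):
    # A residue without children is itself a terminal residue.
    return 1 if not n.children else sum(terminals(c) for c in n.children)

def branch_points(n):
    own = 1 if len(n.children) > 1 else 0
    return own + sum(branch_points(c) for c in n.children)

def residue_key(n):
    # Counts are negated so that larger values sort first (an assumed
    # direction); the residue name breaks remaining ties lexically.
    return (-descendants(n), -longest_branch(n), -terminals(n),
            -branch_points(n), n.name)

a = Node("b-dglc-HEX-1:5", [Node("b-dgal-HEX-1:5")])
b = Node("b-dglc-HEX-1:5", [Node("b-dgal-HEX-1:5"), Node("a-dman-HEX-1:5")])
c = Node("a-dman-HEX-1:5")

ordered = sorted([c, a, b], key=residue_key)
print([descendants(n) for n in ordered])
```

Because Python compares tuples element by element, the hierarchy of rules needs no explicit branching.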
Linkage comparison • Rules • Number of bonds between parent and child residues. • Atom linkage position at the parent residue. • Atom linkage position at the child residue. • Linkage type at the parent residue. • Comparison of child residues with residue comparison algorithm. • Decides the internal order of the RES and LIN sections
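The linkage rules can be sketched the same way, as a tuple key whose last element falls back to the residue comparison. In this illustration the child's residue key is assumed to be precomputed and passed in as a tuple. The ascending sort direction, the `Linkage` fields and the example values are assumptions made for the sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Linkage:
    bonds: int         # number of bonds between parent and child
    parent_pos: int    # atom linkage position at the parent residue
    child_pos: int     # atom linkage position at the child residue
    parent_type: str   # linkage type letter at the parent residue
    child_key: tuple   # precomputed residue-comparison key of the child

def linkage_key(link):
    # Each rule is consulted only if all earlier rules tie.
    return (link.bonds, link.parent_pos, link.child_pos,
            link.parent_type, link.child_key)

links = [
    Linkage(1, 6, 1, "o", (0, "a-dman-HEX-1:5")),
    Linkage(1, 3, 1, "o", (0, "a-dman-HEX-1:5")),
    Linkage(1, 3, 1, "o", (0, "a-dgal-HEX-1:5")),
]

ordered = sorted(links, key=linkage_key)
for link in ordered:
    print(link.parent_pos, link.child_key[1])
```

The two linkages at position 3 tie on the first four rules, so only the child comparison separates them.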
Underdetermined subtree & Alternative subtree comparison • The encoding of UND and ALT is handled separately from the description of the other topological features. • The rules from the residue and linkage comparison algorithms are applied within each UND and ALT to determine its internal order. • The reducing residues of UNDs and ALTs are compared with the residue comparison algorithm. • If two compared UNDs are identical, their parent residues and linkages (the linkages between the UND and the main graph) are compared.
First application and results • All the monosaccharides from CarbBank were translated to the naming defined by GlycoCT. • 1439 different names in CarbBank resulted in 474 different basetypes and 29 different substituents, reducing the number of distinct residues by 65%. • Two main reasons for the reduction • The separation of monosaccharides into basetype and substituents • The unique encoding for monosaccharides
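The 65% figure can be checked from the numbers on the slide, assuming the distinct residues after translation are the basetypes and substituents counted together:

```python
carbbank_names = 1439   # distinct monosaccharide names in CarbBank
basetypes = 474         # distinct GlycoCT basetypes after translation
substituents = 29       # distinct GlycoCT substituents after translation

# Assumption: basetypes and substituents together form the new vocabulary.
distinct_after = basetypes + substituents
reduction = 1 - distinct_after / carbbank_names

print(f"{distinct_after} distinct entities, reduction of {reduction:.0%}")
```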
Conclusion • GlycoCT offers a superset of the capabilities of all known sequence formats in glycobioinformatics. • It supports structurally undetermined sequences. • The consistent naming scheme for monosaccharides can be easily maintained.