Representation of biomedical sublanguages using symbolic notation

John MacMullen
SILS Bioinformatics Journal Club – Fall 2003

Terminology article assumptions
• “Knowledge encoded in textual documents is organized around sets of domain-specific terms…” [938]
• “Terms represent the most important concepts in a domain and characterize documents semantically.” [939]
• “[T]he basic problem is to recognize domain-specific concepts and to extract instances of specific relationships among them.” [938]
• Terms are ambiguous and vary in form; they are hardly ever mono-referential.
• The lack of naming conventions (controlled vocabularies), the existence of acronyms, and the large, heterogeneous existing literature all increase complexity.
[From Nenadic, G., Spasic, I., & Ananiadou, S. (2003). Terminology-driven mining of biomedical literature. Bioinformatics, 19(8), 938-943.]

Harris’ Assumptions
• “[T]here is a particular structure to science information in general, and to the information of each subscience in particular”, [because] “for each subscience there are particular subsets of nouns that occur with particular subsets of verbs or other words” [215].
• “[I]t is not intrinsic properties of sounds and meanings that determine the possible word-sequences of sentences.” […] “For each word, we find roughly stable inequalities of probability among the words in its required (positive probability) set” [216].
• “What is common to the texts of a given subject matter is that first-level words of a given subset require zero-level words of only a particular subset” [217].
• “We thus obtain for the science several statement-types […]” [217].
• “What we have here is thus an information-theoretic approach to the structure of information, as against solely the amount of information” [217, emphasis added].

Linguistic probabilities
• “For each word, we find roughly stable inequalities of probability among the words in its required (positive probability) set” [216].
• “The meaning of a word is indicated, and in part created, by the meanings of the words in respect to which it has higher than average probability” [217].
• “Words with highest probability in respect to another word, or which otherwise can be shown structurally to have highest expectancy, add little or no information” [217].
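One way to make the last quotation concrete, offered as a rough sketch and not as Harris’ own procedure: if the information a word adds is measured as surprisal, -log2 P(argument | operator), then the most expected co-occurring word contributes the fewest bits. The (operator, argument) pairs and the function name below are invented purely for illustration.

```python
import math
from collections import Counter

# Hypothetical (operator, argument) pairs standing in for a parsed corpus.
pairs = [
    ("injected", "antigen"), ("injected", "antigen"),
    ("injected", "antigen"), ("injected", "serum"),
]

def surprisal_by_argument(pairs, operator):
    """Return {argument: -log2 P(argument | operator)} from raw pair counts."""
    counts = Counter(arg for op, arg in pairs if op == operator)
    total = sum(counts.values())
    return {arg: -math.log2(n / total) for arg, n in counts.items()}

bits = surprisal_by_argument(pairs, "injected")
for arg, b in sorted(bits.items(), key=lambda item: item[1]):
    print(f"{arg}: {b:.2f} bits")
```

With these toy counts the frequent argument ("antigen") carries less information than the rare one ("serum"), which is the inequality the slide describes.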

Representing Meaning with Notation
• Movement towards structured rather than natural language
• Representing sentences as propositions whose truth can be tested
• Example [from 217-218]: if ‘G’ = “antigen”, ‘J’ = “injected into”, and ‘B’ = “ear”, then ‘GJB’ = “antigen injected into ear”
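The G/J/B example can be sketched as a simple lookup. This is a minimal illustration that assumes every symbol is a single character with no subclass marks; the names `WORD_CLASSES` and `expand` are hypothetical and do not come from Harris’ article.

```python
# Symbol assignments taken from the slide's example (Harris 2002, pp. 217-218).
WORD_CLASSES = {"G": "antigen", "J": "injected into", "B": "ear"}

def expand(formula, classes=WORD_CLASSES):
    """Expand a formula such as 'GJB' into words, reading one symbol per character."""
    unknown = [s for s in formula if s not in classes]
    if unknown:
        raise KeyError(f"undefined symbols: {unknown}")
    return " ".join(classes[s] for s in formula)

print(expand("GJB"))  # antigen injected into ear
```

Rejecting undefined symbols, instead of skipping them, keeps a formula from silently expanding into a different proposition.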

Symbolic representation

Applications
• Investigate “the possibilities of obtaining standard notations for science languages, not by fiat but by boiling down from actual use…” [220].
• “[R]elate the information structure of a science to anything else that characterizes the field, in order to reach if possible a ‘structure’ of the science” [220].
• “[S]ee how tabular or other two-dimensional displays can represent the data (or the Result statements) of articles, for human inspection or for computer processing” [220].
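The tabular display mentioned in the last bullet might look something like the sketch below, which gives each word class its own column so that statements of the same type line up. The column symbols reuse the G/J/B classes from the earlier example; the rows, the word "footpad", and the function name `render_table` are invented for illustration and are not data from any article.

```python
# Column headings reuse the G/J/B word classes; the rows are hypothetical.
COLUMNS = ("G", "J", "B")

# Invented statements, already segmented into word-class slots.
rows = [
    {"G": "antigen", "J": "injected into", "B": "ear"},
    {"G": "antigen", "J": "injected into", "B": "footpad"},
]

def render_table(rows, columns=COLUMNS):
    """Render rows as a plain-text table with one column per word class."""
    widths = {c: max([len(c)] + [len(r.get(c, "")) for r in rows]) for c in columns}

    def line(cells):
        return " | ".join(cells[c].ljust(widths[c]) for c in columns).rstrip()

    header = line({c: c for c in columns})
    body = [line({c: r.get(c, "") for c in columns}) for r in rows]
    return "\n".join([header] + body)

table = render_table(rows)
print(table)
```

Reading down a column shows which members of a word class actually occur in that slot, which is the kind of inspection the quotation has in mind.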

Other Propositions
• “[W]hen, in a given science, articles written in different languages are analyzed […], we obtain the same sentence-types and structures, with only small differences due to the languages.”
• “The word class and subclass symbols, and the sentence-types, are therefore not just a sublanguage of a particular language, but an independent symbolic linguistic system” [219].
• Difference between ‘equivalent’ and ‘equal’? Example:
  • La casa di Gianni è bianca. [original]
  • The house of Gianni is white. [literal or equal]
  • John’s house is white. [equivalent]

Questions
• Assume Harris’ notation method is valid and works well.
• How might it be implemented in practice? (This is both an algorithm question and a policy question.)
• Who would apply it?
• What would some of the barriers be?
• Do Harris’ arguments hold in an interdisciplinary environment?

References
• Harris, Z. S. (2002). The structure of science information. Journal of Biomedical Informatics, 35, 215-221.
• Linguistic String Project @ NYU: http://www.cs.nyu.edu/cs/projects/lsp/
• MedLEE project (Medical Language Extraction and Encoding System): http://cat.cpmc.columbia.edu/medleexml/
• Zellig Harris homepage: http://www.dmi.columbia.edu/zellig/
