Discovering Descriptive Knowledge

Discovering Descriptive Knowledge Lecture 18

Descriptive Knowledge in Science
In an earlier lecture, we introduced the representation and use of taxonomies and laws.
Informatics tools for working with taxonomies
• represent them as a collection of hypotheses about categories and their is-a relationships;
• use them to organize knowledge and to classify new observations.
Informatics tools for working with laws
• represent them as hypotheses about quantitative and/or qualitative relationships among an object’s properties;
• use them to predict the static or dynamic properties of an entity or an interconnected system.

The Taxonomy Formation Task
Taxonomy formation consists of three tasks that may be solved separately or simultaneously:
• the construction of categories;
• the organization of the categories into a hierarchy; and
• the explicit definition of the categories.
Informatics tools for taxonomy formation fall into two general categories:
• those that analyze finite batches of observations and create separate taxonomies for each batch; and
• those that incrementally construct and refine taxonomies based on an effectively continuous stream of data.

Cluster 3.0
Cluster is designed to construct and organize categories from a batch of gene expression data. As input, Cluster takes gene expression levels from multiple experiments. The program clusters genes based on their expression patterns across experiments. Scientists can select the clustering method and set the available parameters. Cluster produces a text file that contains the taxonomy.
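To make the idea of clustering genes by expression pattern concrete, here is a minimal sketch of one common method: agglomerative clustering with average linkage and a correlation-based distance. It is not Cluster's code or file format. The gene names, expression values, and function names are invented for illustration, and the distance function assumes no profile is constant across experiments.

```python
from itertools import combinations
from math import sqrt

def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)  # assumes sx and sy are non-zero

def average_linkage(profiles):
    """Merge the two closest clusters until one remains; return a nested tuple."""
    # Each cluster is (subtree, list of member gene names).
    clusters = [(name, [name]) for name in profiles]

    def dist(c1, c2):
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        total = sum(pearson_distance(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

# Invented expression levels for four genes across four experiments.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.1, 4.0, 2.1],
}
tree = average_linkage(expression)
print(tree)
```

In this toy data set the rising genes (geneA, geneB) and the falling genes (geneC, geneD) end up in separate branches, and the nested tuple plays the role of the taxonomy.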

Cluster 3.0: Results
Viewing the taxonomy produced by Cluster requires a separate program, such as TreeView.
[Figure: viewer screenshot with labeled regions: taxonomy, data, selected section of the taxonomy, gene annotations]

ReTAX
ReTAX is an interactive environment that helps scientists revise taxonomies in response to new observations. A taxonomy in ReTAX includes hierarchically organized categories and their definitions. Each datum for ReTAX consists of a set of features (such as the size of a plant’s leaf or the type of its fruit) and a category.
As a scientist enters data, ReTAX ensures that the new item’s features
• match or specialize the category’s defining features; and
• distinguish it from other categories in the taxonomy.
If the new item violates either of these rules, then ReTAX attempts to revise its taxonomy.
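The two consistency rules can be sketched as follows. This is one possible reading of the checks, not ReTAX's actual algorithm: a category definition is assumed to map each defining feature to a set of admissible values, and an item is "not distinguished" when it also satisfies another category's definition. The category names, features, and values are invented.

```python
def matches(item, definition):
    """True if every defining feature admits the item's value."""
    return all(item.get(feature) in allowed
               for feature, allowed in definition.items())

def check_item(item, category, taxonomy):
    """Return the list of rule violations for placing `item` in `category`."""
    problems = []
    # Rule 1: the item must match the category's defining features.
    if not matches(item, taxonomy[category]):
        problems.append(f"does not match definition of {category}")
    # Rule 2: the item's features must separate it from every other category.
    for other, definition in taxonomy.items():
        if other != category and matches(item, definition):
            problems.append(f"not distinguished from {other}")
    return problems

# Hypothetical definitions: feature -> set of admissible values.
taxonomy = {
    "genus_X": {"fruit": {"berry"}},
    "genus_Y": {"fruit": {"berry", "capsule"}, "leaf_size": {"small"}},
}

print(check_item({"fruit": "berry", "leaf_size": "large"}, "genus_X", taxonomy))
print(check_item({"fruit": "berry", "leaf_size": "small"}, "genus_X", taxonomy))
```

The first item passes both checks. The second matches genus_X but is also covered by genus_Y, which is the kind of violation that would prompt a revision, for example by searching for a new distinguishing feature.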

ReTAX
[Figure: botanical taxonomy. Family: Ericaceae. Genera: Andromeda, Pernettya, …, Gaultheria. Species: A. uva-ursi, P. tasmanica, G. oppositifolia, G. rupestris, G. antipoda.]
Working in the context of a botanical taxonomy like this one, ReTAX replicated historical revisions. In the course of its use, ReTAX
• identified descriptive features that were insufficient for distinguishing members of two taxa;
• searched for new features to refine the taxa; and
• eventually suggested that the genera Pernettya and Gaultheria should be merged.

Qualitative Law Discovery
Qualitative laws fall into two primary categories:
• those involving categorical statements about objects, such as “all ravens are black”; and
• those describing qualitative changes, such as “temperature and pressure increase proportionately”.
Informatics tools that discover categorical relationships have received the majority of the attention in this area. These tools typically address a supervised learning task:
• data are described by multiple features (color = black, wings = present);
• one of these features serves as a target for classification (species = C. corax); and
• the tool relates the features to the target.

RL
RL addresses the supervised learning task to produce qualitative laws that are expressed as logical rules. Each rule states that if all of its conditions are true of a datum, then the datum is assigned to the target class.
As input, RL takes a data set and information that controls the characteristics of the rules, such as
• taxonomies of the values for features,
• constraints among features in each rule,
• the minimum accuracy, and
• the maximum number of features.
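The sketch below shows how a single conjunctive rule might be applied to data and screened against a minimum accuracy and a maximum number of features. It does not reproduce RL's search procedure or input format. Accuracy is assumed here to mean the fraction of covered data that belong to the target class, and the bird observations are invented.

```python
def covers(conditions, datum):
    """A rule fires only if every condition holds for the datum."""
    return all(datum.get(feature) == value
               for feature, value in conditions.items())

def evaluate(conditions, target, data):
    """Return (number of data covered, fraction of those in the target class)."""
    target_feature, target_value = target
    covered = [d for d in data if covers(conditions, d)]
    correct = [d for d in covered if d[target_feature] == target_value]
    accuracy = len(correct) / len(covered) if covered else 0.0
    return len(covered), accuracy

def acceptable(conditions, target, data, min_accuracy, max_features):
    """Screen a candidate rule against the user-supplied limits."""
    n_covered, accuracy = evaluate(conditions, target, data)
    return (len(conditions) <= max_features
            and n_covered > 0
            and accuracy >= min_accuracy)

# Invented observations for illustration only.
data = [
    {"color": "black", "wings": "present", "species": "C. corax"},
    {"color": "black", "wings": "present", "species": "C. corax"},
    {"color": "black", "wings": "present", "species": "T. merula"},
    {"color": "white", "wings": "present", "species": "C. olor"},
]
rule = {"color": "black"}
target = ("species", "C. corax")

print(evaluate(rule, target, data))
print(acceptable(rule, target, data, min_accuracy=0.6, max_features=2))
```

Here the rule "color = black implies species = C. corax" covers three observations and is right for two of them, so it passes a 0.6 accuracy threshold but would fail a 0.9 threshold.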

RL
As an example, consider the task of finding law-like relationships that link medical findings to a disease class. The data are patient findings, and the target is a syndrome that covers several ailments (lower respiratory syndrome). RL produces rules that relate the findings to the syndrome. Each rule has numeric measures of support.
RL has been applied
• to identify carcinogens, and
• to determine parameters for crystallographic experiments.

Quantitative Law Discovery
Quantitative laws may describe:
• algebraic relationships such as Newton’s second law of motion, a = F/m; and
• dynamic responses such as the unbounded growth rate of a population, dP/dt = kP.
Informatics tools address both classes of laws through a variety of techniques. BACON discovers quantitative, algebraic laws through problem space search guided by declarative heuristics. Cubist discovers conditional, algebraic laws using techniques for linear regression.
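As a heavily simplified illustration of heuristic-guided search for algebraic laws, the sketch below tries two derived terms for a pair of variables, their ratio and their product, and reports a law when one of them is nearly constant. It is a single step in the spirit of such systems, not BACON itself. The tolerance, the function names, and the measurements are assumptions.

```python
def is_constant(values, tolerance=0.01):
    """True if every value lies within a relative tolerance of the mean."""
    mean = sum(values) / len(values)
    return all(abs(v - mean) <= tolerance * abs(mean) for v in values)

def propose_law(name_x, xs, name_y, ys):
    """Try the ratio y/x, then the product x*y; return a law if one is constant."""
    ratio = [y / x for x, y in zip(xs, ys)]
    if is_constant(ratio):
        return f"{name_y}/{name_x} = {sum(ratio) / len(ratio):.2f}"
    product = [x * y for x, y in zip(xs, ys)]
    if is_constant(product):
        return f"{name_x}*{name_y} = {sum(product) / len(product):.2f}"
    return None

# Invented measurements: forces applied to a 2 kg mass and the resulting accelerations.
forces = [1.0, 2.0, 3.0, 4.0]
accelerations = [0.5, 1.0, 1.5, 2.0]
print(propose_law("F", forces, "a", accelerations))
```

For these data the ratio a/F is constant at 0.50, which is 1/m for the 2 kg mass and therefore consistent with a = F/m. A fuller system would apply such steps repeatedly, treating each constant or derived term as a new variable.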

LAGRAMGE
LAGRAMGE and its precursor, LAGRANGE, were the first in a line of law discovery systems for differential equations.
LAGRAMGE takes as input
• time series for multiple variables,
• an indicator that identifies the dependent variable, and
• knowledge about the structure of plausible solutions.
As output, the system produces an algebraic or differential equation for the dependent variable. LAGRAMGE has been applied in ecosystem dynamics, fjord hydrodynamics, and other domains.
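The sketch below illustrates only the final fitting step of equation discovery from a time series: it assumes one candidate structure, dP/dt = kP, estimates the derivative with central differences, and fits k by least squares through the origin. It is not LAGRAMGE's procedure, which also searches over candidate structures. The function name and the synthetic data are assumptions.

```python
from math import exp

def estimate_growth_rate(times, values):
    """Fit k in dP/dt = k*P using central differences and least squares."""
    numerator = denominator = 0.0
    for i in range(1, len(times) - 1):
        # Central-difference estimate of dP/dt at times[i].
        slope = (values[i + 1] - values[i - 1]) / (times[i + 1] - times[i - 1])
        # Least squares through the origin: k = sum(P * dP/dt) / sum(P^2).
        numerator += values[i] * slope
        denominator += values[i] ** 2
    return numerator / denominator

# Synthetic time series generated from P(t) = 10 * exp(0.3 * t).
times = [0.1 * i for i in range(50)]
population = [10.0 * exp(0.3 * t) for t in times]

k_hat = estimate_growth_rate(times, population)
print(f"estimated k = {k_hat:.3f}")
```

On this noise-free series the estimate lands very close to the true value of 0.3. With real measurements the finite-difference step amplifies noise, so some smoothing would normally be needed first.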

Discovering Descriptive Knowledge: Summary
Computational scientific discovery has a long history, particularly in the context of descriptive knowledge. Such systems have played a large role in exploring, analyzing, and understanding data. Work in this area laid the foundations for the field of data mining, both in terms of research and applications.
However, the discovery of descriptive knowledge
• can lead to a shallow interpretation of data;
• generally avoids statements of causality; and
• makes limited contact with the rich, theoretical content of a scientific discipline.
Next we will discuss systems that address these concerns.
