
A Generalized Version Space Learning Algorithm for Noisy and Uncertain Data








  1. A Generalized Version Space Learning Algorithm for Noisy and Uncertain Data T.-P. Hong, S.-S. Tseng, IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 2, 1997. Presented 2002. 11. 14 by 임희웅

  2. Introduction • A generalized learning strategy for the version space (VS) approach • Handles noisy & uncertain training data • Two phases: searching & pruning • Trade-off between including positive training instances and excluding negative ones • Trade-off between the computation time consumed and the accuracy obtained, controlled by pruning factors

  3. New Definition of S/G • Additional information: count • The sum of the positive/negative information implicit in the training instances presented so far • S/G boundary • S • A set of the first i maximally consistent hypotheses • No hypothesis in S is kept that is both more general than another and has an equal or smaller count • G • A set of the first j maximally consistent hypotheses • No hypothesis in G is kept that is both more specific than another and has an equal or smaller count

  4. FIPI • FIPI: Factor of Including Positive Instances • Trades off including positive training instances vs. excluding negative ones • A real number in [0, 1] • 1: only including positive training instances matters • 0: only excluding negative training instances matters • 0.5: equal importance

  5. Certainty Factor (CF) • A measure of how positive a training instance is • A real number in [-1, 1] • -1: a certainly negative example • 1: a certainly positive example • A new training instance with certainty factor CF contributes (1+CF)/2 of a positive example to S and (1-CF)/2 of a negative example to G
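The CF weighting on this slide can be sketched in a few lines of Python; this is an illustrative helper (names are my own), not code from the paper:

```python
def cf_weights(cf):
    """Split a certainty factor CF in [-1, 1] into the positive weight
    (1 + CF) / 2 (credited to S) and the negative weight (1 - CF) / 2
    (credited to G)."""
    if not -1.0 <= cf <= 1.0:
        raise ValueError("CF must lie in [-1, 1]")
    return (1 + cf) / 2, (1 - cf) / 2

print(cf_weights(1.0))   # a certainly positive instance: (1.0, 0.0)
print(cf_weights(-1.0))  # a certainly negative instance: (0.0, 1.0)
print(cf_weights(0.6))   # a mostly positive instance: (0.8, 0.2), up to float rounding
```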

  6. Learning Process • Searching & pruning • Searching • Generates and collects possible candidate hypotheses into a large set • Pruning • Prunes this set according to the degree of consistency of the hypotheses

  7. Learning Process

  8. Input & Output • Input • A set of n training instances, each with a certainty factor CF • FIPI • i: the maximum # of hypotheses kept in S • j: the maximum # of hypotheses kept in G • Output • The hypotheses in sets S and G that are maximally consistent with the training instances

  9. Step 1 & 2 • Step 1 • Initialize S = {∅} (the most specific hypothesis) and G = {<?, ?, …, ?>} (the most general hypothesis), each with count 0 • Step 2 • For each training instance with uncertainty CF, do Step 3 to Step 7

  10. Step 3 – Search 1 • Generalize each hypothesis in S to include the new training instance; specialize each hypothesis in G to exclude it • ck: the count of the k-th hypothesis in S/G • Attach the new count ck + (1+CF)/2 (for S) or ck + (1-CF)/2 (for G) • Collect the results into S’/G’
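A minimal sketch of the S side of this step, assuming the common conjunctive attribute-vector hypothesis language where '?' matches any value; the tuple representation and function names are assumptions for illustration, not the paper's code:

```python
def generalize(h, x):
    """Minimally generalize hypothesis h so that it covers instance x:
    every attribute where they disagree becomes the wildcard '?'."""
    return tuple('?' if hv != xv else hv for hv, xv in zip(h, x))

def search_step_s(S, x, cf):
    """Step 3 for S: each (hypothesis, count) pair is generalized to
    include the new instance x, and its count grows by (1 + CF) / 2."""
    w = (1 + cf) / 2
    return [(generalize(h, x), c + w) for h, c in S]

S = [(('sunny', 'warm'), 1.0)]
S_prime = search_step_s(S, ('rainy', 'warm'), 0.8)
print(S_prime)  # hypothesis becomes ('?', 'warm'); its count grows by (1 + 0.8) / 2 = 0.9
```

The G side is symmetric: each hypothesis is minimally specialized to exclude the instance and its count grows by (1 - CF) / 2.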

  11. Step 4 – Search 2 • Find the sets S”/G” • whose hypotheses include/exclude only the new training instance itself • Set the count of each hypothesis in S”/G” to (1+CF)/2 / (1-CF)/2, respectively

  12. Step 5 – Pruning 1 • Combine S/G with S’/G’ and S”/G” • Among identical hypotheses, only the one with the maximum count is retained • If a hypothesis is both more general/specific (for S/G, respectively) than another and has an equal or smaller count, discard it
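The first pruning rule, keeping only the highest-count copy of identical hypotheses, can be sketched as follows (the tuple representation of hypotheses is an assumption carried over from the Step 3 sketch):

```python
def merge_identical(hypotheses):
    """Combine (hypothesis, count) pairs from S/S'/S'' (or G/G'/G''):
    among identical hypotheses, only the maximum count survives."""
    best = {}
    for h, c in hypotheses:
        if h not in best or c > best[h]:
            best[h] = c
    # Return the surviving pairs, highest count first.
    return sorted(best.items(), key=lambda hc: -hc[1])

combined = [(('?', 'warm'), 1.9), (('?', 'warm'), 0.9), (('sunny', '?'), 0.4)]
print(merge_identical(combined))
# [(('?', 'warm'), 1.9), (('sunny', '?'), 0.4)]
```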

  13. Step 6 – Confidence Calc. • Compute the confidence of each new hypothesis • For each hypothesis s with count cs in the new S • Find the hypothesis g in the new G that is more general than s and has the maximum count cg • Confidence = FIPI × cs + (1-FIPI) × cg • For each hypothesis g with count cg in the new G • Find the hypothesis s in the new S that is more specific than g and has the maximum count cs, and compute the confidence by the same formula

  14. S (specific side): s (count = cs), … • Confidence of s = FIPI × cs + (1-FIPI) × max{cg : g in G is more general than s} • G (general side): g (count = cg), … • Confidence of g = FIPI × max{cs : s in S is more specific than g} + (1-FIPI) × cg
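A hedged sketch of the confidence computation for a hypothesis in S (the G side is symmetric); `more_general` assumes the same '?'-wildcard representation used in the earlier sketches:

```python
def more_general(g, s):
    """True if g covers every instance that s covers ('?' matches anything)."""
    return all(gv == '?' or gv == sv for gv, sv in zip(g, s))

def confidence_of_s(s, cs, G, fipi):
    """Confidence of s in the new S:
    FIPI * cs + (1 - FIPI) * (max count of a g in G more general than s)."""
    cg = max((c for g, c in G if more_general(g, s)), default=0.0)
    return fipi * cs + (1 - fipi) * cg

G = [(('?', '?'), 2.0), (('?', 'warm'), 3.0)]
print(confidence_of_s(('sunny', 'warm'), 1.5, G, fipi=0.5))
# 0.5 * 1.5 + 0.5 * 3.0 = 2.25
```

With FIPI = 0.5, inclusion of positives (cs) and exclusion of negatives (cg) contribute equally, matching the "same importance" case on the FIPI slide.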

  15. Step 7 – Pruning 2 • Keep only the i/j hypotheses with the highest confidence in the new S/G
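Step 7 is a plain top-k selection by confidence (k = i for S, k = j for G); a small sketch:

```python
def prune_top_k(hyps_with_conf, k):
    """Keep only the k (hypothesis, confidence) pairs with the
    highest confidence (Step 7)."""
    return sorted(hyps_with_conf, key=lambda hc: hc[1], reverse=True)[:k]

scored = [(('?', 'warm'), 2.25), (('sunny', '?'), 0.9), (('?', '?'), 1.4)]
print(prune_top_k(scored, 2))
# [(('?', 'warm'), 2.25), (('?', '?'), 1.4)]
```

This is where the i/j pruning factors trade accuracy for computation time: smaller bounds keep the boundary sets cheap to maintain at the risk of discarding a hypothesis that later turns out to be the most consistent one.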

  16. Related Papers • GA • L. De Raedt et al., “A Unifying Framework for Concept-Learning Algorithms,” Knowledge Engineering Review, vol. 7, no. 3, 1989 • R. G. Reynolds et al., “The Use of Version Space Controlled Genetic Algorithms to Solve the Boole Problem,” Int’l J. Artificial Intelligence Tools, vol. 2, no. 2, 1993 • Fuzzy • C. C. Lee, “Fuzzy Logic in Control Systems: Fuzzy Logic Controller, Parts I and II,” IEEE Trans. Systems, Man, and Cybernetics, vol. 20, no. 2, 1990 • L. X. Wang et al., “Generating Fuzzy Rules by Learning from Examples,” Proc. IEEE Conf. Fuzzy Systems, 1992
