Kolmogorov Complexity and Universal Distribution

Presented by Min Zhou, Nov. 18, 2002

Content • Kolmogorov complexity • Universal Distribution • Inductive Learning

Principle of Indifference (Epicurus) • Keep all hypotheses that are consistent with the facts

Occam’s Razor • Among all hypotheses consistent with the facts, choose the simplest • Newton’s rule #1 for doing natural philosophy • We are to admit no more causes of natural things than such as are both true and sufficient to explain their appearances

Question • What does “simplest” mean? • How to define simplicity? • Can a thing be simple under one definition and not under another?

Bayes’ Rule • P(H|D) = P(D|H)*P(H)/P(D) • P(H) is often regarded as the initial degree of belief in H • In essence, Bayes’ rule is a mapping from the prior probability P(H) to the posterior probability P(H|D), determined by the data D
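A minimal sketch of the rule above, assuming a finite set of hypotheses. The coin hypotheses and their numbers are illustrative assumptions, not part of the talk.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Return P(H|D) for each hypothesis H via Bayes' rule.

    prior:      dict mapping hypothesis name -> P(H)
    likelihood: dict mapping hypothesis name -> P(D|H)
    """
    # P(D) = sum over H of P(D|H) * P(H)
    evidence = sum(likelihood[h] * prior[h] for h in prior)
    return {h: likelihood[h] * prior[h] / evidence for h in prior}

# Hypothetical example: a fair coin vs. a two-headed coin,
# with data D = "three heads in a row".
prior = {"fair": Fraction(1, 2), "two_headed": Fraction(1, 2)}
likelihood = {"fair": Fraction(1, 8), "two_headed": Fraction(1)}

post = posterior(prior, likelihood)
print(post["fair"], post["two_headed"])
```

The data move the belief in the fair coin from 1/2 down to 1/9; the prior P(H) is the only ingredient that the data do not supply.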

How to get P(H) • By the law of large numbers, we can get P(H|D) if we use many examples • But we want as much information as possible from only a limited amount of data • P(H) may be unknown, uncomputable, or may not even exist • Can we find a single probability distribution to use as the prior in every case, with approximately the same result as if we had used the real distribution?

Hume on Induction • Induction is impossible because we can only reach conclusions by using known data and methods • So the conclusion is logically already contained in the starting configuration

Solomonoff’s Theory of Induction • Maintain all hypotheses consistent with the data • Incorporate Occam’s Razor: assign the simplest hypotheses the highest probability • Use Bayes’ rule
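The three ingredients can be illustrated on a deliberately tiny hypothesis class. This is a toy sketch, not Solomonoff's actual construction: the hypotheses ("repeat a short binary pattern forever"), the weight 2^(-pattern length), and the name predict_next are all assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

def predict_next(data: str, max_len: int = 4) -> Fraction:
    """Probability that the next bit is '1' under a toy hypothesis class.

    Hypotheses: "repeat pattern p forever" for every binary p with
    1 <= |p| <= max_len. Shorter patterns get more prior weight (Occam);
    every pattern consistent with the data is kept (Epicurus); the
    prediction is the weighted vote of the survivors (Bayes).
    """
    weight_one = Fraction(0)
    weight_all = Fraction(0)
    for length in range(1, max_len + 1):
        for p in map("".join, product("01", repeat=length)):
            # First len(data) + 1 symbols produced by repeating p.
            generated = (p * (len(data) // length + 2))[: len(data) + 1]
            if generated.startswith(data):       # consistent with the facts
                w = Fraction(1, 2 ** length)      # simpler => more weight
                weight_all += w
                if generated[-1] == "1":
                    weight_one += w
    if weight_all == 0:
        raise ValueError("no hypothesis in the toy class fits the data")
    return weight_one / weight_all

print(predict_next("01"))      # several patterns still fit
print(predict_next("010101"))  # only alternating patterns fit
```

After seeing only "01" the prediction is still spread over several surviving patterns; after "010101" every surviving pattern predicts 0.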

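Kolmogorov Complexity • k(s) is the length of the shortest program which, on no input, prints out s • k(s) <= |s| + c for a constant c that does not depend on s • For every n there is a string s of length n with k(s) >= n • k(s) is objective (independent of the programming language, up to an additive constant) by the Invariance Theorem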
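Since k(s) itself cannot be computed, a common practical stand-in is the length of a compressed encoding, which is still a description from which s can be recovered. The sketch below uses zlib purely as an assumed proxy; it gives an upper-bound flavour of description length, not k(s).

```python
import os
import zlib

def compressed_length_bits(s: bytes) -> int:
    """Bit length of a zlib encoding of s.

    This is a computable stand-in for description length only;
    it is not the Kolmogorov complexity k(s).
    """
    return 8 * len(zlib.compress(s, 9))

regular = b"01" * 500            # 1000 bytes with an obvious pattern
random_like = os.urandom(1000)   # 1000 bytes from the OS entropy source

print(8 * len(regular), compressed_length_bits(regular))
print(8 * len(random_like), compressed_length_bits(random_like))
```

The patterned string shrinks to a small fraction of its length, while the random-looking one does not shrink at all, matching the slide's point that some strings of every length are incompressible.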

Universal Distribution • P(s) = 2^(-k(s)) • We use k(s) to describe the complexity of an object. By Occam’s Razor, the simplest should have the highest probability.

Problem: P(s)>1 • For every n, there exists a n-bit string s, k(s) = log n, so P(s) = 2-log n = 1/n • ½+1/3+….>1

Levin’s improvement • Use prefix-free programs • A prefix-free set of programs is one in which no program is a prefix of any other • Kraft’s inequality • Let l_1, l_2, ... be a sequence of natural numbers. There is a prefix code with this sequence as the lengths of its binary code words iff sum over n of 2^(-l_n) <= 1
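Both notions on this slide are easy to check mechanically for a finite code. The following is a small sketch; the example code words are assumptions chosen for illustration.

```python
from fractions import Fraction

def is_prefix_free(codewords):
    """True if no code word is a prefix of (or equal to) another."""
    words = sorted(codewords)
    # After sorting, a word that is a prefix of any other word is
    # also a prefix of its immediate successor.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

def kraft_sum(lengths):
    """Left-hand side of Kraft's inequality: sum of 2^(-l_n)."""
    return sum(Fraction(1, 2 ** l) for l in lengths)

good = ["0", "10", "110", "111"]   # prefix-free
bad = ["0", "01", "11"]            # "0" is a prefix of "01"

print(is_prefix_free(good), kraft_sum(len(w) for w in good))
print(is_prefix_free(bad))
print(kraft_sum([1, 1, 2]))        # above 1: no prefix code has these lengths
```

Because the lengths of any prefix-free set satisfy Kraft's inequality, summing 2^(-length) over prefix-free programs can never exceed 1, which is what repairs the universal distribution.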

Multiplicative domination • Levin proved that for the universal distribution p and every effectively approximable distribution p’ there exists a constant c with c*p(s) >= p’(s) for all s, where c depends on p and p’ but not on s • If the true prior distribution is computable, then using the single fixed universal distribution p is almost as good as using the true distribution itself
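A finite analogue helps show where such a constant c can come from: any weighted mixture dominates each of its components, with c equal to one over that component's weight. The sketch below is only this finite analogue, with made-up component distributions; it is not Levin's construction over all enumerable distributions.

```python
from fractions import Fraction

def mixture(weights, dists):
    """Weighted mixture m(s) = sum_i w_i * p_i(s) of finitely many distributions."""
    outcomes = set().union(*dists)
    return {
        s: sum(w * d.get(s, Fraction(0)) for w, d in zip(weights, dists))
        for s in outcomes
    }

# Hypothetical component distributions over a few strings.
p1 = {"0": Fraction(1, 2), "1": Fraction(1, 2)}
p2 = {"0": Fraction(1, 10), "1": Fraction(1, 10), "11": Fraction(4, 5)}
weights = [Fraction(1, 2), Fraction(1, 4)]   # weights may sum to less than 1

m = mixture(weights, [p1, p2])

# Domination: (1 / w_i) * m(s) >= p_i(s) for every s, with c = 1 / w_i.
for w, p in zip(weights, [p1, p2]):
    c = 1 / w
    print(c, all(c * m[s] >= p.get(s, 0) for s in m))
```

The constant depends only on which component is being dominated, never on the string s, which is the shape of the statement on the slide.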

Turing’s thesis: The universal Turing machine can compute all intuitively computable functions • Kolmogorov’s thesis: The Kolmogorov complexity gives the shortest description length among all description lengths that can be effectively approximated according to intuition • Levin’s thesis: The universal distribution gives the largest probability among all the distributions that can be effectively approximated according to intuition

Universal Bet • Street gambler Bob tosses a coin and makes an offer: • Next is heads “1”: Bob gives Alice $2 • Next is tails “0”: Alice pays Bob $1 • Is Bob honest? • Side bet: flip the coin 1000 times and record the result as a string s • Alice pays $1, Bob pays Alice 2^(1000-k(s)) dollars

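Good offer: • Sum over |s|=1000 of 2^(-1000) * 2^(1000-k(s)) = sum over |s|=1000 of 2^(-k(s)) <= 1, so with a fair coin Alice’s expected payoff is at most her $1 stake • If Bob is honest, Alice increases her money polynomially • If Bob cheats, Alice increases her money exponentially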
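The side bet can be simulated if k(s) is swapped for something computable. The sketch below assumes zlib-compressed length as that stand-in, so the numbers only illustrate the contrast between a random-looking and a highly regular record; they are not the true payoffs.

```python
import random
import zlib

N = 1000  # number of coin flips in the side bet

def proxy_complexity_bits(bits: str) -> int:
    """Computable stand-in for k(s): bit length of the zlib-compressed flips.

    The true k(s) is non-computable; this proxy is an assumption.
    """
    packed = int(bits, 2).to_bytes((len(bits) + 7) // 8, "big")
    return 8 * len(zlib.compress(packed, 9))

def log2_payoff(bits: str) -> int:
    """log2 of Bob's payout 2^(N - k(s)), using the proxy in place of k(s)."""
    return len(bits) - proxy_complexity_bits(bits)

rng = random.Random(0)
honest = "".join(rng.choice("01") for _ in range(N))  # pseudo-random flips
cheat = "1" * N                                       # coin that always lands heads

print(log2_payoff(honest))  # negative: payout below $1
print(log2_payoff(cheat))   # large: payout is astronomically big
```

A record that looks random pays Alice less than her stake, while a rigged, highly compressible record makes the payout exponential in the number of flips.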

Notice • The complexity of a string is non-computable

Conclusion • Kolmogorov complexity – optimal effective descriptions of objects • Universal Distribution – optimal effective probability of objects • Both are objective and absolute

Reference • Ming Li, Paul Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, 2nd Edition, Springer-Verlag, 1997
