

  1. DD2447, DD3342, fall 2011 Statistical Methods in Applied Computer Science http://www.nada.kth.se/~stefan Stefan Arnborg, KTH

  2. SYLLABUS Common statistical models and their use: • Bayesian, testing, and fiducial statistical philosophy • Hypothesis choice • Parametric inference • Non-parametric inference • Elements of regression • Clustering • Graphical statistical models • Prediction and retrodiction • Chapman-Kolmogorov formulation • Evidence theory, estimation and combination of evidence • Support Vector Machines and kernel methods • Vovk/Gammerman hedged prediction technology • Stochastic simulation, Markov Chain Monte Carlo • Variational Bayes

  3. LEARNING GOALS After successfully taking this course, you will be able to: • motivate the use of uncertainty management and statistical methodology in computer science applications, as well as the main methods in use • account for algorithms used in the area and use the standard tools • critically evaluate the applicability of these methods in new contexts, and design new applications of uncertainty management • follow research and development in the area

  4. GRADING DD2447: Bologna grades (E-A, in use since 2009). 70% of homeworks and a very short oral discussion of them gives grade C; less gives D-F. For higher grades, essentially all homeworks should be turned in on time; alternative assignments will be substituted for those homeworks you miss. For grade B you must pass one Master's test; for grade A you must do two Master's tests or a project with some research content. DD3342: Pass/Fail. Research-level project, or a deeper study of part of the course.

  5. Course analyses from previous years are available on the previous course pages.

  6. Applications of Uncertainty everywhere • Medical Imaging/Research (Schizophrenia) • Land Use Planning • Environmental Surveillance and Prediction • Finance and Stock Markets • Marketing into Google • Robot Navigation and Tracking • Security and Military • Performance Tuning • …

  7. Some Master’s Projects using this syllabus (subset) • Recommender system for Spotify • Behavior of mobile phone users • Recommender system for book club • Recommender for job search site • Computations in evolutionary genetics • Gene hunting • Psychiatry: genes, anatomy, personality • Command and control: Situation awareness • Diagnosing drilling problems • Speech, Music, …

  8. Aristotle: Logic Logic as a semi-formal system was created by Aristotle, probably inspired by current practice in mathematical arguments. There is no record of Aristotle himself applying logic, but the Elements of Euclid probably derives from Aristotle's illustrations of the logical method. What role does logic have in Computer Science?

  9. Visualization • Visualize data in such a way that the important aspects are obvious - "A good visualization strikes you as a punch between your eyes" (Tukey, 1970) • Pioneered by Florence Nightingale, first female member of the Royal Statistical Society, inventor of pie charts and performance metrics

  10. Probabilistic approaches • Bayes: Probability conditioned by observation • Cournot: An event with very small probability will not happen. • Vapnik-Chervonenkis: VC-dimension and PAC, distribution-independence • Kolmogorov/Vovk: A sequence is random if it cannot be compressed

  11. Peirce: Abduction and uncertainty Aristotle's induction, generalizing from particulars, is considered invalid by strict deductionists. Peirce made the concept clear, or at least confused on a higher level. Abduction is verification by finding a plausible explanation. A key process in scientific progress.

  12. Sherlock Holmes: common sense inference Techniques used by Sherlock are modeled on Conan Doyle's professor in medical school, who followed the methodological tradition of Hippocrates and Galen. Abductive reasoning, first spelled out by Peirce, is found in 217 instances in the Sherlock Holmes adventures - 30 of them in the first novel, 'A Study in Scarlet'.

  13. If we have a probability model of the world we know how to compute probabilities of events. But is it possible to learn about the world from events we see? Bayes' proposal was forgotten but rediscovered by Laplace. Thomas Bayes, amateur mathematician.

  14. Antoine Augustin Cournot (1801-1877) Pioneer in stochastic processes, market theory and structural post-modernism. Predicted the demise of the academic system due to discourses of administration and excellence (cf. Readings). An alternative to Bayes' method - hypothesis testing - is based on 'Cournot's Bridge': an event with very small probability will not happen.

  15. Kolmogorov and randomness Andrei Kolmogorov (1903-1987) is the mathematician best known for shaping probability theory into a modern axiomatized theory. His axioms of probability tell how probability measures are defined, also on infinite and infinite-dimensional event spaces and complex product spaces. Kolmogorov complexity characterizes a random string by the smallest size of a description of it. Used to explain the Vovk/Gammerman scheme of hedged prediction. Also used in MDL (Minimum Description Length) inference.
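A minimal sketch of this compression view of randomness in Python (the zlib choice and the 0.95 threshold are assumptions for illustration, not from the course): the compressed length of a string is a computable upper bound on its Kolmogorov complexity, so a string that barely compresses is random-looking in Kolmogorov's sense.

```python
import os
import zlib

def compressed_length(s: bytes) -> int:
    """zlib-compressed length: a crude, computable upper bound on
    Kolmogorov complexity (up to the constant size of a decompressor)."""
    return len(zlib.compress(s, 9))

def looks_random(s: bytes, threshold: float = 0.95) -> bool:
    """Heuristic randomness test: a string is random-looking
    if it barely compresses."""
    return compressed_length(s) / len(s) > threshold

print(looks_random(b"ab" * 5000))       # False: highly compressible
print(looks_random(os.urandom(10000)))  # True: essentially incompressible
```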

  16. Normative claim of Bayesianism • EVERY type of uncertainty should be treated as probability • This claim is controversial and not universally accepted: Fisher (1922), Cramér, Zadeh, Dempster, Shafer, Walley (1999) … • Students encounter many approaches to uncertainty management and identify weaknesses in foundational arguments.

  17. Foundations for Bayesian Inference • Bayes' method, the first documented method based on probability: plausibility of an event depends on observation. Bayes' rule: p(θ|x) = p(x|θ) p(θ) / p(x) • Bayes' rule is the organizing principle for uncertainty • Parameter and observation spaces can be extremely complex, priors and likelihoods also • MCMC is the current approach; often but not always applicable (difficult when the posterior has many local maxima separated by low-density regions) • Variational Bayes: approximate the posterior by a factorized function; the result is also approximate.
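As a hedged illustration of the MCMC point above, a minimal random-walk Metropolis sampler in Python (the 1-D target and step size are assumptions for the example, not from the slides):

```python
import math
import random

def metropolis_hastings(log_post, x0, n_samples, step=0.5):
    """Random-walk Metropolis: samples from a density known only
    up to a normalizing constant, via its log-density log_post."""
    x, samples = x0, []
    for _ in range(n_samples):
        proposal = x + random.gauss(0.0, step)
        log_ratio = log_post(proposal) - log_post(x)
        # Accept with probability min(1, post(proposal) / post(x))
        if log_ratio >= 0 or random.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)
    return samples

# Example: target proportional to a standard normal density
samples = metropolis_hastings(lambda t: -0.5 * t * t, x0=0.0, n_samples=10000)
print(sum(samples) / len(samples))  # close to 0, the target mean
```

This sketch also shows why multimodal posteriors are hard: a local random walk rarely crosses the low-density regions between maxima, which is exactly the failure mode noted above.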

  18. Showcase application: PET camera. [Figure: scene → camera (geometry & noise) → film; the prior encodes scene regularity. The same scheme covers any other camera or imaging device …]

  19. PET camera: likelihood and prior. D: film, count by detector pair j; X: radioactivity in voxel i; a: camera geometry. Inference about X gives the posterior; its mean is often a good picture.
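The slide's formulas did not survive transcription; in the notation above, the standard PET forward model and posterior they refer to are (a reconstruction, assuming Poisson counts):

```latex
D_j \sim \mathrm{Poisson}\Big(\sum_i a_{ji} X_i\Big),
\qquad
p(X \mid D) \propto p(X) \prod_j p(D_j \mid X)
```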

  20. Sinogram and reconstruction. [Figures: tumour; fruit fly Drosophila; family (X-ray).]

  21. WIRED on Total Information Awareness The WIRED article "Total Info System Totally Touchy" (Dec 2, 2002) discusses the Total Information Awareness system. Quote: "People have to move and plan before committing a terrorist act. Our hypothesis is their planning process has a signature." Jan Walker, Pentagon spokeswoman, in Wired, Dec 2, 2002. "What's alarming is the danger of false positives based on incorrect data," Herb Edelstein, in Wired, Dec 2, 2002.

  22. Combination of evidence In Bayes' method, evidence enters as the likelihood of the observation.
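In symbols (a standard reconstruction, assuming conditionally independent observations, which the slide does not state explicitly): each new piece of evidence multiplies in as a likelihood factor,

```latex
p(\theta \mid x_1, \dots, x_n) \;\propto\; p(\theta) \prod_{k=1}^{n} p(x_k \mid \theta)
```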

  23. Particle filter: general tracking. [Figure.]
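The slide itself is a figure; as a hedged illustration of the algorithm it names, here is a minimal bootstrap particle filter in Python for a 1-D tracking model (the random-walk motion model and Gaussian observation noise are assumptions for the example):

```python
import math
import random

def particle_filter(observations, n_particles=1000,
                    motion_std=1.0, obs_std=1.0):
    """Bootstrap particle filter for a 1-D random-walk state observed
    with Gaussian noise; returns the posterior-mean track."""
    particles = [random.gauss(0.0, 5.0) for _ in range(n_particles)]
    track = []
    for y in observations:
        # Predict: propagate each particle through the motion model
        particles = [p + random.gauss(0.0, motion_std) for p in particles]
        # Update: weight each particle by the observation likelihood
        weights = [math.exp(-0.5 * ((y - p) / obs_std) ** 2) for p in particles]
        # Resample: draw a new population in proportion to the weights
        particles = random.choices(particles, weights=weights, k=n_particles)
        track.append(sum(particles) / n_particles)
    return track

# Example: noisy observations of a target drifting away from 0
print(particle_filter([0.3, 0.9, 1.8, 2.7, 3.4])[-1])  # near 3
```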

  24. Chapman-Kolmogorov version of Bayes' rule.
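The slide's equation is lost in this transcript; the prediction-update recursion it names is, in standard notation (a reconstruction):

```latex
p(x_{t+1} \mid y_{1:t}) = \int p(x_{t+1} \mid x_t)\, p(x_t \mid y_{1:t})\, dx_t,
\qquad
p(x_{t+1} \mid y_{1:t+1}) \propto p(y_{t+1} \mid x_{t+1})\, p(x_{t+1} \mid y_{1:t})
```

The integral is the Chapman-Kolmogorov prediction step; the proportionality is Bayes' rule applied to the new observation.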

  25. Berry and Linoff have eloquently stated their preferences with the often quoted sentence: "Neural networks are a good choice for most classification problems when the results of the model are more important than understanding how the model works". “Neural networks typically give the right answer”

  26. 1950-1980: The age of rationality. Let us describe the world with a mathematical model and compute the best way to manage it!! This is a large Bayesian Network, a popular statistical model.

  27. Ed Jaynes devoted a large part of his career to promote Bayesian inference. He also championed the use of Maximum Entropy in physics. Outside physics, he received resistance from people who had already invented other methods. Why should statistical mechanics say anything about our daily human world?

  28. Robust Bayes • Priors and likelihoods are convex sets of probability distributions (Berger, de Finetti, Walley, ...): imprecise probability • Every member of the posterior is a 'parallel combination' of one member of the likelihood set and one member of the prior set, as spelled out below • For decision making: Jaynes recommends using the member of the posterior with maximum entropy (Maxent estimate).
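In symbols, the parallel combination described above (a reconstruction, with likelihood set L and prior set Π; the slide's own formula is lost):

```latex
\pi(\theta \mid x) \;=\; \frac{\ell(x \mid \theta)\, \pi(\theta)}{\int \ell(x \mid \theta')\, \pi(\theta')\, d\theta'},
\qquad \ell \in L,\ \pi \in \Pi
```

The posterior set is then the collection of all such π(· | x), one for each choice of member from each set.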

  29. SVM and Kernel method Based on Vapnik-Chervonenkis learning theory. Separate classes by a wide-margin hyperplane classifier, or enclose data points between close parallel hyperplanes for regression, possibly after a non-linear mapping to a high-dimensional space. The only assumption is point exchangeability. Given a training sequence ((xi, yi), i = 1…N), find y(N+1) given x(N+1). Y discrete: classification; Y real-valued: regression.
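A minimal sketch of the classification task just described, using scikit-learn (the library and toy data are assumptions for illustration; the course does not prescribe a tool):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy data standing in for the training sequence (xi, yi), i = 1..N
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# RBF kernel: an implicit non-linear map to a high-dimensional space,
# where a wide-margin separating hyperplane is sought
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # held-out accuracy
```

For the regression variant mentioned above, sklearn.svm.SVR plays the corresponding role, with its epsilon parameter setting the width of the tube between the parallel hyperplanes.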

  30. Classify with hyperplanes Frank Rosenblatt (1928-1971) Pioneering work in classifying by hyperplanes in high-dimensional spaces. Criticized by Minsky-Papert, since real classes are not normally linearly separable. ANN research was taken up again in the 1980s, with non-linear mappings to get improved separation. Predecessor to SVM/kernel methods.

  31. Find parallel hyperplanes. [Figure: classification. Red: the true separating plane. Blue: wide-margin separation in the sample. Classify by the plane between the blue planes.]

  32. SVM and Kernel method

  33. Vovk/Gammerman Hedged predictions • Based on Kolmogorov complexity or a non-conformance measure • In classification, each prediction comes with a confidence • Asymptotically, misclassifications appear independently and with probability 1 - confidence • The only assumption is exchangeability. A sketch follows below.
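As a hedged sketch of the scheme (the 1-nearest-neighbor non-conformity measure and the toy data are assumptions for illustration; this is a simplified, not the full, conformal protocol):

```python
def nonconformity(x, label, pool):
    """1-NN non-conformity: distance to the nearest example of the
    same label in the pool (larger = stranger)."""
    same = [abs(x - xi) for xi, yi in pool if yi == label]
    return min(same) if same else float("inf")

def p_value(x_new, label, train):
    """Fraction of examples at least as non-conforming as x_new
    would be if it carried `label`."""
    scores = [nonconformity(xi, yi, [p for p in train if p != (xi, yi)])
              for xi, yi in train]
    s_new = nonconformity(x_new, label, train)
    return sum(s >= s_new for s in scores + [s_new]) / (len(scores) + 1)

train = [(0.1, "a"), (0.2, "a"), (0.9, "b"), (1.1, "b")]
# Predict a label *set* at confidence 0.75: keep labels with p-value > 0.25
print({lab for lab in ("a", "b") if p_value(0.95, lab, train) > 0.25})  # {'b'}
```

The output is a set of labels rather than a point prediction; at confidence c, the long-run error rate is guaranteed under exchangeability to be at most 1 - c.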
