Mini-course on Artificial Neural Networks and Bayesian Networks


Presentation Transcript


  1. Mini-course on Artificial Neural Networks and Bayesian Networks Michal Rosen-Zvi Mini-course on ANN and BN, The Multidisciplinary Brain Research center, Bar-Ilan University, May 2004

  2. Section 1: Introduction

  3. Networks (1) • Networks serve as a visual way of displaying relationships • Social networks are examples of 'flat' networks, where the only information is the relation between entities • Example: collaboration networks

  4. Collaboration Network

  5. Networks (2) Artificial Neural Networks represent rules (deterministic relations) between input and output

  6. Networks (3) Bayesian Networks represent probabilistic relations (conditional independencies and dependencies) between variables

  7. Outline • Introduction/Motivation • Artificial Neural Networks • The Perceptron, multilayered FF NN and recurrent NN • On-line (supervised) learning • Unsupervised learning and PCA • Classification • Capacity of networks • Bayesian networks (BN) • Bayes rules and the BN semantics • Classification using Generative models • Applications: Vision, Text

  8. Motivation • Research on ANNs is inspired by neurons in the brain and is (partially) driven by the need for models of reasoning in the brain. • Scientists are challenged to use machines more effectively for tasks traditionally solved by humans (examples: driving a car, matching scientific referees to papers, and many others)

  9. History of (modern) ANNs and BNs: the McCulloch and Pitts model, the Hebbian learning rule, the Perceptron, Minsky and Papert's book, the Hopfield network, Gardner's studies, Pearl's book

  10. Section 2: On-line Learning Based on slides from Michael Biehl's summer course

  11. Section 2.1: The Perceptron

  12. The Perceptron. Input: ξ; adaptive weights: W; output: S = sign(W·ξ)

  13. Perceptron with binary output: implements a linearly separable classification of the inputs. Milestones: perceptron convergence theorem, Rosenblatt (1958); capacity, Winder (1963) and Cover (1965); statistical physics of perceptron weights, Gardner (1988). How does this device learn?
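To make the device concrete, here is a minimal sketch (not from the slides; NumPy-based, with illustrative variable names) of a perceptron computing the binary output S = sign(W·ξ):

```python
import numpy as np

def perceptron_output(W, xi):
    """Binary perceptron output: S = sign(W . xi)."""
    return 1 if np.dot(W, xi) >= 0 else -1

rng = np.random.default_rng(0)
W = rng.normal(size=5)    # adaptive weights
xi = rng.normal(size=5)   # input pattern
print(perceptron_output(W, xi))  # prints +1 or -1
```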

  14. Learning a linearly separable rule from reliable examples • Unknown rule: S_T(ξ) = sign(B·ξ) = ±1 defines the correct classification, parameterized through a teacher perceptron with weights B ∈ ℝ^N (B·B = 1) • Only available information: example data D = { ξ^μ, S_T(ξ^μ) = sign(B·ξ^μ) }, μ = 1…P

  15. Learning a linearly… (Cont.) • Training: finding the student weights W • W parameterizes a hypothesis S_S(ξ) = sign(W·ξ) • Supervised learning is based on the student's performance with respect to the training data D • Binary error measure: ε^μ(W) = 0 if S_S(ξ^μ) = S_T(ξ^μ), and ε^μ(W) = 1 if S_S(ξ^μ) ≠ S_T(ξ^μ)
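A small simulation of this teacher-student setup, assuming Gaussian random inputs (an illustrative choice, not specified on the slide), showing how the error of a hypothesis W is measured against the teacher B:

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 100, 200

# Teacher perceptron: weights B with B.B = 1 define the unknown rule.
B = rng.normal(size=N)
B /= np.linalg.norm(B)

xi = rng.normal(size=(P, N))   # P random example inputs
S_T = np.sign(xi @ B)          # correct labels S_T = sign(B . xi)

# A (random) student hypothesis and its training error on D.
W = rng.normal(size=N)
S_S = np.sign(xi @ W)
print(np.mean(S_S != S_T))     # fraction of misclassified examples
```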

  16. Off-line learning • Guided by the minimization of a cost function H(W), e.g., the training error H(W) = Σ_μ ε^μ(W) • Equilibrium statistical mechanics treatment: • Energy H of N degrees of freedom • Ensemble of systems in thermal equilibrium at a formal temperature • Disorder average over random examples (replicas) assumes a distribution over the inputs • Macroscopic description, order parameters • Typical properties of large systems, P = αN

  17. On-line training • Single presentation of an uncorrelated (new) example {ξ^μ, S_T(ξ^μ)} • Update of student weights after each presentation (one common rule is sketched below) • Learning dynamics in discrete time
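The slide's update equation did not survive transcription; a standard choice in this setting is a Hebb-type rule, W^(μ+1) = W^μ + (η/N) ξ^μ S_T(ξ^μ). A minimal sketch under that assumption (the learning rate, training horizon, and Gaussian inputs are illustrative):

```python
import numpy as np

def hebb_step(W, xi, S_T, eta=1.0):
    """One on-line update: W <- W + (eta / N) * S_T * xi (Hebb-type rule)."""
    return W + (eta / len(W)) * S_T * xi

rng = np.random.default_rng(2)
N = 100
B = rng.normal(size=N)
B /= np.linalg.norm(B)    # teacher weights
W = np.zeros(N)           # student weights

for mu in range(10 * N):  # each example is presented only once
    xi = rng.normal(size=N)
    W = hebb_step(W, xi, np.sign(xi @ B))

# Overlap R = W.B / |W| measures how well the student aligns with the teacher.
print(W @ B / np.linalg.norm(W))
```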

  18. On-line training - Statistical Physics approach • Consider a sequence of independent, random examples ξ^1, ξ^2, … • Thermodynamic limit N → ∞ • Disorder average over the latest example → self-averaging properties • Continuous time limit α = μ/N

  19. [Figure-only slide.]

  20. Section 3: Unsupervised learning Based on slides from Michael Biehl's summer course

  21. Dynamics of unsupervised learning Learning without a teacher? Real-world data is, in general, neither isotropic nor structureless in input space. Unsupervised learning = extraction of information from unlabelled inputs

  22. Potential aims • Correlation analysis • Clustering of data: grouping according to some similarity criterion • Identification of prototypes: representing a large amount of data by a few examples • Dimension reduction: representing high-dimensional data by a few relevant features

  23. Dimensionality Reduction • The goal is to compress information with minimal loss • Methods: • Unsupervised learning • Principal Component Analysis (sketched below) • Nonnegative Matrix Factorization • Bayesian Models (matrices are probabilities)
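As an illustration of one of the listed methods, a minimal Principal Component Analysis sketch via eigendecomposition of the covariance matrix (the data here is synthetic and purely illustrative):

```python
import numpy as np

def pca(X, k):
    """Project rows of X onto the top-k principal components
    (leading eigenvectors of the sample covariance matrix)."""
    Xc = X - X.mean(axis=0)                 # center the data
    C = np.cov(Xc, rowvar=False)            # feature covariance
    eigvals, eigvecs = np.linalg.eigh(C)    # eigh returns ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    return Xc @ top

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 10))
print(pca(X, 2).shape)   # (500, 2): compressed representation
```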

  24. Section 4: Bayesian Networks Some slides are from Baldi's course on Neural Networks

  25. Bayesian Statistics • Bayesian framework for induction: we start with a hypothesis space and wish to express relative preferences in terms of background information (the Cox-Jaynes axioms). • Axiom 0: Transitivity of preferences. • Theorem 1: Preferences can be represented by a real number π(A). • Axiom 1: There exists a function f such that π(non A) = f(π(A)) • Axiom 2: There exists a function F such that π(A,B) = F(π(A), π(B|A)) • Theorem 2: There is always a rescaling w such that p(A) = w(π(A)) is in [0,1] and satisfies the sum and product rules.

  26. Probability as Degree of Belief • Sum rule: P(non A) = 1 - P(A) • Product rule: P(A and B) = P(A) P(B|A) • Bayes' theorem: P(B|A) = P(A|B) P(B) / P(A) • Induction form: P(M|D) = P(D|M) P(M) / P(D) • Equivalently: log[P(M|D)] = log[P(D|M)] + log[P(M)] - log[P(D)]
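A worked numeric example of the induction form, with made-up probabilities chosen purely for illustration:

```python
# P(M|D) = P(D|M) P(M) / P(D); P(D) follows from the sum and product rules.
p_M = 0.01                 # prior belief in model/hypothesis M
p_D_given_M = 0.90         # likelihood of the data under M
p_D_given_notM = 0.05      # likelihood of the data under "not M"

p_D = p_D_given_M * p_M + p_D_given_notM * (1 - p_M)
p_M_given_D = p_D_given_M * p_M / p_D
print(round(p_M_given_D, 3))   # 0.154: the updated degree of belief in M
```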

  27. The Asia problem “Shortness-of-breath (dyspnoea) may be due to Tuberculosis, Lung cancer or bronchitis, or none of them. A recent visit to Asia increases the chances of tuberculosis, while Smoking is known to be a risk factor for both lung cancer and Bronchitis. The results of a single chest X-ray do not discriminate between lung cancer and tuberculosis, as neither does the presence or absence of Dyspnoea.” Lauritzen & Spiegelhalter 1988

  28. Graphical models “A successful marriage between Probability Theory and Graph Theory” (M. I. Jordan). An undirected graph over x1, x2, x3 with edges (x1,x3) and (x2,x3) encodes a factorization into potential functions: P(x1,x2,x3) ∝ Ψ(x1,x3) Ψ(x2,x3). Applications: Vision, Speech Recognition, Error-correcting codes, Bioinformatics

  29. Directed acyclic graphs involve conditional dependencies: for the graph x1 → x3 ← x2, P(x1,x2,x3) = P(x1) P(x2) P(x3|x1,x2)
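A numeric sketch of this factorization for binary variables, with invented conditional probability tables (all numbers are illustrative, not from the slides):

```python
# Joint distribution for the DAG x1 -> x3 <- x2 via
# P(x1,x2,x3) = P(x1) P(x2) P(x3|x1,x2).
p_x1 = {0: 0.7, 1: 0.3}
p_x2 = {0: 0.6, 1: 0.4}
p_x3_given = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}  # P(x3=1|x1,x2)

def joint(x1, x2, x3):
    p3 = p_x3_given[(x1, x2)]
    return p_x1[x1] * p_x2[x2] * (p3 if x3 == 1 else 1 - p3)

# Sanity check: the 8 joint probabilities sum to 1.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(round(total, 10))  # 1.0
```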

  30. Directed Graphical Models (2) • Each node is associated with a random variable • Each arrow is associated with a conditional dependency (parent-child) • Shaded nodes indicate observed variables • Plates stand for repetitions of i.i.d. draws of the random variables

  31. Directed graph: ‘real world’ example. Statistical modeling for data mining: in a huge corpus, authors and words are observed; topics and relations are learned. The author-topic model

  32. [Figure-only slide.]

  33. Topics Model for Semantic Representation Based on slides by Professor Mark Steyvers; joint work of Mark Steyvers (UCI) and Tom Griffiths (Stanford)

  34. The DRM Paradigm The Deese (1959), Roediger and McDermott (1995) paradigm: • Subjects hear a series of word lists during the study phase, each comprising semantically related items strongly related to another, non-presented word (the “false target”). • Subjects later receive recognition tests for all studied words plus other distractor words, including the false target. • DRM experiments routinely demonstrate that subjects claim to recognize the false targets.

  35. Example: test of false-memory effects in the DRM paradigm STUDY: Bed, Rest, Awake, Tired, Dream, Wake, Snooze, Blanket, Doze, Slumber, Snore, Nap, Peace, Yawn, Drowsy FALSE RECALL: “Sleep” 61%

  36. A Rational Analysis of Semantic Memory • Our associative/semantic memory system might arise from the need to efficiently predict word usage with just a few basis functions (i.e., “concepts” or “topics”) • The topics model provides such a rational analysis

  37. A Spatial Representation: Latent Semantic Analysis (Landauer & Dumais, 1997). Document/term count matrix (high-dimensional space):
               Doc1  Doc2  Doc3  …
  LOVE           34     0     3  …
  SOUL           12     0     2  …
  RESEARCH        0    19     6  …
  SCIENCE         0    16     1  …
  SVD reduces the matrix so that EACH WORD IS A SINGLE POINT IN A SEMANTIC SPACE.
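A minimal sketch of the LSA idea on the slide's toy count matrix, using a truncated SVD (the choice of k = 2 dimensions is illustrative):

```python
import numpy as np

# The slide's toy document/term count matrix (rows: words, cols: documents).
words = ["LOVE", "SOUL", "RESEARCH", "SCIENCE"]
X = np.array([[34, 0, 3],
              [12, 0, 2],
              [ 0, 19, 6],
              [ 0, 16, 1]], dtype=float)

# LSA: truncated SVD places each word at a point in a low-dim semantic space.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
word_vectors = U[:, :k] * s[:k]   # 2-D coordinates, one point per word
for w, v in zip(words, word_vectors):
    print(w, np.round(v, 2))
```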

  38. Triangle inequality constraint on words with multiple meanings: Euclidean distance satisfies AC ≤ AB + BC. With A = SOCCER, B = FIELD, C = MAGNETIC, a spatial representation that places FIELD near both SOCCER and MAGNETIC forces SOCCER and MAGNETIC to be near each other as well.

  39. A generative model for topics (plate notation: D documents, N words per document, T topics). Each document (i.e. context) is a mixture of topics. Each topic is a distribution over words. Each word is chosen from a single topic.
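A sketch of this generative process for a single document, assuming Dirichlet draws for the mixtures as in LDA-style topic models (the priors, sizes, and seed are illustrative, not from the slide):

```python
import numpy as np

rng = np.random.default_rng(4)
T, V, N = 3, 8, 20          # topics, vocabulary size, words per document

theta = rng.dirichlet(np.ones(T))         # the document's mixture of topics
phi = rng.dirichlet(np.ones(V), size=T)   # each topic: distribution over words

doc = []
for _ in range(N):
    z = rng.choice(T, p=theta)   # choose a single topic for this word
    w = rng.choice(V, p=phi[z])  # then draw the word from that topic
    doc.append(w)
print(doc)                       # word indices of the generated document
```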

  40. [Figure-only slide.]

  41. Application to corpus data • TASA corpus: text from first grade to college • representative sample of text • 26,000+ word types (stop words removed) • 37,000+ documents • 6,000,000+ word tokens

  42. Fitting the model • Learning is unsupervised • Learning means inverting the generative model • We estimate P(z|w): assign each word in the corpus to one of T topics • With T = 500 topics and 6×10^6 words, the size of the discrete state space is 500^6,000,000. HELP! • Efficient sampling approach → Markov Chain Monte Carlo (MCMC) • Time and memory requirements are linear in T and N

  43. Gibbs Sampling & MCMC (see Griffiths & Steyvers, 2003, for details) • Assign every word in the corpus to one of T topics • Sampling distribution for z_i: P(z_i = j | z_-i, w) ∝ (n(w_i, j) + β) / (Σ_w n(w, j) + Wβ) × (n(d_i, j) + α) / (Σ_j' n(d_i, j') + Tα), where n(w, j) is the number of times word w is assigned to topic j, n(d, j) is the number of times topic j is used in document d (both counts excluding the current token), and W is the vocabulary size • A code sketch of one update follows
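Below is a minimal sketch of one collapsed Gibbs update implementing the sampling distribution above. The array names, toy counts, and hyperparameter values are my own; only the two count factors come from the slide:

```python
import numpy as np

def sample_topic(w, d, z_old, n_wt, n_dt, alpha, beta, rng):
    """Resample the topic of one word token (collapsed Gibbs step).
    n_wt[w, j]: times word w is assigned to topic j
    n_dt[d, j]: times topic j is used in document d"""
    V, T = n_wt.shape
    # Remove the token's current assignment from the counts.
    n_wt[w, z_old] -= 1
    n_dt[d, z_old] -= 1
    # P(z = j | rest) ~ word-topic factor * document-topic factor.
    p = (n_wt[w] + beta) / (n_wt.sum(axis=0) + V * beta) * (n_dt[d] + alpha)
    z_new = rng.choice(T, p=p / p.sum())
    # Add the token back with its new assignment.
    n_wt[w, z_new] += 1
    n_dt[d, z_new] += 1
    return z_new

# Tiny demo with random counts (purely illustrative).
rng = np.random.default_rng(5)
V, T, D = 10, 3, 2
n_wt = rng.integers(1, 5, size=(V, T)).astype(float)
n_dt = rng.integers(1, 5, size=(D, T)).astype(float)
print(sample_topic(0, 0, 1, n_wt, n_dt, alpha=0.1, beta=0.01, rng=rng))
```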

  44. A selection from 500 topics [P(w | z = j)]:
  • THEORY SCIENTISTS EXPERIMENT OBSERVATIONS SCIENTIFIC EXPERIMENTS HYPOTHESIS EXPLAIN SCIENTIST OBSERVED EXPLANATION BASED OBSERVATION IDEA EVIDENCE THEORIES BELIEVED DISCOVERED
  • SPACE EARTH MOON PLANET ROCKET MARS ORBIT ASTRONAUTS FIRST SPACECRAFT JUPITER SATELLITE SATELLITES ATMOSPHERE SPACESHIP SURFACE SCIENTISTS ASTRONAUT
  • BRAIN NERVE SENSE SENSES ARE NERVOUS NERVES BODY SMELL TASTE TOUCH MESSAGES IMPULSES CORD ORGANS SPINAL FIBERS SENSORY
  • ART PAINT ARTIST PAINTING PAINTED ARTISTS MUSEUM WORK PAINTINGS STYLE PICTURES WORKS OWN SCULPTURE PAINTER ARTS BEAUTIFUL DESIGNS

  45. Polysemy: words with multiple meanings are represented in different topics (note FIELD in each list):
  • FIELD MAGNETIC MAGNET WIRE NEEDLE CURRENT COIL POLES IRON COMPASS LINES CORE ELECTRIC DIRECTION FORCE MAGNETS BE MAGNETISM
  • SCIENCE STUDY SCIENTISTS SCIENTIFIC KNOWLEDGE WORK RESEARCH CHEMISTRY TECHNOLOGY MANY MATHEMATICS BIOLOGY FIELD PHYSICS LABORATORY STUDIES WORLD SCIENTIST
  • BALL GAME TEAM FOOTBALL BASEBALL PLAYERS PLAY FIELD PLAYER BASKETBALL COACH PLAYED PLAYING HIT TENNIS TEAMS GAMES SPORTS
  • JOB WORK JOBS CAREER EXPERIENCE EMPLOYMENT OPPORTUNITIES WORKING TRAINING SKILLS CAREERS POSITIONS FIND POSITION FIELD OCCUPATIONS REQUIRE OPPORTUNITY

  46. Word Association (norms from Nelson et al., 1998). Cue: PLANET
  Rank   People   Model
  1      EARTH    STARS
  2      STARS    SUN
  3      SPACE    EARTH
  4      SUN      SPACE
  5      MARS     SKY

  47. [Figure-only slide.]
