Mini-course on Artificial Neural Networks and Bayesian Networks


Presentation Transcript


  1. Mini-course on Artificial Neural Networks and Bayesian Networks Michal Rosen-Zvi Mini-course on ANN and BN, The Multidisciplinary Brain Research center, Bar-Ilan University, May 2004

  2. Section 1: Introduction

  3. Networks (1) • Networks serve as a visual way of displaying relationships • Social networks are examples of 'flat' networks, where the only information is the relation between entities • Example: collaboration networks

  4. Collaboration Network

  5. Networks (2) Artificial Neural Networks represent rules (deterministic relations) between input and output

  6. Networks (3) Bayesian Networks represent probabilistic relations (conditional independencies and dependencies) between variables

  7. Outline • Introduction/Motivation • Artificial Neural Networks • The Perceptron, multilayered FF NN and recurrent NN • On-line (supervised) learning • Unsupervised learning and PCA • Classification • Capacity of networks • Bayesian networks (BN) • Bayes rules and the BN semantics • Classification using Generative models • Applications: Vision, Text

  8. Motivation • Research on ANNs is inspired by neurons in the brain and is (partially) driven by the need for models of reasoning in the brain. • Scientists are challenged to use machines more effectively for tasks traditionally solved by humans (examples: driving a car, matching scientific referees to papers, and many others)

  9. History of (modern) ANNs and BNs: the McCulloch and Pitts model, the Hebbian learning rule, the Perceptron, Minsky and Papert's book, the Hopfield network, Gardner's studies, Pearl's book

  10. Section 2: On-line Learning Based on slides from Michael Biehl's summer course

  11. Section 2.1: The Perceptron

  12. The Perceptron. Input: ξ; adaptive weights: W; output: S = sign(W·ξ)

  13. Perceptron with binary output: implements a linearly separable classification of the inputs. Milestones: perceptron convergence theorem, Rosenblatt (1958); capacity, Winder (1963) and Cover (1965); statistical physics of perceptron weights, Gardner (1988). How does this device learn?
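To make the device concrete, here is a minimal sketch (not from the slides; NumPy-based, with illustrative variable names) of a perceptron computing the binary output S = sign(W·ξ):

```python
import numpy as np

def perceptron_output(W, xi):
    """Binary perceptron output: S = sign(W . xi)."""
    return 1 if np.dot(W, xi) >= 0 else -1

rng = np.random.default_rng(0)
W = rng.normal(size=5)    # adaptive weights
xi = rng.normal(size=5)   # input pattern
print(perceptron_output(W, xi))  # prints +1 or -1
```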

  14. Learning a linearly separable rule from reliable examples • Unknown rule: S_T(ξ) = sign(B·ξ) = ±1 defines the correct classification, parameterized through a teacher perceptron with weights B ∈ ℝ^N (B·B = 1) • Only available information: example data D = { ξ^μ, S_T(ξ^μ) = sign(B·ξ^μ) }, μ = 1…P

  15. Learning a linearly… (Cont.) • Training: finding the student weights W • W parameterizes a hypothesis S_S(ξ) = sign(W·ξ) • Supervised learning is based on the student's performance with respect to the training data D • Binary error measure: ε^μ(W) = 0 if S_S(ξ^μ) = S_T(ξ^μ), and ε^μ(W) = 1 if S_S(ξ^μ) ≠ S_T(ξ^μ)
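A small simulation of this teacher-student setup, assuming Gaussian random inputs (an illustrative choice, not specified on the slide), showing how the error of a hypothesis W is measured against the teacher B:

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 100, 200

# Teacher perceptron: weights B with B.B = 1 define the unknown rule.
B = rng.normal(size=N)
B /= np.linalg.norm(B)

xi = rng.normal(size=(P, N))   # P random example inputs
S_T = np.sign(xi @ B)          # correct labels S_T = sign(B . xi)

# A (random) student hypothesis and its training error on D.
W = rng.normal(size=N)
S_S = np.sign(xi @ W)
print(np.mean(S_S != S_T))     # fraction of misclassified examples
```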

  16. Off-line learning • Guided by the minimization of a cost function H(W), e.g., the training error H(W) = Σ_μ ε^μ(W) • Equilibrium statistical mechanics treatment: • Energy H of N degrees of freedom • Ensemble of systems in thermal equilibrium at a formal temperature • Disorder average over random examples (replicas) assumes a distribution over the inputs • Macroscopic description, order parameters • Typical properties of large systems, P = αN

  17. On-line training • Single presentation of an uncorrelated (new) example {ξ^μ, S_T(ξ^μ)} • Update of student weights after each presentation (one common rule is sketched below) • Learning dynamics in discrete time
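The slide's update equation did not survive transcription; a standard choice in this setting is a Hebb-type rule, W^(μ+1) = W^μ + (η/N) ξ^μ S_T(ξ^μ). A minimal sketch under that assumption (the learning rate, training horizon, and Gaussian inputs are illustrative):

```python
import numpy as np

def hebb_step(W, xi, S_T, eta=1.0):
    """One on-line update: W <- W + (eta / N) * S_T * xi (Hebb-type rule)."""
    return W + (eta / len(W)) * S_T * xi

rng = np.random.default_rng(2)
N = 100
B = rng.normal(size=N)
B /= np.linalg.norm(B)    # teacher weights
W = np.zeros(N)           # student weights

for mu in range(10 * N):  # each example is presented only once
    xi = rng.normal(size=N)
    W = hebb_step(W, xi, np.sign(xi @ B))

# Overlap R = W.B / |W| measures how well the student aligns with the teacher.
print(W @ B / np.linalg.norm(W))
```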

  18. On-line training - Statistical Physics approach • Consider a sequence of independent, random examples ξ^1, ξ^2, … • Thermodynamic limit N → ∞ • Disorder average over the latest example → self-averaging properties • Continuous time limit α = μ/N

  19. [Figure-only slide.]

  20. Section 3: Unsupervised learning Based on slides from Michael Biehl's summer course

  21. Dynamics of unsupervised learning Learning without a teacher? Real-world data is, in general, neither isotropic nor structureless in input space. Unsupervised learning = extraction of information from unlabelled inputs

  22. Potential aims • Correlation analysis • Clustering of data: grouping according to some similarity criterion • Identification of prototypes: representing a large amount of data by a few examples • Dimension reduction: representing high-dimensional data by a few relevant features

  23. Dimensionality Reduction • The goal is to compress information with minimal loss • Methods: • Unsupervised learning • Principal Component Analysis (sketched below) • Nonnegative Matrix Factorization • Bayesian Models (matrices are probabilities)
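As an illustration of one of the listed methods, a minimal Principal Component Analysis sketch via eigendecomposition of the covariance matrix (the data here is synthetic and purely illustrative):

```python
import numpy as np

def pca(X, k):
    """Project rows of X onto the top-k principal components
    (leading eigenvectors of the sample covariance matrix)."""
    Xc = X - X.mean(axis=0)                 # center the data
    C = np.cov(Xc, rowvar=False)            # feature covariance
    eigvals, eigvecs = np.linalg.eigh(C)    # eigh returns ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    return Xc @ top

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 10))
print(pca(X, 2).shape)   # (500, 2): compressed representation
```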

  24. Section 4: Bayesian Networks Some slides are from Baldi's course on Neural Networks

  25. Bayesian Statistics • Bayesian framework for induction: we start with a hypothesis space and wish to express relative preferences in terms of background information (the Cox-Jaynes axioms). • Axiom 0: Transitivity of preferences. • Theorem 1: Preferences can be represented by a real number π(A). • Axiom 1: There exists a function f such that π(non A) = f(π(A)) • Axiom 2: There exists a function F such that π(A,B) = F(π(A), π(B|A)) • Theorem 2: There is always a rescaling w such that p(A) = w(π(A)) is in [0,1] and satisfies the sum and product rules.

  26. Probability as Degree of Belief • Sum rule: P(non A) = 1 - P(A) • Product rule: P(A and B) = P(A) P(B|A) • Bayes' theorem: P(B|A) = P(A|B) P(B) / P(A) • Induction form: P(M|D) = P(D|M) P(M) / P(D) • Equivalently: log[P(M|D)] = log[P(D|M)] + log[P(M)] - log[P(D)]
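A worked numeric example of the induction form, with made-up probabilities chosen purely for illustration:

```python
# P(M|D) = P(D|M) P(M) / P(D); P(D) follows from the sum and product rules.
p_M = 0.01                 # prior belief in model/hypothesis M
p_D_given_M = 0.90         # likelihood of the data under M
p_D_given_notM = 0.05      # likelihood of the data under "not M"

p_D = p_D_given_M * p_M + p_D_given_notM * (1 - p_M)
p_M_given_D = p_D_given_M * p_M / p_D
print(round(p_M_given_D, 3))   # 0.154: the updated degree of belief in M
```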

  27. The Asia problem “Shortness-of-breath (dyspnoea) may be due to Tuberculosis, Lung cancer or bronchitis, or none of them. A recent visit to Asia increases the chances of tuberculosis, while Smoking is known to be a risk factor for both lung cancer and Bronchitis. The results of a single chest X-ray do not discriminate between lung cancer and tuberculosis, as neither does the presence or absence of Dyspnoea.” Lauritzen & Spiegelhalter 1988

  28. Graphical models “A successful marriage between Probability Theory and Graph Theory” (M. I. Jordan). An undirected graph over x1, x2, x3 with edges (x1,x3) and (x2,x3) encodes a factorization into potential functions: P(x1,x2,x3) ∝ Ψ(x1,x3) Ψ(x2,x3). Applications: Vision, Speech Recognition, Error-correcting codes, Bioinformatics

  29. Directed acyclic graphs involve conditional dependencies: for the graph x1 → x3 ← x2, P(x1,x2,x3) = P(x1) P(x2) P(x3|x1,x2)
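A numeric sketch of this factorization for binary variables, with invented conditional probability tables (all numbers are illustrative, not from the slides):

```python
# Joint distribution for the DAG x1 -> x3 <- x2 via
# P(x1,x2,x3) = P(x1) P(x2) P(x3|x1,x2).
p_x1 = {0: 0.7, 1: 0.3}
p_x2 = {0: 0.6, 1: 0.4}
p_x3_given = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}  # P(x3=1|x1,x2)

def joint(x1, x2, x3):
    p3 = p_x3_given[(x1, x2)]
    return p_x1[x1] * p_x2[x2] * (p3 if x3 == 1 else 1 - p3)

# Sanity check: the 8 joint probabilities sum to 1.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(round(total, 10))  # 1.0
```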

  30. Directed Graphical Models (2) • Each node is associated with a random variable • Each arrow is associated with a conditional dependency (parent-child) • Shaded nodes indicate observed variables • Plates stand for repetitions of i.i.d. draws of the random variables

  31. Directed graph: ‘real world’ example. Statistical modeling for data mining: in a huge corpus, authors and words are observed; topics and relations are learned. The author-topic model

  32. [Figure-only slide.]

  33. Topics Model for Semantic Representation Based on slides by Professor Mark Steyvers; joint work of Mark Steyvers (UCI) and Tom Griffiths (Stanford)

  34. The DRM Paradigm The Deese (1959), Roediger and McDermott (1995) paradigm: • Subjects hear a series of word lists during the study phase, each comprising semantically related items strongly related to another, non-presented word (the “false target”). • Subjects later receive recognition tests for all studied words plus other distractor words, including the false target. • DRM experiments routinely demonstrate that subjects claim to recognize the false targets.

  35. Example: test of false-memory effects in the DRM paradigm STUDY: Bed, Rest, Awake, Tired, Dream, Wake, Snooze, Blanket, Doze, Slumber, Snore, Nap, Peace, Yawn, Drowsy FALSE RECALL: “Sleep” 61%

  36. A Rational Analysis of Semantic Memory • Our associative/semantic memory system might arise from the need to efficiently predict word usage with just a few basis functions (i.e., “concepts” or “topics”) • The topics model provides such a rational analysis

  37. A Spatial Representation: Latent Semantic Analysis (Landauer & Dumais, 1997). Document/term count matrix (high-dimensional space):
               Doc1  Doc2  Doc3  …
  LOVE           34     0     3  …
  SOUL           12     0     2  …
  RESEARCH        0    19     6  …
  SCIENCE         0    16     1  …
  SVD reduces the matrix so that EACH WORD IS A SINGLE POINT IN A SEMANTIC SPACE.
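A minimal sketch of the LSA idea on the slide's toy count matrix, using a truncated SVD (the choice of k = 2 dimensions is illustrative):

```python
import numpy as np

# The slide's toy document/term count matrix (rows: words, cols: documents).
words = ["LOVE", "SOUL", "RESEARCH", "SCIENCE"]
X = np.array([[34, 0, 3],
              [12, 0, 2],
              [ 0, 19, 6],
              [ 0, 16, 1]], dtype=float)

# LSA: truncated SVD places each word at a point in a low-dim semantic space.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
word_vectors = U[:, :k] * s[:k]   # 2-D coordinates, one point per word
for w, v in zip(words, word_vectors):
    print(w, np.round(v, 2))
```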

  38. Triangle inequality constraint on words with multiple meanings: Euclidean distance satisfies AC ≤ AB + BC. With A = SOCCER, B = FIELD, C = MAGNETIC, a spatial representation that places FIELD near both SOCCER and MAGNETIC forces SOCCER and MAGNETIC to be near each other as well.

  39. A generative model for topics (plate notation: D documents, N words per document, T topics). Each document (i.e. context) is a mixture of topics. Each topic is a distribution over words. Each word is chosen from a single topic.
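A sketch of this generative process for a single document, assuming Dirichlet draws for the mixtures as in LDA-style topic models (the priors, sizes, and seed are illustrative, not from the slide):

```python
import numpy as np

rng = np.random.default_rng(4)
T, V, N = 3, 8, 20          # topics, vocabulary size, words per document

theta = rng.dirichlet(np.ones(T))         # the document's mixture of topics
phi = rng.dirichlet(np.ones(V), size=T)   # each topic: distribution over words

doc = []
for _ in range(N):
    z = rng.choice(T, p=theta)   # choose a single topic for this word
    w = rng.choice(V, p=phi[z])  # then draw the word from that topic
    doc.append(w)
print(doc)                       # word indices of the generated document
```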

  40. [Figure-only slide.]

  41. Application to corpus data • TASA corpus: text from first grade to college • representative sample of text • 26,000+ word types (stop words removed) • 37,000+ documents • 6,000,000+ word tokens

  42. Fitting the model • Learning is unsupervised • Learning means inverting the generative model • We estimate P(z|w): assign each word in the corpus to one of T topics • With T = 500 topics and 6×10^6 words, the size of the discrete state space is 500^6,000,000. HELP! • Efficient sampling approach → Markov Chain Monte Carlo (MCMC) • Time and memory requirements are linear in T and N

  43. Gibbs Sampling & MCMC (see Griffiths & Steyvers, 2003, for details) • Assign every word in the corpus to one of T topics • Sampling distribution for z_i: P(z_i = j | z_-i, w) ∝ (n(w_i, j) + β) / (Σ_w n(w, j) + Wβ) × (n(d_i, j) + α) / (Σ_j' n(d_i, j') + Tα), where n(w, j) is the number of times word w is assigned to topic j, n(d, j) is the number of times topic j is used in document d (both counts excluding the current token), and W is the vocabulary size • A code sketch of one update follows
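Below is a minimal sketch of one collapsed Gibbs update implementing the sampling distribution above. The array names, toy counts, and hyperparameter values are my own; only the two count factors come from the slide:

```python
import numpy as np

def sample_topic(w, d, z_old, n_wt, n_dt, alpha, beta, rng):
    """Resample the topic of one word token (collapsed Gibbs step).
    n_wt[w, j]: times word w is assigned to topic j
    n_dt[d, j]: times topic j is used in document d"""
    V, T = n_wt.shape
    # Remove the token's current assignment from the counts.
    n_wt[w, z_old] -= 1
    n_dt[d, z_old] -= 1
    # P(z = j | rest) ~ word-topic factor * document-topic factor.
    p = (n_wt[w] + beta) / (n_wt.sum(axis=0) + V * beta) * (n_dt[d] + alpha)
    z_new = rng.choice(T, p=p / p.sum())
    # Add the token back with its new assignment.
    n_wt[w, z_new] += 1
    n_dt[d, z_new] += 1
    return z_new

# Tiny demo with random counts (purely illustrative).
rng = np.random.default_rng(5)
V, T, D = 10, 3, 2
n_wt = rng.integers(1, 5, size=(V, T)).astype(float)
n_dt = rng.integers(1, 5, size=(D, T)).astype(float)
print(sample_topic(0, 0, 1, n_wt, n_dt, alpha=0.1, beta=0.01, rng=rng))
```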

  44. A selection from 500 topics [P(w | z = j)]:
  • THEORY SCIENTISTS EXPERIMENT OBSERVATIONS SCIENTIFIC EXPERIMENTS HYPOTHESIS EXPLAIN SCIENTIST OBSERVED EXPLANATION BASED OBSERVATION IDEA EVIDENCE THEORIES BELIEVED DISCOVERED
  • SPACE EARTH MOON PLANET ROCKET MARS ORBIT ASTRONAUTS FIRST SPACECRAFT JUPITER SATELLITE SATELLITES ATMOSPHERE SPACESHIP SURFACE SCIENTISTS ASTRONAUT
  • BRAIN NERVE SENSE SENSES ARE NERVOUS NERVES BODY SMELL TASTE TOUCH MESSAGES IMPULSES CORD ORGANS SPINAL FIBERS SENSORY
  • ART PAINT ARTIST PAINTING PAINTED ARTISTS MUSEUM WORK PAINTINGS STYLE PICTURES WORKS OWN SCULPTURE PAINTER ARTS BEAUTIFUL DESIGNS

  45. Polysemy: words with multiple meanings are represented in different topics (note FIELD in each list):
  • FIELD MAGNETIC MAGNET WIRE NEEDLE CURRENT COIL POLES IRON COMPASS LINES CORE ELECTRIC DIRECTION FORCE MAGNETS BE MAGNETISM
  • SCIENCE STUDY SCIENTISTS SCIENTIFIC KNOWLEDGE WORK RESEARCH CHEMISTRY TECHNOLOGY MANY MATHEMATICS BIOLOGY FIELD PHYSICS LABORATORY STUDIES WORLD SCIENTIST
  • BALL GAME TEAM FOOTBALL BASEBALL PLAYERS PLAY FIELD PLAYER BASKETBALL COACH PLAYED PLAYING HIT TENNIS TEAMS GAMES SPORTS
  • JOB WORK JOBS CAREER EXPERIENCE EMPLOYMENT OPPORTUNITIES WORKING TRAINING SKILLS CAREERS POSITIONS FIND POSITION FIELD OCCUPATIONS REQUIRE OPPORTUNITY

  46. Word Association (norms from Nelson et al., 1998). Cue: PLANET
  Rank   People   Model
  1      EARTH    STARS
  2      STARS    SUN
  3      SPACE    EARTH
  4      SUN      SPACE
  5      MARS     SKY

  47. [Figure-only slide.]
