Psych 85-419: Introduction to Parallel Distributed Processing

Psych 85-419: Introduction to Parallel Distributed Processing • Michael Harm, Professor • Anthony Cate, TA

Course Objectives • Solid background in the philosophical and computational underpinnings of modern connectionist (PDP) research • Experience with the construction and analysis of PDP models • Appreciation of the benefits (and limitations!) of PDP approaches to psychological research

By May, You All Should Be Able To: • Recognize when a PDP model may be useful to your research • Build a model of a phenomenon that interests you • Understand the contributions of models you see in the literature • … and/or critique them!

Course Will Be Geared Towards Two Communities • Modelers who plan to use these techniques in their work • Researchers who want to better understand these models and their implications, even if they don’t want to be modelers • Straw poll: which group do you fall into?

Grading • Four homeworks, each of which counts for 10% of your final grade • One exam, worth 15% of your grade • A project proposal, worth 5% • A final project, worth 30% of your grade • Class participation, worth 10% of your grade • No final exam

Class Web Page www.cnbc.cmu.edu/~mharm/courses/pdp_spring2001/ Watch for updates

What is Expected of You • Readings assigned for each class. Read them! • Come prepared with thoughtful questions • Participate in class discussions • Complete assignments on time • Come to us if you need help! Don’t wait until the last minute!

Overview of Class • What is PDP, anyway? (That’s next) • Processing and Constraint Satisfaction • Simple learning and distributed representations • Learning internal representations • Unsupervised learning • Psychological phenomena • Language, vision, higher level cognition

So What is PDP, Anyway? • Start by describing more traditional approaches • Why would one want a different approach? • PDP defined • A case study • History of the approach

Traditional Approach to Studying Cognition • The mind is like a computer • There are rules, facts and propositions • There is a logic engine that operates over these rules and propositions • Generates new propositions, new facts, new rules • The Name of the Game: Identify the rules and propositions for a given phenomenon

Who Uses This Method (Implicitly or Otherwise)? • Traditional AI, e.g. unification • if (not (married X)) -> (bachelor X) • (not (married JOHN)) implies JOHN is a bachelor • Traditional linguistics (Chomsky, etc.) • Philosophy of Mind (Fodor, etc.) • Psychologists (some, at least)

Why Would One Question This Approach? • Descriptive versus explanatory • An equation for an ellipse describes planetary motion. • But planets do not compute the equation for an ellipse to decide where to go! • Has an air of Greek Mythology about it • Creating theories to account for data, with no external validation

Why Would One Question This Approach (More) • Doesn’t seem to be how the mind actually works • The mind is robust to damage • It shows graded degradation in performance • Doesn’t seem to be a single “logic engine” shared across all domains

Why Would One Question This Approach (Yet More) • No obvious link to neuroscience • Single cell recordings, systems neuroscience • Impairments that have different effects on cells • Method is typically grounded in symbolic rules • What about phenomena that aren’t rule governed?

So, Fine. Now Will You Tell Us What PDP Is? • The idea that cognition can arise through the interactions of simple processing units • Blind to the global task at hand • Output activity based on state and summed input • … kind of like neurons • … and that this may be a good way to study cognition
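A minimal sketch of one such processing unit, assuming a logistic (sigmoid) activation function. The slide only says that output depends on state and summed input, so the logistic function, the `bias` term, and the name `unit_output` are assumptions chosen for illustration.

```python
import math


def unit_output(inputs, weights, bias=0.0):
    """Return a unit's activity given the activities of the units feeding it."""
    # Summed input: each incoming activity scaled by its connection weight.
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Assumed squashing function: maps any net input into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-net))


# The unit sees only its own inputs and weights, never the global task.
print(unit_output([1.0, 0.0, 1.0], [0.5, -2.0, 1.5]))
```

Nothing in the unit refers to reading, vision, or any other task; whatever task-level behavior emerges comes from how many such units are connected.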

The Name of the Game • Construct a model consisting of processing units and connections between them • Guided by theory, observation, hypothesis • Explore the behavior of the model. Relate to behavioral data • Use model to gain insights into causes of behavioral data

A Case Study: Frequency by Regularity in Reading • Regular words are words whose spelling-to-sound correspondences are predictable from other words, like gave, save, wave, pave • Exception words are ones that violate the normal rules of pronunciation, like have, yacht, sergeant • Word frequency is how often a word is seen: compare the (high frequency) with yacht (low frequency)

Frequency by Regularity • Exception words affected by frequency • Regular words not (more or less)

Traditional Account (Coltheart and colleagues) • One cognitive module is responsible for reading exception words. It is frequency sensitive • Another module can only read regular items. It is rule governed, frequency insensitive.

An Alternative Account, Part I: The Existence Proof • Seidenberg & McClelland ’89 constructed a large-scale connectionist model of reading • Mapped spelling patterns onto pronunciation • Observed the same frequency by regularity interaction • Therefore, the data do not necessitate separate systems for rules and exceptions

An Alternative Account, Part II: Analysis • Plaut et al. ’96 analyzed a network that exhibited the frequency by regularity interaction • Accounted for the effect through mathematical analysis of the network • This is a different kind of theorizing • Rooted in computational principles • Discovered, rather than designed

History I: The Age of Discovery • McCulloch & Pitts (1943) • Networks of simple logic gates can compute any finite logic proposition • Hebb (1949) • Clear definition of a learning rule for neurons • Selfridge (1958) • Intelligent behavior from interactions of many agents • And many others...

History II: The Cold Years • Minsky & Papert ’69: Simple associators cannot solve problems that are not linearly separable • The XOR problem • Many problems aren’t linearly separable • Led to scarcity of funding for such research. Golden years of artificial intelligence.
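The XOR limitation can be illustrated with a small brute-force search over the weights of a single threshold unit. This is a sketch, not a proof: the weight grid, the threshold-at-zero convention, and the helper names are assumptions made for this example.

```python
from itertools import product

# The four possible two-bit input patterns.
PATTERNS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def threshold_unit(w1, w2, bias, x1, x2):
    """A single unit that fires (1) when its summed input exceeds zero."""
    return 1 if w1 * x1 + w2 * x2 + bias > 0 else 0


def find_weights(targets, grid):
    """Return the first (w1, w2, bias) on the grid that reproduces targets, else None."""
    for w1, w2, bias in product(grid, repeat=3):
        outputs = [threshold_unit(w1, w2, bias, x1, x2) for x1, x2 in PATTERNS]
        if outputs == targets:
            return (w1, w2, bias)
    return None


GRID = [i / 2 for i in range(-8, 9)]  # -4.0 to 4.0 in steps of 0.5 (assumed range)
AND = [0, 0, 0, 1]
OR = [0, 1, 1, 1]
XOR = [0, 1, 1, 0]

print("AND:", find_weights(AND, GRID))
print("OR: ", find_weights(OR, GRID))
print("XOR:", find_weights(XOR, GRID))
```

The search finds weights for AND and OR but returns `None` for XOR. The grid only samples the weight space, but the failure is general: no single line separates the inputs (0,1) and (1,0) from (0,0) and (1,1).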

History III: Renaissance of the Mid ’80s • Discovery of training algorithms that are more powerful than simple associators • Could solve problems that are not linearly separable • Resurgence of interest in using these models for theory construction
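To see why networks with an intermediate layer escape the linear separability limit, here is a sketch of a two-layer threshold network that computes XOR. The weights are hand-picked for illustration and are an assumption of this example; nothing here is trained, so it shows only that a solution exists, not how the training algorithms of the period would find it.

```python
def step(net):
    """Threshold activation: fire when the summed input exceeds zero."""
    return 1 if net > 0 else 0


def xor_net(x1, x2):
    """Two hidden units (OR-like and AND-like) feeding one output unit."""
    h_or = step(x1 + x2 - 0.5)    # on if at least one input is on
    h_and = step(x1 + x2 - 1.5)   # on only if both inputs are on
    # Output: on when the OR unit is on and the AND unit is off.
    return step(h_or - h_and - 0.5)


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", xor_net(x1, x2))
```

The hidden units re-represent the input so that the output unit faces a linearly separable problem, which is the role learned internal representations play later in the course.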

History IV: The Counter Attack • Pinker & Prince ’88 launched attack on PDP account of inflectional morphology • Fodor & Pylyshyn ’88 attacked connectionist enterprise as a whole • Besner et al., Coltheart et al. attacked findings of Seidenberg & McClelland ’89 model • McCloskey: Networks are not theories!

Where We Are Today • [Figure: phenomena arranged on a scale from High Level to Low Level] • High Level • Reasoning, Creativity • Parsing Sentences • Morphology • Reading • Classical Conditioning, Priming • Low Level

For Next Class • Read PDP1, Chapter 2 • Optional: Read PDP1, Chapter 1
