
Instrumental Conditioning: Motivational Mechanisms


Presentation Transcript


  1. Instrumental Conditioning: Motivational Mechanisms

  2. Contingency-Shaped Behaviour • Uses three-term contingency • Reinforcement schedule (e.g., FR10) imposes contingency • Seen in non-humans and humans
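
As an illustrative aside (not from the slides), the way a fixed-ratio schedule such as FR10 imposes a response-reinforcer contingency can be sketched in a few lines of Python; the class and names below are hypothetical.

```python
# Hypothetical sketch of a fixed-ratio (FR) contingency: the reinforcer is
# delivered only after every `ratio` responses, so the outcome depends entirely
# on the organism's behaviour (three-term contingency: S -> R -> O).

class FixedRatioSchedule:
    def __init__(self, ratio: int = 10):      # FR10 by default
        self.ratio = ratio
        self.count = 0                         # responses since last reinforcer

    def respond(self) -> bool:
        """Register one response; return True if it earns the reinforcer."""
        self.count += 1
        if self.count >= self.ratio:
            self.count = 0
            return True                        # contingency satisfied -> outcome
        return False

# Example: only every 10th response is reinforced.
schedule = FixedRatioSchedule(ratio=10)
reinforced = [schedule.respond() for _ in range(30)]
print(reinforced.count(True))                  # -> 3
```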

  3. Rule Governed Behaviour • Particularly in humans • Behaviour can be varied and unpredictable • Invent rules or use (in)appropriate rules across conditions (e.g., language) • Age-dependent, primary vs. secondary reinforcers, experience

  4. Role of Response in Operant Conditioning • Thorndike • Performance of the response is necessary • Tolman • Formation of an expectation • McNamara, Long & Wike (1956) • Maze • Rats either ran the maze or rode through it in a cart • Forming the association is what is needed

  5. Role of the Reinforcer • Is reinforcement necessary for operant conditioning? • Tolman & Honzik (1930) • Latent learning • Not necessary for learning • Necessary for performance

  6. [Figure: Day 11 results. Average errors (y-axis) across days (x-axis) for three groups: food, no food, and no food until day 11.]

  7. Associative Structure in Instrumental Conditioning • Basic forms of association • S = stimulus, R = response, O = outcome • S-R • Thorndike, Law of Effect • Role of reinforcer: stamps in S-R association • No R-O association acquired

  8. Hull and Spence • Law of Effect, plus a classical conditioning process • Stimulus evokes response via Thorndike’s S-R association • Also, S-O association creates expectancy of reward • Two-process approach • Classical and instrumental are different

  9. One-Process or Two-Processes? • Are instrumental and classical the same (one process) or different (two processes)? • Omission control procedure • US presentation depends on non-occurrence of CR • No CR, then CS ---> US • CR, then CS ---> no US
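
A minimal sketch (my addition, with hypothetical function names) of the omission contingency just described: the US is scheduled on every trial but is cancelled whenever a CR occurs, whereas a yoked classical procedure delivers the US regardless of responding.

```python
# Hypothetical sketch of the omission control contingency: the CS is presented
# on every trial, but the US follows only if the subject does NOT make the
# conditioned response (CR) during the CS.

def omission_trial(cr_occurred: bool) -> bool:
    """Return True if the US is delivered on this trial."""
    return not cr_occurred        # CR -> US omitted; no CR -> CS followed by US

def classical_trial(cr_occurred: bool) -> bool:
    """Classical comparison condition: the US is delivered regardless of responding."""
    return True

# A CR prevents the US only under the omission contingency.
print(omission_trial(cr_occurred=True))    # False (US omitted)
print(omission_trial(cr_occurred=False))   # True  (US delivered)
print(classical_trial(cr_occurred=True))   # True  (US delivered anyway)
```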

  10. Omission Control • [Figure: two trial timelines showing CS, CR, and US. On a trial with a CR, the US is omitted; on a trial without a CR, the CS is followed by the US.]

  11. Gormezano & Coleman (1973) • Eyeblink with rabbits • US = shock, CS = tone • Classical group: 5 mA shock on each trial, regardless of response • Omission group: making an eyeblink CR to the CS prevents delivery of the US

  12. One-process prediction: • CR acquisition faster and stronger for the Omission group • Reinforcement for the CR is shock avoidance • In the Classical group the CR will be present only because it somehow reduces shock aversiveness • BUT… the actual result: CR acquisition was slower in the Omission group • Consistent with classical-conditioning extinction (not all CSs are followed by the US) • Supports two-process theory

  13. Classical in Instrumental • Classical conditioning process provides motivation • Stimulus substitution • S acquires properties of O • rg = fractional anticipatory goal response • Response leads to feedback • sg = sensory feedback • rg-sg constitutes expectancy of reward

  14. rg-sg Timecourse • [Figure: timeline of S, R, and O within a trial] • Through stimulus substitution, S elicits rg-sg, giving a motivational expectation of reward

  15. Prediction • According to rg-sg theory, the CR should occur before the operant response; but it doesn't always • Dog lever pressing on FR33 ---> post-reinforcement pause (PRP) • Lever pressing is low early in the ratio and then rises, but salivation appears only later • [Figure: magnitude of lever pressing and salivation plotted against time from the start of the trial]

  16. Modern Two-Process Theory • Classical conditioning within instrumental conditioning • Neutral stimulus ---> comes to elicit motivation • Central Emotional State (CES) • CES is a characteristic of the nervous system (a “mood”) • CES won't produce only one response • This makes it harder to predict exactly which response will change

  17. Prediction • Rate of operant response modified by presentation of CS • CES develops to motivate operant response • CS from classical conditioning also elicits CES • Therefore, giving CS during instrumental conditioning should alter CES that motivates instrumental response

  18. “Explicit” Predictions • Emotional states elicited by a CS depend on the US • Appetitive US (e.g., food): CS+ ---> hope; CS- ---> disappointment • Aversive US (e.g., shock): CS+ ---> fear; CS- ---> relief

  19. Behavioural predictions (CSs trained with an aversive US) • Positive-reinforcement schedule: CS+ (fear) ---> responding decreases; CS- (relief) ---> responding increases • Negative-reinforcement schedule: CS+ (fear) ---> responding increases; CS- (relief) ---> responding decreases
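
Purely as an illustration, the prediction table above can be encoded in a small lookup structure; the dictionary below is hypothetical and simply captures the four cells for CSs trained with an aversive US.

```python
# Hypothetical encoding of the behavioural predictions above: the direction of
# change in the instrumental response depends on whether the CS-evoked emotion
# matches or opposes the CES motivating the operant baseline.

PREDICTED_CHANGE = {
    # (instrumental schedule, CS from aversive classical conditioning): change
    ("positive reinforcement", "CS+ (fear)"):   "decrease",
    ("positive reinforcement", "CS- (relief)"): "increase",
    ("negative reinforcement", "CS+ (fear)"):   "increase",
    ("negative reinforcement", "CS- (relief)"): "decrease",
}

print(PREDICTED_CHANGE[("negative reinforcement", "CS+ (fear)")])  # increase
```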

  20. R-O and S(R-O) • Earlier interpretations included no response-reinforcer association • Intuitively, though, that is the obvious explanation: perform the response to get the reinforcer

  21. Colwill & Rescorla (1986) • R-O association • Devalue the reinforcer post-conditioning • Does the operant response decrease? • Push bar right or left for different reinforcers • Food or sucrose • [Figure: testing of reinforcers. Mean responses/min (y-axis) across blocks of extinction trials (x-axis) for the normal vs. devalued reinforcer.]

  22. Interpretation • Can’t be S-R • No reinforcer in this model • Can’t be S-O • Two responses, same stimuli (the bar), but only one response affected • Conclusion • Each response associated with its own reinforcer • R-O association

  23. Hierarchical S-(R-O) • R-O model lacks stimulus component • Stimulus required to activate association • Really, Skinner’s (1938) three term contingency • Old idea; recent empirical testing

  24. Colwill & Delamater (1995) • Rats trained on pairs of S+ • Biconditional discrimination problem • Two stimuli • Two responses • One reinforcer • Match the correct response to the stimulus to be reinforced • Training, reinforcer devaluation, testing

  25. Training • Tone: lever --> food; chain --> nothing • Noise: chain --> food; lever --> nothing • Light: poke --> sucrose; handle --> nothing • Flash: handle --> sucrose; poke --> nothing • Aversion conditioning • Testing: marked reduction in previously reinforced response • Tone: lever press vs. chain • Noise: chain vs. lever • Light: poke vs. handle • Flash: handle vs. poke

  26. Analysis • Can’t be S-O • Each stimulus associated with same reinforcer • Can’t be R-O • Each response reinforced with same outcome • Can’t be S-R • Due to devaluation of outcome • Each S activates a corresponding R-O association
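
As a hedged illustration of the hierarchical S-(R-O) conclusion, the training design above can be written as a mapping in which each stimulus activates its own response-outcome pair, and devaluing an outcome selectively removes responding under the stimuli whose R-O involves that outcome. The data structure and the choice of devalued outcome below are hypothetical.

```python
# Hypothetical sketch of the S-(R-O) structure: each discriminative stimulus
# activates its own response-outcome association, and devaluing an outcome
# selectively weakens the response linked to it under that stimulus.

S_RO = {
    # stimulus: (reinforced response, outcome) -- from the training design above
    "tone":  ("lever",  "food"),
    "noise": ("chain",  "food"),
    "light": ("poke",   "sucrose"),
    "flash": ("handle", "sucrose"),
}

devalued_outcomes = {"sucrose"}   # e.g., after aversion conditioning

def predicted_response(stimulus: str) -> str:
    """Respond only if the outcome signalled by the activated R-O is still valued."""
    response, outcome = S_RO[stimulus]
    return ("withhold " if outcome in devalued_outcomes else "perform ") + response

for s in S_RO:
    print(s, "->", predicted_response(s))
# tone -> perform lever, noise -> perform chain,
# light -> withhold poke, flash -> withhold handle
```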

  27. Reinforcer Prediction, A Priori • Simple definition • A stimulus that increases the future probability of a behaviour • Circular explanation • Would be nice if we could predict beforehand

  28. Need Reduction Approach • Primary reinforcers reduce biological needs • Biological needs: e.g., food, water • Not biological needs: e.g., sex, saccharin • Undetectable biological needs: e.g., trace elements, vitamins

  29. Drive Reduction • Clark Hull • Homeostasis • Drive systems • Strong stimuli are aversive • A reduction in stimulation is the reinforcer • Drive is reduced • Problems • Objective measurement of stimulus intensity is difficult • Some reinforcers involve stimulation that doesn't change, or even increases!

  30. Trans-situationality • A stimulus that is a reinforcer in one situation will be a reinforcer in others • Subsets of behaviour • Reinforcing behaviours • Reinforceable behaviours • Often works with primary reinforcers • Problems with other stimuli

  31. Primary and Incentive Motivation • Where does motivation to respond come from? • Primary: biological drive state • Incentive: from reinforcer itself

  32. But… Consider: • What if we treat a reinforcer not as a stimulus or an event, but as a behaviour in and of itself? • Fred Sheffield (1950s) • Consummatory-response theory • E.g., it is not the food, but the eating of the food, that is the reinforcer • E.g., saccharin has no nutritional value and can't reduce drive, but is reinforcing because it is consumed

  33. Premack’s Principle • Reinforcing responses occur more than the responses they reinforce • H = high probability behaviour • L = low probability behaviour • If L ---> H, then H reinforces L • But, if H ---> L, H does not reinforce L • “Differential probability principle” • No fundamental distinction between reinforcers and operant responses
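
A minimal sketch of the differential-probability principle, assuming hypothetical baseline proportions of session time spent in each activity (the numbers and names are illustrative only).

```python
# Hypothetical sketch of Premack's differential-probability principle:
# activity H can reinforce activity L only if H's baseline probability
# (its share of a free-access session) exceeds L's.

baseline = {"eat_candy": 0.60, "play_pinball": 0.25, "sit_quietly": 0.15}

def can_reinforce(reinforcer: str, operant: str, p=baseline) -> bool:
    """True if making `reinforcer` contingent on `operant` should raise the operant."""
    return p[reinforcer] > p[operant]

print(can_reinforce("eat_candy", "play_pinball"))   # True  (H reinforces L)
print(can_reinforce("play_pinball", "eat_candy"))   # False (L cannot reinforce H)
```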

  34. Premack (1965) • Two alternatives: eat candy, play pinball • Phase I: determine each child's baseline probability of each behaviour • Gr1: eating candy more probable than playing pinball • Gr2: playing pinball more probable than eating candy • Phase II (testing) • T1: play pinball (operant) to eat candy (reinforcer) • Only Gr1 kids increased the operant • T2: eat candy (operant) to play pinball (reinforcer) • Only Gr2 kids increased the operant

  35. Premack in Brief • Any activity could be a reinforcer if it is more probable (“preferred”) than the operant response

  36. Response Deprivation Hypothesis • Restriction of access to the reinforcer response • Theory: • Impose response deprivation • Now even low-probability responses can reinforce higher-probability responses • Instrumental procedures withhold the reinforcer until the response is made; in essence, the subject is deprived of access to the reinforcer • So the reinforcing effect is produced by the operant contingency itself
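
A sketch of the response-deprivation condition, using hypothetical baseline durations and a hypothetical schedule; it illustrates how a lower-probability activity can come to reinforce a higher-probability one once the contingency restricts it below its baseline level.

```python
# Hypothetical sketch of the response-deprivation condition: a schedule
# "perform I units of the operant to earn C units of the contingent activity"
# is predicted to reinforce the operant when performing it at baseline would
# yield LESS of the contingent activity than that activity's own baseline.

def response_deprived(i_required, c_earned, operant_baseline, contingent_baseline):
    """True if the contingency restricts the contingent activity below baseline."""
    earned_at_baseline = (operant_baseline / i_required) * c_earned
    return earned_at_baseline < contingent_baseline

# Hypothetical baselines: 30 min drinking (high probability), 10 min running
# (low probability).  Schedule: 1 min of drinking earns 0.2 min of running.
# Running is pushed below its 10-min baseline, so even this low-probability
# activity is predicted to reinforce drinking.
print(response_deprived(i_required=1, c_earned=0.2,
                        operant_baseline=30, contingent_baseline=10))   # True
```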

  37. Behavioural Regulation • Physiological homeostasis • Analogous process in behavioural regulation • Preferred/optimal distribution of activities • Stressors move organism away from optimum behavioural state • Respond in ways to return to ideal state

  38. Behavioural Bliss Point • Unconstrained condition: distribute activities in a way that is preferred • Behavioural bliss point (BBP) • Relative frequency of all behaviours in unconstrained condition • Across conditions • BBP shifts • Within condition • BBP stable across time

  39. Imposing a Contingency • Puts pressure on BBP • Act to defend challenges to BBP • But requirements of contingency (may) make achieving BBP impossible • Compromise required • Redistribute responses so as to get as close to BBP as possible

  40. Minimum Deviation Model • Behavioural regulation • Due to imposed contingency: • Redistribute behaviour • Minimize deviation of responses from BBP • Get as close as you can
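
A worked sketch of the minimum-deviation idea (my illustration, with hypothetical numbers): the contingency is treated as a line through the origin, and the "closest feasible point to the bliss point" is found by orthogonal projection onto that line.

```python
# Hypothetical sketch of minimum deviation: the contingency forces behaviour
# onto a line (e.g., each minute of running earns k minutes of drinking), and
# the organism is assumed to pick the point on that line closest (Euclidean
# distance) to its behavioural bliss point.

import math

def minimum_deviation(bliss_run, bliss_drink, k):
    """Return (running, drinking) on the line drinking = k * running
    that is closest to the bliss point (bliss_run, bliss_drink)."""
    run = (bliss_run + k * bliss_drink) / (1 + k ** 2)   # orthogonal projection
    return run, k * run

# Hypothetical bliss point: 10 min running, 30 min drinking; schedule k = 1
# (equal time).  The compromise raises running and lowers drinking.
run, drink = minimum_deviation(10, 30, k=1)
print(round(run, 1), round(drink, 1))                    # 20.0 20.0
print(round(math.dist((run, drink), (10, 30)), 1))       # deviation from bliss point
```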

  41. [Figure: allocation of time drinking (y-axis, 10–40) against time running (x-axis, 10–40) under restricted-running and restricted-drinking contingencies.]

  42. Strengths of BBP Theory • Reinforcers are not special stimuli or responses • No fundamental difference between operant and reinforcer • Explains the reallocation of behaviour under a contingency • Fits with cognitive findings on cost/benefit optimization
