Operant Conditioning

Year 12 Psychology, Unit 4, Area of Study 1 (Chapter 10, page 476)

Trial and Error Learning
• Learning by trying different possibilities until the correct outcome is achieved.
• Also known as ‘instrumental learning’ because the individual is ‘instrumental’ in learning the correct response.
• More recently known as ‘operant conditioning’ because the individual ‘operates’ on the environment to solve a problem.

Trial and Error Learning: Edward Thorndike’s Cats
• Thorndike conducted the first studies of trial and error learning; he was interested in animal intelligence.
• A hungry cat was put in a ‘puzzle box’, and a piece of fish was placed outside the box (the cat could see and smell it, but it was just out of reach).
• To get the fish, the cat had to push a lever that opened a door in the side of the box.
• Learning was measured as the time the cat took to escape from the box.
• The cat tried numerous ineffective strategies (trial and error).
• Eventually, the cat accidentally pushed the lever and the door opened.
• The cat was then rewarded with the food.
• The cat was put back in the box to repeat the test: each time it used trial and error, but it became progressively quicker at using the lever.
• The number of incorrect behaviours also decreased.
• After approximately 7 trials, the cat went directly to the lever.
• Pushing the lever became a deliberate response because the cat had learned the positive consequence of making that response.

Trial and Error Learning: Edward Thorndike’s Cats (Activity: 10.11)
• Based on his results, Thorndike developed the Law of Effect:
  • Behaviour that is accompanied or followed by ‘satisfying’ consequences is strengthened (more likely to occur).
    • E.g. pushing the lever is followed by getting the fish.
  • Behaviour that is followed by an ‘annoying’ consequence is weakened (less likely to occur).
    • E.g. not pushing the lever (doing anything else) is followed by still being stuck in the box.

Operant Conditioning
• An organism will tend to repeat behaviours (operants, i.e. responses) that have desirable consequences (e.g. rewards) or that enable it to avoid undesirable consequences (e.g. punishments).
• Conversely, an organism will tend not to repeat behaviours that lead to undesirable consequences.
• Operant conditioning grew out of Thorndike’s work on ‘instrumental learning’ with cats.
• The most famous operant conditioning experiments were conducted by B.F. Skinner using his ‘Skinner Box’.

Operant Conditioning: Three-Phase Model
• Based on Thorndike’s Law of Effect.
• Stimulus (S)
• Operant response (R)
• Consequence (C)
  • Sometimes also referred to as (S), because it is a stimulus in the form of a consequence.
• So, S → R → C, where the probability of R occurring after S depends on previous experiences of C.
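To make “the probability of R depends on previous experiences of C” more concrete, here is a toy Python simulation. It is a hypothetical illustration and not a model from the textbook. The function name, the starting probability, the learning rate and the update rule (each satisfying consequence moves the probability of R part of the way towards 1) are all assumptions.

```python
import random
from typing import List


def simulate_law_of_effect(n_trials: int, start_p: float = 0.1,
                           learning_rate: float = 0.2, seed: int = 0) -> List[float]:
    """Toy S -> R -> C loop; returns the probability of R after each trial.

    Hypothetical rule (an assumption, not from the textbook): whenever R is
    made, it is followed by a satisfying C, which moves the probability of R
    a fraction `learning_rate` of the way towards 1. Trials without R leave
    the probability unchanged.
    """
    rng = random.Random(seed)
    p_response = start_p
    history = []
    for _ in range(n_trials):
        # S: the situation is presented (e.g. the cat is back in the puzzle box)
        made_response = rng.random() < p_response  # R: does the operant response occur?
        if made_response:
            # C: a satisfying consequence strengthens the response
            p_response += learning_rate * (1 - p_response)
        history.append(p_response)
    return history


# Usage: watch the probability of the correct response climb across trials
print([round(p, 2) for p in simulate_law_of_effect(20)])
```

The probability can only rise, which mirrors the cat becoming progressively quicker at using the lever. The exact values depend on the made-up parameters.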

Operant Conditioning: Skinner’s Rats
• A hungry rat was placed in a Skinner Box.
• It scurried around, randomly touching the floor, walls, etc.
• Eventually it accidentally pressed a lever, which dispensed a food pellet, and the rat ate it.
• The rat continued its random movements, eventually pressed the lever again and again ate.
• With more repetitions of lever pressing followed by food, the rat’s random movements began to disappear and were replaced by more consistent lever pressing.
• Eventually the rat was pressing the lever as fast as it could eat each pellet.
• The pellet was a reward (reinforcer) for the correct response.

Elements of Operant Conditioning (Activities: 10.13 & 10.17)
• Reinforcement: applying a reward/positive stimulus (positive reinforcement) or removing a negative stimulus (negative reinforcement) to encourage a desired behaviour.
• Reinforcer: any object or event that increases the probability that an operant behaviour will occur again.
• Punishment: applying a negative/unpleasant stimulus (or removing a pleasant one) to discourage unwanted behaviour.
• Schedules of reinforcement: the frequency and manner in which a desired response is reinforced (either positively or negatively).

Examples (image slide): Punishment; Positive Reinforcement (Reward); Negative Reinforcement (if they lay eggs, they don’t get cooked!)

This SHOULD be punishment, but…

Elements of Operant Conditioning: Schedules of Reinforcement (Activity: 10.14)
• Continuous reinforcement: the reinforcer is applied immediately after every correct response/behaviour.
• Partial reinforcement: the reinforcer is applied after only some correct responses, not all. Behaviour learned this way is more difficult to change and more resistant to extinction.
  • Ratio: reinforcement is given after a certain number of correct responses.
  • Interval: reinforcement is given for the first correct response after a certain amount of time has passed since the last reinforcement.
  • Fixed: reinforcement is given on a regular basis, such as after every 3rd response or every 10 seconds.
  • Variable: reinforcement is given in an unpredictable or random way.

Elements of Operant Conditioning: Schedules of Reinforcement (see pages 484-485)
• Using the information from the previous slide, there are four main schedules of partial reinforcement:
  • Fixed-ratio schedule: the reinforcer is given after a set (fixed) number (ratio) of correct responses.
  • Variable-ratio schedule: the reinforcer is given after an unpredictable (variable) number (ratio) of correct responses.
  • Fixed-interval schedule: the reinforcer is given for the first correct response after a set (fixed) period of time (interval) has passed since the last reinforcer.
  • Variable-interval schedule: the reinforcer is given for the first correct response after an unpredictable (variable) period of time (interval) has passed since the last reinforcer.
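The four partial schedules can be sketched as a small Python function that decides which responses earn a reinforcer. This is a hypothetical illustration and not from the textbook. The function name, the "FR"/"VR"/"FI"/"VI" labels, and the choice to draw variable requirements at random around an average are all assumptions. Interval schedules follow the usual rule: the first response after the wait has passed is reinforced.

```python
import random
from typing import List


def reinforced_responses(schedule: str, response_times: List[float],
                         value: float, seed: int = 0) -> List[bool]:
    """For each response (times in seconds, ascending), say whether it is reinforced.

    schedule: "FR", "VR", "FI" or "VI" (hypothetical labels).
    value: responses required (ratio) or seconds to wait (interval); for the
           variable schedules it is the *average* requirement (value >= 1 for VR).
    """
    rng = random.Random(seed)

    def next_requirement() -> float:
        if schedule in ("FR", "FI"):
            return value  # fixed: the same requirement every time
        if schedule == "VR":
            # variable ratio: a random whole number of responses averaging `value`
            return rng.randint(1, 2 * int(value) - 1)
        if schedule == "VI":
            # variable interval: a random wait averaging `value` seconds
            return rng.uniform(0, 2 * value)
        raise ValueError(f"unknown schedule: {schedule}")

    requirement = next_requirement()
    count = 0          # responses since the last reinforcer (ratio schedules)
    last_reward = 0.0  # time of the last reinforcer (interval schedules)
    results = []
    for t in response_times:
        count += 1
        if schedule in ("FR", "VR"):
            earned = count >= requirement
        else:
            # interval: first response once the wait since the last reinforcer is over
            earned = t - last_reward >= requirement
        if earned:
            count = 0
            last_reward = t
            requirement = next_requirement()
        results.append(earned)
    return results


# Fixed ratio of 3: every 3rd response is reinforced
print(reinforced_responses("FR", list(range(1, 10)), 3))
```

With a value of 1, the fixed-ratio case reinforces every response, which is the same as continuous reinforcement. On the variable schedules the learner cannot predict which response will pay off.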

Which Schedule is More Effective?

Factors That Influence the Effectiveness of Operant Conditioning
• Order of presentation: reinforcement/punishment must be presented after the behaviour so that it is learned as a consequence of that behaviour.
• Timing: reinforcement/punishment is most effective when presented immediately after the behaviour (this also increases the strength of the response).
• Appropriateness: reinforcement/punishment must suit the likes/dislikes of the individual (otherwise my ‘reward’ could be your ‘punishment’).

Key Processes in Operant Conditioning (Activity: 10.21)
• Acquisition: the speed of acquisition may vary depending on the complexity of the behaviour being learned.
• Extinction: less likely to occur when partial reinforcement has been used.
  • The organism is used to not getting the reinforcer every time.
• Spontaneous recovery, stimulus generalisation and stimulus discrimination: the same as discussed in Classical Conditioning.

Applications of Operant Conditioning (Activity: 10.22)
• Shaping: reinforcement is given for each response that moves closer to the final goal behaviour.
  • E.g. teaching a baby to talk: “Ddd”, “Daaa”, “Dad”.
  • Also known as the ‘method of successive approximations’.
• Token economies: reinforcers (tokens) are given for desired behaviour and can then be exchanged for other reinforcers (rewards).
  • Tokens may also be removed as punishment.
  • Helps ensure the reinforcement is appropriate, because tokens can be exchanged for rewards the individual actually values.
  • Could backfire if the token is misunderstood or the underlying cause of the behaviour is not addressed (see page 500).
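A token economy works like a simple ledger, and can be sketched that way in Python. This is a hypothetical illustration: the class and method names, the reward "prices" and the rule that the balance never drops below zero are assumptions, not a system described in the textbook.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TokenEconomy:
    """Hypothetical token ledger for one individual."""
    prices: Dict[str, int]  # backup rewards and their token cost (assumed values)
    balance: int = 0
    log: List[str] = field(default_factory=list)

    def reinforce(self, behaviour: str, tokens: int = 1) -> None:
        """Give tokens straight after a desired behaviour."""
        self.balance += tokens
        self.log.append(f"+{tokens} for {behaviour}")

    def punish(self, behaviour: str, tokens: int = 1) -> None:
        """Remove tokens after an unwanted behaviour (balance never drops below zero)."""
        removed = min(tokens, self.balance)
        self.balance -= removed
        self.log.append(f"-{removed} for {behaviour}")

    def exchange(self, reward: str) -> bool:
        """Swap tokens for a chosen backup reward; False if there aren't enough tokens."""
        cost = self.prices[reward]
        if cost > self.balance:
            return False
        self.balance -= cost
        self.log.append(f"exchanged {cost} for {reward}")
        return True


economy = TokenEconomy(prices={"extra screen time": 3, "choose dinner": 5})
economy.reinforce("homework finished")
economy.reinforce("room tidied", tokens=2)
economy.punish("shouting")
print(economy.balance)                        # 2
print(economy.exchange("extra screen time"))  # False (costs 3)
```

Letting the individual choose what to exchange tokens for is how the sketch reflects the “appropriateness” factor. The log records the behaviour each token change was tied to, so the link between behaviour and consequence stays clear.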

Classical vs. Operant Conditioning (Activity: 10.26)
• Role of the learner:
  • Passive (classical) vs. active (operant).
• Timing of the stimulus and response:
  • Immediate (classical) vs. delayed (operant);
  • Response depends on stimulus (classical) vs. reinforcer depends on response (operant).
• Nature of the response:
  • Reflexive/involuntary (classical) vs. voluntary (operant).

Reminders…
• The next section of your textbook is ‘One-Trial Learning’, but we have already discussed this in the Classical Conditioning slides.
• Page 507 outlines a good experiment.
• Don’t forget to keep track of the key knowledge dot points we are covering, and tick each one off as you become confident with it.
• The person who can best monitor your progress and understanding is YOU – don’t cheat yourself.
• Miss Moore is awesome. As if you’d forget that.
