I sometimes tie myself in knots trying to pick apart operant and classical conditioning. But it transpires this isn’t only a newbie question, it’s actually something behaviourists ponder over a lot. So that put my mind at ease a touch, haha.
I thought I would do a post revising what I’ve learned. Or some of it, at least. This will be a wall of text… And of course, in case it’s not already clear(!), I’m just learning all this stuff. For proper advice seek out a science-based equine behaviourist.
A, B, C.
Antecedent – Behaviour – Consequence
Antecedent is a big, odd word. But I guess we can just think of it as a “prompt” or “cue”, either environmental or man-made.
The Behaviour is what follows. The Consequence determines whether the behaviour is likely to happen, in that particular circumstance, again.
An environmental example… It rains at an angle (antecedent/prompt). Horse puts itself at a tall hedge (behaviour). Horse avoids the worst of the weather (consequence). The horse has found relief and is likely to do so again in the future, all things being equal. This would be an environmental type of negative reinforcement (-R), more on that later. If the horse doesn’t get relief from the rain, it has no reason to choose standing at the hedge in the future.
Riding example… Rider puts their leg on lightly (cue). Horse moves forward (behaviour). Rider removes pressure (consequence). This would be a man-made type of -R. If the horse doesn’t get relief from the pressure, it has no reason to be “off the leg” in future.
Antecedent Arrangements are ways of altering the environment or prompt or cue so as to swerve the behaviour happening at all. Horse kicks stable door when waiting for food at mealtimes. If the horse isn’t in his stable when food is being prepared, he won’t be there to kick it in frustration. Swerve.
Appetitive = desirable/pleasant. Aversive = undesirable/unpleasant.
Food is appetitive to all horses. Pain is aversive to all horses. But circumstance means a lot. A worried animal might stop accepting food. A dog that wants you to throw his ball might no longer seem to care about treats. Pain is unpleasant, but self-harm can be somehow addictive and releasing to traumatised individuals. Overall, food, touch, freedom, and choice are appetitive whilst pain, discomfort, fear, and lack of choice are aversive.
But experiences are appetitive or aversive to the individual learner. We each decide what matters to us.
Blackberries and thistleheads are appetitive to my horse Skye, but they mean nothing to my friend’s pony Basil. My friend’s cob Monty intensely loves scratches, but Skye mostly only tolerates human touch at this stage and is sometimes uncertain of it. Some horses don’t mind a pat, but for others it’s a worryingly violent thing.
We might judge ourselves as being nice or nasty, but it isn’t up to us. What does the learner think of it?
Is the appetitive we have chosen appropriate? We can’t use scratches as a positive reinforcement unless a scratch is something the animal would actively seek out or “work for”.
Is the aversive we are using effective and ethical? Is it as light as possible and does the animal understand how to behave to get it to stop? Eg: does it truly understand how to respond to rider aids?
If we’re thinking about external motivation, appetitives and aversives are going to motivate the learner for very different reasons.
Primary reinforcers are things which hold intrinsic value to the individual. Food, care, and play are the most obvious examples. Food, in particular, is hugely powerful as a reinforcer for horses because so much of their life is devoted to it. Grazing, browsing.
Secondary reinforcers are things which come to hold appetitive value due to their association with primary reinforcers, through Classical Conditioning.
Classical Conditioning is the pairing of a neutral stimulus with a meaningful one. Pavlov’s dogs! It can also be called Respondent Conditioning.
In clicker training, the most obvious example is that the click (which at first meant nothing) is now a predictor of food.
But as Shawna Karrasch at Connection Training says in nearly all her videos, “the classical conditioning never stops.” You, the handler, may have begun neutral (assuming the horse doesn’t already find people aversive due to prior experience). If you are frequently paired with something pleasant/appetitive, you will take on some of that meaning. If you are frequently paired with something unpleasant/aversive, you will take on some of that meaning. How would we rather our animals, friends, and family members think of us? As a predictor of pleasant or unpleasant feelings?
Counter-conditioning is the pairing of an aversive stimulus with an appetitive one, in teeny-tiny steps, to change the associations the animal has about it. Eg: horse is afraid of clippers. You find the point at which clippers are tolerated (maybe it’s on the other side of the yard, just in sight, smelling of oil, but not turned on), and provide something appetitive (generally food) at the same time. The next day you repeat, maybe bringing the clippers a tiny step closer. Maybe you have to break down the various scary aspects of the clippers (the smell, the sound, the feeling of vibration) over many many many sessions. You are stretching the comfort zone very very slowly. So slowly, the animal doesn’t even perceive it happening. Soon clippers = good things. They’ve been counter-conditioned.
Desensitisation is essentially as above, except without the starting point of something already aversive that needs its meaning changed. Letting the learner slowly discover, at their own pace, that new things are fine. During desensitisation they don’t need to “earn” their food like during clicker training. And they aren’t presented directly with the scary thing if it takes them over threshold (ie: if they show any signs of alarm).
Flooding is deeply problematic. A flag on a stick is the classic one. You chase or worry the animal with the flag (in an enclosed space or on the end of a lead) until it stops bothering to shy away from the flag. At that point, you take the flag (the pressure) away. Or some people don’t, they just carry on rubbing it all over the body. In the latter example, in particular, the animal has learned that it cannot avoid or escape the aversive thing. Flight hasn’t worked, telling you how it feels hasn’t worked, so it stops doing anything at all. The animal is now “quiet”. This is called Learned Helplessness and it looks like a safe horse. But suppressing fear isn’t the same thing as getting over fear. Suppressed behaviours/feelings reoccur at times of stress. Not safe.
Classical Conditioning is about learning that X = Y. Operant Conditioning is about learning that your actions have a consequence. That you can “operate” within the environment.
Operant Conditioning is concerned with consequences that have a feedback/influence on behaviour. Because sometimes, I guess, our behaviour has no real consequence at all and so we don’t learn anything from the experience. I’m ignoring habits and stereotypies, in this post, which are self-reinforcing, as I don’t know enough about them yet.
Consequences can be good or bad.
The Operant Conditioning Quadrant is made up of scenarios that can be either Reinforced or Punished.
Reinforced behaviours are those which persist or grow.
Punished behaviours are those which cease as a result of the punishment. Behaviours can also cease as a result of having no real consequence (this is called Extinction), which is the more effective way of getting rid of “bad” behaviours.
Reinforcers and Punishers are either added to or removed from the situation.
In operant conditioning, Positive just means added and Negative just means removed. Positive does not mean “good” and negative does not mean “bad”.
| POSITIVE REINFORCEMENT: The addition of an appetitive (something desirable) which makes the behaviour more likely to happen again. | NEGATIVE REINFORCEMENT: The removal of an aversive (something undesirable) which makes the behaviour more likely to happen again. |
| POSITIVE PUNISHMENT: The addition of an aversive (something undesirable) which makes the behaviour less likely to happen again. | NEGATIVE PUNISHMENT: The removal of an appetitive (something desirable) which makes the behaviour less likely to happen again. |
If the behaviour increases, it is being reinforced. If the behaviour decreases it is either being effectively punished or having no consequence which makes it worth continuing with (Extinction).
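To check my own understanding, here is the quadrant written out as a tiny Python lookup. It is only a sketch for revision purposes: the names `Change`, `Stimulus`, `intended_quadrant`, and `observed_effect` are made up by me, and the "rate" numbers are just a hypothetical count of how often a behaviour happens. The first function says which quadrant we *think* we are using; the second says what the behaviour is actually telling us.

```python
from enum import Enum


class Change(Enum):
    ADDED = "positive"    # something is added to the situation
    REMOVED = "negative"  # something is removed from the situation


class Stimulus(Enum):
    APPETITIVE = "appetitive"  # desirable, as judged by the learner
    AVERSIVE = "aversive"      # undesirable, as judged by the learner


# The four quadrants, keyed by (what we did, what kind of thing it was).
QUADRANT = {
    (Change.ADDED, Stimulus.APPETITIVE): "+R",
    (Change.REMOVED, Stimulus.AVERSIVE): "-R",
    (Change.ADDED, Stimulus.AVERSIVE): "+P",
    (Change.REMOVED, Stimulus.APPETITIVE): "-P",
}


def intended_quadrant(change, stimulus):
    """Return the quadrant label for the consequence we set up."""
    return QUADRANT[(change, stimulus)]


def observed_effect(rate_before, rate_after):
    """Crude, hypothetical check of what actually happened to the behaviour."""
    if rate_after > rate_before:
        return "reinforced"
    if rate_after < rate_before:
        return "punished or extinguished"
    return "maintained"


# Leg comes off when the horse moves forward: an aversive is removed.
print(intended_quadrant(Change.REMOVED, Stimulus.AVERSIVE))
# A "telling off" that makes the behaviour happen more often.
print(observed_effect(rate_before=2, rate_after=5))
```

The two functions are kept separate on purpose: what we intended and what the learner experienced are not always the same thing.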
In more detail…
POSITIVE REINFORCEMENT (+R)
- The addition of an appetitive (something desirable) which makes the behaviour more likely to happen again.
- Extrinsic motivation to work towards gaining something.
- Activates the SEEKING and CARE systems of the brain. These, along with PLAY, are the most useful systems to activate for effective learning and safe behaviour.
- Releases dopamine.
- As the “game” is understood, dopamine spikes shift from the receipt of the appetitive (normally food) to the moment where a task is cued. Ie: the learner enjoys figuring stuff out and gains confidence from knowing s/he can have an impact on their environment.
- Desired behaviour increases and is reliable.
- Horse offers more effort/ideas.
- A reward isn’t automatically a reinforcer. It is only technically a reinforcer if the target behaviour increases or is sustained as desired.
- Environmental example: horse walks over a sapling. Discovers it’s the right height to have a nice scratch of the inner thigh. Horse seeks out low trees and branches in the future, to enjoy a scratch.
- Training example: super-early clicker training to teach manners around food. Handler stands at horse’s shoulder. Horse nudges and muzzles handler (the smell of food!). Handler ignores and stays safe. Horse gets bored, sighs, and swings its head away from handler. Handler clicks and treats. Horse quickly learns that when the handler says “stand” (prompt/cue), standing with eyes front (behaviour), will get a click and a treat (consequence). The “mugging” fades through a process of Extinction, as it serves no purpose. The behaviour of standing quietly in the presence of food increases. Due to Classical Conditioning, the horse perhaps now considers the following things appetitive: humans, particular humans, certain clothes/tools/tack (and this is one way of teaching animals when clicker is available and when it isn’t… it’s called Sign Tracking), certain smells, human voices, human laughter/giggling, human touch, the places where it happens (eg: arenas), maybe the time of day if you have a routine, and a dozen other things I can’t think of right now.
- Where it goes wrong: we can create conditions where the animal is reinforced even if we think it isn’t. If a behaviour persists or grows, something is reinforcing it (excepting things affected by health, pain, etc.). Eg: the animal is bored and wants attention or something to do. We tell it off for the behaviour. The animal repeats the behaviour more and more. Our “punishment” is, in this instance, actually a reinforcer. In a clicker context, poor timing or poor choices can result in us reinforcing problematic behaviours. For example, one might want to be careful to balance calm behaviours with energetic ones, if working with a pony that has to function with many handlers, children, etc. It wouldn’t be fair to teach an animal like that to always give 100% energy as in a different context this would be deemed dangerous and would be punished. So +R goes wrong when people are either unaware of the reinforcement the animal is getting, or when they don’t care to improve their timing/knowledge. But overall, the odd ill-timed click isn’t going to be a problem.
NEGATIVE REINFORCEMENT (-R)
- The removal of an aversive (something undesirable) which makes the behaviour more likely to happen again.
- Extrinsic motivation to work towards avoiding something.
- Possibly activates the SEEKING system if done as very light pressure/release on an animal with no prior unhappy associations? [This is now me thinking aloud, not something I’ve yet learned or figured out through study.]
- The animal works for release of pressure. Relief does not equal reward. A reward is technically something added, not something taken away.
- Behaviour increases and is reliable.
- Horse offers as much effort/thought as is needed.
- Environmental example: horse A is at a pile of hay. Horse B resource guards food and comes towards horse A pulling a face with a threatening posture. Horse A responds to the aversive pressure (body language) by leaving the pile of hay to find another one. Horse A has found relief from the situation.
- Training example: we put our leg on lightly and the horse moves forward. We instantly take our leg off, regardless of whether the horse is “forward enough” at this stage, to signal that “yes, moving away from the leg is the right answer!” This is why transitions are so much more valuable than just keeping going once moving.
- Where it goes wrong: if the aversive (normally some form of physical pressure) increases (escalates) further and further with no release the horse has no incentive to do the behaviour or to figure out what behaviour will work to make the pressure stop. If responding to one aversive conflicts with another, the horse has no good choice available to it. Eg: trying to get a horse forward off the leg when its prior experiences of forwardness are a yank in the gob or a fearful, punishing rider.
POSITIVE PUNISHMENT (+P)
- The addition of an aversive (something undesirable) which makes the behaviour less likely to happen again.
- Horse motivated to escape or avoid the punishment happening again.
- Activates the FEAR and possibly RAGE systems in the brain.
- Due to the learner being in a fearful or angry state of mind, lessons are over-learned and generalised in ways we can’t control.
- The behaviour stops and doesn’t return, in that one particular context in which it was punished.
- Environmental example: horse touches electric fence and receives a shock/fright. Horse avoids electric fences in the future. Doesn’t always work, as everyone will know!
- Training example: you have an “aggressive” horse in cross-ties on the yard. Each time it puts its ears back you spray its face with water. Horse learns that to avoid the annoying spray it shouldn’t express its feelings with its ears.
- Where it goes wrong: risk of poor judgement, generalisation by animal, and an inability to learn whilst afraid/angry. In the above example, the horse doesn’t stop feeling angry, he’s just stopped showing it. Which is far more dangerous. He’s also learned that hoses/water around the face aren’t nice (unhelpful if you ever want to bathe your horse). And perhaps his *reason* for the aggression is something easily fixed or avoided in the first place. Another example: horse is being led in from the field. Gets a fright from behind, but the handler is unaware. Horse runs forward and into the handler (safety in numbers). Handler yanks on a chifney bit, growls and shouts, smacks horse around head with the lead or reins. Horse is now worried not only about the thing which scared her from behind, but also about the human that she thought she could trust. Due to Classical Conditioning, horse perhaps now considers the following things aversive: humans (possibly even a specific colour, height, or gender), distinctive items of clothing that human wore, associated smells, chifney bits, possibly all bridles, possibly therefore any feeling of “contact” on the reins, being led from the field, seeing a human coming to lead her from the field, various other things I can’t even think of… And of course, whatever it was that frightened her in the first place. Punishment has to be lightning fast if the horse is going to have any chance of knowing what exact behaviour is being punished. It has to be so fast that the handler won’t actually have time to assess whether it’s appropriate. It has to be scary/horrible enough that the horse won’t repeat the unwanted behaviour and end up in a cycle of continued punishment (as then punishment may just stop working entirely). But not so scary/horrible that it causes generalised fear or anger.
It has to be done when the animal is in a thinking frame of mind (not when their FEAR, PANIC, LUST, or RAGE systems are engaged), else they won’t learn the right lesson. We should avoid punishing behaviours that we could have swerved or faded in the first place. We have to be 100% confident that the behaviour isn’t a fair communication on the part of the horse (eg: pain or fear). And the punishment has to work. It has to stop the unwanted behaviour, otherwise it’s just aggression. That’s a lot of hoops to jump through.
NEGATIVE PUNISHMENT (-P)
- The removal of an appetitive (something desirable) which makes the behaviour less likely to happen again.
- Horse learns that behaviour X results in the loss of Z.
- Can activate the RAGE system if not careful, leading to frustration in the learner.
- Behaviour stops and doesn’t return (if the punishment has worked).
- Environmental example: horse A wants to play with horse B. Horse A is too rambunctious and so horse B disengages. Horse A learns to be a bit less bolshy when playing with that particular horse. This is apparently how dogs teach their puppies about acceptable bite pressure and how foals learn about acceptable play and grooming.
- Training example: you’re saying hello to a horse at the fence and giving him a wither scratch. Everything is nice until the horse gets too keen and nibbles at your clothes. Horse only thinks he is grooming, but you don’t want him to learn to groom humans with his teeth… So you walk away and in doing so take your pleasant scratches with you. The horse learns that using his teeth makes the nice scratches go away, so he uses only his lips in the future.
- Where it goes wrong: taking away something desirable can be very frustrating for the animal, especially if it is something highly desirable like their favourite food. Clicker trainers can accidentally cause this frustration. If they haven’t set up the situation quite right or are expecting too much from the animal for that particular moment, they might have to use -P to stop unwanted behaviours. Better by far to swerve that problem entirely, if possible, as frustration comes from the RAGE system and does not make for good or safe learning.
Salience is about what matters most to the individual. What is most pertinent in any given situation. Perhaps I teach a dog to sit and I say the word “sit” thinking that is my cue. But the dog, being so much more tuned in to body language than me, begins sitting at the shifting of my arm as I have also been lifting my clicker hand to create the behaviour. The movement is more salient than the voice, to that dog.
Perhaps I teach a horse to lift its foreleg by tap-tap-tapping with a schooling whip, but the horse isn’t quite doing what I want (maybe I want the leg higher or straighter) so I keep tapping until they finally get the right answer (-R, though not very cleverly done perhaps). The horse is pulling faces and getting annoyed and starting to think I’m not very nice. Perhaps they do something “naughty” and I use the schooling whip to smack them as punishment (+P), then carry on tapping. When they finally “get it” I remove the tapping/pressure, click, and treat (+R). Is it the click/treat or the removal of the annoying tapping and reduction of threat that is most important to that horse? Which is most salient?
TRUST ACCOUNTS AND RESILIENCE
As put beautifully in that quote by Max Easey the other day, relationships are classically conditioned.
This is often also referred to as the Trust Account between any two individuals.
We should aim to make more appetitive deposits in the account than aversive withdrawals. Ie: we’re sweet more often than we’re critical… we’re generous more often than greedy… we praise more than we chastise…
If we keep that in mind with our animals (and, let’s be honest, our friends!), if we treat them sweetly, then on the rare and unfortunate occasions that we might need to be aversive we will (hopefully) be forgiven for it. This ability to recover is Resilience. But it has to balance out from the animal’s perspective. We need to be paying in vastly more than we’re taking out.
And with an animal that already considers humans (or the things we do) aversive, we need to take affirmative action. We need to go consciously to the side of appetitive stimulation, to bring the balance back up. To get out of the huge overdraft we’ve inherited!
My head is now done for the day. In terms of classical and operant conditioning, I think these are the most pertinent points. I’ve referenced the emotional brain systems without going into detail on them as they’re already mentioned in other posts.
Useful revision session though. Things are starting to click (ha) and slowly become second-nature now. And writing it all down lets me see where the gaps in knowledge are. I do love learning. If only one could get paid for studying!