iStock via sorbetto
Wolfram Schultz sat in front of an oscilloscope, gazing at the neural activity inside an awake monkey’s brain. Something wasn’t right.
He was trying to work out how neurons that release the neurotransmitter dopamine contribute to motor control. The work could help us understand Parkinson’s disease, a condition associated with the loss of dopamine-producing cells and with movement problems.
Schultz and his colleagues wanted to record the activity of dopamine cells as a monkey moved. So they designed an experiment in which they monitored a monkey as it grabbed a treat from a little box with a door. The researchers assumed dopamine neurons would fire as the door popped open and the monkey moved to grab the treat.
“You’d pull a release, and a rubber band was kicked up, ‘pluck,’ pulling up the door,” says Schultz. “And immediately there was a very solid, prominent response in almost every dopamine neuron.”
But the timing was wrong. The neurons weren’t responding as the monkey moved for the snack—they lit up as the door snapped open. What’s more, as the monkey got used to the task, receiving the same snack each time, the neurons went silent.
Schultz published his findings, relating the activity of these neurons to learning and the “occurrence of an unpredicted reward,” but knew there was more to the story. “We needed to find out what that response was, but we didn’t have the right concept for it.”
Schultz’s meticulous work piqued the interest of mathematician Peter Dayan. Dayan was on the hunt for real-world evidence of a crucial variable in an artificial intelligence technique called reinforcement learning. That variable, known as “reward prediction error,” or RPE, is the gap between the reward an actor receives and the reward it expected. It captures the idea that animals repeat behaviors only if they hold the promise of a reward or, more specifically, an outcome that exceeds expectations.
To Dayan, Schultz’s data shone like a holy grail. They were “perfect.”
“The activity in these dopamine neurons fitted exactly what we'd expect from a reward prediction error from a theoretical perspective,” says Dayan. “And what's critical about that is it's about predicting the future.”
In 2017, Dayan, Schultz, and psychiatrist Ray Dolan received the Brain Prize from the Lundbeck Foundation for their work on the neural underpinnings of reward and its implications for treating a variety of psychiatric disorders, like depression and addiction. Dayan’s chance connection of RPE to Schultz’s brain recordings continues to bring together theorists and experimentalists to better understand how the brain enables us to make the most rewarding decisions and what can go wrong when the reward system malfunctions.
The brain’s simplest unit of prediction
According to Dayan, reinforcement learning is best known in artificial intelligence, but similar concepts have long guided research in other fields. Engineers have the delta rule; psychologists have the Rescorla-Wagner theory.
In each of these concepts, an actor (an algorithm, a person, or an animal) tries out a behavior. If the outcome is better than expected, the actor learns the behavior is worth repeating. If the outcome falls short of expectations, the actor learns to avoid the behavior. The RPE is the feedback from each event that teaches the actor to pursue rewarding behaviors.
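To make that feedback loop concrete, here is a minimal sketch of a Rescorla-Wagner-style update in Python. It is an illustration, not a reconstruction of anyone's published model: the function name, the learning rate of 0.2, and the fixed reward of 1.0 on every trial are all assumptions chosen for readability.

```python
def rescorla_wagner(rewards, learning_rate=0.2, initial_value=0.0):
    """Track an expected reward across trials.

    Returns one (updated_expectation, rpe) pair per trial, where rpe is
    the prediction error computed before the expectation was updated.
    All parameter values here are illustrative assumptions.
    """
    value = initial_value
    history = []
    for reward in rewards:
        rpe = reward - value           # reward prediction error
        value += learning_rate * rpe   # nudge the expectation toward the outcome
        history.append((value, rpe))
    return history


# The same reward on every trial, like the monkey's repeated snack.
trials = rescorla_wagner([1.0] * 30)
first_rpe = trials[0][1]
last_rpe = trials[-1][1]
print(f"RPE on first trial: {first_rpe:.3f}")
print(f"RPE on last trial:  {last_rpe:.3f}")
```

In this toy version the error starts large, because the first reward is a complete surprise, and shrinks toward zero as the expectation catches up with the outcome.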
In developing his concept, Schultz talked to Tony Dickinson in Cambridge, who pointed him to the Rescorla-Wagner learning theory. This insight proved groundbreaking: dopamine was signaling more than just pleasure for the animal. It was a teaching signal for learning.
Then came Dayan, who uncovered even more about the dopamine signal and brought the reinforcement learning model in line with the more efficient temporal difference method. Schultz could now explain the curious nature of how and when dopamine was released as the monkey progressed through the task.
“We kept running into the same problem: why is this neuron not responding to reward, although the animal is getting a reward, when a minute ago, it was responding to the reward?” says Schultz. “Peter helped us all see it. There are two events, the reward, and the prediction. But it’s confusing if you’re just expecting the dopamine to be a signal of reward.”
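The puzzle Schultz describes, a response that fades at the reward and appears at the event that predicts it, falls out of even a very small temporal difference simulation. The Python sketch below is a toy under stated assumptions, not the model from the original papers: a trial is a cue followed by a few time steps and then a reward of 1.0, the cue arrives unpredictably (so the expectation just before it stays at zero), and the names and parameter values (`simulate_td`, `alpha`, `gamma`, `n_steps`) are hypothetical.

```python
def simulate_td(n_trials=500, n_steps=5, alpha=0.1, gamma=1.0):
    """TD(0) learning on a cue -> delay -> reward trial.

    Returns one (cue_rpe, reward_rpe) pair per trial. Parameter values
    are illustrative assumptions, not fitted to any data.
    """
    values = [0.0] * n_steps   # learned value of each time step after the cue
    results = []
    for _ in range(n_trials):
        # Assumption: cue timing is unpredictable, so the pre-cue value is 0
        # and the error at cue onset is the value the cue has come to signal.
        cue_rpe = gamma * values[0] - 0.0
        reward_rpe = 0.0
        for t in range(n_steps):
            is_last = t == n_steps - 1
            reward = 1.0 if is_last else 0.0          # reward ends the trial
            next_value = 0.0 if is_last else values[t + 1]
            delta = reward + gamma * next_value - values[t]   # TD error
            values[t] += alpha * delta
            if is_last:
                reward_rpe = delta
        results.append((cue_rpe, reward_rpe))
    return results


results = simulate_td()
early_cue, early_reward = results[0]
late_cue, late_reward = results[-1]
print(f"first trial: cue RPE = {early_cue:.2f}, reward RPE = {early_reward:.2f}")
print(f"last trial:  cue RPE = {late_cue:.2f}, reward RPE = {late_reward:.2f}")
```

On the first trial the error sits entirely at the reward. After training it has migrated to the cue, and the now-predicted reward produces almost none, which mirrors the "two events" Schultz mentions.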
Computers can learn to do complicated tasks, like play chess, thanks to relatively simple algorithms that use RPE to reinforce rewarding behaviors. Animal brains appear to use a similar strategy, releasing dopamine to hone their behaviors towards the greatest perceived benefit.
Armed with a better framework for testing brain and behavior, Schultz and many others probed deeper into the biology of reward, hoping to tease apart the circuitry that gave the brain its exceptional ability to make the right decisions.
From simple reward to a rewarding life
The dopamine signals Schultz measured are part of a complex circuit spanning multiple areas of the brain, like the striatum (which directly encodes RPE), the prefrontal cortex (which enables higher-level decision making about actions), and the amygdala (which contextualizes actions with emotions, like fear).
This means the release of dopamine in the brain reflects much more than a simple positive or negative RPE, according to Dayan.
“We now know that there are dopamine neurons that respond not only to RPE but also to punishment,” says Dayan. “But punishment is bizarrely more complicated than reward.”
It’s likely several different layers process reward or punishment signals in the brain, according to Dayan, like neurons that combine information from other brain regions to help us make nuanced choices.
“It’s actually quite amazing that we go through our lives happily making decisions all the time, like what to take off an IKEA shelf,” says Dayan. “Our reward system is fairly general, and the cortex provides the cues, having learned from prior RPEs where we can expect future rewards.”
Schultz likes to think of reward in economic terms: how does an animal maximize its reward over time? His latest experiments, still harnessing the behaviors of well-trained monkeys, continue to benefit from a conceptual understanding of reward. For instance, they show how a monkey’s brain activity correlates with the changing and abstract value of a reward as the monkey takes its fill of a sweet beverage and eventually loses interest.
“This understanding of neural reinforcement learning and decision making in the brain has fostered a new generation of people who are trained in both experiments and theory,” says Dayan. “There is really no other area in neuroscience where the two sides mutually benefit, and happily work together, to such a fruitful degree.”
Yet there is a third potential beneficiary to this interdisciplinary study of reward, one that psychiatrist Ray Dolan has long sought to help: patients who struggle with dysfunctional reward-seeking behavior.
Modest expectations for maximum reward
Dolan has been particularly interested in how reward works in people with depression and anxiety, two disorders that frequently occur alongside anhedonia, a reduced ability to feel pleasure. Using functional magnetic resonance imaging (fMRI), Dolan found dopamine-driven signals of RPE in the human brain analogous to those Schultz found in monkeys.
Dolan’s research has since demonstrated these signals of RPE are still present in people with major depressive disorder, even though depression can interfere with motivation and reward—a finding with implications for treating depression.
“With [cognitive] behavioral therapy, for example, we think [we] might optimize how [people with disorders] experience the world and perhaps lead them to have more positive prediction errors,” Dolan told the Lundbeck Foundation in an interview in 2017. “And that leads to a general improvement in their mood state.”
Dayan, too, hopes an understanding of RPE will lead to new forms of cognitive behavioral therapy that focus on the specific issues a patient is experiencing.
“If we could observe how a patient’s decision-making changes, that might be a hint that something is different about their disease,” says Dayan. “This could give us what we need to design the right behavioral or pharmacological intervention to help them.”
On the other end of the spectrum, people with addiction seem to struggle with too much reward-seeking behavior. Drugs like cocaine produce a dopamine surge that is “way beyond the normal,” says Schultz. This has consequences for a person’s ability to pursue everyday rewards because they may no longer hold a strong appeal.
“You could say that people with addiction are getting used to a very high reward expectation,” Schultz speculated. “They can only get a positive prediction error by having drugs because normal rewards are not strong enough to produce a positive reward prediction error against the super high prediction that comes from cocaine or opiates.”
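Schultz's speculation can be put into rough arithmetic. The Python sketch below is only an illustration of the reasoning, not a model of addiction: the helper name and every number in it are made up for the example, and the only thing that matters is the sign of the result.

```python
def reward_prediction_error(reward, expected):
    """RPE as the difference between the reward received and the reward expected."""
    return reward - expected


# Hypothetical values in arbitrary units, chosen only to illustrate the argument.
everyday_reward = 1.0             # e.g. a good meal
typical_expectation = 0.5         # a modest prediction
drug_inflated_expectation = 5.0   # a prediction raised by repeated drug use

typical_rpe = reward_prediction_error(everyday_reward, typical_expectation)
inflated_rpe = reward_prediction_error(everyday_reward, drug_inflated_expectation)

print(f"RPE with a modest expectation:    {typical_rpe:+.1f}")
print(f"RPE with an inflated expectation: {inflated_rpe:+.1f}")
```

The same everyday reward gives a positive error against a modest expectation and a negative one against an inflated expectation, which is the situation Schultz describes.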
Ultimately, the increased understanding of RPE from Schultz, Dayan, and Dolan has implications for everyone. Modest expectations put modest rewards within our reach, and occasionally an unexpected reward surprises us—and that’s a good thing.
“Those surprises will be signaled by an outpouring of dopamine, and that output of dopamine will help you to learn,” says Dolan. “But it will also have an impact on your subjective state of happiness.”