Two researchers’ early theoretical work on reinforcement learning was recognized Wednesday, as the Association for Computing Machinery named Andrew G. Barto and Richard S. Sutton the winners of the 2024 ACM A.M. Turing Award.

Both researchers were crucial in developing the conceptual and algorithmic foundations of reinforcement learning, a bedrock of current AI-based agent technologies.
They will collectively carry off a $1 million prize (courtesy of Google) for their labors.
The ACM A.M. Turing Award, often called the “Nobel Prize in Computing,” is named after Alan M. Turing, the British mathematician who articulated the mathematical foundations of computing and devised the Turing Test, a thought experiment (and current benchmark) for evaluating whether a machine has achieved human-like intelligent behavior.
So this year’s award is a fitting tribute to its namesake.
“In a 1947 lecture, Alan Turing stated ‘What we want is a machine that can learn from experience,’” noted Jeff Dean, Google’s Chief Scientist for Google DeepMind, in a statement. “Reinforcement learning, as pioneered by Barto and Sutton, directly answers Turing’s challenge. Their work has been a lynchpin of progress in AI over the last several decades.”
Barto is Professor Emeritus of Information and Computer Sciences at the University of Massachusetts, Amherst. Sutton is a Professor of Computer Science at the University of Alberta, as well as a research scientist at Keen Technologies (“John Carmack’s AGI Effort”), and a fellow at the Alberta Machine Intelligence Institute.
Full Agency
Reinforcement learning, inspired by ideas from neuroscience and psychology, forms the basis of agentic AI: computer entities that perceive and act, preferably in a way that fulfills the intent of their users. To do this, agents rely on “rewards,” or feedback on the quality of their behavior.
Barto and Sutton developed many of the basics of reinforcement learning, and shared their learning in the seminal 1998 textbook “Reinforcement Learning: An Introduction.”
The work built on Markov Decision Processes (MDPs), in which an agent makes decisions in a stochastic environment and receives a reward signal after each action, with the goal of maximizing its cumulative reward.
Standard MDP algorithms assumed that the agent already knew the environment and its rewards. Reinforcement learning took the next step and allowed both the environment and its rewards to be unknown to the agent, which has to learn them from experience.
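To make that setting concrete, here is a minimal Python sketch of an agent learning in a small MDP whose dynamics it cannot see. Everything in it is an illustrative assumption rather than anything from the ACM announcement: the four-state corridor, the 10% "slip" probability, the hyperparameters, and the choice of tabular Q-learning, a standard reinforcement learning algorithm picked here only because it is short.

```python
import random

N_STATES = 4          # states 0..3; state 3 is terminal (the goal)
ACTIONS = (0, 1)      # 0 = left, 1 = right
SLIP = 0.1            # chance the environment moves the agent the other way


def step(state, action, rng):
    """Environment dynamics, hidden from the agent: returns (next_state, reward, done)."""
    if rng.random() < SLIP:
        action = 1 - action
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=5000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values purely from sampled transitions and rewards."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # safety cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[state])          # exploit, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action, rng)
            # The update uses only the observed reward and next state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


q_table = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)  # one action per non-terminal state; 1 means "move right"
```

The agent never reads `SLIP` or the transition rule inside `step`; it only observes which state it lands in and what reward it gets, which is the "knows nothing about the environment" assumption in miniature.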
“The minimal information requirements of reinforcement learning, combined with the generality of the MDP framework, allows reinforcement learning algorithms to be applied to a vast range of problems,” the ACM announcement summarized.
The duo also contributed to the use of neural networks to represent learned functions, and introduced agent designs that combine learning and planning, in which knowledge acquired about the environment becomes the basis for planning.
Some of the other techniques the duo pioneered — working with each other or other researchers — include temporal difference learning, which helped solve reward prediction problems, and policy-gradient methods to address those high-dimensional action spaces where reinforcement learning falls short.
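As a rough illustration of the reward-prediction idea behind temporal difference learning, the Python sketch below estimates state values on a five-state random walk, a small task of the kind often used to teach the method. The task, the step size, and the episode count are assumptions chosen for the example, and the code is the simplest one-step variant, usually written TD(0), not a reconstruction of any specific system the researchers built.

```python
import random


def td0_random_walk(episodes=5000, alpha=0.02, seed=0):
    """Estimate state values with one-step TD updates (no discounting)."""
    rng = random.Random(seed)
    # States 1..5 are non-terminal; 0 and 6 are terminal with value fixed at 0.
    values = [0.0] + [0.5] * 5 + [0.0]
    for _ in range(episodes):
        state = 3  # every episode starts in the middle
        while state not in (0, 6):
            nxt = state + rng.choice((-1, 1))    # step left or right at random
            reward = 1.0 if nxt == 6 else 0.0    # reward only at the right edge
            # Nudge the prediction toward reward plus the next state's prediction.
            values[state] += alpha * (reward + values[nxt] - values[state])
            state = nxt
    return values[1:6]


estimates = td0_random_walk()
true_values = [i / 6 for i in range(1, 6)]  # exact chance of finishing on the right
for est, true in zip(estimates, true_values):
    print(f"estimate={est:.2f}  true={true:.2f}")
```

Each prediction is corrected using the next prediction rather than the final outcome of the episode, so learning can happen at every step instead of only once an episode ends.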
Successful Applications
Reinforcement learning scored its most prominent win when the AlphaGo computer program beat the best human Go players in 2016 and 2017.
“AI systems descended from AlphaGo have been adapted to tackle other problems. In 2022, researchers used one such system to discover new algorithms for a fundamental mathematical task called matrix multiplication,” Quanta Magazine (@QuantaMagazine) posted on March 5, 2025 (https://t.co/9Yku0j8C6H).
OpenAI’s ChatGPT also owes its success to reinforcement learning. According to ACM, the service trains its large language models with a technique called reinforcement learning from human feedback (RLHF) to capture human expectations.
The post Reinforcement Learning Pioneers Honored With ACM Turing Prize appeared first on The New Stack.