Reinforcement Learning Pioneers Honored With ACM Turing Prize

March 6, 2025

Two researchers’ early theoretical work on reinforcement learning was recognized Wednesday, as the Association for Computing Machinery named Andrew G. Barto and Richard S. Sutton the winners of the 2024 ACM A.M. Turing Award.

Both researchers were crucial in developing the conceptual and algorithmic foundations of reinforcement learning, a bedrock of current AI-based agent technologies.

They will collectively carry off a $1 million prize (courtesy of Google) for their labors.

The ACM A.M. Turing Award is often called the “Nobel Prize in Computing.” It is named after Alan M. Turing, the British mathematician who articulated the mathematical foundations of computing and proposed the Turing Test, a thought experiment (and current benchmark) for evaluating whether a machine has achieved human-like intelligent behavior.

So this year’s award is quite apropos to its namesake.

“In a 1947 lecture, Alan Turing stated ‘What we want is a machine that can learn from experience,’” noted Jeff Dean, Google’s Chief Scientist for Google DeepMind, in a statement. “Reinforcement learning, as pioneered by Barto and Sutton, directly answers Turing’s challenge. Their work has been a lynchpin of progress in AI over the last several decades.”

Barto is Professor Emeritus of Information and Computer Sciences at the University of Massachusetts, Amherst. Sutton is a Professor of Computer Science at the University of Alberta, as well as a research scientist at Keen Technologies (“John Carmack’s AGI Effort”), and a fellow at the Alberta Machine Intelligence Institute.

Full Agency

Reinforcement learning, inspired by ideas from neuroscience and psychology, formed the basis of agentic AI: computer entities that perceive and act, preferably in a way that fulfills the intent of their users. To do this, agents rely on “rewards,” or feedback on the quality of their behavior.

Barto and Sutton developed many of the basics of reinforcement learning, and shared their learning in the seminal 1998 textbook “Reinforcement Learning: An Introduction.”

The work built on Markov Decision Processes (MDPs), wherein an agent makes decisions in a stochastic (randomly determined) environment and gets a reward signal after each action, with the goal of maximizing its long-term cumulative reward.
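To make the MDP setting concrete, here is a minimal Python sketch of planning when the model is fully known. The three-state MDP, its transition table `P`, the reward values and the discount factor `GAMMA` are all invented for illustration and do not come from the ACM announcement. Value iteration is used here as one standard way to solve such a problem, not as the specific method the award recognizes.

```python
# Hypothetical 3-state MDP, invented for illustration.
# P[state][action] is a list of (probability, next_state, reward) outcomes.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 0.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(0.8, 2, 1.0), (0.2, 1, 0.0)]},
    2: {"stay": [(1.0, 2, 0.0)], "go": [(1.0, 2, 0.0)]},  # absorbing goal state
}
GAMMA = 0.9  # discount factor (an assumed value)


def q_value(V, state, action):
    """Expected discounted return of taking `action` in `state`, then following V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[state][action])


def value_iteration(tol=1e-9):
    """Repeatedly back up state values until they stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(V, s, a) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


def greedy_policy(V):
    """Pick, in every state, the action with the highest expected return."""
    return {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}


V = value_iteration()
policy = greedy_policy(V)
print(V, policy)
```

Note that the planner reads the transition probabilities and rewards directly from `P`, which is exactly the knowledge a reinforcement learning agent is not assumed to have.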

Standard MDP theory assumes that the agent knows everything about its environment and rewards. Reinforcement learning takes the next step and allows both the environment and its rewards to be unknown to the agent.
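A short Python sketch can show what learning without that knowledge looks like. It uses tabular Q-learning, a well-known model-free algorithm that the article itself does not name, on a made-up five-state corridor. The environment layout, the 10% slip probability and every hyperparameter below are assumptions chosen for illustration. The agent never reads the dynamics inside `step`; it only observes the next state and reward that come back.

```python
import random

N_STATES = 5        # states 0..4; state 4 is the goal (hypothetical layout)
ACTIONS = (-1, +1)  # index 0 = step left, index 1 = step right


def step(state, action, rng):
    """Environment dynamics, hidden from the agent: moves slip 10% of the time."""
    if rng.random() < 0.1:
        action = -action
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    reward = 1.0 if done else 0.0
    return nxt, reward, done


def q_learning(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values purely from sampled (state, action, reward, next state)."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore
            else:
                best = max(Q[s])      # exploit, breaking ties at random
                a = rng.choice([i for i in range(2) if Q[s][i] == best])
            s2, r, done = step(s, ACTIONS[a], rng)
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q


Q = q_learning()
learned = ["right" if q[1] > q[0] else "left" for q in Q[:-1]]
print(learned)
```

The only feedback the agent receives is the reward signal, yet the learned action values end up favoring moves toward the goal.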

“The minimal information requirements of reinforcement learning, combined with the generality of the MDP framework, allows reinforcement learning algorithms to be applied to a vast range of problems,” the ACM announcement summarized.

The duo also helped establish the use of neural networks to represent learned functions, and proposed agent designs that combine learning and planning, showing that knowledge acquired about the environment can serve as the basis for planning.

Other techniques the duo pioneered, working with each other or with other researchers, include temporal difference learning, which was an important advance in solving reward prediction problems, and policy-gradient methods, which are often used for high-dimensional action spaces where other reinforcement learning methods fall short.
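As a rough illustration of temporal difference learning, the Python sketch below runs the simplest variant, TD(0), on a small random walk of the kind often used to teach the method. The five-state layout, the step size `alpha` and the episode count are assumptions made for this example. Each state's value is the predicted chance of exiting on the right, where the only reward is paid, and each update nudges a prediction toward the reward plus the next state's prediction rather than waiting for the final outcome.

```python
import random

STATES = 5  # non-terminal states 0..4; stepping off either end ends the episode


def td0_random_walk(episodes=20000, alpha=0.005, seed=0):
    """Estimate state values with TD(0) under a uniformly random policy."""
    rng = random.Random(seed)
    V = [0.5] * STATES  # initial guess for every state
    for _ in range(episodes):
        s = STATES // 2  # every walk starts in the middle
        while True:
            s2 = s + rng.choice((-1, 1))
            if s2 < 0:             # fell off the left end: reward 0
                reward, v_next, done = 0.0, 0.0, True
            elif s2 >= STATES:     # fell off the right end: reward 1
                reward, v_next, done = 1.0, 0.0, True
            else:
                reward, v_next, done = 0.0, V[s2], False
            # TD(0) update (undiscounted): move V[s] toward reward + V[next]
            V[s] += alpha * (reward + v_next - V[s])
            if done:
                break
            s = s2
    return V


V_td = td0_random_walk()
true_values = [(i + 1) / 6 for i in range(STATES)]
print([round(v, 2) for v in V_td])
```

For this walk the exact values are 1/6 through 5/6, so the estimates can be checked against `true_values`; with a constant step size they hover near those numbers rather than settling on them exactly.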

Successful Applications

Reinforcement learning scored one of its most prominent wins when the AlphaGo computer program beat the best human Go players in 2016 and 2017.

“AI systems descended from AlphaGo have been adapted to tackle other problems. In 2022, researchers used one such system to discover new algorithms for a fundamental mathematical task called matrix multiplication,” Quanta Magazine (@QuantaMagazine) posted on March 5, 2025 (https://t.co/9Yku0j8C6H).

OpenAI’s ChatGPT also owes part of its success to reinforcement learning. According to ACM, training the large language model behind the service involves a technique called reinforcement learning from human feedback (RLHF) to capture human expectations.

The post Reinforcement Learning Pioneers Honored With ACM Turing Prize appeared first on The New Stack.

Kubefeeds Team

