
Outsmarting humans at their own game

by Krista Davidson Dec 9 / 19

In a game that has more possible configurations than the number of atoms in the universe, how do you predict the winning move?

Photo courtesy of Csaba Szepesvári


Csaba Szepesvári is one of the brains behind an algorithm that helped a computer program accomplish the mathematically difficult task of outwitting a professional human player, Lee Sedol, in the ancient game of Go.

He joins the ranks of other prestigious researchers as a Canada CIFAR AI Chair, a cornerstone program under the Pan-Canadian AI Strategy. The program provides dedicated research funding for Canada’s leading AI researchers. It means Szepesvári will continue his groundbreaking research in the area of reinforcement learning as a fellow at the Alberta Machine Intelligence Institute (Amii), a professor with the department of computing science at the University of Alberta, and a senior staff research scientist at DeepMind.

Using AI to outsmart humans at Go

Szepesvári’s research has strongly influenced the development of two popular AI techniques: Monte Carlo tree search and bandit algorithms. His work has helped computers outsmart professional human players in the game of Go, a board game that, like chess, requires a long sequence of strategic moves between opponents.

Monte Carlo tree search, a term coined in 2006 by one of his colleagues, Rémi Coulom, is an algorithm that uses a randomized traversal of the possible ways of continuing a game to predict the winning moves.
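
As a rough illustration of the randomized-traversal idea, the sketch below scores each available move by playing many random games to the end and counting wins. It uses a toy take-away game (remove 1 to 3 stones, whoever takes the last stone wins) rather than Go, and it is flat Monte Carlo evaluation, not full Monte Carlo tree search or the method used in any program named in this article. The game, the function names, and the playout count are all assumptions made for illustration.

```python
import random


def legal_moves(stones):
    """Moves in the toy game (an assumption): remove 1, 2, or 3 stones."""
    return [k for k in (1, 2, 3) if k <= stones]


def random_playout(stones, my_turn, rng):
    """Play uniformly random moves to the end.

    Returns True if 'we' take the last stone (and so win the toy game).
    """
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return my_turn  # whoever just moved took the last stone
        my_turn = not my_turn


def best_move_by_playouts(stones, playouts=2000, seed=0):
    """Estimate each move's win rate from random playouts; pick the best."""
    rng = random.Random(seed)
    win_rate = {}
    for move in legal_moves(stones):
        remaining = stones - move
        if remaining == 0:
            win_rate[move] = 1.0  # taking the last stone wins outright
            continue
        # After our move the opponent is next to play.
        wins = sum(
            random_playout(remaining, my_turn=False, rng=rng)
            for _ in range(playouts)
        )
        win_rate[move] = wins / playouts
    return max(win_rate, key=win_rate.get), win_rate


if __name__ == "__main__":
    move, rates = best_move_by_playouts(7)
    print("suggested move:", move)
    print("estimated win rates:", rates)
```

Tree search methods go further by reusing what earlier playouts revealed to decide which continuations deserve more playouts, instead of spreading them evenly as this sketch does.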


In the same year, Szepesvári and Levente Kocsis refined the initial algorithm by adjusting its value predictions using upper confidence bounds, which removed an inconsistency in the initial version. The resulting algorithm is known as UCT, short for upper confidence bounds applied to trees. A variation of this modification was critical to the success of Google DeepMind’s AlphaGo and AlphaZero computer programs. In October 2015, AlphaGo became the first computer program to defeat a human professional Go player.

“DeepMind’s achievement took the world, and even the experts in the field, by surprise. It is a wonderful demonstration of the power of reinforcement learning algorithms combined with search. I am very happy to have witnessed this milestone event,” says Szepesvári.

The Monte Carlo tree search algorithm uses randomness for deterministic problems that are difficult or impossible to solve using other approaches. It relies on balancing exploration and exploitation: the exploration of possible moves or steps, and the exploitation of the path with the greatest reward. The purest setting in which the exploration-exploitation dilemma arises is the bandit problem, an area of great interest and expertise for Szepesvári.
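
To make the exploration-exploitation dilemma concrete, here is a minimal sketch of a bandit problem: several slot-machine "arms" with unknown payout probabilities, and a player who must decide which arm to pull next. The rule shown is UCB1, a standard textbook upper-confidence-bound strategy; it is offered as an illustration only and is not necessarily the exact variant used in the work described here. The payout probabilities, the number of rounds, and the function name are assumptions.

```python
import math
import random


def ucb1(arm_means, rounds=5000, seed=0):
    """Run UCB1 on simulated arms that pay 1 with probability arm_means[i].

    Returns how many times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    pulls = [0] * n_arms
    totals = [0.0] * n_arms

    for t in range(1, rounds + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once to get a first estimate
        else:
            # Average reward so far (exploitation) plus a bonus that is
            # large for rarely pulled arms (exploration).
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / pulls[a]
                + math.sqrt(2 * math.log(t) / pulls[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward
    return pulls


if __name__ == "__main__":
    # Hypothetical payout probabilities, unknown to the player.
    print(ucb1([0.2, 0.5, 0.8]))
```

The bonus term shrinks as an arm is pulled more often, so the player keeps checking uncertain arms occasionally while spending most pulls on the arm that looks best.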

His contributions to bandit-based Monte Carlo planning earned him and Levente Kocsis the Test of Time Award at ECML/PKDD 2016, the leading European conference on machine learning and data mining.

World-class talent remains in Canada

Szepesvári was introduced to reinforcement learning as a PhD student. “I’ve always been interested in intelligence and creating intelligent agents. I thought the framework provided by reinforcement learning was a perfect fit for modeling intelligence,” says Szepesvári.

Szepesvári, who comes from Hungary and completed his PhD there at the Attila József University, has been with the University of Alberta since 2006.


He is widely considered an expert on the convergence of reinforcement learning algorithms, Monte Carlo tree search, and exploration in bandit problems. His contributions to the field led to his joining DeepMind in 2017, where he leads the Foundations team. Having authored more than 140 conference publications, 40 journal publications and three books, he has contributed significantly to the field of reinforcement learning.

Szepesvári is the author of three books. Performance of Nonlinear Approximate Adaptive Control (published by Wiley in 2003) addresses theoretical guarantees on the performance of adaptive control designs. Algorithms for Reinforcement Learning (published by Morgan & Claypool Publishers in 2010) addresses the fundamental theoretical and algorithmic issues of reinforcement learning, and is considered required reading for researchers new to the field.

A third book, Bandit Algorithms, is in the works for early 2020. Co-authored with Tor Lattimore, it will be published by Cambridge University Press.

His Canada CIFAR AI Chair award means he will continue to conduct groundbreaking research in Canada.

“It’s not an overstatement to say that Canada is a leader in reinforcement learning. Machine learning can contribute to many positive changes in the world, but we have a better chance of doing so if we address the challenges that arise in reinforcement learning. It’s all about learning from feedback through interaction with an environment,” says Szepesvári.

“There are many exciting new developments in the field of reinforcement learning. It’s a good time to lead the way with new advances.”


The Canada CIFAR AI Chairs Program is the cornerstone program of the CIFAR Pan-Canadian AI Strategy. A total of $86.5 million over five years has been earmarked for this program to attract and retain world-leading AI researchers in Canada. The Canada CIFAR AI Chairs announced to date are conducting research in a range of fields, including machine learning for health, autonomous vehicles, artificial neural networks, climate change and more.