Training AI to reason

by Krista Davidson Mar 11 / 20
Pierre-Luc Bacon


Facebook CIFAR AI Chair Pierre-Luc Bacon discovers breakthrough approach, enabling machines to reason and make decisions independently.

Medical devices such as pacemakers and insulin pumps have saved millions of lives, but despite many technological advances, they are limited in their ability to adapt to the body’s changing needs in real time.

Artificial intelligence (AI) could revolutionize these devices and others by enabling them to predict and adapt to patient needs without human intervention.

Medical devices that can predict the appropriate treatment for patients could be made possible by a significant breakthrough from Pierre-Luc Bacon in how algorithms can be designed to make decisions in real time.

Bacon, a Facebook CIFAR AI Chair affiliated with Mila, proposed the first-ever method for discovering temporally extended actions, also known as options, that could truly scale to large problems. The technique, called the option-critic architecture, enables systems to simplify and optimize decisions in order to achieve an end goal. It allows a system to discover for itself the best way to solve tasks.

By using this method, researchers were able to guarantee that an algorithm could deliver the desired results without human intervention, while continuing to learn and adapt to different situations.

For the past seven years, Bacon has been working in reinforcement learning, specifically the area of temporal abstraction, which enables systems to make predictions over extended stretches of time and set goals for themselves in order to continually and efficiently learn on their own. They do this by reasoning at a high level while carrying out small decisions.

He uses a city’s subway as an example of temporal abstraction. “The subway will take you where you want to go in just a few stops. Whereas if you were driving, you’d encounter dozens of smaller details and complications at the street level which compound your decision-making. The subway makes the problem much simpler and easier to predict and to make decisions that will help you achieve your end goal,” he says.
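The subway analogy can be sketched in code. The following is a minimal illustration, assuming a toy one-dimensional world in which a position is an integer and a street-level action is a single step of +1 or -1. The names `Option`, `run_option`, `ride_to` and `plan_with_options` are hypothetical and chosen for this example only. The options here are written by hand and stop deterministically, whereas the option-critic architecture described above is about learning such options rather than specifying them in advance.

```python
from dataclasses import dataclass
from typing import Callable

State = int    # position along a one-dimensional line (hypothetical toy world)
Action = int   # +1 or -1: a single street-level step


@dataclass(frozen=True)
class Option:
    """A temporally extended action: where it may start, how it acts, when it stops."""
    name: str
    can_start: Callable[[State], bool]    # initiation set
    policy: Callable[[State], Action]     # low-level behaviour while the option runs
    should_stop: Callable[[State], bool]  # termination condition (deterministic here)


def run_option(option: Option, state: State, max_steps: int = 1000) -> tuple[State, int]:
    """Execute an option until it terminates; return the final state and steps taken."""
    if not option.can_start(state):
        raise ValueError(f"{option.name} cannot start in state {state}")
    steps = 0
    while steps < max_steps:
        state += option.policy(state)
        steps += 1
        if option.should_stop(state):
            break
    return state, steps


def ride_to(station: State) -> Option:
    """Hand-written option that keeps moving toward a station, like a subway line."""
    return Option(
        name=f"ride_to_{station}",
        can_start=lambda s: s != station,
        policy=lambda s: 1 if s < station else -1,
        should_stop=lambda s: s == station,
    )


def plan_with_options(start: State, stations: list[State]) -> tuple[State, int, int]:
    """Visit stations in order; count high-level decisions and low-level steps."""
    state, decisions, total_steps = start, 0, 0
    for station in stations:
        state, steps = run_option(ride_to(station), state)
        decisions += 1
        total_steps += steps
    return state, decisions, total_steps
```

Calling `plan_with_options(0, [10, 25])` reaches position 25 with two high-level decisions, even though 25 individual steps are taken underneath. That gap between decisions and steps is the simplification the subway example describes.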

Bacon’s work in the field earned him the Outstanding Student Paper Award at the 32nd AAAI Conference on Artificial Intelligence (AAAI). “It was the first time we were able to come up with a way of discovering temporal abstraction without having to specify them in advance. That’s something we’ve not been able to do for the past twenty years. This was the first time we were able to propose a solution,” he says.

It’s a difficult problem because researchers are trying to develop decision-making systems without knowing what those good decisions are in advance.

“Rather than specifying whether to take one action over another, we need to define a reward function that indicates whether or not an action is desirable in the present,” he explains.
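To make the idea of a reward function concrete, here is a small sketch under stated assumptions: the same toy world of integer positions and single steps of +1 or -1, with illustrative reward values (+1.0 for arriving at the goal, -0.1 for any other step) and a discount factor of 0.9. None of these numbers or function names come from Bacon’s work. The point is only that nothing tells the system which action to take; each step is scored, and better behaviour shows up as a higher total.

```python
def reward(state: int, action: int, goal: int) -> float:
    """Score a single step: +1.0 for arriving at the goal, a small cost otherwise."""
    return 1.0 if state + action == goal else -0.1


def rollout_rewards(start: int, actions: list[int], goal: int) -> list[float]:
    """Apply a sequence of steps from `start` and collect the reward for each one."""
    state = start
    rewards = []
    for action in actions:
        rewards.append(reward(state, action, goal))
        state += action
    return rewards


def discounted_return(rewards: list[float], gamma: float = 0.9) -> float:
    """Sum rewards, weighting later ones less (gamma is an assumed discount factor)."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Two candidate behaviours for reaching position 2 from position 0.
direct = rollout_rewards(0, [1, 1], goal=2)
detour = rollout_rewards(0, [-1, 1, 1, 1], goal=2)
```

Comparing `discounted_return(direct)` with `discounted_return(detour)` favours the direct route, although no rule ever stated that stepping right was the correct action.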

Another challenge is developing algorithms that need less interaction with their environment in order to learn.

“Algorithms develop good reflexes through repetition, but they still have difficulty learning and reasoning in new situations. For that, we need systems that are able to understand how the world around them works and predict the consequences of their actions,” Bacon says.

This approach has a range of applications, from programming and customizing pacemakers and insulin pumps to the needs of patients, to enabling more fluid movement and control of prosthetic limbs, and even to autonomous driving.

As a Facebook CIFAR AI Chair, Bacon has taken up his first faculty position in Canada as an assistant professor with the department of computer science and operations research (DIRO) at the Université de Montréal. He completed his PhD at McGill University and worked at the Stanford AI for Human Impact Lab, studying learning over long horizons, off-policy learning and temporal abstraction.

His goal is to build systems that are robust, reliable and intuitive. 

“There’s a lot of work to be done to bring our systems closer to the people, to make them more robust and systems that we can trust, but it’s exciting that we haven’t solved everything.”



The Canada CIFAR AI Chairs Program is the cornerstone program of the CIFAR Pan-Canadian AI Strategy. A total of $86.5 million over five years has been earmarked for this program to attract and retain world-leading AI researchers in Canada. The Canada CIFAR AI Chairs announced to date are conducting research in a range of fields, including machine learning for health, autonomous vehicles, artificial neural networks, climate change and more.