Machines learn new ways of learning

by Juanita Bawagan News Learning in Machines & Brains 13.09.2017

Intelligent machines have learned to read and write, recognize images, and predict dangerous mutations. But how does a machine learn to learn in the first place?

The art of ‘learning to learn’ (or meta-learning) is now widely recognized as a cornerstone of artificial intelligence research. Traditionally, computer scientists hand-engineered the learning algorithms. Over the last few years, the idea of using data to learn the learning algorithms has gained momentum — and massive computational resources and datasets have made it possible.

In 2016, Nando de Freitas, a Senior Fellow in CIFAR’s Learning in Machines & Brains program, demonstrated a novel approach to learning to learn. He and his colleagues showed how a learning algorithm can be cast as a neural network, which in turn trains another neural network.

Now, de Freitas has built on this research with two new papers on learning to learn presented at the International Conference on Machine Learning (ICML). Taken together, they demonstrate learned algorithms’ ability to outperform hand-designed algorithms, transfer across tasks and produce better choices.

In the first ICML paper, researchers show that these learned algorithms can solve different and more complex tasks. This is an important development, as previous learned learning algorithms were unable to generalize to new problems and required too much memory and computational power to take on larger problems.

“More importantly, in some cases they are faster and more efficient than algorithms designed and refined by people over decades,” explains de Freitas (DeepMind). This research was done in collaboration with his colleagues at Google Brain, including Olga Wichrowska and Jascha Sohl-Dickstein, whom he met at a CIFAR meeting.

De Freitas likens learning algorithms and the algorithms they produce to evolution and creatures like humans.

“Evolution is a slow learning process that has led to the emergence of animals that are capable of learning at a rapid pace during their lifetime. Babies are born with brain structures that encode core knowledge, including algorithms necessary for learning English, manipulation, tennis, and Peppa Pig’s little brother’s name, provided they grow in suitable environments,” he says.

AI scientists are striving to replicate the process of how our hardwired learning algorithms improved through evolution — only faster. One of the techniques researchers use is gradient descent. Gradient descent is a form of learning by optimization whereby one follows the slope of an error function so as to minimize the number of errors as fast as possible.

“You can think of learning algorithms using a mountain range analogy. At a particular location, the height above sea level indicates how good a solution is; the lower the height, the fewer prediction errors the neural network makes. Here, the solution is parameterised by two coordinates: latitude and longitude. The learning algorithm is therefore a recipe for finding the location with the lowest height above sea level. The recipe prescribes how to change the latitude and longitude.

“In broad strokes, evolution is a recipe whereby one searches for the best location by trying many locations at random. In contrast, what we do with the gradient descent recipe is more like skiing. That is, we start at a random location and look for the slope with the biggest gradient, the double black diamond, and then go in that direction as fast as possible to find the solution with the lowest height or error,” he says.
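The contrast between the two recipes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `height` function is a made-up smooth bowl (not a real neural network error surface), and the search range, step size and number of steps are arbitrary choices that do not come from de Freitas' papers.

```python
import random

def height(lat, lon):
    """Hypothetical 'height above sea level': a bowl whose lowest point is at (1.0, -2.0)."""
    return (lat - 1.0) ** 2 + (lon + 2.0) ** 2

def slope(lat, lon):
    """Gradient of height with respect to latitude and longitude."""
    return 2.0 * (lat - 1.0), 2.0 * (lon + 2.0)

def random_search(steps, rng):
    """Try locations at random and keep the lowest one seen so far."""
    best = (rng.uniform(-5, 5), rng.uniform(-5, 5))
    for _ in range(steps):
        candidate = (rng.uniform(-5, 5), rng.uniform(-5, 5))
        if height(*candidate) < height(*best):
            best = candidate
    return best

def gradient_descent(steps, rng, step_size=0.1):
    """Start at a random location and repeatedly step down the steepest slope."""
    lat, lon = rng.uniform(-5, 5), rng.uniform(-5, 5)
    for _ in range(steps):
        g_lat, g_lon = slope(lat, lon)
        lat -= step_size * g_lat
        lon -= step_size * g_lon
    return lat, lon

rng = random.Random(0)
rs = random_search(50, rng)
gd = gradient_descent(50, rng)
print("random search height:   ", height(*rs))
print("gradient descent height:", height(*gd))
```

On a smooth surface like this one, following the slope closes in on the lowest point far more quickly than sampling locations at random, which is the intuition behind the skiing comparison.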

De Freitas’ 2016 paper on learning to learn used gradient descent for both learning and meta-learning. In the second ICML paper, he and his colleagues train the meta-learner with gradient descent, but the new meta-learner learns algorithms that are capable of learning in the absence of gradients. Gradient descent is effective and has brought about the successes of image and speech recognition. However, the ability to learn without gradients is important in situations where there are no mathematical models, like questions of eliciting preferences from people.
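The idea of using gradient descent at both levels can be illustrated with a deliberately tiny sketch. It is not the method from the papers: the learned "optimizer" here is a single learnable step size rather than a neural network, the tasks are made-up one-dimensional quadratics, and the meta-gradient is estimated by finite differences. All names (`inner_loss`, `meta_loss`, `meta_train`) and constants are assumptions chosen for illustration.

```python
import math
import random

def inner_loss(log_step, task, steps=5):
    """Learning: a few gradient-descent steps on one task, then report the final error."""
    curvature, target, x = task
    step = math.exp(log_step)  # the learned step size, kept positive
    for _ in range(steps):
        # gradient of 0.5 * curvature * (x - target)^2 with respect to x
        x -= step * curvature * (x - target)
    return 0.5 * curvature * (x - target) ** 2

def meta_loss(log_step, tasks):
    """Average final error of the learner across a set of tasks."""
    return sum(inner_loss(log_step, t) for t in tasks) / len(tasks)

def meta_train(tasks, log_step=-4.0, meta_lr=0.1, meta_steps=500, eps=1e-4):
    """Meta-learning: gradient descent on the step size itself."""
    for _ in range(meta_steps):
        # central finite-difference estimate of d(meta_loss)/d(log_step)
        grad = (meta_loss(log_step + eps, tasks)
                - meta_loss(log_step - eps, tasks)) / (2 * eps)
        log_step -= meta_lr * grad
    return log_step

rng = random.Random(0)
# Each hypothetical task is (curvature, target, starting point).
tasks = [(rng.uniform(0.5, 2.0), rng.uniform(-1.0, 1.0), rng.uniform(-3.0, 3.0))
         for _ in range(20)]
before = meta_loss(-4.0, tasks)
learned = meta_train(tasks)
after = meta_loss(learned, tasks)
print("error with initial step size:", before)
print("error with learned step size:", after)
```

The outer loop never sees how to solve any single task. It only adjusts the learner so that, on average, the learner ends up with a lower error after its few steps.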

“In some cases, such as trying different product design choices, mathematical models are not available. All we can hope is to repeatedly try different design choices and see how much users like them. The hope however is that in the process of trying a few times we form a memory model of the users’ preferences and reactions, and become better at designing products that meet their needs. The products can be anything, including new medical treatments for disease,” de Freitas says.
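The try-and-remember loop described in the quote can be sketched as follows. This is a hand-written stand-in, not the learned optimizer from the paper: `user_rating` is a hypothetical function that returns only a score (no gradient), a design is assumed to be a single number between 0 and 1, and the rule "propose something near the best design remembered so far" is an assumption made to keep the example short.

```python
import random

def user_rating(design):
    """Hypothetical user feedback: a score only, with an assumed peak at design = 0.7."""
    return -(design - 0.7) ** 2

def design_by_trial(trials, rng, width=0.5):
    """Repeatedly try designs, remembering every (design, rating) pair observed."""
    memory = []
    for _ in range(trials):
        if memory:
            # Propose a design near the best one remembered so far.
            best_design, _ = max(memory, key=lambda pair: pair[1])
            candidate = best_design + rng.gauss(0.0, width)
            candidate = min(1.0, max(0.0, candidate))  # stay within the allowed range
            width *= 0.95  # explore less as more feedback accumulates
        else:
            candidate = rng.uniform(0.0, 1.0)  # nothing remembered yet: guess
        memory.append((candidate, user_rating(candidate)))
    return memory

memory = design_by_trial(30, random.Random(0))
best_design, best_rating = max(memory, key=lambda pair: pair[1])
print("best design found:", best_design, "rating:", best_rating)
```

Only ratings are ever observed here, so the loop works even when no mathematical model of the users exists. In the research described above, the rule for choosing the next design is itself learned rather than written by hand.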

Learning how to better learn is a fundamental research question across fields. Psychologists have studied learning to learn for almost a century. In AI, early approaches to meta-learning date back to the 1990s with important foundational work by Learning in Machines & Brains Co-Directors Yann LeCun and Yoshua Bengio and Senior Fellow Max Welling. Discussions with them inspired and informed some of de Freitas’ research. De Freitas is currently exploring issues of scaling, generalization, application to reinforcement learning agents, and using this approach to advance computational neuroscience. 

“Our hope is that by letting the data speak, we will learn powerful learning algorithms that we had not thought of with our human intuitions,” he says.

“Learning to Learn without Gradient Descent by Gradient Descent” and “Learned Optimizers that Scale and Generalize” were presented at the International Conference on Machine Learning in Sydney from August 8-11. De Freitas was one of 12 CIFAR Fellows presenting at ICML 2017.
