The journey towards creating artificial intelligence has been slower and more difficult than early pioneers predicted. Modern computers are truly impressive in many ways.
But they don’t hold a candle to the human brain when it comes to picking a face out of a crowd, understanding a pun, composing a symphony, or any of hundreds of things humans do.
But a new approach pioneered by CIFAR fellows is shaking up the world of artificial intelligence. Over the past decade the fellows have championed a technique called deep learning and made it one of the hottest areas in artificial intelligence.
If you use the voice recognition feature on an Android phone, you already benefit from deep learning networks, which improved voice recognition by 25 per cent over the best existing techniques. A similar improvement in image recognition led Google to implement deep learning techniques on its Google+ service last year.
In fact, the Internet giants are happily snatching up CIFAR fellows to work on their artificial intelligence efforts. Last year Google recruited Geoffrey Hinton, until recently director of CIFAR’s Learning in Machines & Brains program (formerly known as Neural Computation & Adaptive Perception), to work in its artificial intelligence laboratory. Shortly after, Facebook followed suit, hiring Senior Fellow Yann LeCun to set up a new artificial intelligence laboratory for them. And since 2011 Senior Fellow Andrew Ng has been at Google, where he developed the Google Brain neural network.
“NCAP was crucial,” says Hinton. “The fundamental idea of CIFAR, which is to get the best people and put them in contact where they can exchange ideas, worked really well.”
The digital giants are interested in deep learning because the technique promises to allow their computers to sift through millions of photos and videos and describe them as accurately as any human could; or to understand natural language beyond the level of simple keyword searching; or, perhaps, to make better predictions about which ads we’re likely to click on.
“The interest of large companies in artificial intelligence is really focused today on deep learning,” says LeCun. “And deep learning was basically a CIFAR-funded conspiracy.”
From Pong to neurons
From the most primitive game of Pong to the most sophisticated supercomputer climate model, conventional computer programs consist of precisely written, step-by-step instructions that have to be carried out exactly as written. Although these programs can be fantastically complex, they consist of steps written down by a human programmer who had to figure out just what the program should do and how it should do it.
But as early as the 1950s, some computer scientists became interested in another direction. They began to experiment with artificial neural networks, loosely modelled on the workings of the human brain. Rather than being programmed, these networks were trained, learning from experience to arrive at the right answer.
It seemed like a good idea. Consider all of the ways a picture of a cat can look. It can be in different colours, taken from different angles, by itself or in the frame with other objects or animals, etc. The neural network between our ears does a great job of picking out cats. But how do you write an algorithm describing a step-by-step process of recognizing a cat?
The promise of neural networks was that you wouldn’t have to. You could simply show the neural network a lot of pictures of cats and let it learn what they looked like.
There was a lot of interest, but there were also a lot of problems. For one thing, the networks weren’t always easy to train. You had to collect a lot of data and label it – picture a team of grad students getting together hundreds or thousands of photos and making sure the ones labeled “cat” really had a cat, and the ones labeled “not cat” really didn’t. Then after you’d trained the network, you had to use even more examples to make sure the network worked on photos it hadn’t been trained on.
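The labelling-and-checking routine described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the photo file names, the 80/20 split and the helper names `split_labeled_examples` and `accuracy` are all hypothetical, not anything the researchers used.

```python
import random


def split_labeled_examples(examples, holdout_fraction=0.2, seed=0):
    """Shuffle labeled examples and set some aside that training never sees."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    # Return (training set, held-out set).
    return shuffled[n_holdout:], shuffled[:n_holdout]


def accuracy(predict, examples):
    """Fraction of examples for which the prediction matches the label."""
    correct = sum(1 for item, label in examples if predict(item) == label)
    return correct / len(examples)


# Hypothetical labeled photos: (file name, label) pairs.
examples = [(f"photo_{i}.jpg", "cat" if i % 2 == 0 else "not cat")
            for i in range(100)]
train, held_out = split_labeled_examples(examples)
```

A network would be trained on `train` only, and `accuracy` would then be measured on `held_out` to see whether it works on photos it has never seen.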
Neural nets could be tough to train for other reasons. When they failed to work it could be hard to figure out why. Or they would seem to work, only to reveal later that they had been “overtrained” – they might have only learned to recognize some other common feature, like a random pattern of pixels that the particular collection of cat photos accidentally had in common.
After a surge of research in the 1980s, interest in neural nets had largely fallen away by the 1990s, to be replaced by other forms of machine learning. In fact, one respected journal was said to have stopped considering any paper with the term “neural network” in the title.
But Hinton didn’t give up on neural nets. He says it made sense to look towards the workings of the human brain to figure out better ways of achieving machine learning. After all, if we want to teach computers to perceive things the way we do, why not use a model that has been shaped by evolution to do just that? He wasn’t put off by the consensus of the field – he grew up as an atheist in a Christian school, and was used to trusting his own beliefs.
“When people said it’s irrelevant how the brain works, they were just utterly and obviously wrong,” Hinton says.
In 2004 Hinton and a number of other researchers including Senior Fellow Yoshua Bengio (Université de Montréal) formed CIFAR’s NCAP program and immediately began discussing how to make neural networks work better.
“It was a matter of time, but we had to convince the community that it was worth the effort to work on this,” LeCun says.
The community finally began to sit up and take notice in 2006 when Hinton and colleagues published a paper called “A fast learning algorithm for deep belief nets” in the journal Neural Computation. The paper described a new way to design better “deep” neural networks – that is, neural networks with three or more “hidden” layers between the input layer and the output layer.
The new technique trained one layer of the network at a time. Neurons in the first layer would learn to represent some feature of the data, for instance to distinguish a horizontal line. When the first layer had learned something, the data would be passed on to the next layer, which would learn to represent some other feature of the data – perhaps combining two or more shapes to learn to recognize an eyebrow. The next layer might contain a neuron that recognized the combination of an eyebrow and an eye. Essentially, each higher layer of the network would learn to operate at a higher and higher level of abstraction.
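The layer-at-a-time idea can be illustrated with a toy Python sketch. Note the assumptions: the 2006 paper used a different building block (restricted Boltzmann machines), whereas this sketch swaps in a small tied-weight autoencoder because it is easier to write out. The six-"pixel" patterns, the layer sizes and every name in the code are made up for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class Layer:
    """One hidden layer, trained alone as a tied-weight autoencoder."""

    def __init__(self, n_in, n_hidden, rng):
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                  for _ in range(n_hidden)]
        self.b_hid = [0.0] * n_hidden
        self.b_vis = [0.0] * n_in

    def encode(self, v):
        return [sigmoid(sum(w * x for w, x in zip(row, v)) + b)
                for row, b in zip(self.w, self.b_hid)]

    def decode(self, h):
        return [sigmoid(sum(self.w[j][i] * h[j] for j in range(len(h))) + b)
                for i, b in enumerate(self.b_vis)]

    def error(self, data):
        """Mean squared reconstruction error over a data set."""
        total = 0.0
        for v in data:
            r = self.decode(self.encode(v))
            total += sum((ri - vi) ** 2 for ri, vi in zip(r, v))
        return total / len(data)

    def train(self, data, epochs=200, lr=0.5):
        for _ in range(epochs):
            for v in data:
                h = self.encode(v)
                r = self.decode(h)
                # Error signal at the reconstruction, then at the hidden units.
                d_vis = [(ri - vi) * ri * (1 - ri) for ri, vi in zip(r, v)]
                d_hid = [sum(self.w[j][i] * d_vis[i] for i in range(len(v)))
                         * h[j] * (1 - h[j]) for j in range(len(h))]
                for j in range(len(h)):
                    for i in range(len(v)):
                        self.w[j][i] -= lr * (d_vis[i] * h[j] + d_hid[j] * v[i])
                    self.b_hid[j] -= lr * d_hid[j]
                for i in range(len(v)):
                    self.b_vis[i] -= lr * d_vis[i]


def train_layer_by_layer(data, hidden_sizes, seed=0):
    """Train each layer on the output of the layers below it."""
    rng = random.Random(seed)
    layers, inputs = [], data
    for n_hidden in hidden_sizes:
        layer = Layer(len(inputs[0]), n_hidden, rng)
        layer.train(inputs)
        layers.append(layer)
        inputs = [layer.encode(v) for v in inputs]  # feed the next layer
    return layers


# Toy six-"pixel" patterns, made up for illustration.
PATTERNS = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
layers = train_layer_by_layer(PATTERNS, hidden_sizes=[4, 2])
```

No labels appear anywhere in the sketch: each layer learns only by trying to reconstruct what it was given, and the second layer never sees the raw patterns, only the first layer's summary of them.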
Even more cats
Possibly even more exciting was that Hinton’s paper showed that these networks didn’t need to be supervised. You could set them loose on an unlabeled collection of images, and they could learn to recognize relevant features for themselves. After the initial learning was done you could come along and fine-tune the process and add labels, telling the network that this image was a car, this one an airplane, etc.
“If you think about how babies learn,” says LeCun, “they learn by themselves the notion of objects, the properties of objects, without being told specifically what those objects are. It’s only later that we give names to the objects. So most of the learning takes place in an unsupervised manner.”
In fact, last year Andrew Ng made news with a Google network that did this on a giant scale. Working with colleague Jeff Dean and the Google “brain team,” he created a deep learning network composed of 16,000 computer processors, and fed it 10 million images extracted at random from the Internet, with no labels. The network taught itself to recognize human faces, human bodies – and yes, cats.
Hinton’s 2006 paper created a groundswell of interest in neural networks, and researchers began working on them again.
Hinton says that part of the recent success of neural networks comes from the huge advances in both computing power and availability of data. Neural nets that were too complex to be practical on older, slower processors hummed along perfectly on modern workstations. And bigger computer memories and the Internet made it much easier to get big data sets to train the networks on.
“The main issue was that there wasn’t enough data and the computers weren’t fast enough,” Hinton says. “As soon as we got 1,000 times the data and computers a million times as fast, neural networks started beating everything else.”
What lies ahead
The NCAP program was created with the dual interest of figuring out how neurons can be used in computation, and how computers can be made to perceive patterns. In many ways, Hinton says, vision is a perfect problem for machine learning. We actually know a lot about how the brain processes vision, compared, for instance, to how it processes language.
As he continues his research, he sees two major challenges. First, he wants to push forward on unsupervised learning. Second, he has to tackle the problem of how to make neural networks work at larger and larger scales.
First is the unsupervised learning problem. Although Hinton’s 2006 paper gained a lot of attention because of the promise of unsupervised learning, once researchers dusted off the old neural nets and started running them on modern computers with big data sets, they realized that the old techniques worked well enough. Most of the neural net applications in use now were actually trained with labeled data. Nevertheless, supervised learning still has limitations.
“What we really want is something that will be like a person and will just understand the world. And there, unsupervised learning will be crucial,” Hinton says.
“For example, we’d like to be able to understand every video on YouTube. It would be nice if you could say, find me a video of a cat trying to jump on a shelf and falling. A person would understand exactly what you were saying. Right now, maybe machine learning methods could say there’s probably a cat in this video. They might even be able to say there’s a shelf. But the idea of a cat trying to jump on a shelf and failing, they don’t understand that. My prediction is that over the next five years, we’ll be able to understand that.”
The other problem is how to make neural networks “scale” – that is, how to make them work efficiently as they get bigger and bigger. Right now, Hinton says, the computing power you need is roughly the square of the speed increase you want. In other words, twice as much speed requires four times the computing power; 10 times the speed requires 100 times the computing power.
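The rule of thumb above is simple enough to write down directly. The function name `compute_needed` is hypothetical, and the square relationship is only the rough estimate Hinton describes, not a measured law.

```python
def compute_needed(speedup):
    """Relative computing power for a desired speedup, under the square rule."""
    return speedup ** 2


# The two cases mentioned in the text.
doubling = compute_needed(2)
tenfold = compute_needed(10)
```

Here `doubling` works out to 4 and `tenfold` to 100, matching the figures in the paragraph above.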
Hinton will split his time between the University of Toronto and Google, spending four months of the year at the corporation’s headquarters in Mountain View, as well as working in its Toronto office. He says he’s looking forward to the resources Google has – especially the data – as well as the researchers.
“They’ve got really smart people there, and they’ve got very interesting problems there. It’s quite nice when you do something for it to be in a billion Android phones.”
At Facebook, LeCun will have a similar situation. He’ll continue to teach at NYU, and will be able to set up the Facebook artificial intelligence lab practically across the street from his campus office. “The nice thing about Facebook is that if you come up with a better way of understanding natural language, or image recognition, it’s not like you have to create a whole business around it. It’s just a matter of setting it up for 1.3 billion users.”
Although Hinton has stepped down as director of NCAP, LeCun and Bengio have stepped up as co-directors of the program. LeCun says the program will continue to explore improvements in deep learning networks.
“There’s no question in my mind that the problem of learning representations of the world will have to be solved by an AI system we build. Deep learning is the only answer we have to that problem now,” he says.