A new computer model learns to recognize and create handwritten characters just as well as people, and can even invent new characters that look correct to the human eye.
The model learned to recognize 1,600 types of handwritten characters in 50 alphabets including Latin, Greek and Sanskrit. To build it, researchers used a new framework called Bayesian Program Learning that imitates the way that humans learn.
“The ultimate learning machines are humans,” says Ruslan Salakhutdinov (University of Toronto), a senior fellow in CIFAR’s program in Learning in Machines & Brains (formerly known as Neural Computation & Adaptive Perception). He co-authored the paper in Science describing the new research, along with Joshua Tenenbaum (Massachusetts Institute of Technology) and lead author Brenden Lake (New York University).
Humans can see an example of something, say a car, only once and immediately learn a great deal about it. They can relate it to categories of objects they already know, such as bikes and motorcycles, draw what it looks like, and understand its components, such as wheels and doors. By contrast, it takes hundreds or thousands of examples for a computer to learn to recognize and classify an object in an image.
Lake, who is a cognitive scientist and a computer scientist, noticed that humans were also excellent at learning and reproducing handwritten characters from only one example, even when the characters were in an unfamiliar script such as Tibetan. He found people could also easily invent new characters in a similar style.
“People are very creative. It’s very hard for machines to do that,” says Salakhutdinov. Bayesian Program Learning tries to mimic that ability, in part by incorporating prior knowledge in order to learn new concepts. For example, the model can learn that in handwriting people tend to write in short, unbroken strokes. Building on that knowledge, it learns to recognize new concepts about writing in other languages and generate new examples.
For instance, the model might first learn only how to generate a simple stroke. Then it can build on that ability to combine strokes. Based on what it learns, the model generates hypotheses about what characters look like and how each stroke leads to the next.
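To make the idea concrete, here is a toy sketch in Python of scoring competing stroke-by-stroke hypotheses for a character. It is not the researchers' model: the stroke names, the background characters, the noise level and every function name are assumptions made up for illustration. It only shows the general Bayesian pattern of combining prior knowledge about common strokes with how well each hypothesis explains what was observed.

```python
import math
from collections import Counter

# Hypothetical stroke sequences from characters the learner already knows.
# The stroke names are illustrative assumptions, not taken from the paper.
BACKGROUND = [
    ["down", "across"],
    ["down", "arc"],
    ["down", "across", "across"],
    ["arc"],
]


def learn_prior(characters):
    """Estimate how common each stroke is across known characters."""
    counts = Counter(stroke for char in characters for stroke in char)
    total = sum(counts.values())
    return {stroke: n / total for stroke, n in counts.items()}


def log_prior(hypothesis, prior, floor=1e-3):
    """Log-probability of a stroke sequence under the learned prior.

    Strokes never seen before get a small floor probability (an assumption).
    """
    return sum(math.log(prior.get(stroke, floor)) for stroke in hypothesis)


def log_likelihood(observed, hypothesis, noise=0.1):
    """Score how well a hypothesis explains the observed strokes.

    Each stroke is assumed to be misread with probability `noise`.
    """
    if len(observed) != len(hypothesis):
        return float("-inf")
    return sum(
        math.log(1 - noise if o == h else noise)
        for o, h in zip(observed, hypothesis)
    )


def best_hypothesis(observed, candidates, prior):
    """Pick the candidate with the highest prior-plus-likelihood score."""
    return max(
        candidates,
        key=lambda c: log_prior(c, prior) + log_likelihood(observed, c),
    )


if __name__ == "__main__":
    prior = learn_prior(BACKGROUND)
    observed = ["down", "zigzag"]  # second stroke is unclear
    candidates = [["down", "across"], ["down", "arc"], ["arc"]]
    print(best_hypothesis(observed, candidates, prior))
```

In this sketch both two-stroke candidates explain the unclear observation equally well, so the prior decides: "across" appeared more often than "arc" in the background characters, and the first candidate wins. The real system works with far richer stroke and character representations than these labels.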
The approach drastically reduces the number of examples needed to learn a handwritten character.
The researchers also tested the model’s knowledge by comparing its outputs to humans’ and asking judges to tell them apart. Three quarters of the judges had difficulty telling the difference between the computer-generated characters and the human-drawn ones.
Salakhutdinov was a PhD student of CIFAR Distinguished Fellow Geoffrey Hinton (University of Toronto). They co-authored a seminal machine learning paper a decade ago describing an algorithm that learned handwritten numerals through thousands of training examples. That model used an approach called deep learning, which is now in use by major companies such as Google, Microsoft and Baidu.
The new model performed better than recent deep learning models, but Salakhutdinov says there could be ways of integrating them to improve performance even more. The main difference is that deep learning is “data hungry,” he says, requiring thousands of examples, whereas Bayesian Program Learning tries to learn from prior knowledge and incorporate creativity.
He says applications could include speech recognition that is more humanlike. Current speech recognition models struggle to learn rare words. Salakhutdinov gives the example of the name “Darth Vader.” The first time a person hears it, they may learn some context about where it comes from, but after that they will recognize it and connect it with Star Wars. An algorithm that could learn like that would have a much better vocabulary.
“Imagine that you are building a robot that can roll around and learn tens of thousands of different concepts. It needs to learn just like kids do — the first time they see something, they learn it.”
Just as kids quickly learn what a high five means, a robot built successfully to learn this way could recognize gestures upon first encounter too. So your hypothetical robot could discuss the next Star Wars movie and give you a high five when you buy the tickets.