Computers have a new music teacher to help them master the likes of Johann Sebastian Bach and Wolfgang Amadeus Mozart — starting with the basics.
Their teacher, MusicNet, is the first classical music data set of its kind for machine learning algorithms. Algorithms can be trained on hundreds of hours of labelled recordings and automatically learn the features of music, but until now music data sets were often too specific or used songs that weren’t publicly available. This new music collection contains 330 freely licensed recordings with more than a million labels indicating each note and the instrument playing it.
Associate Fellow Zaid Harchaoui (University of Washington), with colleagues John Thickstun and Sham Kakade, released MusicNet Nov. 30 to help fellow researchers push music AI forward.
“As soon as we have large labelled music data sets we can design more expressive music features that would lead to better performances,” says Harchaoui, who was attending the Learning in Machines & Brains program meeting in Barcelona.
So far the MusicNet team has tested its ability to teach an algorithm to predict a missing note. Harchaoui says MusicNet is the first step towards models that could generate new music. There are programs with hard-coded composition rules that can generate music, but they do not learn from raw audio recordings.
“During my PhD in Paris I was looking for music data sets to automatically learn music features. The most advanced music feature representations were very smart and sophisticated, but they were also very specific and very tailored for particular instruments and music genres,” Harchaoui says.
The largest commercially available data set focuses on pop music, which poses a problem for learning from raw audio recordings because pop songs are copyrighted.
MusicNet was inspired by a call to action from Program Co-Director Yann LeCun (New York University). “Deep architectures often require a large amount of labeled data for supervised training, a luxury music informatics has never really enjoyed,” LeCun wrote in a 2012 paper. Since then, there have been successes in image recognition and generation from the ImageNet data set and similar breakthroughs in speech and video recognition.
“An enormous amount of the excitement around artificial intelligence in the last five years has been driven by supervised learning with really big datasets, but it hasn’t been obvious how to label music,” lead author John Thickstun, a doctoral student co-advised by Harchaoui and Kakade, told UW Today.
“You need to be able to say from three seconds and 50 milliseconds to 170 milliseconds, this instrument is playing an A. But that’s impractical or impossible for even an expert musician to track with that degree of accuracy.”
MusicNet can include this level of detail in a large-scale data set. A method known as dynamic time warping allowed researchers to combine the data from musical scores, including information such as notes and instruments, with live recordings. To begin, researchers synthesized a version of the score so that they weren’t comparing apples with oranges. Next, they used time warping to align the two sets of music moving at different speeds. The result was 34 hours of Beethoven, Mozart and Bach complete with labels algorithms can recognize.
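For readers curious about the alignment step, the idea behind dynamic time warping can be sketched in a few lines of Python. This is a minimal illustration, not the MusicNet team's code: the function name `dtw_path`, the use of one pitch number per time frame, and the toy sequences are all assumptions made for the example. A real pipeline would compare audio features of the synthesized score and the live recording rather than bare pitch numbers.

```python
def dtw_path(score, recording):
    """Align two sequences; return (total_cost, path of (score_idx, rec_idx))."""
    n, m = len(score), len(recording)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of score[:i] with recording[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(score[i - 1] - recording[j - 1])  # local mismatch
            cost[i][j] = d + min(cost[i - 1][j - 1],  # both advance
                                 cost[i - 1][j],      # score advances
                                 cost[i][j - 1])      # recording advances
    # Walk back from the end to recover which frames were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


# Hypothetical toy data: one pitch number per frame (assumption for illustration).
synthesized_score = [60, 62, 64, 65]                # played "as written"
live_recording = [60, 60, 62, 62, 62, 64, 65, 65]   # same notes, slower tempo

total_cost, path = dtw_path(synthesized_score, live_recording)

# Transfer the score's note labels onto each frame of the recording.
labels = [None] * len(live_recording)
for score_idx, rec_idx in path:
    labels[rec_idx] = synthesized_score[score_idx]
```

In this toy case the slower performance is stretched onto the score with no mismatch, so every frame of the recording inherits the note label from the matching position in the score. That is the sense in which the two versions, moving at different speeds, are brought into line.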
This method yields highly accurate labels, but the researchers still checked the hours of aligned recordings systematically. Thankfully, they all say they love classical music.
“I play the piano and I enjoy listening to music. It makes working on the data easier,” Harchaoui says.
The MusicNet data set and the companion paper “Learning Features of Music from Scratch” were published Nov. 29, and the paper is under review as a conference paper at ICLR 2017.