DARPA-funded researchers at the University of Maryland have developed a system that enables a robot to interpret, learn and perform the tasks demonstrated in YouTube cooking videos.

Engadget has details on the work led by University of Maryland computer scientist Yiannis Aloimonos:

The team's research is funded by DARPA's Mathematics of Sensing, Exploitation and Execution (MSEE) program, which aims to teach machines not only how to collect data, but also how to act on it. For this particular study, the researchers have developed a system that allowed their test robots to learn from a series of "how-to" cooking videos on YouTube. During testing, the robots were able to perform the tasks shown in the videos using the right utensils and with zero human input.


It makes you wonder what else these bots might be capable of learning from YouTube. Surely, there are some videos that should go unseen by robot eyes. For instance, can we all agree, here and now, that these machines should never be shown this video of human hands titillating a pair of robotic butt cheeks?

I mean, knowledge is power and all that. But we don't want these robots getting any ideas.


Jokes aside, this is pretty cool. The team describes its two biggest contributions in a working paper (which you can access here):

(1) A convolutional neural network (CNN) based method has been adopted to achieve state-of-the-art performance in grasping type classification and object recognition on unconstrained video data; (2) a system for learning information about human manipulation action has been developed that links lower level visual perception and higher level semantic structures through a probabilistic manipulation action grammar.

More on the contemporary applications of convolutional neural networks here.


[DARPA via Engadget]