Artificial Intelligence Goes to the Arcade

A shaky video, recorded with a mobile phone and smuggled out of the inaugural First Day of Tomorrow technology conference, in April, 2014, shows an artificially intelligent computer program in its first encounter with Breakout, the classic Atari arcade game. Prototyped in 1975 by Steve Wozniak, the co-founder of Apple, with assistance from Steve Jobs, the other co-founder of Apple, Breakout is a variant of Pong, in which a player knocks bricks from a wall by hitting a ball against it. After half an hour of play, the A.I. program is doing about as well as I would, which is to say not very—but it is trying to move its paddle toward the ball, apparently grasping the rudiments of the game. After another thirty minutes, two hundred rounds in, the A.I. has become a talented amateur: it misses the ball only every third or fourth time. The audience laughs; isn’t this cool?

Then something happens. By the three hundredth game, the A.I. has stopped missing the ball. The auditorium begins to buzz. Demis Hassabis, the program’s creator, advances to the next clip in his video presentation. The A.I. uses four quick rebounds to drill a hole through the left-hand side of the wall above it. Then it executes a killer bank shot, driving the ball into the hole and through to the other side, where it ricochets back and forth, destroying the entire wall from within. Now there are exclamations, applause, and shocked whispers from the crowd. Hours after encountering its first video game, and without any human coaching, the A.I. has not only become better than any human player but has also discovered a way to win that its creator never imagined.

Today, in a paper published in Nature, Hassabis and his colleagues Volodymyr Mnih, Koray Kavukcuoglu, and David Silver reveal that their A.I. has since achieved the same feat with an angling game (Fishing Derby, 1980), a chicken-crossing-the-road game (Freeway, 1981), an armored-vehicle game (Robot Tank, 1983), a martial-arts game (Kung-Fu Master, 1984), and twenty-five others.* In more than a dozen of them, including Stargunner and Crazy Climber, from 1982, it made the best human efforts look pathetic. The Nature article appears just over a year after Hassabis’s company, DeepMind, made its public début; Google bought the firm for six hundred and fifty million dollars in January, 2014, soon after Hassabis first demonstrated his program’s superhuman gaming abilities, at a machine-learning workshop in a Harrah’s casino on the edge of Lake Tahoe. That program, the DeepMind team now claims, is a “novel artificial agent” that combines two existing forms of brain-inspired machine intelligence: a deep neural network and a reinforcement-learning algorithm.

Deep neural networks rely on layers of interconnected processing units, known as nodes, to filter raw sensory data into meaningful patterns, just as networks of neurons do in the brain. Apple’s Siri uses such a network to decipher speech, sorting sounds into recognizable chunks before drawing on contextual clues and past experiences to guess at how best to group them into words. Siri’s deductive powers improve (or ought to) every time you speak to her or correct her mistakes. The same technique can be applied to decoding images. To a computer with no preëxisting knowledge of brick walls or kung fu, the pixel data that it receives from an Atari game is meaningless. Rather than staring uncomprehendingly at the noise, however, a program like DeepMind’s will start analyzing those pixels—sorting them by color, finding edges and patterns, and gradually developing an ability to recognize complex shapes and the ways in which they fit together.
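
For readers who want a concrete picture, the sketch below (in Python, using the PyTorch library) shows the general shape of such a network: convolutional layers that scan the pixels for edges and shapes, feeding into layers that assign a score to each possible joystick move. The layer sizes, the 84-by-84 frame dimensions, and the name TinyAtariNet are illustrative assumptions of mine, not details taken from DeepMind’s paper.

```python
# A minimal, illustrative sketch (not DeepMind's actual network): a small
# convolutional network that turns a stack of game frames into one score
# per possible joystick action. Layer sizes here are arbitrary choices.
import torch
import torch.nn as nn

class TinyAtariNet(nn.Module):
    def __init__(self, n_actions: int):
        super().__init__()
        # Convolutional layers scan the pixels for edges and simple shapes.
        self.features = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4),  # four stacked 84x84 frames in
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),
            nn.ReLU(),
        )
        # Fully connected layers combine those shapes into a value per action.
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256),
            nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(frames))

# One batch of four stacked 84x84 grayscale frames -> a score for each of six moves.
net = TinyAtariNet(n_actions=6)
action_values = net(torch.zeros(1, 4, 84, 84))
print(action_values.shape)  # torch.Size([1, 6])
```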

The program’s second, complementary form of intelligence—reinforcement learning—allows for a kind of unsupervised obedience training. DeepMind’s A.I. starts each game like an unhousebroken puppy. It is programmed to find a score rewarding, but is given no instruction in how to obtain that reward. Its first moves are random, made in ignorance of the game’s underlying logic. Some are rewarded with a treat—a score—and some are not. Buried in the DeepMind code, however, is an algorithm that allows the juvenile A.I. to analyze its previous performance, decipher which actions led to better scores, and change its future behavior accordingly. Combined with the deep neural network, this gives the program more or less the qualities of a good human gamer: the ability to interpret the screen, a knack for learning from past mistakes, and an overwhelming drive to win.
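
In its simplest, tabular form, that reinforcement-learning algorithm can be written in a few lines. The sketch below, in Python, illustrates only the idea; DeepMind pairs this kind of update with its deep neural network rather than a lookup table, and the constants and function names here are assumptions of mine, not the paper’s.

```python
# An illustrative sketch of the reinforcement-learning idea in its simplest,
# tabular form. All names and constants below are arbitrary choices.
import random
from collections import defaultdict

ALPHA = 0.1   # learning rate: how strongly each outcome revises old estimates
GAMMA = 0.99  # discount: how much future score matters relative to immediate score

q_values = defaultdict(float)  # estimated value of taking `action` in `state`

def choose_action(state, actions, epsilon=0.1):
    """Mostly pick the best-known action, but occasionally explore at random."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q_values[(state, a)])

def learn(state, action, reward, next_state, actions):
    """Q-learning update: nudge the estimate toward the reward received plus
    the best estimated value of whatever situation the action led to."""
    best_next = max(q_values[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    q_values[(state, action)] += ALPHA * (target - q_values[(state, action)])
```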

Whipping humanity’s ass at Fishing Derby may not seem like a particularly noteworthy achievement for artificial intelligence—nearly two decades ago, after all, I.B.M.’s Deep Blue computer beat Garry Kasparov, a chess grandmaster, at his own more intellectually aspirational game—but according to Zachary Mason, a novelist and computer scientist, it actually is. Chess, he noted, has an extremely limited “feature space”; the only information that Deep Blue needed to consider was the positions of the pieces on the board, during a span of not much more than a hundred turns. It could play to its strengths of perfect memory and brute-force computing power. But in an Atari game, Mason said, “there’s a byte or so of information per pixel” and hundreds of thousands of turns, which adds up to much more and much messier data for the DeepMind A.I. to process. In this sense, a game like Crazy Climber is a closer analogue to the real world than chess is, and in the real world humans still have the edge. Moreover, whereas Deep Blue was highly specialized, and preprogrammed by human grandmasters with a library of moves and rules, DeepMind is able to use the same all-purpose code for a wide array of games.
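
To put rough numbers on Mason’s comparison, here is a back-of-the-envelope calculation; the Atari screen size and the turn counts are illustrative assumptions of mine, not figures from the paper.

```python
# A rough, illustrative comparison of the amount of data each program sees.
# The screen dimensions and turn counts below are assumptions, not measurements.
chess_state = 64            # squares on the board, one piece code each
chess_turns = 100           # "not much more than a hundred turns"

atari_frame = 210 * 160     # roughly "a byte or so of information per pixel"
atari_turns = 100_000       # "hundreds of thousands of turns"

print(f"chess: ~{chess_state * chess_turns:,} bytes seen per game")
print(f"atari: ~{atari_frame * atari_turns:,} bytes seen per game")
```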

That adaptability holds promise. Hassabis has begun partnering with satellite operators and financial institutions to see whether his A.I. could eventually “play” their data sets, perhaps learning to make weather predictions or trade oil futures. In the short term, though, his team has a more modest next step in mind: to design a program that can play video games from the nineteen-nineties. Hassabis, who began working as a game designer in 1994, at the age of seventeen, and whose first project was the Golden Joystick-winning Theme Park, in which players got ahead by, among other things, hiring restroom-maintenance crews and oversalting snacks in order to boost beverage sales, is well aware that DeepMind’s current system, despite being state of the art, is at least five years away from being a decade behind the gaming curve. Indeed, the handful of games in which DeepMind’s A.I. failed to achieve human-level performance were the ones that required longer-term planning or more sophisticated pathfinding—Ms. Pac-Man, Private Eye, and Montezuma’s Revenge. One solution, Hassabis suggested, would be to make the A.I. bolder in its decision-making, and more willing to take risks. Because of the rote reinforcement learning, he said, “it’s overexploiting the knowledge that it already knows.”

In the longer term, after DeepMind has worked its way through Warcraft, StarCraft, and the rest of the Blizzard Entertainment catalogue, the team’s goal is to build an A.I. system with the capability of a toddler. But this, Hassabis said, they are nowhere near reaching. For one thing, he explained, “toddlers can do transfer learning—they can bring to bear prior knowledge to a new situation.” In other words, a toddler who masters Pong is likely to be immediately good at Breakout, whereas the A.I. has to learn both from scratch. Beyond that challenge lies the much thornier question of whether DeepMind’s chosen combination of a deep neural network and reinforcement learning could, on its own, ever lead to conceptual cognition—not only a fluency with the mechanics of, say, 2001’s Sub Command but also an understanding of what a submarine, water, or oxygen is. For Hassabis, this is “an open question.”

Zachary Mason is less sanguine. “Their current line of research leads to StarCraft in five or ten years and Call of Duty in maybe twenty, and controllers for drones in live battle spaces in maybe fifty,” he told me. “But it never, ever leads to a toddler.” Most toddlers cannot play chess or StarCraft. But they can interact with the real world in sophisticated ways. “They can find their way across a room,” Mason said. “They can see stuff, and as the light and shadows change they can recognize that it’s still the same stuff. They can understand and manipulate objects in space.” These kinds of tasks—the things that a toddler does with ease but that a machine struggles to grasp—cannot, Mason is convinced, be solved by a program that excels at teaching itself Breakout. They require a model of cognition that is much richer than what Atari, or perhaps any gaming platform, can offer. Hassabis’s algorithm represents a genuine breakthrough, but it is one that reinforces just how much distance remains between artificial intelligence and the human mind.

*Correction: An earlier version of this post mischaracterized the video game Freeway.