In a development eerily reminiscent of Skynet’s evolution in The Terminator, the world’s greatest Go-playing computer has in turn been thrashed 100-0 by a newer version of itself.
Crucially, the new program, called AlphaGo Zero, learned its skills with zero human training – working out its own style entirely from scratch.
The autodidactic achievement, described in the journal Nature, marks a milestone in artificial intelligence research that could have far-reaching impacts beyond the realm of the board game’s 19×19 grid.
“Ultimately, we want to harness algorithmic breakthroughs like this to help solve all sorts of pressing real world problems, like protein-folding or designing new materials,” says Demis Hassabis, co-founder and CEO of UK company DeepMind, which developed the system.
“If we can make the same progress on these problems that we have with AlphaGo, it has the potential to drive forward human understanding and positively impact all of our lives.”
The sophistication of artificial intelligence has been marked for decades by game-playing ability. A computer first beat the world checkers champion in 1994. IBM’s Deep Blue beat chess grandmaster Garry Kasparov in 1997. But Go, called the most complex game that humans play, is on another level again, and it took computers the best part of another two decades to crack it.
Go possesses 10^170 possible board configurations. Chess, by comparison, has a puny 10^120 possible games. It’s often pointed out that the number 10^170 is more than the number of atoms in the universe. It is actually way more than that.
There are only about 10^80 atoms in our universe. If every atom in the universe was home to another entire universe, containing the same number of atoms, and you added up all of the atoms in all of these universes … the number of possibilities in a single game of Go is still higher.
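For the record, the arithmetic behind that comparison looks like this (the figures are the rough estimates quoted above, and Python handles the enormous integers exactly):

```python
# Rough orders of magnitude quoted in the text (estimates, not exact counts)
go_positions = 10**170        # possible Go board configurations
atoms_in_universe = 10**80    # atoms in the observable universe

# A universe inside every atom: 10^80 universes of 10^80 atoms each
atoms_in_universe_of_universes = atoms_in_universe * atoms_in_universe  # 10^160

print(go_positions > atoms_in_universe_of_universes)    # True
print(go_positions // atoms_in_universe_of_universes)   # 10^10: ten billion times more
```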
Phew.
So, this is why it’s taken so long for computers to surpass humans at the game. There are far too many possibilities to tackle by the brute force processing that computers are naturally good at. And that’s why it was such a big deal when AlphaGo defeated Go champion Lee Sedol by 4-1 last year.
To do this, DeepMind abandoned the brute force approach, and instead programmed AlphaGo to use deep learning, running on neural networks — a style of processing that mimics the connections of the brain.
Deep learning is the process by which a program comes to recognise particular objects after being exposed to multiple depictions of them. It is the same kind of technique that lets Facebook identify photos of your friends. (To do this, in 2012 Google built a massive computer comprising 16,000 processors and instructed it to pick out faces from 20,000 randomly selected YouTube clips. It succeeded more than 80% of the time.)
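As a loose illustration of that idea, the toy Python sketch below trains a single artificial “neuron” to tell two clusters of points apart simply by being shown labelled examples over and over. It is a deliberately simplified stand-in for the deep, many-layered networks these companies actually use:

```python
import numpy as np

rng = np.random.default_rng(0)

# 200 labelled examples: class 1 clusters around (2, 2), class 0 around (-2, -2)
X = np.vstack([rng.normal(2, 1, (100, 2)), rng.normal(-2, 1, (100, 2))])
y = np.array([1] * 100 + [0] * 100)

w, b = np.zeros(2), 0.0                      # the 'connections' start out knowing nothing
for _ in range(500):                         # repeated exposure to the same examples
    p = 1 / (1 + np.exp(-(X @ w + b)))       # current guesses for every example
    w -= 0.1 * (X.T @ (p - y)) / len(y)      # nudge the connections toward better guesses
    b -= 0.1 * np.mean(p - y)

guesses = (1 / (1 + np.exp(-(X @ w + b)))) > 0.5
print(f"examples classified correctly: {np.mean(guesses == y):.0%}")
```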
To decide on a move, AlphaGo ‘looks ahead’, essentially playing out millions of scenarios (“If I go here, then he goes there … “) following branching possibilities and weighing them according to the likelihood of future success. Then it picks the move that seems most promising.
Successful outcomes feed back into the system, strengthening the neural network connections in a way analogous to how you or I learn a new skill.
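A toy sketch gives the flavour of that look-ahead loop. The code below is not DeepMind’s search (AlphaGo’s version is vastly more sophisticated and guided by its neural network); it simply plays out many random games for each candidate move in a simple take-away game and keeps the move that wins most often:

```python
import random

# Toy game: players alternately remove 1-3 stones; whoever takes the last stone wins.

def playout(stones, my_turn):
    """Finish the game with random moves and report whether 'I' end up winning."""
    while stones > 0:
        stones -= random.randint(1, min(3, stones))
        if stones == 0:
            return my_turn        # the player who just moved took the last stone
        my_turn = not my_turn
    return False

def choose_move(stones, simulations=5000):
    """'If I go here, then they go there...': score each move by its simulated win rate."""
    best_move, best_rate = None, -1.0
    for take in range(1, min(3, stones) + 1):
        if stones - take == 0:
            return take           # immediate win, no need to look further
        wins = sum(playout(stones - take, my_turn=False) for _ in range(simulations))
        rate = wins / simulations
        if rate > best_rate:
            best_move, best_rate = take, rate
    return best_move

# With 10 stones left this usually settles on taking 2, the perfect-play move,
# because it leaves the opponent facing a losing multiple of four.
print(choose_move(10))
```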
The version of AlphaGo that scalped Lee Sedol (called AlphaGo Lee) was taught through a combination of supervised learning, in which it digested the records of millions of human Go moves, and self-play. Its training took several months.
But the new upstart, AlphaGo Zero, is different. It is much more sophisticated in how its neural network is tuned and updated to predict moves and the eventual winner of the games. It also requires much less computing power.
All this means AlphaGo Zero can learn solely from playing itself. The only human input is to tell it the rules of the game.
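To make that concrete, here is a heavily simplified, hypothetical sketch of the self-play loop, again using the toy take-away game from above rather than Go, and a simple value table in place of AlphaGo Zero’s deep neural network and tree search. The only input is the rules; everything else is learned from the outcomes of games the program plays against itself:

```python
import random

MAX_TAKE, START = 3, 10            # the rules: the only human input
value = {}                         # learned value of playing a given move in a given position

def legal_moves(stones):
    return range(1, min(MAX_TAKE, stones) + 1)

def pick(stones, explore=0.2):
    """Mostly play the best-known move, occasionally try something new."""
    if random.random() < explore:
        return random.choice(list(legal_moves(stones)))
    return max(legal_moves(stones), key=lambda m: value.get((stones, m), 0.0))

for _ in range(20000):             # self-play: both sides use the same, improving policy
    stones, history, player, winner = START, {0: [], 1: []}, 0, None
    while stones > 0:
        move = pick(stones)
        history[player].append((stones, move))
        stones -= move
        if stones == 0:
            winner = player        # taking the last stone wins
        player = 1 - player
    for p, moves in history.items():     # successful outcomes feed back into the values
        reward = 1.0 if p == winner else -1.0
        for state_move in moves:
            old = value.get(state_move, 0.0)
            value[state_move] = old + 0.01 * (reward - old)

# After enough self-play the learned policy should settle on taking 2 from 10 stones,
# which is the perfect-play move.
print(max(legal_moves(START), key=lambda m: value.get((START, m), 0.0)))
```

AlphaGo Zero replaces the table with a deep network that predicts both the next move and the eventual winner, and uses its tree search to generate far stronger training games, but the feedback loop is the same in spirit.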
“AlphaGo Zero becomes its own teacher,” wrote DeepMind researchers recently on the company blog.
This self-learning ability sets Zero apart from other artificial intelligences, which typically need to be fed terabytes of data to learn anything of use. That makes the system far more nimble than other more narrowly focussed computing celebrities, such as the chess-playing Deep Blue, or Watson, the Jeopardy champion.
Over three days of training, the DeepMind team tracked AlphaGo Zero’s fascinating learning process.
The program started out tossing stones on the board at random. After three hours, the system’s strategy was “greedy stone-capturing”, indicative of the human beginner. But by the 70-hour mark, it had developed a mature, sophisticated style.
Along the journey, AlphaGo Zero rediscovered many known corner sequences, or joseki, that players have gradually catalogued over the centuries. Ultimately, it favoured other, previously unknown joseki. These seem to be superior to any worked out by humans.
After three days, Zero was pitted against AlphaGo Lee.
The showdown was a David vs Goliath match-up. AlphaGo Zero, the contender, was running on a single machine with just four processors, having been introduced to the game only a few days before. AlphaGo Lee, the champ, was using 12 times more processing power, spread over many machines, and was a veteran of the sport.
Perhaps the ‘zero’ refers to the likely scoreline of its opponent. AlphaGo Zero crushed its predecessor a whopping 100 games to zero. Fortunately, the loser was never programmed to feel humiliation.