How an intern helped build the AI that shook the world

AlphaGo’s victory broadcast on TV

Im Hun-jung/Yonhap/AP Photo via Getty Images

In March 2016, Google DeepMind’s artificial intelligence system AlphaGo shocked the world. In a stunning five-game match of Go, the ancient Chinese board game, the AI beat Lee Sedol, one of the world’s best players. The match was watched on television by millions and hailed by many as a historic milestone in the development of artificial intelligence.

Chris Maddison, now a professor of artificial intelligence at the University of Toronto, was then a master’s student and helped get the project off the ground. It all began when Ilya Sutskever, who later went on to co-found OpenAI, got in touch…

Alex Wilkins: How did the idea for AlphaGo first come about?

Chris Maddison: Ilya [Sutskever] gave me the following argument for why we should be working on Go. He said, Chris, do you think when an expert player looks at the Go board, they can pick the best move in half a second? If you think they can, then that means that you can learn a pretty good policy to pick the best move using a neural net.

The reason is that half a second is about the time it takes for your visual cortex to do one forward pass [a round of processing], and we already knew from ImageNet [an important AI image-recognition competition] that we’re pretty good at approximating things that only take one forward pass of your visual cortex.

I bought that argument, so I decided to join [Google Brain] as an intern in the summer of 2014.

How did AlphaGo develop from there?

When I joined, there was another little team at DeepMind, Aja Huang and David Silver, that had started working on Go, and I was going to work with them. My charge was basically to start building the neural networks. It was a dream.

There were a bunch of different approaches that we tried, and a lot of the initial things we tried failed. Eventually, I just got frustrated and tried the dumbest, simplest thing, which was to predict the next move that an expert would make in a given board position, training a neural network on a big corpus of expert games. And that turned out to be the approach that really got us off the ground.

By the end of the summer, we hosted a little match with DeepMind’s Thore Graepel, who considered himself a decent Go player, and my networks beat him. DeepMind then became convinced that this was going to be a real thing and started putting resources towards it and building a big team around it.

How difficult a challenge did beating Lee Sedol seem?

I remember in the summer of 2014, we practically had Lee Sedol’s portrait on our desk next to us. I’m not a Go player, but Aja [Huang] is. Every time I would build a new network, it would get a little bit better, and I would turn to Aja and I’d say, OK, we’re a little bit better, how close are we to Lee Sedol? And Aja would turn to me and say, Chris, you don’t understand. Lee Sedol is one stone from God.

You left the AlphaGo team before the big event. Why?

David [Silver] said, we’d like to keep you on and really drive this project to the next level. In retrospect, this was maybe one of the stupider decisions I’ve made: I turned him down. I said, I think I need to focus on my PhD; I’m an academic at heart. I went back to my PhD and loosely consulted with the project from that point on. I’m a little proud to say it took them a while to beat my neural networks. But ultimately, the artefact that played Lee Sedol was the product of a big engineering effort and a big team.

What was the atmosphere like in Seoul when AlphaGo won?

It’s hard to express what being there in Seoul at that moment was like. It was emotional. It was intense. There was a sense of anxiety. You go in confident, but you never know. It’s like a sports game. Statistically speaking, you’re the better player, but you never know how it’s going to shake out. I remember being in the hotel where we played the matches and looking out the window. We were high enough up that you could look out onto one of the major city intersections. I realised there was a big screen, sort of like Times Square, that was showing our match. And then I looked along the sidewalks, and people were lined up, standing and looking at the screen. I had heard numbers like hundreds of millions of people in China watched the first game, but I remember that moment as, oh God, we’ve really stopped East Asia in its tracks.

How important has AlphaGo been for AI more generally?

On the surface, a lot has changed: large language models (LLMs) are now quite different from AlphaGo in some ways. But there’s an underlying technological thread that really hasn’t changed.

The first part of AlphaGo’s algorithm was to train a neural network to predict the next move. Today’s LLMs begin with what we call pretraining: learning to predict the next word from a big corpus of human text, found largely on the internet.

For the second step in AlphaGo, we took the knowledge from that human corpus, compressed into these neural networks, and refined it using reinforcement learning to align the behaviour of the system with the goal of winning games.

When you learn to predict an expert’s next move, the expert is trying to win, but that’s not the only thing that explains their move. Perhaps they don’t see what the best move is; perhaps they made a mistake. So you need to align the overall system with your true goal, which in the case of AlphaGo was winning.

It’s the same for large language models after pretraining. The networks are not yet aligned with how we want to use them, so we apply a series of reinforcement learning steps that align them with our goals.

In some ways, not much has changed.
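
The two-stage recipe described above, imitating experts first and then refining toward the true goal with reinforcement learning, can be illustrated with a toy sketch. This is not DeepMind’s code, and everything in it is a hypothetical stand-in: a made-up game with two board positions and three legal moves, a lookup-table softmax policy in place of a deep neural network, and REINFORCE, a standard policy-gradient method chosen here for simplicity, as the reinforcement learning stage. In the invented expert data, the expert usually plays move 0 in position 0 even though move 1 is the one that wins, mirroring the point that an expert’s move is not always the best one.

```python
import math
import random

NUM_MOVES = 3  # hypothetical toy game: three legal moves per position

def softmax(logits):
    """Turn a list of scores into move probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pretrain(corpus, num_states, steps=500, lr=1.0):
    """Stage 1: supervised next-move prediction on expert (state, move) pairs.

    Full-batch gradient ascent on the log-likelihood of a tabular softmax
    policy; the gradient w.r.t. the logits is (expert move frequency - probs).
    """
    logits = [[0.0] * NUM_MOVES for _ in range(num_states)]
    for _ in range(steps):
        for s in range(num_states):
            moves = [a for (st, a) in corpus if st == s]
            if not moves:
                continue
            probs = softmax(logits[s])
            for i in range(NUM_MOVES):
                freq = moves.count(i) / len(moves)
                logits[s][i] += lr * (freq - probs[i])
    return logits

def reinforce(logits, reward_fn, iterations=500, lr=0.5, seed=0):
    """Stage 2: refine the pretrained policy toward the true goal (winning)."""
    rng = random.Random(seed)
    logits = [row[:] for row in logits]  # copy: leave the pretrained policy intact
    for _ in range(iterations):
        for s in range(len(logits)):
            probs = softmax(logits[s])
            a = rng.choices(range(NUM_MOVES), weights=probs, k=1)[0]
            r = reward_fn(s, a)
            # Policy-gradient step: r * grad log pi(a|s) = r * (onehot(a) - probs)
            for i in range(NUM_MOVES):
                logits[s][i] += lr * r * ((1.0 if i == a else 0.0) - probs[i])
    return logits

def greedy(logits):
    """Most likely move in each position."""
    return [max(range(NUM_MOVES), key=lambda i: row[i]) for row in logits]

# Hypothetical expert corpus: in position 0 the expert usually plays move 0,
# but move 1 actually wins; in position 1 the expert's move 2 is correct.
EXPERT_GAMES = [(0, 0)] * 7 + [(0, 1)] * 3 + [(1, 2)] * 10
WINNING_MOVE = {0: 1, 1: 2}  # hypothetical reward signal

def win_reward(state, move):
    return 1.0 if move == WINNING_MOVE[state] else 0.0

if __name__ == "__main__":
    pretrained = pretrain(EXPERT_GAMES, num_states=2)
    refined = reinforce(pretrained, win_reward)
    print("imitates expert:", greedy(pretrained))  # [0, 2]
    print("after RL:", greedy(refined))            # [1, 2]
```

Imitation alone reproduces the expert’s habit in position 0; the reward signal is what moves the policy toward the winning move. With a 0/1 reward and no baseline, an update happens only when the winning move is sampled, so a pretrained policy that almost never tries that move would learn very slowly. That is one way to see why both the pretraining data and the reward signal matter.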

Does it tell us anything about where we can expect AIs to succeed?

It has consequences for what we choose to focus on. If you care about making progress on important problems, the key bottlenecks to worry about are whether you have enough data for pretraining and whether you have reward signals for post-training. If you don’t have those ingredients, no amount of cleverness, this algorithm versus that algorithm, is going to get you off the ground.

Did you feel any sympathy for Lee Sedol?

Lee Sedol had been this idol over the summer of 2014, this unachievable milestone. Then suddenly to be there in person, watching the matches, seeing his stress, his anxiety, his realisation that this was a much worthier opponent than he had perhaps thought going in: that was very stressful. You don’t want to put someone in that position. When he lost the match, he apologised to humanity and said, “This is my failing, not yours.” That was tragic.

There is also a custom in Go to review the match with your opponent. Someone wins or loses, but at the end you review the match together, unwind the game and explore variations with each other. Lee Sedol couldn’t do that because AlphaGo wasn’t human, so instead he had his friends come in and review the match, but it’s just not the same. There was something heartbreaking about that.

But I didn’t appreciate all the man-versus-machine narratives around the match, because a team of people built AlphaGo. That was the effort of a tribe building an artefact that could achieve excellence in a human game. It was ultimately the artefact that all our blood, sweat and tears went into.

Do you think there is still a place for humans in the world as AI accomplishes more human thinking work?

We are learning more about the game of Go, and if we think that game is beautiful, which we do, and AIs can teach us more about that beauty, there’s a lot of inherent good in that as well. There’s a difference between goals and purposes. The goal of the game of Go is to win, but that’s not its only purpose – one purpose is to have fun. Board games are not destroyed by the presence of AI; chess is a thriving industry. We still appreciate the intrigue and the human achievement of that sport.
