That post about the art of the trivia question is still brewing, but I got sidetracked this week by another event in the trivia world. You may have heard about it. Watson, an IBM supercomputer, played two games of Jeopardy! against that show’s most famous champions, and thoroughly trounced the both of them.
A number of friends who watched the match complained that it was boring. If what you were looking for was a tense, movie-like contest with the drama of close scores or a come-from-behind victory, I can certainly see why you’d be disappointed. It had all the drama of the 49ers annihilating the Broncos 55-10 in Super Bowl XXIV. On the other hand, if what you were looking for was a glimpse of the world to come, in the form of a breathtaking technical achievement, this match absolutely delivered the goods.
See, some people tend to think computers are smart, and that of course a computer could beat a human at Jeopardy!, given a sufficiently broad knowledge base for its answers. But really, that’s a case of misplaced signifiers. Many human brains find rapid mental arithmetic of large or complex numbers difficult, and therefore associate it with intelligence. Computers happen to be fantastic at this kind of thing. The chess club is full of smart kids, and therefore chess must be a smart person’s game. Knowing that a computer could defeat the chess world champion must mean that computers are smart, right?
Here’s the thing, though. Computers are great at one thing: computing. Arithmetic is computation. Chess, at a sufficient level of abstraction, is also computation. The further away from numbers you move, the dumber computers become, meaning that for the vast majority of tasks our brains do each day, computers are extremely stupid. “Natural language”, aka the way we humans talk to each other, is an enormous challenge for a computer to deal with, as anyone playing interactive fiction for the first time could tell you. (Though the idea that better parsing of natural language will automatically make for better IF is another case of misplaced signifiers — better understanding of language is great and everything, but the more important part of IF is its model world. Advancing the parser just means the model world’s seams show more quickly.) Because computers lack human experience, they are stunningly bad at dealing with linguistic context, and are therefore capable of spectacular misunderstandings when faced with any language outside the very limited domains for which they’re programmed.
Watson is no exception to this, but it has a few advantages that other machines lack. For one thing, there’s an enormous amount of processing power behind it: some 90 servers, over 21 terabytes of data, 15 terabytes of RAM, and 80 teraflops of throughput. More important, though, are a couple of its conceptual approaches to knowledge.
First, through a paradigm called machine learning, Watson learns by example, getting better and better at the game as it sees more and more Jeopardy [leaving the exclamation point off from here on out] clues and their correct answers. It would be ridiculously impractical to try to construct a set of rules that would allow a computer to recognize every possible Jeopardy question, so instead Watson’s creators gave it a framework for recognizing associations between question words, answer words, and source texts, then fed it tens of thousands of Jeopardy clues as examples. This technique enabled Watson to make a huge leap in its Jeopardy prowess.
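For the curious, here is roughly what "learning by example" looks like at toy scale. This is only a sketch under my own assumptions: the two features (keyword overlap and whether the answer type matches), the numbers, and the perceptron-style update rule are all invented for illustration, and none of it is taken from Watson itself.

```python
# Toy sketch only: the features, numbers, and update rule are invented for
# illustration and are not taken from Watson.

def predict(weights, features):
    """Guess 'correct answer' when the weighted evidence is positive."""
    return sum(w * x for w, x in zip(weights, features)) > 0

def train(examples, epochs=20, learning_rate=0.1):
    """Perceptron-style learning: adjust weights only after a wrong guess."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for features, is_correct in examples:
            if predict(weights, features) != is_correct:
                direction = 1.0 if is_correct else -1.0
                weights = [w + learning_rate * direction * x
                           for w, x in zip(weights, features)]
    return weights

# Each example: (keyword_overlap, answer_type_matches), was the candidate right?
examples = [
    ((0.9, 1.0), True),
    ((0.8, 0.0), False),
    ((0.2, 1.0), True),
    ((0.7, 0.0), False),
]

weights = train(examples)
for features, is_correct in examples:
    print(features, predict(weights, features), is_correct)
```

Nobody tells the program which feature matters. It starts with no opinion, gets corrected every time it guesses wrong, and ends up trusting the feature that actually predicted right answers in the examples it saw.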
The other key aspect of Watson is its embrace of uncertainty. Watson doesn’t deal in right answers and wrong answers. It deals in answers that are more likely to be right vs. less likely to be right. When faced with the clue, “The parents of this 52nd governor of New York immigrated to the United States from Salerno, Italy,” it came up with three top answers, each with its own level of confidence.
Watson was quite certain that “Mario Cuomo” was the correct answer, but hadn’t entirely ruled out the far crazier answers “motorcycle club” and “Marine Corps.” Indeed, if what you’re seeking is comedy, look no further than Watson’s runner-up answers.
Laughs aside, though, it’s this uncertainty which makes Watson so formidable. In a frequently-cited example, Watson can look at the name “Alice Cooper” and weigh the evidence that Alice is a woman’s name against the evidence that Alice Cooper is a man, give each pile of evidence a score, and come to its own conclusion. A strictly rule-bound computer would have to be given a specific exception to handle this case. Watson can generate its own exception, thereby improving its knowledge base. As a co-worker of mine pointed out, isn’t this a hallmark of intelligence? The capacity to allow for the possibility that we may not know everything or fully understand the world is an incredibly powerful tool in the search for truth.
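To make the "piles of evidence" idea concrete, here is a minimal sketch. Everything in it is my assumption: the `Evidence` class, the weights, and the way scores are turned into shares are invented for illustration, and I'm not claiming this is how Watson actually combines its evidence.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    description: str
    weight: float  # invented number: larger means stronger support

def score(evidence):
    """Sum the weights of every piece of evidence for one hypothesis."""
    return sum(e.weight for e in evidence)

def rank(hypotheses):
    """Turn raw scores into shares of the total and sort, best first."""
    raw = {name: max(score(ev), 0.0) for name, ev in hypotheses.items()}
    total = sum(raw.values())
    if total == 0:
        return [(name, 0.0) for name in raw]
    shares = {name: value / total for name, value in raw.items()}
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)

# Hypothetical evidence piles for the Alice Cooper example
hypotheses = {
    "Alice Cooper is a man": [
        Evidence("texts refer to Alice Cooper as 'he'", 3.0),
        Evidence("biographies describe a male rock singer", 2.0),
    ],
    "Alice Cooper is a woman": [
        Evidence("'Alice' is usually a woman's name", 1.0),
    ],
}

ranking = rank(hypotheses)
for name, confidence in ranking:
    print(f"{name}: {confidence:.0%}")
```

The losing hypothesis never drops to zero. It just gets a smaller share, which is the same behavior that kept "motorcycle club" hanging around behind "Mario Cuomo."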
So as a computer, Watson rocks. But Jeopardy is an entertainment program, not a science program. Is it fun to watch Watson play Jeopardy? George Doro, my teammate in the Anti-Social Network, called it “more fascinating than exciting,” and that’s right on target. IBM branded the hell out of this show, and it would have been a black eye for them had Watson lost. Consequently, a few gameplay decisions were made which helped Watson win, but made the show a little less fun.
First off, Watson was allowed to be lightning-fast on the buzzer. People think of Jeopardy as a purely mental game, but unlike chess, there’s a physical component of Jeopardy. People (and computers) with faster reflexes do far better on the show — it doesn’t matter if you know 100% of the answers when you’re getting outbuzzed 80% of the time. Trying to play buzzer-beaters against a computer is like running a 500-yard dash against a car. Watson didn’t have to be this quick — just subtract a little of that processing power until the computer’s average buzz-in time equals the average human’s buzz-in time (or even Ken Jennings’ average) and you’ve got a fairer battle, but instead, when Watson was certain enough of its answer, no human thumb could possibly outrace its mechanical plunger. (There were a few exceptions, but overall it was clear that Watson’s buzzing speed was what allowed it to dominate the match.)
Secondly, there’s the fact that each human had not only Watson to contend with, but also another top-notch Jeopardy player! Consequently, anytime Watson didn’t pick up a clue in time, the two humans tended to split the points between them. I know Jeopardy is traditionally played by three contestants, but there was plenty about this match that was non-traditional. I would be very interested to see how Jennings would do against Watson by himself, especially if the buzzer advantage were corrected. As he put it in an NPR interview: “It’s the worst of both worlds, you know? The ideal scenario would be to have a human versus a computer, or maybe a computer versus a very good human and a lousy ‘Jeopardy!’ player. I don’t know if you saw Wolf Blitzer on the show, but I’d like to have Wolf back.”
That’s not to say that Watson was flawless. One of its major weaknesses was its inability to see or hear. Instead of listening to Alex Trebek read the clue, Watson was fed the clue via (essentially) a text message, so it saw and started processing the clue at the same time as Ken and Brad saw it. The show neutralized the most obvious disadvantage of this blindness and deafness by eliminating the audio or visual clues it often features. Jeopardy has made this sort of accommodation before, to serve disabled human players, and while it’s certainly true that Ken and Brad could have whomped the computer on those clues, that’s really not what Watson was built to do, so it would rather miss the point. A more pertinent disadvantage was that it could not hear what the other contestants were answering. It was told whether its own answer was correct, and told the correct answers provided by humans, but was not told of wrong answers, leading to this exchange:
Ken: “‘Name the Decade’ for a thousand.”
Alex: “The first modern crossword puzzle is published & Oreo cookies are introduced.” [Ken buzzes in] “Ken?”
Ken: “What are the ’20s?”
Alex: “No.” [Watson buzzes in] “Watson?”
Watson: “What is 1920s?”
Alex: “No. Ken said that.”
[The correct answer was “The 1910s.”] Trebek’s schoolmarmish correction of a machine that had just that moment proven it can’t hear him was amusing, and perhaps reflexive. Watson’s error was the kind of mistake that humans rarely make, though it’s not unheard of. When a human does it, though, it’s a sign of frazzled nerves. With Watson, it’s an Achilles heel. Well, maybe an Achilles toenail.
Another major weakness Watson displayed was its difficulty leveraging the category title to come up with the answer. Humans completely dominated that “Name the Decade” category: Watson was having trouble processing quickly enough to outbuzz them, and at one point its top guess for one of the clues was “2002,” even though it did come up with decades for the others. Most famously, in the Final Jeopardy round of the first game, it encountered the category “U.S. Cities,” and the clue, “Its largest airport is named for a World War II hero; its second largest, for a World War II battle,” which it answered, “What is Toronto?????”
(This inspired the funniest Watson joke I’ve yet seen: “Me: Hey Doc, I’ve got this pain in my left arm and an awful headache. Doc: What is Toronto?????”) The answer was in fact “Chicago,” but even if a human didn’t know the answer, he very likely would have guessed an actual U.S. city based on the category, rather than a Canadian city.
As some of the IBM guys pointed out, Daily Doubles and Final Jeopardy are a tough area for Watson, because it has to guess something, and therefore risk looking stupid. When it’s not sure about its answers on a regular clue, it can just refrain from buzzing in. Watching the show, I thought perhaps that Watson’s creators forced it to simply focus on the question, more or less ignoring the category. Turns out this isn’t quite true. In fact, it considers the category in its approach, but it’s learned from its thousands of Jeopardy clues that category is often only weakly tied to the answer. For instance, that Chicago question could have been reworded, “Chicago’s O’Hare airport is named after a World War II hero; this airport, its second largest, was named after a World War II battle.” The question still would have fit the category, but the answer would have been an airport, not a city. Watson has seen that scenario play out many times, and is thus wary of assuming that the answer in a “U.S. Cities” category will always be a U.S. city.
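Both ideas here, staying silent when unsure and treating the category as only a weak hint, fit in a few lines. This is a sketch under my own assumptions: the function names, the buzz threshold, the category weights, and the confidence numbers are all made up for illustration and are not Watson’s real values.

```python
def choose_response(candidates, must_answer, buzz_threshold=0.5):
    """Return the top answer, or None when staying silent is allowed and safer."""
    best_answer, best_confidence = max(candidates.items(), key=lambda item: item[1])
    if must_answer or best_confidence >= buzz_threshold:
        return best_answer
    return None

def apply_category_hint(candidates, fits_category, category_weight):
    """Nudge candidates that match the category; a small weight keeps the hint weak."""
    return {
        answer: confidence + (category_weight if fits_category(answer) else 0.0)
        for answer, confidence in candidates.items()
    }

# Invented confidences, for illustration only
candidates = {"Toronto": 0.30, "Chicago": 0.25}
us_cities = {"Chicago"}

def is_us_city(answer):
    return answer in us_cities

weak = apply_category_hint(candidates, is_us_city, category_weight=0.02)
strong = apply_category_hint(candidates, is_us_city, category_weight=0.10)

print(choose_response(candidates, must_answer=False))  # regular clue
print(choose_response(weak, must_answer=True))         # forced guess, weak hint
print(choose_response(strong, must_answer=True))       # forced guess, strong hint
```

On a regular clue the sketch keeps quiet, because nothing clears the threshold. Forced to answer, it sticks with Toronto when the category counts for little, and only switches to Chicago when the category hint is given real weight. That is the trade-off: lean on the category too hard and you get burned by all the clues where the answer isn’t what the category title suggests.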
In the end, Watson defeated the humans soundly, with a score of $77,147 to Jennings’ $24,000 and Rutter’s $21,600. A lot of the press coverage has focused on the “man vs. machine” angle, and of course the match was set up to emphasize that. In fact, it was rather poignant to see Watson beat one of its human practice match opponents on the clue, “This African-American folklore laborer: ‘Before I let that steam drill beat me down I’ll die with my hammer in my hand.’” I guess there’s this sort of pastoral vs. industrial thing that gets set up when machines attempt a traditionally human activity, even though people holding buzzers and answering trivia questions doesn’t exactly fit neatly into the pastoral mold.
I don’t feel much solidarity with the OMG SKYNET IS HERE!!!!! response. As somebody who works in IT, I’m fascinated by the achievement. I think about how satisfying it must have been to have worked on the team that created this. Those people just finished a massive four-year project, and the result was an incredible leap forward in information processing, with a world-famous, historic, televised, wildly successful debut. I just finished my time as a team member on a three-year project, and the result is a shakily implemented student system whose portal is currently driving everyone crazy with how incomplete and slow it is. I’m sure there is mental, emotional, and physical damage associated with both project teams, but wouldn’t it have been wonderful to have been on the one whose final product worked so well?
In his Final Jeopardy answer, Ken Jennings wrote, “(I, for one, welcome our new computer overlords.)” It’s a reference to a hilarious moment on The Simpsons. And interestingly, it may not have been one Jennings thought of himself. Here’s an excerpt from his NPR interview with Neal Conan:
Mr. JENNINGS: Maybe it’s just my own ego, but yeah, I feel like I’ve somehow, through some weird coincidence, been elected as the champion of carbon-based life on Earth against, you know, our new future oppressor.
CONAN: Silicon, yeah.
Mr. JENNINGS: And I would like to strike a blow while I have the chance.
CONAN: I, for one, welcome our robot overlords.
Mr. JENNINGS: You may have no choice, Neal.
Then again, it’s quite possible that this interview was taped after the Jeopardy challenge was taped, so who knows? But whether Jennings was lifting a joke or simply making a reference, isn’t this the skill for which we celebrate him? He gathers knowledge from various sources, and retrieves it quickly, using it when it can make the most impact. His graciousness and humor in that final moment certainly set him apart from his predecessor in IBM challenge history, Garry Kasparov, who famously stalked away in an enormous huff after being beaten by Deep Blue. But in that graciousness and humor, he also subtly made the point that for all Watson’s skill and speed at information retrieval, humans can still wield that information with a precision and effect that Watson could never hope to achieve.