Watson, game 2

Not much new to report after the second game of the Watson Jeopardy Challenge. I've added a few updates to yesterday's post on Watson. The result was as expected, though Watson struggled a lot more in this game than in the prior round: it declined to answer many questions due to low confidence and made a few mistakes. In a few cases it was saved by not buzzing fast enough even though it had over 50% confidence, since its answer would have been slightly wrong.

Some quick updates from yesterday you will also find in the comments:

  • Toronto's 2nd busiest airport, the small Island airport, has the official but rarely used name of Billy Bishop. Bishop was one of the top flying aces of WWI, not WWII. Watson's answer is still not clear, but that it made mistakes like this is not surprising. That it made so few is surprising.
  • You can buzz in as soon as Trebek stops speaking. If you buzz early, you can't buzz again for 0.2 seconds. Watson gets an electronic signal when it is time to buzz, and then physically presses the button. The humans get a light, but they don't bother looking at it; they try to time when Trebek will finish. I think this is a serious advantage for Watson.
  • This IBM Blog Post gives the details on the technical interface between Watson and the game.
  • Watson may have seemed confident with its large bet of $17,973. But in fact the bet was fixed in advance:
    • Had Jennings bet his whole purse (and got it right) he would have ended up with $41,200.
    • If Watson had lost its bet of $17,973, it would have ended up with $41,201 and a bare victory.
    • Both got it right, and Jennings bet low, so it ended up being $77,147 to $24,000.
    • Jennings' low bet was wise, as it assured him of 2nd place and a $300K purse instead of $200K. Knowing he could not beat Watson unless Watson bet stupidly, he did the right thing.
    • Jennings still could have bet more and got 2nd, but there was no value in it, since the purse is always $300K.
    • If Watson had wanted to second-guess, it might have realized Jennings would do this and bet accordingly, but that's not something you can do more than once.
    • As you might expect, the team put a bunch of thought into the betting algorithm as that is one thing computers can do perfectly sometimes. I've often seen Jeopardy players lose from bad betting.
  • It still sure seemed like a program sponsored by IBM. But I think it would have been nice if the PI of DeepQA was allowed up on stage for the handshake.
  • I do wish they had programmed a bit of sense of humour into Watson. Fake, but fun.
  • Amusingly Watson got a category about computer keyboards and didn't understand it.
  • Unlike the human players who will hit the buzzer before they have formed the answer in their minds, in hope that they know it, Watson does not hit unless it has computed a high confidence answer.
  • Watson would have bombed on visual or audio clues. The show has a rule allowing those to be removed from the game for a disabled player, and that rule was applied!
  • A few of the questions had some interesting ironies based on what was going on. I wonder if that was deliberate or not. To be fair, I would think the question-writers would not be told what contest they were writing for.
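
For the curious, the Final Jeopardy arithmetic in the betting bullet above can be checked in a few lines. This is only a sketch of the lock-out calculation, not IBM's actual betting algorithm. The function name is made up for illustration, and Watson's total going into Final Jeopardy ($59,174) is inferred from the figures above: $77,147 minus the $17,973 wager.

```python
def max_safe_wager(leader_total: int, best_rival_total: int) -> int:
    """Largest wager the leader can lose and still finish one dollar ahead.

    Hypothetical helper: it only encodes the lock-out arithmetic from the
    post, not Watson's real wagering strategy.
    """
    return max(0, leader_total - best_rival_total - 1)


# Figures from the post; watson_before is inferred, not quoted.
watson_final = 77_147
watson_wager = 17_973
watson_before = watson_final - watson_wager   # 59,174
jennings_best_case = 41_200                   # Jennings bets everything and is right

wager = max_safe_wager(watson_before, jennings_best_case)
print(wager)                   # 17973
print(watson_before - wager)   # 41201, one dollar ahead even after a miss
```

In other words, the odd-looking $17,973 is simply the biggest bet that still leaves Watson a dollar in front in the worst case.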

Comments

Supplying the answer as text to Watson gives the machine an
advantage, unless it is supplied word by word as Alex speaks
it. It's hard to ignore Alex and read ahead -- try it sometime.
And as you said, giving Watson an "OK to buzz" signal is a big
advantage as well. If the human contestants use the light signal,
they are subject to the delays introduced by the nervous system.
If they ignore the light signal and try to time it, they will
miss occasionally, while Watson will not. I think this is why
Watson was able to win the race to buzz in so often. I've read
accounts of people who have been on the show, and they say that
being able to buzz in first is a BIG part of winning. Watson
will never buzz in early, and it can always buzz in instantly.

The odd (to humans) wagers on daily doubles and final Jeopardy
drew some laughs from the audience, but they're probably due to
carrying out fractional calculations to the dollar. The wagers
would have seemed more 'natural' had they been rounded to the
nearest hundred.
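
A quick sketch of that rounding, using the $17,973 Final Jeopardy wager
from the post as the example. The helper names are made up, and this
is not how Watson actually chose its wagers.

```python
def round_to_nearest_hundred(wager: int) -> int:
    """Round a non-negative wager to the nearest $100 (halves round up)."""
    return ((wager + 50) // 100) * 100


def round_down_to_hundred(wager: int) -> int:
    """Round a non-negative wager down to a multiple of $100."""
    return (wager // 100) * 100


exact = 17_973  # Watson's Final Jeopardy wager, from the post
print(round_to_nearest_hundred(exact))  # 18000
print(round_down_to_hundred(exact))     # 17900
```

One wrinkle: rounding to the nearest hundred here goes up to $18,000.
If the post's figures are right, Watson went into Final Jeopardy with
$59,174, so a missed $18,000 bet would have left it at $41,174, just
under Jennings' best case of $41,200. Rounding down would have been
the safe way to get a 'natural' number.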

These are advantages. It's not easy to figure out how to eliminate them (other than a rules change to make buzz speed not a factor, i.e., if you buzz at all near the mark, you get a random chance of being chosen). I think Watson could learn to follow the cadence as the humans do. They clearly are reading ahead, since they know when he is finished before he finishes in order to buzz in. It may be hard to do, but I think a good player can train to ignore Trebek and just read the clue, and I think a great player has to do that, just for buzzing. People, however, can figure out a clue from only the first few words, something Watson is not good at, so in many cases they know whether they will buzz when they have only read (or heard) half the clue. Watson as yet doesn't do that.
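
To make the rules-change idea concrete, here is a minimal sketch. Everything in it is hypothetical: the function name, the 0.25 second window, and the timing numbers are invented for illustration and are not part of the show's rules. Anyone who buzzes within the window after the buzzers are enabled goes into a random draw, so raw reaction speed stops mattering.

```python
import random


def pick_buzzer(buzz_times, window=0.25, rng=random):
    """Pick at random among players who buzzed 'near the mark'.

    buzz_times maps player name -> seconds after the buzzers were enabled
    (negative means the player buzzed early). The window size is an
    arbitrary assumption.
    """
    eligible = sorted(p for p, t in buzz_times.items() if 0 <= t <= window)
    if not eligible:
        return None  # nobody buzzed in time
    return rng.choice(eligible)


# Invented timings: Watson is fastest, but both humans are within the window.
times = {"Watson": 0.01, "Jennings": 0.08, "Rutter": 0.12}
print(pick_buzzer(times, rng=random.Random(2011)))
```

Under a rule like this, each of the three would get the clue about a third of the time whenever all of them knew the answer.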

Well, human players can start working out the solution after only a few words, but Watson gets ALL the words simultaneously.

Indeed, it would be interesting to see what Watson would do if it were presented the clues "one word at a time", similar to the (presumed) way that the humans receive them. It would be neat to see the answer evolve as the clue was completely revealed!

I'd have *liked* to see an indicator of whether the human players buzzed in, and when they buzzed in relative to Watson; but, of course, that would reveal that it was indeed just a game of beat-the-switch.

They do actually discuss this. They say that since the humans' retinas get the image when it shows and Watson gets the IP packet, this is "fair" as a contest of man and machine. Yes, humans can't grok an entire screen of text in an instant to turn it into words, but neither can Watson understand the words instantly.

One could argue a fair test might be to watch human eyes to figure out the average time it takes them to read the passage, and give the words to Watson in groupings matching typical human reading. That would probably eat up a second or so from Watson's time, and IBM would have answered that by just adding more cores. So it may not have been fruitful.

One could also ask Watson to do OCR. It would do that faster than humans can read though -- today's OCR systems on a single core can read an entire page in a second or two. On 2800 cores it would be a blip.

But the contest was not about reading, and it was not really supposed to be about buzzing. It was supposed to be a test of question answering. It just happens that Jeopardy is the most famous and well understood way to do that, one the public would tune in for.

It's been stated in several places that the games for the Watson match were chosen at random by a third party from games written by the usual J! writers for regular J!. The boards were then either scrubbed of visual/audio clues and of categories that only make sense after Alex explains them with an example, or boards with such clues/categories weren't included in the set the selection was made from.

Though I would disagree that it was fair to remove categories that need explanation. Watson should just get the explanation. As it turns out, Watson did OK at learning about categories by example.

Watson would also do almost perfectly at audio and video clues with a large library of fingerprinted audio and video. Recognizing very similar pictures and sounds is mostly a solved problem. As such it would not be impressive, nor good for the humans, but it would be a special algorithm in Watson and some work, so it is fine to have left them out. Leaving out a hard, unsolved problem reduces the contest a small bit.

The humans had trouble with the keyboard keys category too, but I doubt Watson would ever have gotten it.

Another oddity in Watson's game play was how it picked the next
answer. It seemed to be going across the board from category to
category at the same dollar level, or bouncing around the board
seemingly at random, as opposed to working down a single category
as most humans do. I have occasionally seen people play this way,
most likely looking for the Daily Doubles. I don't know what
Watson's algorithm would be.

They learned daily doubles appeared mostly in row 4, then in rows 3 and 5, and for some reason mostly in column 1.

Hunting for them is exactly what it was doing, not to bet heavily, but mostly to deprive the other players of the chance to use them, since with those two players, these would have been their real chance to win enough money to beat Watson.
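
As a rough illustration of that kind of hunting, here is a sketch that orders the open squares by where Daily Doubles were said to turn up most. The ranking (row 4 first, then rows 3 and 5, with column 1 preferred within a rank) is my reading of the description above, not Watson's actual selection code, and the names are made up.

```python
# Rows are numbered 1-5 from the top, columns 1-6 from the left.
ROW_RANK = {4: 0, 3: 1, 5: 1}  # assumed priority; unlisted rows get rank 2


def hunt_order(open_cells):
    """Sort open (row, col) squares, most likely Daily Double spots first."""
    def key(cell):
        row, col = cell
        return (ROW_RANK.get(row, 2), 0 if col == 1 else 1, row, col)
    return sorted(open_cells, key=key)


board = [(row, col) for row in range(1, 6) for col in range(1, 7)]
order = hunt_order(board)
print(order[:3])  # [(4, 1), (4, 2), (4, 3)]
```

A picker like this would sweep across row 4 before touching anything else, which matches the board-hopping that looked so odd on TV.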
