Deep Plane Thoughts: A Man Walks into a Bar and Says “Ouch”

I skipped my flight to SLC this week, so I’m relegated to writing this in the Night Kitchen over a basket of fried cheese curds.

Last weekend I stumbled across a video of IBM’s new Watson supercomputer kicking Ken Jennings’ ass at Jeopardy! (well worth the 4 minutes to watch). Now all the cool kids are talking about it. Beyond the sheer geeky joy of watching Watson do some damage to that smug Ken Jennings, the technical achievement is stunning to us laypeople.

Some of the technology employed by Watson reminded me of a common theme expressed by peeps I’ve talked to over the past 2 years, and reflected in our division’s mission “to deliver knowledge by computationally understanding user intent”: namely, to begin thinking of the web as a digital representation of the physical world, complete with relationships among the objects it represents. It begins to look a lot like the fabled semantic web that has been promoted for years by winged unicorns riding on rainbows (kind of like mobile couponing…oh wait…). In particular, Watson’s ability to distinguish puns from literal speech got me thinking about just how it could do that. Say that, in an attempt to make the bartender here laugh, I try this:

Did you hear they found a narcissistic male lion whose females had turned on him?
No, really?
Yeah. ’Course it was his pride that did him in…

Most of us with a grasp of the English language can read that and chuckle, because we know “pride” can mean both ‘satisfaction with oneself’ and ‘group of lions’ (note: bartender – not so much). A computer, however, would likely make no sense of it unless it had a map of the language and its connections that could tell it that lions live in a pride, that narcissism and pride are linguistically related, and that female lions ‘turning on him’ relates more to ‘did him in’ (as in killed) than to them literally turning around to face him. Holy crap, that’s a lot of work to make sense of a bad pun.
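Here is a toy sketch of what such a “map of the language” might look like. It is a small, hand-built word graph searched breadth-first. Every word and relation name in it is made up for illustration and is not taken from Watson or from any real lexicon. The point is only that the shortest chain from the lioness to narcissism runs through a single word, pride, which shows up there in two different senses.

```python
from collections import deque

# Hypothetical, hand-built word graph. Each edge is (word, relation, word);
# the words and relation names are illustrative, not from any real lexicon.
EDGES = [
    ("lioness", "is_a", "lion"),
    ("lion", "member_of", "pride"),         # pride as "group of lions"
    ("narcissism", "related_to", "pride"),  # pride as "satisfaction with oneself"
    ("turn on", "can_mean", "attack"),
    ("attack", "can_lead_to", "kill"),
    ("do in", "can_mean", "kill"),
]


def build_graph(edges):
    """Index every edge in both directions so a search can follow it either way."""
    graph = {}
    for src, rel, dst in edges:
        graph.setdefault(src, []).append((rel, dst))
        graph.setdefault(dst, []).append((rel + "^-1", src))  # "^-1" marks the reverse hop
    return graph


def find_path(graph, start, goal):
    """Breadth-first search for the fewest hops; returns [(node, relation, node), ...] or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None


if __name__ == "__main__":
    g = build_graph(EDGES)
    for hop in find_path(g, "lioness", "narcissism"):
        print(hop)
```

Running it prints three hops: lioness to lion, lion to pride (the group), and pride back to narcissism (the feeling). Breadth-first search is used here because it returns the shortest chain of associations first, which is a rough stand-in for “the connection a listener would make.” A real system would need weighted, sense-tagged edges and vastly more of them.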

This Watson development (like many others) is so very interesting for a number of reasons. Indeed, the Web of Objects/Internet of Things/Web of Things has been around as an idea for years. It’s increasingly important, however, because, as we all know, people are using the web differently than they did a decade ago. It’s no longer just a directory but an extension of physical-world interactions. You do stuff on the web; you don’t just read stuff.

And in order for people to ‘do stuff’, we need the engines to understand that the web is simply an overlay on top of existing physical objects. In the same way a Yellow Pages book or a paper map was a proxy for walking down Main Street in a lower-fidelity (albeit more efficient) manner, the web really is a massively scaled but remarkably lossy representation of the real world. I call it lossy because while links between chunks of information are explicit on the web (in the form of the hyperlink), the structure behind them mostly isn’t. Consider what an engine would need to understand about squash: that a squash facility has a finite number of courts, that those courts require reservations, that reservations are made in a number of ways (web, phone), that online services exist to do just that (make a reservation at a particular squash court), and that a given court near your house uses a given online service to facilitate that reservation. That structure doesn’t really (and probably never will) exist. So the web is this lossy representation of reality because – man – look at all the gaps that machines somehow have to fill to realize the vision for how people want to use the web. So how do we help systems understand the real world’s connections?
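To make that missing structure concrete, here is a minimal sketch of the squash-court facts written out as explicit subject-predicate-object triples. The club, the “CourtBooker” service, and every predicate name are hypothetical, invented for illustration rather than drawn from any real vocabulary. Once the links exist, answering “can I book this court online?” becomes a two-hop lookup; on today’s web, those hops mostly live in prose.

```python
# Hypothetical facts about one squash facility as (subject, predicate, object)
# triples. None of these names come from a real vocabulary; they only illustrate
# the structure the web usually leaves implicit.
FACTS = {
    ("Downtown Squash Club", "is_a", "SquashFacility"),
    ("Downtown Squash Club", "has_court", "Court 1"),
    ("Downtown Squash Club", "has_court", "Court 2"),
    ("Court 1", "requires", "Reservation"),
    ("Court 2", "requires", "Reservation"),
    ("Downtown Squash Club", "reservable_via", "phone"),
    ("Downtown Squash Club", "reservable_via", "CourtBooker"),
    ("CourtBooker", "is_a", "OnlineReservationService"),
}


def objects(subject, predicate, facts=FACTS):
    """Return every object linked to `subject` by `predicate`, sorted for stable output."""
    return sorted(o for s, p, o in facts if s == subject and p == predicate)


def online_booking_services(facility, facts=FACTS):
    """Chain two hops: facility -reservable_via-> X, keeping X only if X is an online service."""
    return [
        channel
        for channel in objects(facility, "reservable_via", facts)
        if "OnlineReservationService" in objects(channel, "is_a", facts)
    ]


if __name__ == "__main__":
    club = "Downtown Squash Club"
    print(len(objects(club, "has_court")), "courts")
    print("book online via:", online_booking_services(club))
```

The phone channel is filtered out because nothing says it is an online service. That is exactly the kind of fact an engine cannot infer from a page that just says “call us to book.”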

Luckily there are advancements that seem to show promise.  I’m not nearly smart enough to know which of these will pan out, which will flame out, and which are simply interesting roadside attractions – there are people who’ve spent their entire lives on this problem.  But these things certainly are buzzy:

  • Facebook’s OpenGraph protocol: While super simple (some argue too simple to be useful), it lets webpage authors actually mark up their pages with code that tells engines what is on the page. In other words, by using some really simple markup (code…) that “enables publishers to say what object is on the page – a movie, a book, a recording artist, an event, a sports team, etc.”, engines can begin to understand that “The Rock” is a movie (in addition to an actor, and eventually other stuff as the vocab expands).
  • Neato work from MSR (albeit six years old) on a service that shows you how words are related (like lion and pride).
  • OData: our SQL guys are promoting a new standard called OData that gives sites a way to hand engines ‘hints’ about what data (and, ostensibly, services) they can provide. OData enables sites and services to publish what data they have and how other systems can access it. You can imagine how this could evolve to let sites attach semantic meaning to their structured data.
  • Quora’s ontology model: a crowdsourced model that lets Quora users place things in the real world in context (Bing, for instance, has its own entry). This helps a system know, for example, that TED is a conference.
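For a sense of just how simple the Open Graph markup is, here is a minimal sketch that reads it back out of a page using only Python’s standard-library `html.parser`. The sample page and its example.com URL are made up. The `og:title`, `og:type`, and `og:url` property names and the `video.movie` type follow the published protocol at ogp.me, though the type vocabulary has been revised over the years, so treat the exact value as illustrative.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith("og:") and "content" in attributes:
            # Keep the first value seen; real pages may repeat some properties (e.g. og:image).
            self.properties.setdefault(prop, attributes["content"])


def extract_open_graph(html):
    """Return a dict of Open Graph properties found in `html`."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties


# Hypothetical sample page; the URL is a placeholder.
SAMPLE_PAGE = """
<html prefix="og: https://ogp.me/ns#">
  <head>
    <title>The Rock (1996)</title>
    <meta property="og:title" content="The Rock" />
    <meta property="og:type" content="video.movie" />
    <meta property="og:url" content="https://example.com/movies/the-rock" />
  </head>
  <body>...</body>
</html>
"""

if __name__ == "__main__":
    print(extract_open_graph(SAMPLE_PAGE))
```

An engine that sees `og:type` set to `video.movie` can file “The Rock” under movies without guessing from the page text. That is the whole trick, and it is also why some argue the protocol is too simple: it says what the page is, not how it relates to anything else.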

Unlike the centralized semantic models attempted over the past decades, could these things (and others I know I haven’t touched on) generate an accidental semantic model? The number of user-generated and publisher-generated semantic clues about what used to be simple .htm pages is pretty stunning. Yet it’s one thing to have a bunch of these hints or linked data scattered across the web and quite another to turn those loose connections into a logical model of the physical world that machines can understand. Some pioneering work by Cray with its new XMT is showing promise in making sense of this mass of data. But even once we have that model, imbuing machines with the intelligence to use it at scale is another problem entirely. In other words, even if machines know a restaurant accepts reservations for tables, they still have to figure out how to make use of that knowledge. Some great work is happening across the industry to attempt just that, and I can’t wait to see it.

Til next week

