Insights & Lessons: My Journey of Failed Predictions and Learnings

How I was right, even when I was wrong
Chris Welty, IBM Research

Outline • Opening Joke • Some personal history • My failed predictions • Lessons learned? • A glimpse into the future • Closing joke

The birth of a know-it-all • Born in NY early 60s • Early expert on everything • Disappointment with Buck Rogers • 2nd grade • Summed up numbers from 1-100 in five minutes • Got 5100 • The answer is 5050 • First prediction (1975) • I will marry Farrah Fawcett-Majors
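
The correct total is easy to check: pairing 1 with 100, 2 with 99, and so on gives 50 pairs that each sum to 101, so 1 + 2 + ... + 100 = 100 × 101 / 2 = 5050. A throwaway Python check of both the brute-force sum and the closed form n(n + 1)/2 (the function names are just illustrative):

```python
def sum_brute(n: int) -> int:
    """Add 1 + 2 + ... + n one term at a time."""
    return sum(range(1, n + 1))


def sum_gauss(n: int) -> int:
    """Closed form: pair 1 with n, 2 with n - 1, ...; each pair sums to n + 1.

    n * (n + 1) is always even, so the integer division is exact.
    """
    return n * (n + 1) // 2


print(sum_brute(100), sum_gauss(100))  # prints: 5050 5050
```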

Insult no group unless it includes me

First exposure to email • 1981 UUCP mail • allegro!batcave!cornell!rpics!weltyc • Prediction: • No one will ever use email • Why I was right: • Usenet paths were ridiculous • What I missed: • Paths and email were not tightly bound • People really wanted email WRONG!
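
The address above is a UUCP "bang path": the sender spelled out the route, with each `!`-separated component naming the next machine to hand the message to and the last component naming the recipient on the final machine. That coupling of address and route is what made the paths so unwieldy. A minimal Python sketch of reading one (the function name `parse_bang_path` is hypothetical, and it ignores the mixed `!`/`@` forms real mailers had to cope with):

```python
def parse_bang_path(address: str) -> tuple[list[str], str]:
    """Split a UUCP bang path into the relay hosts (in order) and the user name."""
    parts = address.split("!")
    # Need at least one host and a user, and no empty components like "a!!b".
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"not a valid bang path: {address!r}")
    *hosts, user = parts
    return hosts, user


hosts, user = parse_bang_path("allegro!batcave!cornell!rpics!weltyc")
for hop, host in enumerate(hosts, start=1):
    print(f"hop {hop}: {host}")
print("deliver to user:", user)
```

Every hop had to be known and reachable, so a single dead or renamed machine in the middle broke the address.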

Next generation email • 1983 CSNet • weltyc@rpics • Prediction: • As I said… • Why I was right: • Someone still has to maintain the list • Won’t scale • What I missed • People really needed email • It was better, not perfect WRONG!

Domain Name System • [Diagram: the root “.” above the .edu, .com, and .org top-level domains] • Proposed to IETF in 1985 • Distributed hierarchical database • Distributed not only the data, but the maintenance • Might make email work • Prediction: • The .edu top-level will become overloaded • Why I was right: • The hierarchy was unbalanced • What I missed: • People were willing to invest in scale • Money to be made in supplying domain names! • It was better, not perfect WRONG!
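
The key idea on this slide is that both the data and the maintenance are split along the name hierarchy: the operators of a top-level domain only need to know who is responsible for each name directly beneath it, and each organization maintains its own names. The toy Python sketch below mimics that by walking a name's labels right to left through nested delegations. It is a deliberate simplification (no separate name servers, no caching, no record types), and the zone contents and 192.0.2.x documentation addresses are made up for illustration.

```python
# Hypothetical zone data: a nested dict is a delegated sub-zone,
# a str is the address for a final host label.
ROOT = {
    "edu": {
        "example-univ": {"mail": "192.0.2.10", "www": "192.0.2.11"},
    },
    "com": {
        "example-corp": {"www": "192.0.2.20"},
    },
}


def resolve(name: str, root: dict = ROOT) -> str:
    """Walk labels right to left, following one delegation per level."""
    zone = root
    labels = name.rstrip(".").split(".")
    for label in reversed(labels):
        if not isinstance(zone, dict) or label not in zone:
            raise KeyError(f"cannot resolve {name!r} at label {label!r}")
        zone = zone[label]
    if not isinstance(zone, str):
        raise KeyError(f"{name!r} names a zone, not a host")
    return zone


print(resolve("mail.example-univ.edu"))  # prints: 192.0.2.10
```

Adding a host under `example-univ` touches only that sub-dictionary, which is the "distributed maintenance" point; the unbalanced-hierarchy worry is that every university still sits directly under the single `edu` entry.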

HTTP/HTML • Proposed to IETF in 1990 • Hypertext is decades old, this just adds tags • Prediction: • No big deal, unimportant • Porn will make the Internet unusable • Why I was right: • Porn really was king of the early web (~70%) • What I missed: • People were willing to invest in scale • It was more than just tags for HyperCard WRONG!

Web 2.0 (i.e. Social Web) • Started roughly 2002 • Web of people instead of machines • Wikis, social tagging, social networks • Prediction • TEENAGE NONSENSE • Will be poisoned by stupidity, negativity, misdirection, spam • Will not scale • Why I was right • Most of it is teenage nonsense • Most people really are idiots • What I missed • People want to share their knowledge • People scale on the web • Quality seems to be self governing in certain areas WRONG!

Blind Men and Elephants • Which one are you? • I was right about the trunk.

Semantic Web • The idea has been around for about a decade • You may have heard of it • I got the pitch from TimBL… • Prediction: • KR is decades old, this just adds tags • Will not scale (KA, Reasoning) • Proliferation of bad ontologies will lead to bad systems • Why I was right: • Reasoning doesn’t really scale (exptime is incomplete) • Bad ontologies do lead to bad systems • What I missed • It’s not just tags • KA does scale – people want to share their knowledge • A lot of people don’t care about reasoning • Better not perfect • KA not needed – the actual vision WRONG!

The Semantic Web Vision • ~80% of web pages are generated from back end databases • Publish the semantics (schema?) as well as the data • URIs provide a web-based form of identity • It’s the semantic WEB, not the SEMANTIC web • NOT: humans will markup their web pages • NOT: NLP will populate the SW from web pages
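
The vision slide argues that the semantic web is mostly about publishing what database-backed sites already have: the data plus its schema, with URIs as identity. One minimal Python sketch of that idea emits N-Triples from a table row. It assumes a hypothetical base URI `http://example.org/`, a naive one-property-per-column naming scheme, key values that are already URI-safe, and untyped string literals; a real deployment would use an RDF library and established vocabularies.

```python
# Hypothetical base URI and naming scheme; only rdf:type is a real vocabulary term.
BASE = "http://example.org/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def escape_literal(value: str) -> str:
    """Escape characters that may not appear raw in an N-Triples string literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def row_to_ntriples(table: str, key: str, row: dict[str, str]) -> list[str]:
    """Mint a URI for the row from its key, then emit one triple per other column."""
    subject = f"<{BASE}{table}/{row[key]}>"  # assumes row[key] is URI-safe
    triples = [f"{subject} {RDF_TYPE} <{BASE}schema/{table}> ."]
    for column, value in row.items():
        if column == key:
            continue  # the key is already encoded in the subject URI
        predicate = f"<{BASE}schema/{table}#{column}>"
        triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


row = {"id": "17", "name": "Widget", "price": "9.99"}
for line in row_to_ntriples("product", "id", row):
    print(line)
```

No human markup and no NLP is involved: the meaning comes from the database schema, and the URI gives the row an identity that other sites can link to.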

Lessons learned • People who make bad predictions still get to be invited speakers! • The unimpressed scientist syndrome • Applications that are needed will just happen • Better not perfect • People really want to share their knowledge • Scalability of people on the web • Scale happens

The Unimpressed Scientist • Be more open-minded • Tend to “accept” rather than “reject” • Don’t mistake the trunk for the elephant • The evaluation criterion is not whether it will work, but whether it is needed

Better not Perfect • Improvements are important • So ask yourself, “Is this better?” • Nit-picking usually is not important • The boundary conditions matter, but aren’t everything • Measurement and experimental conditions become critical • What is “better”? • NLP perhaps takes this too far

Scalability • Faster, bigger computers • Better distribution • People on the web • The CAPTCHA story • Heuristics, statistics

People want to share their knowledge • Shouldn’t be a surprise: this is what motivates us • Still, most people are idiots • …so… • Pure openness doesn’t work, but • Reviews, feedback, “how valuable”, etc. seem to work

Promising trends • Almost back to the 80s • KA with semantic wikis • E.g. ontoworld.org, Halo • NLP and KR are coming back together • Powerset, etc. • Collaborative, large KBs • DBpedia, Freebase • IMDb, WordNet • Cyc • Scalable reasoning • SHER • Rules • RIF BLD released (http://www.w3.org/TR/rif-bld) • RDF and OWL compatibility (http://www.w3.org/TR/rif-rdf-owl)

Important Problems • API incompatibility • Connotation vs. Denotation • URIs provide identity, but what do they mean • Coreference, disambiguation, word sense • Experimental methodology, measurement • E.g. precision & recall • Dependencies of results • The very long tail • Wherefore reasoning? • Ontology Quality, Evaluation

Grassroots to the Web • Early web dominated by “what it looks like” in Mosaic • Unimpressed UI and Hypertext researchers • Focus on spreading the word, not doing it right • Many early web pages didn’t have links in text at all • “Catalog” pages with lists of links • “Text” pages with few or no links • Embedded images more interesting than links • Just do it rather than do it right • But… • When the web became serious, the research started to matter

A little semantics… • The SW catchphrase • “A little semantics goes a long way” • Sometimes strengthened • A lot of semantics is too much • 80/20 rule • Double-edged sword • FOAF doesn’t look like even 1% • The simplicity of FOAF hides any serious value proposition for SW • SW not for people, for data • Important to get it right?

Some evidence • Does quality matter? • Good-quality ontologies cost more • Required for some applications • Improvements in quality can improve performance [Welty, et al, 2004] • 18% F-measure improvement in search • Cleanup cost ~1mw/3000 classes • BUT … low quality ontology still improved base
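
F-measure (F1) is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R), so a large gain in one cannot fully hide a loss in the other. A small Python helper, run on invented counts (they are illustrative only, not numbers from [Welty, et al, 2004]):

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and their harmonic mean (F1) from raw counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 30 correct results returned, 10 wrong ones, 20 missed.
p, r, f = precision_recall_f1(tp=30, fp=10, fn=20)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # prints: P=0.75 R=0.60 F1=0.67
```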

Wherefore Reasoning? • Very hard to “sell” OWL reasoning • Many users want very simple reasoning • Simple subclass • Simple range/domain constraints • Simple rules • Some users want more than OWL • But just to express their semantics • Improving precision? • Improving recall? Must be measured.
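
The "simple" reasoning many users ask for fits in a handful of forward-chaining rules, close in spirit to the RDFS entailment rules for subclass transitivity, type inheritance, and domain/range typing. A naive Python sketch that applies them until nothing new is derived; the short predicate names stand in for the real RDFS URIs, and the example facts are hypothetical:

```python
Triple = tuple[str, str, str]

# Short stand-ins for rdfs:subClassOf, rdf:type, rdfs:domain, rdfs:range.
SUBCLASS, TYPE, DOMAIN, RANGE = "subClassOf", "type", "domain", "range"


def simple_closure(triples: set[Triple]) -> set[Triple]:
    """Apply four simple rules repeatedly until a fixed point is reached."""
    facts = set(triples)
    while True:
        new: set[Triple] = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                # A subClassOf B, B subClassOf C  =>  A subClassOf C
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))
                # x type A, A subClassOf B  =>  x type B
                if p == TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, TYPE, o2))
                # P domain D, x P y  =>  x type D
                if p == DOMAIN and p2 == s:
                    new.add((s2, TYPE, o))
                # P range R, x P y  =>  y type R
                if p == RANGE and p2 == s:
                    new.add((o2, TYPE, o))
        if new <= facts:
            return facts
        facts |= new


facts = {
    ("Professor", SUBCLASS, "Faculty"),
    ("Faculty", SUBCLASS, "Person"),
    ("teaches", DOMAIN, "Faculty"),
    ("teaches", RANGE, "Course"),
    ("alice", "teaches", "cs101"),
}
closed = simple_closure(facts)
for triple in sorted(closed - facts):
    print(triple)
```

The nested loop rescans every pair of facts on each pass, which is fine for a toy but is exactly the kind of approach that stops scaling; practical reasoners index triples by predicate instead. Whether the extra inferences improve precision or recall is, as the slide says, something to measure.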

The very long tail • [Figure: long-tail frequency plot; labels “Ontologies, explicit semantics” and “Something else?”]

Question Answering • Q: What weapon was featured in the ballet “Fall River Legend?” • A: American Ballet Theatre • OK, add “weapon” to ontology…
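
The answer above goes wrong because nothing checks that it is the kind of thing the question asks for. A common remedy, and roughly what "add 'weapon' to ontology" implies, is to type candidate answers against an ontology and discard those whose type does not match the question's expected answer type. A minimal Python sketch; the tiny type hierarchy, the entity typing, and the scored candidates are all hypothetical:

```python
from typing import Optional

# Hypothetical mini-ontology: each type points to its parent type.
PARENT = {"weapon": "artifact", "artifact": "thing", "organization": "thing"}
# Hypothetical entity typing, e.g. from a lexicon or a named-entity tagger.
TYPE_OF = {"American Ballet Theatre": "organization", "axe": "weapon"}


def is_a(type_name: str, target: str) -> bool:
    """True if type_name equals target or is a descendant of it in PARENT."""
    current: Optional[str] = type_name
    while current is not None:
        if current == target:
            return True
        current = PARENT.get(current)
    return False


def filter_by_answer_type(candidates: list[tuple[str, float]],
                          expected: str) -> list[tuple[str, float]]:
    """Keep (answer, score) pairs whose known type fits the expected type, best first."""
    kept = [(answer, score) for answer, score in candidates
            if answer in TYPE_OF and is_a(TYPE_OF[answer], expected)]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical scored candidates from a text-search stage.
candidates = [("American Ballet Theatre", 0.9), ("axe", 0.4)]
print(filter_by_answer_type(candidates, "weapon"))  # prints: [('axe', 0.4)]
```

Dropping every candidate with an unknown type is a blunt choice: it raises precision but loses any correct answer the ontology has never heard of, which is the long-tail problem again.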

Question Answering • Q: What gum’s motto was “Double your pleasure, double your fun”? • A: personal lubricant

Humans vs. Machines • Vision • Speech • Natural Language • Context awareness • Tacit knowledge • Learning • Socialization • Organization • Perfect memory • Calculation speed • Planning & scheduling • Games & simulation • Search • Networks

Question Answering • Q: What president gave the longest inaugural speech? • A: Dieter Fensel • “Improvements” need to be measured • P ∝ 1/R • [Diagram labels: Leader; Talk, presentation]
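
The precision-versus-recall note on this slide refers to the usual tradeoff: demanding more confidence before answering tends to raise precision and lower recall. It is an empirical tendency, not a literal inverse proportion. A short Python sweep over invented scored answers illustrates it; the scores, correctness labels, and the assumed total of five answerable questions are all hypothetical:

```python
# Hypothetical system output: (confidence score, whether the answer was correct).
scored = [(0.95, True), (0.90, True), (0.80, False), (0.70, True),
          (0.60, False), (0.50, True), (0.40, False), (0.30, False)]
TOTAL_CORRECT = 5  # assumed number of questions that have a correct answer


def precision_recall_at(threshold: float) -> tuple[float, float]:
    """Precision and recall when only answers scoring >= threshold are returned."""
    returned = [ok for score, ok in scored if score >= threshold]
    hits = sum(returned)  # True counts as 1
    # By convention, returning nothing counts as perfect precision.
    precision = hits / len(returned) if returned else 1.0
    recall = hits / TOTAL_CORRECT
    return precision, recall


for t in (0.9, 0.7, 0.5, 0.3):
    p, r = precision_recall_at(t)
    print(f"threshold={t:.1f}  P={p:.2f}  R={r:.2f}")
```

Here one correct answer never appears in the output at all, so recall tops out at 0.8 however low the threshold goes, while precision keeps falling as more wrong answers are let through.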
