
Watson System

By: Devendra Chaplot, Priyank Chhipa, Pratik Kumar


Presentation Transcript


  1. Watson System By: Devendra Chaplot, Priyank Chhipa, Pratik Kumar

  2. What Computers Find Easier (ln(12,546,798 * π)) ^ 2 / 34,567.46 = 0.00885
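The arithmetic on this slide can be checked in a couple of lines. This is a minimal sketch that reads the expression as the square of the natural logarithm, since that is the reading that reproduces the 0.00885 shown on the slide; the variable name `value` is only illustrative.

```python
import math

# Read the slide's expression as (ln(12,546,798 * pi))^2 / 34,567.46.
value = math.log(12_546_798 * math.pi) ** 2 / 34_567.46

print(f"{value:.5f}")  # prints 0.00885
```

Squaring inside the logarithm instead would give roughly 0.00101, so the placement of the exponent matters.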

  3. What Computers Find Easier Select Payment where Owner=“David Jones” and Type(Product)=“Laptop”. In an exact match, David Jones = David Jones, but Dave Jones ≠ David Jones.

  4. What Computers Find Hard Computer programs are natively explicit, fast and exacting in their calculation over numbers and symbols… but natural language is implicit, highly contextual, ambiguous and often imprecise. Structured vs. Unstructured • Where was X born? “One day, from among his city views of Ulm, Otto chose a water color to send to Albert Einstein as a remembrance of Einstein's birthplace.” • X ran this? “If leadership is an art then surely Jack Welch has proved himself a master painter during his tenure at GE.”

  5. A Grand Challenge Opportunity • Capture the imagination • The Next Deep Blue • Engage the scientific community • Envision new ways for computers to impact society & science • Drive important and measurable scientific advances • Be Relevant to Important Problems • Enable better, faster decision making over unstructured and structured content • Business Intelligence, Knowledge Discovery and Management, Government, Compliance, Publishing, Legal, Healthcare, Business Integrity, Customer Relationship Management, Web Self-Service, Product Support, etc.

  6. Real Language is Real Hard • Chess • A finite, mathematically well-defined search space • Limited number of moves and states • Grounded in explicit, unambiguous mathematical rules • Human Language • Ambiguous, contextual and implicit • Grounded only in human cognition • Seemingly infinite number of ways to express the same meaning

  7. Automatic Open-Domain Question Answering A Long-Standing Challenge in Artificial Intelligence to emulate human expertise • Given • Rich Natural Language Questions • Over a Broad Domain of Knowledge • Deliver • Precise Answers: Determine what is being asked & give precise response • Accurate Confidences: Determine likelihood answer is correct • Consumable Justifications: Explain why the answer is right • Fast Response Time: Precision & Confidence in <3 seconds

  8. You may have heard of IBM’s Watson… A. What is the computer system that played against human opponents on “Jeopardy!”… and won. Why Jeopardy? The game of Jeopardy! makes great demands on its players, from the range of topical knowledge covered to the nuances in language employed in the clues. The question IBM had for itself was “is it possible to build a computer system that could process big data and come up with sensible answers in seconds, so well that it could compete with human opponents?”

  9. Some Basic Jeopardy! Clues The type of thing being asked for is often indicated but can go from specific to very vague • This fish was thought to be extinct millions of years ago until one was found off South Africa in 1938 • Category: ENDS IN "TH" • Answer: coelacanth • When hit by electrons, a phosphor gives off electromagnetic energy in this form • Category: General Science • Answer: light (or photons) • Secy. Chase just submitted this to me for the third time--guess what, pal. This time I'm accepting it • Category: Lincoln Blogs • Answer: his resignation

  10. Lexical Answer Type • We define a LAT to be a word in the clue that indicates the type of the answer, independent of assigning semantics to that word. For example, in the following clue the LAT is the string “maneuver.” • Category: Oooh… Chess • Clue: Invented in the 1500s to speed up the game, this maneuver involves two pieces of the same color. • Answer: Castling

  11. Lexical Answer Type • About 12 percent of the clues do not indicate an explicit lexical answer type but may refer to the answer with pronouns like “it,” “these,” or “this” or not refer to it at all. In these cases the type of answer must be inferred by the context. Here’s an example: • Category: Decorating • Clue: Though it sounds “harsh,” it’s just embroidery, often in a floral pattern, done with yarn on cotton cloth. • Answer: crewel
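The two LAT slides can be illustrated with a toy heuristic. This sketch assumes that the word directly after "this" or "these" is the LAT; that rule, the pattern `LAT_PATTERN` and the function `guess_lat` are assumptions made for illustration and are not Watson's actual detector.

```python
import re
from typing import Optional

# Toy heuristic (an assumption, not Watson's method): take the word
# that directly follows "this" or "these" as the lexical answer type.
LAT_PATTERN = re.compile(r"\b(?:this|these)\s+([a-z]+)", re.IGNORECASE)

def guess_lat(clue: str) -> Optional[str]:
    """Return a guessed LAT, or None when the clue names no explicit type."""
    match = LAT_PATTERN.search(clue)
    return match.group(1).lower() if match else None

chess = ("Invented in the 1500s to speed up the game, this maneuver "
         "involves two pieces of the same color.")
decorating = ('Though it sounds "harsh," it\'s just embroidery, often in a '
              "floral pattern, done with yarn on cotton cloth.")

print(guess_lat(chess))       # maneuver
print(guess_lat(decorating))  # None
```

The heuristic misfires when "this" is used as a bare pronoun ("submitted this to me" would yield "to"), which is one reason the slides describe inferring the type from context for clues without an explicit LAT.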

  12. How we convert data into knowledge for Watson’s use Three types of knowledge • Domain Data (articles, books, documents) • Converted to indices for search/passage lookup • Redirects extracted for disambiguation • Frame cuts generated with frequencies to determine likely context • Pseudo docs extracted for candidate answer generation • Training and test question sets w/ answer keys • Used to create the logistic regression model that Watson uses for merging scores • NLP Resources (vocabularies, taxonomies, ontologies) • Named entity detection, relationship detection algorithms • Custom slot grammar parsers, Prolog rules for semantic analysis

  13. Machine learning • One of the core components of the system • Multiple models • 14,000+ training questions • Every candidate answer gets hundreds of features/scores associated with it. These features/scores are passed through a previously trained ML model for candidate answer scoring • It's not just one model. In fact there is a chain of models, each of which uses the scores produced by the models run before it • Machine learning is also used in other parts of the system, such as LAT confidence analysis.

  14. NLP • Used in many places (Question Analysis, Evidence Analysis, Content Pre-processing) • Combines rule-based and statistical approaches • Full NLP stack (used in QA) • Tokenization • Named Entity Recognition • Deep Parsing and Predicate Argument Structure creation • Lexical Answer Type (LAT) and Focus detection • Anaphora resolution • Semantic Relationships extraction • Various technologies and techniques are used (English Slot Grammar parser, R2 NED, machine learning for LAT confidence analysis, custom annotators written in Prolog and Java)

  15. NLP Examples • LAT and Focus • It's the Peter Benchley novel about a killer giant squid that menaces the coast of Bermuda • Named Entity Recognition • It's the {Person::Peter Benchley} novel about a killer giant {Animal::squid} that menaces the {Location::coast of Bermuda} • Anaphora Resolution • Columbus embarked on his first voyage to this continent in 1492. In the next two decades he led three more expeditions there.
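The `{Type::text}` notation in the Named Entity Recognition example is easy to read back programmatically. The sketch below only parses that slide notation; the names `ANNOTATION`, `extract_entities` and `strip_annotations` are hypothetical, and it says nothing about how Watson's annotators produce the labels.

```python
import re
from typing import List, Tuple

# Matches inline annotations of the form {Type::surface text}.
ANNOTATION = re.compile(r"\{(\w+)::([^}]+)\}")

def extract_entities(annotated: str) -> List[Tuple[str, str]]:
    """Return (entity_type, surface_text) pairs in order of appearance."""
    return ANNOTATION.findall(annotated)

def strip_annotations(annotated: str) -> str:
    """Recover the plain clue by keeping only the surface text."""
    return ANNOTATION.sub(r"\2", annotated)

clue = ("It's the {Person::Peter Benchley} novel about a killer giant "
        "{Animal::squid} that menaces the {Location::coast of Bermuda}")

print(extract_entities(clue))
print(strip_annotations(clue))
```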

  16. NLP in evidence analysis and content pre-processing • Why do NLP on evidence passages and ingested content? • NLP in Evidence Analysis allows: • LAT-based scoring • Named-entity-alignment-based scoring • NLP in Content Pre-processing • Extracting and accumulating “knowledge” frames from the content • For instance • SVO frame cuts will contain frequencies of Subject-Verb-Object occurrences in the content that Watson has ingested. • e.g. squid menaces coast 809 • These “knowledge” frames are then used to generate candidate answers
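The SVO frame cuts described above amount to a frequency table over subject-verb-object triples. A minimal sketch, assuming the triples have already been extracted by a parser; the sample triples and the helpers `count_svo_frames` and `fill_subject` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, verb, object)

def count_svo_frames(triples: Iterable[Triple]) -> Counter:
    """Accumulate how often each subject-verb-object triple occurs."""
    return Counter(triples)

def fill_subject(frames: Counter, verb: str, obj: str) -> List[Tuple[str, int]]:
    """List subjects seen with (verb, obj), most frequent first."""
    subjects = Counter({s: n for (s, v, o), n in frames.items()
                        if v == verb and o == obj})
    return subjects.most_common()

# Hypothetical parser output standing in for ingested content.
parsed = [
    ("squid", "menaces", "coast"),
    ("shark", "menaces", "coast"),
    ("squid", "menaces", "coast"),
    ("inventors", "patent", "inventions"),
]
frames = count_svo_frames(parsed)
print(fill_subject(frames, "menaces", "coast"))
```

Querying the table with one slot left open, as `fill_subject` does, is one simple way such frames could suggest candidate answers.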

  17. Broad Domain We do NOT attempt to anticipate all questions and build databases. We do NOT try to build a formal model of the world. In a random sample of 20,000 questions we found 2,500 distinct types*. The most frequent occurs <3% of the time. The distribution has a very long tail. And for each of these types, thousands of different things may be asked. Even going for the head of the tail will barely make a dent. *13% are non-distinct (e.g., it, this, these or NA). Our focus is on reusable NLP technology for analyzing vast volumes of as-is text. Structured sources (DBs and KBs) provide background knowledge for interpreting the text.

  18. Automatic Learning for “Reading” Volumes of Text → Sentence Parsing → Syntactic Frames (subject, verb, object) → Generalization & Statistical Aggregation → Semantic Frames • Inventors patent inventions (.8) • Officials submit resignations (.7) • People earn degrees at schools (0.9) • Fluid is a liquid (.6) • Liquid is a fluid (.5) • Vessels sink (0.7) • People sink 8-balls (0.5) (in pool/0.8)

  19. Evaluating Possibilities and Their Evidences In cell division, mitosis splits the nucleus & cytokinesis splits this liquid cushioning the nucleus. • Many candidate answers (CAs) are generated from many different searches • Each possibility is evaluated according to different dimensions of evidence. • Just one piece of evidence is if the CA is of the right type. In this case a “liquid”. Candidates: • Organelle • Vacuole • Cytoplasm • Plasma • Mitochondria • Blood … Evidence passage: “Cytoplasm is a fluid surrounding the nucleus…” Type scores: • Is(“Cytoplasm”, “liquid”) = 0.2 • Is(“organelle”, “liquid”) = 0.1 • Is(“vacuole”, “liquid”) = 0.2 • Is(“plasma”, “liquid”) = 0.7 WordNet: Is_a(Fluid, Liquid) → ? Learned: Is_a(Fluid, Liquid) → yes.
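The type check on this slide can be sketched as a lookup that falls back on a learned is-a relation. The direct scores are the ones shown on the slide and the 0.6 for "Fluid is a liquid" comes from the semantic-frames slide; taking the maximum of the two routes, and the names `DIRECT_IS`, `LEARNED_IS_A`, `PASSAGE_TYPES` and `type_score`, are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Direct type scores from the slide: Is(candidate, "liquid").
DIRECT_IS: Dict[Tuple[str, str], float] = {
    ("cytoplasm", "liquid"): 0.2,
    ("organelle", "liquid"): 0.1,
    ("vacuole", "liquid"): 0.2,
    ("plasma", "liquid"): 0.7,
}
# Learned relation, using "Fluid is a liquid (.6)" from the frames slide.
LEARNED_IS_A: Dict[Tuple[str, str], float] = {("fluid", "liquid"): 0.6}
# Types asserted by an evidence passage ("Cytoplasm is a fluid ...").
PASSAGE_TYPES: Dict[str, List[str]] = {"cytoplasm": ["fluid"]}

def type_score(candidate: str, lat: str) -> float:
    """Best available evidence that the candidate is of type `lat`."""
    name = candidate.lower()
    direct = DIRECT_IS.get((name, lat), 0.0)
    via_passage = max(
        (LEARNED_IS_A.get((t, lat), 0.0) for t in PASSAGE_TYPES.get(name, [])),
        default=0.0,
    )
    # Assumption: combine the two routes by taking the stronger one.
    return max(direct, via_passage)

for candidate in ["Organelle", "Vacuole", "Cytoplasm", "Plasma"]:
    print(candidate, type_score(candidate, "liquid"))
```

Even with the learned relation, "plasma" still outscores "cytoplasm" on type alone, which matches the slide's point that type is just one piece of evidence.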

  20. Different Types of Evidence: Keyword Evidence Question: In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India. Passage: In May, Gary arrived in India after he celebrated his anniversary in Portugal. Keyword matching: • In May 1898 ↔ In May • celebrated ↔ celebrated • 400th anniversary ↔ anniversary • Portugal ↔ in Portugal • arrival in ↔ arrived in • India ↔ India • explorer ↔ Gary Evidence suggests “Gary” is the answer BUT the system must learn that keyword matching may be weak relative to other types of evidence

  21. Different Types of Evidence: Deeper Evidence Question: In May 1898 Portugal celebrated the 400th anniversary of this explorer’s arrival in India. Passage: On the 27th of May 1498, Vasco da Gama landed in Kappad Beach. Alignments: • May 1898, 400th anniversary ↔ 27th May 1498 (Date Math, Temporal Reasoning) • arrival in ↔ landed in (Statistical Paraphrasing, Para-phrases) • India ↔ Kappad Beach (GeoSpatial Reasoning, Geo-KB) • explorer ↔ Vasco da Gama • Search Far and Wide • Explore many hypotheses • Find & Judge Evidence • Many inference algorithms Stronger evidence can be much harder to find and score. The evidence is still not 100% certain.
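The contrast between keyword evidence and deeper evidence in these two slides can be made concrete. This is a toy sketch under stated assumptions: the stop-word list, the tokenizer, and the functions `keyword_score` and `date_score` are hypothetical stand-ins, and the date check hard-codes the "1898 minus 400" arithmetic rather than deriving it from the question.

```python
import re
from typing import Set

# Small illustrative stop-word list (an assumption, not Watson's).
STOPWORDS = {"in", "on", "the", "of", "this", "s", "he", "his", "after", "a"}

def keywords(text: str) -> Set[str]:
    """Lower-cased alphanumeric tokens with stop words removed."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOPWORDS}

def keyword_score(question: str, passage: str) -> int:
    """Number of distinct keywords shared by question and passage."""
    return len(keywords(question) & keywords(passage))

def date_score(expected_year: int, passage: str) -> int:
    """1 if the passage mentions the expected four-digit year, else 0."""
    years = {int(y) for y in re.findall(r"\b1\d{3}\b", passage)}
    return int(expected_year in years)

QUESTION = ("In May 1898 Portugal celebrated the 400th anniversary of "
            "this explorer's arrival in India.")
GARY = ("In May, Gary arrived in India after he celebrated his "
        "anniversary in Portugal.")
DA_GAMA = "On 27th May 1498, Vasco da Gama landed in Kappad Beach."

expected_year = 1898 - 400  # date math: the 400th anniversary fell in 1898

for name, passage in [("Gary", GARY), ("Vasco da Gama", DA_GAMA)]:
    print(name, keyword_score(QUESTION, passage),
          date_score(expected_year, passage))
```

The "Gary" passage wins on keyword overlap (5 shared keywords against 1) but fails the date check, while the Vasco da Gama passage does the opposite; a learned model then has to decide how much weight each kind of evidence deserves.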

  22. DeepQA The technology & architecture behind Watson

  23. The Difference Between Search & DeepQA • Search: Decision Maker Has Question → Distills to 2-3 Keywords → Search Engine Finds Documents containing Keywords → Delivers Documents based on Popularity → Decision Maker Reads Documents, Finds Answers → Finds & Analyzes Evidence • DeepQA: Decision Maker Asks NL Question → Expert Understands Question → Produces Possible Answers & Evidence → Analyzes Evidence, Computes Confidence → Delivers Response, Evidence & Confidence → Decision Maker Considers Answer & Evidence

  24. DeepQA: the technology & architecture behind Watson Learned Models help combine and weigh the Evidence [Architecture diagram: Initial Question → Question & Topic Analysis → Question Decomposition → Hypothesis Generation (Primary Search over Answer Sources, Candidate Answer Generation) → Hypothesis & Evidence Scoring (Evidence Retrieval over Evidence Sources, Answer Scoring, Deep Evidence Scoring) → Synthesis → Final Confidence Merging & Ranking (learned models) → Answer & Confidence]

  25. DeepQA: the technology & architecture behind Watson (1) Initial Question formulated: “The name of this monetary unit comes from the word for "round"; earlier coins were often oval” (2) Watson performs question analysis, determines what is being asked. (3) It decides whether the question needs to be subdivided. [Diagram: Initial Question → Question & Topic Analysis → Question Decomposition]

  26. DeepQA: the technology & architecture behind Watson (4) Watson then starts to generate hypotheses based on decomposition and initial analysis… as many hypotheses as may be relevant to the initial question… (5) In creating the hypotheses it will use, Watson consults numerous sources for potential answers… [Diagram: Initial Question → Question & Topic Analysis → Question Decomposition → Hypothesis Generation (Primary Search over Answer Sources, Candidate Answer Generation)]

  27. DeepQA: the technology & architecture behind Watson (6) Watson then uses algorithms to “score” each potential answer and assign a confidence to that answer… (7) Watson uses Evidence Sources to validate its hypotheses and help score the potential answers (8) If the question was decomposed, Watson brings together hypotheses from sub-parts [Diagram: Hypothesis Generation → Hypothesis & Evidence Scoring (Evidence Retrieval over Evidence Sources, Answer Scoring, Deep Evidence Scoring) → Synthesis]

  28. DeepQA: the technology & architecture behind Watson Learned Models help combine and weigh the Evidence (9) Using models on the merged hypotheses, Watson can weigh evidence based on prior “experiences” (10) Once Watson has ranked its answers, it then provides its answers as well as the confidence it has in each answer. [Diagram: Synthesis → Final Confidence Merging & Ranking (learned models) → Answer & Confidence]

  29. DeepQA: the technology & architecture behind Watson Learned Models help combine and weigh the Evidence [Architecture diagram: Initial Question → Question & Topic Analysis → Question Decomposition → Hypothesis Generation (Primary Search over Answer Sources, Candidate Answer Generation) → Hypothesis & Evidence Scoring (Evidence Retrieval over Evidence Sources, Answer Scoring, Deep Evidence Scoring) → Synthesis → Final Confidence Merging & Ranking (learned models) → Answer & Confidence]

  30. Step 0 : Content Acquisition Content acquisition is a combination of manual and automatic steps. The first step is to analyze example questions from the problem space to produce a description of the kinds of questions that must be answered and a characterization of the application domain. Analyzing example questions is primarily a manual task, while domain analysis may be informed by automatic or statistical analyses, such as the LAT analysis.

  31. Step 1 : Question Analysis The system attempts to understand what the question is asking and performs the initial analyses that determine how the question will be processed by the rest of the system. • Question Classification, e.g. puzzle/math • Focus and Lexical Answer Type (LAT), e.g. “On this day”: LAT is date/day • Relation Detection, e.g. sea(India, x, west) • Decomposition: divide and conquer. [Diagram: Initial Question → Question & Topic Analysis → Question Decomposition]

  32. Step 2 : Hypothesis Generation 1. Primary search: • Keyword-based search • Top 250 results are considered for candidate answer (CA) generation. • Empirical statistics: 85% of the time the answer is within the top 250 results. 2. CA generation: generates CAs using the results of primary search 3. Soft filtering: • Applies lightweight (less resource-intensive) scoring algorithms to a larger set of initial candidates to prune them down to a smaller set of candidates • Reduces the number of CAs to approx. 100 • Answers are not fully discarded; they may be reconsidered at the final stage. [Diagram: Question Decomposition → Hypothesis Generation (Primary Search over Answer Sources, Candidate Answer Generation)]

  33. Step 2 : Hypothesis Generation 4. Each CA plugged back into the question is considered a hypothesis, which the system has to prove correct with some threshold of confidence. 5. If the correct answer is not generated at this stage, the system has no hope of answering the question whatsoever. • Noise tolerance: tolerate noise in the early stages of the pipeline and drive up precision downstream • Favors recall over precision, with the expectation that the rest of the processing pipeline will tease out the correct answer, even if the set of candidates is quite large
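Soft filtering, as described in Step 2, can be sketched as a cheap ranking pass that keeps the best candidates and sets the rest aside instead of deleting them. The top-N cut-off, the `soft_filter` function and the sample scores are assumptions for illustration; the slides only state that the set shrinks to roughly 100 and that filtered answers may be reconsidered later.

```python
from typing import Callable, List, Tuple

def soft_filter(
    candidates: List[str],
    cheap_score: Callable[[str], float],
    keep: int = 100,
) -> Tuple[List[str], List[str]]:
    """Rank candidates with a lightweight scorer.

    Returns (kept, reserve): the top `keep` candidates go on to deep
    scoring, and the remainder is held back rather than discarded.
    """
    ranked = sorted(candidates, key=cheap_score, reverse=True)
    return ranked[:keep], ranked[keep:]

# Hypothetical lightweight scores for three candidates.
cheap = {"Gary": 0.5, "Vasco da Gama": 0.4, "Portugal": 0.1}
kept, reserve = soft_filter(list(cheap), cheap.get, keep=2)
print(kept, reserve)
```

Keeping `reserve` around reflects the recall-over-precision stance of this stage: a wrongly pruned candidate can still be revisited at the final stage.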

  34. Step 3 : Hypothesis & Evidence scoring • Candidate answers that pass the soft filtering threshold undergo a rigorous evaluation process that involves 2 steps: • Evidence retrieval: • Gathers additional supporting evidence for each candidate answer, or hypothesis, e.g. passage search: gathering passages by adding the CA to the primary search query. • Scoring: • Deep content analysis: includes many different components, or scorers, that consider different dimensions of the evidence • Produces a score that corresponds to how well the evidence supports a candidate answer for a given question.
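The two steps of Step 3 can be sketched as building an evidence query and then running several independent scorers over the retrieved passages. Everything named here (`evidence_query`, `score_candidate`, the two toy scorers, and taking the best passage per scorer) is a hypothetical simplification; the slides do not specify the real scorers.

```python
from typing import Callable, Dict, List

Scorer = Callable[[str, str], float]  # (candidate, passage) -> score

def evidence_query(primary_keywords: List[str], candidate: str) -> List[str]:
    """Passage search: add the candidate answer to the primary query."""
    return primary_keywords + [candidate]

def mentions(candidate: str, passage: str) -> float:
    """Toy scorer: 1.0 if the passage mentions the candidate at all."""
    return float(candidate.lower() in passage.lower())

def near_start(candidate: str, passage: str) -> float:
    """Toy scorer: higher when the candidate appears early in the passage."""
    index = passage.lower().find(candidate.lower())
    return 0.0 if index < 0 else 1.0 - index / len(passage)

def score_candidate(
    candidate: str, passages: List[str], scorers: Dict[str, Scorer]
) -> Dict[str, float]:
    """One feature per scorer, taking the best-supporting passage."""
    return {
        name: max((scorer(candidate, p) for p in passages), default=0.0)
        for name, scorer in scorers.items()
    }

SCORERS: Dict[str, Scorer] = {"mentions": mentions, "near_start": near_start}
passages = ["On 27th May 1498, Vasco da Gama landed in Kappad Beach."]

print(evidence_query(["Portugal", "anniversary", "India"], "Vasco da Gama"))
print(score_candidate("Vasco da Gama", passages, SCORERS))
```

Each scorer contributes one dimension of evidence, so the output is a feature dictionary per candidate rather than a single number.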

  35. Step 4 : Final Merging and Ranking • Merging: • Multiple candidate answers for a question may be equivalent despite very different surface forms. • Using an ensemble of matching, normalization and co-reference resolution algorithms, Watson identifies equivalent and related hypotheses. • Without merging, ranking algorithms would be comparing multiple surface forms that represent the same answer and trying to discriminate among them.
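Merging can be illustrated with the simplest member of that ensemble, surface-form normalization. This sketch lower-cases, strips punctuation and drops a leading article, then keeps the highest score per normalized form; the `normalize` and `merge_candidates` helpers and the max-combination rule are assumptions, and co-reference resolution is not attempted.

```python
import re
from typing import Dict

ARTICLES = {"a", "an", "the"}

def normalize(answer: str) -> str:
    """Lower-case, drop punctuation and a leading article."""
    words = re.sub(r"[^\w\s]", "", answer.lower()).split()
    if words and words[0] in ARTICLES:
        words = words[1:]
    return " ".join(words)

def merge_candidates(scores: Dict[str, float]) -> Dict[str, float]:
    """Collapse equivalent surface forms, keeping the best score of each."""
    merged: Dict[str, float] = {}
    for surface, score in scores.items():
        key = normalize(surface)
        merged[key] = max(score, merged.get(key, 0.0))
    return merged

print(merge_candidates({
    "Vasco da Gama": 0.6,
    "vasco da gama.": 0.4,
    "The Coelacanth": 0.3,
    "coelacanth": 0.5,
}))
```

Forms such as "da Gama" would not merge with "Vasco da Gama" under this rule alone, which is where the matching and co-reference components mentioned on the slide come in.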

  36. Step 4 : Final Merging and Ranking • Ranking and confidence estimation: • After merging, the system must rank the hypotheses and estimate confidence based on their merged scores • The ranking models are trained by running the system over a set of training questions with known answers. • Watson’s metalearner uses multiple trained models to handle different question classes, since certain scores that are crucial to identifying the correct answer for a factoid question may not be as useful on puzzle questions
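Confidence estimation with a logistic regression model, which an earlier slide names as the model Watson uses for merging scores, can be sketched as a weighted sum of feature scores passed through a sigmoid. The weights, bias and feature values below are made up for illustration; in the real system they would be learned from the training questions.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical learned parameters for one question class.
WEIGHTS: Dict[str, float] = {"keyword": 0.5, "date": 3.0}
BIAS = -2.0

def confidence(features: Dict[str, float]) -> float:
    """Logistic regression: sigmoid of the weighted feature sum."""
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def rank(candidates: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Order candidates by estimated confidence, highest first."""
    scored = [(name, confidence(f)) for name, f in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up feature values echoing the explorer example.
ranked = rank({
    "Gary": {"keyword": 0.56, "date": 0.0},
    "Vasco da Gama": {"keyword": 0.11, "date": 1.0},
})
for name, conf in ranked:
    print(f"{name}: {conf:.2f}")
```

A per-class metalearner could be approximated by keeping one `WEIGHTS`/`BIAS` pair per question class and selecting it after question classification.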

  37. The Final Blow! (ctd.) “I for one welcome our new computer overlords” - Jennings

  38. Watson – a Workload Optimized System • 90 x IBM Power 750¹ servers • 2880 POWER7 cores • POWER7 3.55 GHz chip • 500 GB per sec on-chip bandwidth • 10 Gb Ethernet network • 15 Terabytes of memory • 20 Terabytes of disk, clustered • Can operate at 80 Teraflops • Runs IBM DeepQA software • Scales out with and searches vast amounts of unstructured information with UIMA & Hadoop open source components • Linux provides a scalable, open platform, optimized to exploit POWER7 performance • 10 racks include servers, networking, shared disk system, cluster controllers ¹ Note that the Power 750 featuring POWER7 is a commercially available server that runs AIX, IBM i and Linux and has been in market since Feb 2010

  39. Watson: Precision, Confidence & Speed Deep Analytics – We achieved champion levels of Precision and Confidence over a huge variety of expression. Speed – By optimizing Watson’s computation for Jeopardy! on 2,880 POWER7 processing cores we went from 2 hours per question on a single CPU to an average of just 3 seconds, fast enough to compete with the best. Results – In 55 real-time sparring games against former Tournament of Champions players last year, Watson put on a very competitive performance, winning 71%. In the final Exhibition Match against Ken Jennings and Brad Rutter, Watson won!
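The hardware and speed figures from these two slides imply a few derived numbers. The sketch below just does that arithmetic; the "naive efficiency" is a rough ratio only, since the slides do not say that the single CPU in the 2-hour figure was a POWER7 core.

```python
# Figures quoted on the slides.
single_cpu_seconds = 2 * 60 * 60   # 2 hours per question on one CPU
cluster_seconds = 3                # average per question on the cluster
cores = 2880
servers = 90

speedup = single_cpu_seconds / cluster_seconds   # 2400.0
cores_per_server = cores // servers              # 32
naive_efficiency = speedup / cores               # about 0.83

print(f"speedup: {speedup:.0f}x")
print(f"cores per server: {cores_per_server}")
print(f"naive parallel efficiency: {naive_efficiency:.2f}")
```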

  40. Potential Business Applications • Healthcare Analytics • Analyzing: E-Medical records, hospital reports • For: Clinical analysis; treatment protocol optimization • Benefits: Better management of chronic diseases; optimized drug formularies; improved patient outcomes • Customer Care • Analyzing: Call center logs, emails, online media • For: Buyer behavior, churn prediction • Benefits: Improve customer satisfaction and retention, marketing campaigns, find new revenue opportunities • Crime Analytics • Analyzing: Case files, police records, 911 calls… • For: Rapid crime solving & crime trend analysis • Benefits: Safer communities & optimized force deployment • Insurance Fraud • Analyzing: Insurance claims • For: Detecting fraudulent activity & patterns • Benefits: Reduced losses, faster detection, more efficient claims processes • Automotive Quality Insight • Analyzing: Tech notes, call logs, online media • For: Warranty analysis, quality assurance • Benefits: Reduce warranty costs, improve customer satisfaction, marketing campaigns • Social Media for Marketing • Analyzing: Call center notes, SharePoint, multiple content repositories • For: Churn prediction, product/brand quality • Benefits: Improve consumer satisfaction, marketing campaigns, find new revenue opportunities or product/brand quality issues

  41. References • The AI magazine • Ferrucci, David, et al. "Building Watson: An overview of the DeepQA project." AI Magazine 31.3 (2010): 59-79. • Watson Systems: • http://www-03.ibm.com/innovation/us/watson/ • Wiki Page • http://en.wikipedia.org/wiki/Watson_%28computer%2 • Building Watson: A Brief Overview of the DeepQA Project by Joel Farrell, IBM • http://www.medbiq.org/sites/default/files/presentations/2011/Farrell.ppt

  42. References • What is Watson, really? • http://www-01.ibm.com/software/ebusiness/jstart/downloads/IOD2011.ppt • Authors: Keyur Dalal (IBM), Vladimir Stemkovski (IBM) and Jeff Sumner (IBM) • Jeopardy! IBM Watson Day 1 (Feb 14, 2011) • http://www.youtube.com/watch?v=seNkjYyG3gI&feature=related • Science Behind an Answer • http://www-03.ibm.com/innovation/us/watson/what-is-watson/science-behind-an-answer.html • Video: http://youtu.be/DywO4zksfXw

  43. Questions?

  44. Thank You!
