Artificial Intelligence Technologies for Web Intelligence

Artificial Intelligence Technologies for Web Intelligence. Guest Lecture to Singapore-MIT Alliance. Ah-Hwee Tan Laboratories for Information Technology, Singapore Oct 11, 2002. Outline. What is Web Intelligence (WI)? How to do WI? Technologies and Tools (disclaimer: snapshots only)

Presentation Transcript


  1. Artificial Intelligence Technologies for Web Intelligence Guest Lecture to Singapore-MIT Alliance Ah-Hwee Tan Laboratories for Information Technology, Singapore Oct 11, 2002

  2. Outline • What is Web Intelligence (WI)? • How to do WI? • Technologies and Tools (disclaimer: snapshots only) • What’s next?

  3. Web Intelligence and … spying on the web

  4. Web Intelligence • Scanning, tracking, and analyzing information on the world wide web for the purpose of competitive intelligence • Intelligence as in Central Intelligence Agency

  5. The other definition of Web Intelligence • Web Intelligence Consortium (WIC) (http://wi-consortium.org/) • Artificial Intelligence (AI), Information Technology (IT), + Web • Intelligence as in Artificial Intelligence

  6. Competitive Intelligence (CI) (Fuld & Company, 2000, 2001) • Highlights the importance of gathering, analyzing, and distributing competitive information to gain competitive advantages • Too risky to do business without CI • SCIP grew from 150 (1991) to 7000 (2000) • Press articles have increased from 100 (1991) to 6000 (2000)

  7. Competitive Intelligence Cycle (Fuld & Company, 2000, 2001) • Planning & Direction • Information Gathering • Analysis & Production • Evaluation & Tracking

  8. AI Technologies for Web Intelligence • Information Gathering • Getting the information (search, information retrieval) • Analysis and Production • Putting things in perspective (clustering, categorization) • Gaining insights (info/knowledge extraction, discovery) • Evaluation and Tracking

  9. Technologies for Search • Purpose: Getting the right information • Challenges • Too much information, irrelevant information, out-of-date information • Technologies • Information retrieval, PageRank • Tools • General: Google, AltaVista, Excite, etc. • Specialized: Patent (Delphion), News (LexisNexis)

  10. SMART (Salton, 1971) • One of the first, and still one of the best, IR systems • vector space model for representing documents • automatic indexing • Given a new query • converts it to a vector • uses a similarity measure to compare it to the documents • returns the top n documents • can perform relevance feedback
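The query steps on this slide can be sketched in a few lines. This is a minimal illustration, not SMART itself: it assumes raw term counts as vector components and cosine similarity as the similarity measure (the slide names neither), and the function names and sample documents are made up for the example.

```python
import math
from collections import Counter


def to_vector(text):
    """Represent text as a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs, n=2):
    """Return indices of the top-n documents most similar to the query."""
    q = to_vector(query)
    scored = [(cosine(q, to_vector(d)), i) for i, d in enumerate(docs)]
    # Highest score first; ties broken by document order
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for score, i in scored[:n] if score > 0]


# Hypothetical mini-collection
docs = [
    "operating system kernel design",
    "web search engine ranking",
    "competitive intelligence on the web",
]
top = search("web search", docs)
```

Here `top` lists the second document before the third, since it shares both query words. Relevance feedback is not shown.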

  11. Document Representation • Vector Space Model • Bag of words, e.g. operating, system • Terms/Phrases, e.g. operating systems • N-grams (Huffman, TREC-4, 1995) • Syntactic 3-tuples (Kanagasa & Pan, PRICAI-2000) • Concept-Relation-Concept (Paik et al, US6,263,335)

  12. Indexing • Goal • To select a set of important keyword features from all the words that appear in the document set • How • remove stop words, reduce to root form • pick terms based on part-of-speech tagging • keyword weighting
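The first indexing step (stop-word removal and reduction to root form) might look like the sketch below. The stop-word list and the suffix rules are toy assumptions for illustration only; a real system would use a proper stemmer and a fuller stop list. Part-of-speech tagging and keyword weighting are left out.

```python
# Illustrative stop list and suffix rules (assumptions, not from the lecture)
STOP_WORDS = {"a", "an", "the", "of", "to", "in", "and", "is", "are", "for"}
SUFFIXES = ("ing", "es", "s")  # toy rule set, NOT the Porter stemmer


def crude_stem(word):
    """Strip one common suffix; a stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Lower-case, drop stop words, and reduce remaining words to a crude root."""
    words = [w.strip(".,;:!?\"'()") for w in text.lower().split()]
    return [crude_stem(w) for w in words if w and w not in STOP_WORDS]


terms = index_terms("The operating systems are scanning the indexes.")
```

The crude rules over-strip some words ("operating" becomes "operat"), which is typical of suffix stripping and acceptable as long as queries are reduced the same way.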

  13. Feature Weighting • Goal • To represent a doc using a real-valued vector • How: An example • For doc dj and keyword wi, calculate • Term frequency (TF) = TF(wi,dj) • Inverse Document Frequency (IDF) = log (N/DF(wi)), where N is the number of docs and DF(wi) is the number of docs containing wi • TF.IDF weight: Iji = TF(wi,dj) × IDF(wi) • Normalize: Ij = (Ij1/Im, Ij2/Im, …, IjK/Im) for K keywords • where Im = max (Iji) over all keywords i
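A small worked version of this weighting scheme, as a sketch: it assumes whitespace tokenization and the natural logarithm (the slide does not give a base; the base cancels out in the max-normalization anyway). The function name and the three sample docs are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) with max-normalized TF.IDF weights per doc."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    n_docs = len(tokenized)
    # DF(wi): number of docs that contain keyword wi
    df = {w: sum(1 for doc in tokenized if w in doc) for w in vocab}
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        # TF.IDF weight for every keyword in the vocabulary
        weights = [tf[w] * math.log(n_docs / df[w]) for w in vocab]
        max_w = max(weights)
        if max_w > 0:
            # Normalize by the largest weight in this doc
            weights = [x / max_w for x in weights]
        vectors.append(weights)
    return vocab, vectors


docs = ["web search web", "web mining", "text mining"]
vocab, vectors = tfidf_vectors(docs)
```

In the first doc, "search" occurs once but appears in only one doc, so it outweighs "web", which occurs twice but appears in two docs. After normalization "search" has weight 1 and "web" a weight below 1.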

  14. PageRank (Page & Brin, 1998) • Uses the web's vast link structure as an indicator of an individual page's value • A page that receives many links is important • A page that receives a link from an important page is also important • Google combines PageRank with sophisticated text-matching techniques to find pages that are both important and relevant
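The two "importance" rules can be illustrated with the commonly described power-iteration form of PageRank. The slide gives no formula, so the damping factor of 0.85, the fixed iteration count, and the handling of pages without outgoing links are assumptions of this sketch, as is the tiny four-page link graph.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small base share regardless of links
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly over its links
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical link graph: a, b and d all link to c; c links back to a
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
```

Page c ranks highest because it receives many links. Page a receives only one link, but it comes from the important page c, so a ranks well above b and d.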

  15. How to Search - Tips from an Intelligence scout (Courtesy of LIT’s Planning Group)

  16. LIT KSKS Process 1) KIT (Identify your Key Intelligence Topic) 2) Sources (and resources) 3) KIQ (Key Intelligence Questions) 4) Search Strategy

  17. Key Intelligence Topic • Identify your Key Intelligence Topic(s) • Drill down • instead of “Ubiquitous Computing”, what sub topics are you REALLY interested in? • a “taxonomy” will be useful

  18. KIT • Start with a good descriptive paragraph on your topic, name a few applications • Think out of the box: use terminologies used by “reporters”, “journalists”, “laymen”

  19. Sources

  20. … and Resources • TIME and MANPOWER and TRAINING • Monitoring vs. Project • Monitoring: long periods of time, identify the delta (change) • Project: specific, determined period of time. Objective/goal is to know as much as possible on the topic

  21. Key Intelligence Questions • Known Analysis Techniques: 5F, 5C, SCP, TOWS • LIT methodology: KIQ technique (Combo of above) • Your KIQs form the backbone of your analysis (WYAIWYG)

  22. … KIQs • Ask yourself 5-8 Key Intelligence Questions • Establish key indicators or proxy indicators

  23. Sample KIQs • Top industry players? (big, small, listed, unknown) Region? Profiles. • R&D labs? Region? • Major research trends? • Products available? Prototypes? Technologies? • Research challenges? (problems and issues) • Upcoming markets (segments? size? time frame?) • IP and opportunities for LIT? • [Slide annotations: Supply; Environment; Supply/weakness/threats; Environment/opportunities; Demand/opportunities; Strength/opportunities]

  24. Questions • Where are the markets for the applications? • What is the time frame for market release? • What are the price points? • Who are the top # players? (by countries/region/labs/companies) • What products are available? Any prototypes? • What are the technologies behind these? • What are the research trends/challenges? • Any IP opportunities?

  25. Search Strategy • Sources and URLs • Search “Magnets” (word/phrase spotting) • Tools • Reiterate!

  26. Magnets • Magnets are specific, well-used terms that increase the probability of finding relevant material • append them to your normal search string • Trends, surveys, forecasts, estimates, units shipped, scenarios • CEO + interview • market research report, table of contents • see handout “Appendix B. cheat sheet on magnets”
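Appending magnets to a search string is easy to automate. The sketch below is one possible way to do it; the magnet list is a subset taken from the slide, while the function name, the sample topic, and the choice to quote multi-word magnets as phrases are assumptions.

```python
# A few magnets from the slide; extend as needed
MAGNETS = ["trends", "surveys", "forecasts", "estimates", "units shipped"]


def magnet_queries(base_query, magnets=MAGNETS):
    """Append each magnet to the base search string, quoting multi-word magnets."""
    queries = []
    for magnet in magnets:
        term = f'"{magnet}"' if " " in magnet else magnet
        queries.append(f"{base_query} {term}")
    return queries


# Hypothetical topic
queries = magnet_queries("wearable computing")
```

Each resulting string (for example `wearable computing forecasts`) can then be pasted into the search tool of choice.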

  27. Recap • KIT (sub topics) • terms (known to you): • terms (used elsewhere during a search) • Sources • Specific syntax • Magnets • KIQs • Tools - Search

  28. Tools for Search • Copernic (PC) • Google, AltaVista Link Search (web, free) • Lexis Nexis (web, subscription) • Use advanced search • purpose: increase relevance • TableBase • InfoTech Trends (web, subscription) • Delphion Patent Server (web, subscription)

  29. Copernic: Search, File, Track

  30. Google (www.google.com) - a tool for search

  31. Google: Search

  32. Tips for using Google • Try the obvious first. If you're looking for information on the Java project, enter "Java project" rather than "java". • Use words likely to appear on a site with the information you want. "Java Project Spanish Inquisition" gets better results than "spanish java". • Make keywords as specific as possible.

  33. All terms • By default, Google only returns pages that include all of your search terms. There is no need to include "and" between terms. Keep in mind that the order in which the terms are typed will affect the search results.

  34. Stop words • If a common word is essential to getting the results you want, you can include it by putting a "+" sign in front of it. (Be sure to include a space before the "+" sign.) • Star Wars Episode +1 • “Star Wars Episode 1”

  35. Google: not case sensitive • Google searches are NOT case sensitive. All letters, regardless of how you type them, will be understood as lower case. For example, searches for "george washington", "George Washington", and "gEoRgE wAsHiNgToN" will all return the same results.

  36. Google: no stemming • Google does not use "stemming" or support "wildcard" searches. In other words, Google searches for exactly the words that you enter in the search box.

  37. Find out who links to you • Find out who links to the Java Project • link:www.xyz.com

  38. Google: Site search • The word "site" followed by a colon enables you to restrict your search to a specific site. To do this, use the site:sampledomain.com syntax • spanish inquisition site:www.javadeveloper.com

  39. AltaVista: Link search • Useful if you are looking for news surrounding a “small”, “unknown”, “unlisted” company which may be your competitor • Instead of searching for the small company, search for “who else” links to or writes about that “small” company. • Who else? (what can you find out about the small company) • its interested investors or alliances, its suppliers, research collaborations • Use the Good Old AltaVista

  40. AltaVista “link” search • link:infineon +"fabric" +"wearable" • who else links to infineon? Who else is interested in infineon? • note: why is www left out in the link search? • link:lit.a-star.edu.sg -lit.a-star.edu.sg • everyone else except krdl (not interested in self citations) • link:lit.a-star.edu.sg -lit.a-star.edu.sg url:edu • who are the edu (usually univ, including research) with interest in or collaborating with krdl • link:lit.a-star.edu.sg -lit.a-star.edu.sg url:edu -url:edu.sg • same as above, but not interested in local univ.

  41. Lexis Nexis - The Legal and News Provider

  42. Lexis Nexis • Power Search • Relevance • e.g. headline(“smart homes”) • Proximity and Stemming • e.g. comput! (stemming) • e.g. w/10 (within 10 words) • e.g. w/p (within paragraph) • Limit currency (90 days, previous year), then expand
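To make the proximity and stemming operators concrete, the sketch below emulates them on a local string. It does not call Lexis Nexis and may not match the service's exact counting rules; the function names are hypothetical, search terms are assumed to be lower-case, and "within n words" is taken to mean that the word positions differ by at most n.

```python
import re


def tokenize(text):
    """Split text into lower-case word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(token, term):
    """A trailing '!' on the term matches any token starting with the stem."""
    if term.endswith("!"):
        return token.startswith(term[:-1])
    return token == term


def within(text, term_a, term_b, max_gap=10):
    """True if term_a and term_b occur within max_gap words of each other."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if matches(t, term_a)]
    pos_b = [i for i, t in enumerate(tokens) if matches(t, term_b)]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)


# Emulates a query like: comput! w/10 patent!
text = "Wearable computers were patented by the lab"
hit = within(text, "comput!", "patent!", max_gap=10)
```

With the truncation mark, "comput!" matches "computers" and "patent!" matches "patented"; without it, a term only matches the exact word.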

  43. Example “red-eye correction” - (red eye) w/p patent

  44. Lexis Nexis Power Tip 2 • Find the Elusive “Market Numbers” • Specific source within Lexis Nexis • Select RDS TableBase • Text articles accompanied by tabulated data from market research consultants and investment houses. • Supplement with another useful “table” database, “InfoTech Trends”

  45. Lexis Nexis’ RDS TableBase “market size” data

  46. Results

  47. Handset leaders? Strategy Analytics, a Boston-based research firm, estimates that Nokia and Samsung Electronics Co. Ltd. , Seoul, South Korea, were the only leading handset makers to make a profit last year.

  48. Data and Tables (2) • InfoTech Trends • Data compiled from various IT-related trade magazines • Login with “ip” address

  49. Technologies for Organizing • Clustering • Organizing information into groups based on similarity functions and thresholds • e.g. NorthernLight, BullsEye, Vivisimo • Categorization • Organizing information into a “predefined” set of classes • e.g. Yahoo!, Autonomy Knowledge Server
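The difference between the two approaches can be shown side by side. This is a toy sketch, not how the named products work: it assumes Jaccard word overlap as the similarity function, a single-pass clustering rule with a threshold of 0.3, and keyword lists as the predefined classes. All names and sample data are made up.

```python
def jaccard(a, b):
    """Similarity function: shared words divided by total distinct words."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster(docs, threshold=0.3):
    """Single-pass clustering: join the first cluster whose seed is similar enough."""
    clusters = []  # lists of document indices; the first index is the seed
    for i, doc in enumerate(docs):
        words = doc.lower().split()
        for members in clusters:
            seed_words = docs[members[0]].lower().split()
            if jaccard(words, seed_words) >= threshold:
                members.append(i)
                break
        else:
            # No existing cluster is similar enough: start a new one
            clusters.append([i])
    return clusters


def categorize(doc, classes):
    """Assign doc to the predefined class whose keyword list it overlaps most."""
    words = doc.lower().split()
    return max(classes, key=lambda name: jaccard(words, classes[name]))


docs = [
    "mobile handset market share",
    "handset market leaders",
    "patent search tools",
    "patent search engine",
]
groups = cluster(docs)
label = categorize("new patent search service", {
    "Mobile": ["mobile", "handset", "market"],
    "Patents": ["patent", "search"],
})
```

Clustering discovers the two groups from the documents alone, while categorization needs the class definitions up front and simply picks the best-matching one.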
