
Integrating Techniques for Event-based Business Intelligence Gathering

Kareem S. Aggour, John Interrante, Ibrahim Gokcen. July 16, 2006.





Presentation Transcript


  1. Integrating Techniques for Event-based Business Intelligence Gathering • Kareem S. Aggour, John Interrante, Ibrahim Gokcen • July 16, 2006

  2. Business Problems • Manual search of existing news sources/aggregators • Emergence of novel news sources • Dealing with information explosion vs. keeping abreast of important developments • Distributed data collection across marketing, sales

  3. Motivation • Identify sales/risk leads on 8 topics • Risk: Bankruptcy, Management Succession, Litigation, Change in Auditors, Rating Change • Sales: Bankruptcy, Outsourcing, Mergers & Acquisitions, Facility Expansions • Provide actionable and focused content to risk and sales reps in financial services businesses • Automate extraction and integration of events from multiple providers • Reduce repetitious work by centralizing event collection

  4. Anticipated Results (M&A examples) • First Financial Management Corp said it has offered to acquire Comdata Network Inc for $18 per share in cash and stock, or a total of about $342.7 million. • Delta announced last September that it was purchasing Western. • Nerco Inc said its oil and gas unit closed the acquisition of a 47% working interest in the Broussard oil and gas field from Davis Oil Co for about $22.5 million in cash. • Goal: extract key information from articles efficiently and with good precision/recall for all topics

  5. How?: EBIG Agent Architecture • Ontology generation • Named entity extraction • Targeted phrase extraction using a dependency grammar • Query generation & expansion • Data visualization • Text classification

  6. Integrating Techniques • Query expansion • Ontology generation • Text classification • Data visualization • Event extraction • Named entity extraction

  7. Extraction Pipeline • Data flow: Articles → Sentences → Events • Techniques shown in the diagram: Text classification, Ontology patterns, Named entity extraction, Phrase extraction

  8. Query Generation & Expansion • Store queries (to a news source for a given date range) to prevent duplicate retrieval • If articles exist in the DB, retrieve from DB • Expand queries based on previously retrieved articles • Word frequency analysis on bag of words • Present frequent words in relevant articles for review
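The query store and the word-frequency expansion step on this slide can be sketched as follows. This is a minimal illustration, not the EBIG implementation: the `QueryStore` class, the `retrieve` callback, the stopword list, and the in-memory dictionary standing in for the database are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "it", "will", "said", "to", "for", "is", "was"}


class QueryStore:
    """In-memory stand-in for the query/article database (an assumption)."""

    def __init__(self):
        # (source, query, start, end) -> list of article texts
        self._cache = {}

    def fetch(self, source, query, start, end, retrieve):
        """Return stored articles if this query was already run, else retrieve and store."""
        key = (source, query, start, end)
        if key not in self._cache:
            self._cache[key] = retrieve(source, query, start, end)
        return self._cache[key]


def expansion_candidates(relevant_articles, query, top_n=5):
    """Bag-of-words frequency analysis: frequent non-query words offered for analyst review."""
    query_terms = set(query.lower().split())
    counts = Counter()
    for text in relevant_articles:
        for raw in text.lower().split():
            word = raw.strip(".,;:()\"'")
            if word and word not in STOPWORDS and word not in query_terms:
                counts[word] += 1
    return [word for word, _ in counts.most_common(top_n)]
```

The candidates are only suggestions; as the slide notes, they are presented for review rather than added to the query automatically.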

  9. Text Classification with SVMs • Linear Support Vector Machines (SVMlight) • High-dimensionality enables good class separation • One-vs-all for 8 topics • Amenable to incremental learning • Label corrections by research analysts • Incoming new articles
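To illustrate the one-vs-all scheme, the sketch below trains one linear classifier per topic and labels an article with the topic whose classifier scores highest. It is an assumption-laden toy: the slide names SVMlight, whereas this example substitutes a plain sub-gradient descent on the regularised hinge loss, and the three-word vocabulary, topics, and training vectors are invented for the example.

```python
def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Sub-gradient descent on the L2-regularised hinge loss; labels are +1 / -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            w = [wi * (1.0 - lr * lam) for wi in w]  # regularisation shrink
            if label * score < 1.0:  # example is inside the margin: hinge step
                w = [wi + lr * label * xi for wi, xi in zip(w, x)]
                b += lr * label
    return w, b


def train_one_vs_all(X, labels):
    """One binary classifier per topic: that topic (+1) against all others (-1)."""
    models = {}
    for topic in sorted(set(labels)):
        y = [1 if lab == topic else -1 for lab in labels]
        models[topic] = train_linear_svm(X, y)
    return models


def predict(models, x):
    """Pick the topic whose classifier gives the highest score."""
    def score(topic):
        w, b = models[topic]
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(models, key=score)


# Toy word-count vectors over a hypothetical vocabulary: [bankruptcy, acquire, lawsuit]
X = [[1, 0, 0], [2, 0, 0], [0, 1, 0], [0, 2, 0], [0, 0, 1], [0, 0, 2]]
LABELS = ["Bankruptcy", "Bankruptcy", "M&A", "M&A", "Litigation", "Litigation"]
MODELS = train_one_vs_all(X, LABELS)
```

Analyst label corrections would enter such a scheme as changed entries in `LABELS` followed by retraining of the affected topic classifiers.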

  10. Data Visualization • Centroid algorithm for cluster-preserving dimension reduction (Kim et al. 2005): Compute a p-dimensional representation $q_i$ of an n-dimensional vector $q$ ($p \ll n$) • Compute two centroids • $C = [c_1, c_2]$ • Solve $\min_{q_i} \lVert C q_i - q \rVert_2$ • [Figure: rating change articles] • Used primarily for article label validation and finding anomalies
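The least-squares step on this slide can be worked through for the two-centroid case (p = 2). The sketch below is only an illustration of that step as stated on the slide, not a reproduction of the cited algorithm: it solves the 2x2 normal equations (CᵀC) q_i = Cᵀq directly, and the function names and toy vectors are assumptions.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(n)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project(q, c1, c2):
    """Solve min over q_i of ||C q_i - q||_2 with C = [c1, c2] via the normal equations."""
    a11, a12, a22 = dot(c1, c1), dot(c1, c2), dot(c2, c2)
    b1, b2 = dot(c1, q), dot(c2, q)
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("centroids are collinear; the 2-D representation is not unique")
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


# Toy 3-dimensional article vectors for two classes (invented for the example)
c1 = centroid([[1, 0, 0], [3, 0, 0]])  # class 1 centroid: [2, 0, 0]
c2 = centroid([[0, 1, 0], [0, 3, 0]])  # class 2 centroid: [0, 2, 0]
point = project([2, 0, 5], c1, c2)     # 2-D coordinates used for plotting
```

Each article is reduced to two coordinates that say how much of each class centroid it contains, which is what makes mislabeled or anomalous articles stand out in a scatter plot.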

  11. Ontology Generation • Topic patterns filter sentences • Key nouns and key verbs combined (accept*offer, agree*acqui) symmetrically • Refined after precision/recall analysis • Topic keywords are used to extract events • Key nouns, verbs themselves • Phrases are extracted “around” them
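One possible reading of the symmetric noun/verb patterns is that a pattern such as accept*offer matches a sentence containing both stems in either order. The sketch below implements that reading with regular expressions; the interpretation, the function names, and the sample sentences are assumptions, and the two patterns are the ones quoted on the slide.

```python
import re


def compile_pattern(pattern):
    """Turn 'accept*offer' into a regex that matches both stems in either order."""
    left, right = (re.escape(part) for part in pattern.split("*"))
    return re.compile(
        rf"\b{left}\w*.*\b{right}\w*|\b{right}\w*.*\b{left}\w*",
        re.IGNORECASE,
    )


def topic_sentences(sentences, patterns):
    """Keep only the sentences that match at least one topic pattern."""
    compiled = [compile_pattern(p) for p in patterns]
    return [s for s in sentences if any(rx.search(s) for rx in compiled)]


MA_PATTERNS = ["accept*offer", "agree*acqui"]
SENTENCES = [
    "Comdata agreed to be acquired by First Financial.",
    "The offer was accepted on Monday.",
    "Delta reported quarterly earnings.",
]
FILTERED = topic_sentences(SENTENCES, MA_PATTERNS)
```

Matching on stems lets one pattern cover inflected forms (acquire, acquired, acquisition), which is presumably why the slide shows truncated keys such as acqui.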

  12. Named Entity Extraction • Existence of an entity (company, organization) in a sentence indicates an event • Entities become a part of extraction rules • Sentences with at least one entity are sent to the event extractor • No anaphora resolution • Commercial and Open Source tools available • Connexor’s MEX, GATE • Ability to add custom lexicons in both
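The filtering rule on this slide (only sentences with at least one entity go to the event extractor) can be sketched with a simple lexicon lookup. This stands in for a real extractor such as the tools named above and does not use their APIs; the lexicon contents and function names are assumptions.

```python
# Hypothetical custom lexicon of company names
COMPANY_LEXICON = ["First Financial Management Corp", "Comdata Network Inc", "Delta"]


def find_entities(sentence, lexicon=COMPANY_LEXICON):
    """Return the lexicon entries that occur verbatim in the sentence."""
    return [name for name in lexicon if name in sentence]


def sentences_for_extractor(sentences):
    """Forward (sentence, entities) pairs only when at least one entity was found."""
    forwarded = []
    for sentence in sentences:
        entities = find_entities(sentence)
        if entities:
            forwarded.append((sentence, entities))
    return forwarded


FORWARDED = sentences_for_extractor([
    "Delta announced last September that it was purchasing Western",
    "It closed the deal last week.",
])
```

Because there is no anaphora resolution, the second sentence is dropped even though "It" may refer to a company.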

  13. Targeted Phrase Extraction (TPE) • Originates from Functional Dependency Grammar (Tapanainen et al.) • The syntax tree of a sentence has a unique root, which is the main verb of the sentence • All other verbs are also roots of subtrees • Example sentence: “Delta announced last September that it was purchasing Western”

  14. Targeted Phrase Extraction (TPE) • Given a target string S (key noun, verb or company name) compute its subtree • If S is the main verb, output the entire parse tree (except “tmp”) • If S is a subject or an object in the sentence output the corresponding parse subtree • If S is a modifier of a subject or object, output the corresponding parse subtree
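The subtree rules above can be sketched on a hand-written dependency parse of the example sentence. Everything here is an assumption made for illustration: the parse, the role labels other than "tmp", and the reading that the "corresponding parse subtree" of a non-verb target is the subtree of its nearest governing verb.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    word: str
    head: Optional[int]  # index of the governing token; None for the main verb
    role: str            # dependency role (hypothetical labels)
    pos: str             # "V" marks verbs, which are roots of subtrees


# Hypothetical parse of the example sentence
PARSE = [
    Token("Delta", 1, "subj", "N"),
    Token("announced", None, "main", "V"),
    Token("last", 3, "attr", "A"),
    Token("September", 1, "tmp", "N"),
    Token("that", 7, "pm", "CS"),
    Token("it", 7, "subj", "PRON"),
    Token("was", 7, "aux", "AUX"),
    Token("purchasing", 1, "obj", "V"),
    Token("Western", 7, "obj", "N"),
]


def subtree(parse, root, skip_roles=()):
    """Indices of root and all its descendants, omitting subtrees with a skipped role."""
    ids = [root]
    for i, tok in enumerate(parse):
        if tok.head == root and tok.role not in skip_roles:
            ids.extend(subtree(parse, i, skip_roles))
    return sorted(ids)


def extract_phrase(parse, target):
    """Climb from the target to its nearest verb, then output that verb's subtree."""
    i = next(k for k, tok in enumerate(parse) if tok.word == target)
    while parse[i].pos != "V":
        i = parse[i].head
    # For the main verb, output the whole tree except temporal ("tmp") modifiers
    skip = ("tmp",) if parse[i].role == "main" else ()
    return " ".join(parse[k].word for k in subtree(parse, i, skip))
```

With this parse, targeting "Western" yields the embedded clause, while targeting "Delta" or "announced" yields the whole sentence minus the temporal phrase "last September".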

  15. Targeted Phrase Extraction (TPE) • Simple TPE rules become predicate-argument pairs: (word/concept, role) • (C-Company, SUBJ): Extract all clauses where a company name is a subject • “Company X acquired Company Y” • (C-Company, OBJ): Extract all clauses where a company name is an object • “Company X acquired Company Y” • ((C-Company, ), (“takeover”, MOD_OBJ)): Extract all clauses where a company name is present and the word takeover is an object modifier • “Company X rebuffed a takeover proposal from Company Y”
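The predicate-argument rules above can be read as lists of (word/concept, role) pairs that must all be satisfied by some token in a clause. The sketch below encodes that reading; the token dictionaries, the role assigned to each word, and the convention that `None` means "any role" are assumptions for the example.

```python
def matches(clause, rule):
    """True if every (term, role) pair in the rule is satisfied by some token.

    A term starting with 'C-' is compared against the token's concept;
    any other term is compared against the lower-cased word.
    A role of None accepts any role.
    """
    for term, role in rule:
        satisfied = any(
            (tok["concept"] == term if term.startswith("C-") else tok["word"].lower() == term)
            and (role is None or tok["role"] == role)
            for tok in clause
        )
        if not satisfied:
            return False
    return True


# Hypothetical role assignment for "Company X rebuffed a takeover proposal from Company Y"
CLAUSE = [
    {"word": "Company X", "concept": "C-Company", "role": "SUBJ"},
    {"word": "rebuffed", "concept": None, "role": "MAIN"},
    {"word": "a", "concept": None, "role": "DET"},
    {"word": "takeover", "concept": None, "role": "MOD_OBJ"},
    {"word": "proposal", "concept": None, "role": "OBJ"},
    {"word": "from", "concept": None, "role": "MOD"},
    {"word": "Company Y", "concept": "C-Company", "role": "PCOMP"},
]

COMPANY_SUBJ = [("C-Company", "SUBJ")]
COMPANY_OBJ = [("C-Company", "OBJ")]
TAKEOVER = [("C-Company", None), ("takeover", "MOD_OBJ")]
```

Under these assumptions the clause satisfies `COMPANY_SUBJ` and `TAKEOVER` but not `COMPANY_OBJ`, since neither company name fills the object role in this sentence.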

  16. Experimental Results
      Dataset              Precision  Recall  F1
      Reuters-M&A (2143)   0.70       0.67    0.68
      WSJ-M&A (100)        0.78       0.66    0.72
      WSJ-B (100)          0.87       0.65    0.74
      WSJ-FE (100)         0.74       0.58    0.65
      • Reuters M&A: Reuters-21578, Apte-90 split, ACQ category • WSJ: The Wall Street Journal articles on M&A, Bankruptcy, Facility Expansions
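As a quick check on the table, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes it from the reported precision and recall; the dictionary layout and the helper name are assumptions, and the numbers are copied from the slide.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as given on the slide
RESULTS = {
    "Reuters-M&A": (0.70, 0.67, 0.68),
    "WSJ-M&A": (0.78, 0.66, 0.72),
    "WSJ-B": (0.87, 0.65, 0.74),
    "WSJ-FE": (0.74, 0.58, 0.65),
}

# Difference between the recomputed and the reported F1 for each dataset
DIFFS = {name: abs(f1_score(p, r) - reported) for name, (p, r, reported) in RESULTS.items()}
```

The recomputed values agree with the reported F1 column to within two-decimal rounding, so the table is internally consistent.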

  17. Extraction Results • First Financial Management Corp said it has offered to acquire Comdata Network Inc for $18 per share in cash and stock, or a total of about $342.7 million • Named entities: [First Financial Management Corp, Comdata Network Inc] • Delta announced last September that it was purchasing Western • Named entities: [Delta] • Nerco Inc said its oil and gas unit closed the acquisition of a 47% working interest in the Broussard oil and gas field from Davis Oil Co for about $22.5 million in cash. • Named entities: [Nerco Inc, Davis Oil Co] • Gander Mountain Inc said it acquired the privately held Western Ranchman Outfitters, a catalog and point-of-purchase retailer of western apparel based in Cheyenne, WY. • Named entities: [Gander Mountain Inc, Western Ranchman Outfitters]

  18. Using EBIG

  19. Company Searching • [Diagram] Inputs: companies, keywords • Stages: Search Request Submitted, Online Article Retrieval, Event Extraction, Information Fusion • Event labels in the diagram (as extracted): CEO Resigns New CEO Resigns CFO Replaced

  20. Industry Searching • [Diagram] Inputs: industries, keywords • Stages: Search Request Submitted, Online Article Retrieval, Event & Entity Extraction, Name Matching & Fusion • Entities shown: United Airlines, Jet Partners LLC, Delta, Skywest, United Steelworker, Sony • Other labels: Events & Entities, Company Matched Events, Unmatched, Public + Private

  21. Event Reports

  22. Heatmap Event Visualization

  23. Conclusions • Illustrated an end-to-end business application of event extraction • Demonstrated the applicability of a multi-agent system integrating ML and NLP techniques to collection of focal events • Analyst relevance feedback will be critical in filtering content • Learning costs and benefits of news sources will improve information quality and system efficiency • Deliberative learning

  24. Q & A
