L.A.S.I.: Linguistic Analysis for Subject Identification
Feasibility Presentation
Presented by: CS410 Red Group
November 12, 2012
Outline
• Team Red Staff Chart
• Introduction
• Societal Problem
• Case Study
• Proposed Solution
• Major Component Diagram
• Algorithm
• The Competition
• Risk
• Conclusion
Team Red Staff Chart
• Scott Minter: Project Co-Leader, Software Specialist
• Brittany Johnson: Project Co-Leader, Documentation Specialist
• Dustin Patrick: Algorithm Specialist, Expert Liaison
• Richard Owens: Documentation Specialist, Communication Specialist
• Erik Rogers: Marketing Specialist, GUI Developer
• Aluan Haddad: Algorithm Specialist, Software Specialist
What is a theme?
A specific and distinctive quality, characteristic, or concern. [1]
[1] "Theme," Merriam-Webster
What are you looking for when you are identifying a theme?
The 5 W's & 1 H
• Who
• What
• When
• Where
• Why
• How
Bill's stove was broken. He had been saying for months that he would go to the appliance store to buy a new one. He had some free time yesterday, so he drove to the store to buy a new stove.
The Theme from the 5 W's & 1 H
Bill drove to the store yesterday to buy a new stove because his old one was broken.
Why are themes important?
• Comprehension
• Summarization
• Assists in communication between people
Societal Problem
It is difficult for people to identify a common theme over a large set of documents in a timely, consistent, and objective manner.
How long does it take?
• Finding a theme over multiple documents is a time-consuming process.
• The average reading speed of an adult is 250 words per minute. [2]
[2] Thomas, "What Is the Average Reading Speed and the Best Rate of Reading?"
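To make the time cost concrete, the reading-speed figure above can be turned into a rough estimate. This is a minimal sketch: the 250 words-per-minute rate comes from the slide, while the corpus size (50 documents of 5,000 words each) and the function name are hypothetical values chosen only for illustration.

```python
# Rough reading-time estimate for a document set.
# WORDS_PER_MINUTE is the figure quoted on the slide; everything else
# (function name, corpus size) is an assumption made for this example.
WORDS_PER_MINUTE = 250


def reading_time_hours(total_words, words_per_minute=WORDS_PER_MINUTE):
    """Return the hours needed to read total_words once at a constant rate."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return total_words / words_per_minute / 60


if __name__ == "__main__":
    # Hypothetical corpus: 50 documents of 5,000 words each.
    total_words = 50 * 5_000
    print(f"{reading_time_hours(total_words):.1f} hours for one read-through")
```

Under these assumed numbers, a single read-through takes roughly 16.7 hours, before any note-taking or cross-referencing between documents.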
Consistency and Objectivity
• The criteria for evaluation may vary from person to person.
• Large quantities of documents must be mentally digested, assessed, and interrelated.
Dr. Patrick Hester
“My research interests include multi-objective decision making under uncertainty, probabilistic and non-probabilistic uncertainty analysis, critical infrastructure protection, and decision making using modeling and simulation.” [3]
Ph.D., Vanderbilt University, 2007
Major: Risk and Reliability Engineering and Management
[3] Patrick Hester website
• Dr. Hester is a systems analyst and researcher.
• He must:
  - Conduct extensive research
  - Quickly become familiar with client systems
  - Formulate concise, objective assessments
• LASI will help with each of these tasks.
Assessment Improvement Design (A.I.D.)
• A preliminary problem statement is identified from the documents.
• The problem statement is then used to find Critical Operational Issues (COIs).
• COIs are used to find Measures of Effectiveness (MOEs).
• MOEs are used to find Measures of Performance (MOPs).
Current Method
[Flowchart nodes: Customer Contact; Will NCSOSE be needed? (yes/no); Situational Awareness Meeting; Document Gathering Process; Document Analysis; Problem Statement Presentation; Is the customer satisfied? (yes/no); Continue on to the rest of the A.I.D. process; Client Goes Elsewhere]
LASI: Linguistic Analysis for Subject Identification
[Figure labels: LASI, THEMES]
Our Proposed Solution
• LASI is a linguistic analysis decision support tool used to help determine a common theme across multiple documents.
• Our goals for LASI are to:
  - Find themes accurately
  - Use system resources efficiently
  - Provide consistent results
What do we mean by “linguistic analysis”?
The contextual study of written works and how the words combine to form an overall meaning.
Linguistic analysis involves:
Syntactic
• Logical grammar
• Statistical data
  - Alphabetical frequencies
  - Word counts
• Parts of speech
• Word dependencies
Semantic
• Relating syntactic structures to language-independent meanings
• Extracting meaning and conceptual arguments
• Summarization
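The statistical items on the syntactic side (word counts and alphabetical frequencies) are the simplest to illustrate. The following is a minimal sketch using only the Python standard library; the function name and the returned dictionary layout are hypothetical and are not taken from the presentation.

```python
import re
from collections import Counter


def syntactic_statistics(text):
    """Collect simple surface statistics for one document.

    Hypothetical helper for illustration: returns the total word count,
    per-word frequencies, and per-letter ("alphabetical") frequencies.
    """
    lowered = text.lower()
    # Words are runs of ASCII letters; punctuation and digits are skipped.
    words = re.findall(r"[a-z]+", lowered)
    letters = [ch for ch in lowered if ch.isalpha()]
    return {
        "word_count": len(words),
        "word_frequencies": Counter(words),
        "letter_frequencies": Counter(letters),
    }


if __name__ == "__main__":
    stats = syntactic_statistics("The store sold the stove.")
    print(stats["word_count"])
    print(stats["word_frequencies"].most_common(3))
    print(stats["letter_frequencies"].most_common(3))
```

Parts of speech and word dependencies need a real tagger or parser and are deliberately left out of this sketch.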
The Wills and Will Nots of LASI
What LASI Will Do | What LASI Will Not Do
• Analyze multiple documents to find common themes
• Provide statistical data to help a user make a decision
• Provide a concise synopsis
• Provide a single theme
Who Would This Appeal To?
• Researchers
• Consultants
• Academics
• Students
Benefits to the Customer
• Time saving
• Objective output
• Consistent output
• Cost-saving solution
How does LASI fit into our Case Study?
Before LASI
[Flowchart nodes: Customer Contact; Will NCSOSE be needed? (yes/no); Situational Awareness Meeting; Document Gathering Process; Document Analysis; Problem Statement Presentation; Is the customer satisfied? (yes/no); Continue on to the rest of the A.I.D. process; Client Goes Elsewhere]
After LASI
[Flowchart nodes: Customer Contact; Will NCSOSE be needed? (yes/no); Situational Awareness Meeting; Document Gathering Process; LASI-Aided Document Analysis; Problem Statement Presentation; Is the customer satisfied? (yes/no); Continue on to the rest of the A.I.D. process; Client Goes Elsewhere]
Major Functional Components
Hardware
• High-end notebook PC (~$1500 USD)
  - Computation: quad-core CPU
  - Primary memory: 8.0 GB DDR3 RAM
  - Document storage: solid-state storage
Software
• Algorithm: extrapolates the most likely congruence of themes and ideas across all documents in the input domain
• User interface:
  - Multi-level views
  - Weighted phrase list
  - Detailed breakdown
  - Step-by-step justification
Linguistic Analysis Algorithm
Phases:
• Primary Analysis: Word Count and Syntactic Assessment
• Secondary Analysis: Associative Identification
• Tertiary Analysis: Semantic Relationship Assessment
Steps shown in the diagram:
• Traverse document in word-wise manner
• Bind pronouns to nouns, updating frequency
• Identify potential synonyms
• Assess potential subject-object-verb relationships
• Identify corresponding parts of speech
• Bind adjectives to nouns
• Output list of weighted themes
• Determine frequency by grammatical role
• Identify potential noun phrases
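A few of the steps named above (word-wise traversal, binding pronouns to nouns while updating frequency, and outputting a weighted list) can be sketched in a handful of lines. This is a toy illustration under strong assumptions and is not the algorithm the team proposes: the stop-word list, the pronoun list, the `people` parameter (a stand-in for real part-of-speech tagging), and the "bind to the most recently seen person" rule are all hypothetical simplifications. The sample text is a lightly adapted copy of the stove passage from the earlier slide.

```python
import re
from collections import Counter

# Hypothetical word lists for illustration only.
PERSONAL_PRONOUNS = {"he", "him", "his", "she", "her", "hers"}
STOP_WORDS = {"a", "the", "to", "was", "has", "had", "been", "for", "that",
              "would", "go", "so", "some", "one", "saying"}

PASSAGE = ("Bill's stove was broken. He had been saying for months that he "
           "would go to the appliance store to buy a new one. He had some free "
           "time yesterday, so he drove to the store to buy a new stove.")


def weighted_themes(documents, people, top_n=5):
    """Return up to top_n (term, weight) pairs; weight is a share of all counted terms.

    people is a set of lowercase names that personal pronouns may refer to.
    Each pronoun is credited to the most recently seen name in the same document.
    """
    counts = Counter()
    for text in documents:
        last_person = None
        # Traverse the document word by word.
        for token in re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()):
            # Strip a possessive ending so "bill's" counts as "bill".
            word = token[:-2] if token.endswith("'s") else token
            if word in people:
                last_person = word
            elif word in PERSONAL_PRONOUNS:
                if last_person is None:
                    continue  # nothing to bind to yet
                word = last_person  # bind pronoun to noun, updating frequency
            elif word in STOP_WORDS:
                continue
            counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        return []
    return [(term, count / total) for term, count in counts.most_common(top_n)]


if __name__ == "__main__":
    for term, weight in weighted_themes([PASSAGE], people={"bill"}):
        print(f"{term}: {weight:.2f}")
```

With pronoun binding, "bill" becomes the highest-weighted term in the passage even though the name itself appears only once, which is the effect the binding step is meant to capture.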
[Figure: Milestone diagram]
The Competition
• WordStat
• Stanford CoreNLP
• ReadMe
• AutoMap
Risk Matrix
Customer Risks
• C1: Product Interest
• C2: Maintenance
• C3: Trust
Technical Risks
• T1: System Limitations
• T2: Scanned Text Recognition
• T3: Jargon Recognition
• T4: Illegal Character Handling
Customer Risks
• C1. Product Interest (Probability 2, Impact 4)
  Mitigation: LASI offers unique functionality and user friendliness.
• C2. Maintenance (Probability 3, Impact 2)
  Mitigation: LASI will be a free, open source application, allowing the community to maintain and extend it over time.
• C3. Trust (Probability 3, Impact 3)
  Mitigation: LASI will provide a step-by-step breakdown of output analysis and algorithm reasoning.
Technical Risks
• T1. System Limitations (Probability 4, Impact 2)
  Mitigation: LASI will be designed from the ground up in native C++ for memory- and CPU-efficient code.
• T2. Scanned Text Recognition (Probability 4, Impact 3)
  Mitigation: LASI will implement an optical character recognition algorithm to handle scanned text.
Technical Risks (continued)
• T3. Jargon Recognition (Probability 3, Impact 2)
  Mitigation: LASI will have domain-specific dictionaries and feature intuitive contextual inference.
• T4. Illegal Character Handling (Probability 4, Impact 2)
  Mitigation: LASI will provide contextual inference, synonym recognition, and statistical methods.
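The probability and impact ratings on the risk slides can be combined into a single ranking. The sketch below assumes the common convention of scoring each risk as probability × impact; the presentation does not say how its matrix combines the two values, so the scoring rule and the function name are assumptions. The ratings themselves are copied from the slides.

```python
# (probability, impact) ratings as listed on the risk slides.
RISKS = {
    "C1 Product Interest": (2, 4),
    "C2 Maintenance": (3, 2),
    "C3 Trust": (3, 3),
    "T1 System Limitations": (4, 2),
    "T2 Scanned Text Recognition": (4, 3),
    "T3 Jargon Recognition": (3, 2),
    "T4 Illegal Character Handling": (4, 2),
}


def rank_risks(risks):
    """Return (name, score) pairs sorted from highest to lowest score.

    Assumption: score = probability * impact. Ties keep their input order
    because Python's sort is stable.
    """
    scored = [(name, prob * impact) for name, (prob, impact) in risks.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_risks(RISKS):
        print(f"{score:>2}  {name}")
```

Under this assumed rule, Scanned Text Recognition (T2) ranks highest with a score of 12, followed by Trust (C3) with 9.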
Conclusion
• LASI is feasible.
• LASI is a decision support tool, not a decision-making tool.
• The implications of success affect a wide range of fields of study and professions.
• For LASI to succeed, the output needs to be immediately usable and the interface user-friendly.
References
• "Theme." Def. 1b. Merriam Webster. N.p., n.d. Web. 19 Oct. 2012. <http://www.merriam-webster.com/dictionary/theme>.
• Thomas, Mark. "What Is the Average Reading Speed and the Best Rate of Reading?" Web. 19 Oct. 2012. <http://www.healthguidance.org/entry/13263/1/What-Is-the-Average-Reading-Speed-and-the-Best-Rate-of-Reading.html>.
• "Patrick Hester." Old Dominion University. N.p., n.d. Web. 24 Sept. 2012. <http://www.odu.edu/directory/people/p/pthester>.
• Osinski, Stanislaw, and Dawid Weiss. Carrot2. 13 Aug. 2012. Web. 25 Sept. 2012. <http://project.carrot2.org>.
• "WordStat." Provalis Research. Web. 24 Sept. 2012. <http://provalisresearch.com/products/content-analysis-software/>.
• "ReadMe: Software for Automated Content Analysis." Web. 24 Sept. 2012. <http://gking.harvard.edu/node/4520/rbuild_documentation/readme.pdf>.
• "AlchemyAPI Overview." AlchemyAPI. N.p., n.d. Web. 19 Oct. 2012. <http://www.alchemyapi.com/api/>.
• "AutoMap." Project. N.p., n.d. Web. 19 Oct. 2012. <http://www.casos.cs.cmu.edu/projects/automap/>.
• "CL Research Home Page." CL Research Home Page. N.p., n.d. Web. 19 Oct. 2012. <http://www.clres.com/>.