Assignments
• Improving an existing Natural Language Parser
• Identify context/metadata relations between documents in the legal domain
• User Alert Service
Improving an existing NL parser
What do we have?
• A natural language parser that performs:
  • Spell checking
  • Term identification
  • Search term checking
  • Syntactic analysis
  • Semantic expansion
  • Query base generation
Improving an existing NL parser
What is the problem?
• The parser's behavior needs to be tuned based on the results of a pre-defined set of queries run against a pre-defined document collection
Improving an existing NL parser
Assignment
• Identify problematic queries or query types (requires at least one native Dutch speaker in the team)
• Types of problems:
  • Low precision
  • Low recall
  • Misunderstood queries
    • E.g., does ‘information on cars and traffic jams’ mean that only documents containing both ‘cars’ and ‘traffic jams’ should be found, or also documents containing either ‘cars’ or ‘traffic jams’?
  • …
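The ‘cars and traffic jams’ ambiguity can be made concrete with a small sketch. It is only an illustration, not the existing parser: the toy corpus, the document ids, and the helper names `search_all` and `search_any` are all assumptions, and matching is a naive substring test.

```python
# Toy corpus; ids and texts are invented for illustration only.
DOCS = {
    "d1": "new cars cause traffic jams on the ring road",
    "d2": "electric cars are getting cheaper",
    "d3": "traffic jams delay commuters every week",
    "d4": "rail timetable changes announced",
}


def matches(text, term):
    """Naive substring match; a real parser would tokenize and normalize."""
    return term in text.lower()


def search_all(terms, docs=DOCS):
    """Conjunctive reading: every term must occur (Boolean AND)."""
    return sorted(d for d, t in docs.items() if all(matches(t, x) for x in terms))


def search_any(terms, docs=DOCS):
    """Disjunctive reading: at least one term must occur (Boolean OR)."""
    return sorted(d for d, t in docs.items() if any(matches(t, x) for x in terms))


terms = ["cars", "traffic jams"]
and_hits = search_all(terms)  # only documents mentioning both terms
or_hits = search_any(terms)   # documents mentioning either term
print(and_hits, or_hits)
```

On this toy corpus the AND reading returns one document and the OR reading returns three, which is exactly the kind of gap that shows up as a precision or recall problem depending on what the user meant.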
Improving an existing NL parser
Proposed steps
• Define a set of natural language queries, including expected query results
  • Define the Boolean query to be executed
  • List the documents that the query should find
• Run queries through the online website
• Identify mismatch areas
Improving an existing NL parser
Proposed steps
• Define a set of natural language queries, including expected query results
• Run queries through the online website
  • Perform the queries on the information portal
• Identify mismatch areas
Improving an existing NL parser
Proposed steps
• Define a set of natural language queries, including expected query results
• Run queries through the online website
• Identify mismatch areas
  • Keep track of precision and recall percentages
  • If these percentages are low, what is wrong with the Boolean query?
  • How could this be improved?
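Tracking precision and recall per query could look like the sketch below. The `QueryCase` record and its field names are assumptions made for illustration; the slides only say that each test query comes with a Boolean query and a list of expected documents. Returning 0.0 when a set is empty is a convention chosen here, not something the slides prescribe.

```python
from dataclasses import dataclass, field


@dataclass
class QueryCase:
    """One test query; the field names are hypothetical."""
    nl_query: str                 # query as typed by the user
    boolean_query: str            # Boolean query expected from the parser
    expected_docs: set = field(default_factory=set)  # documents that should be found


def precision_recall(retrieved, expected):
    """Return (precision, recall) as fractions between 0 and 1."""
    retrieved, expected = set(retrieved), set(expected)
    hits = retrieved & expected
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(expected) if expected else 0.0
    return precision, recall


case = QueryCase(
    nl_query="information on cars and traffic jams",
    boolean_query="cars AND traffic jams",
    expected_docs={"d1", "d2", "d3", "d4"},
)
retrieved = ["d1", "d5"]  # what the website returned in this made-up run
p, r = precision_recall(retrieved, case.expected_docs)
print(f"precision={p:.0%} recall={r:.0%}")
```

A query with high precision but low recall suggests the Boolean query is too strict (for example AND where OR was meant); the reverse suggests it is too loose.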
Identify context/metadata relations between documents in the legal domain
What do we have?
• Document collections, which
  • Contain metadata
  • Have contextual relations between documents
What is the problem?
• How to construct relations between documents based on existing metadata?
Identify context/metadata relations between documents in the legal domain
Assignment
• Design and implement an algorithm to identify possible relations between documents, including a link certainty indicator (requires at least one native Dutch speaker in the team)
Basic information
• An ideal document collection relevant to one starting document
• A mixed collection of ideal and ‘noise’ documents
• A description of the metadata that can be used to identify the relations
Identify context/metadata relations between documents in the legal domain
Proposed steps
• Study the description and the document collections delivered
• Design the algorithm to solve the problem
• Implement the algorithm in a simple programming module
• Analyze possible problems due to ambiguity in the context/metadata
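One possible starting point for a link certainty indicator is a weighted overlap of metadata values. The slides do not specify the algorithm or the metadata fields, so everything in this sketch is an assumption: the field names (`cited_laws`, `subjects`, `court`), the weights, the threshold, and the choice of Jaccard similarity as the overlap measure.

```python
def jaccard(a, b):
    """Share of values two metadata fields have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical fields and weights; they would need tuning on the real collections.
WEIGHTS = {"cited_laws": 0.5, "subjects": 0.3, "court": 0.2}


def link_certainty(doc_a, doc_b, weights=WEIGHTS):
    """Weighted metadata overlap, used here as the link certainty indicator."""
    return sum(w * jaccard(doc_a.get(f, ()), doc_b.get(f, ())) for f, w in weights.items())


def related(start, collection, threshold=0.3):
    """Documents linked to `start`, strongest first, as (certainty, id) pairs."""
    scored = [(link_certainty(start, d), d["id"]) for d in collection if d["id"] != start["id"]]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


start = {"id": "A", "cited_laws": {"law-1"}, "subjects": {"tort", "liability"}, "court": {"court-x"}}
collection = [
    start,
    {"id": "B", "cited_laws": {"law-1"}, "subjects": {"tort"}, "court": {"court-x"}},
    {"id": "C", "cited_laws": {"law-9"}, "subjects": {"tax"}, "court": {"court-y"}},  # 'noise'
]
links = related(start, collection)
print(links)
```

Running the same function on the ideal collection and on the mixed collection gives a direct check: ideal documents should score above the threshold and ‘noise’ documents below it. Ambiguous or missing metadata will show up as scores close to the threshold.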
User Alert Service
What do we have?
• Document collections, which
  • Contain metadata
  • Have contextual relations between documents
• Structure of the context/metadata relations (assignment 2)
What is the problem?
• How to construct the user alert service based on the user profile information and the resulting document relations?
User Alert Service
Assignment
• Design and implement the user profile structure and the user alert service
Proposed steps
• Design the user profile structure
• Design the algorithm for the user alert service
• Implement the algorithm in a simple programming module
• Analyze the level of complexity in the user profile structure and its consequences for the user alert service
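A very small sketch of how a user profile and an alert check might fit together is given below. The slides leave the profile structure open, so the `UserProfile` fields, the `alerts_for` function, and the idea of alerting when a new document is linked to a followed document above a per-user certainty threshold are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical profile: which documents a user follows and how sure a link must be."""
    user_id: str
    followed_docs: set = field(default_factory=set)
    min_certainty: float = 0.5


def alerts_for(new_doc_id, links, profiles):
    """Return (user_id, new_doc_id, certainty) for every user who should be alerted.

    `links` maps an existing document id to the link certainty between that
    document and the new document (for example, the output of assignment 2).
    """
    alerts = []
    for profile in profiles:
        best = max((links.get(d, 0.0) for d in profile.followed_docs), default=0.0)
        if best > 0 and best >= profile.min_certainty:
            alerts.append((profile.user_id, new_doc_id, best))
    return alerts


profiles = [
    UserProfile("u1", {"A"}, 0.5),
    UserProfile("u2", {"Z"}, 0.5),  # follows a document unrelated to the new one
    UserProfile("u3", {"A"}, 0.9),  # wants only near-certain links
]
alerts = alerts_for("B", {"A": 0.85}, profiles)
print(alerts)
```

The more a profile can express (subjects, courts, date ranges, and so on), the more conditions the alert check has to evaluate per new document, which is the complexity trade-off the last step asks to analyze.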