The project aims to enhance an existing natural language parser by improving user interaction and identifying context/metadata relations between documents in the legal domain. Challenges include tuning behavior, identifying problematic queries, and constructing meaningful relations between documents based on metadata. Proposed steps involve running queries, assessing precision and recall, and designing algorithms for document relations and user alert services.
Assignments
• Improving an existing natural language (NL) parser
• Identifying context/metadata relations between documents in the legal domain
• User Alert Service
Improving an existing NL parser
What do we have?
• A natural language parser, which performs:
  • Spell checking
  • Term identification
  • Search term checking
  • Syntactic analysis
  • Semantic expansion
  • Query base generation
Improving an existing NL parser
What is the problem?
• The parser's behavior needs to be tuned, based on the results of a pre-defined set of queries run against a pre-defined document collection
Improving an existing NL parser
Assignment
• Identify problematic queries or query types (requires at least one native Dutch speaker in the team)
• Types of problems:
  • Low precision
  • Low recall
  • Misunderstood queries
    • E.g., does ‘information on cars and traffic jams’ mean that only documents containing both ‘cars’ and ‘traffic jams’ should be found, or also documents containing either ‘cars’ or ‘traffic jams’?
  • …
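The ambiguity in the ‘cars and traffic jams’ example can be made concrete with a small sketch. The toy collection and the function names `match_all` and `match_any` are hypothetical and only illustrate the two readings; they are not part of the existing parser.

```python
# Hypothetical toy collection: document id -> set of index terms.
DOCS = {
    "d1": {"cars", "traffic jams"},
    "d2": {"cars"},
    "d3": {"traffic jams"},
    "d4": {"bicycles"},
}


def match_all(terms, docs):
    """Conjunctive (AND) reading: every term must occur in the document."""
    return sorted(d for d, idx in docs.items() if all(t in idx for t in terms))


def match_any(terms, docs):
    """Disjunctive (OR) reading: at least one term must occur in the document."""
    return sorted(d for d, idx in docs.items() if any(t in idx for t in terms))


terms = ["cars", "traffic jams"]
and_hits = match_all(terms, DOCS)
or_hits = match_any(terms, DOCS)
print("AND reading:", and_hits)
print("OR reading: ", or_hits)
```

The AND reading tends to favor precision, while the OR reading tends to favor recall, so which reading the parser chooses directly affects both problem types listed above.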
Improving an existing NL parser
Proposed steps
• Define a set of natural language queries, including expected query results
  • Define the Boolean query to be executed
  • List the documents to be found by the query
• Run queries through the on-line website
• Identify mismatch areas
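One possible way to record such a query set is sketched below. The `QueryCase` structure, the Boolean query syntax, and the document ids are all assumptions made for illustration; the slides do not prescribe a format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryCase:
    """One test query with its expected outcome (hypothetical structure)."""
    natural_language: str
    boolean_query: str        # the Boolean query we expect to be executed
    expected_docs: frozenset  # ids of the documents the query should find


# Illustrative case; the ids and the query syntax are made up.
CASES = [
    QueryCase(
        natural_language="information on cars and traffic jams",
        boolean_query='"cars" AND "traffic jams"',
        expected_docs=frozenset({"doc-001", "doc-007"}),
    ),
]


def mismatches(case, returned_docs):
    """Compare what the system returned with what the case expects."""
    returned = set(returned_docs)
    return {
        "missing": sorted(case.expected_docs - returned),
        "unexpected": sorted(returned - case.expected_docs),
    }


report = mismatches(CASES[0], ["doc-001", "doc-042"])
print(report)
```

Keeping the expected Boolean query next to the expected documents makes it easier to tell later whether a mismatch comes from the query translation or from the retrieval itself.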
Improving an existing NL parser
Proposed steps
• Define a set of natural language queries, including expected query results
• Run queries through the on-line website
  • Perform the queries on the information portal
• Identify mismatch areas
Improving an existing NL parser
Proposed steps
• Define a set of natural language queries, including expected query results
• Run queries through the on-line website
• Identify mismatch areas
  • Keep track of precision and recall percentages
  • If these percentages are low, what is wrong with the Boolean query?
  • How could this be improved?
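Precision and recall for a single query can be tracked with a few lines of code. This is a minimal sketch using the standard definitions; the function name `precision_recall` and the sample document ids are illustrative assumptions.

```python
def precision_recall(retrieved, expected):
    """Return (precision, recall) for one query.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all expected
    An empty denominator yields 0.0 by convention in this sketch.
    """
    retrieved, expected = set(retrieved), set(expected)
    hits = retrieved & expected
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(expected) if expected else 0.0
    return precision, recall


# Made-up example: 3 documents returned, 4 expected, 2 in common.
p, r = precision_recall(["d1", "d2", "d5"], ["d1", "d2", "d3", "d4"])
print(f"precision={p:.0%} recall={r:.0%}")
```

Low precision with acceptable recall suggests the Boolean query is too broad; low recall with acceptable precision suggests it is too narrow.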
Identify context/metadata relations between documents in the legal domain
What do we have?
• Document collections, which
  • Contain metadata
  • Have contextual relations between documents
What is the problem?
• How to construct relations between documents based on existing metadata?
Identify context/metadata relations between documents in the legal domain
Assignment
• Design and implement an algorithm to identify possible relations between documents, including a link certainty indicator (requires at least one native Dutch speaker in the team)
Basic information
• An ideal document collection relevant to one starting document
• A mixed collection of ideal and ‘noise’ documents
• A description of the metadata that can be used to identify the relations
Identify context/metadata relations between documents in the legal domain
Proposed steps
• Study the description and the document collections delivered
• Design the algorithm to solve the problem
• Implement the algorithm in a simple programming module
• Analyze possible problems due to ambiguity in the context/metadata
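The slides leave the relation algorithm open. As one possible starting point, the sketch below scores a link by the average Jaccard overlap of selected metadata fields and uses that score as the link certainty indicator. The choice of Jaccard overlap, the field names `subject` and `cited_acts`, the threshold, and the sample documents are all assumptions, not the assignment's prescribed method.

```python
def link_certainty(meta_a, meta_b, fields):
    """Mean Jaccard overlap over the chosen metadata fields, in [0, 1]."""
    scores = []
    for name in fields:
        a, b = set(meta_a.get(name, ())), set(meta_b.get(name, ()))
        if a or b:  # skip fields that are empty in both documents
            scores.append(len(a & b) / len(a | b))
    return sum(scores) / len(scores) if scores else 0.0


def related_documents(start_id, collection, fields, threshold=0.3):
    """Return (doc_id, certainty) pairs above the threshold, best first."""
    start = collection[start_id]
    links = [
        (doc_id, link_certainty(start, meta, fields))
        for doc_id, meta in collection.items()
        if doc_id != start_id
    ]
    kept = [(doc_id, score) for doc_id, score in links if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical mixed collection: one relevant and one 'noise' document.
COLLECTION = {
    "start": {"subject": ["traffic law", "liability"], "cited_acts": ["Act A"]},
    "ideal": {"subject": ["traffic law"], "cited_acts": ["Act A"]},
    "noise": {"subject": ["tax law"], "cited_acts": ["Act B"]},
}

links = related_documents("start", COLLECTION, ["subject", "cited_acts"])
print(links)
```

Running such a scorer against the mixed ideal and ‘noise’ collection gives a direct check: ideal documents should end up above the threshold and noise documents below it. Ambiguous metadata, such as the same subject label used in different senses, would show up as noise documents with a high certainty.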
User Alert Service
What do we have?
• Document collections, which
  • Contain metadata
  • Have contextual relations between documents
• Structure of the context/metadata relations (assignment 2)
What is the problem?
• How to construct the user alert service based on the user profile information and the resulting document relations?
User Alert Service
Assignment
• Design and implement the user profile structure and the user alert service
Proposed steps
• Design the user profile structure
• Design the algorithm for the user alert service
• Implement the algorithm in a simple programming module
• Analyze the complexity of the user profile structure and its consequences for the user alert service
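A very small sketch of how a user profile and an alert step could fit together is shown below. It assumes that a profile lists the documents a user follows plus a minimum link certainty, and that the document relations from assignment 2 arrive as `(doc_id, certainty)` pairs. The names `UserProfile` and `alerts_for_new_document` and all sample data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical profile: what the user follows and how strict to be."""
    user_id: str
    followed_docs: set = field(default_factory=set)
    min_certainty: float = 0.5


def alerts_for_new_document(new_doc_id, links, profiles):
    """Decide who to alert about a new document.

    links: list of (existing_doc_id, certainty) relations of the new document.
    Returns a dict mapping user_id to the alert payload.
    """
    alerts = {}
    for profile in profiles:
        reasons = [
            (doc_id, certainty)
            for doc_id, certainty in links
            if doc_id in profile.followed_docs
            and certainty >= profile.min_certainty
        ]
        if reasons:
            alerts[profile.user_id] = {"document": new_doc_id, "because_of": reasons}
    return alerts


profiles = [
    UserProfile("alice", {"doc-001"}, 0.5),
    UserProfile("bob", {"doc-002"}, 0.9),
]
alerts = alerts_for_new_document(
    "doc-100", [("doc-001", 0.75), ("doc-002", 0.6)], profiles
)
print(alerts)
```

In this sketch the alert step is one pass over all profiles for every new document, so a richer profile structure (more criteria per user) directly increases the work per alert, which is the trade-off the last proposed step asks to analyze.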