
NaturalGeo : Final Presentation


Presentation Transcript


  1. NaturalGeo: Final Presentation Dr Kristin Stock and Mr Javid Yousaf University of Nottingham

  2. Project Goal • To develop methods for natural language spatial querying. • How to map natural language expressions to queries, e.g. “car parks beside the river”.

  3. What kinds of expressions? • the car park beside the river • a field on the corner of the lane • the route follows the lane • the hall is on this quadrangle • the tramline at town x • the park contains trails • These can be used as queries generically (e.g. all car parks that are beside a river), or with a place name (the car park beside the River Trent).

  4. Scope • Containment • Collocation (same place) • Adjacency • Alignment • Object parthood • Sidedness

  5. Why? • Easier access to OS data products, vs. • Limited place name/postcode search; • Advanced and complex tools. • Extraction of location from text documents. • Potentially, generation of language descriptions. • Easier access = increased potential for data use.

  6. What has already been done? • Mainly mathematical models for specific natural language terms, e.g.: • Topology, with a fixed formal model. • Some models that include context, such as ‘near’ (e.g. modelling density, etc.).

  7. What does NaturalGeo add? • Takes a ‘whole of language’ approach. • Considers context.

  8. How? • Memory/instance based learning. • Use a store of expressions whose interpretation is known. • For next expressions, find most semantically similar known expression, and use that interpretation.
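
A rough sketch of this instance-based lookup follows, assuming a store of known expressions with their GCO profiles and some similarity function; all names here are hypothetical placeholders, not the project's implementation.

```python
from dataclasses import dataclass

@dataclass
class KnownExpression:
    text: str
    gco_profile: dict  # GCO concept -> selection strength, as gathered from the questionnaire

def interpret(expression, known_expressions, similarity):
    """Instance-based interpretation: reuse the GCO profile of the most
    semantically similar stored expression."""
    best = max(known_expressions, key=lambda known: similarity(expression, known.text))
    return best.gco_profile
```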

  9. How do we represent interpretations? • Geometric Configuration Ontology (GCO). • 50 types of geometric configurations between pairs of objects • Each defined with a query.

  10. GCO profiles We can represent the meaning of a geospatial expression using a GCO profile

  11. GCO profiles and queries • Then we can create a query based on the GCO profile. • Query composition required. • Decision between: • conjunctive inclusion (multiple concepts to represent the relation) • eliminating some relations due to weakness in selection GCOConceptx⋀ GCOConcepty⋀ GCOConceptz
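
A minimal sketch of the composition choice, assuming a GCO profile is a mapping from concept names to selection strengths; the 0.5 cut-off is purely illustrative, not a value from the project.

```python
def compose_query(gco_profile, threshold=0.5):
    """Conjunctive inclusion with weak relations eliminated: keep only concepts
    selected above the (illustrative) threshold and join them conjunctively,
    as in GCOConceptX AND GCOConceptY AND GCOConceptZ."""
    strong = [concept for concept, strength in gco_profile.items() if strength >= threshold]
    return " AND ".join(sorted(strong))
```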

  12. How do we know what the GCO profiles are for an expression? • Questionnaire of 2000 expressions. • Users selected best diagrams, diagrams depict GCO concepts. • So we have GCO profiles for 2000 expressions. • Use some as ‘known’ expressions, the rest for evaluation.

  13. Interpreting a new expression • In simple terms: • For expression x, we find the most similar known expression y. • We know the GCO profile for y. • GCO profile for x = GCO profile for y. • But we may look at the most similar group of expressions (and their GCO profiles) to try to get the best results.
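
A sketch of the “most similar group” variant; the choice of k and the averaging scheme are assumptions for illustration, not the project's settings.

```python
from collections import defaultdict

def interpret_from_group(expression, known_expressions, similarity, k=5):
    """Average the GCO profiles of the k most similar known expressions
    instead of copying a single best match."""
    ranked = sorted(known_expressions,
                    key=lambda known: similarity(expression, known.text),
                    reverse=True)
    combined = defaultdict(float)
    for known in ranked[:k]:
        for concept, strength in known.gco_profile.items():
            combined[concept] += strength / k
    return dict(combined)
```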

  14. The big question… • How do we find the ‘most similar’ expression?

  15. First, we parse the expression… • Identify: • Locatum (object being located) • Relatum (object used as a reference) • Verb • Preposition • Spatial adverb • Division nouns for relatum and locatum (e.g. part of) • Div noun adjective. Example: the station is right by the side of the river
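
The parsed elements could be held in a simple structure like the following sketch; the field names are illustrative, and the element assignments for the example are a plausible reading rather than the project's actual parser output.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ParsedExpression:
    locatum: str                             # object being located
    relatum: str                             # object used as a reference
    verb: Optional[str] = None
    preposition: Optional[str] = None
    spatial_adverb: Optional[str] = None
    relatum_div_noun: Optional[str] = None   # e.g. "side (of)"
    locatum_div_noun: Optional[str] = None
    div_noun_adjective: Optional[str] = None

# "the station is right by the side of the river"
example = ParsedExpression(locatum="station", relatum="river", verb="is",
                           spatial_adverb="right", preposition="by",
                           relatum_div_noun="side")
```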

  16. Then, we compare like with like… the station is right by the side of the river vs. the station is located in the city centre, using 4 comparison methods.

  17. Method 0: Baseline • similarity score = count of matching components / max number of components. Example: the station is right by the side of the river vs. the station is located in the city centre (element scores: 0 0 1 0); similarity score = 1/6 = 0.16667
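
A sketch of the baseline score, treating each parsed expression as a mapping from element name to term (an assumed representation).

```python
def baseline_similarity(elements_a: dict, elements_b: dict) -> float:
    """Method 0: count of exactly matching element pairs divided by the
    maximum number of populated elements in either expression."""
    matches = sum(1 for name in elements_a.keys() & elements_b.keys()
                  if elements_a[name] == elements_b[name])
    max_components = max(len(elements_a), len(elements_b))
    return matches / max_components if max_components else 0.0

# In the slide example only one of the six elements matches, giving 1/6.
```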

  18. Method 1: Word Distribution Similarity • similarity score = ∑ word distribution similarity of element pairs / max number of populated elements • cosine method. Example: the station is right by the side of the river vs. the station is located in the city centre (element scores: 0.3 0.5 1 0.6); similarity score = 2.4/6 = 0.4
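
A sketch of the word-distribution comparison, assuming each term has a pre-computed sparse distribution vector (term → weight), e.g. from a corpus tool such as DISCO; the data structures here are assumptions.

```python
import math

def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse word-distribution vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def method1_similarity(elements_a: dict, elements_b: dict, vectors: dict) -> float:
    """Sum of cosine similarities of aligned element pairs, divided by the
    maximum number of populated elements."""
    total = sum(cosine(vectors[elements_a[name]], vectors[elements_b[name]])
                for name in elements_a.keys() & elements_b.keys())
    max_components = max(len(elements_a), len(elements_b))
    return total / max_components if max_components else 0.0
```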

  19. Method 2: Ontology-based Similarity • similarity score = ∑ (1 − normalised semantic distance) of element pairs / max number of populated elements • dependent on ontology structure. Example: the station is right by the side of the river vs. the station is located in the city centre (element scores: 0.3 0.5 1 0.6); similarity score = 2.4/6 = 0.4
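
One plausible reading of the per-pair ontology score, sketched over a simple is-a hierarchy; the parent map and the normalisation by a maximum depth are assumptions for illustration, not the project's definition.

```python
def ontology_similarity(concept_a: str, concept_b: str, parents: dict, max_depth: int) -> float:
    """1 - normalised semantic distance, where distance is the number of
    edges to the lowest common ancestor in the hierarchy."""
    def chain(concept):
        path = [concept]
        while concept in parents:
            concept = parents[concept]
            path.append(concept)
        return path

    path_a, path_b = chain(concept_a), chain(concept_b)
    common = next((c for c in path_a if c in path_b), None)
    if common is None:
        return 0.0
    distance = path_a.index(common) + path_b.index(common)
    return 1.0 - min(distance / (2 * max_depth), 1.0)
```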

  20. Method 3: Geolinguistic Factor Similarity • Same as Method 2 for all elements except relatum and locatum. • For relatum and locatum, we determine similarity of geolinguistic factors, not of the feature types themselves. • Geolinguistic factors are factors thought to be significant in the use of language: • image-schemata • geometry type • liquid/solid • scale • axial structure • …

  21. Example: the station is right by the side of the river vs. the station is located in the city centre (element scores: 0.2 0.5 1 0.6). Geolinguistic factors for the relatum pair: image schemata: 1 shared, 3 max = 1/3; geometry: 0 shared, 1 max; axial structure: 0 shared, 1 max; scale: 2 shared, 3 max = 2/3; liquid/solid: 0 shared, 1 max. Total: 1/5 = 0.2. Cf. street/river, which could be 0.7 or 0.8.
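
A sketch of the relatum/locatum comparison by geolinguistic factors, mirroring the shared/max counting in the worked example above; the representation of each factor as a set of values is an assumption.

```python
def geolinguistic_similarity(factors_a: dict, factors_b: dict) -> float:
    """For each geolinguistic factor, score = shared values / max values;
    the overall score is the average over the factors compared."""
    scores = []
    for factor in factors_a.keys() & factors_b.keys():
        values_a, values_b = set(factors_a[factor]), set(factors_b[factor])
        max_values = max(len(values_a), len(values_b))
        scores.append(len(values_a & values_b) / max_values if max_values else 0.0)
    return sum(scores) / len(scores) if scores else 0.0

# Slide example: (1/3 + 0 + 0 + 2/3 + 0) over 5 factors = 0.2
```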

  22. LAGO • Geolinguistic factors contained in the Linguistically Augmented Geospatial Ontology (LAGO). • Extends OS ontologies with geolinguistic factors.

  23. Analysis (1) • Broad measures of success: • Similarity of GCO profiles for the most highly matched expressions. • The most similar expressions should have the most similar GCO profiles, if similarity is being measured correctly. • Using simple measures: • Spearman correlation between our similarity score and GCO profile similarity (Pearson correlation of the profiles) – should be maximised (≤ 1). • Average difference between our score and GCO similarity – should be minimised.
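
A sketch of the two success measures; the input formats (one similarity score per matched pair, plus that pair's GCO profiles as numeric vectors) are assumptions for illustration.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def success_measures(similarity_scores, gco_profile_pairs):
    """Spearman correlation between the method's similarity scores and the
    Pearson correlation of the matched GCO profiles (to be maximised), and
    the mean absolute difference between the two (to be minimised)."""
    gco_similarities = [pearsonr(profile_x, profile_y)[0]
                        for profile_x, profile_y in gco_profile_pairs]
    rank_correlation, _ = spearmanr(similarity_scores, gco_similarities)
    mean_difference = float(np.mean(np.abs(np.asarray(similarity_scores)
                                           - np.asarray(gco_similarities))))
    return rank_correlation, mean_difference
```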

  24. Analysis (2) • Which method is best? • How does the size of the kb affect results? • Which elements (relatum, locatum, verb) have the greatest impact on the results? • Which geolinguistic factors have the greatest impact on the results? • How does the success of the method vary with different spatial relations? • How transferable is the method to different spatial relations and what is required of the knowledgebase?

  25. To do (June) • General refinements/improvements: • Parsing of expressions. • Method 1: the cosine method (DISCO) returns very low numbers. • Method 2: WordNet network distance matching; the WS4J methods are not very good, so implement our own method. • Matching of terms to LAGO: currently uses hyponyms, hypernyms and synonyms, with inconsistent ordering. • Improve speed. • Improve/extend overall measures of success, currently: • Spearman correlation coefficient of similarity score and GCO profile correlation (Pearson) (trying to maximise). • Average of the difference between similarity score and GCO profile correlation (trying to minimise).

  26. And then… • Analysis. • Query composition methods. • Methods for selecting the best GCO profile (single best match, or combined multiple matches?). • Comparison with mathematical models. • Refine methods: weightings? different measures of similarity? • More geolinguistic factors. • Richer geolinguistic factor model.

  27. Conclusions • Proof of concept so far. • Framework set up. • Now we have the opportunity to test, refine and further develop the method. • Then, the next goal: • Can we use the data we have in the kb (2000 expressions) to discover patterns and infer GCO profiles for expressions for which there are no close matches in the kb (e.g. new spatial relations etc)?
