1 / 13

Postgresql

Postgresql. …and postgis & full text search & f uzzy comparisons. Postgis. For manipulating 2D/3D spatial data Points, lines, and polygons formed from points and lines Can perform union, intersection, operations Can project shapes into 2D areas Has a 3D geometry type (relatively new)

maili
Télécharger la présentation

Postgresql

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Postgresql …and postgis & full text search & fuzzy comparisons

  2. Postgis • For manipulating 2D/3D spatial data • Points, lines, and polygons formed from points and lines • Can perform union, intersection, operations • Can project shapes into 2D areas • Has a 3D geometry type (relatively new) • Can calculate accurate distances in meters • Works with an open source server that allows folks to share geospatial data • Command line interface • Also supports some forms of raster data • Provides spatial indices • Has a notion of a geometric column

  3. Queries SELECT superhero.name FROM city, superhero WHEREST_Contains(city.geom, superhero.geom) and city.name = 'Gotham'; SELECTAsBinary(the_geom) as wkb_geometryFROMriver AS r, state AS s WHERE intersects(r.the_geom, s.the_geom)

  4. Mapnik • Used for OSM (open street map) data and uses postgis • Mapnik is an open source system for rendering maps • Used to design maps • Written in C++ • It renders maps from postgis databases

  5. Next: full text and approximate text search • But first, not to be confused with the Like operator • Used % as the wild card • Or with regular expressions for character string comparison

  6. Full text search • First, you index the words in a document and create an array of lexemes • Second, specify a boolean phrase using and, or, not, and parens • We typically don’t index “stop” words like and, or, the, etc. • Dictionaries are used to find roots of related words, like dead and dying • Thesauruses dictionaries are used to for recognition of domain-specific and similar words

  7. documents • A document is a text attribute in a row of a table • Often we use part of a document or concatenate various parts of documents

  8. Details: dictionaries • Define stop words that should not be indexed • Map synonyms to a single word. • Map phrases to a single word using a thesaurus. • Map different variations of a word to a canonical form

  9. Searching • Uses a match operator - @@ • Basic search consists of asking about the relationship to a vector of words to a given document, which is also a vector • The vector can have and, or, etc. in it • tsvector – document – normalized lexemes • tsquery – query

  10. Examples SELECT title FROM pgweb WHERE to_tsvector(title || ' ' || body) @@ to_tsquery('create & table') ORDER BY last_mod_date DESC LIMIT 10; select the ten most recent documents that contain create and table in the title or body Results can be ranked

  11. Recent addition: fuzziness • soundex(text) returns text • Converts a string to its Soundexcode • Based on pronunciation • difference(text, text) returns int • converts two strings to their Soundex codes and then reports the number of matching code positions • 0 is a no match • 4 is a full match • Def: A phonetic coding system intended to suppress spelling variation and determining the relationship between two (similar) words

  12. Levenshtein • Levenshtein distance is a metric for evaluating the difference between two sequences, in particular, words • E.g.: test=# SELECT levenshtein('GUMBO', 'GAMBOL'); • E.g.: SELECT * FROM some_table WHERE levenshtein(code, 'AB123-lHdfj') <= 3 ORDER BY levenshtein(code, 'AB123-lHdfj') LIMIT 10 • Used in particular, to detect nicknames

  13. Metaphone • E.g., metaphone(text source, intmax_output_length) returns text • Similar to soundex • Used to classify words according to their english pronunciation • Apparently better for non-english languages, compared to soundex • E.g.: SELECT * FROM users WHERE METAPHONE(users.first_name, 2) = METAPHONE('Willem', 2) should detect similarity to word William

More Related