UGFIDD

keiki

Presentation Transcript


  1. UGFIDD: Unstructured Geospatial File Indexer and Distributed Dissemination

  2. Present Scenario (diagram labels)
     • Transported
     • User must know what to search on
     • Very slow
     • Search Criteria
     • Users need data in low-comm situations
     • Data
     • Bottleneck

  3. UGFIDD Overview
     • Provide a simple-to-use Web Service interface
       - This allows for customized clients
       - Free-text, “Google”-like searches
       - Completely unstructured data: no need for a data model to follow
       - Communication is done over HTTP through SOAP (Simple Object Access Protocol) messages
       - Currently supports PDFs, Microsoft Docs, JPEGs
     • Provide usable return types
       - RSS Feeds: allow users to subscribe to standing queries
       - KML Results: allow users to visually represent their data spatially
       - Plain Text: give users their information fast and reliably
       - BitTorrent: allow users to distribute data quickly, in a distributed fashion

  4. Code Environment
     • Subversion
       - I have written a lot of code and spent a lot of time on it; version control provides peace of mind
       - All of the code was written under version control. This is very important in today’s commercial atmosphere.
       - Allows many developers to work at the same time
       - Assists with merge conflicts
       - Allows easy reverts and diffs
     • Maven
       - New and upcoming build tool
       - Allows for easy integration and dependency management
       - Build configuration (the POM) is written entirely in XML
       - Repositories allow open source projects to be easily pulled in to assist in program development

  5. Pom File (Snippet)

     <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
       <modelVersion>4.0.0</modelVersion>
       <groupId>com.p2p</groupId>
       <artifactId>Peer2peer</artifactId>
       <packaging>war</packaging>
       <version>1.0-SNAPSHOT</version>
       <name>Peer2peer</name>
       <url>http://maven.apache.org</url>
       <properties>
         <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
         <final.version>1.0</final.version>
         <artifact.name>${artifactId}</artifact.name>
         <java.version>1.6</java.version>
       </properties>
       <dependency>
         <groupId>junit</groupId>
         <artifactId>junit</artifactId>
         <version>3.8.1</version>
         <scope>test</scope>
       </dependency>
       <dependency>
         <groupId>jpath</groupId>
         <artifactId>jpathwatch</artifactId>
         <version>0.93</version>
       </dependency>
       <dependency>

  6. High Level Architecture (diagram labels)
     • EndPoints; SoapUI; Daemon; WebServiceEndpoints; Startup & Shutdown
     • Extractors + Publishers; Interfaces; Doc / Jpeg Parser; BitTorrent Publisher; Rss Feed / Kml Feed; File Monitor
     • Jetty; Core Services; Utilities; Query; Indexing; Solr

  7. Ingest of a file (diagram: Ingest Orchestration)
     • File Monitor (File System): the ingest monitor is triggered by system-level events.
     • Metadata Extraction: uses Tika document extractors to extract header information along with binary data. The JPEG parser parses geospatial data.
     • Solr (XML metadata sent over HTTP): the schema has been customized to store location and other valuable data.
     • Other diagram labels: File
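
A minimal sketch of an event-driven ingest monitor, under stated assumptions. The POM snippet lists jpathwatch; this sketch swaps in the standard java.nio.file.WatchService (Java 7+) instead. The class name, the extractorFor routing rule, and the printed output are hypothetical and only illustrate the flow "file appears, pick an extractor".

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.Locale;

final class IngestMonitorSketch {

    // Hypothetical routing rule, based only on the file types the slides list.
    static String extractorFor(String fileName) {
        String n = fileName.toLowerCase(Locale.ROOT);
        if (n.endsWith(".jpg") || n.endsWith(".jpeg")) return "jpeg";
        if (n.endsWith(".pdf") || n.endsWith(".doc") || n.endsWith(".docx")) return "document";
        return "unsupported";
    }

    // Blocks on file-system events for one directory and reports each new file.
    static void watch(Path dir) throws IOException, InterruptedException {
        try (WatchService ws = dir.getFileSystem().newWatchService()) {
            dir.register(ws, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) {
                WatchKey key = ws.take(); // waits until the OS reports an event
                for (WatchEvent<?> ev : key.pollEvents()) {
                    if (ev.kind() == StandardWatchEventKinds.OVERFLOW) continue;
                    Path created = dir.resolve((Path) ev.context());
                    String kind = extractorFor(created.getFileName().toString());
                    System.out.println(kind + " <- " + created);
                }
                if (!key.reset()) break; // directory is no longer accessible
            }
        }
    }
}
```

In a real ingest path the println would be replaced by metadata extraction and a post to the index.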

  8. Publish of results (diagram: Query Orchestration)
     • User enters the query “Syracuse” (sent over HTTP to the Web Service Endpoint).
     • Parse Query: depending on the return type and call, UGFIDD will query and return customized results.
     • Core Services use the query to search the Solr index (XML metadata).
     • Publish (over HTTP): Files, Google Earth, RSS, Torrent

  9. GeoHash (Example GeoHash.java)
     • GeoHash algorithm recently developed by Gustavo Niemeyer
       - Publicly released in 2008
       - Very new way of representing geospatial data
       - UGFIDD takes advantage of the single hash produced by the algorithm
       - Found many implementations in other languages (e.g. Python); ported one over to Java for the UGFIDD project
     • Distance searches
       - Geohash produces bounding boxes by nature
       - This is a perfect fit for UGFIDD and its free-text search capability
       - Geospatial searches are now extremely fast and easy to implement
       - No need for complicated point-radius algorithms, which slow processing down
     • WKT (Well-Known Text)
       - A spec for representing vector geometry as a single line of text
       - Users can query using a single string and do not need to represent points as Lat, Lon
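
A minimal sketch of geohash encoding, for readers who have not seen the algorithm. This is not the project's GeoHash.java (which is not shown in the slides); the class and method names here are hypothetical. It bisects the longitude and latitude ranges alternately, emits one bit per step, and packs every five bits into one base-32 character.

```java
final class GeoHashSketch {

    // Geohash base-32 alphabet (digits plus letters, without a, i, l, o).
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lat, double lon, int length) {
        double latLo = -90, latHi = 90, lonLo = -180, lonHi = 180;
        StringBuilder hash = new StringBuilder(length);
        boolean lonTurn = true; // bits alternate, starting with longitude
        int bits = 0, value = 0;
        while (hash.length() < length) {
            if (lonTurn) {
                double mid = (lonLo + lonHi) / 2;
                if (lon >= mid) { value = (value << 1) | 1; lonLo = mid; }
                else            { value <<= 1;              lonHi = mid; }
            } else {
                double mid = (latLo + latHi) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latLo = mid; }
                else            { value <<= 1;              latHi = mid; }
            }
            lonTurn = !lonTurn;
            if (++bits == 5) { // five bits make one base-32 character
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }
}
```

Because each extra character only narrows the same cell, a shorter hash is always a prefix of a longer one for the same point. That is why a plain string-prefix match can stand in for a bounding-box search, with the usual caveat that two nearby points on opposite sides of a cell edge may not share a prefix.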

  10. Placeholder for Geohash performance

  11. WKT
     • POINT(6 10)
     • LINESTRING(3 4,10 50,20 25)
     • POLYGON((1 1,5 1,5 5,1 5,1 1),(2 2,2 3,3 3,3 2,2 2))
     • MULTIPOINT((3.5 5.6), (4.8 10.5))
     • MULTILINESTRING((3 4,10 50,20 25),(-5 -8,-10 -8,-15 -4))
     • MULTIPOLYGON(((1 1,5 1,5 5,1 5,1 1),(2 2,2 3,3 3,3 2,2 2)),((6 3,9 2,9 4,6 3)))
     • http://en.wikipedia.org/wiki/Well-known_text
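
A minimal sketch of reading and writing the simplest WKT form, POINT, as a single string. The class and method names are hypothetical, and only plain decimal numbers are handled (no exponent notation, no other geometry types). WKT itself just says "x y"; treating x as longitude and y as latitude is a common convention, assumed here.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WktPointSketch {

    // Matches e.g. "POINT(6 10)" or "point( -76.15  43.05 )".
    private static final Pattern POINT = Pattern.compile(
        "^\\s*POINT\\s*\\(\\s*(-?\\d+(?:\\.\\d+)?)\\s+(-?\\d+(?:\\.\\d+)?)\\s*\\)\\s*$",
        Pattern.CASE_INSENSITIVE);

    // Returns {x, y}; throws if the text is not a simple WKT point.
    static double[] parse(String wkt) {
        Matcher m = POINT.matcher(wkt);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a WKT POINT: " + wkt);
        }
        return new double[] {
            Double.parseDouble(m.group(1)),
            Double.parseDouble(m.group(2))
        };
    }

    // Formats x and y as one WKT string, e.g. toWkt(6, 10) gives "POINT(6.0 10.0)".
    static String toWkt(double x, double y) {
        return String.format(Locale.ROOT, "POINT(%s %s)", x, y);
    }
}
```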

  12. BitTorrent
     • Allows for distributed downloading
       - Users download .torrent files, which describe the tracker and the file or files
       - Many free clients are available
     • BitTorrent takes pressure off of the central server
       - Users only download the .torrent file
       - Communicate via the tracker (UGFIDD is using an open source tracker)
       - Users download from each other while there is a seed
       - UGFIDD will always be the initial seeder
     • Extremely fast downloads
       - Users download from each other and do not tie up the bandwidth pipe going to the server
       - Utilizes file pieces described in the .torrent file (pieces are downloaded from each other)
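
A minimal sketch of the "file pieces" idea: a .torrent file records a piece length and one 20-byte SHA-1 hash per piece, so a client can verify each piece no matter which peer supplied it. This only computes the piece hashes; it does not write a .torrent file or talk to a tracker. The class and method names are hypothetical, and the tiny piece length in the usage note is for illustration only.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

final class PieceHashSketch {

    // Splits data into fixed-size pieces (the last may be shorter)
    // and returns the SHA-1 digest of each piece, in order.
    static List<byte[]> pieceHashes(byte[] data, int pieceLength) {
        if (pieceLength <= 0) {
            throw new IllegalArgumentException("pieceLength must be positive");
        }
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
        List<byte[]> hashes = new ArrayList<>();
        for (int off = 0; off < data.length; off += pieceLength) {
            int len = Math.min(pieceLength, data.length - off);
            sha1.update(data, off, len);
            hashes.add(sha1.digest()); // digest() also resets the instance
        }
        return hashes;
    }
}
```

For example, pieceHashes(new byte[10], 4) yields three hashes: two for the full 4-byte pieces and one for the trailing 2-byte piece.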

  13. Torrent
     • Torrent file has been created and seeded.
     • Others can now download the torrent file and connect to the swarm.
     • The file will then be downloaded from the server as well as from other clients.

  14. RSS Feed
     • Users want their information when they aren’t there
       - RSS Feed allows the user to set up a specific query and walk away
       - Query will be “standing” for a configurable amount of time
       - Feed will be updated as the query is hit
     • Fast and easy-to-learn publish and subscribe system
       - Most users know how to use RSS (easy to use)
     • RSS page is unique to that user and query
       - The user can, however, pass the URL to other users, who can then subscribe to the query too
     • Example: A group of users is interested in “IED and Iraq”. An RSS query is set up; as products are placed into the monitor directory, that information is passed on to the users’ RSS feed
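
A minimal sketch of what a standing-query feed could look like as RSS 2.0, built with plain string handling. The class name, the "UGFIDD:" title prefix, and the example.org URLs in the usage note are hypothetical; the slides do not show the real feed layout. Each item is a {title, link} pair for one matching product.

```java
final class RssSketch {

    // Escapes the characters that are unsafe in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Builds an RSS 2.0 document for one standing query.
    static String feed(String query, String feedUrl, String[][] items) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<rss version=\"2.0\">\n<channel>\n");
        sb.append("<title>").append(escape("UGFIDD: " + query)).append("</title>\n");
        sb.append("<link>").append(escape(feedUrl)).append("</link>\n");
        sb.append("<description>")
          .append(escape("Standing query: " + query))
          .append("</description>\n");
        for (String[] item : items) {
            sb.append("<item>\n");
            sb.append("<title>").append(escape(item[0])).append("</title>\n");
            sb.append("<link>").append(escape(item[1])).append("</link>\n");
            sb.append("</item>\n");
        }
        sb.append("</channel>\n</rss>\n");
        return sb.toString();
    }
}
```

A call such as feed("IED and Iraq", "http://example.org/rss?id=1", items) would be regenerated whenever a newly ingested product matches the query, so subscribers see it on their next poll.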

  15. Google Earth
     • KML (Keyhole Markup Language)
       - XML data that Google Earth knows how to display
     • Visually represent data
       - More and more users are using tools to see their data visually
       - Can see similarities (such as distance and location)
       - Quickly find relevant data
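
A minimal sketch of a KML result: one Placemark with a Point, which Google Earth can display as a pin. The class and method names are hypothetical and the real UGFIDD output is not shown in the slides. Note that KML writes coordinates as longitude,latitude, the reverse of the usual "Lat, Lon" speaking order.

```java
final class KmlSketch {

    // Escapes the characters that are unsafe in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds a KML document containing a single named point.
    static String placemark(String name, double lat, double lon) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<kml xmlns=\"http://www.opengis.net/kml/2.2\">\n"
             + "  <Placemark>\n"
             + "    <name>" + escape(name) + "</name>\n"
             // KML order is lon,lat (an optional altitude may follow)
             + "    <Point><coordinates>" + lon + "," + lat + "</coordinates></Point>\n"
             + "  </Placemark>\n"
             + "</kml>\n";
    }
}
```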

  16. JPEG Product displayed via published KML

  17. GeoCoder
     • UGFIDD utilizes geocoder web services provided by Google
     • Passing in a string returns a Lat/Lon for the location, or null if nothing is found
     • Example:
       - User searches for “Syracuse”
       - UGFIDD will return hits for documents that contain “Syracuse” and also geospatial results near Syracuse, NY
     • http://code.google.com/apis/maps/documentation/geocoding/

  18. Future Work
     • Make it faster!
       - Multiple Solr instances; distributed data implementation
       - Java ExecutorService allows for multi-threaded workers. This has been implemented but will take time to tune for each system
     • Create a client
       - Currently UGFIDD is a server-only implementation
       - Creating a client is easy with web services
       - Allow users to ingest files using HTTP and FTP upload
     • Distributed queries
       - Currently only one server is queried at a time
       - Would like to make a middle “tracker” to distribute queries and results
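
A minimal sketch of multi-threaded workers with java.util.concurrent.ExecutorService. The class name and the placeholder "indexed:" result are hypothetical; the real worker would extract metadata and post it to the index. The thread count is the knob that has to be tuned per machine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class IngestPoolSketch {

    // Processes every file on a fixed-size pool; results keep input order.
    static List<String> processAll(List<String> files, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (final String file : files) {
                // Placeholder work: a real task would parse and index the file.
                tasks.add(() -> "indexed:" + file);
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until all tasks finish and returns the
            // futures in the same order as the submitted tasks.
            for (Future<String> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while ingesting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("ingest task failed", e.getCause());
        } finally {
            pool.shutdown(); // stop accepting work and let the threads exit
        }
    }
}
```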

  19. Demo
     • Server is running on my home computer with an ingest directory already set up
     • Will move files into the ingest directory
     • Demonstrate query capability
     • Demonstrate publishing capability
     • Will use soapUI, a web services test utility, to demonstrate client interaction
       - http://www.soapui.org/
     • Code is located at: http://code.google.com/p/peer2peersuny/source/browse/

  20. UGFIDD • Questions?