
Data Mining Project for Franklin County Auditor Website

A project to extract property data from the Franklin County Auditor's website using the iRobot web scraper. The team ran into problems with street name matching and extraction speed. Next goals: mapping out specifications, database design, and scraping foreclosure websites.


Presentation Transcript


  1. Not Even Funny [Property Details Team] • Matt Dixon, Architect • Candace Remaly, Project Manager • Spencer Smith, Business Analyst • Bryan Linthicum, Developer • Adam Sternfeld, Tester

  2. Proposed Project Timeline

  3. Tasks Completed • Installed necessary software • Created a prototype • Researched data mining techniques • Successfully extracted data from Auditor’s website

  4. Data Mining • Decided to use iRobot web scraper to extract data from Franklin County Auditor website • Obtained a list of every street in Franklin County (http://www.fceo.co.franklin.oh.us/) • iRobot uses the list as search criteria • Able to output data to XML file or database
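The loop described on this slide (one search per street, results written to a database) might look roughly like the sketch below. It does not use iRobot itself: `scrape_street` is a hypothetical callable standing in for the scraper, and the `parcels` table layout is an assumption based on the fields in the sample output, not the team's actual design.

```python
import sqlite3

# Assumed table layout; the slides do not describe the real schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS parcels (
    parcelid TEXT PRIMARY KEY,
    location TEXT,
    map_routing_number TEXT,
    abstract_url TEXT
)
"""

INSERT = """
INSERT OR REPLACE INTO parcels
VALUES (:parcelid, :location, :MapRoutingNumber, :AbstractURL)
"""


def load_streets(conn, streets, scrape_street):
    """Run one search per street and store every returned record.

    `scrape_street` is a hypothetical function that takes a street name
    and returns a list of dicts keyed like the scraper's output fields.
    """
    conn.execute(SCHEMA)
    for street in streets:
        rows = scrape_street(street)
        # OR REPLACE keeps one row per parcel if a parcel shows up twice.
        conn.executemany(INSERT, rows)
        conn.commit()
    return conn.execute("SELECT COUNT(*) FROM parcels").fetchone()[0]
```

Keying the table on `parcelid` means a parcel returned by more than one search is stored once, which keeps reruns of the scraper harmless.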

  5. <Variables Name="" Date="2009/04/13 23:53:41">
       <VariableData Name="AbstractURL">http://franklincountyoh.metacama.com/do/selectDisplay?parcelid=56022655500&select=SUMMARY&curpage=*</VariableData>
       <VariableData Name="parcelid">560-226555-00</VariableData>
       <VariableData Name="MapRoutingNumber">560-N042KKK -068-00</VariableData>
       <VariableData Name="location">834 MACARRAN CT</VariableData>
     </Variables>
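A record like the one on slide 5 could be read back into a program with a few lines of Python. This is a minimal sketch, not part of the team's toolchain. One detail worth noting: the URL in the sample contains bare `&` characters, which a strict XML parser rejects, so the sketch escapes them before parsing. The function name `parse_record` and the `extracted_at` key are illustrative choices.

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = """<Variables Name="" Date="2009/04/13 23:53:41">
<VariableData Name="AbstractURL">http://franklincountyoh.metacama.com/do/selectDisplay?parcelid=56022655500&select=SUMMARY&curpage=*</VariableData>
<VariableData Name="parcelid">560-226555-00</VariableData>
<VariableData Name="location">834 MACARRAN CT</VariableData>
</Variables>"""

# An '&' that does not already start an entity such as '&amp;' or '&#38;'.
BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9A-Fa-f]+);)")


def parse_record(xml_text):
    """Turn one <Variables> element into a dict of field name -> value."""
    cleaned = BARE_AMP.sub("&amp;", xml_text)
    root = ET.fromstring(cleaned)
    record = {
        el.get("Name"): (el.text or "").strip()
        for el in root.iter("VariableData")
    }
    # Keep the extraction timestamp from the root element's Date attribute.
    record["extracted_at"] = root.get("Date")
    return record


record = parse_record(SAMPLE)
print(record["parcelid"], record["location"])
```

The parser turns `&amp;` back into `&`, so the URL stored in the dict matches the one the scraper wrote.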

  6. Problems Encountered • Searching by street name returns a maximum of 400 results, so some addresses are left out • The Auditor’s website is not always reliable for street name matching • Extracting information for all of Franklin County may take several days
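One common way around a result cap like the 400-row limit is to narrow any query that hits the cap and try again. The sketch below assumes the search can be restricted to a house-number range, which the slides do not confirm for the Auditor's site. `search` is a hypothetical callable taking `(street, lo, hi)` and returning at most `cap` rows.

```python
RESULT_CAP = 400  # maximum rows per search, as reported on the slide


def fetch_all(search, street, lo=0, hi=99999, cap=RESULT_CAP):
    """Collect all rows for a street by halving the house-number range
    whenever a query comes back at the cap (and so may be truncated)."""
    results = search(street, lo, hi)
    if len(results) < cap or lo >= hi:
        # Under the cap: nothing was cut off. A single-number range that
        # still hits the cap cannot be split further and is returned as-is.
        return results
    mid = (lo + hi) // 2
    left = fetch_all(search, street, lo, mid, cap)
    right = fetch_all(search, street, mid + 1, hi, cap)
    return left + right
```

Because the two halves never overlap, no row is fetched twice. The cost is extra requests on long streets, which would add to the already long extraction time.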

  7. Determining Foreclosures • Can’t use foreclosure.com: it doesn’t list full addresses • Will most likely use Fannie Mae (homepath.com) and Freddie Mac (homesteps.com) to get as much foreclosure information as possible

  8. Goals for Next Week • Begin mapping out specifications and use cases • Begin database design • Refine web scraping for Franklin County Auditor’s website • Create web scraper for homesteps.com and homepath.com
