
Olav ten Bosch MSIS, Dublin, 14-16 April 2014

On the use of internet robots for official statistics. Overview: why internet as a data source (IAD)? Internet robots: how do they work? Applications: airline tickets, housing market, clothing, and “robot-assisted data collection”.


Presentation Transcript


  1. On the use of internet robots for official statistics Olav ten Bosch MSIS, Dublin, 14-16 April 2014

  2. Overview
     • Why internet as a data source (IAD)?
     • Internet robots: how do they work?
     • Applications:
       • Airline tickets
       • Housing market
       • Clothing
       • “Robot-assisted data collection”
     • Conclusion

  3. Why IAD? (1)
     Internet sources: faster, better, more efficient; new indicators. Less!!!
     Administrative sources: tax, social security services, municipalities/provinces, supermarkets.
     Surveys.

  4. Why IAD? (2) Internet sources: which content is original, reliable, stable, representative and accessible?
     • Internet prices for CPI?
     • Real estate sites for housing statistics?
     • Internet vacancies for job statistics?
     • Social media sentiment for consumer confidence?
     • Trade in second-hand goods as economic indicators?
     • Travel activity for tourism statistics?

  5. Robots / crawlers / bots / spiders / scrapers: how do they work? (1)
     Diagram labels: You; Browser; Commands; Graphical markup; Requests; Internet; Website (code, images, style, data, etc.)

  6. Robots / crawlers / bots / spiders / scrapers: how do they work? (2)
     Diagram labels: You; Robot/spider/crawler; Navigation; Requests; Internet; Website (code, images, style, data, etc.); Data

  7. Robots / crawlers / bots / spiders / scrapers: how do they work? (3)
     Generic software for:
     - site navigation
     - product details
     - monitoring
     Diagram labels: Robot/spider/crawler; Navigation; Agile; Monitor actively; Requests; Internet; Website (code, images, style, data, etc.); Data (repeated)
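The "product details" step in the slides above can be illustrated with a minimal Python sketch. It is not the software used in the presentation: the page fragment, the CSS class names `name` and `price`, and the helper `extract_prices` are all hypothetical, and a real robot would fetch the page over HTTP and handle navigation and monitoring as well.

```python
from html.parser import HTMLParser

# Hypothetical page fragment standing in for what a website would return.
# A real robot would request this page over the internet instead.
SAMPLE_PAGE = """
<ul>
  <li class="product"><span class="name">Ticket AMS-DUB</span>
      <span class="price">89.50</span></li>
  <li class="product"><span class="name">Ticket AMS-LHR</span>
      <span class="price">120.00</span></li>
</ul>
"""


class PriceParser(HTMLParser):
    """Collects (name, price) pairs from spans with class 'name' or 'price'."""

    def __init__(self):
        super().__init__()
        self.current_field = None  # which span we are inside, if any
        self.current_name = None   # last product name seen
        self.observations = []

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self.current_field = css_class

    def handle_data(self, data):
        text = data.strip()
        if not text or self.current_field is None:
            return
        if self.current_field == "name":
            self.current_name = text
        elif self.current_field == "price":
            self.observations.append((self.current_name, float(text)))

    def handle_endtag(self, tag):
        if tag == "span":
            self.current_field = None


def extract_prices(html):
    """Turn page markup into a list of (product name, price) observations."""
    parser = PriceParser()
    parser.feed(html)
    return parser.observations


for name, price in extract_prices(SAMPLE_PAGE):
    print(name, price)
```

The point of the sketch is the contrast with a browser: the same markup that a browser renders for a person is reduced here to structured data that can be stored and compared over time.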

  8. Airline tickets (1): Robot collection versus manual collection

  9. Airline tickets (2): Price of a ticket over time

  10. Housing Market (1)

  11. Housing market (2): Dynamics of the ‘database behind’ become visible

  12. Clothing (1):

  13. Clothing (2): 2 sites, very volatile data
      • Challenges:
        • from volatile data to stable statistics
        • how to classify multiple less structured data sources
      Chart label: Seasonal pattern

  14. Robot-assisted data collection (1)
      • Use case: few price observations on many sites
      • Example: price of a cinema ticket
      • “Robot tool” to automatically check if prices have changed
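The idea of a tool that checks whether prices have changed can be sketched in Python as below. This is an illustration under assumptions, not the tool from the presentation: the function `find_changed_prices`, the site names and the prices are hypothetical, and the robot is assumed to have already produced one price per site (or `None` when it could not read one).

```python
def find_changed_prices(previous, current):
    """Return the sites that a person should review, with a reason.

    Both arguments map a site name to its observed price. A price of None
    means the robot could not read a price on that site.
    """
    needs_review = {}
    for site, new_price in current.items():
        old_price = previous.get(site)
        if new_price is None:
            needs_review[site] = "price not found"
        elif old_price is None:
            needs_review[site] = "no earlier observation"
        elif abs(new_price - old_price) > 0.005:
            needs_review[site] = f"changed from {old_price:.2f} to {new_price:.2f}"
    return needs_review


# Hypothetical cinema ticket prices from two collection rounds.
last_round = {"cinema-a": 9.50, "cinema-b": 10.00, "cinema-c": 8.75}
this_round = {"cinema-a": 9.50, "cinema-b": 10.50, "cinema-c": None}

for site, reason in find_changed_prices(last_round, this_round).items():
    print(site, reason)
```

Unchanged sites are skipped, so the person collecting prices only has to look at the sites the robot flags.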

  15. Robot-assisted data collection (2)

  16. Conclusion
      • Using the internet as a data source, we can measure statistical phenomena in a completely different way
      • It is powerful to combine fast internet data with reliable (but slower) administrative data
      • We should redesign statistics with the possibilities of internet data in mind
      Challenges:
      • Legal framework
      • The internet changes continuously: how to turn volatile data sources into reliable statistics?
      • We need advanced statistical methods, processes and IT
