Big Data & Trusted Smart Statistics

Big Data & Trusted Smart Statistics. WG Methodology, Luxembourg, 8 April 2019. Mission: if we were running a long-distance express parcel service … to deliver reliably and quickly … Expectation by users: weeks, days, hours. 18th century, 20th century, 21st century.

Presentation Transcript


  1. Big Data & Trusted Smart Statistics. WG Methodology, Luxembourg, 8 April 2019

  2. Mission: If we were running a long-distance express parcel service … … to deliver reliably and quickly … Expectation by users: weeks, days, hours. Operational model over time: 18th century, 20th century, 21st century.

  3. … but we are running the European Statistical System. Mission: […] to provide independent high quality statistics […]: relevance, accuracy, timeliness, punctuality, accessibility, clarity, comparability, coherence. Expectation by users: every 10 years; annually; last quarter, country level; last day? per town/km²? Operational model over time (19th, 20th, 21st century): census; census & surveys; admin registers; (smart) surveys; new digital data.

  4. User expectations are changing: we need to keep their trust. Newspaper article on tourism at the Belgian coast during the long Easter weekend, released one day after the weekend: • "150 000 same-day visitors on Easter Sunday, • 400 000 during the entire three-day weekend" • Monitoring by the regional tourism board, in cooperation with a mobile network operator & the road infrastructure administration. • In comparison:

  5. Scheveningen Memorandum on Big Data • Examine the potential of Big Data sources for official statistics • Official Statistics Big Data strategy as part of wider government strategy • Address privacy and data protection • Collaboration at European and global level • Address need for skills • Partnerships between different stakeholders (government, academics, private sector) • Developments in Methodology, quality assessment and IT • Adopt action plan and roadmap for the European Statistical System. Heads of the National Statistical Institutes of the European Union, September 2013

  6. ESSnet Big Data II • From 11/2018 to 12/2020, 26 months • 23 countries, 28 partners • 2+1 tracks • Implementation, • Pilots, • Smart statistics • Website & collaboration platform https://webgate.ec.europa.eu/fpfis/mwikis/essnetbigdata/index.php/Main_Page

  7. Implementation Track • Preparing Implementation of new data sources for production of official statistics • operational, methodological, quality, IT issues of previous work • resulting in functional official statistics prototypes • 4 implementation work packages: • WPB Online job vacancies • WPC Enterprise characteristics • WPD Smart Energy • WPE Tracking ships • Cross-cutting work package: • WPF Process and architecture

  8. Pilots Track • New research into the possibilities of various big data sources for producing official statistics, through pilot studies … … resulting in experimental statistics • 4 pilots • WPG Financial transactions data • WPH Earth observation • WPI Mobile networks data • WPJ Innovative tourism statistics • Cross-cutting work package: • WPK Methodology and quality

  9. Preparing Smart Statistics Proofs of Concepts for selected areas

  10. State of play https://webgate.ec.europa.eu/fpfis/mwikis/essnetbigdata/ • Background information On Essnet Big Data, tracks and workpackages, course of the project, … • Tools and resources GitHub, BDTI, documentation, big data events and links, templates, … • Project meetings https://webgate.ec.europa.eu/fpfis/mwikis/essnetbigdata/index.php/Project_meetings • First outputs • Working areas: https://webgate.ec.europa.eu/fpfis/mwikis/essnetbigdata/index.php/ESSnet_Big_Data_working_areas • Results: milestones and deliverables & experimental statistics: https://webgate.ec.europa.eu/fpfis/mwikis/essnetbigdata/index.php/ESSnet_Big_Data_results

  11. Webscraping / Job Vacancies • Identify and produce statistical estimates and experimental statistics • Develop and test the methodology, recommendations, specifications and statistical software • Definition of the conceptual production processes at national level and at the level of the ESS • Use of common infrastructure

  12. Potential Statistical Outputs • Models of integrating the online job vacancies and the survey data • Time-series now-casting using OJV and JVS data • Linking job openings and vacancies by relevant models • Distribution of the data over regions, occupational, industry sector categories • New statistics on aspects of the labour market, which are not included in current statistics (international jobs) • Matching between occupations, competences (education) • Linkage to the Business registers for other important variables • Occupation classification coding frames maintenance
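
The now-casting item on this slide can be illustrated with a minimal sketch. It assumes that an OJV (online job vacancies) index is available earlier than the JVS series and fits a single-predictor least-squares line between the two. The function names and the numbers are hypothetical and purely illustrative; this is not the ESSnet method and the figures are not project results.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def nowcast(ojv_history, jvs_history, ojv_latest):
    """Estimate the not-yet-published JVS value from the latest OJV value."""
    a, b = fit_line(ojv_history, jvs_history)
    return a + b * ojv_latest


# Made-up toy series (assumption): four past periods where both sources exist.
ojv = [100.0, 110.0, 120.0, 130.0]
jvs = [50.0, 55.0, 60.0, 65.0]
print(nowcast(ojv, jvs, 140.0))  # 70.0 for these toy numbers
```

A production now-cast would need seasonality, revisions and uncertainty estimates; the sketch only shows the basic idea of using the timelier source as a predictor.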

  13. Statistical outputs

  14. Webscraping / Enterprise characteristics: Implement webscraping, text mining and inference techniques to collect, process and improve general information about enterprises: • E-commerce • Job advertisements • Social media presence • URL retrieval • profiling information: • type of activity, • links with other enterprises, etc. • Experimental statistics

  15. Enterprise characteristics (webscraping): rate of enterprises engaged in web sales on the website. Diagram labels: Website (web ordering); Web sales apps; E-sales; E-marketplace; E-commerce; EDI automated; E-purchases.
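
As a rough illustration of how a web-sales indicator could be derived from scraped pages, the sketch below extracts text from HTML with the standard library and flags a site when it contains typical ordering phrases. The keyword list, function names and sample pages are assumptions made for this example; the work package itself may rely on quite different text-mining and inference techniques.

```python
from html.parser import HTMLParser

# Hypothetical keyword list; a real classifier would be trained and validated.
ECOMMERCE_KEYWORDS = ("add to cart", "checkout", "shopping basket")


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(html):
    """Return lower-cased text content with whitespace normalised."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split()).lower()


def has_web_sales(html):
    """Flag a page as offering web sales if any keyword occurs in its text."""
    text = visible_text(html)
    return any(keyword in text for keyword in ECOMMERCE_KEYWORDS)


def web_sales_rate(pages):
    """Share of enterprises (keys of `pages`) whose page is flagged."""
    flags = [has_web_sales(html) for html in pages.values()]
    return sum(flags) / len(flags) if flags else 0.0


# Illustrative pages for two made-up enterprises.
sample = {
    "enterprise_a": "<html><body><button>Add to Cart</button></body></html>",
    "enterprise_b": "<html><body><p>About us</p></body></html>",
}
print(web_sales_rate(sample))  # 0.5
```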

  16. Smart Meters: Electricity consumption / identifying energy consumption patterns • Develop procedures, technical solutions • Collection, processing and analysis • Develop functional production prototypes • Outputs • Develop methodology for geo-spatial linking and define common quality measurements • Develop standard visual outputs for electricity data. • Validate the methods selected during the ESSnet Big Data I • Define requirements for classifying households as vacant or seasonally vacant.

  17. Smart Electricity Meters: Average electricity consumption of households in 2015 (smart meters) • Verify census housing statistics • Estimate household costs • Enhance • environmental statistics • tourism statistics and • regional statistics • Identify vacant or seasonally vacant dwellings, … https://webgate.ec.europa.eu/fpfis/mwikis/essnetbigdata/index.php/WP3_Report_1#Users_of_energy_statistics
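
The idea of identifying vacant or seasonally vacant dwellings from smart meter data can be sketched with a simple threshold rule on monthly consumption. The thresholds (`low_kwh`, `min_low_months`) and the function name are hypothetical choices for illustration only; the actual requirements are exactly what the work package sets out to define.

```python
def classify_dwelling(monthly_kwh, low_kwh=30.0, min_low_months=3):
    """Classify a dwelling from its monthly electricity consumption (kWh).

    A month counts as "low" when consumption is below `low_kwh`.
    All months low            -> "vacant"
    At least `min_low_months` -> "seasonally vacant"
    Otherwise                 -> "occupied"
    """
    if not monthly_kwh:
        raise ValueError("at least one monthly reading is required")
    low_months = sum(1 for value in monthly_kwh if value < low_kwh)
    if low_months == len(monthly_kwh):
        return "vacant"
    if low_months >= min_low_months:
        return "seasonally vacant"
    return "occupied"


# Made-up readings: a holiday home used only in the second half of the year.
print(classify_dwelling([5.0] * 6 + [250.0] * 6))  # seasonally vacant
```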

  18. Automatic Identification System Data: Investigate whether real-time measurement data of ship positions (via AIS system) can be used • to improve the quality and internal comparability of existing statistics and • for new statistical products relevant for the ESS. • Validation of port visits • Compile air emissions and energy used from international shipping and fishing • Inland waterways statistics • Process of transport and activity of the fishing fleet • Expansion of maritime statistics: sea routes and cargo

  19. ESSnet: AIS data Port visits
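
One simple way to derive port visits from AIS position messages is a geofence: count each time a ship's track enters a circle around the port. The sketch below assumes a time-sorted track of `(timestamp, lat, lon)` tuples, a port given as `(lat, lon)` and a hypothetical radius; the coordinates are illustrative and the project's actual method is not described on the slide.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def count_port_visits(track, port, radius_km=2.0):
    """Count entries of a time-sorted AIS track into the port's geofence."""
    visits = 0
    inside = False
    for _timestamp, lat, lon in track:
        now_inside = haversine_km(lat, lon, port[0], port[1]) <= radius_km
        if now_inside and not inside:
            visits += 1  # transition from outside to inside = new visit
        inside = now_inside
    return visits


# Illustrative track: at sea, in port (two messages), at sea, in port again.
port = (51.95, 4.05)
track = [
    ("t1", 52.50, 3.00),
    ("t2", 51.95, 4.05),
    ("t3", 51.951, 4.051),
    ("t4", 52.50, 3.00),
    ("t5", 51.95, 4.05),
]
print(count_port_visits(track, port))  # 2
```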

  20. Automatic Vessel Identification Traffic Density

  21. Trans-Shipment

  22. Process and Architecture • Definition of reference architectures necessary to carry out big data production • National levels • European level • Architecture is the fundamental organization of a system • embodied in its components • their relationships to each other, • their relationship to the environment, • the principles guiding its design and evolution [ISO/IEC 42010 Systems and software engineering - Architecture description]

  23. Mobile Networks Data Hourglass Model: Reference Methodological Framework

  24. Mobile Networks Data: Hourglass model • Multiple data sources: MNO#1, MNO#2 • Multiple data types: CDR, signaling, …

  25. Earth Observation

  26. Innovative Tourism Statistics • Development of a conceptual framework and setting up a smart pilot Tourism Information System • Support statistical production in the field of tourism by integrating various big data sources with administrative registers and statistical databases using innovative statistical methods • Webscraping of internet portals • Administrative data • Car traffic, water consumption, waste production, parking meters, … • Methodology for combining and disaggregating data • Flash estimates • Modification of tourism satellite accounts

  27. Trusted Smart Statistics: What? Why?

  28. The new datafied world. The cyber world is natively digital. And the physical world is being increasingly digitized (IoT, Smart Devices…): digitalization, datafication. “Anything that goes digital, gets logged” (somewhere, by somebody): 1st fundamental law of datafication. Individuals, organizations, places … become “data fountains”. More and more business companies become “data buckets”. Diagram labels: me and my smart devices; my app provider; my energy provider; my mobile phone operator.

  29. Data and new data. “micro-data” • Features about the individual • changing slowly or rarely • recorded at coarse temporal aggregation (months, years). Examples: Name. Gender. Birth date. Marital Status. Residence address. Occupation. Household composition… Monthly income. Monthly expenditures per good category. Number of touristic trips in a year. “nano-data” • Features about single events, transactions; highly pervasive, sub-individual level • changing continuously • recorded at fine temporal aggregation (minutes, seconds). Examples: … Your exact location, every second. Every single heart-beat, blood pressure… Every single transaction, purchase, encounter, event involving you… Your current opinion on any single fact…

  30. Data and new data. “micro-data” (“Shallow data”): Name. Gender. Birth date. Marital Status. Residence address. Occupation. Household composition… … Monthly income. Monthly expenditures per good category. Number of touristic trips in a year … “nano-data” (“Deep data”): … Your exact location, every second. Every single heart-beat, blood pressure… Every single transaction, purchase, encounter, event involving you… Your current opinion on any single fact…

  31. Official Statistics: macro-data (statistics), micro-data (about individual) • Official Statistics aims to produce macro-data (statistics) from input micro-data • Collection of micro-data as ancillary task

  32. Official Statistics. Augmented. Availability of new data sources as opportunity to extend & empower Official Statistics. Diagram labels: macro-data (statistics): additional statistical products (more dimensions, better timeliness, finer spatio/temporal granularity, …); micro-data (about individual): additional micro-data, possibly derived from nano-data; additional processes; nano-data (sub-individual): Additional Input Data Sources.
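
The step "additional micro-data, possibly derived from nano-data" can be pictured as an aggregation from event-level records to one feature per individual and period. The sketch below derives the micro-data variable "number of trips in a year" from timestamped events; the record layout and the `trip_start` event type are assumptions made for this illustration.

```python
from collections import Counter
from datetime import datetime


def trips_per_year(events):
    """Aggregate event-level nano-data into a per-person, per-year trip count.

    `events` is an iterable of (person_id, iso_timestamp, event_type) tuples.
    Only events of the hypothetical type "trip_start" are counted.
    """
    counts = Counter()
    for person_id, timestamp, event_type in events:
        if event_type == "trip_start":
            year = datetime.fromisoformat(timestamp).year
            counts[(person_id, year)] += 1
    return dict(counts)


# Made-up events for two pseudonymous persons.
events = [
    ("p1", "2018-07-14T08:00:00", "trip_start"),
    ("p1", "2018-12-22T17:30:00", "trip_start"),
    ("p1", "2018-12-22T17:31:00", "location_ping"),
    ("p2", "2019-04-08T09:15:00", "trip_start"),
]
print(trips_per_year(events))  # {('p1', 2018): 2, ('p2', 2019): 1}
```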

  33. From fountains or from buckets? Both. • B2G channel: Business (Bucket?)-to-Government, access to privately-held data, private-public partnerships … • C2G channel: Citizens-to-Government, Crowdsourcing, Smart Surveys, Citizen Statistics! Diagram labels: online platforms; energy company; car maker; smart home; smart car; smartphone; smartwatch; Statistical Office.

  34. Official Statistics based on survey data and administrative data and now Big Data. Diagram labels: society, economy; policy, media, research; Public sector; SO (Statistical Office); collection; processing.

  35. Official Statistics based on survey data and administrative data and now Big Data. Diagram labels: society, economy; policy, media, research; Public sector; SO (Statistical Office); collection; processing.

  36. Official Statistics based on survey data and administrative data and now Big Data. Diagram labels: society, economy; policy, media, research; Public sector; SO; collection; processing; Private sector (business and citizens).

  37. Handling the new in the old way: pull data in. This is not feasible: technical scalability, organisational, legal (risk concentration), … Diagram labels: society, economy; policy, media, research; Public sector; SO; collection; processing; “Shallow data” (micro-data); “Deep data” (nano-data); Private sector.

  38. Handle the new in new ways: push computation out (partially). Diagram labels: society, economy; policy, media, research; Public sector; SO; collection; processing; Trusted Smart Statistics; Private sector.

  39. Trusted Smart Statistics. Smart Statistics as an opportunity to deliver more advanced statistical products, more timely (nowcasting), more targeted to specific user groups, through novel reporting and presentation ways … Smart: externalization towards data sources of the (initial) part of processing execution. Leveraging the “smart” features of the data sources (often Smart Systems, Smart Objects) and other “smart technologies” (e.g., Smart Contracts). Trusted: ensure an articulated set of trust guarantees to all players (SO as “taker” and “giver” of trust guarantees). Guarantee that data are processed for the agreed purpose, by the agreed method, respect of user privacy & business confidentiality, compliance with legal provisions … Diagram labels: SO; processing; Trusted Smart Statistics; Private sector (business and citizens).

  40. Sources Pulling data in SHARING DATA Statistical System

  41. Sources Pulling data in SHARING DATA SHARING COMPUTATION Pushing computation out Statistical System Statistical System
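
The contrast between "pulling data in" and "pushing computation out" can be sketched as follows: each data holder runs an agreed function on its own records and only the resulting aggregate leaves its premises, after which the statistical office combines the aggregates. The function names and record layout are hypothetical, and plain aggregation alone gives none of the trust guarantees discussed later in the deck; it only illustrates where the computation runs.

```python
from collections import Counter


def local_aggregate(records):
    """Runs at the data holder: count devices per region.

    `records` is an iterable of (device_id, region) tuples. Only the
    per-region counts are returned; the raw records never leave this function.
    """
    return Counter(region for _device_id, region in records)


def combine(aggregates):
    """Runs at the statistical office: merge aggregates from all holders."""
    total = Counter()
    for aggregate in aggregates:
        total.update(aggregate)
    return dict(total)


# Made-up records held by two data holders.
holder_1 = [("d1", "north"), ("d2", "north"), ("d3", "south")]
holder_2 = [("d4", "south"), ("d5", "south")]
print(combine([local_aggregate(holder_1), local_aggregate(holder_2)]))
# {'north': 2, 'south': 3}
```

Note that simply adding counts assumes the holders cover disjoint populations; overlapping coverage would need a deduplication step that this sketch leaves out.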

  42. Towards a Reference Architecture for Trusted Smart Statistics: Design Principles, Reference Architecture, Specifications, Implementation … Work-in-progress at Eurostat, in coordination with ESS (European Statistical System), in dialogue with other stakeholders • Private Data Holders • Researchers, Academic communities • Data Protection Authorities • other arms of European Commission • National and Local authorities • …

  43. Design principle #1: Processing method (algorithm) transparent to all involved parties • Co-designed or at least agreed-upon (consensus-based design) • Involve also Certification Authority for privacy/ethical compliance? • Data are not “moved to/shared with”, but only “used by” the Statistical Office – goal is the output, not the input! • Adopt technologies for Secure Private Computing, e.g., Secure Multi-Party Computation • Engage and partner with the input parties • Incentives might involve “giving back” computation output to them • Agreement for data usage bound to computation instance. • Technological means guarantee that data cannot be used for any query/purpose other than the agreed one(s) • Purpose and algorithms open for public scrutiny • public transparency, public trust. Diagram labels: Certification Authority (CA)?; Data Holders (DH-1, DH-2); Statistical Office (SO); consensus; source code approved by all parties.

  44. Design principle #2: Data are not “moved to/shared with”, but only “used by” the Statistical Office – goal is the output, not the input! • Adopt technologies for Secure Private Computing, e.g., Secure Multi-Party Computation or Trusted Execution Environment

  45. Diagram labels: (1) consensus: source code approved by all parties (CA); (2) non-personal intermediate data exported to SO; […]; SO: official statistics; DH-1, DH-2: confidential input data, secret shares; authenticated binary code executed in secure hardware.
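
The notion of "source code approved by all parties" followed by execution of "authenticated" code can be approximated, in a very reduced form, by comparing a cryptographic digest of the code about to run with the digest each party signed off on. The sketch below uses SHA-256 from the standard library; the party names and the approval registry are hypothetical, and real trusted-execution attestation involves hardware-backed mechanisms that are not modelled here.

```python
import hashlib
import hmac


def code_digest(source_code: bytes) -> str:
    """SHA-256 digest (hex) of the code that is about to be executed."""
    return hashlib.sha256(source_code).hexdigest()


def all_parties_approve(source_code: bytes, approved_digests: dict) -> bool:
    """True only if every party approved exactly this version of the code.

    `approved_digests` maps a party name to the hex digest it approved.
    An empty registry counts as "no consensus".
    """
    actual = code_digest(source_code)
    return bool(approved_digests) and all(
        hmac.compare_digest(actual, digest) for digest in approved_digests.values()
    )


# Hypothetical agreed algorithm and approvals by SO, two data holders and a CA.
agreed = b"def method(x):\n    return sum(x)\n"
registry = {party: code_digest(agreed) for party in ("SO", "DH-1", "DH-2", "CA")}

print(all_parties_approve(agreed, registry))                 # True
print(all_parties_approve(agreed + b"# tampered", registry))  # False
```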

  46. Secure Multi-Party Computation (SMPC) infrastructure: A hardware + software + humanware infrastructure (technological components + organizational provisions) to let the output information be extracted without exchanging the input data. Diagram labels: confidential input data; secret shares; SMPC computation; computation output (non-personal).
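
To make "secret shares" concrete, here is a minimal sketch of additive secret sharing, one of the standard building blocks of SMPC. Each data holder splits its confidential value into random shares that sum to the value modulo a prime; each computing party adds the shares it receives; only the final total is reconstructed. The modulus, the number of parties and the function names are choices made for this example, and the sketch ignores networking, malicious parties and everything else a real SMPC infrastructure must handle.

```python
import secrets

PRIME = 2**61 - 1  # modulus for the share arithmetic (example choice)


def share(value, n_parties):
    """Split `value` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    """Recombine shares into the value they encode."""
    return sum(shares) % PRIME


def secure_sum(inputs, n_parties=3):
    """Sum confidential inputs without any party seeing an individual input."""
    # Each data holder splits its input into one share per computing party.
    all_shares = [share(value, n_parties) for value in inputs]
    # Each computing party adds up the shares it received (one column each).
    partial_sums = [sum(column) % PRIME for column in zip(*all_shares)]
    # Only the combined partial sums reveal the total, never the inputs.
    return reconstruct(partial_sums)


# Made-up confidential inputs from three data holders.
print(secure_sum([120, 80, 45]))  # 245
```

The result is exact as long as the inputs are non-negative integers whose true sum is smaller than `PRIME`.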

  47. B2G scenario with multiple DHs. Diagram labels: […]; SO: official statistics; DH-1, DH-2: confidential input data, secret shares; authenticated binary code executed in secure hardware.

  48. BG2G scenario: SO providing input data. Diagram labels: […]; SO: confidential input data, official statistics; DH-1, DH-2: confidential input data, secret shares; authenticated binary code executed in secure hardware.

  49. B2G2B scenario: giving back to the private sector! B&G Partnership model? Returning some output analytics product to the private sector for legitimate business purposes (with certification) might facilitate partnership models between Statistical Offices and private Data Holders. Diagram labels: […]; SO: confidential input data, official statistics; non-personal data exported to SO; non-personal data exported for commercial purpose; DH-1, DH-2 (private company): confidential input data, secret shares, commercial analytics; authenticated binary code executed in secure hardware.
