
WORKSHOP ON SCANNER DATA Geneva 10 May 2010


Presentation Transcript


  1. WORKSHOP ON SCANNER DATA, Geneva, 10 May 2010. Joint presentation by Ragnhild Nygaard (Statistics Norway) and Heymerik van der Grient (Statistics Netherlands)

  2. Historical overview – NL Supermarkets • Mid 90s: first contacts with chain(s) • 2002: first implementation: 1/2 chain(s) • Yearly Laspeyres (labour intensive) • Construction of yearly basket of items • Manual linking of items to COICOP-groups • Manual replacement of disappearing items • Reduction of ca 10 000 monthly price quotes in field survey

  3. Historical overview – NL, cont. Supermarkets • 2010: extension: 6 chains • Monthly chained Jevons (efficient process) • No manual linking of items • No explicit replacements • Extra reduction of ca 5 000 monthly price quotes in field survey

  4. Historical overview – N • 1997: first contact with one chain • Gradually contact with more chains • Implementation in the CPI • only price information of specific representative items • 2002: scanner data from all the chains (no questionnaires - big incentive) • Aug 2005: expanded use for COICOP 01 • price and quantity information for all items in representative outlets

  5. Questions to be answered when dealing with scanner data • How/where to acquire scanner data? • Which statistical method? • How to link items to COICOP? • How to deal with all kinds of particularities in the data? • Development of a new computer system?

  6. Source of scanner data • Market research companies • Cleaned data • (very) expensive • Two-stage delivery chain (timeliness) • Companies/Chains • Raw data • Cheap (NL/N do not pay) • Direct contact with original supplier

  7. Negotiations with companies • Time-consuming process • Negotiations can take up to a year or more, including meetings, sending test data, analysing data etc. • Be aware that the company incurs some set-up costs, e.g. preparing the data extractions • Can the company provide what you want/need? • E.g. information to link items to COICOP automatically

  8. Negotiations with companies, cont. • Focus on advantages for companies • Minor costs once established (just a copy of their sales administration) • No questionnaires or monthly visits of price collectors • Other incentives for companies? • Money – not likely • Information • E.g. company price development compared to overall price development

  9. Negotiations with companies, cont. • Establishing good routines with the companies is essential • Strict time schedules • No changes in formats once implemented

  10. Pre-production work • Take your time analysing the data • Enormous amount of data • N: Over 300 000 price observations each month divided into about 14 000 items • Build shadow system (prototype) • Compare the new price indexes based on scanner data with the old method for a certain period of time before implementation • Discover possible problems in advance • Unexpected situations will arise for sure

  11. Pre-production work, cont. • Ideas for analysing the data: • Is the same EAN always the same item? • Extreme price changes • Structural price patterns specific to the beginning or end of an EAN's life cycle • Risk of bias! • All kinds of dynamics in the data • Missing prices • Do properties of the data change over time? • Etc.

  12. Methodology / IT-system • Find a methodology that: • Delivers good indexes (e.g. no bias) • Can deal with all particularities in the data • Build an IT-system that supports the chosen methodology • Learn from the experiences of other countries using scanner data

  13. Properties of data. Consequences for methodology NL and N • High attrition rate of items

  14. Properties of data, cont. Consequences for methodology NL and N • How to deal with high attrition rate of items • NL: monthly chained index • N: monthly chained index

  15. Properties of data, cont. Consequences for methodology NL and N • Sales: low prices combined with enormous increase in quantities sold

  16. Properties of data, cont. Consequences for methodology NL and N • Consequences of sales: • Single observations can have extremely high influence on the elementary index • Risk of bias when applying monthly chaining with explicit weights
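
The risk named on this slide can be illustrated with a small, entirely made-up example. The sketch below assumes a Törnqvist formula for the weighted bilateral index and two hypothetical items, A and B; item A goes on sale in period 1, sells very little in period 2, and is back at its starting price and quantity in period 3. None of the numbers come from the presentation.

```python
from math import exp, log

def tornqvist(base, comp):
    """Törnqvist index of `comp` relative to `base`.

    Each period is a dict: item -> (price, quantity). Only items
    present in both periods are used.
    """
    items = [i for i in base if i in comp]
    exp_b = {i: base[i][0] * base[i][1] for i in items}
    exp_c = {i: comp[i][0] * comp[i][1] for i in items}
    tot_b, tot_c = sum(exp_b.values()), sum(exp_c.values())
    log_index = sum(
        0.5 * (exp_b[i] / tot_b + exp_c[i] / tot_c) * log(comp[i][0] / base[i][0])
        for i in items
    )
    return exp(log_index)

# Hypothetical data: item A is on sale in period 1, stocked up by
# consumers (so it hardly sells in period 2), then back to normal.
periods = [
    {"A": (1.0, 10), "B": (1.0, 10)},
    {"A": (0.5, 100), "B": (1.0, 10)},
    {"A": (1.0, 1), "B": (1.0, 10)},
    {"A": (1.0, 10), "B": (1.0, 10)},
]

# Monthly chained index: multiply the period-to-period links.
chained = [1.0]
for base, comp in zip(periods, periods[1:]):
    chained.append(chained[-1] * tornqvist(base, comp))

# Direct comparison of the last period with the first.
direct = tornqvist(periods[0], periods[-1])

print(chained[-1], direct)
```

Prices and quantities in the last period equal those in the first, so the direct index is 1, yet the chained index ends at roughly 0.87. The sale month receives a large weight on the way down and a small one on the way back up, which is the downward drift the slide warns about.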

  17. Properties of data, cont. Consequences for methodology NL and N • Bias is not just theoretical! • Example for detergents

  18. Properties of data, cont. Consequences for methodology NL and N • How to deal with sales? • NL: crude weighting on item level: w=0 or 1 • N: manual checks of price ratios that contribute most to elementary results: “critical observations”

  19. Properties of data, cont. Consequences for methodology NL and N • Implausible price changes • NL: price changes (pt/pt-1) of more than a factor 4 are deleted • Changes of +5000% and -99% do actually occur • N: price changes (pt/pt-1) of more than a factor 3 are deleted
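
A minimal sketch of such a filter follows. It assumes the rule is symmetric, i.e. a ratio pt/pt-1 is dropped when it is above the factor or below its reciprocal; the slide does not say whether decreases are treated the same way. The function name and the sample prices are hypothetical.

```python
def plausible_ratios(prev_prices, curr_prices, max_factor=4.0):
    """Return {item: p_t / p_(t-1)} for ratios within the allowed band.

    Assumption: the band is [1/max_factor, max_factor]. Items missing
    in either month, or with non-positive prices, are skipped.
    """
    kept = {}
    for item, p_prev in prev_prices.items():
        p_curr = curr_prices.get(item)
        if p_curr is None or p_prev <= 0 or p_curr <= 0:
            continue
        ratio = p_curr / p_prev
        if 1.0 / max_factor <= ratio <= max_factor:
            kept[item] = ratio
    return kept

prev = {"a": 1.0, "b": 1.0, "c": 2.0, "d": 1.0}
curr = {"a": 1.1, "b": 51.0, "c": 0.02, "d": 3.5}

nl = plausible_ratios(prev, curr, max_factor=4.0)  # factor used by NL
n = plausible_ratios(prev, curr, max_factor=3.0)   # factor used by N
print(sorted(nl), sorted(n))
```

Item "b" rises by 5000% and item "c" falls by 99%, the two extremes mentioned on the slide, and both are dropped under either factor. Item "d" (ratio 3.5) survives the factor 4 rule but not the factor 3 rule.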

  20. Properties of data, cont. Consequences for methodology NL and N • Temporarily missing prices

  21. Properties of data, cont. Consequences for methodology NL and N • How to deal with temporarily missing prices: • NL: imputation of missing prices • N: no adjustments, but imputing prices is considered for the near future

  22. Properties of data, cont. Consequences for methodology NL and N • Quality differences • Items with same EAN are considered to be identical • Items with different EAN are treated as different items (no matching) • How to deal with quality differences: • NL: only adjustment in exceptional cases: manual interference • N: no adjustment

  23. Actual method - NL • Data received: • For each item each week: • EAN • Short description • (Chain specific) product group • Used to link items to COICOP automatically • Expenditures • Quantities sold

  24. Actual method – NL, cont. • Price of item: • Unit value based on first three weeks of month • Unweighted price index elementary level: • Monthly chained Jevons on selection of items • Weighted price index higher aggregates: • Yearly chained Laspeyres • Weights based on scanner data of all 52 weeks of previous year
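
The first two steps of this slide can be sketched as follows. The data layout (weekly expenditure and quantity pairs, monthly price dictionaries) and all function names are assumptions made for illustration, not the layout of the Statistics Netherlands system.

```python
from math import exp, log

def unit_value(weekly):
    """Unit value from (expenditure, quantity) pairs, e.g. the first
    three weeks of a month: total expenditure / total quantity."""
    total_exp = sum(e for e, _ in weekly)
    total_qty = sum(q for _, q in weekly)
    return total_exp / total_qty

def jevons(prev, curr):
    """Unweighted geometric mean of price ratios over items present
    in both months (no explicit replacements)."""
    matched = [i for i in prev if i in curr]
    return exp(sum(log(curr[i] / prev[i]) for i in matched) / len(matched))

def chained_jevons(monthly_prices):
    """Chain the month-to-month Jevons links, starting at 1.0."""
    index = [1.0]
    for prev, curr in zip(monthly_prices, monthly_prices[1:]):
        index.append(index[-1] * jevons(prev, curr))
    return index

price = unit_value([(100.0, 50), (90.0, 50), (60.0, 25)])

# Item "c" enters in month 1 and item "a" leaves in month 2; each
# link uses only the items available in both adjacent months.
months = [
    {"a": 1.0, "b": 2.0},
    {"a": 1.21, "b": 2.0, "c": 5.0},
    {"b": 2.0, "c": 5.0},
]
index = chained_jevons(months)
print(price, index)
```

Because each link is computed on the items matched between two adjacent months, entering and disappearing items are handled without a fixed yearly basket, which fits the high attrition rate described earlier.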

  25. Actual method – NL, cont. • Item selection at elementary level • Items with low expenditures : w=0 • Other items : w=1 • Threshold of low (average) expenditure share: • Example: threshold =1% for χ=2 and N=50
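
The threshold formula itself did not survive in this transcript. One reading that reproduces the slide's example (1% for χ=2 and N=50) is threshold = 1/(N·χ), with N the number of items in the elementary group. The sketch below assumes that reading, and also assumes that the "average" share is the mean of an item's expenditure shares in two adjacent months; both points should be checked against the accompanying paper.

```python
def share_threshold(n_items, chi):
    """Assumed threshold on the average expenditure share: 1 / (N * chi)."""
    return 1.0 / (n_items * chi)

def select_items(shares_prev, shares_curr, chi=1.25):
    """Assign w=1 to items whose average share exceeds the threshold,
    w=0 otherwise. Shares are dicts: item -> expenditure share."""
    items = set(shares_prev) | set(shares_curr)
    threshold = share_threshold(len(items), chi)
    weights = {}
    for item in items:
        avg = 0.5 * (shares_prev.get(item, 0.0) + shares_curr.get(item, 0.0))
        weights[item] = 1 if avg > threshold else 0
    return weights

example = share_threshold(50, 2)  # the slide's example

shares = {"a": 0.60, "b": 0.30, "c": 0.08, "d": 0.02}
weights = select_items(shares, shares, chi=1.25)  # threshold = 1/(4*1.25) = 0.2
print(example, weights)
```

With a higher χ the threshold falls and more items are kept; χ therefore controls how crude the 0/1 weighting is.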

  26. Actual method – NL, cont. • Determination of threshold value • Simulations lead to: • Optimal value: χ=1.25 • Ca 50% of items are excluded (on average) • Elementary index based on 80 to 85% of total expenditures • Elementary level (chain dependent) comparable with COICOP6

  27. Actual method – NL, cont. • Refinements: • Extreme price changes are excluded (factor 4) • Missing prices are imputed • Dump prices at the end of an item's life cycle are excluded (see paper)

  28. Actual method – NL. What advantages were achieved? • Indexes are of higher quality • Compared with old method scanner data • Compared with field survey • Response burden for companies is lower • No price collection in the shops • Efficiency gains? • Yes: more or less automatic production process • Investment costs (IT-system) were (very) high

  29. Illustrations • Price indexes based on five supermarkets

  30. Illustrations • Price indexes based on five supermarkets

  31. Actual method - N • Data received: • For each item in the midweek of the month: • EAN/PLU • Short description • (Chain specific) product group • Calculated average price • Quantity sold • Expenditure

  32. Actual method – N, cont. • Sample of representative outlets • Stratified by chain and concept • Matching EAN/PLU with COICOP6 • Weighted Jevons price index on elementary level with expenditure shares of current and base period: • Monthly chained Törnqvist index • Scanner data weights between the COICOP6 groups
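
For reference, the Törnqvist index named on this slide is usually written as below. The notation is an assumption, not taken from the slides: $p_i^t$ is the price of item $i$ in month $t$, $s_i^t$ its expenditure share within the elementary group, and $S$ the set of items matched between the two months. With monthly chaining, the base period is taken here to be the previous month.

```latex
P_T^{t-1,t} \;=\; \prod_{i \in S}
  \left( \frac{p_i^{t}}{p_i^{t-1}} \right)^{\frac{1}{2}\left(s_i^{t-1} + s_i^{t}\right)},
\qquad
s_i^{t} \;=\; \frac{p_i^{t} q_i^{t}}{\sum_{j \in S} p_j^{t} q_j^{t}}
```

Setting every exponent to $1/|S|$ instead of the average share gives the unweighted Jevons index.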

  33. Actual method – N, cont. • Higher aggregates: • Yearly chained Laspeyres • Weights from HES (NR as of 2011) • Exclude strongly seasonal items only available for a certain period of the year • Manual control and possibly exclusion of extreme contributions to elementary results
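
Aggregation above the elementary level can be sketched as a weighted arithmetic mean of the lower-level indexes, chained once a year. The group names, weights and yearly links below are invented, and the exact link month and weight reference period used in practice are not given on the slide.

```python
def laspeyres_aggregate(elementary_indexes, weights):
    """Laspeyres-type aggregate: expenditure-weighted arithmetic mean
    of elementary indexes (all measured from the same base period)."""
    total = sum(weights[g] for g in elementary_indexes)
    return sum(weights[g] * elementary_indexes[g] for g in elementary_indexes) / total

def chain_years(yearly_links):
    """Multiply successive yearly links into one continuous series."""
    level, series = 1.0, []
    for link in yearly_links:
        level *= link
        series.append(level)
    return series

# Hypothetical elementary indexes and base-year expenditure weights.
aggregate = laspeyres_aggregate(
    {"bread": 1.10, "milk": 1.00},
    {"bread": 3.0, "milk": 1.0},
)
series = chain_years([1.02, 1.03])
print(aggregate, series)
```

The weights are held fixed within a year and replaced at each yearly link, so only the elementary level is exposed to monthly weight changes.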

  34. Actual method – N. What advantages were achieved? • Indexes of higher quality? • New methodology led to reduction of e.g. sampling and measurement errors, but also to new biases • Much more data – more detailed price indexes • Considering both prices and quantities • Many indexes have improved, others have not • Low response burden for companies • No questionnaires • Efficiency gains? • Automatic production process which requires some manual interference • Resources demanded not much higher than before • High investment costs (IT-system)

  35. New methodology • Newly developed index (Ivancic, Diewert, Fox) • Rolling year GEKS price index • Source: • GEKS-algorithm of purchasing power parities (International Comparison Programme) • GEKS index transitive by construction • chained index equals direct index • no chain drift • A geometric mean of direct superlative price indexes
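
The transitivity and no-chain-drift properties listed on this slide can be checked numerically. The sketch below assumes Törnqvist as the bilateral superlative index and reuses a made-up sale scenario (hypothetical items A and B); the function names and data layout are illustrative only.

```python
from math import exp, log

def tornqvist(base, comp):
    """Törnqvist index of `comp` relative to `base`; each period is a
    dict item -> (price, quantity), matched items only."""
    items = [i for i in base if i in comp]
    exp_b = {i: base[i][0] * base[i][1] for i in items}
    exp_c = {i: comp[i][0] * comp[i][1] for i in items}
    tot_b, tot_c = sum(exp_b.values()), sum(exp_c.values())
    return exp(sum(
        0.5 * (exp_b[i] / tot_b + exp_c[i] / tot_c) * log(comp[i][0] / base[i][0])
        for i in items
    ))

def geks(periods, j, k):
    """GEKS index of period k relative to period j: geometric mean,
    over every link period l, of the indirect comparison j -> l -> k."""
    logs = [
        log(tornqvist(periods[j], link) * tornqvist(link, periods[k]))
        for link in periods
    ]
    return exp(sum(logs) / len(periods))

# Hypothetical data: item A on sale in period 1, back to the starting
# price and quantity in period 3.
periods = [
    {"A": (1.0, 10), "B": (1.0, 10)},
    {"A": (0.5, 100), "B": (1.0, 10)},
    {"A": (1.0, 1), "B": (1.0, 10)},
    {"A": (1.0, 10), "B": (1.0, 10)},
]

direct = geks(periods, 0, 3)
chained = geks(periods, 0, 1) * geks(periods, 1, 2) * geks(periods, 2, 3)
print(direct, chained)
```

Chaining the GEKS links gives the same result as the direct GEKS comparison, and both equal 1 here because period 3 repeats period 0. The equality relies on the bilateral index satisfying time reversal, which Törnqvist and Fisher both do.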

  36. New methodology, cont. • Built from bilateral indexes (Törnqvist or Fisher) between entities j and l (l=1..M) and between entities k and l, respectively • Purchasing power parities: entity is country • Scanner data: entity is month
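
The formula shown on this slide is not preserved in the transcript. The standard GEKS definition that matches the description is given below as a reconstruction, not as a copy of the slide. It assumes $P^{jl}$ denotes the bilateral index going from entity $j$ to entity $l$ and that the bilateral index satisfies time reversal ($P^{lk} = 1/P^{kl}$).

```latex
P_{GEKS}^{jk}
  \;=\; \prod_{l=1}^{M} \left( \frac{P^{jl}}{P^{kl}} \right)^{1/M}
  \;=\; \prod_{l=1}^{M} \left( P^{jl} \, P^{lk} \right)^{1/M}
```

Each factor compares $j$ with $k$ indirectly through the link entity $l$; the GEKS index is the geometric mean of the $M$ indirect comparisons.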

  37. New methodology, cont. • Expanding time period leads to revising all previous GEKS indexes • Solution: rolling version (chaining) etc
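
A rolling version can be sketched as follows: for each new month, a GEKS movement from the previous month is computed on a window of the most recent months and chained onto the published series, so earlier figures are never revised. The 13-month default, the function names and the data are assumptions. The sketch also lets the window grow until it reaches full length, a simplification of computing one GEKS over the full initial window. The bilateral formula is passed in as a parameter, and a plain Jevons index stands in for a superlative index in the demonstration.

```python
from math import exp, log

def geks_link(window, bilateral):
    """GEKS movement from the second-to-last to the last period of
    `window`, using every period in the window as a link."""
    prev, curr = window[-2], window[-1]
    logs = [log(bilateral(prev, link) * bilateral(link, curr)) for link in window]
    return exp(sum(logs) / len(window))

def rolling_geks(periods, bilateral, window_len=13):
    """Chain the latest GEKS movement of each rolling window."""
    index = [1.0]
    for t in range(1, len(periods)):
        start = max(0, t - window_len + 1)
        index.append(index[-1] * geks_link(periods[start:t + 1], bilateral))
    return index

def jevons(base, comp):
    """Stand-in bilateral index (unweighted); periods are item -> price."""
    items = [i for i in base if i in comp]
    return exp(sum(log(comp[i] / base[i]) for i in items) / len(items))

prices = [
    {"a": 1.0, "b": 2.0},
    {"a": 1.1, "b": 2.0},
    {"a": 0.9, "b": 2.2},
    {"a": 1.0, "b": 2.1},
    {"a": 1.2, "b": 2.0},
]
series = rolling_geks(prices, jevons, window_len=3)
print(series)
```

With a fixed set of items the Jevons index is already transitive, so the rolling series coincides with the direct Jevons index from month 0; that makes the demonstration easy to check by hand. With a weighted bilateral index the window length starts to matter.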

  38. RYGEKS and NL • RYGEKS specifically developed for Statistics Netherlands as a remedy for the lack of weighting at elementary level • Not (yet) applied in practice • Used as benchmark • Finding the optimal threshold value • Current method (NL) resembles RYGEKS quite well (on average) • No bias found

  39. RYGEKS and NL: Illustrations

  40. RYGEKS and NL: Illustrations

  41. RYGEKS and NL: Illustrations

  42. RYGEKS and NL, cont. • Plans for near future: • Shadow system based on RYGEKS indexes • Continuous benchmark for current method • Implementation when RYGEKS is widely accepted? • More (international) analysis needed

  43. RYGEKS and N • RYGEKS indexes tested on Norwegian scanner data on different levels: • EAN, elementary and aggregated COICOP levels • For COICOP 01, a monthly chained Törnqvist index was compared with a monthly chained RYGEKS index • The results indicate some bias in the Törnqvist index

  44. RYGEKS and N, cont. • Small deviations for many COICOP aggregates • Milk, Cheese and eggs, Oils and fats, Vegetables, Fish

  45. RYGEKS and N, cont. • While others show more deviations • Meat, Sugar, jam and chocolate

  46. RYGEKS and N, cont.

  47. RYGEKS and N, cont. • Causes of bias: • Missing prices • Seasonal items (not excluded) • Price and quantity oscillating over time • Shadow system for calculating RYGEKS indexes on a monthly basis established • Too early to be implemented

  48. Scanner data in other branches? • NL: • Expanding to other branches desirable • Data available (e.g. durables) • Problem of quality changes • Analysis needed • N: • Continuously working to expand scanner data • Increasing pressure from chains and outlets • Data available for pharmaceutical products, wine and spirits (state monopoly) and petrol • Mostly price information implemented • Have tried to cover clothing, but the matched-item model was unsuccessful
