1 / 22

Benchmark database inhomogeneous data, surrogate data and synthetic data

Benchmark database inhomogeneous data, surrogate data and synthetic data. Victor Venema. Goals of COST-HOME working group 1. Literature survey Benchmark dataset Known inhomogeneities Test the homogenisation algorithms (HA). Benchmark dataset. Real (inhomogeneous) climate records

ishields
Télécharger la présentation

Benchmark database inhomogeneous data, surrogate data and synthetic data

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Benchmark databaseinhomogeneous data, surrogate data and synthetic data Victor Venema

  2. Goals of COST-HOME working group 1 • Literature survey • Benchmark dataset • Known inhomogeneities • Test the homogenisation algorithms (HA)

  3. Benchmark dataset • Real (inhomogeneous) climate records • Most realistic case • Investigate if various HA find the same breaks • Good meta-data • Synthetic data • For example, Gaussian white noise • Insert know inhomogeneities • Test performance • Surrogate data • Empirical distribution and correlations • Insert know inhomogeneities • Compare to synthetic data: test of assumptions

  4. Creation benchmark – Outline talk • Start with (in)homogeneous data • Multiple surrogate and synthetic realisations • Mask surrogate records • Add global trend • Insert inhomogeneities in station time series • Published on the web • Homogenize by COST participants and third parties • Analyse the results and publish

  5. 1) Start with homogeneous data • Monthly mean temperature and precipitation • Later also daily data (WG4), maybe other variables • Homogeneous • No missing data • Longer surrogates are based on multiple copies • Generated networks are 100 a

  6. 1) Start with inhomogeneous data • Distribution • Years with breaks are removed • Mean of section between breaks is adjusted to global mean • Spectrum • Longest period without any breaks in the stations • Surrogate is divided in overlapping sections • Fourier coefficients and phases are adjusted for every small section • No adjustments on large scales!

  7. Surrogates from inhomogeneous data

  8. 2) Multiple surrogate realisations • Multiple surrogate realisations • Temporal correlations • Station cross-correlations • Empirical distribution function • Annual cycle removed before, added at the end • Number of stations between 5 and 20 • Cross correlation varies as much as possible

  9. 5) Insert inhomogeneities in stations • Independent breaks • Determined at random for every station and time • 5 breaks per 100 a • Monthly slightly different perturbations • Temperature • Additive • Size: Gaussian distribution, σ=0.8°C • Rain • Multiplicative • Size: Gaussian distribution, <x>=1, σ=10%

  10. Example break perturbations station

  11. Example break perturbations network

  12. 5) Insert inhomogeneities in stations • Correlated break in network • One break in 50 % of networks • In 30 % of the station simultaneously • Position random • At least 10 % of data points on either side

  13. Example correlated break

  14. 5) Insert inhomogeneities in stations • Outliers • Size • Temperature: < 1 or > 99 percentile • Rain: < 0.1 or > 99.9 percentile • Frequency • 50 % of networks: 1 % • 50 % of networks: 3 %

  15. Example outlier perturbations station

  16. Example outliers network

  17. 5) Insert inhomogeneities in stations • Local trends (only temperature) • Linear increase or decrease in one station • Duration: 30, 60a • Maximum size: 0.2 to 1.5 °C • Frequency: once in 10 % of the stations • Also for rain?

  18. Example local trends

  19. 6) Published on the web • Inhomogeneous data will be published on the COST-HOME homepage • Everyone is welcome to download and homogenize the data • http://www.meteo.uni-bonn.de/ mitarbeiter/venema/themes/homogenisation

  20. 7) Homogenize by participants • Return homogenised data • Should be in COST-HOME file format (next slide) • Return break detections • BREAK • OUTLI • BEGTR • ENDTR • Multiple breaks at one data possible

  21. 7) Homogenize by participants • COST-HOME file format: http://www.meteo.uni-bonn.de/ venema/themes/homogenisation/costhome_fileformat.pdf • For benchmark & COST homogenisation software • New since Vienna: • Stations files include height • Many clarifications

  22. Work in progress • Preliminary benchmark: http://www.meteo.uni-bonn.de/ venema/themes/homogenisation/ • Write report on the benchmark dataset • More input data • Set deadline for the availability benchmark • Deadline for the return of the homogeneous data • Agree on the details of the benchmark • Daily data: other, realistic, fair inhomogeneities

More Related