Revisiting Benchmark Data: Addressing Biases and Definitions in Large Datasets

Since 2008, significant progress has been made in understanding the biases in large datasets, yet many issues remain unresolved. Researchers still work largely independently, with little reuse of existing data, underscoring the need for standardized benchmarks. Initiatives like MOVE and MPA’10 have made a good start on defining benchmark types and the desirable characteristics of benchmark data. To further this dialogue, we’re organising a workshop at the Lorentz Centre, including a data challenge, focused on realistic benchmark problems and on how results can be compared. Join us in tackling these open challenges and advancing our field.

Presentation Transcript


  1. Last time…
     1. slippery spaces • First-order effects = context • Objects break rules
     2. granularity grief • More data ≠ more information • Sensitivity varies with measure
     3. defective definitions • Missed pattern ≠ bad algorithm, but bad definition (see the sketch below)
     4. delusive dwarves • Early work based on small samples – need to revisit
     5. baffling bias • Are all penguins equal? • Very large datasets still contain bias
     6. sinful simulations • Are patterns a function of model parameters? • Simulation ≠ validation
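
As a loose illustration of points 2 and 3 above (not from the slides), here is a minimal Python sketch on synthetic data of how the number of detected patterns can hinge on the parameters of the definition: a toy "flock" counter whose output swings with a hypothetical distance threshold eps and minimum duration min_dur.

    import numpy as np

    rng = np.random.default_rng(42)

    # Synthetic trajectories: n entities, T timesteps, 2-D positions.
    # Two drifting groups plus per-entity jitter (purely illustrative data).
    n, T = 20, 100
    base = np.cumsum(rng.normal(0, 1.0, size=(2, T, 2)), axis=1)  # two group paths
    group = np.repeat(np.arange(2), n // 2)
    traj = base[group] + rng.normal(0, 2.0, size=(n, T, 2))

    def count_flocks(traj, eps, min_dur):
        """Toy 'flock' definition (hypothetical, for illustration only):
        count maximal runs of timesteps in which at least half of the
        entities lie within eps of the global centroid, provided the run
        lasts at least min_dur consecutive steps."""
        centroid = traj.mean(axis=0)                    # (T, 2)
        dist = np.linalg.norm(traj - centroid, axis=2)  # (n, T)
        coherent = (dist <= eps).mean(axis=0) >= 0.5    # (T,)
        flocks, run = 0, 0
        for c in coherent:
            run = run + 1 if c else 0
            if run == min_dur:  # run just reached the required duration
                flocks += 1
        return flocks

    # Same data, different definition parameters, very different answers.
    for eps in (2.0, 4.0, 8.0):
        for min_dur in (5, 10, 20):
            print(f"eps={eps:4.1f}  min_dur={min_dur:2d}  "
                  f"flocks={count_flocks(traj, eps, min_dur)}")

The point is not this particular definition, which is deliberately naive, but that a "missed" pattern here may reflect the choice of eps and min_dur rather than a faulty algorithm.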

  2. Since 2008 there’s been lots of progress, but…
     • We’re all still working more or less independently
     • There’s very little reuse of existing work
     • A number of initiatives (MOVE, MPA’10) have discussed possible benchmark data and their characteristics
     • They’ve made a good start on defining benchmark types and desirable characteristics of such data…

  3. Open problem
     • With Bettina, Judy Shamoun-Baranes and Daniel Weiskopf I’m organising a workshop at the Lorentz Centre
     • We’ll have a data challenge as part of that workshop
     • I’d like to discuss, on the basis of the state of the art, what realistic benchmark problems and definitions are and how we can compare results
