This study proposes a pricing model for data queries based on minimal provenance, tracking source tuple contributions to query results. It introduces a pricing function with specific properties and algorithms for exact and approximation price computation, validated through experiments.
A Model and Algorithms for Pricing Queries Tang Ruiming, Wu Huayu, Bao Zhifeng, Stephane Bressan, Patrick Valduriez
Overview Aggdata
Overview Windows Azure Marketplace
Motivation and existing work • People may want to buy data by asking queries. • As Koutris et al. point out [Koutris et al., 2012], current pricing schemes have limitations: • They assign prices to entire datasets. • They assign prices to predefined views, and consumers are restricted to these views. • They may lead to arbitrage situations. E.g., 10 free accounts that each allow 10 applications can be combined to get 100 applications. • In the frameworks of [Koutris et al., 2012], [Koutris et al., 2013], and [Li et al., 2012]: • Prices are assigned to predefined views. • The price of a query is the price of the cheapest set of predefined views that determines the query. (Computing it is NP-hard.)
Framework provenance • In our framework: • Prices are assigned to individual tuples. • For a query, we track the source tuples contributing to the query result. • Each contributing source tuple is charged only once, no matter how many times it contributes. This reflects the nature of information goods [Balazinska et al., 2011].
Minimal provenance • (provenance) Let Q be a query and D a database, and let Q(D) be the query result. A provenance of Q(D) is a set of tuples $L \subseteq D$ such that $Q(D) \subseteq Q(L)$. • (minimal provenance) A minimal provenance of Q(D) is a provenance L of Q(D) such that $L' \subseteq L \Rightarrow L' = L$, • where L' is a provenance of Q(D).
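The definitions can be tried out on a toy database. The sketch below is illustrative only: it assumes that "L is a provenance of Q(D)" means that evaluating Q on L alone still yields every tuple of Q(D), and the relations `customer` and `order`, the function `query`, and both helper names are hypothetical, not taken from the slides.

```python
from itertools import combinations

def query(db):
    """Hypothetical query: names of customers with at least one order."""
    customers = [(cid, name) for rel, cid, name in db if rel == "customer"]
    order_ids = {cid for rel, cid, _ in db if rel == "order"}
    return {name for cid, name in customers if cid in order_ids}

def is_provenance(L, db):
    # Assumed condition: L is a subset of D and Q(L) covers Q(D).
    return L <= db and query(db) <= query(L)

def is_minimal_provenance(L, db):
    # Minimal: L is a provenance and no proper subset of L is one.
    if not is_provenance(L, db):
        return False
    return not any(
        is_provenance(frozenset(sub), db)
        for k in range(len(L))
        for sub in combinations(L, k)
    )

ann = ("customer", 1, "Ann")
book = ("order", 1, "book")
pen = ("order", 1, "pen")
D = frozenset({ann, book, pen})

whole_db_is_provenance = is_provenance(D, D)
whole_db_is_minimal = is_minimal_provenance(D, D)
ann_book_is_minimal = is_minimal_provenance(frozenset({ann, book}), D)
```

Here the whole database is a provenance but not a minimal one, because one of the two orders is enough to produce "Ann". Both {ann, book} and {ann, pen} are minimal provenances, which is why a query result can have several.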
Pricing function • A price setting function maps each tuple in the database to its price. • A pricing function takes a query as input and returns its price. • Properties of the pricing function: • Contribution monotonicity: if a query uses fewer source tuples than another query, the price of the first query should be lower. • Contribution arbitrage-freedom: if a query uses fewer source tuples than a set of queries, the price of the first query should be lower than the total price of that set of queries. • Bounded price: the price of a query is never higher than the total price of the source tuples in the relations involved in the query.
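Pricing function • The price of a query Q in a database D is defined as the price of the cheapest minimal provenance of Q(D): $\mathrm{price}(Q, D) = \min_{L} \lVert L \rVert_p$, with L ranging over the minimal provenances of Q(D), • where $\lVert L \rVert_p = \left( \sum_{t \in L} s(t)^p \right)^{1/p}$ is the p-norm of L and $s(t)$ is the price that the price setting function assigns to tuple t. Increasing p decreases the p-norm value, so the data seller can use p to adjust prices for different categories of data consumers.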
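As a minimal sketch of this definition, the following Python snippet computes the p-norm of a set of tuples and takes the minimum over a given list of minimal provenances. The names `p_norm`, `query_price`, and the tuple identifiers and prices are made up for illustration. The snippet also assumes the minimal provenances of the query result are already known.

```python
def p_norm(provenance, prices, p):
    """p-norm of the prices of the tuples in one provenance."""
    return sum(prices[t] ** p for t in provenance) ** (1.0 / p)

def query_price(minimal_provenances, prices, p=1.0):
    """Price of the cheapest minimal provenance under the p-norm."""
    return min(p_norm(L, prices, p) for L in minimal_provenances)

# Hypothetical per-tuple prices and two minimal provenances of one query result.
prices = {"t1": 3.0, "t2": 4.0, "t3": 6.0}
candidates = [frozenset({"t1", "t2"}), frozenset({"t3"})]

price_p1 = query_price(candidates, prices, p=1)  # p = 1: plain sum of prices
price_p2 = query_price(candidates, prices, p=2)  # p = 2: Euclidean norm
```

With p = 1 the provenance {t3} is cheapest (6 versus 3 + 4 = 7). With p = 2 the provenance {t1, t2} costs sqrt(9 + 16) = 5 and becomes the cheapest. This shows how raising p lowers the price of multi-tuple provenances.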
Algorithms for price computation • We assume that, for each result tuple, its set of minimal provenances is available. • We aim to find the cheapest minimal provenance of the whole set of result tuples. • We prove that this problem is NP-hard. • Exact algorithm: • Enumerate all the provenances of the query result (an exponential number). • Choose the cheapest one.
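One way to picture the exact algorithm is the brute-force sketch below. It is an assumption-laden illustration, not the authors' implementation. It builds candidate provenances of the result by picking one minimal provenance per result tuple and taking the union, and it prices each candidate as a plain sum (p = 1). The function name `exact_price` and the sample data are hypothetical.

```python
from itertools import product

def exact_price(provenances_per_result, prices):
    """Try every combination of one minimal provenance per result tuple."""
    best_cost, best_set = float("inf"), frozenset()
    for choice in product(*provenances_per_result):
        # A source tuple shared by several choices is charged only once.
        union = frozenset().union(*choice)
        cost = sum(prices[t] for t in union)
        if cost < best_cost:
            best_cost, best_set = cost, union
    return best_cost, best_set

# Hypothetical instance: two result tuples, two minimal provenances each.
prices = {"a": 1.0, "b": 2.0, "c": 2.5, "d": 2.5}
per_result = [
    [frozenset({"a", "b"}), frozenset({"c"})],
    [frozenset({"a", "b"}), frozenset({"d"})],
]
cost, chosen = exact_price(per_result, prices)
```

The number of combinations is the product of the list lengths, so it grows exponentially with the number of result tuples. In this instance the shared provenance {a, b} wins with a total of 3, even though {c} and {d} are each cheaper in isolation.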
Approximation algorithms • We devise several approximation algorithms. • Worst case: Khanna et al. prove that this problem can only be approximated within a factor that is polynomial in the size of the input [Khanna et al., 2000].
Approximation algorithms • Heuristic 1: choose the cheapest minimal provenance for each individual result tuple independently. (greedy algorithm) • Heuristic 2: choose the minimal provenance with the lowest average price for each individual result tuple independently. (greedy algorithm) • Heuristic 3: same as Heuristic 1, but taking previous choices into account. (semi-greedy) • Heuristic 4: same as Heuristic 2, but taking previous choices into account. (semi-greedy)
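The four heuristics can be sketched with one function and two switches. The slides do not spell out what "consider previous choices" means, so the sketch assumes that source tuples already selected for earlier result tuples cost nothing when scoring later candidates. It also assumes that the average price is the charged total divided by the size of the provenance. The function name `heuristic_price` and the sample data are hypothetical.

```python
def heuristic_price(provenances_per_result, prices,
                    use_average=False, reuse_paid=False):
    """Pick one minimal provenance per result tuple, in order.

    use_average=False, reuse_paid=False -> Heuristic 1
    use_average=True,  reuse_paid=False -> Heuristic 2
    use_average=False, reuse_paid=True  -> Heuristic 3 (assumed reading)
    use_average=True,  reuse_paid=True  -> Heuristic 4 (assumed reading)
    """
    paid = set()
    for candidates in provenances_per_result:
        def score(L):
            # With reuse_paid, tuples chosen earlier are treated as free.
            charged = (L - paid) if reuse_paid else L
            total = sum(prices[t] for t in charged)
            return total / len(L) if use_average else total
        paid |= min(candidates, key=score)
    return sum(prices[t] for t in paid), paid

# Hypothetical instance: the second result tuple can reuse {a, b}.
prices = {"a": 1.0, "b": 2.0, "d": 2.5}
per_result = [
    [frozenset({"a", "b"})],
    [frozenset({"a", "b"}), frozenset({"d"})],
]
h1, _ = heuristic_price(per_result, prices)
h2, _ = heuristic_price(per_result, prices, use_average=True)
h3, _ = heuristic_price(per_result, prices, reuse_paid=True)
h4, _ = heuristic_price(per_result, prices, use_average=True, reuse_paid=True)
```

On this instance Heuristic 1 pays 5.5, because it buys d on top of {a, b}. The other three variants reach the optimum of 3.0. This outcome is specific to this small example. Each variant makes a single pass over the result tuples, so it avoids the exponential enumeration of the exact algorithm.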
Experiments • Effectiveness: the ratio between the approximate price and the exact price. • Efficiency: the running time of the approximation algorithms. • Setup: • The number of result tuples is 10 for measuring effectiveness (so the ratio in the worst case is 10). • The number of result tuples varies from 1,000 to 5,000 for measuring efficiency. • For each result tuple, the number of minimal provenances and the size of each minimal provenance are sampled uniformly from [1,5].
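A synthetic instance along the lines of this setup could be generated as sketched below. Only the two [1,5] ranges come from the slides. The size of the source tuple pool, the price range, the seed, and the name `sample_instance` are assumptions made for illustration. The sketch also does not check that the sampled sets are minimal with respect to each other.

```python
import random

def sample_instance(num_results, num_source_tuples=50, max_price=100, seed=0):
    """Sample per-tuple prices and minimal provenances for each result tuple."""
    rng = random.Random(seed)
    # Assumed: integer prices drawn uniformly from [1, max_price].
    prices = {f"t{i}": rng.randint(1, max_price) for i in range(num_source_tuples)}
    source_tuples = list(prices)
    instance = []
    for _ in range(num_results):
        count = rng.randint(1, 5)  # number of minimal provenances
        instance.append([
            # size of each minimal provenance, also uniform in [1, 5]
            frozenset(rng.sample(source_tuples, rng.randint(1, 5)))
            for _ in range(count)
        ])
    return prices, instance

prices, instance = sample_instance(num_results=10)
```

Calling it with `num_results=10` mirrors the effectiveness setting. Values from 1,000 to 5,000 mirror the efficiency setting.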
Effectiveness (50,000 runs)
Conclusion • We propose a framework for pricing queries based on the source tuples that contribute to the query result. • The price of a query is the price of the cheapest minimal provenance of the query result. • We propose a baseline algorithm to compute the exact price of a query and four heuristics to compute an approximate price. • We conduct experiments to show the effectiveness and efficiency of the heuristics.