
Query-Based Data Pricing

Paraschos Koutris, Prasang Upadhyaya, Magdalena Balazinska, Bill Howe, Dan Suciu. University of Washington. PODS 2012.


Presentation Transcript


  1. Query-Based Data Pricing. Paraschos Koutris, Prasang Upadhyaya, Magdalena Balazinska, Bill Howe, Dan Suciu. University of Washington. PODS 2012

  2. Motivation • Data is increasingly sold and bought on the web • Websites that sell data: • AggData [www.aggdata.com] • Xignite (financial data) [www.xignite.com] • Gnip (social media) [www.gnip.com] • Data marketplace services: • Windows Azure Marketplace (100+ datasets) [datamarket.azure.com] • Infochimps (15,000 datasets) [www.infochimps.com] • Query-based pricing customized for buyers

  3. Current Pricing (1) • A fixed price for the whole dataset or for a specific set of views • Example: CustomLists • USA Business Database for $399 • Email addresses for $299 • Businesses in WA for $199 • Limitations: • Restaurants in WA? • Businesses in cities with population > 100,000?

  4. Current Pricing (2) • API Subscriptions (Azure Marketplace, Infochimps) • Allow queries over the data • Pay by number of transactions (page of results)

  5. Issues With Pricing • Buyers today need to buy a superset of the data they are interested in • Sellers can't easily anticipate all possible queries that buyers might ask • Solution: we need a more flexible pricing scheme, parameterized by queries

  6. Outline • The Pricing Framework • The Pricing Formula • The Complexity of Pricing • Dichotomy and Algorithms for Selections

  7. The Pricing Framework • The seller defines price points (view-price pairs): S = { (V_1,p_1), (V_2,p_2), … } • A buyer can buy any query Q • The system computes price_D^S(Q) [Figure: Buyer, Seller, and Pricing System + Database D; the seller supplies the price points (V_1,p_1), (V_2,p_2), …; the buyer asks for Q(D) and is charged price_D^S(Q)]

  8. Instance-Based Determinacy • Definition. V = V_1, …, V_k determine Q given D, denoted D ⊢ V ↠ Q, if: for all D', if V(D) = V(D'), then Q(D) = Q(D') • Intuitively, "V_1, …, V_k determine Q" means that Q(D) can be answered from V_1(D), …, V_k(D) alone, without accessing the database instance D
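
The definition on this slide can be checked by brute force when the set of possible tuples is tiny. The sketch below is only an illustration under that assumption: it enumerates every candidate instance D' over a small universe and tests the implication directly. The helper names (`all_instances`, `determines`, `select`) and the toy tuples ("crane", "blue") are hypothetical and do not come from the talk.

```python
from itertools import combinations


def all_instances(universe):
    """Yield every subset of the candidate tuples as a possible database D'."""
    items = list(universe)
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            yield frozenset(subset)


def determines(views, query, db, universe):
    """Check D |- V ->> Q by enumeration.

    Every D' that agrees with D on all views must also agree with D on Q.
    """
    target = [v(db) for v in views]
    answer = query(db)
    for other in all_instances(universe):
        if [v(other) for v in views] == target and query(other) != answer:
            return False
    return True


def select(col, val):
    """Selection view: keep the tuples whose column `col` equals `val`."""
    return lambda d: frozenset(t for t in d if t[col] == val)


# Hypothetical toy relation S(shape, color) with known, finite column domains.
universe = {(s, c) for s in ("dragon", "crane") for c in ("red", "blue")}
db = frozenset({("dragon", "red"), ("crane", "blue")})

whole_db = lambda d: frozenset(d)
by_shape = [select(0, "dragon"), select(0, "crane")]
only_dragon = [select(0, "dragon")]

print(determines(by_shape, whole_db, db, universe))
print(determines(only_dragon, whole_db, db, universe))
```

Buying every shape selection pins down the whole relation, while buying only the dragon selection leaves the crane tuples unknown, so the second check fails. The enumeration is exponential in the size of the universe and is meant only to make the definition concrete.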

  9. Arbitrage-Free • Axiom 1. Given D, the pricing function price_D(Q) is arbitrage-free if for all views V_1, …, V_k and every query Q where D ⊢ V_1, …, V_k ↠ Q: price_D(Q) ≤ price_D(V_1) + … + price_D(V_k) • Suppose V determines Q and price_D(Q) > price_D(V). Then we can: • buy V(D) for price_D(V) • compute Q(D) from V(D) • now we have answered Q at price price_D(V) < price_D(Q)
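
The inequality in Axiom 1 can be tested mechanically once the determinacy facts are known. The sketch below assumes those facts are supplied as a list of (views, query) pairs; the function name `arbitrage_violations` and the prices are hypothetical.

```python
def arbitrage_violations(prices, determinacy_facts):
    """Return the (views, query) facts that break the arbitrage-free axiom.

    A fact is violated when the query costs more than the views that
    determine it, because a buyer could purchase the views instead.
    """
    return [
        (views, query)
        for views, query in determinacy_facts
        if prices[query] > sum(prices[v] for v in views)
    ]


# Hypothetical prices: V1 and V2 together determine Q.
facts = [(("V1", "V2"), "Q")]
overpriced = {"V1": 2, "V2": 3, "Q": 6}
fair = {"V1": 2, "V2": 3, "Q": 5}

print(arbitrage_violations(overpriced, facts))
print(arbitrage_violations(fair, facts))
```

With Q priced at 6 a buyer would pay 5 for V1 and V2 and compute Q from them, so the fact is reported; at 5 no violation remains.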

  10. Discount-Free • Axiom 2. The pricing function price_D(Q) should not offer any discounts beyond the explicit price points defined by the seller. • The intuition is that the price points represent discounts that the seller offers relative to the price of the whole database • A pricing function is discount-free if it is maximal

  11. Example: Origami Database

  12. Example: Origami Database • Database S • Price points: • Get all dragon origami for $2 • Get all red origami for $3 • What is the price of the entire database? Q(x,y,z) :- S(x,y,z) • Exhausts the active domain • V_1, V_2, V_3, V_4 determine Q: price(Q) ≤ $8 • W_1, W_2, W_3 determine Q: price(Q) ≤ $9 • price(Q) = $8

  13. Example: Origami Database • Relations R, T, S • Price points: p(σ_color) = $50, p(σ_shape) = $99, p(σ_shape) = $2, p(σ_color) = $5 • What is the price of the full join? Q(x,y,z,u,v) :- R(x,u), S(x,y,z), T(y,v)

  14. Outline • The Pricing Framework • The Pricing Formula • The Complexity of Pricing • Dichotomy and Algorithms for Selections

  15. The Query Pricing Formula • Given: • Price points S = {(V_1,p_1), …, (V_k,p_k)} • Database instance D • Query Q • Compute: price_D^S(Q) • Properties: (a) arbitrage-free, (b) discount-free, (c) price_D^S(V_i) = p_i • If such a pricing function exists, we say that the price points are consistent • Method: • Consider all subsets of V = {V_1, …, V_k} that determine Q • Let C be the subset with the minimum total price Σ_i p_i, taken over the V_i in C • Define p_D(Q) = Σ_i p_i for that C • Theorem. (a) The price points are consistent iff p_D(V_i) = p_i for every price point i = 1, …, k. (b) price_D^S(Q) = p_D(Q) is the unique arbitrage-free, discount-free pricing function that agrees with the price points
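
The method on this slide can be sketched as an exhaustive search: try every subset of price points, keep those whose views determine Q, and return the cheapest total. The code below is a minimal illustration, assuming a tiny universe of candidate tuples so that determinacy can be checked by enumeration. All function names, the toy relation, and the prices for "crane" and "blue" are hypothetical; only the $2 dragon and $3 red price points echo the origami slide.

```python
from itertools import chain, combinations


def powerset(items):
    """All subsets of `items`, as tuples, from the empty set upward."""
    items = list(items)
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )


def determines(views, query, db, universe):
    """Brute-force D |- V ->> Q over every instance D' drawn from `universe`."""
    target = [v(db) for v in views]
    return all(
        query(frozenset(other)) == query(db)
        for other in powerset(universe)
        if [v(frozenset(other)) for v in views] == target
    )


def price(price_points, query, db, universe):
    """Cheapest total price of a set of views that determines `query`.

    Returns None when no subset of the price points determines the query.
    """
    best = None
    for subset in powerset(price_points):
        views = [view for view, _ in subset]
        cost = sum(p for _, p in subset)
        if (best is None or cost < best) and determines(views, query, db, universe):
            best = cost
    return best


def select(col, val):
    """Selection view: keep the tuples whose column `col` equals `val`."""
    return lambda d: frozenset(t for t in d if t[col] == val)


# Hypothetical relation S(shape, color) and price points on selections.
universe = {(s, c) for s in ("dragon", "crane") for c in ("red", "blue")}
db = frozenset({("dragon", "red"), ("crane", "blue")})
price_points = [
    (select(0, "dragon"), 2),
    (select(0, "crane"), 4),
    (select(1, "red"), 3),
    (select(1, "blue"), 2),
]
whole_db = lambda d: frozenset(d)

print(price(price_points, whole_db, db, universe))
print(price(price_points, select(1, "red"), db, universe))
```

In this toy setup the whole relation is determined either by all shape selections (total 6) or by all color selections (total 5), so the formula picks 5, the same minimum-over-determining-sets pattern as the $8 versus $9 comparison in the origami example. The search is exponential in the number of price points, which is consistent with the hardness results later in the talk.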

  16. Discussion • If the result of Q1 is always a subset of the result of Q2, should Q1 be priced less than Q2? No! Example: • V(x,y) :- Fortune500(x,y) • Q(x,y) :- Fortune500(x,y), StrongBuyRec(x) • price(Q) >> price(V) • We ignore computation costs in our framework • Cost of computing query Q • Q(D) = f(V(D)), but f can be hard to compute

  17. Outline • The Pricing Framework • The Pricing Formula • The Complexity of Pricing • Dichotomy and Algorithms for Selections

  18. Determinacy • Definition. [Instance-dependent] V determines Q given D, denoted D ⊢ V ↠ Q, if: for all D', if V(D') = V(D), then Q(D) = Q(D') [Nash, Segoufin, Vianu '07] • Definition. [Instance-independent] V determines Q, denoted V ↠ Q, if: for all D, D', if V(D) = V(D'), then Q(D) = Q(D') • V ↠ Q iff there exists a function f such that Q(D) = f(V(D)) for all D, iff for every D we have D ⊢ V ↠ Q

  19. Complexity Of Determinacy Open Question: is the bound on the combined complexity tight?

  20. Complexity Of Pricing • Corollary. Deciding whether price_D^S(Q) ≤ k is: • Combined complexity [input S, D]: Σ_2^p • Data complexity [input D]: coNP-hard • Proposition. Pricing is at least as hard as determinacy • How do we deal with the hardness of computation?

  21. Outline • The Pricing Framework • The Pricing Formula • The Complexity of Pricing • Dichotomy and Algorithms for Selections

  22. Restricting Price Points to Selections • A seller can specify only the prices of selection queries of the form σ_{R.X=a}: prices on columns • The domain of each column is finite and known to buyers and sellers • Price points on selections are how prices are set in most cases today

  23. Dichotomy Theorem • Theorem. Assuming selection views only, for any conjunctive query Q without self-joins, one of the following holds (data complexity): (1) price_Q^S(D) is in PTIME; (2) checking whether price_Q^S(D) ≤ k is NP-complete • PTIME: • Q(x,y,z,u,v) :- R(x,u), S(x,y,z), T(y,v) [Chains] • Q(x_1,…,x_k) :- R_1(x_1,x_2), …, R_k(x_k,x_1) [Cycles] • NP-complete: • Q(x) :- R(x,y) [Projections] • Q(x,y,z) :- R(x,y,z), S(x), T(y), U(z)

  24. Algorithm For PTIME Cases • The algorithm uses a reduction to maximum flow • Edges of finite capacity represent price points • A set of edges of finite cost is a cut iff they determine the query • Example: chain query Q(x,y) :- R(x), S(x,y), T(y) over relations R, S, T, with Dom(X) = {a_1,a_2,a_3,a_4} and Dom(Y) = {b_1,b_2,b_3}

  25. Flow Graph [Figure: flow graph for relations R, S, T over the nodes a_1, …, a_4 and b_1, b_2, b_3] • A set of edges of finite cost is a cut iff they determine the query
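
The slides reduce pricing in the PTIME cases to maximum flow, where a minimum cut selects the cheapest set of finite-capacity edges. The sketch below shows only that primitive, using the standard Edmonds-Karp algorithm. The toy graph and its capacities are hypothetical and are not the construction from the talk, which depends on the query and the database instance; the edge labels in the comments are illustrative assumptions.

```python
from collections import deque

INF = float("inf")


def max_flow(capacity, source, sink):
    """Edmonds-Karp: push flow along shortest augmenting paths.

    `capacity` maps u -> {v: capacity}. By max-flow/min-cut duality the
    returned value equals the weight of a minimum source-sink cut.
    """
    # residual[u][v] is the remaining capacity, including reverse edges.
    residual = {source: {}, sink: {}}
    for u, neighbours in capacity.items():
        for v, c in neighbours.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)

    total = 0
    while True:
        # Breadth-first search for an augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total

        # Bottleneck capacity along the path found.
        bottleneck = INF
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        if bottleneck == INF:
            return INF  # a path of infinite edges: no finite cut exists

        # Update residual capacities along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical toy graph: finite edges stand for price points,
# infinite edges cannot be part of a finite cut.
graph = {
    "s": {"a1": 2, "a2": 5},   # e.g. price points on two values of X
    "a1": {"b1": INF},
    "a2": {"b1": INF},
    "b1": {"t": 4},            # e.g. a price point on one value of Y
}

print(max_flow(graph, "s", "t"))
```

Here the cut can either remove both edges leaving the source (total 7) or the single edge entering the sink (total 4), so the minimum cut, and hence the flow value, is 4.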

  26. Conclusions • Summary: • The seller sets prices on some views, while the system computes the price of any query • Interesting application of query determinacy • Complexity: dichotomy for CQs without self-joins • Future Work: • Pricing in the presence of updates • How do we handle pricing for intractable queries? • Connection between pricing and privacy

  27. Thank you!
