
Crowd Algorithms

Crowd Algorithms. Scoop: The Stanford-Santa Cruz Project for Cooperative Computing with Algorithms, Data, and People.


Presentation Transcript


  1. Crowd Algorithms
     Scoop: The Stanford-Santa Cruz Project for Cooperative Computing with Algorithms, Data, and People
     Hector Garcia-Molina, Stephen Guo, Aditya Parameswaran, Hyunjung Park, Alkis Polyzotis, Petros Venetis, Jennifer Widom
     Stanford and UC Santa Cruz

  2. The Goal
     • Design fundamental algorithms for human computation
       • Which questions do I ask?
       • When do I ask the questions?
       • When do I stop?
       • How do I combine the answers?
     Diagram labels: Latency, Uncertainty, Cost

  3. The Problems
     • Crowd-Sort / Max: Difficult!
     • Crowd-GraphSearch: Difficult!
     • Crowd-Categorize: Difficult!
     • Crowd-Filter: Difficult!
     Annotations on the slide: "[VLDB 2011]", "Summaries of the rest", and "Progress! The focus of this talk."
     Diagram labels: Latency, Uncertainty, Cost

  4. Filters
     • Given:
       • Error probability (FP/FN) and selectivity for each predicate
       • Desired overall error probability
     • To: Compose a filtering strategy
     • Minimize overall cost (# of questions)
     • Which questions do I ask?
     • When do I ask the questions?
     • When do I stop?
     • How do I combine the answers?
     Diagram: Dataset of Items → Predicate 1, Predicate 2, …, Predicate k → Filtered Dataset
     Example predicates: "Is this image that of Bytes Café?", "Is the image blurry?", "Does it show people’s faces?"

  5. Single Filter
     • Surprisingly difficult!
     • Need to meet an overall error threshold
       • Say, up to 10% of my images may be wrongly filtered
     • Minimize overall expected number of questions
     • Boils down to the following:
       • Take one item
       • Ask some questions
       • Results in a certain number of (Y, N) answers for the given item
       • Do I stop (if so, what do I return), or do I continue asking?
     Diagram: Dataset of Items → Predicate 1 → Filtered Dataset
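To make the stop-or-continue question concrete, here is a minimal sketch of how the (Y, N) counts for one item could be turned into a posterior probability that the item satisfies the predicate. It is not taken from the talk: it assumes answers are independent given the item's true value, that `selectivity` is the prior probability that an item passes, and that `fp_rate` and `fn_rate` are the per-answer false positive and false negative probabilities. The function name and parameters are hypothetical.

```python
def posterior_pass(yes, no, selectivity, fp_rate, fn_rate):
    """Posterior probability that an item truly passes, given its answer counts.

    Assumptions (not from the slides): answers are independent given the
    item's true value; selectivity is the prior P(item passes).
    """
    # Likelihood of the observed answers if the item truly passes.
    like_true = (1 - fn_rate) ** yes * fn_rate ** no
    # Likelihood of the observed answers if the item truly fails.
    like_false = fp_rate ** yes * (1 - fp_rate) ** no
    numerator = selectivity * like_true
    return numerator / (numerator + (1 - selectivity) * like_false)


if __name__ == "__main__":
    # One YES answer with 10% error rates and a 50% prior.
    print(posterior_pass(1, 0, 0.5, 0.1, 0.1))
```

A strategy then decides, at each (Y, N) point, whether this posterior is decisive enough to stop or whether another question is worth its cost.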

  6. Hasn’t this been done before?
     • Solutions from statistics guarantee the same error per item
       • Important in contexts like:
         • Automobile testing
         • Diagnosis
     • We’re worried about aggregate error over all items: a uniquely data-oriented problem
       • I don’t care whether every image is perfect, as long as the overall error threshold is met.
       • As we will see, this results in $$$ savings

  7. Strategies
     • Reformulated task:
       • For each point in the grid: return Pass / Fail / Cont.
     • Equivalently:
       • Find the best shape and color it!
     Diagram: grid with NO answers on one axis and YES answers on the other. Start at the origin, with no questions asked.
     Example points: YES = 5, NO = 6: return “Passed”; YES = 3, NO = 5: continue; YES = 3, NO = 7: return “Failed”
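The grid view suggests a simple way to score any deterministic strategy. The sketch below is an illustration under stated assumptions, not the talk's algorithm: a strategy is a hypothetical function `strategy(yes, no)` returning Pass, Fail, or Cont, answers are independent, and `selectivity`, `fp_rate`, and `fn_rate` are as defined for filters. It walks the grid by number of questions asked and accumulates the probability of reaching each point, which gives the expected number of questions and the expected error per item.

```python
PASS, FAIL, CONT = "Pass", "Fail", "Cont"


def evaluate(strategy, selectivity, fp_rate, fn_rate, max_questions):
    """Return (expected_questions, expected_error) per item.

    reach[(yes, no)] holds two joint probabilities: reaching the point
    with a truly passing item, and reaching it with a truly failing item.
    """
    reach = {(0, 0): (selectivity, 1.0 - selectivity)}
    expected_questions = 0.0
    expected_error = 0.0
    for total in range(max_questions + 1):
        for yes in range(total + 1):
            no = total - yes
            if (yes, no) not in reach:
                continue
            p_true, p_false = reach[(yes, no)]
            decision = strategy(yes, no)
            if decision == PASS:
                expected_error += p_false  # failing item wrongly passed
            elif decision == FAIL:
                expected_error += p_true  # passing item wrongly failed
            else:
                if total == max_questions:
                    raise ValueError("strategy must stop within max_questions")
                # Continuing from this point costs one more question.
                expected_questions += p_true + p_false
                for point, t, f in (
                    ((yes + 1, no), p_true * (1 - fn_rate), p_false * fp_rate),
                    ((yes, no + 1), p_true * fn_rate, p_false * (1 - fp_rate)),
                ):
                    old_t, old_f = reach.get(point, (0.0, 0.0))
                    reach[point] = (old_t + t, old_f + f)
    return expected_questions, expected_error


def majority_of_three(yes, no):
    """Example strategy: always ask three questions, return the majority."""
    if yes + no < 3:
        return CONT
    return PASS if yes > no else FAIL


if __name__ == "__main__":
    print(evaluate(majority_of_three, 0.5, 0.1, 0.1, 3))
```

Comparing two "shapes" then amounts to calling `evaluate` on each and checking which one meets the error threshold with fewer expected questions.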

  8. Common Strategies
     • Always ask X questions, return the most likely answer
       • The triangle shape
     • If you get X YES answers, return “Pass”; if you get Y NO answers, return “Fail”; otherwise keep asking
       • Rectangular shape
     • Ask until |#YES - #NO| > X, or at most Y questions
       • Chopped-off rectangle
       • Anhai’s work on MOBS
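The three shapes above can be written as small decision functions over the (YES, NO) counts. This is a sketch with hypothetical names; two details are assumptions rather than anything stated on the slide: "most likely answer" is approximated by a simple majority (which matches the most likely answer only for symmetric error rates and a 50% prior), and ties are resolved as "Fail".

```python
PASS, FAIL, CONT = "Pass", "Fail", "Cont"


def fixed_majority(x):
    """Triangle: always ask x questions, then return the majority answer."""
    def strategy(yes, no):
        if yes + no < x:
            return CONT
        return PASS if yes > no else FAIL
    return strategy


def thresholds(x_yes, y_no):
    """Rectangle: stop at x_yes YES answers (Pass) or y_no NO answers (Fail)."""
    def strategy(yes, no):
        if yes >= x_yes:
            return PASS
        if no >= y_no:
            return FAIL
        return CONT
    return strategy


def margin(x, max_questions):
    """Chopped-off rectangle: stop when |yes - no| > x or the budget is used."""
    def strategy(yes, no):
        if yes - no > x:
            return PASS
        if no - yes > x:
            return FAIL
        if yes + no >= max_questions:
            return PASS if yes > no else FAIL
        return CONT
    return strategy


def run(strategy, answers):
    """Feed answers (True = YES) to a strategy until it stops.

    Returns the decision and the number of questions actually asked.
    """
    yes = no = 0
    decision = strategy(yes, no)
    for answer in answers:
        if decision != CONT:
            break
        if answer:
            yes += 1
        else:
            no += 1
        decision = strategy(yes, no)
    return decision, yes + no


if __name__ == "__main__":
    print(run(margin(1, 5), [True, False, False, False]))
```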

  9. Summary of Results
     • A characterization of which “shapes” are optimal
     • An optimal PTIME “probabilistic” approach
       • LP leveraging the inherent DP structure
     • Optimal: the strategy with minimum overall cost
       • for given parameters and requirements
     • Probabilistic: each point carries a probability of “Pass”, “Fail”, and “Continue”

  10. Empirical Results
     • Evaluation on 10,000 synthetic scenarios
     • Tested:
       • Optimal, Brute Force, Statistical, 5 Heuristic Algorithms
     • Optimal Probabilistic issues fewer questions overall
       • 15% savings on average compared to brute force
         • 32% savings when optimal wins
       • 22% savings on average compared to the statistics approach
         • 49% savings when optimal wins
     • Translates to $$$ for many items!
     Diagram: Generate Parameters → Brute Force Deterministic, Optimal Probabilistic, Other Algorithms → COST1, COST2, COST3

  11. Crowd-Max/Sort
     • The problem(s):
       • Find the strategy for sorting n items
         • Given: probability of error for a comparison
         • Given: desired threshold on error, #questions, #rounds
       • Sorting automatically given evidence
         • NP-hard even for a simple probability-of-error model
         • Related work in the areas of voting theory and economics
       • Which r questions do we ask next?
         • One question in each round
         • Ask all pairs a total of 2k/n times
         • Tournament, with k repetitions at each level
     Diagram labels: Decreasing Parallelism, More Accuracy
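The "tournament, with k repetitions at each level" option can be illustrated with a small simulation. This is a sketch under assumptions that are not in the slides: items are distinct numbers, each simulated worker answers a pairwise comparison incorrectly with a fixed probability `error_prob`, k is odd so that majority votes cannot tie, and an unpaired item gets a bye to the next level. All function names are hypothetical.

```python
import random


def noisy_compare(a, b, error_prob, rng):
    """Simulated worker: names the larger item, wrong with prob. error_prob."""
    larger, smaller = (a, b) if a > b else (b, a)
    return smaller if rng.random() < error_prob else larger


def tournament_max(items, k, error_prob, rng):
    """Single-elimination tournament; each match is decided by k votes.

    Returns (winner, questions_asked, rounds). All matches at one level can
    be asked in parallel, so rounds tracks latency and questions tracks cost.
    """
    survivors = list(items)
    questions = 0
    rounds = 0
    while len(survivors) > 1:
        next_level = []
        for i in range(0, len(survivors) - 1, 2):
            a, b = survivors[i], survivors[i + 1]
            votes_a = sum(
                noisy_compare(a, b, error_prob, rng) == a for _ in range(k)
            )
            questions += k
            next_level.append(a if 2 * votes_a > k else b)
        if len(survivors) % 2 == 1:
            next_level.append(survivors[-1])  # bye for the unpaired item
        survivors = next_level
        rounds += 1
    return survivors[0], questions, rounds


if __name__ == "__main__":
    rng = random.Random(0)
    print(tournament_max([3, 1, 4, 15, 9, 2, 6, 5], 3, 0.2, rng))
```

Raising k buys accuracy at the price of more questions per level, while the number of rounds stays logarithmic in n.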

  12. Crowd-GraphSearch
     • Image categorization example
       • To attach: image of a Honda car
       • Is image one of vehicle? YES!
       • Is image one of toyota? NO!
       • Is image one of honda? YES!
     • Target node = intended category
     • “Is the image one of X?” = “Is the target node reachable from X?”
     • Find the target node by asking the minimum number of search questions.
     Diagram: category graph with nodes vehicle, car, nissan, honda, toyota, maxima, sentra
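The reachability formulation can be sketched with a simple top-down search. Several things here are assumptions for illustration only: the edges of `TAXONOMY` are guessed from the node names on the slide (the transcript does not preserve the edges), the crowd is replaced by a perfect oracle, and the descent shown is a naive baseline, not the minimum-question strategy the slide asks for.

```python
# Hypothetical edges; only the node names come from the slide.
TAXONOMY = {
    "vehicle": ["car"],
    "car": ["nissan", "honda", "toyota"],
    "nissan": ["maxima", "sentra"],
    "honda": [],
    "toyota": [],
    "maxima": [],
    "sentra": [],
}


def reachable(graph, source, target):
    """True if target can be reached from source by following edges."""
    stack, seen = [source], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node])
    return False


def top_down_search(graph, root, ask):
    """Descend from the root, asking "is the target reachable from X?".

    ask(node) stands in for a crowd question. Returns (node, questions).
    """
    questions = 1
    if not ask(root):
        return None, questions
    node = root
    while True:
        for child in graph[node]:
            questions += 1
            if ask(child):
                node = child
                break
        else:
            # No child reaches the target, so the current node is the target.
            return node, questions


if __name__ == "__main__":
    oracle = lambda x: reachable(TAXONOMY, x, "honda")
    print(top_down_search(TAXONOMY, "vehicle", oracle))
```

The slide's own example reaches "honda" with three questions by skipping intermediate nodes, which shows why the choice of which node to ask about matters for cost.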

  13. Crowd-Categorize
     • k buckets, n items
     • Categorize every item, overall error < threshold
     • For k = 1, same as the filters problem
     • Two versions:
       • Discrete
         • Independent (like in the filters case)
         • Dependent buckets (e.g., colors, GraphSearch)
       • Continuous (e.g., age)
     Diagram: Dataset of Items → buckets

  14. Questions?
