Presentation Transcript


  1. CROWDSOURCING Massimo Poesio Part 2: Games with a Purpose

  2. GAMES WITH A PURPOSE • Luis von Ahn pioneered a new approach to resource creation on the Web: GAMES WITH A PURPOSE, or GWAP, in which people, as a side effect of playing, perform tasks ‘computers are unable to perform’ (sic)

  3. GWAP vs OPEN MIND COMMONSENSE vs MECHANICAL TURK • GWAPs do not rely on altruism or financial incentives to entice people to perform certain actions • The key property of games is that PEOPLE WANT TO PLAY THEM

  4. EXAMPLES OF GWAP • Games at www.gwap.com • ESP • Verbosity • TagATune • Other games • Peekaboom • Phetch

  5. ESP • The first GWAP, developed by von Ahn and his group (2003/2004) • The problem: obtain accurate descriptions of images to be used • To train image search engines • To develop machine learning approaches to vision • The goal: label the majority of the images on the Web

  6. ESP: the game

  7. ESP: THE GAME • Two partners are picked at random from the large number of players online • They are not told who their partner is, and can’t communicate with them • They are both shown the same image • The goal: guess how their partner will describe the image, and type that description • Hence, the ESP game • If any string typed by one player matches a string typed by the other player, both score points
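A minimal sketch of the matching rule just described, assuming each player’s typed strings are collected into a set (the function name and representation are illustrative, not the game’s actual code):

    def first_match(guesses_a: set, guesses_b: set):
        """Return an agreed-upon label as soon as the two players'
        guess sets intersect, or None if there is no match yet."""
        common = guesses_a & guesses_b
        return next(iter(common), None)

    # e.g. first_match({"dog", "puppy"}, {"puppy", "cute"}) -> "puppy"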

  8. THE TASK

  9. SCORING BY MATCHING

  10. THE CHALLENGE: SCORES • One motivating factor is the drive to score as many points as possible • Hourly, daily, weekly, and monthly scores are shown

  11. SCORES

  12. THE CHALLENGE: TIMING • Partners try to agree on as many images as they can during 2 ½ minutes • The thermometer on the side indicates how many images they have agreed on • If they agree on 15 images they score bonus points

  13. TABOO WORDS • To ensure the production of a large number of specific labels, some words are declared TABOO and not allowed • Taboo words are obtained from the game itself: any word that has been agreed upon by players who were shown a picture earlier becomes a taboo word for that image

  14. TABOO WORDS

  15. PASSING

  16. GOOD LABELS, COMPLETING AN IMAGE • A label is considered “good” when more than N players produce it (with N a parameter of the game) • An image is “done” when its list of taboo words is so extensive that most players pass on it
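Taken together, slides 13–16 describe a simple per-image lifecycle. A sketch, assuming N and the pass-rate threshold are tunable parameters (all names and values are illustrative):

    from collections import Counter

    N = 5            # assumed value; a label is "good" once more than N players produce it
    PASS_RATE = 0.8  # assumed value; an image is "done" once most players pass on it

    class ImageRecord:
        """Per-image bookkeeping for labels, taboo words, and passes."""
        def __init__(self):
            self.label_counts = Counter()  # label -> number of players who produced it
            self.taboo = set()             # words banned for future rounds
            self.plays = 0
            self.passes = 0

        def record_agreement(self, label):
            # both partners produced the label, and it becomes taboo from now on
            self.label_counts[label] += 2
            self.taboo.add(label)

        def good_labels(self):
            return {w for w, n in self.label_counts.items() if n > N}

        def done(self):
            return self.plays > 0 and self.passes / self.plays >= PASS_RATE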

  17. IMPLEMENTATION • Pre-recorded game play • Especially at the beginning, and at quiet times, there won’t always be players to pair with • In these cases a player is paired against a recorded ‘hand’ of a previous game with the same picture • Cheating • Players could cheat in a number of ways, including agreeing on labels / playing against themselves • A number of mechanisms are in place against those cases • Selecting images
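A sketch of the pre-recorded ‘hand’ idea: the guesses an earlier player typed for the same picture are replayed at their original timestamps (a minimal illustration, not the deployed mechanism):

    import time

    def replay_hand(hand, emit):
        """hand: list of (seconds_offset, guess) pairs captured in a live game.
        emit: callback that sends a guess to the human player's client."""
        start = time.monotonic()
        for offset, guess in hand:
            # wait until the moment the recorded player originally typed this guess
            time.sleep(max(0.0, offset - (time.monotonic() - start)))
            emit(guess)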

  18. SOME STATISTICS • In the 4 months between August 9th 2003 and December 10th 2003 • 13630 players • 1.2 million labels for 293,760 images • 80% of players played more than once • By 2008: • 200,000 players • 50 million labels

  19. ANALYSIS • The numbers indicate that the game is fun to play • Exciting factors: • Playing with a partner • Playing against time

  20. QUALITY OF THE LABELS • For IMAGE SEARCH: • choose 10 labels among those produced and look at which images are returned • Compare labels produced by players with labels produced by participants in an experiment • 15 participants, 20 images among the 1000 with more than 5 labels • 83% of game labels also produced by participants • Manual assessment of labels (‘would you use these labels to describe this image?’) • 15 participants, 20 images • 85% of words rated useful

  21. GOOGLE IMAGE LABELLER

  22. THE TASK

  23. RESULTS

  24. VERBOSITY • … or, the game approach to collecting commonsense knowledge • Motivation: slow progress both on CYC (5 million facts collected) and on Open Mind Commonsense (around 700,000 facts)

  25. THE GAME • Based on an existing game, TABOO: • Players have to guess a word • One of the players gives hints concerning the word • In Verbosity, there are two players, the DESCRIBER and the GUESSER, and a SECRET WORD

  26. THE GAME

  27. TEMPLATES IN VERBOSITY • As in Open Mind Commonsense, templates are used to ensure that the relations / properties of interest are collected • The Describer produces hints by filling in a template

  28. GUESSING ATTRIBUTES

  29. PRODUCING A DESCRIPTION

  30. TEMPLATES • _ is a kind of _ • _ is used for _ • _ is typically near/in/on _ • _ is the opposite of _ / _ is related to _
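Each filled template yields a (word, relation, filler) triple that can be stored as a commonsense fact. A sketch, with illustrative relation names (not from the Verbosity paper):

    TEMPLATES = {
        "is a kind of":       "kind-of",
        "is used for":        "used-for",
        "is typically near":  "near",        # likewise the "in" / "on" variants
        "is the opposite of": "opposite-of",
        "is related to":      "related-to",
    }

    def fact_from_hint(secret_word, template, filler):
        """A hint like 'laptop is a kind of computer' yields the triple
        ('laptop', 'kind-of', 'computer')."""
        return (secret_word, TEMPLATES[template], filler)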

  31. EMULATION • As in the ESP game, pre-recorded games are used when a player cannot be paired with another player • The asymmetry of the game causes a problem not encountered in the ESP game • Describer: can simply replay the behavior of a previous describer • Guesser: not so easy

  32. RESULTS • The only published results I’m aware of predate the actual release of the game, so I can’t say anything about QUANTITY • Quality: • Ask six raters whether 200 facts collected using Verbosity are ‘true’ • Around 85% rated true

  33. PEEKABOOM • Objective: collect data about the presence of objects in images in order to train vision algorithms for object detection

  34. THE GAME • Two players • They take turns playing ‘Peek’ and ‘Boom’ • ‘Boom’ gets a picture with an associated word; ‘Peek’ has to guess the associated word • ‘Boom’ reveals parts of the picture to ‘Peek’ by clicking on it (each click reveals a circular area with a radius of 20 pixels)
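A sketch of Boom’s reveal operation on a boolean visibility mask (a pure-Python illustration; only the 20-pixel radius comes from the slide):

    RADIUS = 20  # pixels, per the slide

    def reveal(mask, cx, cy):
        """Mark every pixel within RADIUS of the click (cx, cy) as visible.
        mask is a 2-D list of booleans, indexed mask[y][x]."""
        h, w = len(mask), len(mask[0])
        for y in range(max(0, cy - RADIUS), min(h, cy + RADIUS + 1)):
            for x in range(max(0, cx - RADIUS), min(w, cx + RADIUS + 1)):
                if (x - cx) ** 2 + (y - cy) ** 2 <= RADIUS ** 2:
                    mask[y][x] = True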

  35. THE GAME: PEEK

  36. THE GAME

  37. PINGS

  38. HINTS

  39. IMPLEMENTATION • Images and their labels come from ESP • Cheating: • Player queue (wait until next ‘matching interval’ – one every 10 seconds – to start playing) • IP address checks (to make sure players are not paired with themselves) • Blocking bots: ‘seed images’ (previously annotated) and blacklist
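A sketch of the queue-then-match idea: arrivals wait for the next 10-second matching tick and are then paired at random, skipping pairs that share an IP address (illustrative only; player objects with an .ip attribute are an assumption):

    import random

    def pair_players(waiting):
        """Pair up queued players at a matching tick, avoiding same-IP pairs."""
        random.shuffle(waiting)
        pairs = []
        while len(waiting) >= 2:
            a = waiting.pop()
            partner = next((p for p in waiting if p.ip != a.ip), None)
            if partner is None:
                waiting.append(a)  # nobody suitable; wait for the next interval
                break
            waiting.remove(partner)
            pairs.append((a, partner))
        return pairs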

  40. EVALUATION: USER STATISTICS • Usage: • 1 month in 2005 • 14,153 players • 1,122,998 completed rounds • The average player played around 158 images (about 72 minutes)

  41. EVALUATION: ACCURACY OF DATA • Accuracy of bounding boxes • Choose 50 images played by at least two pairs • Have four volunteers draw bounding boxes • OVERLAP(A,B) = AREA(A∩B) / AREA(A∪B) • Average: 0.75 • Accuracy of pings • 50 images as above • Three subjects decide if each ping is ‘inside the object’ • Result: 100%
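The overlap measure above is the Jaccard index (intersection over union). For axis-aligned bounding boxes given as (x1, y1, x2, y2) it can be computed as follows (a standard formulation, not code from the paper):

    def overlap(a, b):
        """OVERLAP(A,B) = AREA(A∩B) / AREA(A∪B) for axis-aligned boxes."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        union = area(a) + area(b) - inter
        return inter / union if union else 0.0

    # e.g. overlap((0, 0, 10, 10), (5, 5, 15, 15)) == 25 / 175 ≈ 0.14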

  42. SOME GENERAL LESSONS • von Ahn & Dabbish (2008) discuss the general approach and some lessons they took from their work

  43. THREE TEMPLATES • OUTPUT AGREEMENT GAMES • Generalization of ESP • INVERSION-PROBLEM GAMES • INPUT-AGREEMENT GAMES

  44. OUTPUT AGREEMENT GAMES • Two strangers are chosen among all potential players. They cannot see each other or communicate with each other. • In each round, both are given the same input • Game instructions say that players should produce the same output as their partners • Winning condition: they produce the same output, possibly after a few attempts. E.g.: ESP GAME.

  45. INVERSION PROBLEM GAMES • Two strangers are chosen among all potential players. They cannot see each other or communicate with each other. • In each round, one player is designated the DESCRIBER and the other the GUESSER; only the describer sees the input, and the describer’s output should help the guesser reconstruct it • WINNING CONDITION: The guesser correctly guesses the input originally assigned to the describer. E.g.: VERBOSITY. Based on ‘20 Questions’.

  46. INPUT AGREEMENT GAMES • Two strangers are chosen among all potential players. They cannot see each other or communicate with each other. • In each round, both are given input that is known by the game (but not by the players) to be the same or different • Game instructions say that players should produce output describing their input so that they can decide whether input is same or different • Winning condition: playing partners correctly decide whether input is same or different. E.g.: TagATune.
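The three templates differ mainly in their winning condition. A compact side-by-side sketch (illustrative function names):

    def output_agreement_win(output_a, output_b):
        return output_a == output_b                      # e.g. ESP

    def inversion_problem_win(original_input, guess):
        return guess == original_input                   # e.g. Verbosity

    def input_agreement_win(inputs_same, verdict_a, verdict_b):
        return verdict_a == verdict_b == inputs_same     # e.g. TagATune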

  47. INCREASE ENJOYMENT • Games designed so as to make the task enjoyable • GWAPs by von Ahn et al attempt to do this by giving players a CHALLENGE: • TIMED RESPONSE • SCORE KEEPING • SKILL LEVELS • HIGH-SCORE LISTS

  48. OUTPUT ACCURACY • Mechanisms to ensure correctness and avoid collusion (e.g., players agreeing to always produce the same label) • Random matching (players don’t know each other’s identity) • Player testing (assess the quality of a particular player’s input by matching their output against already annotated data) • Repetition (output only considered correct if many players produced it) • Taboo

  49. MISCELLANEOUS • Other useful ideas • Evaluation • Efficiency: THROUGHPUT (T) – problem instances solved per human-hour • ‘Enjoyability’: AVERAGE LIFETIME PLAY (ALP) – overall time an average player spends on the game • Combined measure: EXPECTED CONTRIBUTION = T * ALP
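A toy computation of the combined measure (the numbers below are made up for illustration, not results from any of the games):

    throughput = 200   # T: problem instances solved per human-hour (assumed)
    alp_hours = 1.5    # ALP: average lifetime play per player, in hours (assumed)
    expected_contribution = throughput * alp_hours
    # = 300 instances expected from each new player over their playing lifetime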

  50. OTHER GAMES • On gwap.com • TagATune • Elsewhere: • Foldit • Karaoke Callout • Phetch • Spectral Game
