CrowdSearch: Accurate Real-time Image Search on Mobile Phones

University College London, Department of Computer Science. CS M038/GZ06: Mobile and Cloud Computing. Paper presentation. Students: Shaig Mursalzade & Vasos Koupparis. Date: 14.03.2014

Outline: • Motivation and problem definition • Main contributions and aspects of work • Design decisions • Experimental evaluation • Related work • Future work • Conclusion

Motivation and problem definition • Mobile phones have become an extension of their users. • More than 70% of smartphone users perform internet searches. However, there are several challenges, such as: • Small form factor • Resource limitations • For these reasons, on-board sensors are used for multimedia search.

Motivation and problem definition. Example: searching by GPS and voice is already powerful, but image search lags behind. Question: what actually is image search?

What actually is image search? Search by taking a picture! Google Goggles: taking a picture of a famous landmark searches for information about it, and taking a picture of a product's barcode searches for information on the product.

Limitations. Image search faces significant challenges due to variations in: • Lighting • Texture • Image quality, etc. Multimedia searches require significant memory, storage and computing resources. So, search should be precise and generate few erroneous results. New approach: CrowdSearch.

CrowdSearch: an accurate image search system that combines automated image search with real-time human validation of search results. Automated image search -> generates candidate search results. Real-time validation -> uses Amazon Mechanical Turk for validation by humans (at a monetary cost). CrowdSearch requires: an image query, a query deadline and a payment mechanism for human validators. Sensitive aspects: delay, accuracy, monetary cost and energy.

CrowdSearch interface and validation task example. Figure 1 (p.78): CrowdSearch iPhone interface. Figure 2 (p.79): validation task example.

Design choices • How to construct tasks such that they are likely to be answered quickly? • Use a simple format for validation tasks: the validator only has to give a YES or NO answer, YES if the images match and NO otherwise. • How to minimize human error and bias? • Request several duplicate responses for a validation task from multiple validators and aggregate the responses using the majority rule. • How to price a validation task to minimize delay? • It is typically better to have more tasks at a low price than fewer tasks at a high price.
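The majority rule for duplicate responses can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `majority_vote` and the choice to treat a tie as NO are assumptions made here.

```python
from collections import Counter


def majority_vote(responses):
    """Aggregate duplicate YES/NO responses for one validation task.

    Assumption: a tie counts as NO, so a candidate image is only
    accepted when strictly more validators said YES than NO.
    """
    counts = Counter(r.strip().upper() for r in responses)
    return "YES" if counts["YES"] > counts["NO"] else "NO"


# Five duplicate responses for one candidate image.
verdict = majority_vote(["YES", "no", "YES", "YES", "NO"])
print(verdict)
```

Requesting an odd number of duplicates, such as the five used in the evaluation slides, avoids ties altogether.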

Optimizing delay and cost. • Parallel posting optimizes delay (but is expensive in monetary cost). • Serial posting optimizes cost (but incurs much higher delay than parallel posting when the top-ranked image is incorrect). • The CrowdSearch prediction algorithm optimizes both delay and cost.

Optimizing delay and cost. The CrowdSearch algorithm estimates the probability that a sequence of YES/NO answers leading to a positive validation result will be received before the deadline. This is done using: • Models of the inter-arrival times of responses from human validators • Probability estimates for each sequence that can lead to a positive validation result. If the probability of a valid result within the deadline is less than a pre-defined threshold Pth, a validation task is posted for the next candidate image.

Optimizing delay and cost. Let S(i) be the partial sequence received so far. • CrowdSearch uses 2 functions: • DelayPredict, which estimates the probability that sequence S(j) will be received before the deadline • ResultPredict, which estimates the probability that sequence S(j) will occur given that sequence S(i) has been received so far. • As these probabilities are independent of each other, their product P(j) is the probability that sequence S(j) is received prior to the deadline. • We compute P+, which is the sum of P(j) over all cases where the remaining sequence leads to a positive answer. • This gives the predicted probability that the current task can be validated as positive given that S(i) has been received.
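The computation of P+ can be sketched as below. This is a rough illustration under stated assumptions, not the paper's implementation: `delay_predict` assumes exponential inter-arrival times with a single made-up rate (so the wait for n more responses follows an Erlang distribution), `result_predict` assumes each remaining answer is YES independently with a made-up probability `p_yes`, and the default values for `rate`, `p_yes` and `p_th` are placeholders.

```python
import math
from itertools import product


def delay_predict(n_remaining, time_left, rate):
    """P(n_remaining more responses arrive within time_left seconds).

    Assumption: exponential inter-arrival times with one shared rate,
    so the total wait follows an Erlang distribution.
    """
    if n_remaining == 0:
        return 1.0
    if time_left <= 0:
        return 0.0
    x = rate * time_left
    tail = sum(math.exp(-x) * x ** k / math.factorial(k)
               for k in range(n_remaining))
    return 1.0 - tail


def result_predict(partial, full, p_yes):
    """P(full sequence | partial sequence), assuming independent answers."""
    prob = 1.0
    for answer in full[len(partial):]:
        prob *= p_yes if answer == "Y" else 1.0 - p_yes
    return prob


def positive_probability(partial, time_left, total=5, rate=1 / 60, p_yes=0.7):
    """P+: sum of P(j) over completions S(j) that give a majority of YES."""
    n_remaining = total - len(partial)
    p_plus = 0.0
    for tail in product("YN", repeat=n_remaining):
        full = partial + "".join(tail)
        if full.count("Y") > total // 2:
            p_plus += (delay_predict(n_remaining, time_left, rate)
                       * result_predict(partial, full, p_yes))
    return p_plus


def should_post_next(partial, time_left, p_th=0.6):
    """Post the next candidate image when P+ falls below the threshold Pth."""
    return positive_probability(partial, time_left) < p_th


print(should_post_next("YN", time_left=120))
```

In a real deployment the two placeholder models would be replaced by ones fitted to observed validator behaviour; only the way the two probabilities are multiplied and summed follows the slide.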

Predicting validation results. A probability tree called SeqTree is used for predicting validation results (Figure 5, p.83). Two leaf nodes that differ only in the last bit share a common parent node whose sequence is their common prefix. For example, nodes ‘YNYNN’ and ‘YNYNY’ have the parent node ‘YNYN’. The probability of a parent node is the sum of the probabilities of its children. Following this rule, the SeqTree is built so that each node Si is associated with the probability pi that its sequence occurs. Given the tree, it is easy to predict the probability that Sj occurs given the partial sequence Si: find the nodes that correspond to Si and Sj, and the probability is pj/pi.
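A SeqTree of this kind can be sketched with a plain dictionary keyed by sequence prefix. The leaf probabilities below are invented for illustration (and use length-3 sequences to keep the example short); the names `build_seq_tree` and `conditional` are hypothetical.

```python
def build_seq_tree(leaf_probs):
    """Map every prefix to the summed probability of the leaves below it.

    Summing over leaves is equivalent to the rule that a parent's
    probability is the sum of its children's probabilities.
    """
    tree = {}
    for leaf, p in leaf_probs.items():
        for end in range(len(leaf) + 1):
            prefix = leaf[:end]
            tree[prefix] = tree.get(prefix, 0.0) + p
    return tree


def conditional(tree, s_i, s_j):
    """P(S_j occurs | partial sequence S_i received) = p_j / p_i."""
    if not s_j.startswith(s_i):
        raise ValueError("S_j must extend the partial sequence S_i")
    return tree[s_j] / tree[s_i]


# Made-up leaf probabilities for three responses; they sum to 1.
leaves = {
    "YYY": 0.25, "YYN": 0.125, "YNY": 0.125, "YNN": 0.0625,
    "NYY": 0.125, "NYN": 0.0625, "NNY": 0.0625, "NNN": 0.1875,
}
tree = build_seq_tree(leaves)
print(conditional(tree, "YN", "YNY"))
```

In practice the leaf probabilities would be estimated from a training set of observed response sequences rather than written by hand.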

Image search overview • The image search process contains 2 major steps: • Extracting features from the query image • Searching through the database images using the features of the query image. Features: a set of compact image representations called visual terms (visterms). • Advantage: • Compact, so they can be communicated from the phone to a remote server at low energy cost. • Disadvantage: • Extracting them has significant computation overhead and delay.

Implementation tradeoff • The first question is whether visterm extraction should be performed on the mobile phone or on the remote server. • The system chooses the best option for visterm extraction depending on the availability of WiFi connectivity. If only 3G connectivity is available, visterm extraction is performed locally, whereas if WiFi connectivity is available, the raw image is transferred quickly over the WiFi link and extraction is performed at the remote server. • The second question is whether the inverted index lookup should be performed on the phone or on the remote server. • A remote server is used for the inverted index lookup, as keeping the database on the phone is not feasible and would make it harder to update the database with new images.
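The partitioning rule described above amounts to a small decision function. The sketch below is only an illustration; the function name `plan_query` and the dictionary keys are hypothetical.

```python
def plan_query(wifi_available):
    """Decide where each step runs and what the phone uploads.

    WiFi: upload the raw image and extract visterms on the server.
    3G only: extract visterms on the phone and upload just the visterms.
    The inverted index lookup always runs on the server.
    """
    if wifi_available:
        return {"visterm_extraction": "server",
                "upload": "raw image",
                "index_lookup": "server"}
    return {"visterm_extraction": "phone",
            "upload": "visterms",
            "index_lookup": "server"}


print(plan_query(wifi_available=False))
```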

CrowdSearch components Figure 6 (p.84)

CrowdSearch experimental evaluation • In the next slides, 4 aspects of the performance of CrowdSearch are evaluated: • Improvement in image search precision • Accuracy of the delay models • Ability to trade off cost and delay • Energy efficiency

Precision of automated search results (Figure 7, p.85). In the graph, the X axis is the length of the ranked list obtained from the search engine and the Y axis is the precision. • The top-ranked response has 80% precision for categories such as buildings and books. • Precision is very poor for faces and flowers. • Therefore, we cannot present the results directly to users!

How can human validation improve image search precision? (Figure 8, p.85) The X axis shows 4 different image categories; the Y axis is the precision. The human-validated search scheme returns only the candidate images on the ranked list that are deemed to be correct. Automated image search simply returns the top five images on the ranked list. Two key observations: first, there is considerable improvement with all human validation strategies. Second, among the four schemes, human validation with majority(5) is easily the best performer and consistently provides accuracy greater than 95% for all image categories.

Accuracy of delay models (Figure 9, p.86). The figure shows the cumulative distribution function (CDF) of the delay for the first response. The model is derived by convolving the acceptance time and submission time distributions. The graph shows that the models for the acceptance delay, the submission delay, and the total delay of the first response fit the testing data very well. The scatter points are the testing dataset and the solid curves are the model. The X axis shows time in seconds; the Y axis is the cumulative distribution function.
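Deriving the total-delay distribution by convolution can be sketched numerically as below. The slide does not state the shape or parameters of the two distributions, so this example assumes both are exponential with made-up mean delays of 30 s and 45 s and discretises them into one-second bins; all function names are hypothetical.

```python
import math
from itertools import accumulate


def exp_bin_masses(rate, n_bins, dt=1.0):
    """Probability mass of an exponential delay in each bin of width dt."""
    return [math.exp(-rate * k * dt) - math.exp(-rate * (k + 1) * dt)
            for k in range(n_bins)]


def convolve(f, g):
    """Discrete convolution of two equal-length mass lists, truncated
    to the same length: out[i] = sum over k of f[k] * g[i - k]."""
    return [sum(f[k] * g[i - k] for k in range(i + 1))
            for i in range(len(f))]


N_BINS = 600                                   # ten minutes, 1 s bins
acceptance = exp_bin_masses(1 / 30, N_BINS)    # assumed mean 30 s
submission = exp_bin_masses(1 / 45, N_BINS)    # assumed mean 45 s

# Total delay of the first response = acceptance time + submission time.
total = convolve(acceptance, submission)
total_cdf = list(accumulate(total))

print(round(total_cdf[119], 3))  # approx. P(first response within ~2 min)
```

Comparing `total_cdf` against the empirical CDF of measured first-response delays is the kind of check the figure reports.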

Ability to trade off cost and delay. We evaluate three aspects: precision, recall, and cost. Precision is the ratio of the number of correct results to the total number of results returned to the user. Recall is the ratio of the number of correctly retrieved results to the number of results that are actually correct. Cost is measured in dollars. Here, we evaluate the CrowdSearch algorithm on its ability to meet a user-specified deadline while maximizing accuracy and minimizing overall monetary cost. • CrowdSearch is compared against 2 schemes: • Parallel posting, which posts all five candidate results at the same time. • Serial posting, which processes one candidate result at a time and returns the first successfully validated answer.
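The two accuracy metrics defined above can be computed as in the following sketch. The image identifiers are invented, and returning 0.0 for an empty set is a convention chosen here, not something the slides specify.

```python
def precision_recall(returned, relevant):
    """Precision and recall for one query.

    returned: results shown to the user.
    relevant: results that are actually correct.
    """
    returned, relevant = set(returned), set(relevant)
    correct = returned & relevant
    precision = len(correct) / len(returned) if returned else 0.0
    recall = len(correct) / len(relevant) if relevant else 0.0
    return precision, recall


# Four images returned, two of them among the three correct ones.
precision, recall = precision_recall(
    returned=["img1", "img2", "img3", "img4"],
    relevant=["img1", "img2", "img5"],
)
print(precision, recall)
```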

Ability to trade off cost and delay (Figure 10, p.87). The Y axis indicates recall and the X axis shows the deadline. At the lowest deadline, neither the parallel nor the serial scheme obtains many responses from human validators, hence the recall is very poor and many valid images are missed. The parallel scheme is quicker to recover as the deadline increases, and its recall improves quickly. The serial scheme, however, does not have enough time to post multiple candidates and is slower to improve. For stringent deadlines of 120 seconds or lower, CrowdSearch posts tasks aggressively, since its prediction correctly estimates that it cannot meet the deadline without posting more tasks. Thus, its recall follows parallel posting. Beyond 120 seconds, CrowdSearch tends to wait longer and post fewer tasks since it has more slack. This can lead to some missed images, which causes the dip at 180 seconds. The recall increases again after 180 seconds, since CrowdSearch predicts better with more partial results.

Ability to trade off cost and delay (Figure 10, p.87). The Y axis indicates cost and the X axis shows the deadline. The figure shows the average price per validation task as a function of the user-specified deadline. When the deadline is small, CrowdSearch behaves similarly to parallel posting. When the deadline is larger than 120 seconds, the cost of CrowdSearch is significantly smaller and only 6-7% more than serial posting. A deadline of 180 or 240 seconds is ideal to obtain a balance between delay, cost, and accuracy in terms of precision and recall.

Energy efficiency (Figure 12, p.88). We consider two design choices: 1) remote processing, where phones are used only as a query front-end while all search functionality is at the remote server, and 2) partitioned execution, where the visterm extraction is performed on the phone and the search by visterms is done at the remote server. For each design, we consider the cases where the network connection is via 3G and via WiFi. Energy consumption of the partitioned scheme is the same with 3G and WiFi, because visterms are very compact. With WiFi, remote processing is more efficient than local processing. But communicating the image via 3G is more expensive, as 3G has greater power usage and lower bandwidth. The results confirm the design choice of using remote processing when WiFi is available and local processing when only 3G is available.

Related Work • Image search • Google Goggles[21]: primarily advertised for building landmarks. Why? Techniques such as SIFT, visterm extraction using vocabulary trees, and inverted lookup approaches have a well-known limitation: they work best for mostly planar images (e.g. buildings) and work poorly for non-planar images (e.g. faces). • iScope system[31]: a multi-modal image search system for mobile devices. Performs image search using: - a mixture of features - temporal or spatial information. • Using multiple features can help image search engine performance, BUT it does not solve the problem of low accuracy for certain categories. • CrowdSearch's use of real-time human validation does not improve the performance of the image search engine, BUT it helps filter out incorrect responses and return only the good ones.

Related Work • Participatory sensing: sensing using mobile phones and their built-in accelerometers, high-quality cameras, microphones, and digital compasses. • Urban Sensing[4]: a platform for extracting patterns of use and citizens' perceptions concerning city spaces • Nokia's SensorPlanet[22]: a global test platform for mobile-centric wireless sensor network research • MetroSense[7] • SurroundSense[1]: a mobile phone based system that explores logical localization via ambience fingerprinting. Such projects concentrate on utilizing humans with mobile phones to provide sensor data that can eventually be used for applications such as traffic monitoring. CrowdSearch is distinct from those approaches, as it focuses on designing human-in-the-loop computation systems rather than just using mobile phones for data collection.

Related Work • Crowdsourcing: • reCaptcha[29]: uses humans to solve difficult OCR tasks - enabled digitization of old books and newspapers - protects websites against robots • ESP game[28]: uses humans to find good labels for images - facilitates image search - rewards participants with points if the players provide matching labels • Auction-based crowdsourcing model (e.g. Taskcn[23]) • Simultaneous crowdsourcing contests (e.g. TopCoder[25]). CrowdSearch is inspired by those approaches BUT differs in that it focuses on using crowdsourcing to provide real-time search services for mobile users.

Related Work • CrowdSearch enables real-time responses by specifying deadlines and combining automated and human processing • Considerably more sophisticated • Many apps utilize micro-payment crowdsourcing systems, e.g. AMT, including the use of crowdsourcing for labelling images and other complex data items. • Sorokin et al.[15] show that a quick way to annotate a large image database is to use AMT. - This is done in an offline manner - Image annotation is noisy vs. validation of candidates from a search engine • It seems that the CrowdSearch model can have broader applicability to other apps that use crowdsourcing systems • Amazon Remembers[27]: takes phone-based queries and uses crowdsourcing to retrieve product information. - Combines mobile phones with crowdsourcing

Future possibilities. • Improving CrowdSearch performance: • A realistic model for such systems may be one where users post their queries to CrowdSearch and go offline. CrowdSearch processes the search query and sends the results to the user via a notification, such as an iPhone push notification or SMS. • The price of human validation can also be reduced with a simple optimization: being more adaptive about how many duplicates are requested for each validation task.

Future possibilities. • Improving automated search performance: • By using positive and negative feedback from humans, GPS locations, orientation or text tags. • CrowdSearch payment models: • 2 possible payment models. In the first, the search provider pays for human validation in order to provide a more accurate search experience for mobile users. In the second, the mobile users pay directly for human validation through a micropayment account such as PayPal.

Conclusion • Unlike text search, image search is difficult due to unclear features. • A general image search system is far from reality despite the significant research in the area. • CrowdSearch demonstrates 95% search precision. • Compared to alternative approaches with similar search delay, it reduces the monetary cost by up to 50%. • While CrowdSearch focuses on image search, the techniques used are applicable beyond images to any multimedia search from mobile phones. • Is it enough to design and build such systems only on iPhones? • How would the world receive the idea of a system that is able to identify faces?
