Subjective Databases

Subjective Databases. Alon Halevy. Joint work with: Yuliang Li, Jinfeng Li, Vivian Li, Aaron Feng, Saran Mumick, Xiaolan Wang, Wang-Chiew Tan. Megagon Labs

AI and Happiness/Well-being: Happiness & well-being research • Jo: the smart journal • HappyDB • Experience engine: Voyageur • Subjective databases

Jo the Smart Journal: Key Ideas • Much of our happiness depends on how we decide to spend our time. • Can technology help raise awareness of our choices? • Jo is a digital journal: you log your significant moments. • The Twitter of journaling. • You can imagine Jo being embedded within a digital assistant.

Reflection on your moments

Jo’s Values Jo recognizes: • Values • Activities • People

Pointers to further reading

Reminders Based on Positive Moments

Technical Challenges in Jo • Distinguishing happy from unhappy moments • Identifying the activity and people involved in the moment • Is the activity repeatable? (Can we use it to give advice later?) • How do we make Jo part of the user’s everyday life? HappyDB [LREC 2018] was created to start investigating some of these questions.

HappyDB: 100,000 Happy Moments • “I went to the park with the kids. The weather was perfect!” • “I quit smoking cigarette since the tax increase of this year here in California. I am hoping to keep it up and improve my health.” • “I had dinner with my mom” • “A few weeks ago I received a letter from the President of my University letting me know that I've received tenure and promotion to Associate Professor.”

Initial Field Studies with Jo: conducted in two psychology courses (not only with psychology majors). Initial observations: • Subjects wanted the app to be more proactive in bringing about behavior change • The app raised awareness of their daily activities and well-being • They wanted more qualitative feedback from the app, in addition to graphs

Recommending and Creating Experiences • Ultimately, Jo should be able to give recommendations: • What should I be doing right now? • Even if Jo understood exactly what experiences you want, could it help you create more like them? • Search engines don’t support search for experiences: • You can search for objective attributes, but not the experiential ones.

Searching for Hotels

Very Hard to Search for Experiences Online • I want a restaurant with a sunset view over Tokyo Tower • I want a hotel with a lively bar scene and clean rooms • I want Thai food close to a theater that shows Ocean’s 8 • I want a job where I can work with a dynamic set of people on social good • I want an apartment with 2 rooms in a quiet neighborhood at most ½ an hour away from work • I want a 1-week course on Python with short programming exercises • I want a relaxing trip to a beach with Mediterranean restaurants

Why is it So Hard to Find Experiences? • Content, when/if it exists, is in online reviews • An NLP-complete problem. • Experiences are nuanced, varied, and complex. • People use the same words to describe very different experiences, or different words to describe the same experiences. • Experiences are subjective. Voyageur: a prototype experience search engine (WWW 2019 demo)

Technical Challenges in Voyageur • Managing and querying subjective data (OpineDB), e.g., hotels with friendly staff, attractions with short wait times, romantic restaurants • Finding what and when to highlight about entities, e.g., tips (“there is free parking two blocks away”) and interesting facts (“best ramen in town”)

Survey: people care about subjective attributes (Trummer et al., Mining Subjective Properties on the Web, SIGMOD 2015)

Subjectivity in Databases

Subjective queries against subjective data Why is this a hard problem? • Experiences are subjective and personal. • Specified in a variety of ways. • Often in text, not in a database. • Their meanings are often imprecise. • Hard to model in a database.

Examples

NLP has Studied Subjective Data Extensively • Work on sentiment analysis, summarizing reviews, identifying subjective opinions [Wiebe++] • Addressed the problem of finding individual subjective statements • Database query processing needs joins and aggregation: • Combine multiple conditions in a query, aggregate reviews into a meaningful scale • Databases are designed and have a schema. • How do we combine the strengths of both disciplines to create subjective databases?

Subjective attributes in OpineDB • Schema: Hotel(hotelname, capacity, address, price_pn, *room_cleanliness, *bathroom, *service, *comfort), where hotelname, capacity, address, and price_pn are objective attributes and the starred attributes are subjective. • Type of a subjective attribute: a marker summary over a linguistic domain. • A linguistic domain is a set of linguistic variations, e.g., “modern”, “old style”, “dated shower”, “recently remodeled”, “modernistic style”, ... or “very clean”, “pretty clean”, “spotless”, “average”, “stained carpet”, “dirty”, “quite dirty”, “very filthy”, “dusty”, “very dirty”, “unclean”, ...

Linguistic domain and marker summaries • Linguistic domain of an attribute: a set of short linguistic variations that describe the attribute. • Marker: a phrase in the linguistic domain. • Marker summary: a set of markers that the designer decides to highlight, e.g., Room_cleanliness[“very clean”, “average”, “dirty”, “very dirty”]

Marker Summaries • Linearly ordered: the markers form a linear scale, e.g., Room_cleanliness[“very clean”, “average”, “dirty”, “very dirty”]. [Figure: the phrase “rooms are pretty clean” contributes 0.5 to each of two markers.] • Categorical: each marker represents a different concept, e.g., Bathroom[“old-fashioned”, “standard”, “modern”, “luxurious”]. [Figure: the phrase “extravagant old-fashioned bathrooms” contributes 1 to each of two markers.]
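The two kinds of marker summaries can be sketched as histograms over markers. This minimal Python sketch is one reading of the 0.5/0.5 and 1/1 values on this slide. A phrase on a linear scale splits its unit weight between neighboring markers, while a categorical phrase gives a full count to every marker it mentions. The fractional `position` argument and the helpers `add_linear` and `add_categorical` are hypothetical. The slide does not show how a phrase is actually mapped onto the scale.

```python
from collections import Counter

# Marker summaries from the slide (markers only; counts are filled in below).
ROOM_CLEANLINESS = ["very clean", "average", "dirty", "very dirty"]  # linearly ordered
BATHROOM = ["old-fashioned", "standard", "modern", "luxurious"]        # categorical


def add_linear(summary: Counter, scale: list[str], position: float) -> None:
    """Add one phrase to a linearly ordered marker summary.

    `position` is a hypothetical fractional index into `scale` (0 = first marker).
    The phrase's unit weight is split between the two nearest markers.
    """
    if not 0 <= position <= len(scale) - 1:
        raise ValueError("position must lie on the scale")
    lower = int(position)
    frac = position - lower
    summary[scale[lower]] += 1 - frac
    if frac > 0:
        summary[scale[lower + 1]] += frac


def add_categorical(summary: Counter, scale: list[str], mentioned: list[str]) -> None:
    """Add one phrase to a categorical marker summary: +1 for every marker it mentions."""
    for marker in mentioned:
        if marker not in scale:
            raise ValueError(f"unknown marker: {marker}")
        summary[marker] += 1


# "rooms are pretty clean": assumed to sit halfway between "very clean" and "average".
cleanliness = Counter()
add_linear(cleanliness, ROOM_CLEANLINESS, 0.5)

# "extravagant old-fashioned bathrooms": assumed to mention two bathroom markers.
bathroom = Counter()
add_categorical(bathroom, BATHROOM, ["old-fashioned", "luxurious"])
```

Summing such per-phrase contributions over all reviews of a hotel gives that hotel's marker summary for the attribute.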

Subjective queries against subjective data [Figure: the schema Hotel(hotelname, capacity, address, price_pn, *room_cleanliness, *bathroom, *service, *comfort); subjective data in the form of review snippets such as “Room is comfortably clean. The continental breakfast is OK.”, “Apartment was clean, staff friendly. Pool was adequate.”, and “showerhead with many settings, thick luxurious towels, … friendly staff.”; linguistic domains summarized by the marker summaries Room_cleanliness [very_clean, average, dirty, very_dirty], Bathroom [old, standard, modern, luxurious], Service [exceptional, good, average, bad, very_bad], Bed [very_soft, soft, firm, very_firm, ok, worn_out]; and the subjective query “Hotels with really clean rooms and is a romantic getaway.”]

Subjective database queries: “Find hotels that cost less than $150 per night, have really clean rooms, and are a romantic getaway.” is expressed as select * from Hotels where price_pn < 150 and “has really clean rooms” and “is a romantic getaway” (cf. V. Zhong, C. Xiong, and R. Socher. Seq2SQL: Generating structured queries from natural language using reinforcement learning. arXiv 2017.)
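The query mixes an ordinary SQL condition with quoted subjective predicates. This minimal Python sketch shows how the two kinds of conjuncts could be separated before processing. It assumes, as in this example, that the WHERE clause is a pure conjunction and that subjective predicates are exactly the phrases in double quotes. `split_where` is a hypothetical helper, not OpineDB's parser. It would also mis-split an objective condition whose string literal contains the word "and".

```python
import re

# A subjective predicate is any phrase written in curly (or straight) double quotes.
QUOTED = re.compile(r"[“\"]([^”\"]*)[”\"]")


def split_where(where_clause: str) -> tuple[list[str], list[str]]:
    """Split a conjunctive WHERE clause into objective SQL conditions and subjective phrases."""
    subjective = [phrase.strip() for phrase in QUOTED.findall(where_clause)]
    # Remove the quoted phrases, then split what is left on the AND keyword.
    residue = QUOTED.sub("", where_clause)
    objective = [c.strip() for c in re.split(r"\band\b", residue, flags=re.IGNORECASE) if c.strip()]
    return objective, subjective


objective, subjective = split_where(
    "price_pn < 150 and “ has really clean rooms ” and “ is a romantic getaway ”"
)
```

The objective part can be handed to an ordinary SQL engine, while each subjective phrase goes through predicate interpretation.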

Processing subjective database queries: select * from Hotels where price_pn < 150 and “has really clean rooms” and “is a romantic getaway” • Predicate interpretation: “has really clean rooms” → room_cleanliness[“very clean”]; “is a romantic getaway” → Service[“exceptional”] or Bathroom[“luxurious”] • Compute degrees of truth for each hotel (e.g., 0.7, 0.6) • Fuzzy combination/join • Query result: 1. Holiday Hotel 2. Inn Hotel ...
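The steps of this pipeline can be illustrated end to end on toy data. In this Python sketch the degrees of truth are made-up numbers standing in for the output of predicate interpretation. The objective condition price_pn < 150 acts as a crisp filter, and the subjective degrees are combined with the product (fuzzy AND) before sorting. The `Hotel` class, the `rank` function, and the "Grand Hotel" entry are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Hotel:
    name: str
    price_pn: float
    # Degree of truth of each subjective predicate for this hotel (hypothetical values).
    degrees: dict[str, float] = field(default_factory=dict)


def rank(hotels: list[Hotel], max_price: float, predicates: list[str]) -> list[tuple[str, float]]:
    """Filter on price_pn < max_price, then rank by the product of subjective degrees."""
    results = []
    for hotel in hotels:
        if not hotel.price_pn < max_price:
            continue  # a crisp predicate has degree 0 or 1; degree 0 removes the hotel
        score = 1.0
        for p in predicates:
            score *= hotel.degrees.get(p, 0.0)  # unknown predicate: assume degree 0
        results.append((hotel.name, score))
    return sorted(results, key=lambda r: r[1], reverse=True)


hotels = [
    Hotel("Inn Hotel", 140, {"has really clean rooms": 0.5, "is a romantic getaway": 0.6}),
    Hotel("Holiday Hotel", 120, {"has really clean rooms": 0.7, "is a romantic getaway": 0.6}),
    Hotel("Grand Hotel", 200, {"has really clean rooms": 0.9, "is a romantic getaway": 0.9}),
]
ranking = rank(hotels, 150, ["has really clean rooms", "is a romantic getaway"])
```

Filtering out hotels that fail the crisp condition is equivalent to multiplying by a degree of 0, except that those hotels are dropped rather than listed with score 0.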

Subjective Databases: Additional Challenges • Consider the user profile: “romantic” may not mean the same to every person. • OpineDB already supports qualifying on the reviewers: e.g., consider only reviews of people who reviewed > 10 hotels. • Consider semantics of different attributes: • E.g., for some attributes, more recent reviews matter more.
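Both refinements on this slide can be expressed as operations on the reviews before they are summarized. This minimal Python sketch assumes each review has already been mapped to one marker. `qualified` keeps only reviews by people who reviewed more than 10 hotels. `recency_weighted_summary` down-weights older reviews with an exponential half-life. The half-life scheme, the reviewer names, and all function names are illustrative assumptions. The slide does not say how recency is weighted.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Review:
    reviewer: str
    written: date
    marker: str  # marker this review was mapped to (assumed upstream step)


def qualified(reviews: list[Review], hotels_reviewed: dict[str, int], min_hotels: int = 10) -> list[Review]:
    """Keep only reviews by people who reviewed more than `min_hotels` hotels."""
    return [r for r in reviews if hotels_reviewed.get(r.reviewer, 0) > min_hotels]


def recency_weighted_summary(reviews: list[Review], today: date,
                             half_life_days: float = 365.0) -> dict[str, float]:
    """Marker summary in which a review's vote halves every `half_life_days` (assumed scheme)."""
    summary: dict[str, float] = {}
    for r in reviews:
        age_days = (today - r.written).days
        weight = 0.5 ** (age_days / half_life_days)
        summary[r.marker] = summary.get(r.marker, 0.0) + weight
    return summary


reviews = [
    Review("alice", date(2019, 3, 1), "very clean"),
    Review("bob", date(2019, 3, 1), "dirty"),
    Review("alice", date(2018, 3, 1), "average"),
]
hotels_reviewed = {"alice": 15, "bob": 3}
summary = recency_weighted_summary(qualified(reviews, hotels_reviewed), today=date(2019, 3, 1))
```

Here bob's review is dropped because he reviewed only 3 hotels. Alice's year-old review counts half as much as her recent one.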

The Ubiquity of Subjectivity (Data Engineering Bulletin, March 2019) • Data: subjective databases • Presentation: avoid bias and tailor to the frame of mind of the recipient. • Decision making: people don’t make decisions based only on facts. • Every computer scientist should understand how data is perceived and used. • Explore literature in fields like psychology, behavioral economics, and neuroscience.

Backup slides

Subjective database team Yuliang Li, Aaron Feng, Jinfeng Li, Saran Mumick, Alon Halevy, Vivian Li Development & UI: Sara Evensen, Huining Li, George Mihaila, John Morales, Natalie Nuno, Kate Pavlovic, Xiaolan Wang

Predicate interpretation: interpret each predicate into a fuzzy logic expression over attribute markers. The query select * from Hotels h where price_pn < 150 and “has really clean rooms” and “is a romantic getaway” is interpreted as select * from Hotels h where price_pn < 150 ⨂ h.room_cleanliness ⩬ “really clean” ⨂ (h.service ⩬ “exceptional” ⨁ h.bathroom ⩬ “luxurious”), where ⨂ is fuzzy AND and ⨁ is fuzzy OR.

Predicate interpretation: the “easy” case. Query predicates match markers directly, e.g., “has firm beds” → Bed[firm] and “luxurious bathrooms” → Bathroom[luxurious]. For predicates such as “has really clean rooms” and “is a romantic getaway”, the matching marker is not obvious. Marker summaries: Room_cleanliness [very_clean, average, dirty, very_dirty] • Bathroom [old, standard, modern, luxurious] • Service [exceptional, good, average, bad, very_bad] • Bed [very_soft, soft, firm, very_firm, ok, worn_out]

Predicate interpretation: The “harder” case Query predicates have arbitrary phrases. • Problem: Given a query predicate p, find the marker(s) that best represent p. • Word embedding method: • Find variations similar to p based on its word embedding. • Co-occurrence method: • Find a marker whose linguistic variations frequently co-occur with p in the reviews.

Predicate interpretation: word embedding method • Find the linguistic variations q that best match p semantically. • Notation: p = query predicate; q = a linguistic variation; w2v(w) = word vector of w; idf(w) = inverse document frequency of w in the review corpus. • Interpretation: the marker corresponding to the variation q with the highest similarity score to p, provided that score is above a threshold.
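The slide's scoring formula did not survive extraction. The following Python sketch is therefore only one plausible reading of the listed ingredients. Each phrase is represented by the idf-weighted average of its word vectors w2v(w), and each variation q is compared with p by cosine similarity. The marker of the best-scoring q is returned only if its score exceeds a threshold. The toy 3-dimensional vectors, the idf weights, and the threshold of 0.8 are made up for illustration.

```python
import math


def phrase_vector(phrase, w2v, idf):
    """idf-weighted average of the word vectors of the known words in a phrase."""
    words = [w for w in phrase.lower().split() if w in w2v]
    if not words:
        return None
    dim = len(w2v[words[0]])
    total = sum(idf.get(w, 1.0) for w in words)
    return [sum(idf.get(w, 1.0) * w2v[w][i] for w in words) / total for i in range(dim)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def interpret(p, variations, w2v, idf, threshold=0.8):
    """Return the marker of the variation most similar to p, or None if none beats threshold.

    `variations` maps each linguistic variation q to the marker it belongs to.
    """
    vp = phrase_vector(p, w2v, idf)
    if vp is None:
        return None
    best_marker, best_score = None, threshold
    for q, marker in variations.items():
        vq = phrase_vector(q, w2v, idf)
        if vq is None:
            continue  # no known words in this variation
        score = cosine(vp, vq)
        if score > best_score:
            best_marker, best_score = marker, score
    return best_marker


# Toy data: "really" and "very" are synonyms; "clean" and "dirty" point in opposite directions.
w2v = {"really": [1.0, 0.0, 0.0], "very": [1.0, 0.0, 0.0],
       "clean": [0.0, 1.0, 0.0], "dirty": [0.0, -1.0, 0.0], "romantic": [0.0, 0.0, 1.0]}
idf = {"really": 0.5, "very": 0.5, "clean": 2.0, "dirty": 2.0, "romantic": 3.0}
variations = {"spotless": "very clean", "very clean": "very clean", "very dirty": "very dirty"}
```

With these vectors, "really clean rooms" maps to the "very clean" marker. "romantic getaway" finds no variation above the threshold, which is the situation where the co-occurrence method would be tried next.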

Example output of co-occurrence method
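The co-occurrence method can be sketched in a few lines of Python. The sketch assumes that co-occurrence is counted per review using plain substring matching. It also assumes that a marker needs a minimum number of co-occurrences (`min_count`, hypothetical) to be accepted. The slides do not specify the counting unit or any normalization.

```python
from collections import Counter
from typing import Optional


def cooccurrence_interpret(p: str, reviews: list[str], variations: dict[str, str],
                           min_count: int = 2) -> Optional[str]:
    """Return the marker whose variations co-occur most often with p, or None.

    `variations` maps each linguistic variation to its marker. Co-occurrence is
    counted per review with plain substring matching (an assumption).
    """
    counts: Counter = Counter()
    phrase = p.lower()
    for review in reviews:
        text = review.lower()
        if phrase not in text:
            continue
        # Count each marker at most once per review.
        counts.update({marker for q, marker in variations.items() if q.lower() in text})
    if not counts:
        return None
    marker, count = counts.most_common(1)[0]
    return marker if count >= min_count else None


reviews = [
    "Romantic getaway with a luxurious bathroom and exceptional service",
    "A romantic getaway and exceptional staff",
    "Dirty room, not a romantic place",
]
variations = {"exceptional": "exceptional", "luxurious": "luxurious", "dirty": "dirty"}
```

Raw counts favor markers that are frequent overall. Normalizing the counts, for example with pointwise mutual information, is a natural alternative, but that choice is likewise an assumption here.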

When all else fails: text-retrieval method • Apply traditional IR techniques when both the word embedding method and the co-occurrence method fail. • Represent the reviews of each hotel by a single document D (the concatenation of all its reviews). • Compute BM25(D, p).
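The text-retrieval fallback can be sketched with the standard Okapi BM25 formula. The slide specifies only that each hotel's reviews are concatenated into one document D and scored with BM25(D, p). This Python sketch adds three assumptions of its own:

- whitespace tokenization;
- the common default parameters k1 = 1.2 and b = 0.75;
- the non-negative idf variant log(1 + (N − df + 0.5)/(df + 0.5)).

```python
import math
from collections import Counter


def bm25_scores(docs: dict[str, str], query: str, k1: float = 1.2, b: float = 0.75) -> dict[str, float]:
    """Okapi BM25 score of `query` against each document.

    `docs` maps a hotel name to the concatenation of all its reviews.
    """
    if not docs:
        return {}
    tokens = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n_docs
    terms = query.lower().split()
    # Document frequency: number of documents containing each query term.
    df = {t: sum(1 for toks in tokens.values() if t in toks) for t in set(terms)}

    scores = {}
    for name, toks in tokens.items():
        tf = Counter(toks)
        score = 0.0
        for t in terms:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[t] * (k1 + 1) / norm
        scores[name] = score
    return scores


hotel_docs = {
    "Holiday Hotel": "a romantic getaway with a romantic sunset view",
    "Inn Hotel": "clean rooms close to the airport",
}
scores = bm25_scores(hotel_docs, "romantic getaway")
```

Because BM25 scores are not confined to [0, 1], they would need some normalization before being combined with the fuzzy degrees of truth from the other methods.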

Processing subjective database queries: select * from Hotels where price_pn < 150 and “has really clean rooms” and “is a romantic getaway” • Predicate interpretation: “has really clean rooms” → room_cleanliness[“very clean”]; “is a romantic getaway” → Service[“exceptional”] ⨁ Bathroom[“luxurious”] • Compute degrees of truth for each hotel (e.g., 0.7, 0.6) • Fuzzy aggregation • Query result: 1. Holiday Hotel 2. Inn Hotel ...

Compute then aggregate the degrees of truth • Compute a degree of truth for each interpreted predicate: how well does the marker summary represent the query predicate? • Combine the degrees of truth (multiplication variant): $\mathrm{deg}(X \otimes Y) = \mathrm{deg}(X) \cdot \mathrm{deg}(Y)$ • $\mathrm{deg}(\neg X) = 1 - \mathrm{deg}(X)$ • $\mathrm{deg}(X \oplus Y) = 1 - (1 - \mathrm{deg}(X)) \cdot (1 - \mathrm{deg}(Y))$
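The multiplication variant can be written directly as three small Python functions. Here they are applied to the romantic-getaway query, with made-up degrees for one hotel. Fuzzy OR is the probabilistic sum, which equals NOT(NOT X AND NOT Y) under this product-based AND.

```python
def f_and(x: float, y: float) -> float:
    """Fuzzy conjunction X ⨂ Y (multiplication variant)."""
    return x * y


def f_not(x: float) -> float:
    """Fuzzy negation NOT X."""
    return 1.0 - x


def f_or(x: float, y: float) -> float:
    """Fuzzy disjunction X ⨁ Y; equals NOT(NOT X ⨂ NOT Y)."""
    return 1.0 - (1.0 - x) * (1.0 - y)


# Degrees for one hotel (made-up example values).
clean = 0.7        # room_cleanliness ⩬ "really clean"
service = 0.6      # service ⩬ "exceptional"
bathroom = 0.5     # bathroom ⩬ "luxurious"
price_ok = 1.0     # crisp predicate price_pn < 150 holds

score = f_and(price_ok, f_and(clean, f_or(service, bathroom)))
```

With these values, the disjunction gives 1 − 0.4 · 0.5 = 0.8, and the overall score is 0.7 · 0.8 = 0.56.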


Aggregate Markers and Marker Summaries To aggregate, you need a scale • Aggregate markers are chosen phrases in the domain • Marker summaries: aggregate the reviews according to the markers.

Queries in Subjective Databases • Queries include subjective and objective attributes: • Map query predicates to schema attributes or combinations thereof • Otherwise, go directly to the data • Combining conjuncts using fuzzy logic
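This slide's order amounts to a fallback chain: map predicates to schema attributes first, and go to the raw data only as a last resort. In this minimal Python sketch, each interpreter is a hypothetical callable that returns an interpretation or None. A fuller chain would plug in the direct marker match, the word embedding method, and the co-occurrence method, and end with text retrieval (BM25).

```python
from typing import Callable, Optional

# An interpreter maps a query predicate to an interpretation (e.g., a marker), or None.
Interpreter = Callable[[str], Optional[str]]


def interpret_predicate(p: str, interpreters: list[tuple[str, Interpreter]]) -> tuple[str, str]:
    """Try schema-level interpreters in order; if all fail, fall back to the review text."""
    for name, interpreter in interpreters:
        result = interpreter(p)
        if result is not None:
            return name, result
    return "text_retrieval", p  # score p directly against the reviews, e.g., with BM25


# Hypothetical direct lookup of predicates that match markers verbatim.
direct = {"has firm beds": "Bed[firm]", "luxurious bathrooms": "Bathroom[luxurious]"}
chain = [("direct_match", direct.get)]
```

Because each stage only needs to return None when it cannot decide, new interpretation methods can be inserted into the chain without changing the others.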

The Unreasonable Ubiquity of Subjectivity • Data • Presentation • Decision making
