Efficient summarization framework for multi-attribute uncertain data
Jie Xu, Dmitri V. Kalashnikov, Sharad Mehrotra
The Summarization Problem
Given an uncertain data set of objects O1, …, On with attributes such as face (e.g. Jeff, Kate), location (e.g. LA), and visual concepts (e.g. water, plant, sky), produce either:
• an extractive summary: a small subset of the objects, or
• an abstractive summary: a description such as "Kate and Jeff's wedding at LA".
Summarization Process
Model the information contained in each object of the dataset (what information does this image contain?), then extract the best subset of objects as the summary.
Metrics:
• Coverage (Agrawal, WSDM'09; Li, WWW'09; Liu, SDM'09; Sinha, WWW'11)
• Diversity (Vee, ICDE'08; Ziegler, WWW'05)
• Quality (Sinha, WWW'11)
Existing Techniques
• Image: Kennedy et al. WWW'08; Simon et al. ICCV'07
• Customer review: Hu et al. KDD'04; Ly et al. CoRR'11
• Doc/micro-blog: Inouye et al. SocialCom'11; Li et al. WWW'09; Liu et al. SDM'09; Sinha et al. WWW'11
These techniques do not consider information in multiple attributes and do not deal with uncertain data.
Challenges
Design a summarization framework for:
• Multi-attribute data: objects carry several attributes, e.g. face tags, visual concepts, event, time, location.
• Uncertain/probabilistic data: attribute values come from data processing (e.g. vision analysis) with confidence scores, e.g. P(sky) = 0.7, P(people) = 0.9.
Limitations of existing techniques – 1
Existing techniques typically model and summarize a single information dimension:
• only information about visual content (Kennedy et al. WWW'08; Simon et al. ICCV'07)
• only information about review content (Hu et al. KDD'04; Ly et al. CoRR'11)
What information is in the image?
• Elemental IUs – single attribute values: {sky}, {plant}, …, {Kate}, {Jeff}, {wedding}, {12/01/2012}, {Los Angeles}. Is that all?
• Intra-attribute IUs – combinations of values from the same attribute: {Kate, Jeff}, {sky, plant}, …
• Inter-attribute IUs – combinations of values across attributes: {Kate, LA}, {Kate, Jeff, wedding}, …
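To make the three IU types concrete, here is a minimal sketch; the dictionary layout and helper names are hypothetical illustrations, not the paper's API:

```python
from itertools import combinations

# Hypothetical layout: an object maps each attribute to its set of values.
photo = {
    "face": {"Kate", "Jeff"},
    "visual": {"sky", "plant"},
    "event": {"wedding"},
}

def elemental_ius(obj):
    # One IU per single attribute value, e.g. {Kate}, {sky}.
    return [frozenset([v]) for vals in obj.values() for v in vals]

def intra_attribute_ius(obj):
    # Pairs of values drawn from the same attribute, e.g. {Kate, Jeff}.
    return [frozenset(p) for vals in obj.values()
            for p in combinations(sorted(vals), 2)]

def inter_attribute_ius(obj):
    # Pairs of values drawn from two different attributes, e.g. {Kate, wedding}.
    return [frozenset((v1, v2))
            for a1, a2 in combinations(list(obj), 2)
            for v1 in obj[a1] for v2 in obj[a2]]

print(intra_attribute_ius(photo))  # [frozenset({'Jeff', 'Kate'}), ...]
```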
Are all information units interesting?
• Is {Sharad, Mike} an interesting intra-attribute IU? Yes: they often have coffee together and appear frequently in other photos.
• Is {Liyan, Ling} interesting? Yes from my perspective, because they are both my close friends.
• Are all of the 2^n combinations of people interesting? Shall we select a summary that covers all this information? Probably not: I don't care about person X and person Y who happen to appear together in the photo of a large group.
Mine for interesting information units
Treat each object's face set as a transaction:
T1 = {Jeff, Kate}, T2 = {Tom}, T3 = {Jeff, Kate, Tom}, T4 = {Kate, Tom}, T5 = {Jeff, Kate}, …, Tn = {Jeff, Kate}
A modified item-set mining algorithm extracts the IUs that are frequent and correlated, e.g. {Jeff, Kate}.
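A much-simplified stand-in for that mining step; the thresholds and the lift-based correlation test are illustrative assumptions, not the paper's modified algorithm:

```python
from collections import Counter
from itertools import combinations

def interesting_pairs(transactions, min_support=0.4, min_lift=1.2):
    # Keep value pairs that are both frequent (support) and positively
    # correlated (lift > 1 means they co-occur more than chance predicts).
    n = len(transactions)
    singles, pairs = Counter(), Counter()
    for t in transactions:
        for v in t:
            singles[v] += 1
        for p in combinations(sorted(t), 2):
            pairs[p] += 1
    result = []
    for (a, b), c in pairs.items():
        support = c / n
        lift = support / ((singles[a] / n) * (singles[b] / n))
        if support >= min_support and lift >= min_lift:
            result.append(frozenset((a, b)))
    return result

# On the slide's transactions this keeps {Jeff, Kate} and drops the rest:
print(interesting_pairs([{"Jeff", "Kate"}, {"Tom"}, {"Jeff", "Kate", "Tom"},
                         {"Kate", "Tom"}, {"Jeff", "Kate"}]))
```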
Mine for interesting information units
Interesting IUs can also be mined from social context (e.g. Jeff is a friend of Kate; Tom is a close friend of the user), yielding IUs such as {Jeff, Kate} and {Tom}.
Limitation of existing techniques – 2
Existing techniques cannot handle probabilistic attributes: when an IU appears with, say, P(Jeff) = 0.6 in one object and P(Jeff) = 0.8 in another, it is not certain whether an object in the summary covers an IU in another object, so deterministic coverage is no longer well defined.
Deterministic Coverage Model – Example
With deterministic attributes, coverage is the fraction of the dataset's information units covered by the summary's objects, e.g. Coverage = 8 / 14.
Probabilistic Coverage Model
Cov(S, O) = E[I(S)] / E[I(O)]: the expected amount of information covered by the summary S, divided by the expected amount of total information in the dataset O. The model is simplified so that it can be computed efficiently: the expectation can be evaluated in polynomial time, and the function is submodular.
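A sketch of how such an expected coverage could be evaluated, assuming, as an illustration and not necessarily the paper's exact model, that IU memberships are independent across objects:

```python
def expected_coverage(S, O, p):
    # p[o][u]: probability that object o contains information unit u.
    # Independence across objects is an illustrative simplifying assumption.
    ius = {u for o in O for u in p[o]}
    def expected_info(objs):
        total = 0.0
        for u in ius:
            miss = 1.0
            for o in objs:
                miss *= 1.0 - p[o].get(u, 0.0)
            total += 1.0 - miss  # P(at least one object in objs contains u)
        return total
    return expected_info(S) / expected_info(O)
```

Each IU contributes the probability that at least one selected object contains it, so the whole ratio is computable in time polynomial in |O| and the number of IUs.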
Optimization Problem for Summarization
• Parameters: dataset O = {o1, o2, …, on} and a positive number K.
• Goal: find a summary S ⊆ O with |S| ≤ K that maximizes the expected coverage Cov(S, O).
• Finding the summary with maximum expected coverage is NP-hard.
• We developed an efficient greedy algorithm to solve it.
Basic Greedy Algorithm
Initialize S = ∅. While |S| < K: for each object o in O \ S, compute Cov(S ∪ {o}, O); select the o* with maximum coverage and add it to S.
Two costs dominate:
• Computing Cov is expensive (addressed by the object-level optimization).
• Too many Cov computations per iteration (addressed by the iteration-level optimization).
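A direct sketch of this loop, reusing the hypothetical expected_coverage above; it performs K iterations, each scanning all remaining objects and recomputing the full coverage, which is exactly the cost the two optimizations target:

```python
def greedy_summary(O, p, K, cov=expected_coverage):
    # Basic greedy: repeatedly add the object with the largest coverage gain.
    S = []
    for _ in range(K):
        base = cov(S, O, p)
        best, best_gain = None, 0.0
        for o in O:
            if o in S:
                continue
            gain = cov(S + [o], O, p) - base
            if best is None or gain > best_gain:
                best, best_gain = o, gain
        if best is None:
            break
        S.append(best)
    return S
```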
Efficiency optimization – Object-level
• Reduce the time required to compute the coverage contribution of one object.
• Instead of directly computing and optimizing the coverage in each iteration, compute the gain of adding one object o to the summary S: gain(S, o) = Cov(S ∪ {o}, O) − Cov(S, O).
• Updating gain(S, o) incrementally is much more efficient.
Submodularity of Coverage
The expected coverage Cov(S, O) is submodular: for all S ⊆ T ⊆ O and o ∉ T,
Cov(S ∪ {o}, O) − Cov(S, O) ≥ Cov(T ∪ {o}, O) − Cov(T, O).
Efficiency optimization – Iteration-level
• Reduce the number of object-level computations (i.e. of gain(S, o)) in each iteration of the greedy process.
• While traversing the objects in O \ S, we maintain:
• the maximum gain so far, gain*;
• an upper bound Upper(S, o) ≥ gain(S, o) for each remaining object, which follows from submodularity and can be updated in constant time.
• Prune an object o if Upper(S, o) < gain*: by definition its gain cannot beat the current best.
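This pruning is in the same spirit as the classic lazy-greedy trick (Minoux): by submodularity, a gain computed against an earlier, smaller summary is an upper bound on the current gain. A sketch, again assuming the hypothetical expected_coverage rather than the paper's exact bookkeeping:

```python
import heapq

def lazy_greedy_summary(O, p, K, cov=expected_coverage):
    # Stale gains on the heap play the role of Upper(S, o): by
    # submodularity they can only overestimate the current gain.
    S = []
    base = cov(S, O, p) if O else 0.0
    # Max-heap of (-upper_bound, tiebreaker, object), seeded with true gains.
    heap = [(-(cov([o], O, p) - base), i, o) for i, o in enumerate(O)]
    heapq.heapify(heap)
    while len(S) < K and heap:
        _, i, o = heapq.heappop(heap)
        gain = cov(S + [o], O, p) - base  # refresh the stale bound
        if not heap or gain >= -heap[0][0]:
            # No remaining upper bound beats o's true gain: select o.
            # Every object still on the heap was pruned without recomputation.
            S.append(o)
            base += gain
        else:
            heapq.heappush(heap, (-gain, i, o))  # re-insert tightened bound
    return S
```

On a monotone submodular objective this returns the same summary as the basic greedy while typically refreshing far fewer gains per iteration.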
Experiment – Datasets
• Facebook Photo Set: 200 photos uploaded by 10 Facebook users; attributes: visual concept, face, event, time.
• Review Dataset: reviews of 10 hotels from TripAdvisor, about 250 reviews per hotel; attributes: rating, facets.
• Flickr Photo Set: 20,000 photos from Flickr; attributes: visual concept, event, time.
Experiment – Efficiency
The basic greedy algorithm without the optimizations runs for more than 1 minute.
Summary
• Developed a new extractive summarization framework for multi-attribute and uncertain/probabilistic data.
• Generates high-quality summaries.
• Highly efficient.