
Efficient summarization framework for multi-attribute uncertain data




Presentation Transcript


  1. Efficient summarization framework for multi-attribute uncertain data. Jie Xu, Dmitri V. Kalashnikov, Sharad Mehrotra

  2. The Summarization Problem. The input is an uncertain data set of objects O1, ..., On, each described by attributes such as face (e.g. Jeff, Kate), location (e.g. LA), and visual concepts (e.g. water, plant, sky). An extractive summary selects a subset of the objects; an abstractive summary generates a new description such as "Kate Jeff wedding at LA".

  3. Summarization Process. Two steps: first model the information contained in each object of the dataset (what information does this image contain?), then extract the best subset of objects as the summary. Metrics for "best" include Coverage (Agrawal, WSDM'09; Li, WWW'09; Liu, SDM'09; Sinha, WWW'11), Diversity (Vee, ICDE'08; Ziegler, WWW'05), and Quality (Sinha, WWW'11).

  4. Existing Techniques. Prior work targets images, customer reviews, and documents/micro-blogs (Hu et al. KDD'04; Ly et al. CoRR'11; Inouye et al. SocialCom'11; Li et al. WWW'09; Liu et al. SDM'09; Kennedy et al. WWW'08; Simon et al. ICCV'07; Sinha et al. WWW'11). These techniques do not consider information in multiple attributes and do not deal with uncertain data.

  5. Challenges. Design a summarization framework for multi-attribute data and for uncertain/probabilistic data. Objects carry attributes such as face tags, visual concepts, event, time, and location, and data processing (e.g. vision analysis) yields probabilistic values such as P(sky) = 0.7, P(people) = 0.9.

  6. Limitations of existing techniques – 1. Existing techniques typically model and summarize a single information dimension: only information about visual content (Kennedy et al. WWW'08; Simon et al. ICCV'07) or only information about review content (Hu et al. KDD'04; Ly et al. CoRR'11).

  7. What information is in the image? Elemental IUs (information units) are single attribute values: {sky}, {plant}, ..., {Kate}, {Jeff}, {wedding}, {12/01/2012}, {Los Angeles}. Is that all? Intra-attribute IUs combine values within one attribute, e.g. {Kate, Jeff}, {sky, plant}. Inter-attribute IUs combine values across attributes, e.g. {Kate, LA}, {Kate, Jeff, wedding}. A sketch of this taxonomy follows.
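To make the IU taxonomy concrete, here is a minimal sketch (not from the paper) that enumerates the elemental, intra-attribute, and inter-attribute IUs of a single object, assuming an object is simply a mapping from attribute name to a set of values and limiting combinations to pairs for brevity:

```python
from itertools import combinations

def enumerate_ius(obj):
    """Enumerate the IUs of one object, limited to value pairs for brevity.

    `obj` maps an attribute name to its set of values,
    e.g. {"face": {"Kate", "Jeff"}, "event": {"wedding"}}.
    """
    # elemental IUs: every single attribute value on its own
    elemental = [frozenset([v]) for vals in obj.values() for v in vals]
    # intra-attribute IUs: pairs of values from the same attribute
    intra = [frozenset(p) for vals in obj.values()
             for p in combinations(sorted(vals), 2)]
    # inter-attribute IUs: pairs of values taken from two different attributes
    inter = [frozenset((a, b))
             for (_, v1), (_, v2) in combinations(sorted(obj.items()), 2)
             for a in v1 for b in v2]
    return elemental, intra, inter

photo = {"face": {"Kate", "Jeff"}, "event": {"wedding"}, "location": {"LA"}}
print(enumerate_ius(photo))
```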

  8. Are all information units interesting? Is {Sharad, Mike} an interesting intra-attribute IU? Yes, they often have coffee together and appear frequently in other photos. Is {Liyan, Ling} interesting? Yes, from my perspective, because they are both my close friends. But are all of the 2^n combinations of people interesting, and should we select a summary that covers all of this information? Well, probably not! I don't care about person X and person Y who happen to be together in the photo of this large group.

  9. Mine for interesting information units. Treat each object's face attribute as a transaction: T1 = {Jeff, Kate}, T2 = {Tom}, T3 = {Jeff, Kate, Tom}, T4 = {Kate, Tom}, T5 = {Jeff, Kate}, ..., Tn = {Jeff, Kate}. A modified item-set mining algorithm selects IUs that are both frequent and correlated, e.g. {Jeff, Kate}.
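The slide does not spell out the modified item-set mining algorithm, so the following is only a rough sketch of the idea, assuming "frequent" is measured by support and "correlated" by lift; the thresholds and the name mine_interesting_ius are illustrative, not the paper's:

```python
from itertools import combinations
from collections import Counter

def mine_interesting_ius(transactions, min_support=0.3, min_lift=1.2):
    """Return value pairs that are both frequent and positively correlated.

    `transactions` is a list of sets, one per object (e.g. its face tags).
    """
    n = len(transactions)
    item_counts, pair_counts = Counter(), Counter()
    for t in transactions:
        item_counts.update(t)
        pair_counts.update(combinations(sorted(t), 2))

    interesting = []
    for (a, b), c in pair_counts.items():
        support = c / n
        # lift > 1: a and b co-occur more often than if they were independent
        lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
        if support >= min_support and lift >= min_lift:
            interesting.append((frozenset((a, b)), support, lift))
    return interesting

photos = [{"Jeff", "Kate"}, {"Tom"}, {"Jeff", "Kate", "Tom"},
          {"Kate", "Tom"}, {"Jeff", "Kate"}]
print(mine_interesting_ius(photos))  # only {Jeff, Kate} passes both thresholds
```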

  10. Mine for interesting information units (continued). Interesting IUs can also be mined from social context: for example, {Jeff, Kate} because Jeff is a friend of Kate, and {Tom} because Tom is a close friend of the user.

  11. Limitation of existing techniques – 2. Existing techniques cannot handle probabilistic attributes. With uncertain data, we are not sure whether an object covers an IU that appears in another object: for example, one object has P(Jeff) = 0.6 and another has P(Jeff) = 0.8, so whether the summary covers the IU {Jeff} is itself uncertain.

  12. Deterministic Coverage Model – Example. Coverage measures how much of the information in the dataset is also present in the summary; in the slide's example the summary covers 8 of the 14 information units in the dataset, so Coverage = 8 / 14.
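The worked figure behind the 8/14 is not recoverable from the transcript; one plausible reading, assuming coverage is simply the fraction of the dataset's IUs that appear in at least one summary object, is sketched below:

```python
def deterministic_coverage(summary, dataset):
    """Fraction of the dataset's IUs that appear in at least one summary object.

    Both arguments are lists of sets of IUs (one set per object).
    """
    dataset_ius = set().union(*dataset)
    summary_ius = set().union(*summary)
    return len(dataset_ius & summary_ius) / len(dataset_ius)

dataset = [{"Kate", "Jeff"}, {"Jeff", "wedding"}, {"sky", "plant"}]
print(deterministic_coverage([dataset[0], dataset[2]], dataset))  # 4 of 5 IUs -> 0.8
```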

  13. Probabilistic Coverage Model. Coverage is defined as the expected amount of information covered by the summary S divided by the expected amount of total information in the dataset, and is then simplified so that it can be computed efficiently. The resulting function can be computed in polynomial time and is submodular.
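The slide's formula is an image, so the sketch below shows only one way the "expected covered / expected total" ratio could be computed, assuming IU occurrences are independent across objects; the data layout and the name expected_coverage are assumptions, not the paper's notation:

```python
from math import prod

def expected_coverage(summary, dataset_ius):
    """Expected coverage of a summary under independent IU probabilities.

    `dataset_ius` maps each IU to its probability in each object, e.g.
    {"Jeff": {"o1": 0.6, "o2": 0.8}}.  `summary` is a set of object ids.
    An IU is present in a group of objects if at least one of them contains
    it, so under independence P(present) = 1 - prod(1 - p).
    """
    covered = total = 0.0
    for iu, probs in dataset_ius.items():
        total += 1.0 - prod(1.0 - p for p in probs.values())
        covered += 1.0 - prod(1.0 - p for o, p in probs.items() if o in summary)
    return covered / total if total else 0.0
```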

  14. Optimization Problem for Summarization. Parameters: a dataset O = {o1, o2, ..., on} and a positive number K. The goal is to find a summary of at most K objects with maximum expected coverage. Finding this summary is NP-hard, so we developed an efficient greedy algorithm to solve it approximately.

  15. Basic Greedy Algorithm. Initialize S to the empty set. In each iteration, for every object o in O \ S compute the coverage of S ∪ {o}, then select the object o* with maximum coverage, add it to S, and repeat until the summary is full. Two bottlenecks: computing Cov for a single candidate is expensive (addressed by the object-level optimization), and too many Cov computations are performed per iteration (addressed by the iteration-level optimization).
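A minimal sketch of this unoptimized baseline, written against any coverage function with the shape of the expected_coverage example above (the function and argument names are illustrative):

```python
def greedy_summary(objects, dataset_ius, k, coverage):
    """Plain greedy: repeatedly add the object with the highest coverage.

    `objects` is a set of object ids; `coverage(summary, dataset_ius)` is a
    set function such as the expected_coverage sketch above.
    """
    summary = set()
    while len(summary) < k:
        best_obj, best_cov = None, -1.0
        for o in objects - summary:
            c = coverage(summary | {o}, dataset_ius)
            if c > best_cov:
                best_obj, best_cov = o, c
        if best_obj is None:   # no candidates left
            break
        summary.add(best_obj)
    return summary
```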

  16. Efficiency optimization – Object-level. Reduce the time required to compute the coverage for one object. Instead of directly computing and optimizing the coverage in each iteration, compute the gain of adding one object o to the summary S: gain(S, o) = Cov(S ∪ {o}, O) - Cov(S, O). Updating gain(S, o) incrementally is much more efficient than recomputing the coverage from scratch.

  17. Submodularity of Coverage. The expected coverage Cov(S, O) is submodular: for all S ⊆ T ⊆ O and any object o, Cov(S ∪ {o}, O) - Cov(S, O) ≥ Cov(T ∪ {o}, O) - Cov(T, O).

  18. Efficiency optimization – Iteration-level. Reduce the number of object-level computations (i.e. evaluations of gain(S, o)) in each iteration of the greedy process. While traversing the objects in O \ S, we maintain the maximum gain found so far, gain*, and an upper bound Upper(S, o) on gain(S, o) for each remaining object o; an object o is pruned if Upper(S, o) < gain*. The upper bound follows from submodularity, can be updated in constant time, and gain* is maintained directly from its definition.
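The transcript does not give the exact form of Upper(S, o). One standard way to get such a bound from submodularity is lazy (Minoux-style) greedy, where a gain computed against an earlier, smaller summary serves as the upper bound on the current gain; the sketch below illustrates that idea and, like the previous examples, takes the coverage function (e.g. the expected_coverage sketch) as a parameter:

```python
import heapq

def lazy_greedy_summary(objects, dataset_ius, k, coverage):
    """Greedy with iteration-level pruning via stale-gain upper bounds.

    By submodularity an object's gain can only shrink as the summary grows,
    so a gain computed in an earlier iteration upper-bounds the current one.
    Only the object at the top of the max-heap has its gain refreshed; if it
    still beats the next-best bound it is selected without evaluating the
    remaining objects.
    """
    summary = set()
    base = coverage(summary, dataset_ius)
    heap = [(-(coverage({o}, dataset_ius) - base), o) for o in objects]
    heapq.heapify(heap)
    while len(summary) < k and heap:
        _, o = heapq.heappop(heap)
        gain = coverage(summary | {o}, dataset_ius) - base  # refresh stale bound
        if not heap or gain >= -heap[0][0]:
            summary.add(o)                    # no other object can do better
            base += gain
        else:
            heapq.heappush(heap, (-gain, o))  # reinsert with tightened bound
    return summary
```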

  19. Experiment – Datasets. Facebook Photo Set: 200 photos uploaded by 10 Facebook users. Review Dataset: reviews of 10 hotels from TripAdvisor, with about 250 reviews per hotel on average. Flickr Photo Set: 20,000 photos from Flickr. The attributes across these datasets include visual concepts, faces, events, time, ratings, and review facets.

  20. Experiment – Quality

  21. Experiment – Efficiency. The basic greedy algorithm without the optimizations runs for more than 1 minute.

  22. Summary. Developed a new extractive summarization framework for multi-attribute data and uncertain/probabilistic data that generates high-quality summaries and is highly efficient.
