Measuring the Quality of Web Content using Factual Information

16 April 2012. Measuring the Quality of Web Content using Factual Information. WebQuality 2012 workshop at WWW 2012. Elisabeth Lex, Michael Voelske, Marcelo Errecalde, Edgardo Ferretti, Leticia Cagnina, Christopher Horn, Benno Stein and Michael Granitzer.

Presentation Transcript


  1. 16 April 2012 • Measuring the Quality of Web Content using Factual Information • WebQuality 2012 workshop at WWW 2012 • Elisabeth Lex, Michael Voelske, Marcelo Errecalde, Edgardo Ferretti, Leticia Cagnina, Christopher Horn, Benno Stein and Michael Granitzer

  2. Agenda • Motivation • Approach • Results • Summary and Outlook

  3. Motivation • People's decisions are often based on Web content • Lacking quality control, no verification • Inaccurate, incorrect information • No fact checking • Measures needed to capture credibility and quality aspects • With respect to facts!

  4. Approach • Measure information quality based on factual information • 3 Approaches: • Use simple statistics about the facts obtained from text • Exploit relational information contained in facts • Use semantic relationships like meronymy and hypernymy • First approach: • Use simple statistical features about facts in a document • Indicates how informative a document is • Derive facts from Web content using Open Information Extraction

  5. Definition of Factual Density • Fact Count • Factual Density
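
The slide names the two measures without showing their formulas. A minimal sketch, assuming that fact count is the number of facts extracted from a document and that factual density is that count divided by the document's length in words (both definitions, and all function names, are assumptions made for illustration):

```python
from typing import List, Tuple

# A fact is assumed to be a binary relation (entity1, relation, entity2),
# as produced by an Open Information Extraction system.
Fact = Tuple[str, str, str]


def fact_count(facts: List[Fact]) -> int:
    """Number of facts extracted from one document."""
    return len(facts)


def factual_density(facts: List[Fact], text: str) -> float:
    """Fact count normalized by document length in whitespace-separated words.

    Normalizing by word count is an assumption; other size measures
    (characters, sentences) would work the same way.
    """
    n_words = len(text.split())
    if n_words == 0:
        return 0.0  # avoid division by zero for empty documents
    return fact_count(facts) / n_words
```

Normalizing by length is what lets the measure compare documents of different sizes, since a longer document will usually contain more facts in absolute terms.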

  6. Experiments • Wikipedia: 1000 Featured and Good articles versus 1000 Non-Featured (randomly selected) • Featured: a comprehensive coverage of the major facts in the context of the article's subject • Baseline: Word Count [Blumenstock 2008] • Featured articles longer than non-featured • Bias: longer docs contain more facts • Evaluation: 2 Datasets • Unbalanced: articles differ in length • Balanced: articles similar in length
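
The word-count baseline and the idea of a length-balanced dataset can be sketched as follows. This is only an illustration: the slides do not say which threshold the baseline uses or how the balanced dataset was built, so `threshold`, `min_words`, `max_words`, and the filtering strategy are all hypothetical.

```python
from typing import List


def word_count(text: str) -> int:
    """Length of a document in whitespace-separated words."""
    return len(text.split())


def baseline_is_featured(text: str, threshold: int) -> bool:
    """Word-count baseline: predict 'featured' for sufficiently long articles.

    The threshold is a hypothetical parameter chosen by the caller.
    """
    return word_count(text) >= threshold


def length_balanced(articles: List[str], min_words: int, max_words: int) -> List[str]:
    """Keep only articles whose length falls inside a shared word-count range.

    Applying the same range to both classes is one assumed way to obtain
    articles of similar length, which removes length as a separating signal.
    """
    return [a for a in articles if min_words <= word_count(a) <= max_words]
```

On a dataset filtered this way the word-count baseline has little to work with, so any remaining separation has to come from another measure such as Factual Density.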

  7. Distributions of docs in both datasets with respect to word count

  8. Precision/Recall curves of Factual Density

  9. Results: Factual Density on balanced corpus

  10. Experiments – Relational Features • Approach 2: exploiting relational information contained in facts • Extract relational features from articles • Use relations from ReVerb: binary relations (e1, relation, e2) • Use them to train a classifier to discriminate between featured/good and non-featured

  11. Experiments – Relational Features • Approach 2: exploiting relational information contained in facts • Extract relational features from articles • Use relations from ReVerb: binary relations (e1, relation, e2) • Use them to train a classifier to discriminate between featured/good and non-featured
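
One way to turn binary relations into classifier input is a bag-of-relations count per article. The sketch below assumes the triples have already been extracted and are available as plain tuples; the lowercase normalization and the choice of counting relation phrases are assumptions, since the slides do not specify the feature representation or the classifier.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (e1, relation, e2), the binary-relation shape named on the slide.
Triple = Tuple[str, str, str]


def relation_features(triples: List[Triple]) -> Dict[str, int]:
    """Count how often each relation phrase occurs in one article.

    Relation phrases are lowercased and stripped so that trivially
    different spellings map to the same feature (an assumed normalization).
    """
    counts = Counter(rel.strip().lower() for _, rel, _ in triples)
    return dict(counts)
```

The resulting dictionaries, one per article, can be fed to any classifier that accepts sparse count features.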

  12. Summary • Simple fact-related measure: Factual Density • Based on Factual Density, featured/good articles can be separated from non-featured if article length is similar • If articles differ in length, word count! → For future work, combination of both • Plan to incorporate edit history: more editors, higher factual density • Preliminary experiments with relational features • Promising results, more work in this direction • Goal here is to bring semantics into the field of Information Quality • We expect this to unlock several IQ dimensions, e.g. generality vs specificity

  13. Thank you for your attention! • Elisabeth Lex • elex@know-center.at
