
Searching and Ranking Documents based on Semantic Relationships

Searching and Ranking Documents based on Semantic Relationships. Boanerges Aleman-Meza, LSDIS lab, Computer Science, University of Georgia. Paper presentation, ICDE Ph.D. Workshop 2006, April 3rd, 2006, Atlanta, GA, USA.


Presentation Transcript


  1. Searching and Ranking Documents based on Semantic Relationships • Boanerges Aleman-Meza, LSDIS lab, Computer Science, University of Georgia • Paper presentation, ICDE Ph.D. Workshop 2006, April 3rd, 2006, Atlanta, GA, USA • This work is funded by NSF-ITR-IDM Award#0325464 titled ‘SemDIS: Discovering Complex Relationships in the Semantic Web’ and NSF-ITR-IDM Award#0219649 titled ‘Semantic Association Identification and Knowledge Discovery for National Security Applications.’

  2. Outline • Research Problem • Proposed Solution • Preliminary Results • Outstanding Future Work • Conclusions and Future work

  3. Today’s use of Relationships (for web search) • ‘href’ relationships between documents • documents as a whole • No explicit relationships are used • other than co-occurrence • Implicit semantics • such as page importance (some content from www.wikipedia.org)

  4. But, more relationships are available • Documents are connected through concepts & relationships • i.e., MREF [SS’98] • Named-entities can be identified • with respect to existing data, such as ontologies (some content from www.wikipedia.org)

  5. Complex Relationships • People will use Web search not only for documents, but also for information about semantic relationships [SFJMC’02] • Relationships play an important role in the continuing evolution of the Web [SAK’03]

  6. Complex Relationships • Semantic Relationships: named-relationships connecting information items • their semantic ‘type’ is defined in an ontology • go beyond ‘is-a’ relationship (i.e., class membership) • Have gained interest in the Semantic Web • operators “semantic associations” [AS’03] • discovery and ranking [AHAS’03, AHARS’05, AMS’05] • Relevant in emerging applications: • content analytics – business intelligence • knowledge discovery – national security

  7. Research Problem • How can we exploit semantic relationships of named-entities to improve relevance in search and ranking of documents?

  8. Proposed Solution: Diagram View • Builds upon the following capabilities: • Populated Ontologies • Semantic Annotation • RDF databases • It can be done [ABEPS’05] • Demonstrated with small dataset • Using explicit, named relationships [SRT’05] • Makes it possible to explain why a document is relevant

  9. Research Challenges • Ranking Complex Relationships • Utilization of populated Ontologies • Defining and measuring what is relevant • Addressing Scalability

  10. Proposed Solution: Big Picture • Ranking Complex Relationships [AHAS’03, AHARS’05] • User-defined Context for Document Retrieval [ABEPS’05] • Searching and Ranking Documents based on Semantic Relationships • Large Populated Ontologies [AHSAS’04] • Relevance Measures using Semantic Relationships [ANR+06] (current work)

  11. Goal: Search and Ranking of Documents using Relationships

  12. Ranking Complex Relationships • Association Rank, with components: • Subsumption (Organization, Political Organization, Democratic Political Organization) • Association Length • Context • Trust • Rarity • Popularity
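The slide names the components that feed Association Rank but not how they are combined. The following is a minimal sketch, assuming a normalized weighted sum of per-component scores in [0, 1]; the function name `association_rank`, the weights, and the score values are illustrative assumptions, not taken from the slides.

```python
from typing import Dict

# Component names follow the labels on the slide; everything else is assumed.
COMPONENTS = (
    "subsumption",
    "association_length",
    "context",
    "trust",
    "rarity",
    "popularity",
)


def association_rank(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-component scores (each in [0, 1]) into one value in [0, 1].

    Missing scores and missing weights are treated as 0.
    """
    total_weight = sum(weights.get(name, 0.0) for name in COMPONENTS)
    if total_weight <= 0:
        raise ValueError("at least one component weight must be positive")
    weighted = sum(
        weights.get(name, 0.0) * scores.get(name, 0.0) for name in COMPONENTS
    )
    return weighted / total_weight
```

Setting a weight to zero switches a component off, so a user who only cares about context and trust could pass weights for just those two.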

  13. Populated Ontologies: SWETO • SWETO: Semantic Web Technology Evaluation Ontology [AHSAS’04] • Large scale test-bed ontology containing instances extracted from heterogeneous Web sources • Domain: cs-publications, locations, terrorism • Over 800K entities, 1.5M relationships (version 1.4) • Developed using Freedom toolkit (www.semagix.com)

  14. Defining what is relevant Ultimately, many entities are inter-connected! … Which ones are relevant?

  15. … Defining what is relevant • Relevance is determined by considering: • type of next entity (from ontology) • name of connecting relationship • length of discovered path so far (short paths are preferred) • cumulative relevance score • other properties such as transitivity • user-defined context (if any)
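One way to read the factors listed above is as inputs to a function that updates a cumulative relevance score each time a path is extended by one relationship. The sketch below is an assumption-laden illustration, not the method of the paper: the multiplicative form, the default weight of 0.5, the decay of 0.8, the context boost of 1.25, and all names (`Step`, `extend_score`) are hypothetical. Transitivity is left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Step:
    relationship: str      # name of the connecting relationship
    next_entity_type: str  # ontology class of the entity that would be reached


def extend_score(
    cumulative: float,
    step: Step,
    path_length: int,
    relationship_weight: Dict[str, float],
    type_weight: Dict[str, float],
    context_types: Optional[Set[str]] = None,
    length_decay: float = 0.8,
) -> float:
    """Return the cumulative relevance after adding `step` to a path.

    `path_length` is the number of relationships already on the path.
    All weights are assumed to lie in [0, 1]; unknown names default to 0.5.
    """
    score = cumulative
    score *= relationship_weight.get(step.relationship, 0.5)
    score *= type_weight.get(step.next_entity_type, 0.5)
    # Longer paths are penalized more, so short paths are preferred.
    score *= length_decay ** path_length
    # A user-defined context, if any, boosts entity types the user selected.
    if context_types and step.next_entity_type in context_types:
        score = min(1.0, score * 1.25)
    return score
```

A search procedure could stop extending a path once this value falls below a chosen cut-off.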

  16. … Defining what is relevant • Involves human-defined relevance of specific path segments • The simplest case, a YES/NO question: • Is it relevant to discover entities through a ‘ticker’ relationship? … yes? • Is it relevant to discover entities through an ‘industry focus’ relationship? … no? • [Diagram: a Company linked to x by ‘ticker’ and to y by ‘industry focus’]
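The YES/NO case can be pictured as a lookup table keyed by the ontology class and the relationship name. This is a minimal sketch under that assumption; the table `SEGMENT_RULES`, the function `is_relevant_segment`, and the choice of `False` as the default for unlisted segments are hypothetical, and the two entries simply mirror the tentative answers on the slide.

```python
from typing import Dict, Tuple

# Hypothetical human-provided answers: (ontology class, relationship) -> relevant?
SEGMENT_RULES: Dict[Tuple[str, str], bool] = {
    ("Company", "ticker"): True,
    ("Company", "industry focus"): False,
}


def is_relevant_segment(
    entity_class: str, relationship: str, default: bool = False
) -> bool:
    """Answer the YES/NO question for one path segment.

    Segments nobody has labeled fall back to `default` (an assumption).
    """
    return SEGMENT_RULES.get((entity_class, relationship), default)
```

For example, `is_relevant_segment("Company", "ticker")` returns `True`, while an unlabeled pair such as `("Company", "based at")` returns the default.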

  17. … Measuring what is relevant • Information-loss: measure that defines a cut-off on whether a sequence of relationships is still relevant (extending [MKIS’00]) • [Figure: example graph around Electronic Data Systems (EDS) with the entities Technology Consulting, Fortune 500, Plano, Tina Sivinski and NYSE:EDS, connected by ‘has industry focus’, ‘listed in’, ‘leader of’, ‘based at’ and ‘ticker’ relationships; annotated with the counts 499, (20+) and 7K+]

  18. Preliminary Results • Using human-defined relevance • pruned to 5 relevant paths • naïve method (all paths) • results in over 24K paths (of up to length 5) • [Figure: example graph around Electronic Data Systems (EDS) with the entities Technology Consulting, Fortune 500, Plano, Tina Sivinski and NYSE:EDS, connected by ‘has industry focus’, ‘listed in’, ‘leader of’, ‘based at’ and ‘ticker’ relationships]
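The contrast between the naïve method and the pruned one can be illustrated on a toy graph. The sketch below enumerates all simple paths of up to a given length, with an optional filter on relationship names. The edge list is an assumed reconstruction of the slide's figure (edge directions in particular are guesses), and `build_adjacency`, `enumerate_paths`, and the filter are hypothetical names; the tiny counts this graph produces are unrelated to the figures reported on the slide.

```python
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str, str]  # (subject, relationship, object)

# Assumed toy reconstruction of the slide's example graph.
EDGES: List[Edge] = [
    ("Electronic Data Systems", "ticker", "NYSE:EDS"),
    ("Electronic Data Systems", "listed in", "Fortune 500"),
    ("Electronic Data Systems", "has industry focus", "Technology Consulting"),
    ("Electronic Data Systems", "based at", "Plano"),
    ("Tina Sivinski", "leader of", "Electronic Data Systems"),
]


def build_adjacency(edges: List[Edge]) -> Dict[str, List[Tuple[str, str]]]:
    """Index edges by node, allowing traversal in both directions."""
    adjacency: Dict[str, List[Tuple[str, str]]] = {}
    for subject, relationship, obj in edges:
        adjacency.setdefault(subject, []).append((relationship, obj))
        adjacency.setdefault(obj, []).append((relationship, subject))
    return adjacency


def enumerate_paths(
    adjacency: Dict[str, List[Tuple[str, str]]],
    start: str,
    max_length: int,
    keep: Callable[[str], bool] = lambda relationship: True,
) -> List[List[Tuple[str, str]]]:
    """Return every simple path from `start` with 1..max_length relationships.

    A relationship is followed only if `keep(relationship)` is true; the
    default keeps everything, which corresponds to the naive method.
    """
    results: List[List[Tuple[str, str]]] = []

    def walk(node: str, path: List[Tuple[str, str]], visited: frozenset) -> None:
        if len(path) == max_length:
            return
        for relationship, neighbor in adjacency.get(node, []):
            if neighbor in visited or not keep(relationship):
                continue
            extended = path + [(relationship, neighbor)]
            results.append(extended)
            walk(neighbor, extended, visited | {neighbor})

    walk(start, [], frozenset({start}))
    return results
```

Calling `enumerate_paths(build_adjacency(EDGES), "NYSE:EDS", 5)` gives the naïve result; passing `keep=lambda r: r != "has industry focus"` drops every path that goes through that relationship.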

  19. Outstanding Future Work • Formalize relevance-threshold idea • leading to claim/lemma with proof • Address Scalability Issues • refinement of current indexing techniques • Release of SWETO-DBLP Ontology • enhanced ontology of DBLP data • Comprehensive Evaluations • human-subjects & comparisons with related work

  20. Future Work: Context: why, what, how? • Context → Focused/Personalized Relevance • Context captures the user’s interests in order to provide relevant results • By selecting concepts/relations/entities of the ontology • Will build upon our previous work [AHAS’03, ABEPS’05]

  21. Related Work • Semantic Searching and Ranking of entities on the Semantic Web • Rocha et al. WWW’2004 • Nie et al. WWW’2005 • Guha et al. WWW’2003 • Stojanovic et al. ISWC’2003 • Zhuge et al. WWW’2003

  22. References • [ABEPS’05] B. Aleman-Meza, P. Burns, M. Eavenson, D. Palaniswami, A.P. Sheth: An Ontological Approach to the Document Access Problem of Insider Threat, IEEE ISI-2005 • [ASBPEA’06] B. Aleman-Meza, A.P. Sheth, P. Burns, D. Palaniswami, M. Eavenson, I.B. Arpinar: Semantic Analytics in Intelligence: Applying Semantic Association Discovery to determine Relevance of Heterogeneous Documents, Adv. Topics in Database Research, Vol. 5, 2006 (in print) • [AHAS’03] B. Aleman-Meza, C. Halaschek, I.B. Arpinar, A.P. Sheth: Context-Aware Semantic Association Ranking, First Int’l Workshop on Semantic Web and Databases, September 7-8, 2003 • [AHARS’05] B. Aleman-Meza, C. Halaschek-Wiener, I.B. Arpinar, C. Ramakrishnan, A.P. Sheth: Ranking Complex Relationships on the Semantic Web, IEEE Internet Computing, 9(3):37-44 • [AHSAS’04] B. Aleman-Meza, C. Halaschek, A.P. Sheth, I.B. Arpinar, G. Sannapareddy: SWETO: Large-Scale Semantic Web Test-bed, Int’l Workshop on Ontology in Action, Banff, Canada, 2004 • [AMS’05] K. Anyanwu, A. Maduko, A.P. Sheth: SemRank: Ranking Complex Relationship Search Results on the Semantic Web, WWW’2005 • [AS’03] K. Anyanwu, A.P. Sheth: ρ-Queries: Enabling Querying for Semantic Associations on the Semantic Web, WWW’2003

  23. References • [HAAS’04] C. Halaschek, B. Aleman-Meza, I.B. Arpinar, A.P. Sheth: Discovering and Ranking Semantic Associations over a Large RDF Metabase, VLDB’2004, Toronto, Canada (Demonstration Paper) • [MKIS’00] E. Mena, V. Kashyap, A. Illarramendi, A.P. Sheth: Imprecise Answers in Distributed Environments: Estimation of Information Loss for Multi-Ontology Based Query Processing, Int’l J. Cooperative Information Systems 9(4):403-425, 2000 • [SAK’03] A.P. Sheth, I.B. Arpinar, V. Kashyap: Relationships at the Heart of Semantic Web: Modeling, Discovering and Exploiting Complex Semantic Relationships, Enhancing the Power of the Internet, Studies in Fuzziness and Soft Computing (Nikravesh, Azvin, Yager, Zadeh, eds.) • [SFJMC’02] U. Shah, T. Finin, A. Joshi, J. Mayfield, R.S. Cost: Information Retrieval on the Semantic Web, CIKM 2002 • [SRT’05] A.P. Sheth, C. Ramakrishnan, C. Thomas: Semantics for the Semantic Web: The Implicit, the Formal and the Powerful, Int’l J. Semantic Web Information Systems 1(1):1-18, 2005 • [SS’98] K. Shah, A.P. Sheth: Logical Information Modeling of Web-Accessible Heterogeneous Digital Assets, ADL 1998

  24. Data, demos, more publications at SemDis project web site, http://lsdis.cs.uga.edu/projects/semdis/ • Thank You
