
Research Meeting

Research Meeting. 2009-10-22. Jaeseok Myung. Summary. TA: DB (project 3, midterm taken by 24 students); WEC (report, Android project, classroom, lecture by Director 정재목). Research: DESWeb 2010, the 1st International Workshop on Data Engineering meets the Semantic Web, in conjunction with ICDE 2010.


Presentation Transcript


  1. Research Meeting, 2009-10-22, Jaeseok Myung

  2. Summary
     • TA
       • DB: project 3, midterm (24 students took the exam)
       • WEC: report, project (Android), classroom, lecture (Director 정재목)
     • Research
       • DESWeb 2010
         • 1st International Workshop on Data Engineering meets the Semantic Web, in conjunction with ICDE 2010
         • Submission: Nov 15th, 6 pages
       • Draft the paper outline
       • Convert the LUBM data and select complex queries
     Center for E-Business Technology

  3. SPARQL Basic Graph Pattern Processing with Iterative MapReduce
     • Abstract
       • In this paper, we propose an iterative MapReduce (MR) algorithm for processing SPARQL Basic Graph Patterns (BGPs). A BGP typically contains many self-joins, but MR's shared-nothing architecture makes such joins difficult to process within the MR framework. In other words, an expensive MR iteration is needed to join two graph patterns on their shared key. For this reason, we suggest an algorithm that reduces the number of MR iterations, and we evaluate it with the Lehigh University Benchmark (LUBM). Our experiments run on physically separated RDF storage and a parallel data processing framework, and the results show that the algorithm provides scalable access to large RDF data.
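To make the cost described in the abstract concrete, here is a minimal in-memory sketch of a single MR iteration that joins two triple patterns on their shared variable. It is not the algorithm from the paper. The sample triples, the function names, and the two patterns (`?x rdf:type ub:GraduateStudent` and `?x ub:advisor ?y`) are assumptions chosen for illustration, and plain Python stands in for Hadoop.

```python
from collections import defaultdict
from itertools import product

# Toy triples, invented for illustration (not taken from LUBM output).
TRIPLES = [
    ("alice", "rdf:type", "ub:GraduateStudent"),
    ("bob", "rdf:type", "ub:GraduateStudent"),
    ("alice", "ub:advisor", "prof1"),
    ("bob", "ub:advisor", "prof2"),
    ("carol", "ub:advisor", "prof1"),
]


def map_phase(triples):
    """Emit (join key, tagged value) for the two patterns
    ?x rdf:type ub:GraduateStudent  and  ?x ub:advisor ?y."""
    for s, p, o in triples:
        if p == "rdf:type" and o == "ub:GraduateStudent":
            yield s, ("type", None)
        elif p == "ub:advisor":
            yield s, ("advisor", o)


def shuffle(pairs):
    """Group mapper output by key, as the MR framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Join both sides that arrived under the same binding of ?x."""
    for x, values in groups.items():
        left = [v for tag, v in values if tag == "type"]
        right = [v for tag, v in values if tag == "advisor"]
        for _, y in product(left, right):
            yield {"x": x, "y": y}


def run_iteration(triples):
    """One full map, shuffle, reduce pass; sorted for stable output."""
    results = reduce_phase(shuffle(map_phase(triples)))
    return sorted(results, key=lambda b: b["x"])
```

Calling `run_iteration(TRIPLES)` yields bindings for alice and bob only, because carol has no matching `rdf:type` triple. Every additional join variable in the BGP would need another pass of this kind over the intermediate results, which is the cost the abstract refers to.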

  4. Outline
     • Introduction
     • Related Work
     • BGP Processing with MR
       • MR Iteration (why joins trigger MR iterations, N-Triples storage layout)
       • Naïve Approach (Single-Random)
     • Our Approach
       • Multi-Greedy Algorithm
       • Discussion (edge preserving, performance by type, key selection)
     • Experiments
       • Environmental Settings (Hadoop, LUBM, Complex Query, Amazon EC2, Converter)
       • SPARQL Processing Results (varying the number of nodes, varying the data size)
       • Dealing with Intermediate Result (intermediate file I/O is costly, CGL-MR)
     • Conclusion (storage structures that are richer than N-Triples and compressible, and indexing, need further research)
     • Reference

  5. Outline2
     • Introduction
     • Related Work
     • BGP Processing with MR
       • MR Iteration (why joins trigger MR iterations, N-Triples storage layout)
       • Naïve Approach (Single-Point Random Selection)
       • Multi-point Greedy Selection Algorithm
     • Experiments
       • Environmental Settings (Hadoop, LUBM, Complex Query, Amazon EC2, Converter)
       • SPARQL Processing Results (varying the number of nodes, varying the data size)
     • Discussion
       • Discussion (edge preserving, performance by type, key selection)
       • Dealing with Intermediate Result (intermediate file I/O is costly, CGL-MR)
     • Conclusion (storage structures that are richer than N-Triples and compressible, and indexing, need further research)
     • Reference

  6. Introduction (1/2)
     • SPARQL is a W3C recommendation for querying RDF data
       • Explain that SPARQL is important for making use of RDF, and that the BGP is the basis of SPARQL pattern matching
     • SPARQL BGP processing is difficult, because a BGP may contain a significant number of self-joins, which are expensive
     • Much research has been conducted from the perspective of a single-machine triplestore
     • However, some tasks may need multiple machines and federated query processing techniques
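The self-join point can be illustrated with a small sketch: when all triples live in one table, each extra triple pattern in a BGP is one more join of that table against the bindings found so far. The matcher below is a naive nested-loop evaluator written only for illustration. The triples, the query, and the function names are assumptions, not part of the slides.

```python
# Toy triple table, invented for illustration.
TRIPLES = [
    ("s1", "rdf:type", "ub:GraduateStudent"),
    ("s1", "ub:advisor", "p1"),
    ("p1", "ub:worksFor", "d1"),
    ("s2", "rdf:type", "ub:GraduateStudent"),
    ("s2", "ub:advisor", "p2"),
]

# A BGP with three triple patterns, hence two self-joins of the table.
BGP = [
    ("?x", "rdf:type", "ub:GraduateStudent"),
    ("?x", "ub:advisor", "?y"),
    ("?y", "ub:worksFor", "?d"),
]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`.

    Returns the extended binding, or None if a constant or an
    already-bound variable disagrees with the triple."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def evaluate_bgp(patterns, triples):
    """Join the triple table with itself once per pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings
```

`evaluate_bgp(BGP, TRIPLES)` returns a single solution binding `?x`, `?y`, and `?d`; `s2` drops out at the third pattern because `p2` has no `ub:worksFor` triple. On one machine the nested loops are replaced by indexes, but the number of joins still grows with the number of patterns.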

  7. Introduction (2/2)
     • MR is a distributed, parallel data processing framework that is good at large-scale data analysis
     • Unfortunately, MR has not been considered the best option for join operations, which are inherent in graph pattern matching algorithms
       • This is because MR is heterogeneous and shared-nothing
     • Some researchers have employed iterative MR, but each iteration is expensive
     • In this paper, we propose an algorithm that reduces the number of MR iterations for BGP processing
     • The rest of the paper is organized as follows
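The slides name a multi-point greedy selection but do not spell it out, so the following is only a toy model of why the choice of join keys can change the iteration count; it should not be read as the paper's algorithm. The assumptions are: each pattern is reduced to its set of variables, one iteration may join on one key (single-key mode) or on several keys at once as long as no partial result takes part in two joins (multi-key mode), and keys are ordered deterministically rather than randomly. All names are hypothetical.

```python
def shared_variables(fragments):
    """Variables that occur in at least two partial results."""
    counts = {}
    for fragment in fragments:
        for var in fragment:
            counts[var] = counts.get(var, 0) + 1
    return {var: n for var, n in counts.items() if n >= 2}


def merge_on(fragments, keys):
    """Merge the fragments that contain each key.

    Assumes the keys touch disjoint groups of fragments."""
    merged = {key: set() for key in keys}
    untouched = []
    for fragment in fragments:
        owner = next((key for key in keys if key in fragment), None)
        if owner is None:
            untouched.append(fragment)
        else:
            merged[owner] |= fragment
    return untouched + [frozenset(vars_) for vars_ in merged.values()]


def count_iterations(patterns, multi_key):
    """Count simulated MR iterations until no join variable is left."""
    fragments = [frozenset(p) for p in patterns]
    iterations = 0
    while True:
        shared = shared_variables(fragments)
        if not shared:
            return iterations
        # Most widely shared variable first, name as tie-breaker.
        ordered = sorted(shared, key=lambda v: (-shared[v], v))
        chosen, claimed = [], set()
        for key in ordered:
            touched = {i for i, f in enumerate(fragments) if key in f}
            if touched.isdisjoint(claimed):
                chosen.append(key)
                claimed |= touched
            if not multi_key:
                break  # single-key mode: one join key per iteration
        fragments = merge_on(fragments, chosen)
        iterations += 1


# A chain-shaped BGP: four patterns linked by ?b, ?c, ?d.
CHAIN = [{"?a", "?b"}, {"?b", "?c"}, {"?c", "?d"}, {"?d", "?e"}]
```

Under these assumptions, `count_iterations(CHAIN, multi_key=False)` needs three passes, while `count_iterations(CHAIN, multi_key=True)` needs two, because `?b` and `?d` can be joined in the same pass. Real savings would depend on the query shape and on how intermediate results are stored.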

  8. Related Work
     • SPARQL Processing
       • BGP, Join (single machine), Triplestore
     • Data Processing with MR
       • Google, Hadoop, Hive, Pig
       • PDBMS vs. MR
     • Federated SPARQL Processing
       • DARQ, YARS2, Virtuoso, …
     • SPARQL processing with MR is a new approach, but it builds on the research listed above

  9. An Example of BGPs
     [Figure: a BGP drawn as a query graph over LUBM terms; the edge layout is not recoverable from the transcript.]
     • Classes: ub:Faculty, ub:Chair, ub:GraduateStudent, ub:Lecturer, ub:Course, ub:Department, ub:Person
     • Predicates: rdf:type, ub:advisor, ub:publicationAuthor, ub:teacherOf, ub:memberOf, ub:worksFor, ub:takesCourse, ub:hasAlumnus, ub:subOrganizationOf
     • Variables: ?j1, ?x, ?d1, ?n1, ?p1, ?j3, ?y, ?m3, ?o1, ?l1

  10. Reference
     • M. Stocker et al., SPARQL Basic Graph Pattern Optimization Using Selectivity Estimation, WWW 2008
     • C. Weiss et al., Hexastore: Sextuple Indexing for Semantic Web Data Management, VLDB 2008
     • D. J. Abadi et al., SW-Store: a vertically partitioned DBMS for Semantic Web data management, VLDB Journal 2009
     • T. Neumann et al., Scalable Join Processing on Very Large RDF Graphs, SIGMOD 2009
     • H. Yang et al., Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters, SIGMOD 2007
     • A. Pavlo et al., A Comparison of Approaches to Large-Scale Data Analysis, SIGMOD 2009
     • A. Abouzeid et al., HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads, VLDB 2009
     • C. Olston et al., Pig Latin: A Not-So-Foreign Language for Data Processing, SIGMOD 2008
     • J. Ekanayake et al., MapReduce for Data Intensive Scientific Analyses, ESCIENCE 2008
     • J. Cohen, Graph Twiddling in a MapReduce World, CISE 2009
     • B. Quilitz et al., Querying Distributed RDF Data Sources with SPARQL, ESWC 2008
     • A. Harth et al., YARS2: A Federated Repository for Querying Graph Structured Data from the Web, ISWC 2007
