This summary highlights the research presentation from Jaeseok Myung on December 28, 2009, focusing on the iterative MapReduce (MR) technique for efficient SPARQL Basic Graph Pattern (BGP) processing. The discussion includes the implementation challenges using HBase, the rationale for iterative MR, and comparisons between naive and multi-greedy approaches. Experimental results demonstrate performance on Hadoop with datasets from LUBM and complex queries. The conclusion indicates the need for advanced storage structures and indexing strategies to enhance efficiency in processing complex queries.
Research Meeting 2009-12-28 Jaeseok Myung
Summary • Classes (grade entry) • Undergraduate graduation theses (이승재, 김홍찬) • Seoul National University mentoring in progress • Research • SPARQL BGP Processing with Iterative MR • Implementation: HBase • WAIM 2010 (1/29), VLDB 2010 (3/9) • How MR works for triples? • Why do we need iterative MRs? Center for E-Business Technology
Outline • Introduction • Related Work • BGP Processing with MR • MR Iteration (why MR iterations arise during joins, N-Triple storage structure) • Naïve Approach (Single-Random) • Our Approach • Multi-Greedy Algorithm • Discussion (edge preserving, performance by type, key selection) • Experiments • Environmental Settings (Hadoop, LUBM, Complex Query, Amazon EC2, Converter) • SPARQL Processing Results (varying number of nodes, varying data size) • Dealing with Intermediate Result (intermediate file I/O cost is high; CGL-MR, MR-Online) • Conclusion (further research needed on storage structures that are richer than N-Triple and compressible, and on indexing) • Reference
MapReduce 한재선, SearchDay2008, http://nexr.tistory.com
How MR works for triples? (1/2) SELECT ?a ?b WHERE { ?a dbpedia:spouse ?b . ?a dbpedia:wikilink dbpediares:actor . ?b dbpedia:wikilink dbpediares:actor . ?a dbpedia:placeOfBirth ?c . ?b dbpedia:placeOfBirth ?c } Actors who are married to each other and born in the same place. [Figure: the Mapper reads the triples matching patterns 1 to 5 (spouse, link, place) and emits each one keyed by its bound value: a1 → (1), (2), (4); b1 → (1), (3), (5); c1 → (4), (5).]
How MR works for triples? (2/2) SELECT ?a ?b WHERE { ?a dbpedia:spouse ?b . ?a dbpedia:wikilink dbpediares:actor . ?b dbpedia:wikilink dbpediares:actor . ?a dbpedia:placeOfBirth ?c . ?b dbpedia:placeOfBirth ?c } Actors who are married to each other and born in the same place. [Figure: the Reducer receives the triples grouped by key: a1 with patterns (1, 2, 4) (a1 spouse b1, a1 link actor, a1 place c1); b1 with patterns (1, 3, 5) (a1 spouse b1, b1 link actor, b1 place c1); c1 with patterns (4, 5) (a1 place c1, b1 place c1).]
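The two slides above can be sketched as a single map/reduce pass in plain Python. This is only an illustration under stated assumptions, not the presenter's Hadoop/HBase implementation: the toy triples, the helper names (`match`, `map_triple`, `reduce_groups`, `surviving_keys`), and the choice to key on a (variable, value) pair are all hypothetical. Keying on the pair rather than on the value alone keeps a1-as-?a separate from a1-as-?b, which the slide figure leaves implicit.

```python
from collections import defaultdict

# Triple patterns of the query, numbered as on the slides.
PATTERNS = {
    1: ("?a", "dbpedia:spouse", "?b"),
    2: ("?a", "dbpedia:wikilink", "dbpediares:actor"),
    3: ("?b", "dbpedia:wikilink", "dbpediares:actor"),
    4: ("?a", "dbpedia:placeOfBirth", "?c"),
    5: ("?b", "dbpedia:placeOfBirth", "?c"),
}

# Hypothetical toy data mirroring the a1/b1/c1 example in the figure.
TRIPLES = [
    ("a1", "dbpedia:spouse", "b1"),
    ("a1", "dbpedia:wikilink", "dbpediares:actor"),
    ("b1", "dbpedia:wikilink", "dbpediares:actor"),
    ("a1", "dbpedia:placeOfBirth", "c1"),
    ("b1", "dbpedia:placeOfBirth", "c1"),
]

def match(pattern, triple):
    """Return the variable bindings if the triple matches the pattern, else None."""
    bindings = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if bindings.get(p, t) != t:  # same variable bound to two values
                return None
            bindings[p] = t
        elif p != t:
            return None
    return bindings

def map_triple(triple):
    """Map step: emit ((variable, value), pattern number) for each matching pattern."""
    for number, pattern in PATTERNS.items():
        bindings = match(pattern, triple)
        if bindings is not None:
            for variable, value in bindings.items():
                yield (variable, value), number

def reduce_groups(pairs):
    """Shuffle + reduce step: collect the pattern numbers seen for each key."""
    groups = defaultdict(set)
    for key, number in pairs:
        groups[key].add(number)
    return {key: sorted(numbers) for key, numbers in groups.items()}

def surviving_keys(groups):
    """Keep a key only if it matched every pattern that mentions its variable."""
    def patterns_with(variable):
        return sorted(n for n, p in PATTERNS.items() if variable in p)
    return {k: v for k, v in groups.items() if v == patterns_with(k[0])}

pairs = [pair for triple in TRIPLES for pair in map_triple(triple)]
groups = reduce_groups(pairs)
candidates = surviving_keys(groups)
```

With this toy data, `groups` holds the three groups drawn on the slide plus two partial ones, ("?a", "b1") and ("?b", "a1"), which `surviving_keys` drops because they miss pattern 1.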
Why do we need iterative MR? SELECT ?a ?b WHERE { ?a dbpedia:spouse ?b . ?a dbpedia:wikilink dbpediares:actor . ?b dbpedia:wikilink dbpediares:actor . ?a dbpedia:placeOfBirth ?c . ?b dbpedia:placeOfBirth ?c } Actors who are married to each other and born in the same place. [Figure: the five triple patterns drawn as a graph, each labeled with its variables (1: a|b, 2: a, 3: b, 4: a|c, 5: b|c), next to the reducer groups a1 → (1, 2, 4), b1 → (1, 3, 5), c1 → (4, 5).]
Why do we need iterative MR? SELECT ?a ?b WHERE { ?a dbpedia:spouse ?b . ?a dbpedia:wikilink dbpediares:actor . ?b dbpedia:wikilink dbpediares:actor . ?a dbpedia:placeOfBirth ?c . ?b dbpedia:placeOfBirth ?c } Actors who are married to each other and born in the same place. [Figure: the query graph with patterns labeled by their variables (1: a|b, 2: a, 3: b, 4: a|c, 5: b|c), followed by panels (a) to (d) showing further example pattern graphs; the surviving labels include the chains a|b, b|c; a|b, b|c, c|d; and a|b, b|c, c|d, d|e, and six patterns a|b, a|c, a|d, a|e, a|f, a|g that all share a.]
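One way to see why a single MR job is not enough: a job groups its input on one key, so it can join only the patterns that share that key's variable, and the query above has three join variables (a, b, c). The sketch below counts jobs under a simple greedy rule: each iteration joins on the variable shared by the most remaining relations. This is an assumption made for illustration. It is not the Multi-Greedy algorithm named in the outline, whose details these slides do not give, and `greedy_join_order` is a hypothetical name.

```python
def greedy_join_order(relations):
    """Return the join variable chosen in each MR iteration.

    relations: iterable of sets of variable names, one set per triple pattern.
    Each iteration joins every relation containing the chosen variable into
    one intermediate relation, as one MR job keyed on that variable would.
    """
    relations = [frozenset(r) for r in relations]
    chosen = []
    while len(relations) > 1:
        variables = sorted(set().union(*relations))
        # Greedy choice: the variable appearing in the most relations
        # (ties broken alphabetically, since max keeps the first maximum).
        best = max(variables, key=lambda v: sum(v in r for r in relations))
        joined = [r for r in relations if best in r]
        if len(joined) < 2:
            raise ValueError("no shared variable left; query graph is disconnected")
        merged = frozenset().union(*joined)
        relations = [r for r in relations if best not in r] + [merged]
        chosen.append(best)
    return chosen

# Patterns 1 to 5 of the query, written as their variable sets.
query = [{"a", "b"}, {"a"}, {"b"}, {"a", "c"}, {"b", "c"}]
chain = [{"a", "b"}, {"b", "c"}, {"c", "d"}, {"d", "e"}]
star = [{"a", "b"}, {"a", "c"}, {"a", "d"}, {"a", "e"}, {"a", "f"}, {"a", "g"}]

query_order = greedy_join_order(query)
chain_order = greedy_join_order(chain)
star_order = greedy_join_order(star)
```

Under this rule the example query needs two iterations (join on a, then on b), the four-pattern chain needs three, and the star needs one, so the shape of the pattern graph determines how many MR jobs a query costs.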
Naïve vs. Our Approach • Write-up in progress
Outline • Introduction • Related Work • Preliminaries • BGP Processing with MR • MR Iteration (why MR iterations arise during joins, N-Triple storage structure) • Naïve Approach (Single-Random) • Our Approach • Multi-Greedy Algorithm • Improvement • Using Advanced Storage for Selection Task • Using Selectivity Info. for Minimizing BGP Iteration • Discussion (edge preserving, performance by type, key selection) • Experiments • Environmental Settings (Hadoop, LUBM, Complex Query, Amazon EC2, Converter) • SPARQL Processing Results (varying number of nodes, varying data size) • Dealing with Intermediate Result (intermediate file I/O cost is high; CGL-MR, MR-Online) • Conclusion (further research needed on storage structures that are richer than N-Triple and compressible, and on indexing) • Reference