Efficient Processing of Semantic Information on the Web
Georg Lausen
Technische Fakultät, Universität Freiburg
Processing of Semantic Information on the Web
• The amount of information available on the Web is still increasing rapidly.
• (Semi-)automatic data extraction.
• Resource Description Framework (RDF).
• SPARQL is the standard query language for RDF.
• Efficiency and scalability of query processing.
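As a rough illustration of the RDF and SPARQL bullets, the sketch below models RDF data as (subject, predicate, object) triples and evaluates a single triple pattern against them. It is a toy in plain Python, not a SPARQL engine; the `ex:` names and the `match` helper are made up for this example.

```python
# Toy illustration only: RDF data as (subject, predicate, object) triples.
# The ex: names are hypothetical and not taken from the slides.
TRIPLES = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:alice", "ex:name", "Alice"),
]


def match(pattern, triples):
    """Return one variable binding (a dict) per triple that matches the pattern.

    Terms starting with '?' are variables; any other term must match exactly.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable that occurs twice must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # No mismatch found: the whole triple matches the pattern.
            results.append(binding)
    return results


# Roughly corresponds to: SELECT ?s ?o WHERE { ?s ex:knows ?o }
knows = match(("?s", "ex:knows", "?o"), TRIPLES)
```

A real SPARQL query combines several such patterns through joins on shared variables, which is where efficiency and scalability become the central concern.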
Efficiency and Scalability: A Variety of Approaches
• Single-machine RDF stores
• Parallel database approach: Vertica and others
• Approaches based on Hadoop (MapReduce paradigm)
• Hadoop
• Hadoop++
• Integration of databases: HadoopDB
• Language translation
• Mapping SPARQL to Hadoop/HBase directly
• Mapping SPARQL to Pig Latin
• Non-Hadoop clusters
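To make the MapReduce-based approaches more concrete, here is a minimal single-process sketch of a reduce-side join, one common way to evaluate two triple patterns that share a variable. It simulates the map, shuffle, and reduce phases in plain Python and does not use Hadoop itself; the data, the function names, and the choice of join strategy are assumptions for illustration, not the method of any system listed above.

```python
from collections import defaultdict

# Hypothetical example data, not taken from the slides.
TRIPLES = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:carol", "ex:knows", "ex:dave"),
]

# Query sketch: { ?x ex:knows ?y . ?y ex:name ?n }, joined on ?y.


def map_phase(triples):
    """Emit (join_key, tagged_value) pairs, keyed on the shared variable ?y."""
    for s, p, o in triples:
        if p == "ex:knows":    # pattern 1: ?y is the object
            yield o, ("knows", s)
        elif p == "ex:name":   # pattern 2: ?y is the subject
            yield s, ("name", o)


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """For each ?y, combine every ?x from pattern 1 with every ?n from pattern 2."""
    for y, values in groups.items():
        xs = [v for tag, v in values if tag == "knows"]
        ns = [v for tag, v in values if tag == "name"]
        for x in xs:
            for n in ns:
                yield {"?x": x, "?y": y, "?n": n}


results = list(reduce_phase(shuffle(map_phase(TRIPLES))))
```

Keys that receive values from only one of the two patterns (here `ex:dave`) produce no output, which gives the join its filtering behaviour.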
Cluster-based Parallelism vs. Parallel Database/Single-Machine RDF Store
Each technology has its own advantages and problems.
Rough characterization:
Loading in the context of Web research: Extract-Transform-Load (ETL) schema.
SPARQL provides a declarative way of specifying both the transformation and the querying.
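The idea of a declarative transformation step can be sketched as follows: a rule states which triples to select and what shape the output should take, in the spirit of a SPARQL CONSTRUCT query. This is a deliberately simplified Python stand-in; the `raw:` and `dc:` predicate names, the sample data, and the `construct` helper are all hypothetical.

```python
# Hypothetical "extracted" triples, standing in for the result of the E step.
RAW = [
    ("doc:1", "raw:author", "A. Author"),
    ("doc:1", "raw:title", "Some Title"),
    ("doc:2", "raw:author", "B. Writer"),
]


def construct(triples, where_predicate, template_predicate):
    """Rewrite every triple that uses where_predicate into template_predicate.

    Only the rule (which predicate maps to which) is supplied by the caller;
    how the triples are scanned is left to the implementation.
    """
    return [
        (s, template_predicate, o)
        for s, p, o in triples
        if p == where_predicate
    ]


# Roughly corresponds to:
#   CONSTRUCT { ?d dc:creator ?a } WHERE { ?d raw:author ?a }
transformed = construct(RAW, "raw:author", "dc:creator")
```

Because the rule says nothing about execution, the same specification could in principle be evaluated on a single machine or on a cluster.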
ETL and Querying in the Context of Web Research
[Figure: ETL pipeline; labels: Web documents, E, Initial RDF graph, T, L, RDF store, Efficient loading, Efficient querying, SPARQL]
PigSPARQL: Mapping SPARQL to Pig Latin; to appear in Semantic Web Information Management (SWIM 2011)