Explore key research areas in data warehousing, including lineage tracing, incremental view maintenance, indexing, and data quality. This overview highlights several pivotal studies such as lineage tracing frameworks, metadata models, and query language applications. Dive into diverse topics like practical lineage tracing, data integration methods, bitmap index design, and proactive data quality management techniques. This curated list serves as a foundation for exploring contemporary challenges and advancements in data warehousing environments.
Research topics in data warehouse • Directed by: Dr. Rahgozar • Mostafa H. Chehreghani
List of research topics • Lineage tracing • Incremental view maintenance • Indexing in data warehouse • Data quality
Lineage tracing • List of papers : • Using AutoMed Metadata in Data Warehousing Environments • A Tutorial on the IQL Query Language • Practical Lineage Tracing in Data Warehouses • Incremental view maintenance and data lineage tracing in heterogeneous database environments • A Framework for supporting data integration using the materialized and virtual approaches
Lineage tracing • AutoMed: a model for metadata in data warehouses • Uses tags for relations • Uses a query language such as IQL • Constructs: Node, Edge, Constraint • IQL: • Functional and typed language • Prefix and infix functions • New functions defined with lambda • lambda {x,y,z} ((*) ((+) x y) z)
IQL • let v = q1 in q2 • let v = ((+) 200 500) in ((*) v v) • union: R ++ S • duplicate elimination: distinct (R) • setUnion R S ≡ distinct (R ++ S) • difference: R -- S • projection: [{x,z} | {x,y,z} <- R] • Cartesian product and joins • gc agFun xs • map f xs • Grouping and aggregation operations
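The IQL operators listed above can be mimicked in Python to make their behaviour concrete. This is only a sketch under stated assumptions: bags are modelled as Python lists, `--` is assumed to remove one occurrence per match (bag difference), and `gc` is assumed to group a bag of (key, value) pairs by key and apply the aggregation function to each group's values. All function names are illustrative and are not part of IQL or AutoMed.

```python
from collections import Counter
from operator import add, mul


def lambda_example(x, y, z):
    """lambda {x,y,z} ((*) ((+) x y) z)"""
    return mul(add(x, y), z)


def let_example():
    """let v = ((+) 200 500) in ((*) v v)"""
    v = add(200, 500)
    return mul(v, v)


def bag_union(r, s):
    """R ++ S: concatenation, duplicates are kept."""
    return r + s


def distinct(r):
    """distinct (R): drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(r))


def set_union(r, s):
    """setUnion R S, defined as distinct (R ++ S)."""
    return distinct(bag_union(r, s))


def bag_difference(r, s):
    """R -- S: each element of S cancels at most one equal element of R."""
    remaining = Counter(s)
    out = []
    for item in r:
        if remaining[item] > 0:
            remaining[item] -= 1
        else:
            out.append(item)
    return out


def project_xz(r):
    """[{x,z} | {x,y,z} <- R]"""
    return [(x, z) for (x, y, z) in r]


def gc(ag_fun, xs):
    """Group (key, value) pairs by key, then aggregate each group's values."""
    groups = {}
    for key, value in xs:
        groups.setdefault(key, []).append(value)
    return [(key, ag_fun(values)) for key, values in groups.items()]
```

For example, `gc(sum, [("a", 1), ("b", 2), ("a", 3)])` groups the pairs by their first component and sums each group.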
Using IQL in AutoMed • Example: enforce a unique key constraint: (=) (count (distinct [n | {s,n} <- <<Student,name>>])) (count <<Student>>) • name: field • Student: table
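The unique key constraint above compares the number of distinct names with the number of students. A minimal Python sketch of the same check follows, assuming the `<<Student,name>>` extent is available as a list of (student, name) pairs and `<<Student>>` as a list of students. The function name and sample data are hypothetical.

```python
def name_is_unique_key(student_name_pairs, students):
    """Mirror of: (=) (count (distinct [n | {s,n} <- <<Student,name>>]))
                      (count <<Student>>)"""
    names = [n for (s, n) in student_name_pairs]
    # The constraint holds when every student has a different name.
    return len(set(names)) == len(students)


# Hypothetical sample extents.
students = [1, 2, 3]
unique_names = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
clashing_names = [(1, "Ann"), (2, "Ann"), (3, "Cy")]
```

With these extents, the check succeeds for `unique_names` and fails for `clashing_names`, because two students share the name "Ann".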
Example of lineage tracing • T_{S1,S2} = • addNode («dept», {"Maths", "CompSci"}); • addNode («person», [x | x <- «mathematician»] ++ [x | x <- «compScientist»]); • addNode («avgDeptSalary», {avg [s | (m,s) <- «_, mathematician, salary»]} ++ {avg [s | (c,s) <- «_, compScientist, salary»]}); • addEdge («_, dept, person», [("Maths", x) | x <- «mathematician»] ++ [("CompSci", x) | x <- «compScientist»]); • addEdge («_, person, salary», «_, mathematician, salary» ++ «_, compScientist, salary»); • addEdge («_, dept, avgDeptSalary», {("Maths", avg [s | (m,s) <- «_, mathematician, salary»]), ("CompSci", avg [s | (c,s) <- «_, compScientist, salary»])});
Example of lineage tracing (continued) • delEdge («_, mathematician, salary», [(p, s) | (d, p) <- «_, dept, person»; (p', s) <- «_, person, salary»; d = "Maths"; p = p']); • delEdge («_, compScientist, salary», [(p, s) | (d, p) <- «_, dept, person»; (p', s) <- «_, person, salary»; d = "CompSci"; p = p']); • delNode («mathematician», [p | (d, p) <- «_, dept, person»; d = "Maths"]); • delNode («compScientist», [p | (d, p) <- «_, dept, person»; d = "CompSci"]);
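To see what the transformation steps above compute, the sketch below evaluates the add queries over a small source extent and then evaluates the delete queries over the result, which should give back the original source constructs. It is an illustration only: the dictionary layout, the function names, and the sample people and salaries are assumptions and do not come from AutoMed.

```python
from statistics import mean

# Hypothetical source extents (schema S1); names and salaries are made up.
SOURCE = {
    "mathematician": ["ann", "bob"],
    "compScientist": ["cy"],
    "mathematician_salary": [("ann", 100), ("bob", 200)],
    "compScientist_salary": [("cy", 300)],
}


def to_target(src):
    """Evaluate the addNode/addEdge queries to populate schema S2."""
    maths_avg = mean([s for (_, s) in src["mathematician_salary"]])
    cs_avg = mean([s for (_, s) in src["compScientist_salary"]])
    return {
        "dept": ["Maths", "CompSci"],
        "person": src["mathematician"] + src["compScientist"],
        "avgDeptSalary": [maths_avg, cs_avg],
        "dept_person": [("Maths", x) for x in src["mathematician"]]
        + [("CompSci", x) for x in src["compScientist"]],
        "person_salary": src["mathematician_salary"] + src["compScientist_salary"],
        "dept_avgDeptSalary": [("Maths", maths_avg), ("CompSci", cs_avg)],
    }


def recover_source(tgt):
    """Evaluate the delEdge/delNode queries over the S2 extents."""

    def people(dept):
        return [p for (d, p) in tgt["dept_person"] if d == dept]

    def salaries(dept):
        return [
            (p, s)
            for (d, p) in tgt["dept_person"]
            for (p2, s) in tgt["person_salary"]
            if d == dept and p == p2
        ]

    return {
        "mathematician": people("Maths"),
        "compScientist": people("CompSci"),
        "mathematician_salary": salaries("Maths"),
        "compScientist_salary": salaries("CompSci"),
    }
```

Calling `recover_source(to_target(SOURCE))` returns the source extents again, which is the property that lets the delete queries express each removed construct in terms of the target schema.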
Incremental view maintenance • List of papers • Incremental view maintenance and data lineage tracing in heterogeneous database environments • View maintenance in a warehousing environment • A System Prototype for Warehouse View Maintenance
Incremental view maintenance • Di : base relations • ΔDi : bag of tuples inserted into Di • ∇Di : bag of tuples deleted from Di • V : materialized view • ΔV : bag of tuples inserted into V • ∇V : bag of tuples deleted from V • Vnew = (V ++ ΔV) -- ∇V • Minimality conditions: • ∇V ⊆ V • ΔV ∩ ∇V = Ø
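The update rule above (the new view is the old view plus the inserted bag, minus the deleted bag) maps directly onto Python's `collections.Counter`, which supports bag addition, subtraction, and intersection. The sketch assumes the minimality conditions mean that the deleted bag is contained in V and that the inserted and deleted bags share no tuples. Function and variable names are illustrative.

```python
from collections import Counter


def apply_delta(view, inserted, deleted):
    """Vnew = (V ++ inserted) -- deleted, with bags as Counters."""
    # Counter addition sums multiplicities; subtraction drops counts <= 0.
    return (view + inserted) - deleted


def is_minimal(view, inserted, deleted):
    """Check the two minimality conditions on the delta bags."""
    deleted_in_view = all(view[t] >= n for t, n in deleted.items())
    # Counter '&' takes the minimum multiplicity, i.e. bag intersection.
    disjoint = not (inserted & deleted)
    return deleted_in_view and disjoint


# Hypothetical view contents and deltas.
view = Counter({"a": 2, "b": 1})
minimal_ins, minimal_del = Counter({"c": 1}), Counter({"a": 1})
redundant_ins, redundant_del = Counter({"a": 1}), Counter({"a": 1})
```

The redundant pair inserts and deletes the same tuple, so it leaves the view unchanged but fails the minimality check; the minimal pair passes it.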
Indexing in data warehouse • Paper • Bitmap Index Design and Evaluation • Advantages : • Compact size • Efficient hardware support for bitmap operations (AND, OR, XOR, NOT) • Fast search
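A simple equality-encoded bitmap index shows why bitwise operations make searches fast. The sketch below is a toy illustration, not the design evaluated in the cited paper: each distinct column value gets one bitmap stored as a Python integer, with bit i set when row i holds that value. The column data and helper names are hypothetical.

```python
def build_bitmap_index(values):
    """Map each distinct value to an integer whose bit i marks row i."""
    index = {}
    for row, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def bitmap_not(bitmap, row_count):
    """NOT restricted to the table's rows (mask off the unused high bits)."""
    return ~bitmap & ((1 << row_count) - 1)


def rows(bitmap):
    """List the row numbers whose bits are set."""
    result, row = [], 0
    while bitmap:
        if bitmap & 1:
            result.append(row)
        bitmap >>= 1
        row += 1
    return result


# Hypothetical four-row table with two indexed columns.
region = build_bitmap_index(["N", "S", "N", "E"])
status = build_bitmap_index(["open", "open", "closed", "open"])

north_and_open = rows(region["N"] & status["open"])    # AND
north_or_east = rows(region["N"] | region["E"])        # OR
not_open = rows(bitmap_not(status["open"], 4))         # NOT
```

Each predicate is answered with a single bitwise operation per pair of bitmaps, and only the final result is decoded back into row numbers.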
Data quality in data warehouse • List of papers • Towards Quality-Oriented Data Warehouse Usage and Evolution • Data Quality Problems and Proactive Data Quality Management in Data-Warehouse-Systems • Data Warehouse Data Policy • Fitness for use • Subjective: related to end users • Objective: determined by the system definition • Models: • GQM: Goal Question Metric
GQM • Goal factors • The importance of each factor is determined with respect to the goal • Quality dimensions: • Data coherence • Data completeness • Data freshness
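One way to read "importance with respect to the goal" is as a weight per quality dimension. The sketch below is an assumption-laden illustration, not a method taken from the cited papers: completeness is measured as the fraction of non-missing field values, and an overall score is the weighted average of per-dimension scores. The metric definitions, weights, and sample rows are all hypothetical.

```python
def completeness(rows, fields):
    """Fraction of field values that are present (not None)."""
    total = len(rows) * len(fields)
    present = sum(1 for row in rows for f in fields if row.get(f) is not None)
    return present / total if total else 1.0


def weighted_quality(scores, weights):
    """Weighted average of dimension scores; weights express goal importance."""
    total_weight = sum(weights[d] for d in scores)
    return sum(scores[d] * weights[d] for d in scores) / total_weight


# Hypothetical data and goal-specific weights.
sample_rows = [{"id": 1, "city": None}, {"id": 2, "city": "Paris"}]
scores = {"completeness": completeness(sample_rows, ["id", "city"]),
          "freshness": 0.5}  # the freshness score is assumed, not measured
weights = {"completeness": 3, "freshness": 1}
overall = weighted_quality(scores, weights)
```

Changing the weights models a different goal: a reporting goal might weight freshness more heavily, while an archival goal might favour completeness.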