
Maintaining Temporal Coherency of Virtual Data Warehouses


Presentation Transcript


  1. Maintaining Temporal Coherency of Virtual Data Warehouses Raghav Srinivasan, Chao Liang, Krithi Ramamritham Department of Computer Science, University of Massachusetts IEEE Real-Time Systems Symposium

  2. Outline • Introduction • Time-To-Live • Choosing a good TTL • Performance Analysis • Experimental Results • Conclusion

  3. Introduction • Maintaining Temporal Coherency • S(t): the data value at the source at time t; U(t): the data value at the user at time t; C(t): the data value at the cache at time t • the system must guarantee that |U(t) - S(t)| <= c • users specify a temporal coherency requirement c for each data item of interest • it can be specified in units of time or value (e.g., 3 minutes or 1 dollar) • it reflects the user's tolerance; e.g., a user may desire stronger coherency requirements for data items such as stock prices than for news information
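A minimal sketch of the value-domain check implied by this requirement; the function name and the numbers are illustrative, not from the paper:

```python
# Hypothetical sketch: the temporal coherency requirement
# |U(t) - S(t)| <= c for one numeric data item.
def is_coherent(source_value: float, user_value: float, c: float) -> bool:
    """True if the user's copy is within the user-specified tolerance c."""
    return abs(user_value - source_value) <= c

# e.g. a stock quote with a $1 tolerance
assert is_coherent(source_value=100.40, user_value=100.00, c=1.0)
```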

  4. Introduction • Pull vs. Push • Clients Pull the data based on the dynamics of the data and a user's coherency requirements. • Servers with Push capability maintain state information pertaining to clients and push only those changes that are of interest to a user.

  5. Time-To-Live • Time-To-Live (TTL) • The next time at which the client should poll the server to refresh the data item. • Based on the data change rate and the coherency requirement.
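A minimal pull-loop sketch of how a client might use the TTL; `fetch` and the `policy` interface are assumed helpers, not code from the paper:

```python
import time

# Hypothetical pull client: sleep for the current TTL, poll the source,
# and recompute the TTL from the observed change. Fetch latency is
# ignored, so the time between polls is taken to equal the TTL.
def pull_loop(fetch, policy, c: float, initial_ttl: float) -> None:
    last_value = fetch()
    ttl = initial_ttl
    while True:
        time.sleep(ttl)                    # wait out the current TTL
        value = fetch()
        change = abs(value - last_value)   # how far the source moved
        ttl = policy.update(c, change, elapsed=ttl)
        last_value = value
```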

  6. Choosing a good TTL • Static TTL - based on a priori assumptions • Semi-static TTL - based on observed maximum rate of change • Dynamic TTLdr - based on the most recent source changes • Dynamic TTLds - based on keeping TTL within static bounds • An Adaptive approach

  7. Static TTL - based on a priori assumptions • Lower TTL → higher #pollings (higher bandwidth), higher fidelity • Higher TTL → lower #pollings (lower bandwidth), lower fidelity • Simplicity

  8. Semi-static TTL - based on observed maximum rate of change • S(0), S(1), …, S(l): the data values observed at the source at different points of time, in chronological order • T_0, T_1, …, T_l: the TTL values that resulted in the respective S values • T_l: the latest TTL value • change_l: the latest data change, i.e. |S(l) - S(l-1)| • TTLest_l: an estimate of the next TTL value • TTLmr: the TTL corresponding to the fastest source change so far (the smallest TTL used so far)
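One natural reading of these definitions (an assumption, not necessarily the paper's exact formula) is that TTLest_l is the time the source would take to drift by c at the most recently observed rate, and the semi-static policy pins the TTL to the smallest estimate seen so far:

```python
# Sketch of the semi-static policy under the assumption
# TTLest_l = c * T_l / change_l (time to drift by c at the
# observed rate change_l / T_l).
def ttl_estimate(c: float, change: float, elapsed: float) -> float:
    """TTLest_l for one observation: change = |S(l) - S(l-1)|, elapsed = T_l."""
    if change == 0:
        return float("inf")   # no observed change implies no bound
    return c * elapsed / change

class SemiStaticTTL:
    def __init__(self) -> None:
        self.ttl_mr = float("inf")   # TTLmr: smallest TTL used so far

    def update(self, c: float, change: float, elapsed: float) -> float:
        # Track the TTL implied by the fastest change seen so far.
        self.ttl_mr = min(self.ttl_mr, ttl_estimate(c, change, elapsed))
        return self.ttl_mr
```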

  9. Dynamic TTLdr - based on the most recent source changes • TTLest_l: a candidate for the next TTL value using only the most recent observations • TTLest_{l-1}: a candidate for the next TTL value using only the penultimate observations • TTLdr: the new TTL value set by the dynamic TTL approach • w: a weight (0.5 <= w < 1) that measures the relative change between the recent and the old changes • It assumes that recent changes are likely to be reflective of the changes in the near future. • Hence more recent changes affect the new TTL more than older changes
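A sketch of the weighted combination suggested by these bullets; the exact blend TTLdr = w·TTLest_l + (1-w)·TTLest_{l-1} is an assumption consistent with the slide:

```python
# Sketch of the data-reactive policy: weight the latest estimate
# against the previous one, with w biased toward recent changes.
def ttl_dr(est_latest: float, est_previous: float, w: float) -> float:
    assert 0.5 <= w < 1.0   # recent changes must dominate
    return w * est_latest + (1.0 - w) * est_previous
```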

  10. Dynamic TTLds - based on keeping TTL within static bounds • TTLmin: when the data changes rapidly, the TTL tends to get closer to TTLmin (the low end of the interval) • TTLmax: when the data changes slowly, the TTL tends to get closer to TTLmax (the high end of the interval)
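A sketch of the clamping this slide describes:

```python
# Keep the dynamically computed TTL inside static bounds, so a burst
# of rapid changes drives it toward ttl_min and a quiet period
# toward ttl_max.
def ttl_ds(ttl_dynamic: float, ttl_min: float, ttl_max: float) -> float:
    return max(ttl_min, min(ttl_max, ttl_dynamic))
```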

  11. An Adaptive approach • TTLmin: when the data changes rapidly, the TTL tends to get closer to TTLmin (the low end of the interval) • TTLmax: when the data changes slowly, the TTL tends to get closer to TTLmax (the high end of the interval) • TTLmr': accommodates both TTLmr and TTLest_l • TTLmr: the TTL corresponding to the fastest source change so far (the smallest TTL used so far) • TTLest_l: the estimate of the TTL value corresponding to the most recent change • f: the fudge factor (0 <= f <= 1)
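A sketch under the assumption that TTLmr' blends TTLmr and TTLest_l via the fudge factor f and is then kept within the static bounds; the blend formula itself is inferred from "accommodates both", not quoted from the paper:

```python
# Sketch of the adaptive policy: mix the most conservative TTL seen
# so far (TTLmr) with the latest estimate (TTLest_l), then clamp.
def ttl_adaptive(ttl_mr: float, est_latest: float, f: float,
                 ttl_min: float, ttl_max: float) -> float:
    assert 0.0 <= f <= 1.0
    ttl_mr_prime = f * ttl_mr + (1.0 - f) * est_latest   # TTLmr'
    return max(ttl_min, min(ttl_max, ttl_mr_prime))
```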

  12. Traces used for the Experiment

  13. Performance Analysis • Metrics used • #pollings: the number of times the source is polled (a proxy for network bandwidth) • VProb: the consistency violation probability, VProb = (Σ t_i) / T • t_i: the durations during which a coherency violation (|U(t) - S(t)| > c) occurs • T: the total time for which data was presented to a user • Tradeoff between #pollings and VProb: better performance means lower VProb, which requires higher #pollings
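A sketch of the metric; the ratio form VProb = (Σ t_i) / T follows from the definitions above:

```python
# Violation probability: the fraction of the presentation time during
# which the user's view was out of tolerance with the source.
def violation_probability(violation_durations: list[float],
                          total_time: float) -> float:
    return sum(violation_durations) / total_time

# e.g. violations lasting 2s, 5s, and 1s over a 100s trace -> 0.08
assert violation_probability([2.0, 5.0, 1.0], 100.0) == 0.08
```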

  14. Violation Probability

  15. Number of Pollings

  16. Conclusion • A combination of Push and Pull maintains data consistency. • Clients are allowed to specify temporal constraints, so that the displayed results are updated only when the changes are of interest to the user. • The adaptive algorithm's performance was shown to be much better than that of algorithms that are less adaptive.
