Efficient Data Compression for Mobile Devices with Severe Storage Constraints
Mobile devices often work offline, so users need to download large query results for later use, yet the devices face severe storage and processing limitations. This paper presents a method for storing and accessing large query results efficiently, using two years of daily low and high stock prices from a quote table as the running example. By compressing each attribute individually and selecting compression methods based on the semantic and statistical characteristics of the result, the approach increases space savings while keeping decompression cost low. It saves far more space than the default Windows CE method at little extra decompression cost, and it beats LZ77 on both space saving and decompression cost (measured by access time).
Presentation Transcript
Motivation
• Mobile devices often work offline, and users often need to download large query results for later use.
• Results are often accessed in small pieces.
• Mobile devices have severe storage and processing constraints.
How to store more results? How to access them quickly?
An Example
• Select two years of daily low and high stock prices from a quote table.
• The result contains six attributes: year, month, day, ticker, low price, and high price. It is ordered by year, month, day, and ticker.
• Result size: 343 KB, 10,000 tuples.
• The client is a palm-size CASSIOPEIA device running Windows CE with 4 MB RAM (2 MB of persistent data storage and 2 MB of program memory).
Our Approach
• Compress each attribute individually.
• Utilize information about the query result:
  • Choose a combination of compression methods based on semantic and statistical information about the result.
  • Because different attributes have different characteristics, there is no single best method.
  • The choice is made by estimating compression cost, decompression cost, transfer cost, and storage cost.
Goals: reduce decompression cost; increase compression ratio.
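The per-attribute choice can be illustrated with a minimal Python sketch. It is not the authors' implementation: the candidate methods (run-length and dictionary encoding), the function names, the assumed value widths, and the size formulas are all assumptions made for illustration, and the cost model estimates storage size only, whereas the slides also weigh compression, decompression, and transfer cost. The three columns are synthetic stand-ins for the year, ticker, and low price attributes.

```python
from itertools import groupby

def rle_encode(values):
    """Run-length encoding: a list of (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

def rle_decode(runs):
    return [v for v, n in runs for _ in range(n)]

def dict_encode(values):
    """Dictionary encoding: sorted distinct values plus one code per row."""
    symbols = sorted(set(values))
    index = {s: i for i, s in enumerate(symbols)}
    return symbols, [index[v] for v in values]

def dict_decode(encoded):
    symbols, codes = encoded
    return [symbols[c] for c in codes]

def estimate_sizes(values, width):
    """Rough size in bytes per candidate (hypothetical cost model).
    width is the assumed number of bytes per uncompressed value."""
    n = len(values)
    runs = sum(1 for _ in groupby(values))
    distinct = len(set(values))
    code_bytes = max(1, ((distinct - 1).bit_length() + 7) // 8)
    return {
        "raw": n * width,
        "rle": runs * (width + 4),  # value plus an assumed 4-byte run length
        "dict": distinct * width + n * code_bytes,
    }

def choose_method(values, width):
    """Pick the candidate with the smallest estimated size."""
    sizes = estimate_sizes(values, width)
    return min(sizes, key=sizes.get)

# Synthetic columns, sorted the way the example result is ordered.
year = [1998] * 5000 + [1999] * 5000
ticker = ["T%02d" % i for i in range(20)] * 500
low = [i * 0.25 for i in range(10000)]
for name, col, width in [("year", year, 2), ("ticker", ticker, 4), ("low", low, 4)]:
    print(name, choose_method(col, width))
```

On these synthetic columns the sketch picks run-length encoding for year, dictionary encoding for ticker, and no compression for low, which mirrors the point that different attributes favor different methods.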
Demonstration
• We compare our methods with Windows CE’s default method and page-level LZ77 (used in WinZip, PKZIP, and Gzip).
• We compare the space saving and the decompression cost (measured by access time).
• Our approach is far better than WinCE’s method in space saving and adds little extra decompression cost. Our approach also beats LZ77 on both space saving and decompression cost.
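To make the page-level baseline and the two metrics concrete, here is a rough Python sketch. It is an assumption-laden stand-in, not the setup used in the demonstration: it uses Python's zlib module (DEFLATE, which combines LZ77 with Huffman coding), a hypothetical page size of 256 rows, hypothetical function names, and synthetic rows. Each page is compressed independently, so reading one tuple means decompressing only the page that holds it.

```python
import time
import zlib

PAGE_ROWS = 256  # hypothetical page size, in rows

def compress_pages(rows, page_rows=PAGE_ROWS):
    """Compress fixed-size groups of rows independently of each other."""
    pages = []
    for start in range(0, len(rows), page_rows):
        text = "\n".join(rows[start:start + page_rows])
        pages.append(zlib.compress(text.encode("utf-8")))
    return pages

def read_row(pages, i, page_rows=PAGE_ROWS):
    """Fetch row i by decompressing only the page that contains it."""
    text = zlib.decompress(pages[i // page_rows]).decode("utf-8")
    return text.split("\n")[i % page_rows]

def space_saving(rows, pages):
    """Fraction of the uncompressed size that compression removes."""
    raw = sum(len(r.encode("utf-8")) + 1 for r in rows)  # +1 per row separator
    packed = sum(len(p) for p in pages)
    return 1 - packed / raw

# Synthetic rows (year, month, day, ticker, low, high); not the real quote data.
rows = [
    "1998,%d,%d,T%02d,%.2f,%.2f" % (1 + (i // 600) % 12, 1 + (i // 20) % 28,
                                    i % 20, 10 + i % 7, 11 + i % 7)
    for i in range(10000)
]
pages = compress_pages(rows)

start = time.perf_counter()
row = read_row(pages, 1234)
elapsed = time.perf_counter() - start
print("space saving: %.2f, access time: %.6f s" % (space_saving(rows, pages), elapsed))
```

The page size is the main design knob in such a baseline: larger pages usually compress better but make each single-tuple access decompress more data.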
Example
select S_SUPPKEY, N_NAME, S_PHONE, O_ORDERDATE, L_SHIPDATE,
       SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS REVENUE
from LINEITEM, SUPPLIER, NATION, ORDER
where L_SHIPDATE < O_ORDERDATE + 3 months
  AND S_SUPPKEY = L_SUPPKEY
  AND S_NATIONKEY = N_NATIONKEY
  AND L_ORDERKEY = O_ORDERKEY
group by S_SUPPKEY, N_NAME, S_PHONE, O_ORDERDATE, L_SHIPDATE
having REVENUE between 10000 AND 100000
order by S_SUPPKEY, O_ORDERDATE