
Design of Large Scale Data Archival and Retrieval System for Transportation Sensor (Write-Once-Read-Many Type) Data

by Nirish Dhruv, Department of Computer Science. Advisor: Dr. Taek Kwon, Department of Electrical and Computer Engineering.


Presentation Transcript


  1. Design of Large Scale Data Archival and Retrieval System for Transportation Sensor (Write-Once-Read-Many Type) Data • by Nirish Dhruv, Department of Computer Science • Advisor: Dr. Taek Kwon, Department of Electrical and Computer Engineering • Graduate Committee: Dr. Donald Crouch (Department of Computer Science), Dr. Carolyn Crouch (Department of Computer Science), Dr. Taek Kwon (Department of Electrical and Computer Engineering)

  2. Background • ITS (Intelligent Transportation Systems) sensor networks produce huge amounts of data • Presently used for operational and monitoring purposes because of the huge size of the data • Examples: RWIS, WIM and traffic detector networks • Efficient archival/retrieval is needed for planning and research

  3. Problem Statement • Present TMC Archive • Flat zip compressed format • Difficult to extract spatially correlated data • Need for efficient archival / retrieval for spatially and/or temporally correlated data

  4. Existing File Format and Archive • Unified Traffic Data Format (UTDF)

     Record     | 1        | 2        | 3        | 4        | 5        | ... | 2880
     Time       | 00:00:00 | 00:00:30 | 00:01:00 | 00:01:30 | 00:02:00 | ... | 23:59:30
     Volume     | 1-byte   | 1-byte   | 1-byte   | 1-byte   | 1-byte   | ... | 1-byte
     Occupancy  | 2-byte   | 2-byte   | 2-byte   | 2-byte   | 2-byte   | ... | 2-byte

     • Volume row: ###.v30 file (2880 bytes)
     • Occupancy row: ###.o30 file (5760 bytes)
     • Zip ###.v30 & ###.o30 files for 4000 sensors -> yyyymmdd.traffic file
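
The record layout on slide 4 can be sketched in a few lines of Python. This is only an illustration, not the archive's actual reader: the function names are hypothetical, the values are assumed to be unsigned, and the byte order of the 2-byte occupancy values is not stated on the slide, so big-endian is an assumption exposed as a parameter.

```python
import struct

RECORDS_PER_DAY = 2880    # one record every 30 seconds (from the slide)
INTERVAL_SECONDS = 30


def record_time(index):
    """Return the HH:MM:SS timestamp of a 1-based record index."""
    if not 1 <= index <= RECORDS_PER_DAY:
        raise IndexError("record index out of range")
    seconds = (index - 1) * INTERVAL_SECONDS
    return "%02d:%02d:%02d" % (seconds // 3600, seconds % 3600 // 60, seconds % 60)


def decode_volume(raw):
    """Decode a .v30 payload: 2880 one-byte counts (assumed unsigned)."""
    if len(raw) != RECORDS_PER_DAY:
        raise ValueError("expected 2880 bytes")
    return list(raw)


def decode_occupancy(raw, byte_order=">"):
    """Decode a .o30 payload: 2880 two-byte values.

    Byte order is an assumption; pass "<" for little-endian data.
    """
    if len(raw) != 2 * RECORDS_PER_DAY:
        raise ValueError("expected 5760 bytes")
    return list(struct.unpack("%s%dH" % (byte_order, RECORDS_PER_DAY), raw))
```

For example, `record_time(5)` gives `00:02:00`, matching the fifth column of the table.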

  5. Review of Large Data Archive • Data Warehouse • Inflow: Get data from various systems • Upflow: Put data into a more compact form • Downflow: Move the compact data to archival storage • Outflow: Output data to consumers as required • Metaflow: Manage the warehouse itself

  6. Why Data Warehouse? • Simplicity • Better Quality of Data • Fast Access • Platform Independent

  7. Hierarchical Data Format (HDF) • File format and library for storing scientific data • Software includes I/O libraries and tools for analyzing, visualizing, and converting scientific data. • Platform Independent

  8. Common Data Format (CDF) • Self-describing data abstraction for the storage and manipulation of multi-dimensional data in a discipline-independent format • File format and a library • Transparent data compression • Platform Independent • API available in C, FORTRAN, Java, and Perl

  9. Creating Traffic CDF • Traffic archive -> C program (.EXE) using the CDF 2.7 C API (DLL, Lib and cdf.h file) -> traffic.cdf

  10. Traffic Data Archive in CDF • Designing Data Structure for traffic data • Setting Dimensions • Setting Variances • Setting CDF variables, CDF data types, CDF attributes (meta-data), and compression algorithm
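
To illustrate why fixed dimensions make random access cheap, here is a minimal sketch of row-major indexing over a hypothetical [detector][record] array. It is not CDF's on-disk layout and not the dimension design from the slides (that detail is not in the transcript); the function name and the two-dimensional arrangement are assumptions.

```python
RECORDS_PER_DAY = 2880  # 30-second records per day, as in the existing format


def flat_offset(detector_index, record_index, element_size,
                records_per_day=RECORDS_PER_DAY):
    """Byte offset of one value in a row-major [detector][record] array.

    Both indices are 0-based. Hypothetical layout for illustration only.
    """
    if detector_index < 0:
        raise IndexError("detector index out of range")
    if not 0 <= record_index < records_per_day:
        raise IndexError("record index out of range")
    return (detector_index * records_per_day + record_index) * element_size
```

With a layout like this, any single value is located by arithmetic alone, with no need to decompress or scan unrelated data first.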

  11. Data Organization

  12. Variances Specification for traffic CDF

  13. CDF Compression Algorithms

  14. Data Retrieval in CDF • CDF archive (.cdf) + station definition -> C program using CDF API (.EXE) -> volume count (.txt)

  15. Data Archive in SQL Server • Traffic data archive (zipped binary files) -> Visual Basic interface (Dynazip ActiveX control; ADODB connection over 32-bit ODBC (DSN)) -> traffic data archive (SQL Server 2000)

  16. Retrieval Task • Station 1: 10069N, detectors 3263, 3264, 3265, 3266 • Station 1 volume computation: 3263(Vol) + 3264(Vol) + 3265(Vol) + 3266(Vol) • Station 2: 10069S, Station 2 volume computation • . . . • Station 492: 17750W, Station 492 volume computation • Output text file: 10069N Total Vol, 10069S Total Vol, . . . , 17750W Total Vol
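
The retrieval task above amounts to summing detector volumes per station. The following is a rough sketch under stated assumptions: the function names and the "station total" line format are hypothetical, "Total Vol" is taken to mean the sum over all records of the day, and any volume numbers used with it are illustrative rather than results from the presentation.

```python
def station_totals(stations, detector_volumes):
    """Sum the volumes of each station's detectors.

    stations: dict mapping a station ID to its list of detector IDs.
    detector_volumes: dict mapping a detector ID to its list of
        per-record volume counts.
    Returns a dict mapping each station ID to its total volume.
    """
    totals = {}
    for station, detectors in stations.items():
        totals[station] = sum(sum(detector_volumes[d]) for d in detectors)
    return totals


def format_totals(totals):
    """Render one 'station total' line per station (hypothetical format)."""
    return "\n".join("%s %d" % (station, total) for station, total in totals.items())
```

For station 10069N, the call would pass `{"10069N": [3263, 3264, 3265, 3266]}` as `stations` along with the decoded volume lists for those four detectors.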

  17. Results on single day traffic data

  18. Conclusions • A transportation archive using CDF could be a better archive for the following reasons • Stores more data with almost no additional storage requirements • Indexed data allows random access • Open standard, portable and free • Can be used directly with many scientific visualization and analysis packages

  19. Conclusions • An RDBMS is less suitable for large-scale traffic data for the following reasons • Large storage requirements due to overheads • Retrieval is comparatively slow • Initial investment is expensive

  20. Future Work • Using XML with CDF for the web • Scaling CDF • Adding more features (variables and attributes)
