This document presents an overview of the Telegraph project at UC Berkeley, which focuses on enhancing Java's data processing capabilities. The project introduces an in-memory database and a dynamic query engine designed for diverse data sources, such as web feeds and sensors. Key components such as a query parser and execution framework are discussed, alongside performance metrics, memory management strategies, and garbage collection issues. Insights into using Java objects for tuple storage and the impact of JNI on performance are also explored, providing a comprehensive understanding of adaptive query processing in Java.
Telegraph Java Experiences Sam Madden UC Berkeley madden@cs.berkeley.edu RightOrder : Telegraph & Java
Telegraph Overview • 100% Java • In memory database • Query engine for alternative sources • Web • Sensors • Testbed for adaptive query processing
Telegraph & WWW : FFF • Federated Facts and Figures • Collects data on the election • Based on Avnur and Hellerstein's SIGMOD '00 work: Eddies • Routes tuples dynamically based on source loads and selectivities
fff.cs.berkeley.edu
Architecture Overview • Query Parser • JLex & CUP • Preoptimizer • Chooses Access Paths • Eddy • Routes Tuples To Modules
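The routing idea behind the eddy can be sketched in a few lines. This is a simplified illustration, not the Telegraph implementation or the algorithm from the Eddies paper: the class and method names (`Eddy`, `FilterModule`, `route`, `totalProbes`) are hypothetical, modules are plain integer filters, and each tuple is sent to the module with the lowest observed pass rate first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical, simplified eddy: tuples are ints and modules are filters.
class Eddy {
    static final class FilterModule {
        final String name;
        final IntPredicate filter;
        int seen;    // tuples routed to this module so far
        int passed;  // tuples this module let through

        FilterModule(String name, IntPredicate filter) {
            this.name = name;
            this.filter = filter;
        }

        // Observed pass rate; modules with no history are assumed to pass everything.
        double passRate() {
            return seen == 0 ? 1.0 : (double) passed / seen;
        }
    }

    private final List<FilterModule> modules = new ArrayList<>();

    void add(String name, IntPredicate filter) {
        modules.add(new FilterModule(name, filter));
    }

    // Routes one tuple, visiting the most selective module (lowest pass rate) first.
    // Returns true if the tuple survives every module.
    boolean route(int tuple) {
        List<FilterModule> order = new ArrayList<>(modules);
        order.sort(Comparator.comparingDouble(FilterModule::passRate));
        for (FilterModule m : order) {
            m.seen++;
            if (!m.filter.test(tuple)) {
                return false; // dropped early, remaining modules are skipped
            }
            m.passed++;
        }
        return true;
    }

    // Total number of module invocations, a rough measure of work done.
    int totalProbes() {
        int total = 0;
        for (FilterModule m : modules) {
            total += m.seen;
        }
        return total;
    }
}
```

Because the order is recomputed per tuple, the routing adapts as observed selectivities change, which is the property a static plan lacks.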
Modules • Doubly-Pipelined Hash Joins • Index Joins • For probing into web pages • Aggregates & Group Bys • Scans • Telegraph Screen Scraper: View web pages as relations
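A doubly-pipelined (symmetric) hash join keeps a hash table per input and probes the opposite table as each tuple arrives, so results are produced without waiting for either input to finish. The sketch below assumes integer join keys and string payloads; the class `SymmetricHashJoin` and its methods are illustrative names, not Telegraph's.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a doubly-pipelined hash join on integer keys.
class SymmetricHashJoin {
    private final Map<Integer, List<String>> leftTable = new HashMap<>();
    private final Map<Integer, List<String>> rightTable = new HashMap<>();

    // A tuple arrives on the left input: remember it, then probe the right table.
    List<String> insertLeft(int key, String payload) {
        leftTable.computeIfAbsent(key, k -> new ArrayList<>()).add(payload);
        List<String> results = new ArrayList<>();
        for (String match : rightTable.getOrDefault(key, Collections.emptyList())) {
            results.add(payload + "|" + match);
        }
        return results;
    }

    // A tuple arrives on the right input: remember it, then probe the left table.
    List<String> insertRight(int key, String payload) {
        rightTable.computeIfAbsent(key, k -> new ArrayList<>()).add(payload);
        List<String> results = new ArrayList<>();
        for (String match : leftTable.getOrDefault(key, Collections.emptyList())) {
            results.add(match + "|" + payload);
        }
        return results;
    }
}
```

Each matching pair is emitted exactly once, by whichever tuple of the pair arrives second.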
Execution Framework • One Thread Per Query • Iterator Model for Queries • Experimented with Thread Per Module • Linux threads are expensive • Two Memory Management Models • Java Objects • Home Rolled Byte Arrays
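In the iterator model, every operator pulls tuples from its child on demand, so a whole query can run on a single thread. A minimal sketch of one such operator, a selection, is shown here; it uses the standard `java.util.Iterator` interface rather than whatever interface Telegraph defines, and `FilterIterator` is a hypothetical name.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical selection operator in the pull-based iterator model.
class FilterIterator<T> implements Iterator<T> {
    private final Iterator<T> child;
    private final Predicate<T> predicate;
    private T lookahead;   // next qualifying tuple, if one has been found
    private boolean ready; // true when lookahead holds a valid tuple

    FilterIterator(Iterator<T> child, Predicate<T> predicate) {
        this.child = child;
        this.predicate = predicate;
    }

    @Override
    public boolean hasNext() {
        // Pull from the child until a tuple satisfies the predicate.
        while (!ready && child.hasNext()) {
            T candidate = child.next();
            if (predicate.test(candidate)) {
                lookahead = candidate;
                ready = true;
            }
        }
        return ready;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        ready = false;
        T result = lookahead;
        lookahead = null;
        return result;
    }
}
```

Operators compose by wrapping: a scan's iterator is passed to a filter, whose iterator is passed to a join, and the consumer at the top drives the whole pipeline.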
Tuples as Java Objects • Tuple data stored as a Java object • Each in a separate byte array • Tuples copied on joins, aggregates • Issues • Memory management between modules, queries; garbage collector control • Allocation overhead • Performance: 30,000 200-byte tuples / sec -> 5.9 MB / sec
Tuples As Byte Array • All tuples stored in same byte array / query • Surrogate Java Objects • [Figure: a byte array with a directory of (offset, size) entries, referenced by surrogate objects]
Byte Array (cont) • Allows explicit control over memory / query (or module) • Compaction eliminates garbage collection randomness • Lower throughput: 15,000 tuples / sec • No surrogate object reuse • Synchronization costs
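The byte-array model can be illustrated with a small arena that stores every tuple of a query in one array, tracks each tuple through an (offset, size) directory, and compacts explicitly instead of relying on the garbage collector. This is a sketch under assumptions: the names `TupleArena` and `TupleRef`, the fixed capacity, and the free/compact policy are all hypothetical, and the synchronization the slides mention is left out.

```java
import java.util.Arrays;

// Hypothetical per-query tuple store: one byte array plus an (offset, size) directory.
class TupleArena {
    private final byte[] data;          // all tuple bytes for this query
    private int[] offsets = new int[8]; // directory: start of each tuple
    private int[] sizes = new int[8];   // directory: length of each tuple, -1 if freed
    private int count;                  // number of directory slots in use
    private int used;                   // bytes of data currently occupied

    TupleArena(int capacityBytes) {
        data = new byte[capacityBytes];
    }

    // Copies a tuple into the arena and returns its directory slot.
    int append(byte[] tuple) {
        if (used + tuple.length > data.length) {
            throw new IllegalStateException("query memory budget exhausted");
        }
        if (count == offsets.length) {
            offsets = Arrays.copyOf(offsets, count * 2);
            sizes = Arrays.copyOf(sizes, count * 2);
        }
        System.arraycopy(tuple, 0, data, used, tuple.length);
        offsets[count] = used;
        sizes[count] = tuple.length;
        used += tuple.length;
        return count++;
    }

    byte[] read(int slot) {
        if (sizes[slot] < 0) {
            throw new IllegalStateException("tuple was freed");
        }
        return Arrays.copyOfRange(data, offsets[slot], offsets[slot] + sizes[slot]);
    }

    // Marks a tuple dead; its bytes are reclaimed by the next compact().
    void free(int slot) {
        sizes[slot] = -1;
    }

    // Slides live tuples toward the front and rewrites their directory offsets.
    void compact() {
        int write = 0;
        for (int i = 0; i < count; i++) {
            if (sizes[i] < 0) {
                continue;
            }
            System.arraycopy(data, offsets[i], data, write, sizes[i]);
            offsets[i] = write;
            write += sizes[i];
        }
        used = write;
    }

    int bytesUsed() {
        return used;
    }
}

// Surrogate object: a small handle that stays valid across compaction
// because it names a directory slot rather than a raw offset.
final class TupleRef {
    private final TupleArena arena;
    private final int slot;

    TupleRef(TupleArena arena, int slot) {
        this.arena = arena;
        this.slot = slot;
    }

    byte[] bytes() {
        return arena.read(slot);
    }
}
```

The fixed capacity is what gives per-query control over memory: a query that exceeds its budget fails in `append` instead of growing the shared heap.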
Other System Pieces • XML Based Catalog • Java Introspection Helps • Applet-based Front End • JDBC Interface • Fault Tolerance / Multiple Servers • Via simple UNIX tools
RightOrder Questions • Performance vs. C • JNI Issues • Garbage Collection Issues • Serialization Costs • Lots of Java Objects • JDBC vs ODI
Performance Vs. C • JVM + JIT Performance Encouraging: IBM JIT == 60% of Intel C compiler, faster than MSC for low level benchmarks • IBM JIT 2x Faster than HotSpot for Telegraph Scans • Stability Issues • www.javalobby.org/features/jpr
JIT Performance vs C • [Chart comparing Optimized Intel, Optimized MS, and IBM JIT] • Source: www.javalobby.org/features/jpr
Performance Gotchas • Synchronization • ~2x function call overhead in HotSpot • Used in libraries: Vector, StringBuffer • String allocation single most intensive operation in Telegraph • Mercator: 20% initial CPU cost • Garbage Collection • Java dumb about reuse • Mercator: 15% cost • OceanStore: 30 ms avg latency, 1 s peak
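One way to sidestep both gotchas on a hot path is to format rows through a single reused, unsynchronized buffer. The sketch below is an assumption-laden illustration: `RowFormatter` is a hypothetical class, it is only safe when confined to one thread (which matches a one-thread-per-query design), and it uses `StringBuilder`, the unsynchronized counterpart of `StringBuffer` that was added to Java after this talk was given.

```java
// Hypothetical single-threaded row formatter that reuses one buffer.
class RowFormatter {
    // StringBuilder has no synchronized methods, unlike StringBuffer.
    private final StringBuilder buffer = new StringBuilder(256);

    String format(String[] fields) {
        buffer.setLength(0); // reset length; the backing array is kept for reuse
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                buffer.append(',');
            }
            buffer.append(fields[i]);
        }
        return buffer.toString(); // the only per-row String allocation
    }
}
```

The same reasoning applies to collections: `ArrayList` avoids the per-call locking that `Vector` performs.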
More Gotchas • Final Methods • Declaring methods final allows inlining • Serialization • RMI, JNI use serialization • Philippsen & Haumacher show that serialization is slow
Performance Tools • Tools to address some issues • JAX, Jopt: make bytecode smaller, faster • www.alphaworks.ibm.com/tech/JAX • www.condensity.com • Bytecode optimizer • www.optimizeit.com • Good profiler, memory allocation and garbage collection monitor
JNI Issues • Not a part of Telegraph • JNI overhead quite large (JDK 1.1.8, PII 300 MHz) • Source: Matt Welsh. A System Supporting High-Performance Communication and I/O in Java. Master’s Thesis, UC Berkeley, 1999.
More JNI • But, this is being worked on • IBM JDK: 100,000-byte copy in 5 ms, vs 23 ms for 1.1.8 (500 MHz PIII) • JNI allows synchronization (pin / unpin), thread management • See http://developer.java.sun.com/developer/onlineTraining/Programming/JDCBook/jni.html • GCJ + CNI: access Java objects via C++ classes • http://gcc.gnu.org/java/
Garbage Collection • Performance • Big problem: 1 s or longer to GC lots of objects • Most Java GCs are blocking (not concurrent or multi-threaded) • Unexpected Latencies • OceanStore: network file server, 30 ms avg. latencies for network updates, 1000 ms peak due to GC • In high-concurrency apps, such delays are disastrous
Garbage Collection Cont. • Limited Control • Runtime.gc() only a hint • Runtime.freeMemory() unreliable • No way to disable • No object reuse • Lots of unnecessary memory allocations
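Since the runtime offers no object reuse, applications that want it have to build it themselves. A minimal sketch of that workaround is a free list of fixed-size buffers; `BufferPool` and its methods are hypothetical names, and the pool is not thread-safe as written.

```java
import java.util.ArrayDeque;

// Hypothetical free list of fixed-size byte buffers (not thread-safe).
class BufferPool {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private final int bufferSize;
    private int allocations; // how many buffers were actually allocated

    BufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    // Hands out a recycled buffer when one is available, otherwise allocates.
    byte[] acquire() {
        byte[] buffer = free.pollFirst();
        if (buffer == null) {
            allocations++;
            buffer = new byte[bufferSize];
        }
        return buffer;
    }

    // Returns a buffer to the pool; contents are not cleared.
    void release(byte[] buffer) {
        if (buffer.length == bufferSize) {
            free.addFirst(buffer);
        }
    }

    int allocations() {
        return allocations;
    }
}
```

Pooling trades garbage collector work for manual lifetime management: a buffer released while still in use will be silently overwritten by its next owner.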
Serialization • Not in Telegraph • Philippsen and Haumacher, “More Efficient Object Serialization.” International Workshop on Java for Parallel and Distributed Computing. San Juan, April, 1999. • Serialization costs for RMI are 50% of total RMI time • Discard longevity for 7x speed up • Sun Serialization provides versioning • Complete class description stored with each serialized object • Most standard classes forward compatible (JDK docs note special cases) • See http://java.sun.com/products/jdk/1.2/docs/guide/serialization/spec/serialTOC.doc.html
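The cost of the stored class description is easy to see by writing the same object two ways. The sketch below is illustrative only and is not the Philippsen and Haumacher implementation: `Reading` is a hypothetical class, the compact form is a hand-written encoding with `DataOutputStream` that carries no class or version metadata, and the default form uses standard `ObjectOutputStream` serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical sensor reading with both a hand-written and a default encoding.
class Reading implements Serializable {
    private static final long serialVersionUID = 1L;
    final int sensorId;
    final double value;

    Reading(int sensorId, double value) {
        this.sensorId = sensorId;
        this.value = value;
    }

    // Hand-written encoding: 4 bytes for the int plus 8 for the double.
    byte[] toCompactBytes() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream(12);
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(sensorId);
            out.writeDouble(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Reading fromCompactBytes(byte[] raw) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
            return new Reading(in.readInt(), in.readDouble());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Default Java serialization, which also writes a stream header and class description.
    byte[] toDefaultBytes() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(this);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

The compact form gives up versioning: reader and writer must agree on the field layout out of band, which is the trade the slide describes as discarding longevity.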
Lots of Objects • GC Issues Serious • Memory Management • GC makes programmers allocate willy-nilly • Hard to partition memory space • Telegraph byte-array ugliness due to inability to limit usage of concurrent modules, queries
Storage Overheads • Java Object class is big: • Integer requires 23 bytes in JDK 1.3 • int requires 4.3 bytes • No way to circumvent object fields • Use primitives or hand-written serialization whenever possible
JDBC vs ODI • No experience with Oracle • JDBC overheads are high, but we don’t have specific performance numbers
Bottom Line • Java great for many reasons • GC, standard libraries, type safety, introspection, etc. • Significant reductions in development and debugging time. • Java performance isn’t bad • Especially with some tuning • Memory Management an Issue • Lack of control over JVMs bad • When to garbage collect, how to serialize, etc.