‘The SEQUOIA 2000 Storage Benchmark’ M. Stonebraker, J. Frew, K. Gardels, J. Meredith

‘The OO7 Benchmark’ M. J. Carey, D. J. DeWitt, J. F. Naughton, University of Wisconsin, Madison, Version of January 1991
‘The SEQUOIA 2000 Storage Benchmark’ M. Stonebraker, J. Frew, K. Gardels, J. Meredith, University of California, Berkeley, 1993
Μπαζιάνα Περιστέρα, baziana@hotmail.com
Advanced Topics in Databases

The OO7 Benchmark • Benchmark: • a comprehensive test of OODBMS (Object-Oriented Database Management System) performance • aims to evaluate new techniques and algorithms for OODBMS implementation • provides performance metrics for OODBMS design • OO7 Benchmark: • is implemented on 4 OODB systems: E/Exodus, Objectivity/DB, Ontos, Versant – ObjectStore, from Object Design Inc., was not tested • uncovers correctness and/or performance problems in the tested OODBMSs

Performance Characteristics tested by OO7 • Speed of pointer traversals: over cached data, over disk-resident data, sparse and dense traversals • Efficiency of updates: to indexed and unindexed object fields, repeated and sparse updates, updates of cached data, creation and deletion of objects • Performance of the query processor: on different kinds of queries

Related Work • OO1 Benchmark • HyperModel Benchmark • Initial Sun Benchmark, evaluating Vbase • A Complex Object Benchmark (ACOB), for client/server applications • Magic Editor, UC Berkeley

OO1 Benchmark (Object Operations, version 1; Sun Benchmark) • First standard benchmark • Measures performance for navigation and simple updates • Is based on a database consisting of: • Part objects – fields: id, type, (x,y) coordinates, build date • Connections between part objects • Each part has 3 outgoing and many incoming connections • Database size: 20,000 and 200,000 parts, to model applications whose data fits in or exceeds memory • OO1 Benchmark operations: • ‘Lookup’ – looks up 1,000 random parts by id • ‘Traversal’ – accesses 3,280 connected parts • ‘Insert’ – adds 100 new parts to the database
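As a rough check on the traversal figure above, the sketch below counts the parts touched when every part has 3 outgoing connections and the traversal follows them for a fixed number of hops, counting repeat visits. The depth of 7 hops is an assumption taken from the usual description of OO1, not from this slide, and the function name is hypothetical.

```python
def oo1_traversal_visits(fanout: int = 3, depth: int = 7) -> int:
    """Count parts touched by a full traversal, including repeat visits.

    Level 0 is the starting part; level k contributes fanout**k visits.
    The default depth of 7 is an assumption, not stated on the slide.
    """
    return sum(fanout ** level for level in range(depth + 1))


if __name__ == "__main__":
    print(oo1_traversal_visits())
```

With these defaults the count matches the 3,280 parts quoted for the ‘Traversal’ operation.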

HyperModel Benchmark • Richer schema and wider range of operations than OO1 • Performance test based on a hypertext application model • HyperModel database: graph of interconnected nodes • Node relationships: • One hierarchical 1:N relationship • Two M:N relationships • Types of nodes: • k-1 levels of non-leaf nodes, which hold several integer values • 1 level of leaf nodes, which hold a text string or a bitmap • HyperModel Benchmark operations: • Exact-match lookup (by integer attribute value) • Range query (1% and 10%) • Group lookup (follows a relationship from a random node to its related nodes) • Reference lookup (the inverse)

Why another benchmark? • We want to evaluate a wide range of OODB features • OO1 Benchmark: • Tests simple navigations and updates only • Does not use complex objects, which are important for many OODB applications • References are between ‘nearby’ objects • Does not examine: density of traversals, traversals with updates, object queries • HyperModel Benchmark has no tests for: • Object queries • Updates to indexed vs. non-indexed object attributes • Repeated object updates • Impact of transaction boundaries

OO7 Database Description • OO7 aims to test many aspects of OODBMS performance, not to model a specific application • Sizes of the OO7 benchmark database: small, medium, large

The design library (1) • Basic component of the OO7 database: • A set of composite parts • Design library in the OO7 database: • Is the set of all composite parts • Number of composite parts per module: NumCompPerModule (500) • Attributes of a composite part: • id (integer), buildDate (integer), type (character array) • Document object: • Associated with each composite part • Models the documentation for the composite part • Attributes: id (integer), title (character), text (character string) – length of string: DocumentSize • Composite part object and its document object: • Connected by a bi-directional association

The design library (2) • A composite part contains: • A graph of atomic parts – the basic units from which a composite part is built • Number of atomic parts per composite part: NumAtomicPerComp • 20 atomic parts per composite part (small benchmark) • 200 atomic parts per composite part (medium and large benchmarks) • Attributes of an atomic part: • id (integer), buildDate (integer), x, y (integer), docId (integer), type (character array) • Each atomic part: • Is connected via bi-directional connections to other atomic parts – NumConnPerAtomic (3, 6 or 9) • One connection per part forms a ring; the remaining connections are random • Connection objects: • Connect atomic parts • Attributes: length (integer), type (character array)
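The atomic-part graph described above can be sketched as a small data structure. This is a minimal illustration, not the benchmark's own generator: the class and function names, the value ranges for `build_date`, `x` and `y`, and the choice to store only outgoing connection targets (rather than full bi-directional connection objects) are all assumptions.

```python
import random
from dataclasses import dataclass, field


@dataclass
class AtomicPart:
    id: int
    build_date: int
    x: int
    y: int
    doc_id: int
    # Ids of the atomic parts this part connects to (outgoing side only).
    out_connections: list = field(default_factory=list)


def build_atomic_graph(num_atomic=20, num_conn=3, seed=0):
    """Build one composite part's graph: a ring plus random connections."""
    rng = random.Random(seed)
    parts = [
        AtomicPart(i, rng.randrange(1000, 2000), rng.randrange(100),
                   rng.randrange(100), doc_id=0)
        for i in range(num_atomic)
    ]
    for part in parts:
        # The ring connection guarantees every part is reachable.
        part.out_connections.append((part.id + 1) % num_atomic)
        # The remaining connections go to randomly chosen parts.
        for _ in range(num_conn - 1):
            part.out_connections.append(rng.randrange(num_atomic))
    return parts


if __name__ == "__main__":
    graph = build_atomic_graph()
    print(len(graph), graph[0].out_connections)
```

The defaults correspond to the small configuration (20 atomic parts, 3 connections each).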

A composite part and its associated document object

Assembling Complex Designs (1) • We introduce the ‘assembly hierarchy’ into the database • Each assembly object is made up of: • Either composite parts (base assembly level) • Or other assembly objects (complex assembly level) • A base assembly object: • Attributes: id (integer), buildDate (integer), type (character array) • Has bi-directional associations with: • 3 ‘shared’ composite parts (on a module basis) • 3 ‘unshared’ composite parts (both counts given by NumCompPerAssm) • A complex assembly object (higher level of the hierarchy): • Attributes: id (integer), buildDate (integer), type (character array) • Has bi-directional associations with 3 sub-assemblies (NumAssmPerAssm): • Either base assemblies (if the complex assembly is at level two) • Or other complex assemblies (if the complex assembly is at a higher level) • 7 levels in the assembly hierarchy (NumAssmLevels)
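The parameters above fix the size of one assembly hierarchy. The following sketch works out the counts, assuming a complete tree in which level 1 holds the base assemblies and level NumAssmLevels holds the single root; the function name is hypothetical.

```python
def assembly_counts(num_assm_levels=7, num_assm_per_assm=3):
    """Count assemblies in one complete hierarchy (one module)."""
    # Base assemblies sit at the bottom level of the tree.
    base = num_assm_per_assm ** (num_assm_levels - 1)
    # Complex assemblies occupy every level above the base level.
    complex_ = sum(num_assm_per_assm ** k for k in range(num_assm_levels - 1))
    return {"base": base, "complex": complex_, "total": base + complex_}


if __name__ == "__main__":
    print(assembly_counts())
```

With 7 levels and 3 sub-assemblies per complex assembly this gives 729 base assemblies and 364 complex assemblies.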

Assembling Complex Designs (2) • buildDate: • Range in base assemblies: 1000-1999 • Range in ‘young’ composite parts: 2000-2999 • Range in ‘old’ composite parts: 0-999 • The fraction of ‘young’ composite parts (the rest are ‘old’): YoungCompFrac

Assembling Complex Designs (3) • Each assembly hierarchy is called a ‘module’ • Characteristics of a module: • Attributes: id (integer), buildDate (integer), type (character array) • Associated Manual object – a larger version of a document

Testbed Configuration • 2 Sun workstations on an Ethernet LAN • 1 Sun IPX workstation as server: • 48 MB memory • 424 MB disk drive (model Sun0424): holds system software and swap space • 1.3 GB disk drive (model Sun 1.3G): holds the database (actual data) • 424 MB disk drive (model Sun0424): holds recovery information • 1 Sun Sparc ELC workstation as client: • 24 MB memory • 207 MB disk drive (model Sun0207): holds system software and swap space • SunOS runs on both workstations

Design Syntax

Software - E/Exodus (1) • Exodus consists of: • Exodus Storage Manager (ESM) • E programming language • The ESM: • Provides files of objects, B-trees, linear hashing • Uses a page-server architecture (current version 2.2) • Client processes request pages from the server via TCP/IP • The server answers: • either from its buffer pool • or by invoking a disk process to perform the I/O operation; after reading the page, it passes it to the client process and keeps a copy in its buffer pool • Provides concurrency control – at page and file level, with a non-2PL protocol • Provides recovery services – by logging the changed portions of objects

Software - E/Exodus (2) • The E language: • Extends C++, adding persistence, collections of persistent objects, B-tree indexes • No support for associations or queries • Current version of E: • Based on the GNU g++ compiler • Uses the ESM for storing persistent objects • Operations on persistent objects are compiled into calls to an interpreter, EPVM 3.0, which keeps the objects in the buffer pool of the ESM client process • Pointers between such objects are swizzled when traversed, and unswizzled when the ESM replaces the page • Our experiment with E/Exodus: • Disk page size: 8 KB – the transfer unit between client and server • Client buffer pool: 12 MB - Server buffer pool: 36 MB

Software – Objectivity/DB, ver. 2.1 • No page-server architecture: it uses a file-server architecture • No server process for handling data • Clients access database pages via NFS – a separate lock server is needed, placed on the Sun IPX • Recovery via shadows: • During a transaction, updates are written to a shadow database • At commit time, these updates are applied to the database – the shadows are used to recover if the commit fails • If the transaction aborts, the shadow database is deleted • Objectivity/DB provides: • Persistence for C++: persistent objects are defined by inheritance from a persistent root class • Sets, relations and iterators • Our experiment with Objectivity/DB: • Client buffer pool: 12 MB - Server buffer pool: 36 MB • Database and shadow files are stored as Unix files

Software – Ontos, ver. 2.2 (1) • Ontos employs a client-server architecture • Objects are created in one of three storage managers (SM): • ‘In-memory’ SM: manages transient objects, as in a plain C++ implementation • ‘Standard’ SM: implements an object-server architecture, in which objects are the unit of locking and of transfer between client and server • ‘Group’ SM: implements a page-server architecture – the OO7 composite parts, atomic parts and connection objects are created in the ‘group’ SM • Ontos: • Provides sets, lists and associations • Does not provide a query optimizer for Object SQL • Supports nested transactions and optimistic concurrency control

Software – Ontos, ver. 2.2 (2) • Recovery in Ontos: • Via REDO logging • During a transaction, all updates are buffered in virtual memory • At commit time, the updates are written to files on the server • Once that succeeds, the updates are applied to the database • Buffering at the client: • No client buffer pool • Objects are kept in the client’s virtual memory • Our experiment with Ontos: • Disk page size: 7.5 KB – the transfer unit between client and server • Unix file systems are used to hold the database

Software – Versant 3.0 Beta • Versant employs: • A client-server architecture • An object-server architecture • Objects are the unit of locking and of transfer between client and server • Versant Object Manager: • Caches objects during a transaction • Server Storage Manager: • Performs I/O at page granularity • Versant C++ interface: • Adds persistence to C++ • Does not modify the C++ compiler

Results (1) • Results: Traversals, Queries, Structural Modification Operations • Database sizes: small databases, medium databases

Results (2) • Traversals • Are implemented as methods of the database objects • Navigate from object to object, invoking the appropriate method at each object • Run over the ‘small’ and ‘medium’ OO7 benchmark databases • On the ‘small’ OO7 databases, they are run in two ways: • ‘cold’: the traversal begins with the client and server caches empty • ‘hot’: after the ‘cold’ run, the same traversal is run 4 more times and the average of the middle 3 runs is reported
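The cold/hot measurement procedure above could be scripted roughly as follows. This is a sketch under one reading of the slide: five runs in total, the first taken as the cold time and the average of runs 2 to 4 (the middle three of the five) as the hot time. The function names are hypothetical, and the caller is expected to empty the client and server caches before calling.

```python
import time
from statistics import mean


def time_runs(run, total_runs=5):
    """Time `run` repeatedly; caches must be empty before the first call."""
    timings = []
    for _ in range(total_runs):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return timings


def summarize_runs(timings):
    """Return (cold, hot) from five timings.

    Assumption: cold is the first run, hot is the mean of runs 2-4.
    """
    if len(timings) != 5:
        raise ValueError("expected exactly five timings")
    return timings[0], mean(timings[1:4])


if __name__ == "__main__":
    cold, hot = summarize_runs(time_runs(lambda: sum(range(100_000))))
    print(f"cold={cold:.6f}s hot={hot:.6f}s")
```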

Traversal T1: Raw traversal speed • ‘Traverse the assembly hierarchy. As each base assembly is visited, visit each of its referenced unshared composite parts. As each composite part is visited, perform a depth-first search on its graph of atomic parts. Return a count of the number of atomic parts visited when done’ (Results shown: medium database)
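The T1 definition can be illustrated with plain dictionaries standing in for database objects. This is a minimal sketch, not any of the tested systems' implementations: the dictionary layout (`"subs"` for complex assemblies, `"unshared"` for base assemblies), the use of part id 0 as the root atomic part, and the function names are all assumptions.

```python
def dfs_count(graph, root=0):
    """Depth-first search; graph maps an atomic part id to connected ids."""
    seen, stack = set(), [root]
    while stack:
        part = stack.pop()
        if part in seen:
            continue
        seen.add(part)
        stack.extend(graph[part])
    return len(seen)


def t1(assembly, composite_graphs):
    """Return the number of atomic parts visited below `assembly`."""
    if "unshared" in assembly:  # base assembly: visit its composite parts
        return sum(dfs_count(composite_graphs[comp_id])
                   for comp_id in assembly["unshared"])
    # Complex assembly: recurse into each sub-assembly.
    return sum(t1(sub, composite_graphs) for sub in assembly["subs"])


if __name__ == "__main__":
    ring = {0: [1], 1: [2], 2: [3], 3: [0]}
    graphs = {"c1": ring, "c2": ring}
    root = {"subs": [{"unshared": ["c1", "c2"]}, {"unshared": ["c1", "c2"]}]}
    print(t1(root, graphs))
```

A composite part referenced from several base assemblies is searched once per reference in this sketch, so its atomic parts are counted each time.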

Traversal T6: Sparse traversal speed • ‘Traverse the assembly hierarchy. As each base assembly is visited, visit each of its referenced unshared composite parts. As each composite part is visited, visit the root atomic part. Return a count of the number of atomic parts visited when done’

Traversal T2: Traversal with updates • ‘Repeat Traversal T1, but update objects during the traversal. There are three types of update patterns in this traversal. In each, a single update to an atomic part consists of swapping its (x,y) attributes. The three types of updates are: • A: Update one atomic part per composite part • B: Update every atomic part as it is encountered • C: Update each atomic part in a composite part 4 times • When done, return the number of update operations that were actually performed’
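The three update patterns can be sketched for a single composite part as below. The atomic parts are plain dictionaries in visiting order; which part variant A updates is not stated on the slide, so updating the first part visited is an assumption, as are the function names.

```python
def make_parts(count=20):
    """Create `count` atomic parts with distinct x and y values."""
    return [{"id": i, "x": i, "y": i + 100} for i in range(count)]


def t2_updates(parts, variant):
    """Apply update pattern A, B or C; return the number of updates done."""
    if variant == "A":
        targets, repeats = parts[:1], 1   # one atomic part per composite part
    elif variant == "B":
        targets, repeats = parts, 1       # every atomic part once
    elif variant == "C":
        targets, repeats = parts, 4       # every atomic part four times
    else:
        raise ValueError(f"unknown variant: {variant}")
    updates = 0
    for part in targets:
        for _ in range(repeats):
            # A single update swaps the (x, y) attributes.
            part["x"], part["y"] = part["y"], part["x"]
            updates += 1
    return updates


if __name__ == "__main__":
    for variant in "ABC":
        print(variant, t2_updates(make_parts(), variant))
```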

Traversal T3: Traversal with indexed field updates • ‘Repeat Traversal T2, except that now the update is on the date field, which is indexed. The specific update is to increment the date if it is odd, and decrement the date if it is even’

Traversals T8 and T9: Operations on Manual • ‘Traversal T8 scans the manual object, counting the number of occurrences of the character ‘I’. Traversal T9 checks to see if the first and the last character in the manual object are the same’

Traversal CU (Cached Update) • ‘Perform traversal T1, followed by T2A, in a single transaction. Report the total time minus the T1 hot time minus the T1 cold time’
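The reported CU figure can be written as a formula. The symbols are notation introduced here for illustration, not taken from the original slides.

```latex
t_{\mathrm{CU}} = t_{\mathrm{total}}(T1 + T2A) - t_{\mathrm{hot}}(T1) - t_{\mathrm{cold}}(T1)
```

One plausible reading: the T1 inside the transaction runs against empty caches, so it costs about the T1 cold time, and T2A then repeats the same walk over cached data, costing about the T1 hot time plus the updates. Subtracting both leaves approximately the cost of updating cached data.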

Queries • Query Q1: exact-match lookup • ‘Generate 10 random atomic part ids; for each part id generated, look up the atomic part with that id. Return the number of atomic parts processed when done’

Queries Q2, Q3 and Q7 • Query Q2: ‘Choose a range for dates that will contain the last 1% of the dates found in the database’s atomic parts. Retrieve the atomic parts that satisfy this range predicate’ • Query Q3: ‘Choose a range for dates that will contain the last 10% of the dates found in the database’s atomic parts. Retrieve the atomic parts that satisfy this range predicate’ • Query Q7: ‘Scan all atomic parts’
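A minimal sketch of the Q2/Q3 range predicate follows. It interprets ‘the last 1% (or 10%) of the dates’ as the top fraction of the date value range, which is an assumption; atomic parts are modelled as `(id, build_date)` tuples, and the function names are hypothetical. A real system would answer this through the index on the date field rather than by scanning a list.

```python
def last_fraction_range(dates, fraction):
    """Return (low, high) bounds covering the top `fraction` of the date range."""
    lowest, highest = min(dates), max(dates)
    return highest - (highest - lowest) * fraction, highest


def range_query(parts, fraction):
    """Return ids of parts whose build date falls in the selected range."""
    low, high = last_fraction_range([date for _, date in parts], fraction)
    return [part_id for part_id, date in parts if low <= date <= high]


if __name__ == "__main__":
    parts = [(i, i) for i in range(1000)]
    print(len(range_query(parts, 0.01)), len(range_query(parts, 0.10)))
```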

Query Q4: path lookup • ‘Generate 100 random document titles. For each title generated, find all base assemblies that use the composite part corresponding to the document. Also, count the total number of base assemblies that qualify’

Query Q5: single-level make • ‘Find all base assemblies that use a composite part with a build date later than the build date of the base assembly. Also, report the number of qualifying base assemblies found’

Query Q8: ad-hoc join • ‘Find all pairs of documents and atomic parts where the document id in the atomic part matches the id of the document. Also, return a count of the number of such pairs encountered’
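One way to evaluate the Q8 join is a simple hash join, sketched below. The slide does not say which join method any of the tested systems used, so the hash join is a stand-in chosen for illustration; the dictionary field names and the function name are assumptions.

```python
def q8_join(documents, atomic_parts):
    """Pair each atomic part with the document whose id equals its doc_id."""
    # Build phase: index the documents by id.
    docs_by_id = {doc["id"]: doc for doc in documents}
    # Probe phase: look up each atomic part's doc_id in the index.
    pairs = [
        (docs_by_id[part["doc_id"]], part)
        for part in atomic_parts
        if part["doc_id"] in docs_by_id
    ]
    return pairs, len(pairs)


if __name__ == "__main__":
    docs = [{"id": 1, "title": "doc 1"}, {"id": 2, "title": "doc 2"}]
    parts = [{"id": 10, "doc_id": 1}, {"id": 11, "doc_id": 2}]
    print(q8_join(docs, parts)[1])
```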

Structural Modifications: • 1: Insert ‘Create five new composite parts, which includes creating a number of new atomic parts (100 in the small configuration, 1000 in the large) and five new document objects, and insert them into the database by installing references to these composite parts into 10 randomly chosen base assembly objects’ • 2: Delete ‘Delete the five newly created composite parts (and all of their associated atomic parts and document objects)’

Small and large databases • [Result charts: Small Databases, Medium Databases]

Conclusion • The OO7 test results reveal OODBMS performance characteristics that earlier benchmarks did not expose • The requirements of each application determine which OODBMS suits it best

The SEQUOIA 2000 Storage Benchmark • Captures the database requirements of the Earth Sciences (ES): • Geography, hydrology, oceanography… • Is intended to be useful to the general database community as well • Tests the performance of three systems: • GRASS, IPW, POSTGRES • Characteristics of ES applications: • Massive size: the database contains (satellite) images: 10^14 bytes (100 Tbytes) • Complex data types: support for arrays, spatial objects, complex objects • Sophisticated searching: B-trees are not adequate for searching arrays and spatial data • Hence the need for a new benchmark for ES applications

Scaling the benchmark • Regional benchmark: geographic region of 1280 km x 800 km • National benchmark: geographic region of 5500 km x 3000 km • Earth benchmark: the whole earth • Kinds of data: • Raster data: tile area 1 km x 1 km (or 0.5 km x 0.5 km), tile size 10 bytes; the regional benchmark contains 2 x 1280 x 2 x 800 = 4,096,000 tiles, each holding (5 observations) x (2 bytes) = 10 bytes • Point data: a geographic feature is given by its name and its location (2 x 32 bits); the regional benchmark occupies 1.83 Mbytes • Polygon data: polygons enclose regions of the same land use, with 50 sides on average; the regional benchmark occupies 19.1 Mbytes • Directed graph data: the basic unit is a segment; the regional benchmark occupies 47.8 Mbytes
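The raster arithmetic for the regional benchmark can be checked with a few lines of code. The sketch covers a single raster image of the region; how many such images the benchmark stores over time is not stated on this slide. The function and parameter names are hypothetical.

```python
def regional_raster_size(width_km=1280, height_km=800, tiles_per_km=2,
                         observations=5, bytes_per_observation=2):
    """Return (tile_count, total_bytes) for one raster image of the region."""
    # 0.5 km tiles mean two tiles per kilometre along each axis.
    tiles = (tiles_per_km * width_km) * (tiles_per_km * height_km)
    bytes_per_tile = observations * bytes_per_observation
    return tiles, tiles * bytes_per_tile


if __name__ == "__main__":
    tiles, total_bytes = regional_raster_size()
    print(tiles, total_bytes)
```

With the defaults this reproduces the 4,096,000 tiles of 10 bytes each, about 41 Mbytes per image.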

Benchmark Queries (1) • Query 1: data load ‘Create and load the data base and build any necessary secondary indexes’ • Query 2: raster query ‘Select data for a given wavelength band and rectangular region ordered by ascending time’ (expressed in POSTQUEL) • Query 3: raster query ‘Select data for a given time and geographic rectangle and then calculate an arithmetic function of the five wavelength band values for each cell in the study rectangle’ (expressed in POSTQUEL) • Query 4: raster query ‘Select data for a given time, wavelength band and geographic rectangle. Lower the resolution of the image by a factor of 64 to a cell size of 4 x 4 km and store it as a new object’
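The resolution step in Query 4 can be sketched on a plain list-of-lists image. Going from 0.5 km cells to 4 km cells merges 8 x 8 = 64 cells into one. Averaging the merged cells is an assumption made here for illustration, since the slide does not say how the values are combined, and the function name is hypothetical.

```python
def lower_resolution(image, block=8):
    """Replace each block x block group of cells by its average value.

    Assumes the image dimensions are multiples of `block`.
    """
    rows, cols = len(image), len(image[0])
    result = []
    for r in range(0, rows, block):
        out_row = []
        for c in range(0, cols, block):
            cells = [image[i][j]
                     for i in range(r, r + block)
                     for j in range(c, c + block)]
            out_row.append(sum(cells) / len(cells))
        result.append(out_row)
    return result


if __name__ == "__main__":
    image = [[2] * 16 for _ in range(16)]
    print(lower_resolution(image))
```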

Benchmark Queries (2) • Query 5: Point query ‘Find the POINT record that has a specific name’ • Query 6: Polygon query ‘Find all the polygons that intersect a specific rectangle and store them in the DBMS’ • Query 7: Polygon query ‘Find all the polygons that are more than a specific size and within a specific circle’ • Query 8: Spatial join ‘Show the landuse/landcover in a 50 km quadrangle surrounding a given point’ • Query 9: Spatial join ‘Find the raster data for a given landuse type in a study rectangle for a given wavelength and time’ • Query 10: Spatial join ‘Find the names of all points within polygons of a specific vegetation type and create this as a new DBMS object’ • Query 11: Recursion ‘Find all segments of any waterway that are within 20 km downstream of a specific point’
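Query 11 amounts to a bounded search over the directed waterway graph. The sketch below uses a shortest-distance search and includes a segment only when its downstream end lies within the limit; that inclusion rule, the graph layout (node mapped to `(next_node, segment_id, length_km)` tuples) and the function name are assumptions.

```python
import heapq


def downstream_segments(graph, start, max_km=20.0):
    """Return ids of segments that end within `max_km` downstream of `start`."""
    best = {start: 0.0}
    heap = [(0.0, start)]
    segments = set()
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for next_node, segment_id, length_km in graph.get(node, []):
            new_dist = dist + length_km
            if new_dist > max_km:
                continue  # this segment ends beyond the distance limit
            segments.add(segment_id)
            if new_dist < best.get(next_node, float("inf")):
                best[next_node] = new_dist
                heapq.heappush(heap, (new_dist, next_node))
    return segments


if __name__ == "__main__":
    river = {"a": [("b", "s1", 8.0)],
             "b": [("c", "s2", 10.0), ("e", "s4", 13.0)],
             "c": [("d", "s3", 5.0)]}
    print(sorted(downstream_segments(river, "a")))
```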

Results
