640 likes | 666 Vues
Database Extensibility. Jacques Roy jacquesr@us.ibm.com. Agenda. What is Database Extensibility? Database and Extensibility background IDS features Extensibility Examples DataBlade Modules Bladelets Considerations for building your own. What is Database Extensibility?.
E N D
Database Extensibility Jacques Roy jacquesr@us.ibm.com
Agenda • What is Database Extensibility? • Database and Extensibility background • IDS features • Extensibility Examples • DataBlade Modules • Bladelets • Considerations for building your own
What is Database Extensibility? • Ability to add business components in the database • Tailor the database to the business environment • Reduce application complexity • Put the processing where it makes the most sense • Set processing provided by the database • Higher performance • Less data movement, better data representation, better indexing • Faster development, lower maintenance cost
Extensibility Background • Historic Approach • user exits, device drivers • Relational Databases • triggers, constraints, stored procedures • New development platforms • Web Server: CGI, NSAPI/ISAPI • App Servers: J2EE • Other • Eclipse and MS Visual Studio plugins • Object-Relational Capabilities • Relational Framework
User-defined types Distinct, opaque Table/Type inheritance Functional index R-Tree index Smart Large Objects More. . .Extensions can be written in:SPL, C, Java Complex types Row, set, multiset, list Polymorphism User-Defined Function Also known as user-defined routine User-Defined Aggregate Primary/Secondary Access Methods IDS Extensibility Features
When to Use UDFs/UDRs • Eliminate large data transfer • Transfer time is much larger than processing time • Simplify or improve processing • Eliminate table scan • Eliminate application set processing • Define new sorting order • Replace store procedures with aggregates • Provide business processing to applications • Consistency of result • Eliminate the need for custom interfaces
Better Grouping • Quarter() function:SELECT Quarter(date), SUM(income)FROM ordersWHERE Quarter(date) LIKE '2008Q%‘GROUP BY 1ORDER BY 1; • AgeGroup() function:SELECT AgeGroup(birthdate, 20, 10) AgeGrouping, SUM(total_price) TotalFROM customer c, orders o, items IWHERE c.customer_num = o.customer_numAND o.order_num = i.order_numGROUP BY 1ORDER BY 1 DESC; AgeGrouping Total 40-49 $12646.97 50-59 $ 5507.80
Quarter Function in “C” #include <mi.h> mi_lvarchar *quarter(mi_date date, MI_FPARAM *fparam) { mi_lvarchar *retVal; /* The return value. */ short mdy[3]; mi_integer ret, qt; char buffer[10]; /* Extract month, day, and year from the date */ ret = rjulmdy(date, mdy); qt = 1 + ((mdy[0] - 1) / 3); /* calculate the quarter */ sprintf(buffer, "%4dQ%d", mdy[2], qt); retVal = mi_string_to_lvarchar(buffer); return retVal; /* Return the value. */ }
Compiling and Linking # Compilation COMPILE=-I$(INFORMIXDIR)/incl/public -O -c cc -DMI_SERVBUILD $(COMPILE) quarter.c # Creating the shared library ld -G -o qlib.bld quarter.o # Change permission on file to 775 # read and execute to other chmod a+x qlib.bld
Creating the function CREATE FUNCTION quarter(date) RETURNS varchar(10) WITH (not variant, parallelizable) external name "$INFORMIXDIR/extend/class/qlib.bld(quarter)" LANGUAGE C; GRANT EXECUTE ON FUNCTION quarter(date) TO public;
Stored Procedures SQL group UDR order order scan scan scan scan UDR Processing vs. Stored Procedures
Better date manipulation: Day of the year Week of the year Week of the month Quarter Unit conversion Feet Meters Gallons Liters Fahrenheit Celsius Functional indexes CREATE FUNCTIONquarter(dt date) RETURNS integer WITH (NOT VARIANT) RETURN (YEAR(dt) * 100) + 1 + (MONTH(dt) - 1) / 3; END FUNCTION; Example: EXECUTE FUNCTION quarter('9/2/2008'); (expression) 200803 Using SPL for Extensions See Developer Works Informix zone:http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0510roy/
1.0 1.1 1.2 1.3 1.2.3 1.2.2 1.2.1 1.2.3.2 1.2.3.3 1.2.3.4 1.2.3.4.5 The Node Type • "Hierarchically" aware type: Node • Pre-processed the hierarchical relationships • ex: Chapter 11, section 7, paragraph 3: 11.7.3 • Add new way to relate objects to one another • IsAncestor(), IsChild(), IsDescendant(), IsParent() • Can change processing from exponential to linear • examples: • policies, product classification, bill-of-material, LDAP, XML, etc.
Node Application Example • Geo Hierarchy: • country > state > metro > city GeoNode Resource Policy Q: What policy apply to the Hyatt in Denver? A: A Colorado Policy
component Parts Bill-of-Material Example CREATE TABLE Components ( ComponentId Node, Sequence integer, Quantity integer, Name varchar(30), PartNumber integer); CREATE TABLE Parts ( PartNumber integer, Name varchar(30), ProviderId integer, Price Money(10,2) ); • A component can be made up of multiple components • A component is made up of multiple parts
Fine-Grained Auditing • Use triggers and user-define routines • Register event processing • Commit or rollback • Send events to file or outside process • Use a generic function for any table See the developerworks article:Event-driven fined-grained auditing with Informix Dynamic Serverhttp://www-106.ibm.com/developerworks/db2/library/techarticle/dm-0410roy/
New Trigger Use CREATE TRIGGER tab1instrig INSERT ON tab1 FOR EACH ROW ( EXECUTE PROCEDURE do_auditing2() ); • The API includes functions to find the context of the trigger.
Commit/Rollback MonitorProgram Callback Statement Register 8 4 7 3 2 1 6 5 Trigger Table EventTable Event-Driven Architecture
loans Retail Financial Services Healthcare Telco Manufacturing Clothing Food Novelties Inheritance and Polymorphism
Inheritance and Polymorphism (cont.) • SELECT branch_id, AVG(risk(loans)) FROM loans GROUP BY 1HAVING AVG(risk(loans)) > 1ORDER BY 2 DESC; • SELECT branch_id, AVGRISK(loans) FROM loans GROUP BY 1HAVING AVGRISK(loans) > 1ORDER BY 2 DESC;
Replacing Store Procedures with UDA • Business problem: Merger multi-polygon types (ESRI) • Original Solution: • SPL stored procedure (82 lines, 3 SELECT statements, 1 insert statement, 2 embedded FOREACH) • New Solution: User-Defined Aggregate • 23 lines of SPL, no SQL, no FOREACH SELECT a.user_id, a.case_id, a.case_type, a.serial_nr_full,a.spt_disp_theme_cd, do_union(a.shp)FROM case_shp aWHERE user_id = "user0001" GROUP BY 1, 2, 3, 4, 5INTO TEMP my_temp_table WITH NO LOG;
Original Code CREATE PROCEDURE union_caseid() DEFINE GLOBAL theUserId CHAR(8) DEFAULT USER; . . . BEGIN SELECT MAX(a.se_row_id) INTO max_serowid FROM case_shp a WHERE user_id = theUserId; FOREACH -- case_id in current townshipSELECT unique case_id INTO p_caseid FROM case_shp WHERE user_id = theUserId LET init = 'true'; LET newSeRowId = max_serowid + 999999; FOREACH -- shape and its display atributesSELECT a.case_type, a.serial_nr_full, a.spt_disp_theme_cd, a.shp, a.se_row_id INTO p_casetype, p_serialnrfull, p_sptdispthemecd, p_shp, p_serowid FROM case_shp a WHERE user_id = theUserId AND case_id = p_caseid IF ( p_shp IS NOT NULL ) THEN IF ( init = 'true' ) THEN LET init = 'false'; LET unionResult = p_shp; ELSE LET unionResult = (union(p_shp::MULTIPOLYGON, unionResult::MULTIPOLYGON)::MULTIPOLYGON); END IF -- init = 'true‘ END IF -- p_shp IS NOT NULL AND END FOREACH -- shape and its display atributes INSERT INTO case_shp VALUES (theUserId, p_caseid, p_casetype, p_serialnrfull, p_sptdispthemecd, unionResult, newSeRowId ); END FOREACH -- case_id in current township END -- BEGIN END PROCEDURE;
User-Defined Aggregate Code CREATE FUNCTION do_union_iter(state lvarchar, arg lvarchar) RETURNING lvarchar DEFINE retval lvarchar; IF (state IS NULL ) THEN RETURN arg ; END IF IF (arg IS NULL) THEN RETURN state ; END IF LET retval = state || arg ; RETURN retval ; END FUNCTION ; CREATE AGGREGATE do_union WITH (ITER=do_union_iter, COMBINE=do_union_iter);
User-Defined Aggregates • A lot more flexible than standard aggregate functions • Can take any type as input • ex.: row type • Can take an initialization parameter of any type • ex: row type • Can return complex types • ex: list
Fabric Classification • Business Problem:Provide an efficient way to select fabrics • Indexing Colors:Cielab coding (3 dimensions) and other attributes(Total of 5 dimensions) • Fabric Type HierarchyRequires a hierarchy-aware typeex: We want a “natural” fabric • Fabric Style and Patternsex: What does “Victorian” mean?
Fabric Classification Queries • Find the fabrics that have “blue” in them • “Blue” is not a specific value but a range of values in 3 dimensions • Find “natural” fabrics that have “blue” in them • What does “natural” mean? • Find “silk” fabrics that has “blue” in them • What does “silk” mean? There could be many types of silk An application wants the answer to the questions, not a list of potential candidates with “false positives”
Problem with Indexing Colors • B-tree index indexes only one dimension • Selects too many rows • Rule of thumb:Index selected if it returns less than 20% of the rows • Result: likely to do a table scan (read all rows) • Solution: Use an R-tree index • Multi-dimensional index • Requires the creation of a 5-dimension type (UDT) • R-tree index • Preferred indexing method for spatial data(spatial, geodetic) • Data types: point, line, polygon, etc.
Fabric Natural Synthetic Cotton . . . Wood . . . Silk . . . . . . Hard Soft Pine Handling the Fabric Type Hierarchy • Fabric types can be seen as a hierarchy • Hierarchies are difficult to deal with in a RDBMS • The hierarchical processing is only a part of the total required processing • We can use the Node type to solve our problem
Handling Fabric Types and Patterns • This was not addressed in the PoC • Create a pattern hierarchy • Define “Victorian” in terms of colors, fabric types, and patterns.
Solution Benefits • Scalability due to indexing • Colors • Hierarchy • Application simplification • No manipulation of extra data • No special algorithms to handle color matching, etc. • Performance • Less data movement • Less data through the network
Other examples • Other multi-dimensional problems: • 3D CAD drawings • Support for Globally Unique Identifier (GUID)http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0401roy/index.html • Soundex/Phonex
Advanced Feature: Named Memory • Usage:Allocate memory of a chosen duration that can be retrieved by name • Durations from function to server • Example:BeCentric uses names memory to keep formatted information in memory for quick sanctioning testing (financial entity validation)
Advanced Feature: Virtual Table/Index Interface • Think of those as similar to device drivers • Purpose functions:am_create, am_drop, am_open, am_close, am_insert, am_delete, am_update, am_stats, am_scancost, am_beginscan, am_getnext, am_rescan, am_endscan, am_getbyid, am_check • Usage:Used to make “something” look like a table or an index • Examples: • Excalibur Text (virtual index interface) • TimeSeries: make a TimeSeries look like a relational table (Virtual Table Interface)
Key to Successful Extensibility Project • Start small, develop your expertise • Remember the first OO projects • Use pre-built extensions • Study examples • Approaches: • Use DataBlade Modules • Use Bladelets • Build your own
What are DataBlade Modules? • Business Components • A set of functionality that solves a specific business problem • Building Blocks • Can include: • User-defined types • User-defined Functions • Tables, views • Client component • A DataBlade can come from IBM, a third party or be built in-house
Large Object Locator (LLD) Binary data type Node Basic Text Search (BTS) MQ Series Spatial XML and XSLT Web feature services Web Geodetic TimeSeries Real-Time Loader C-ISAM Image Foundation Video Foundation Excalibur Text Excalibur Image IDS DataBlade Modules
Spatial is Everywhere • Where are my stores located related to my distributors? • How can I efficiently route my delivery trucks? • How can I micro-market to customers fitting a particular profile near my worst performing store? • How can I set insurance rates near to flood plain? • Where are the parcels in the city that are impacted by a zoning change? • Which bank branches do I keep after the merger based on my customers locations (among other things)?
Complex Spatial Example CREATE TABLE e_Yellow_Pages ( Name VARCHAR(128) NOT NULL, Business Business_Type NOT NULL, Description Document NOT NULL, LocationGeoPoint NOT NULL, Occupied SET( Period NOT NULL ) ); -- -- “Show me available service stations -- specializing in Porsche brakes -- within 30 miles of where I am?” -- SELECT Y.Name FROM e_Yellow_Pages Y WHEREContains( Y.Location, Circle( :GPS_Loc, ‘30 Miles’ ) ) AND Y.Business MATCH “Automotive Service” ANDDocContains(Y.Description, “Porsche AND brakes”) AND NOTBooked ( Y.Occupied, Period (TODAY,TODAY+5)); • Raise the level of abstraction at the database level. • Buy components to solve common problems, and build components to achieve a competitive edge
+90(90° N) R latitude longitude 0 -90(90° W) Geodetic DataBlade • GeoSpatial objects consist of data that is referenced to a precise location on the surface of the earth • Includes altitude and time range dimensions • Applications: • Global, Polar, Trans-Pacific, High Accuracy • Basic computations: Complex, Expensive • Example: • Find all the atmospheric elements that were present in the Denver area during the last thunderstorm
Spatial vs. Geodetic: Distance Issue What is the distance from Anchorage to Tokyo? The shortest path is the shorter of thetwo possible geodesic paths:the thick part of the great circle
Spatial vs. Geodetic: straddling the 180th Meridian split flat-plane representation into 2 or more pieces MULTIPOLYGON( (-180 30,-165 30,-165 40, -180 40,-180 30), (180 30,180 40,165 40, 165 30, 180 30) ) POLYGON( (165 30, -165 30, -165 40, 165 40) )
XML Capabilities • Self-describing data • Useful for information exchange • Useful for flexible data definition • IDS provides: • Functions to generate XML (UDA). • Functions to extract part of an XML document. • Functions to test for the existence of an element in an XML document. • XSLT transformation • Extract, exist and validate functions use Xerces-C. • Open source provided by IBM. • <customer> • <row> • <customer_num>101</customer_name> • <fname>Ludwig </fname> • <lname>Pauli </lname> • <company>All Sports Supplies</company> • <address1>213 Earstwild Court</address1> • . . . • </row> • . . . • </customer>
Basic Text Search (BTS) DataBlade • Text search engine built into IDS. • Provides a variety of word and phrase searching on an unstructured document repository. • Search engine is provided by the CLucene text search package (open source). • Examples: SELECT pid FROM t WHERE bts_contains(text_data, 'foo AND bar'); SELECT id FROM product WHERE bts_contains(brands, 'standard', score # real) AND score > 70.0;
Basic Text Search (BTS) DataBlade • Supports Wildcards and Fuzzy Search: • “?” single character wildcard • “*” multiple character wildcard • “~” fuzzy search SELECT cat_advert, score FROM catalog WHERE bts_contains(cat_advert, 'soarness~ and classic', score # real); • Proximity, range searches • Use stop words • Can index XML documents • Index all or selected XML elements • Index all or selected attributes
TimeSeries DataBlade Module • A time series is a set of data as it varies over time • TimeSeries DataBlade optimizes storage usage • 50% Savings not uncommon • Optimized Access Time • 10 times performance improvement typical • Calendars • Defines period of time that data may or may not be collected • SQL, Java, and C interfaces • VTI Interface • Makes a time series look like a regular table
Who’s Interested in TimeSeries • Capital Markets • Arbitrage opportunities, breakout signals, risk/return optimization, portfolio management, VaR calculations, simulations, backtesting... • Telecommunications: • Network monitoring, load prediction, blocked calls (lost revenue) from load, phone usage, fraud detection and analysis... • Manufacturing: • Machinery going out of spec; process sampling and analysis • Logistics: • Location of a fleet (e.g. GPS); route analysis • Scientific research: • Temperature over time... • Other • Live Sports statistics
TimeSeries Performance Example • Financial brokerage firm: • Online stock collection • < 1 sec lag requirement • > 100,000 messages/sec • >5 TB of history kept online • Runs on a 4 CPU Linux box! Extreme Scalability!
MQ DataBlade • Message queues are useful for exchanging information asynchronously • Communication mechanism in an SOA environment • Used as an Enterprise Service Bus • Access WebSphere Message Queue (WMQ): • Function interface. • Table interface. • Transactional • must be used within a transaction • Message is sent only if the transaction commits
IDS Bladelets • Located at:http://www.ibm.com/developerworks/db2/zones/informix/library/samples/db_downloads.htmlandhttp://www.iiug.org/ver1/software/index_ORDBMS.html • Bladelets: • mrLvarchar, Node, regexp, shape, exec, period, etc. • For detailed information, see: "Open-Source Components for Informix Dynamic Server 9.x“Jacques Roy, William W. White, Jean T. Anderson, Paul G. BrownISBN 0-13-042827-2 Includes: Node, Period, ffvti, exec, shape, sqllib/IUtil, regexp, mrLvarchar, JPGImage