Data modeling and metadata

Data modeling and metadata: from graphs to graphs

Metadata • Full metadata: relational schemas • Self defining data: XML, key/value, key/document • No metadata: untagged images, video, audio • Parallel metadata: tagged images, video, audio

Full schema metadata • Origins: • Semantic networks in AI • Metadata mixed in with data • Objects (nodes in graph), has-a (arcs in graph), is-a (arcs in graph), types (nodes), subtypes (nodes) • Essentially a network with metadata and all instances of the metadata • The goal was to model knowledge of the real world, not to manage volumes of data
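To make "a network with metadata and all instances of the metadata" concrete, here is a minimal Python sketch. The class `SemanticNet`, its methods, and the example nodes are hypothetical illustrations rather than any historical system's interface. Type-level arcs (`Car is-a Vehicle`) and instance-level arcs (`my_car is-a Car`) live in the same graph, and inherited has-a facts are found by following is-a arcs.

```python
from collections import defaultdict


class SemanticNet:
    """Hypothetical semantic network: types and instances are all nodes in one graph."""

    def __init__(self):
        # arcs[source][label] -> set of target nodes
        self.arcs = defaultdict(lambda: defaultdict(set))

    def add(self, source, label, target):
        self.arcs[source][label].add(target)

    def ancestors(self, node):
        """Follow is-a arcs transitively to collect every supertype of node."""
        seen, stack = set(), [node]
        while stack:
            for parent in self.arcs[stack.pop()]["is-a"]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def has(self, node, part):
        """True if node, or any type it inherits from, has a has-a arc to part."""
        return any(part in self.arcs[n]["has-a"] for n in {node} | self.ancestors(node))


net = SemanticNet()
net.add("Vehicle", "has-a", "Engine")  # type-level knowledge (metadata)
net.add("Car", "is-a", "Vehicle")      # subtype arc (metadata)
net.add("my_car", "is-a", "Car")       # instance arc, stored in the same graph
```

Asking `net.has("my_car", "Engine")` succeeds only by walking from the instance up through its types, which is exactly the mixing of data and metadata the slide describes.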

Early databases • Slow to adopt data structuring abstractions because speed of access was the focus • Hierarchical and network databases • Links between records of one file to records of another • E.g., each claim record is linked to a subscriber record • Also, sets of records and sets of links

Relational databases • First true abstraction of metadata separated from data • Minimal structure in order to accommodate fast retrieval of tuples • Abstractions • Relation • Attribute • Tuple • Primary, candidate, and foreign keys (PKs, CKs, FKs); null/not null
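The relational abstractions can be shown with Python's built-in `sqlite3` module. This sketch reuses the subscriber/claim example from the early-database slide. The table and column names (including `policy_no` as a second candidate key) are illustrative assumptions. The schema (metadata) is declared once, apart from the tuples, and the DBMS rejects tuples that violate it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is enabled
conn.executescript("""
CREATE TABLE subscriber (
    subscriber_id INTEGER PRIMARY KEY,   -- primary key
    policy_no     TEXT NOT NULL UNIQUE,  -- a second candidate key
    name          TEXT NOT NULL
);
CREATE TABLE claim (
    claim_id      INTEGER PRIMARY KEY,
    subscriber_id INTEGER NOT NULL
                  REFERENCES subscriber(subscriber_id),  -- foreign key
    amount        REAL                   -- nullable attribute
);
""")
conn.execute("INSERT INTO subscriber VALUES (1, 'P-100', 'Ada')")
conn.execute("INSERT INTO claim VALUES (10, 1, 250.0)")


def try_insert(sql):
    """Run an INSERT and report whether the schema accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


orphan = try_insert("INSERT INTO claim VALUES (11, 99, 75.0)")  # no subscriber 99
unnamed = try_insert(
    "INSERT INTO subscriber (subscriber_id, policy_no) VALUES (2, 'P-200')"
)  # name is NOT NULL
rows = conn.execute(
    "SELECT s.name, c.amount FROM claim AS c JOIN subscriber AS s USING (subscriber_id)"
).fetchall()
```

Unlike the record links of network databases, the claim-to-subscriber connection here is a value match on a foreign key, resolved by a join at query time.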

Concurrent with relational database development: “semantic” databases • Like semantic networks (quite deliberately), except that metadata is separated from data • Not object-oriented • No object IDs • No classes instantiated from types • A wide variety of competing models, with “the” Semantic Model being one of them

Semantic databases, continued • Other modeling notions • Components or aggregates that are necessary parts of an object and cannot be changed, like the day you were born or the VIN of a car • Versus properties or attributes that can be changed, like your name or the transmission in a car • Cause and effect relationships • Such as a sales visit leading to a sale • And many other specialized relationships • Interestingly, these models had no query facilities, and no commercial systems based on them succeeded

Persistent programming languages • Not necessarily object-oriented • Host language is the only language • Data can be persistent or not, often selectively • Strong notion of metadata as programming data types
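Python is not a persistent programming language, but its standard `shelve` module approximates one slice of the idea. Selected values are written to a store and come back later as ordinary, typed host-language objects, with no separate schema or query language. Persistence is also selective: only what is assigned to the shelf outlives the process. This is a minimal sketch, and the file name and keys are illustrative.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "inventory_store")
    # "First run": bind a value to a persistent key.
    with shelve.open(path) as db:
        db["inventory"] = {"bolts": 40, "nuts": 75}
    # "Later run": reopen the store; the value comes back as a Python dict.
    with shelve.open(path) as db:
        restored = db["inventory"]
```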

Object-oriented databases • Strong notion of object ID and object identity • Types/subtypes and classes • Strong sense of metadata separate from data • Behavioral encapsulation

Object-relational databases • Objects in the small • User defined data types for attribute domains • No behavioral encapsulation

One-of-a-kind semantically rich databases • Engineering/CAD data • Complex objects • Lots of singleton types, but with strict notion of metadata • Complex constraints • Far reaching component and constraint relationships

One-of-a-kind scientific/medical/financial databases • Managing type-based, voluminous data with little internal structure (imaging) • Managing textual data with some structure and lots of domain-based terminology • Often there are real-time demands made on distributed databases, a very difficult problem • Addressed by putting timing constraints on specific parts of the data processing code

Self-defining data • Inspired by need to stream data live and process it in one pass • Also inspired by the need to vary the structure of individual pieces of data, like documents and other items that don’t really have a shared type construct • XML developed as a shared language model for semi-structured (or self-defining) data • Developed in part to assist the construction of the semantic web • Data is streamed on the Internet or from sensors
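Self-defining data can be consumed in one pass because every element names itself. This sketch uses the standard library's `xml.etree.ElementTree.XMLPullParser` to parse a stream that arrives in arbitrary chunks. It yields each reading as soon as its closing tag is seen and takes field names from the tags rather than from a schema. The sensor document and its tag names are made-up examples.

```python
import xml.etree.ElementTree as ET

# Two sensor readings; the second carries a field the first lacks.
stream = (
    '<readings>'
    '<reading sensor="t1"><temp unit="C">21.5</temp></reading>'
    '<reading sensor="t2"><temp unit="C">19.0</temp><humidity>40</humidity></reading>'
    '</readings>'
)


def stream_readings(chunks):
    """Parse XML incrementally, yielding each <reading> as soon as it closes."""
    parser = ET.XMLPullParser(events=("end",))
    for chunk in chunks:
        parser.feed(chunk)
        for _event, elem in parser.read_events():
            if elem.tag == "reading":
                # Field names come from the tags in the data, not from a schema.
                yield elem.get("sensor"), {child.tag: child.text for child in elem}
    parser.close()


# Simulate arrival in two arbitrary chunks, as from a network socket.
results = list(stream_readings([stream[:40], stream[40:]]))
```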

Self-defining data, continued • NoSQL databases that store extremely high volumes of loosely structured data • Documents with internal structure • Values with no meaning within the database • Usually no formal query language, as data is interpreted programmatically (either partially or fully); sometimes there is a library of common query templates
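The key/document pattern can be sketched with a dictionary standing in for the NoSQL store. The store keeps only opaque JSON strings, and all interpretation happens in application code. The functions `put`, `get`, and `find` are hypothetical, and `find` plays the role of one "common query template."

```python
import json

# The "database": keys map to opaque serialized documents.
store = {}


def put(key, doc):
    store[key] = json.dumps(doc)  # the store sees only a string


def get(key):
    return json.loads(store[key])  # meaning is recovered by the program


def find(predicate):
    """A query template: scan every document and interpret it in code."""
    return sorted(k for k in store if predicate(get(k)))


# Documents in one collection need not share a structure.
put("u1", {"name": "Ada", "email": "ada@example.org"})
put("u2", {"name": "Lin", "phones": ["555-0100"], "vip": True})
```

A call such as `find(lambda d: d.get("vip", False))` must tolerate documents that lack the field, because no shared type guarantees it exists.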

No metadata databases • Early blob and continuous data • Images • Video • Audio • Flash • All processing takes place in complex programs that neither retrieve metadata from the data nor insert metadata into it • E.g., image processing, facial searching, language searching

Recent blob/continuous data • Development of parallel metadata databases that contain low level and semantically rich tagging • Only the metadata database is actively searched • Searching can be enhanced by downloading small samples • Feedback loops to improve tag interpretation • Tags taken from shared namespaces
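A parallel metadata database can be as simple as an inverted index from tags to blob IDs. Searches run against the index only, and the blobs are fetched afterward. In this sketch, the `TagIndex` class and the `geo:` / `obj:` namespace prefixes are hypothetical stand-ins for shared tag namespaces.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical parallel metadata store for blobs kept elsewhere."""

    def __init__(self):
        self.by_tag = defaultdict(set)  # "namespace:term" -> blob IDs

    def tag(self, blob_id, *tags):
        for t in tags:
            self.by_tag[t].add(blob_id)

    def search(self, *tags):
        """Blob IDs carrying every requested tag; the blobs themselves are never read."""
        sets = [self.by_tag.get(t, set()) for t in tags]
        return set.intersection(*sets) if sets else set()


index = TagIndex()
index.tag("img1", "geo:paris", "obj:bicycle")
index.tag("img2", "geo:paris", "obj:dog")
index.tag("img3", "geo:rome", "obj:bicycle")
```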

Assertion based databases • Usually use triples (assertions) • Triples are chained together to make new inferences • Metadata is treated like data • Joe owns a Ford • Fords are cars • SQL-like, triple-hopping query languages
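The slide's example can be run directly as triples. Type facts ("Fords are cars") are stored exactly like instance facts, and one inference rule is chained until no new triples appear. The rule and the extra `car is-a vehicle` triple are illustrative assumptions, not taken from a particular triple store.

```python
# Assertions as (subject, predicate, object); metadata is stored like data.
triples = {
    ("Joe", "owns", "Ford"),
    ("Ford", "is-a", "car"),
    ("car", "is-a", "vehicle"),
}


def infer(facts):
    """Forward-chain one rule to a fixed point:
    if X owns Y and Y is-a Z, then X owns Z."""
    facts = set(facts)
    while True:
        new = {(x, "owns", z)
               for (x, p, y) in facts if p == "owns"
               for (y2, q, z) in facts if q == "is-a" and y2 == y}
        if new <= facts:
            return facts
        facts |= new


inferred = infer(triples)
# A "triple-hopping" query: who owns a car?
car_owners = {s for (s, p, o) in inferred if p == "owns" and o == "car"}
```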

Graph databases • Networks of objects that blur the boundary between data and metadata • Support levels of connectivity orders of magnitude greater than in the network and hierarchical databases of old • Have a purpose reminiscent of network/hierarchical databases: to represent the fluid and highly interconnected nature of complex data, such as that collected from social media • Use graph-like query and programming interfaces
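A typical graph query is a bounded traversal, such as "everyone within two hops of this person." The following is a plain-Python breadth-first search over an adjacency list, not the API of any particular graph database. The `follows` graph is invented sample data.

```python
from collections import deque

# Who follows whom, as an adjacency list.
follows = {
    "ana": ["ben", "cai"],
    "ben": ["dee"],
    "cai": ["dee", "eve"],
    "dee": ["fay"],
    "eve": [],
    "fay": [],
}


def within_hops(graph, start, max_hops):
    """Breadth-first traversal: nodes reachable from start in 1..max_hops edges,
    mapped to their hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return {n: d for n, d in seen.items() if n != start}
```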

Graphics/animation/gaming data • Shares a lot of properties with scientific and engineering data • Innately mathematical • Straight and curved line 2D geometry used in 3-space • Bezier and NURBS for curves • Matrix mathematics for 3D manipulation • Translate, Scale, Rotate • Mapping to pixel-based data for presentation
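Two pieces of this mathematics can be sketched in plain Python. The first is translation, scaling, and rotation as matrices in homogeneous coordinates, shown in 2D with 3x3 matrices. 3D uses the same idea with 4x4 matrices. The second is Bézier curve evaluation by de Casteljau's repeated linear interpolation. The function names are illustrative, and real engines use optimized vector libraries.

```python
import math


def translate(tx, ty):
    return [[1, 0, tx], [0, 1, ty], [0, 0, 1]]


def scale(sx, sy):
    return [[sx, 0, 0], [0, sy, 0], [0, 0, 1]]


def rotate(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def apply(m, point):
    x, y = point
    v = (x, y, 1)  # homogeneous coordinates let translation be a matrix product
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(2))


def bezier(points, t):
    """De Casteljau's algorithm: repeatedly interpolate between control points."""
    pts = list(points)
    while len(pts) > 1:
        pts = [((1 - t) * x0 + t * x1, (1 - t) * y0 + t * y1)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:])]
    return pts[0]


# Composite transform: scale first, then rotate 90 degrees, then translate.
m = matmul(translate(5, 0), matmul(rotate(math.pi / 2), scale(2, 2)))
```

Composing the matrices once and applying the product to every vertex is what makes the matrix formulation attractive when a model has many points.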

Graphics/animation/gaming, continued • For real-time rendering, low-polygon objects and bounding box collision mathematics are used • Creates the most aggressive demands on processing and graphics card technology • Often no notion of metadata at all • Even non-real-time animation demands low-quality interactive rendering
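Bounding box collision is cheap because it reduces to interval comparisons. Two axis-aligned boxes intersect exactly when their extents overlap on every axis. This is a minimal sketch, and the `AABB` class is an illustrative assumption. Here, boxes that merely touch count as colliding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AABB:
    """Axis-aligned bounding box given by its min and max corners."""
    min_x: float
    min_y: float
    min_z: float
    max_x: float
    max_y: float
    max_z: float

    def overlaps(self, other):
        # Boxes collide only if their extents overlap on all three axes.
        return (self.min_x <= other.max_x and other.min_x <= self.max_x and
                self.min_y <= other.max_y and other.min_y <= self.max_y and
                self.min_z <= other.max_z and other.min_z <= self.max_z)


a = AABB(0, 0, 0, 1, 1, 1)
b = AABB(0.5, 0.5, 0.5, 2, 2, 2)
c = AABB(2, 0, 0, 3, 1, 1)
```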

Procedural data • Used heavily in photo/video processing • Focusing, removing objects, adding color effects, changing lighting, etc. • There are standalone apps and plugin products • Used heavily in animation • Procedural textures and materials that don’t need to be tiled • Environment procedures (often sun and sky) • Cloning to make crowds • Lighting and camera objects
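A procedural texture is a function evaluated at any surface coordinate rather than a stored image, so there is no tile edge to repeat. This sketch uses a concentric "wood ring" pattern, an arbitrary illustrative choice. Production tools typically build on noise functions and expose far more parameters.

```python
import math


def rings(u, v, frequency=8.0):
    """Ring-pattern intensity in [0, 1] at any point (u, v).
    Computed on demand, so no image has to be stored or tiled."""
    r = math.hypot(u, v)  # distance from the pattern's center
    return 0.5 + 0.5 * math.sin(2 * math.pi * frequency * r)
```

The "data" here is the procedure itself, which is one reason its meaning is hard to capture declaratively.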

Metadata for procedural data • Big problem • Difficult to crisply define the “meaning” of procedural data • Often, the reason procedural data exists is that the task is too complex • This sort of data is often inherently non-declarative • The marketplace is filled with competing, varying products, each with its own interface, and they are too powerful to scrap

Procedural data, continued • Mathematical packages used for mining • Almost ironically, these are somewhat easier to package declaratively, since the mathematics can be so complex that its foundation is used in a black-box fashion
