
Data modeling and metadata


vangie

Presentation Transcript


  1. Data modeling and metadata: From graphs to graphs

  2. Metadata • Full metadata: relational schemas • Self-defining data: XML, key/value, key/document • No metadata: untagged images, video, audio • Parallel metadata: tagged images, video, audio

  3. Full schema metadata • Origins: • Semantic networks in AI • Metadata mixed in with data • Objects (nodes in graph), has-a (arcs in graph), is-a (arcs in graph), types (nodes), subtypes (nodes) • Essentially a network with metadata and all instances of the metadata • Goal was to model knowledge of the real world, not to manage volumes of data
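
As a rough illustration of metadata mixed in with data, the sketch below keeps types, subtypes, and instances as nodes in one graph, connected by is-a and has-a arcs. The class `SemanticNetwork` and the node names are hypothetical, chosen only for this example.

```python
from collections import defaultdict


class SemanticNetwork:
    """Hypothetical toy network: types and instances share one graph."""

    def __init__(self):
        # arcs[label][source] -> set of target nodes
        self.arcs = defaultdict(lambda: defaultdict(set))

    def add(self, source, label, target):
        self.arcs[label][source].add(target)

    def ancestors(self, node):
        """All nodes reachable from `node` by following is-a arcs."""
        seen, stack = set(), [node]
        while stack:
            for parent in self.arcs["is-a"][stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def parts(self, node):
        """has-a targets of the node plus those inherited through is-a."""
        result = set(self.arcs["has-a"][node])
        for ancestor in self.ancestors(node):
            result |= self.arcs["has-a"][ancestor]
        return result


net = SemanticNetwork()
net.add("car", "is-a", "vehicle")      # subtype arc (metadata)
net.add("vehicle", "has-a", "wheel")   # has-a arc on a type (metadata)
net.add("car", "has-a", "engine")      # has-a arc on a subtype (metadata)
net.add("my_ford", "is-a", "car")      # instance arc (data) in the same graph

print(sorted(net.ancestors("my_ford")))
print(sorted(net.parts("my_ford")))
```

Nothing in the structure distinguishes the instance `my_ford` from the type `car`, which is the sense in which the network holds the metadata and its instances together.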

  4. Early databases • Slow to adopt data structuring abstractions because speed of access was the focus • Hierarchical and network databases • Links from records of one file to records of another • E.g., each claim record is linked to a subscriber record • Also, sets of records and sets of links

  5. Relational databases • First true abstraction of metadata separated from data • Minimal structure in order to accommodate fast retrieval of tuples • Abstractions • Relation • Attribute • Tuple • PKs, CKs, FKs, null/not null
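
The abstractions listed above can be sketched with Python's built-in `sqlite3` module. The `subscriber` and `claim` tables below are hypothetical and reuse the claim/subscriber example from the previous slide; the point is only that the schema (metadata) is declared separately from the tuples and can itself be queried.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

# Metadata: relations, attributes, and constraints, declared before any data.
conn.executescript("""
CREATE TABLE subscriber (
    subscriber_id INTEGER PRIMARY KEY,          -- primary key (PK)
    email         TEXT NOT NULL UNIQUE,         -- candidate key (CK)
    name          TEXT NOT NULL
);
CREATE TABLE claim (
    claim_id      INTEGER PRIMARY KEY,
    subscriber_id INTEGER NOT NULL
                  REFERENCES subscriber(subscriber_id),  -- foreign key (FK)
    amount        REAL                          -- nullable attribute
);
""")

# Data: tuples inserted into the relations.
conn.execute("INSERT INTO subscriber VALUES (1, 'joe@example.com', 'Joe')")
conn.execute("INSERT INTO claim VALUES (10, 1, NULL)")

# A claim that points at a missing subscriber violates the FK constraint.
try:
    conn.execute("INSERT INTO claim VALUES (11, 99, 50.0)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

# The schema can be inspected without touching any tuple.
columns = [row[1] for row in conn.execute("PRAGMA table_info(claim)")]
print(fk_enforced, columns)
```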

  6. Concurrent with relational database development: “semantic” databases • Like semantic networks (quite deliberately), only with metadata separated from data • Not object-oriented • No object IDs • No classes instantiated from types • A wide variety of competing models, with “the” Semantic Model being one of them

  7. Semantic databases, continued • Other modeling notions • Components or aggregates that are necessary parts of an object and cannot be changed, like the day you were born or the VIN of a car • Versus properties or attributes that can be changed, like your name or the transmission in a car • Cause and effect relationships • Such as a sales visit leading to a sale • And many other specialized relationships • Interestingly, no query facilities and no commercially successful systems

  8. Persistent programming languages • Not necessarily object-oriented • Host language is the only language • Data can be persistent or not, often selectively • Strong notion of metadata as programming data types

  9. Object-oriented databases • Strong notion of object ID and object identity • Types/subtypes and classes • Strong sense of metadata separate from data • Behavioral encapsulation
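
Object identity can be illustrated in plain Python, used here only as a stand-in for an object database. The `Car` class and its values are hypothetical: two objects with equal state remain distinct objects, and an update made through one reference is visible through every reference to the same object.

```python
from dataclasses import dataclass


@dataclass
class Car:
    vin: str
    color: str


a = Car("VIN123", "red")
b = Car("VIN123", "red")

same_value = a == b    # value equality: the attribute values match
same_object = a is b   # identity: two separate objects

alias = a              # a second reference to the same object
alias.color = "blue"   # the change is seen through `a`, but not through `b`
print(same_value, same_object, a.color, b.color)
```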

  10. Object-relational databases • Objects in the small • User defined data types for attribute domains • No behavioral encapsulation

  11. One-of-a-kind semantically rich databases • Engineering/CAD data • Complex objects • Lots of singleton types, but with strict notion of metadata • Complex constraints • Far reaching component and constraint relationships

  12. One-of-a-kind scientific/medical/financial databases • Managing type-based, voluminous data with little internal structure (imaging) • Managing textual data with some structure and lots of domain-based terminology • Often there are real-time demands made on distributed databases, a very difficult problem • Handled by putting timing constraints on specific parts of the data processing code

  13. Self-defining data • Inspired by need to stream data live and process it in one pass • Also inspired by the need to vary the structure of individual pieces of data, like documents and other items that don’t really have a shared type construct • XML developed as a shared language model for semi-structured (or self-defining) data • Developed in part to assist the construction of the semantic web • Data is streamed on the Internet or from sensors
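
A minimal sketch of one-pass processing of self-defining data, using `xml.etree.ElementTree.iterparse` from the Python standard library. The sensor document is invented for illustration; note that the two `reading` elements carry different fields, and each record's structure is discovered from its own tags rather than from a shared schema.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sensor stream; an in-memory buffer stands in for a live feed.
stream = io.BytesIO(b"""<readings>
  <reading sensor="t1"><celsius>21.5</celsius></reading>
  <reading sensor="t2"><celsius>19.0</celsius><humidity>40</humidity></reading>
</readings>""")

records = []
for event, elem in ET.iterparse(stream, events=("end",)):
    if elem.tag == "reading":
        # The element describes itself: field names come from its child tags.
        record = {"sensor": elem.get("sensor")}
        record.update({child.tag: child.text for child in elem})
        records.append(record)
        elem.clear()  # drop the processed subtree so memory use stays small

print(records)
```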

  14. Self-defining data, continued • NoSQL databases that store extremely high volumes of loosely structured data • Documents with internal structure • Values with no meaning within the database • Usually no formal query language, as data is interpreted programmatically (either partially or fully); sometimes there is a library of common query templates
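
The contrast between opaque values and documents with internal structure can be sketched with plain dictionaries standing in for a NoSQL store. The store layout, keys, and the `find` helper are assumptions made for this example, not the interface of any particular product.

```python
import json

# Key/value: the store cannot interpret the value; only the application can.
kv_store = {"user:42": b"\x00\x01opaque-bytes"}

# Key/document: each value is a self-describing document, and the documents
# need not share a structure.
doc_store = {
    "d1": json.dumps({"type": "invoice", "total": 120,
                      "lines": [{"sku": "A", "qty": 2}]}),
    "d2": json.dumps({"type": "note", "text": "call back"}),
}


def find(store, predicate):
    """Programmatic 'query': decode each document and test it in code."""
    return [key for key, raw in store.items() if predicate(json.loads(raw))]


big_invoices = find(
    doc_store,
    lambda doc: doc.get("type") == "invoice" and doc.get("total", 0) > 100,
)
print(big_invoices)
```

A reusable predicate such as the lambda above plays the role of a query template: the interpretation of the data lives in program code rather than in a query language.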

  15. No metadata databases • Early blob and continuous data • Images • Video • Audio • Flash • All processing of data takes place in complex programs that neither retrieve metadata nor insert metadata into the data • E.g., image processing, facial searching, language searching

  16. Recent blob/continuous data • Development of parallel metadata databases that contain low level and semantically rich tagging • Only the metadata database is actively searched • Searching can be enhanced by downloading small samples • Feedback loops to improve tag interpretation • Tags taken from shared namespaces
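
A minimal sketch of a parallel metadata database, assuming a hypothetical `TagIndex` class and namespace-prefixed tags such as `geo:paris`. The blobs are stored separately and never inspected; searches touch only the tag index.

```python
from collections import defaultdict

# Blob store: opaque bytes that are never searched directly.
blobs = {
    "img1.jpg": b"\xff\xd8",
    "img2.jpg": b"\xff\xd8",
    "clip1.mp4": b"\x00\x00",
}


class TagIndex:
    """Hypothetical metadata database kept alongside the blob store."""

    def __init__(self):
        self.by_tag = defaultdict(set)

    def tag(self, blob_id, *tags):
        for t in tags:
            self.by_tag[t].add(blob_id)

    def search(self, *tags):
        """Blob ids that carry every requested tag."""
        sets = [self.by_tag.get(t, set()) for t in tags]
        return set.intersection(*sets) if sets else set()


index = TagIndex()
index.tag("img1.jpg", "geo:paris", "subject:tower")
index.tag("img2.jpg", "geo:paris", "subject:river")
index.tag("clip1.mp4", "subject:tower")

hits = index.search("geo:paris", "subject:tower")
print(hits)
```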

  17. Assertion based databases • Usually use triples (assertions) • Triples are chained together to make new inferences • Metadata is treated like data • Joe owns a Ford • Fords are cars • SQL-like, triple-hopping query languages
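
The chaining of triples can be sketched with a small forward-chaining loop. The predicate names (`owns`, `type`, `subclass_of`) and the single inference rule are assumptions for this example; real triple stores use standardized vocabularies and richer rule sets. Note that "Fords are cars" (metadata) is stored as a triple just like "Joe owns a Ford" (data).

```python
triples = {
    ("joe", "owns", "joes_ford"),     # Joe owns a Ford ...
    ("joes_ford", "type", "Ford"),    # ... namely this particular one
    ("Ford", "subclass_of", "Car"),   # Fords are cars (metadata as data)
}


def infer(facts):
    """Apply one rule to a fixpoint:
    (x type C) and (C subclass_of D)  =>  (x type D)."""
    facts = set(facts)
    while True:
        new = {
            (x, "type", d)
            for (x, p1, c) in facts if p1 == "type"
            for (c2, p2, d) in facts if p2 == "subclass_of" and c2 == c
        }
        if new <= facts:      # nothing new was derived
            return facts
        facts |= new


facts = infer(triples)

# Two-hop query: which cars does Joe own?
joes_cars = sorted(
    obj for (subj, pred, obj) in facts
    if subj == "joe" and pred == "owns" and (obj, "type", "Car") in facts
)
print(joes_cars)
```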

  18. Graph databases • Networks of objects that blur the boundary between data and metadata • Support levels of connectivity orders of magnitude greater than in the network and hierarchical databases of old • Have a purpose that is reminiscent of network/hierarchical databases: to represent the fluid and highly interconnected nature of complex data, such as that collected from social media • Use graph-like query and programming interfaces
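
A graph-style query is typically a traversal rather than a join. The sketch below runs a breadth-first search over a small, invented "follows" graph held in a dictionary; a real graph database would expose this through its own query language or API.

```python
from collections import deque

# Hypothetical social graph: person -> people they follow.
follows = {
    "ann": {"bob", "cat"},
    "bob": {"dan"},
    "cat": {"dan"},
    "dan": {"eve"},
    "eve": set(),
}


def within_hops(graph, start, max_hops):
    """Breadth-first search: nodes reachable in at most `max_hops` arcs,
    mapped to their hop distance from `start`."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    del dist[start]
    return dist


reach = within_hops(follows, "ann", 2)
print(reach)
```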

  19. Graphics/animation/gaming data • Shares a lot of properties with scientific and engineering data • Innately mathematical • Straight and curved line 2D geometry used in 3-space • Bezier and NURBS for curves • Matrix mathematics for 3D manipulation • Translate, Scale, Rotate • Mapping to pixel based data for presentation
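
The matrix mathematics for 3D manipulation can be sketched with 4x4 homogeneous matrices in pure Python. The helper names are chosen for this example, and the column-vector convention (transforms compose right to left) is an assumption; graphics packages differ on this.

```python
import math


def identity():
    return [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


def translate(tx, ty, tz):
    m = identity()
    m[0][3], m[1][3], m[2][3] = tx, ty, tz
    return m


def scale(sx, sy, sz):
    m = identity()
    m[0][0], m[1][1], m[2][2] = sx, sy, sz
    return m


def rotate_z(theta):
    """Counter-clockwise rotation about the z axis, angle in radians."""
    c, s = math.cos(theta), math.sin(theta)
    m = identity()
    m[0][0], m[0][1] = c, -s
    m[1][0], m[1][1] = s, c
    return m


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply(m, point):
    """Transform a 3D point, treating it as the column vector (x, y, z, 1)."""
    v = [point[0], point[1], point[2], 1.0]
    out = [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]
    return tuple(out[:3])


# Scale first, then rotate, then translate (read right to left).
model = matmul(translate(10, 0, 0),
               matmul(rotate_z(math.pi / 2), scale(2, 2, 2)))
moved = apply(model, (1.0, 0.0, 0.0))
print(moved)  # approximately (10, 2, 0), up to floating-point rounding
```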

  20. Graphics/animation/gaming, continued • For real-time rendering, low polygon objects and bounding box collision mathematics are used • Creates the most aggressive demands on processing and graphics card technology • Often no notion of metadata at all • Even non-real-time animation demands low quality interactive rendering
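
Bounding box collision is cheap because it reduces to a few comparisons per axis. The sketch below assumes axis-aligned bounding boxes (AABBs), one common variant; the `AABB` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AABB:
    """Axis-aligned bounding box given by its minimum and maximum corners."""
    min_corner: tuple
    max_corner: tuple

    def overlaps(self, other):
        # Two boxes collide only if their extents overlap on every axis.
        return all(
            self.min_corner[i] <= other.max_corner[i]
            and other.min_corner[i] <= self.max_corner[i]
            for i in range(3)
        )


player = AABB((0, 0, 0), (1, 2, 1))
crate = AABB((0.5, 0, 0.5), (1.5, 1, 1.5))
wall = AABB((5, 0, 0), (6, 3, 4))

print(player.overlaps(crate), player.overlaps(wall))
```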

  21. Procedural data • Used heavily in photo/video processing • Focusing, removing objects, adding color effects, changing lighting, etc. • There are standalone apps and plugin products • Used heavily in animation • Procedural textures and materials that don’t need to be tiled • Environment procedures (often sun and sky) • Cloning to make crowds • Lighting and camera objects
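
A procedural texture is a function evaluated at surface coordinates rather than a stored image, which is why it never needs tiling. The checker pattern below is a deliberately simple, hypothetical example; production textures use noise functions and many more parameters.

```python
import math


def checker(u, v, squares_per_unit=4):
    """Return 0 or 1 for any (u, v); defined over the whole plane,
    so no image has to be repeated to cover a large surface."""
    return (math.floor(u * squares_per_unit)
            + math.floor(v * squares_per_unit)) % 2


# Sample a small patch and render it as text.
rows = [
    "".join("#" if checker(x / 8, y / 8) else "." for x in range(8))
    for y in range(4)
]
print("\n".join(rows))
```

The "data" here is the procedure itself plus its parameters, which is what makes its meaning hard to describe declaratively.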

  22. Metadata for procedural data • Big problem • Difficult to crisply define the “meaning” of procedural data • Often, the reason procedural data exists is that the task is too complex • This sort of data is often inherently non-declarative • The marketplace is filled with competing, varying products, each with its own interface, and they are too powerful to scrap

  23. Procedural data, continued • Mathematical packages used for mining • Almost ironically, these are somewhat easier to package declaratively, since the mathematics can be so complex that its foundation is used in a black box fashion
