Data Management David Nathan & Peter Austin & Robert Munro
This section • Data management • Properties of data • Relational data model • XML • Example
Workflows - description vs documentation • Description: something happened; something inscribed; cleaned up, selected, analysed (you applied knowledge, made decisions: NOT OF INTEREST!); representations, lists, summaries, analyses; archived, presented, published • Documentation: something happened; recording; recapitulates; representations, eg transcription, annotation (you applied knowledge, techniques, made decisions, applied linguistic knowledge: FOCUS OF INTEREST!); archived & ... ??
Choosing values/priorities • Standards & compliance • Adeptness with tools • Modelling of phenomena, architecture of data • Dissemination/publishing • Preserving • Ethics, responsibility, protocol • Range, comprehensiveness • Intellectual rigour • Which are priorities? • Which are dispensable?
Data should be: • explicit • consistent • robust • meaningful • conventional • adaptable, convertible, machine readable etc • useful!
“Portability” • Bird and Simons 2003: language documentation data needs to have integrity, flexibility, longevity
“Portability” • complete • explicit • documented • preservable • transferable • accessible • adaptable • not technology-specific • (also appropriate, accurate, useful etc!!)
Data management • the way that data is structured is itself information, and that structure may be complex • properly structured data allows: • usage, including manipulation, conversion, derivation • preservation • machine readability
Data management systems • a data management system is a system you design for storing data and metadata: • information about content and structures • relationships between units of information • it is not necessarily tied to any particular software, or even a computer
Naive management using filenames • a (too) simple management system: • information about a recording is captured in the filenames: • 1st_int_john_5Aug.wav • market_conv_mj.wav • …. • what does ‘int’ mean? • what information about the recording is missing?
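To make the questions above concrete, here is a minimal Python sketch that replaces the filename with an explicit record. The `Recording` class and all of its field names are hypothetical, chosen only for illustration. The sketch fills in only what the filename states unambiguously and reports the rest as missing.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class Recording:
    """Explicit metadata for one recording (all field names are hypothetical)."""
    filename: str
    session_type: Optional[str] = None  # 'int' in the filename is ambiguous
    speaker: Optional[str] = None
    date: Optional[str] = None          # the filename gives no year
    language: Optional[str] = None      # not in the filename at all


def missing_fields(rec: Recording) -> List[str]:
    """Return the names of the fields that have not been filled in."""
    return [name for name, value in asdict(rec).items() if value is None]


# Fill in only what the filename tells us without guessing.
rec = Recording(filename="1st_int_john_5Aug.wav", speaker="john")
print(missing_fields(rec))  # ['session_type', 'date', 'language']
```

Each empty field is a decision that the filename left to the reader's guesswork.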
Data modelling • World/universe • Domain • Relevant: • entities • properties • relationships • We also need formal ways to represent these
Data modelling • data modelling is the process of designing your data management system: • what information do you need to record? • what are the units of information? • what are their properties (attributes)? • what are the relationships between the units of information? • how is the information likely to change in the future? • how can all this be represented?
Data management • two well-known formats for structured data: • relational database • eXtensible Markup Language (XML) • these are methods, not software or hardware • any other system for well-structured data could be OK, but generally it has: • a smaller community of users, so fewer tools and less support • ... so errors are more likely
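As a small illustration of the XML option, the sketch below describes one recording in XML and reads it back with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`recordings`, `recording`, `file`, `type`, `location`) are hypothetical, and reading 'conv' as 'conversation' is an assumption made only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical element and attribute names, for illustration only.
xml_text = """
<recordings>
  <recording file="market_conv_mj.wav">
    <type>conversation</type>
    <location>market</location>
  </recording>
</recordings>
""".strip()

root = ET.fromstring(xml_text)

# The structure is explicit, so a program can find each property by name.
for rec in root.findall("recording"):
    print(rec.get("file"), rec.findtext("type"), rec.findtext("location"))
```

Because the file is plain text with named parts, it does not depend on any one program to remain readable.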
Databases • Note that database has 3 senses: • a body of related information • a type of software (eg Oracle, Access, Filemaker) • a model for the domain of information (ie formulation of entities and relationships)
Relational format • Uses tables • Table rows represent entities in a domain • Table columns represent properties/attributes of entities • Each cell represents one atomic unit of data • The order of rows and columns has no significance
Representing a relational design • simplest example: • a table (TABLE NAME) with a single field (field name)
Representing a relational design • less trivial entity: • a table (TABLE NAME) with two fields (field 1, field 2)
Representing a relational design • less trivial domain: • CONTINENT (name) • COUNTRY (name) • relationship = one to many
Non-trivial domains • non-trivial domains have many-to-many relationships • AUTHOR (name, .....) • SUBJECT (name, .....)
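A many-to-many relationship such as AUTHOR and SUBJECT is commonly implemented with a third, linking table. The slide does not show one, so the `author_subject` table, the `id` columns and the sample rows below are all assumptions made for illustration, using Python's built-in `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE subject (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Hypothetical linking table: one row per (author, subject) pairing.
CREATE TABLE author_subject (
    author_id  INTEGER NOT NULL REFERENCES author(id),
    subject_id INTEGER NOT NULL REFERENCES subject(id),
    PRIMARY KEY (author_id, subject_id)
);
INSERT INTO author  (id, name) VALUES (1, 'Author A'), (2, 'Author B');
INSERT INTO subject (id, name) VALUES (1, 'Phonology'), (2, 'Syntax');
INSERT INTO author_subject (author_id, subject_id)
VALUES (1, 1), (1, 2), (2, 2);
""")


def subjects_of(author_name):
    """All subjects linked to one author, sorted by name."""
    rows = conn.execute("""
        SELECT subject.name FROM subject
        JOIN author_subject ON author_subject.subject_id = subject.id
        JOIN author ON author.id = author_subject.author_id
        WHERE author.name = ? ORDER BY subject.name
    """, (author_name,)).fetchall()
    return [row[0] for row in rows]


def authors_of(subject_name):
    """All authors linked to one subject, sorted by name."""
    rows = conn.execute("""
        SELECT author.name FROM author
        JOIN author_subject ON author_subject.author_id = author.id
        JOIN subject ON subject.id = author_subject.subject_id
        WHERE subject.name = ? ORDER BY author.name
    """, (subject_name,)).fetchall()
    return [row[0] for row in rows]


print(subjects_of("Author A"))  # ['Phonology', 'Syntax']
print(authors_of("Syntax"))     # ['Author A', 'Author B']
```

One author has many subjects and one subject has many authors, yet each cell still holds a single atomic value.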
From model to implementation • implementing table relationships: • CONTINENT (id, name) • COUNTRY (id, name, continent_id)
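The CONTINENT and COUNTRY tables can be sketched with Python's built-in `sqlite3` module. The table and field names follow the slide. The sample rows, the `NOT NULL` constraints and the choice of SQLite are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE continent (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE country (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    -- Each country points at exactly one continent (one to many).
    continent_id INTEGER NOT NULL REFERENCES continent(id)
);
INSERT INTO continent (id, name) VALUES (1, 'Africa'), (2, 'Asia');
INSERT INTO country (name, continent_id)
VALUES ('Kenya', 1), ('Ghana', 1), ('Japan', 2);
""")

# Follow continent_id back to the continent table to recombine the data.
rows = conn.execute("""
    SELECT continent.name, country.name
    FROM country
    JOIN continent ON country.continent_id = continent.id
    ORDER BY continent.name, country.name
""").fetchall()
print(rows)  # [('Africa', 'Ghana'), ('Africa', 'Kenya'), ('Asia', 'Japan')]
```

The continent name is stored once, and every country refers to it by `continent_id`, so correcting a continent's name means changing one row.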
Designing a database • Determine the domain, entities and relationships • Experiment with scenarios • Any non-trivial model will evolve as it is thought out and tested • Normalisation is the process of refining models
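As a rough illustration of one refining step, the Python sketch below starts from a flat list in which the continent name is repeated for every country, then splits it into two linked tables. This shows only one aspect of normalisation (removing repetition). The sample data and variable names are hypothetical.

```python
# Flat, unrefined rows: the continent name is repeated for every country.
flat = [
    {"country": "Kenya", "continent": "Africa"},
    {"country": "Ghana", "continent": "Africa"},
    {"country": "Japan", "continent": "Asia"},
]

continents = {}  # continent name -> id
countries = []   # rows that refer to a continent by id only

for row in flat:
    # Give each distinct continent an id the first time it is seen.
    continent_id = continents.setdefault(row["continent"], len(continents) + 1)
    countries.append({"name": row["country"], "continent_id": continent_id})

print(continents)  # {'Africa': 1, 'Asia': 2}
print(countries)
```

After the split, each continent name is written in one place, so a typo can no longer create an accidental second continent.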
Practical example • Create a database model for some audio metadata
What does all this achieve? • conceptual/intellectual validity • scalable, searchable, modular • machine readable • in fact, portable: • complete • explicit • documented • preservable • transferable • accessible • adaptable • not technology-specific