This article explores how the variable changed in the Data Documentation Initiative (DDI) between DDI 1.0 and DDI 3.0. Using a 'Nativity' variable that classifies respondents by birthplace, it shows how a variable that was a single self-contained element in DDI 1.0 is defined and distributed across multiple modules in DDI 3.0. By tracing elements such as the category count, the derivation rule, and the category frequencies, it highlights both the added complexity and the improvements that DDI 3.0 brings to data management and documentation.
The Variable Explosion
Or how the DDI variable spread out to inhabit multiple modules in DDI 3.0
Once there was a mild-mannered DDI 1.0 variable. It was well-contained and kept close to home:

<var ID="V11" name="V11" catQnty="2">
  <location StartPos="25" EndPos="25" width="1" RecSegNo="1" fileid="WLT1"/>
  <qstn ID="Q5" seqNo="4">
    <qstnLit>What country were you born in?</qstnLit>
  </qstn>
  <labl level="var">Nativity</labl>
  <catgry ID="CV11_1">
    <catValu>1</catValu>
    <labl level="catgry">Native</labl>
    <catStat type="freq">798920</catStat>
  </catgry>
  <catgry ID="CV11_2">
    <catValu>2</catValu>
    <labl level="catgry">Foreign</labl>
    <catStat type="freq">210023</catStat>
  </catgry>
  <concept source="archive">Place_of_Birth</concept>
  <derivation>
    <drvdesc>If US code as 1, else code as 2</drvdesc>
  </derivation>
</var>
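To make the single-artifact nature of this DDI 1.0 variable concrete, here is a minimal sketch that reads the `<var>` element with Python's standard `xml.etree.ElementTree`, checks that `catQnty` matches the number of `<catgry>` elements, and turns the `catStat` frequencies into percentages. It also encodes the `drvdesc` rule as a function; the names `summarize_var` and `derive_nativity`, and the assumption that birth country arrives as the string "US", are illustrative only and not part of DDI.

```python
import xml.etree.ElementTree as ET

# The DDI 1.0 variable from the slide, trimmed to the elements used below.
VAR_XML = """
<var ID="V11" name="V11" catQnty="2">
  <location StartPos="25" EndPos="25" width="1" RecSegNo="1" fileid="WLT1"/>
  <labl level="var">Nativity</labl>
  <catgry ID="CV11_1">
    <catValu>1</catValu>
    <labl level="catgry">Native</labl>
    <catStat type="freq">798920</catStat>
  </catgry>
  <catgry ID="CV11_2">
    <catValu>2</catValu>
    <labl level="catgry">Foreign</labl>
    <catStat type="freq">210023</catStat>
  </catgry>
</var>
"""


def summarize_var(xml_text):
    """Return (variable label, {code: (category label, freq, percent)})."""
    var = ET.fromstring(xml_text)
    categories = var.findall("catgry")  # direct children only
    # catQnty is a declared count; check it against the categories present.
    declared = int(var.get("catQnty"))
    if declared != len(categories):
        raise ValueError(f"catQnty={declared} but found {len(categories)} catgry")

    freqs = {}
    for cat in categories:
        code = cat.findtext("catValu")
        cat_label = cat.findtext("labl")
        freq = int(cat.find("catStat[@type='freq']").text)
        freqs[code] = (cat_label, freq)

    total = sum(f for _, f in freqs.values())
    summary = {
        code: (cat_label, f, round(100 * f / total, 2))
        for code, (cat_label, f) in freqs.items()
    }
    # The var-level <labl> is the first direct <labl> child of <var>.
    return var.findtext("labl"), summary


def derive_nativity(birth_country):
    """Hypothetical coding of the drvdesc rule: 'If US code as 1, else code as 2'."""
    return 1 if birth_country == "US" else 2


label, summary = summarize_var(VAR_XML)
print(label)  # Nativity
for code, (cat_label, freq, pct) in summary.items():
    print(code, cat_label, freq, pct)
```

Everything the variable knows (location, labels, categories, statistics, derivation) sits inside one element, which is exactly what DDI 3.0 later pulls apart.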
By DDI 2.1 it had gotten a bit bolder:
• You could arrange variables into nCubes, creating multidimensional structures
• It now had an optional place to put data location information
• It allowed more explicit nesting and category structure
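An nCube treats variables as dimensions, so each cell is addressed by one category code per dimension and the number of cells is the product of the category counts. The sketch below shows only that arithmetic. The `Sex` dimension and its codes are hypothetical, added just to give the cube two axes, and the row-major cell ordering is an assumption for illustration, not something the DDI 2.1 nCube definition is claimed to prescribe.

```python
from math import prod

# Each dimension is a variable with its ordered category codes.
# Nativity (V11) comes from the slide; "Sex" is a hypothetical second dimension.
DIMENSIONS = {
    "Nativity": ["1", "2"],
    "Sex": ["1", "2"],
}


def cell_count(dims):
    """Total number of cells: the product of each dimension's category count."""
    return prod(len(codes) for codes in dims.values())


def cell_index(dims, coordinate):
    """Row-major position of a cell, given one code per dimension in dims order."""
    if len(coordinate) != len(dims):
        raise ValueError("need exactly one code per dimension")
    index = 0
    for codes, code in zip(dims.values(), coordinate):
        index = index * len(codes) + codes.index(code)
    return index


print(cell_count(DIMENSIONS))              # 4
print(cell_index(DIMENSIONS, ("2", "1")))  # 2
```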
Modularity ruled!
• Concepts were captured early
• Questions were developed and linked to concepts
• Categories were defined…
• …and grouped into specific relationships
• Variables were created using these category groups
• Data storage was designed
• Physical instances of data files were created and summary statistics calculated
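The modular workflow above can be pictured as separate records that point to each other by ID instead of nesting inside one element. This sketch models that with plain Python dataclasses; the class names, field names, and category IDs such as `CAT_NATIVE` are hypothetical and do not reproduce the DDI 3.0 schema. It reuses the Nativity identifiers from the DDI 1.0 example and adds a check that every reference resolves, the kind of link bookkeeping that tooling would have to handle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    id: str
    name: str


@dataclass(frozen=True)
class Question:
    id: str
    text: str
    concept_ref: str  # ID of a Concept


@dataclass(frozen=True)
class Category:
    id: str
    label: str


@dataclass(frozen=True)
class CategoryGroup:
    id: str
    codes: tuple  # (code, category ID) pairs


@dataclass(frozen=True)
class Variable:
    id: str
    name: str
    concept_ref: str
    question_ref: str
    category_group_ref: str


def build_registry(*items):
    """Index every module item by its ID so references can be looked up."""
    return {item.id: item for item in items}


def dangling_refs(registry):
    """Return (owner ID, missing ID) for every reference that does not resolve."""
    missing = []
    for item in registry.values():
        # Any field ending in "_ref" holds the ID of another item.
        refs = [v for k, v in vars(item).items() if k.endswith("_ref")]
        if isinstance(item, CategoryGroup):
            refs += [cat_id for _, cat_id in item.codes]
        missing += [(item.id, r) for r in refs if r not in registry]
    return missing


registry = build_registry(
    Concept("Place_of_Birth", "Place of birth"),
    Question("Q5", "What country were you born in?", "Place_of_Birth"),
    Category("CAT_NATIVE", "Native"),
    Category("CAT_FOREIGN", "Foreign"),
    CategoryGroup("CG_NATIVITY", (("1", "CAT_NATIVE"), ("2", "CAT_FOREIGN"))),
    Variable("V11", "Nativity", "Place_of_Birth", "Q5", "CG_NATIVITY"),
)
print(dangling_refs(registry))  # []
```

Because the variable holds only references, the concept, question, and categories can be written at different times and shared by other variables.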
Fortunately they all hung on to their “handies” and stayed in touch…
Implications for creating documentation
• You can create documentation in the order that information becomes available
• You can reuse pieces and imply relationships
• You can support community-wide concept, question, category and variable banks
• You can create new physical instances or formats without changing other modules
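Creating new physical instances without touching other modules can be illustrated by keeping the logical side of the variable fixed and swapping only the description of where its value sits. The sketch assumes two hypothetical layouts: the fixed-width position from the DDI 1.0 example (StartPos 25, width 1) and a made-up CSV column named `V11`. The sample records are invented purely to exercise the code.

```python
import csv
import io
from dataclasses import dataclass

# Logical side: shared by every physical instance and never edited below.
NATIVITY_CODES = {"1": "Native", "2": "Foreign"}


@dataclass(frozen=True)
class FixedWidthLocation:
    start_pos: int  # 1-based, like DDI's StartPos
    width: int

    def read(self, record):
        return record[self.start_pos - 1 : self.start_pos - 1 + self.width]


@dataclass(frozen=True)
class CsvColumn:
    column: str  # hypothetical column name

    def read(self, row):
        return row[self.column]


def labels(location, records):
    """Decode the variable from any physical layout into its category labels."""
    return [NATIVITY_CODES[location.read(r)] for r in records]


# Physical instance 1: fixed-width records, value in column 25.
fixed_records = ["x" * 24 + "1", "x" * 24 + "2"]
print(labels(FixedWidthLocation(start_pos=25, width=1), fixed_records))

# Physical instance 2: the same variable delivered as a CSV column.
csv_rows = list(csv.DictReader(io.StringIO("id,V11\na,2\nb,1\n")))
print(labels(CsvColumn("V11"), csv_rows))
```

Adding a third format would mean adding another location class; `NATIVITY_CODES` and `labels` stay untouched.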
It requires
• Thinking about a variable as a development process rather than as an artifact
• New tools to facilitate information capture and create the appropriate links
Categories
• Individual labels and definitions
• Comparability is by definition, NOT by label
• Can be used by multiple Category Groups
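The rule that comparability comes from the definition rather than the label fits in a tiny check. In this sketch a category carries both a display label and a definition, and two categories count as comparable only when their definitions match. The label "US-born" and all the definition strings are hypothetical, and exact string equality stands in for whatever matching a real category bank would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    label: str       # what users see
    definition: str  # what the category actually means


def comparable(a, b):
    """Categories are comparable when their definitions agree; labels are ignored."""
    return a.definition == b.definition


native_a = Category("Native", "Born in the United States")
native_b = Category("US-born", "Born in the United States")
native_c = Category("Native", "Born in the state of residence")

print(comparable(native_a, native_b))  # True: different labels, same meaning
print(comparable(native_a, native_c))  # False: same label, different meaning
```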
Category Groups
• A category group is made up of one or more categories
• Flat: no hierarchies
• Hierarchical: levels
  • Regular
  • Irregular
• Provides specific category codes
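A hierarchical category group can be represented as a tree whose nodes carry codes, with a flat group as the single-level case. This sketch builds a hypothetical two-level birthplace group (Native and Foreign at the top, made-up regions under Foreign only) and lists the codes at each level. It assumes "regular" means every branch ends at the same depth and "irregular" means branches may end at different depths; that reading, the region names, and the codes are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    code: str
    label: str
    children: list = field(default_factory=list)


def codes_by_level(roots):
    """Map level number (1 = top) to the codes found at that level, in order."""
    levels = {}
    stack = [(node, 1) for node in reversed(roots)]
    while stack:
        node, depth = stack.pop()
        levels.setdefault(depth, []).append(node.code)
        stack.extend((child, depth + 1) for child in reversed(node.children))
    return levels


def is_regular(roots):
    """Assumed meaning: regular when every leaf sits at the same depth."""
    leaf_depths = set()

    def walk(node, depth):
        if not node.children:
            leaf_depths.add(depth)
        for child in node.children:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 1)
    return len(leaf_depths) <= 1


# Hypothetical hierarchy: regions exist only under Foreign, so depths differ.
nativity = [
    Node("1", "Native"),
    Node("2", "Foreign", [Node("21", "Americas"), Node("22", "Europe"), Node("23", "Other")]),
]
flat = [Node("1", "Native"), Node("2", "Foreign")]

print(codes_by_level(nativity))                # {1: ['1', '2'], 2: ['21', '22', '23']}
print(is_regular(nativity), is_regular(flat))  # False True
```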
Variable construction
• Variable
  • No categories
    • Ranges
    • Open
  • Category Groups
    • Full
    • Level
    • Discrete
    • Range
    • Cherry pick
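The slide only names the construction options, so the meanings used in this sketch are assumptions: "Full" takes the whole category group, "Level" takes one level of a hierarchy, "Range" takes a span of codes, and "Cherry pick" takes an explicit subset. For variables with no categories, a value is either checked against bounds ("Ranges") or left unrestricted ("Open", modeled by passing no bounds). "Discrete" is not modeled because the slide gives no detail. The group's codes and labels are hypothetical.

```python
# Hypothetical category group as (code, label, level); level 1 = top of hierarchy.
GROUP = [
    ("1", "Native", 1),
    ("2", "Foreign", 1),
    ("21", "Americas", 2),
    ("22", "Europe", 2),
    ("23", "Other", 2),
]


def full(group):
    """Full: use every category in the group."""
    return [code for code, _, _ in group]


def level(group, n):
    """Level: use only the categories at one level of the hierarchy."""
    return [code for code, _, lvl in group if lvl == n]


def code_range(group, low, high):
    """Range: use categories whose numeric codes fall within [low, high]."""
    return [code for code, _, _ in group if low <= int(code) <= high]


def cherry_pick(group, wanted):
    """Cherry pick: an explicit subset, kept in the group's own order."""
    wanted = set(wanted)
    return [code for code, _, _ in group if code in wanted]


def within_bounds(value, low=None, high=None):
    """No categories: 'Ranges' passes bounds, 'Open' passes none (assumed reading)."""
    return (low is None or value >= low) and (high is None or value <= high)


print(full(GROUP))                     # ['1', '2', '21', '22', '23']
print(level(GROUP, 2))                 # ['21', '22', '23']
print(code_range(GROUP, 21, 22))       # ['21', '22']
print(cherry_pick(GROUP, ["22", "1"])) # ['1', '22']
```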
NCubes
• They grew up and got a capital N
• Still composed of one or more variables
• They provide for multiple measures
• They gained features that make them comparable to SDMX structures
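An NCube that provides for multiple measures resembles an SDMX-style structure, where each cell, identified by one code per dimension, can carry more than one observed value. This sketch stores several measures per cell. The `NCube` class and its `put`/`get` methods are hypothetical and are not the DDI 3.0 schema; the frequencies come from the DDI 1.0 example, and the second measure is a percentage computed from them rather than new data.

```python
from dataclasses import dataclass, field


@dataclass
class NCube:
    dimensions: tuple  # variable names, one per axis
    measures: tuple    # measure names each cell may carry
    cells: dict = field(default_factory=dict)  # coordinate -> {measure: value}

    def put(self, coordinate, **values):
        """Store one or more measures for the cell at this coordinate."""
        if len(coordinate) != len(self.dimensions):
            raise ValueError("one code per dimension is required")
        unknown = set(values) - set(self.measures)
        if unknown:
            raise ValueError(f"unknown measures: {sorted(unknown)}")
        self.cells.setdefault(tuple(coordinate), {}).update(values)

    def get(self, coordinate, measure):
        return self.cells[tuple(coordinate)][measure]


cube = NCube(dimensions=("Nativity",), measures=("freq", "percent"))
# Frequencies from the DDI 1.0 catStat elements.
cube.put(("1",), freq=798920)
cube.put(("2",), freq=210023)

# A second measure derived from the first, stored in the same cells.
total = sum(cell["freq"] for cell in cube.cells.values())
for coordinate, cell in cube.cells.items():
    cube.put(coordinate, percent=round(100 * cell["freq"] / total, 2))

print(cube.get(("1",), "freq"), cube.get(("1",), "percent"))
```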
What have we got?
• Logical product
  • Categories
  • Category Groups [name may change to protect the innocent]
  • Assembled into variables
  • Variables assembled into NCubes
• Concepts, questions, physical location and summary stats are in other modules