Large Instance Points
16th Eurofiling Workshop
Wednesday 12 December
Herm Fischer
Mark V Systems Limited and Arelle open source XBRL processor
Study results
• RAM consumption vs. instance size
  • Instance XML to DOM (XML only) 1 : 3 to 1 : 10
  • Instance XBRL to formula processor 1 : 60
  • Instance XBRL to SAX object model 1 : 15
• Conclusion
  • Constant-memory streaming approach suggested
  • Non-XML technologies eventually required
US SEC form “SD”
• Mining and oil exploration “payout” details
• Sample size:
  • Instance: 21.7 MB
  • Every filer uses extension taxonomy
  • 150,006 facts
  • 21,001 contexts
  • 5 units
  • 0 footnotes
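To make the study ratios concrete, the sketch below applies them to the 21.7 MB sample instance. It assumes the ratios scale linearly with file size and that they apply to this particular sample, neither of which the slides state; the function name `estimate_ram_mb` is purely illustrative.

```python
# RAM-to-file-size ratios as reported on the "Study results" slide.
RATIOS = {
    "XML to DOM (low)": 3,
    "XML to DOM (high)": 10,
    "XBRL to SAX object model": 15,
    "XBRL to formula processor": 60,
}


def estimate_ram_mb(instance_mb):
    """Scale an instance size (MB) by each ratio; assumes linear scaling."""
    return {label: instance_mb * ratio for label, ratio in RATIOS.items()}


estimates = estimate_ram_mb(21.7)  # size of the form SD sample instance
for label, mb in estimates.items():
    print(f"{label}: {mb:.1f} MB")
```

Under those assumptions the sample would need roughly 65 to 217 MB as a plain DOM, about 326 MB as a SAX-built object model, and about 1.3 GB in a formula processor.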
Streaming XBRL in XML syntax
• Base spec 2.1 working group note (WGN) & task force
• Compatible organization of instance for streaming
• SAX vs DOM improves speed, no XML persistence
• Constant memory usage
• Use for 2.1 & XDT validation
• Challenges for
  • Financial validation (~GFM)
  • Formula processing
  • XPath (XML node access)
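As an illustration of the SAX-style processing mentioned above, here is a minimal sketch using Python's standard `xml.sax` module. It counts contexts, units and facts without building a DOM, so memory use does not grow with the instance. The class `InstanceCounter`, the function `count_instance` and the sample document are hypothetical, and treating any element with a `contextRef` attribute as a fact is a simplification. This is not how Arelle implements streaming.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

XBRLI = "http://www.xbrl.org/2003/instance"


class InstanceCounter(ContentHandler):
    """Counts contexts, units and facts as SAX events arrive."""

    def __init__(self):
        super().__init__()
        self.contexts = 0
        self.units = 0
        self.facts = 0

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        if uri == XBRLI and local == "context":
            self.contexts += 1
        elif uri == XBRLI and local == "unit":
            self.units += 1
        elif (None, "contextRef") in attrs:
            # Simplification: any element carrying contextRef is an item fact.
            self.facts += 1


def count_instance(stream):
    """Parse a binary stream with a namespace-aware SAX parser."""
    handler = InstanceCounter()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(stream)
    return handler


# Tiny made-up instance, for illustration only.
SAMPLE = b"""<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
                         xmlns:ex="http://example.com/ext">
  <xbrli:context id="c1">
    <xbrli:entity>
      <xbrli:identifier scheme="http://example.com">X</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period><xbrli:instant>2012-12-31</xbrli:instant></xbrli:period>
  </xbrli:context>
  <xbrli:unit id="u1"><xbrli:measure>iso4217:USD</xbrli:measure></xbrli:unit>
  <ex:Payment contextRef="c1" unitRef="u1" decimals="0">100</ex:Payment>
  <ex:Project contextRef="c1">Alpha</ex:Project>
</xbrli:xbrl>"""

counts = count_instance(io.BytesIO(SAMPLE))
print(counts.contexts, counts.units, counts.facts)
```

The handler keeps only three integers, which is the sense in which memory stays constant; anything that needs to look back at earlier facts has to be stored explicitly.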
XBRL streamability issues
• Order
  • Freedom to order facts, contexts, units, footnotes
• XML syntax detail
  • Formula & Table XPath to nodes and XML structure
• Validation and formula strategies
  • Designed for complete instance in memory
  • Complex fallback and existence strategies
XBRL streaming approach
• Constant memory
• Backwards compatibility
• Order constrained within instances
• Contexts/units located as needed in instance
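One possible reading of "order constrained" is that a context must appear before any fact that cites it. The slides do not spell out the actual constraint, so the rule checked below is an assumption, and `facts_before_their_context` is a hypothetical helper. The sketch uses `xml.etree.ElementTree.iterparse` and discards processed elements as it goes; the set of context ids seen so far still grows with the number of contexts.

```python
import io
import xml.etree.ElementTree as ET

XBRLI = "{http://www.xbrl.org/2003/instance}"


def facts_before_their_context(stream):
    """Return (tag, contextRef) for facts that precede the context they cite."""
    seen_contexts = set()
    late = []
    root = None
    depth = 0
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            depth += 1
            if root is None:
                root = elem  # first start event is the document root
            continue
        depth -= 1
        if elem.tag == XBRLI + "context":
            seen_contexts.add(elem.get("id"))
        else:
            ref = elem.get("contextRef")
            if ref is not None and ref not in seen_contexts:
                late.append((elem.tag, ref))
        if depth == 1:
            # A direct child of the root just closed: drop the children
            # accumulated so far so the tree does not grow with the instance.
            root.clear()
    return late


# Made-up instance: the first fact cites context c2 before c2 is defined.
SAMPLE = b"""<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
                         xmlns:ex="http://example.com/ext">
  <xbrli:context id="c1"/>
  <ex:Payment contextRef="c2">100</ex:Payment>
  <xbrli:context id="c2"/>
  <ex:Payment contextRef="c1">200</ex:Payment>
</xbrli:xbrl>"""

late_facts = facts_before_their_context(io.BytesIO(SAMPLE))
print(late_facts)
```

A producer that wants its instances to be streamable under this assumed rule could run such a check before publishing.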
Financial Validation EFM/GFM
• Full object model analysis in memory for
  • Context, unit, fact duplications (could use hashes)
  • Fact cross-dimension analysis (only some concepts)
  • History of concepts used
  • Roll-ups, roll-forwards, aggregations
• Full DTS model in memory for
  • Concept issues, label/definition issues
  • Missing / improper calculations, roll-ups, roll-forwards
• Can be re-architected for streaming environment
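The "could use hashes" remark can be sketched as follows: rather than keeping every fact object in memory, keep one fixed-size digest per fact and flag repeats. The key used here (concept name, context id, unit id) is a deliberate simplification; real EFM/GFM duplicate rules compare context and unit content, not just ids. Both function names are hypothetical.

```python
import hashlib


def fact_key(concept, context_id, unit_id):
    """Fixed-size digest identifying a fact by concept, context and unit."""
    raw = "\x1f".join((concept, context_id, unit_id or "")).encode("utf-8")
    return hashlib.blake2b(raw, digest_size=16).digest()


def find_duplicate_facts(facts):
    """Yield-free single pass: returns facts whose key was already seen."""
    seen = set()
    duplicates = []
    for concept, context_id, unit_id in facts:
        key = fact_key(concept, context_id, unit_id)
        if key in seen:
            duplicates.append((concept, context_id, unit_id))
        else:
            seen.add(key)
    return duplicates


# Made-up facts: the third repeats the first.
facts = [
    ("ex:Payment", "c1", "u1"),
    ("ex:Payment", "c2", "u1"),
    ("ex:Payment", "c1", "u1"),
    ("ex:Project", "c1", None),
]
duplicates = find_duplicate_facts(facts)
print(duplicates)
```

Memory still grows with the number of facts, but only by one small digest each, which is far less than retaining full fact objects.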
Global Ledger Architecture
• Multiple content models required in parallel
  • Transactions
  • Company data
  • Account data
• Reformulate for independent streams or persistent company/account models
Formula Response
• Define subset of XPath for streamed processor
  • No node-axis features (“/” or “[” operators)
  • More functions for context, typed dimensions, etc.
  • Should allow use of non-XML implementations
• Define subset of formula processing
  • Consider SQL infrastructures
  • Consider OLAP features
• Reconsider use of features like
  • Fallbacks
  • Multiple instances of large data sets
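A streamed processor would need some way to reject expressions outside the proposed subset. The sketch below is one naive way to do that: it strips string literals and comments, then reports any remaining "/" or "[" characters. Since XPath spells division as `div`, a "/" outside a literal is always a path operator. The function `node_axis_operators` is hypothetical, nested comments are not handled, and a real implementation would use a proper XPath parser.

```python
import re

# Spans to ignore before scanning for operators.
_SKIP = re.compile(
    r'"(?:[^"]|"")*"'      # double-quoted literal, "" escapes a quote
    r"|'(?:[^']|'')*'"     # single-quoted literal, '' escapes a quote
    r"|\(:.*?:\)",         # XPath 2.0 comment (nesting not handled)
    re.DOTALL,
)


def node_axis_operators(expression):
    """Return the sorted list of "/" and "[" operators used, if any."""
    stripped = _SKIP.sub(" ", expression)
    return sorted({ch for ch in stripped if ch in "/["})


# Illustrative expressions only, not taken from any real formula linkbase.
print(node_axis_operators("$a + $b div 2"))
print(node_axis_operators("$fact/@contextRef"))
print(node_axis_operators("$facts[1]"))
print(node_axis_operators("concat('a/b', \"[x]\")"))
```

An empty result means the expression stays within the subset as far as this check can tell.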
Abstract Model Response
• Abstract model is based on OMG MOF & CWM
• Abstracted XBRL semantics from syntax
• Implementation will layer on XML as well as
  • CWM (OLAP supportive technologies)
• Next step is a pilot project
  • Implement abstract model demonstration
  • Evolve and tweak specs
  • Provide prototype implementation