Taxonomy Governance Ron Daniel, Jr. & Joseph A. Busch Taxonomy Strategies LLC
Agenda • 1:30 Welcome & Introductions • 1:45 Exercise: Taxonomy Revisions • 2:15 Fundamental Processes • 2:30 Governance Team Roles and Structures • 3:00 Tools • 3:05 Break • 3:15 Exercise: Organizational Self-Assessment • 3:30 Maturity Model • 3:40 Designing and Building Maintainable Taxonomies & Metadata • 4:00 Additional Processes • 4:20 Q&A • 4:30 Adjourn
Who we are: Joseph Busch • Over 25 years in the business of organized information • Founder, Taxonomy Strategies • Director, Solutions Architecture, Interwoven • VP, Infoware, Metacode Technologies • Program Manager, Getty Foundation • Manager, Pricewaterhouse • Metadata and taxonomies community leadership • President, American Society for Information Science & Technology • Director, Dublin Core Metadata Initiative • Adviser, National Research Council Computer Science and Telecommunications Board • Reviewer, National Science Foundation Division of Information and Intelligent Systems • Founder, Networked Knowledge Organization Systems/Services
Who we are: Ron Daniel, Jr. • Over 15 years in the business of metadata & automatic classification • Principal, Taxonomy Strategies • Standards Architect, Interwoven • Senior Information Scientist, Metacode Technologies • Technical Staff Member, Los Alamos National Laboratory • Metadata and taxonomies community leadership • Chair, PRISM (Publishers Requirements for Industry Standard Metadata) working group • Acting chair: XML Linking working group • Member: RDF working groups • Co-editor: PRISM, XPointer, 3 IETF RFCs, and Dublin Core 1 & 2 reports.
Recent & current projects • Government: Commodity Futures Trading Commission; Defense Intelligence Agency; ERIC; Federal Aviation Administration; Federal Reserve Bank of Atlanta; Forest Service; GSA Office of Citizen Services (www.firstgov.gov); Head Start; Infocomm Development Authority of Singapore; NASA (nasataxonomy.jpl.nasa.gov); Small Business Administration; Social Security Administration; USDA Economic Research Service; USDA e-Government Program (www.usda.gov) • Commercial: Allstate Insurance; Blue Shield of California; Debevoise & Plimpton; Halliburton; Hewlett Packard; Motorola; PeopleSoft; Pricewaterhouse Coopers; Siderean Software; Sprint; Time Inc. • Commercial subcontracts: Agency.com – Top financial services; Critical Mass – Fortune 50 retailer; Deloitte Consulting – Big credit card; Gistics/OTB – Direct selling giant • NGOs: CEN; IDEAlliance; IMF; OCLC
Participant Introductions • Who are you? • What do you do? • What brings you here today?
Taxonomy Governance Overview • Is “Taxonomy Governance” synonymous with “Taxonomy Maintenance”? • What kinds of changes can be made, and what are their costs? • What kinds of information are needed to determine the changes? • What kind of group should maintain the taxonomy? • What kinds of rules should the group follow to decide on changes? • What should the group do beyond maintaining the taxonomy?
Exercise: Taxonomy Modifications • Divide into small groups • Review assigned sample taxonomy • Discuss changes you would make • In 10 minutes, a spokesperson will speak for the group and briefly: • Tell us something good about the taxonomy • Characterize the short-term changes your group would make • Characterize the questions your group would like answered before making other changes
Exercise Notes • Team Members: • Something good about the taxonomy: • Short term changes: • Questions for other changes:
Group 2 Sample Taxonomy • Top Level • Random Samples of Detailed Categories: • Business / Accounting / Firms / Directories • Business / Biotechnology & Pharmaceuticals / Education & Training • Business / Employment / By Industry • Business / Healthcare / Employment / Regional • Business / Small Business / Finance / Accounting • Reference / Education / Colleges & Universities / North America / United States / Maryland / Columbia Union College / Athletics • Reference / Education / K-12 / Home Schooling / Unschooling / Chats and Forums • Regional / Europe / Ireland / Business & Economy / Employment / Health & Medical • Science / Math / Academic Departments / South America / Colombia • Science / Social Sciences / Linguistics / Translation / Associations • Society / People / Women / Science & Technology / Mathematics
Group 3 Sample Taxonomy • Top Level • Detail in Auto Products Category • Source: http://householdproducts.nlm.nih.gov/products.htm
Predictions • Short-term changes will center on rules of style – ‘&’ vs. ‘and’, capitalization, plurals • Faceted subdivision will only be suggested by experienced practitioners, by groups given low-level details of a taxonomy, or both. • People will critique the UI presentation • Questions for long-term changes will focus, in decreasing order, on: • Who are the users and what are they doing? • What is the content and how much is in the various categories? • … • What kind of money depends on the taxonomy, and what kind of maintenance expenses are justified? • Anything else people want to cover? • Topics raised: Editorial Rules; Metadata Specification, Design for maintainability; How to put it into action?; User Characterization; Content and Metadata Maintenance; ROI
Fundamental Processes • What are the two fundamental processes every organization should implement to maintain its metadata and taxonomies? • Query log / Click trail examination • Tagging Error Correction • What are the key outlooks a taxonomist should try to instill in their organization?
Fundamental Process #1 – Query Log Examination • How can we characterize users and what they are looking for? • Query Log & Click Trail Examination • Sophisticated software available, but don’t wait. • 80/20 Rule – 80% of value from 20% of possible reports. • Greatest value comes from: • Identifying a person as responsible for search quality • Starting a “Measure & Improve” mindset • Greatest challenge: • Getting a person assigned (≥ 10%) • Getting logs turned back on • What to do after the obvious fixes have been made • UltraSeek Reporting: • Top queries • Queries with no results • Queries with no click-through • Most requested documents • Query trend analysis • Complete server usage summary • Click Trail Packages: iWebTrack, NetTracker, OptimalIQ, SiteCatalyst, Visitorville, WebTrends
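The basic reports named above (top queries, queries with no results, queries with no click-through) can be approximated with a short script while waiting for a dedicated package. A minimal Python sketch, assuming a hypothetical log in which each entry records the query text, the number of results, and the number of clicks; the record layout, function names, and sample data are illustrative and are not the format of UltraSeek or any other product:

```python
from collections import Counter
from typing import NamedTuple


class LogEntry(NamedTuple):
    """One search event; this record layout is an assumption for illustration."""
    query: str
    result_count: int
    click_count: int


def normalize(query: str) -> str:
    """Fold case and whitespace so trivially different queries count together."""
    return " ".join(query.lower().split())


def query_reports(entries, top_n=3):
    """Build three basic reports: top queries, no results, no click-through."""
    all_queries = Counter()
    no_results = Counter()
    no_clicks = Counter()
    for entry in entries:
        q = normalize(entry.query)
        all_queries[q] += 1
        if entry.result_count == 0:
            no_results[q] += 1
        elif entry.click_count == 0:
            # Results were shown, but nothing was clicked.
            no_clicks[q] += 1
    return {
        "top_queries": all_queries.most_common(top_n),
        "no_results": no_results.most_common(top_n),
        "no_click_through": no_clicks.most_common(top_n),
    }


# Made-up sample data.
SAMPLE_LOG = [
    LogEntry("expense report", 12, 1),
    LogEntry("Expense  Report", 12, 0),
    LogEntry("401k", 0, 0),
    LogEntry("401k", 0, 0),
    LogEntry("holiday schedule", 5, 2),
]

reports = query_reports(SAMPLE_LOG)
```

Queries that show up under `no_results` are candidates for new synonyms or missing content; queries under `no_click_through` suggest that results exist but are poorly titled or ranked.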
Fundamental Process #2 – Tagging Error Correction • For the Taxonomy to be used, its values must be associated with content. • We will refer to this as “Tagging”. • Errors will happen, and some will be found. What are you going to do about them? • Define an error correction process. • Process will accommodate questions like: • Is it an error? What is the cost to correct or not correct? Does the correction need to be scheduled? etc. • Once an error is corrected, NEVER lose that fact. • Manually reviewed pages are vital for training automatic classifiers. • Has implications for metadata specification and review procedures. • Over time, multiple error detection methods will be defined. • e.g. Statistical sampling of newly added pages • Gradually, additional error correction processes may be defined to deal with particular types of errors.
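One way to honor "NEVER lose that fact" is to keep corrections in an append-only log rather than overwriting the tag, and to choose review candidates by random sampling. A minimal Python sketch; the record fields, class and function names, the 10% sample rate, and the sample data are all assumptions for illustration:

```python
import random
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Correction:
    """A reviewed tagging fix, kept so it can train classifiers later."""
    page_id: str
    facet: str
    old_value: str
    new_value: str
    reviewer: str
    reviewed_on: date


@dataclass
class CorrectionLog:
    """Append-only store: corrections are added, never edited or deleted."""
    _entries: list = field(default_factory=list)

    def record(self, correction: Correction) -> None:
        self._entries.append(correction)

    def training_examples(self):
        """Manually reviewed (page, facet, value) triples for classifier training."""
        return [(c.page_id, c.facet, c.new_value) for c in self._entries]


def sample_for_review(new_page_ids, rate=0.10, seed=None):
    """Pick a random sample of newly added pages for manual tag review."""
    pages = list(new_page_ids)
    if not pages:
        return []
    sample_size = max(1, round(len(pages) * rate))
    return random.Random(seed).sample(pages, sample_size)


# Made-up sample data.
log = CorrectionLog()
log.record(Correction("page-17", "Location", "Georgia (US)", "Georgia (country)",
                      "reviewer-1", date(2006, 3, 1)))
to_review = sample_for_review([f"page-{n}" for n in range(50)], rate=0.10, seed=1)
```

Because the old and new values are both stored, the log can later show which kinds of errors recur and may deserve their own detection method.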
Fundamental Outlooks • The taxonomy problem: How are we going to build and maintain metadata structures and controlled vocabularies? • The tagging problem: How are we going to populate metadata elements with complete and consistent values? • The ROI problem: How are we then going to use metadata in applications and demonstrate benefits? • Taxonomy Governance is a standards process. Take tips from other standards efforts: • Team, with comment-handling responsibilities and an appeals process • Issue Logs • Announcements • Release Schedule • Foster a “Measure & Improve” mindset • Must know this to address other problems!
Taxonomy Business Processes • Taxonomies must change, gradually, over time if they are to remain relevant • Maintenance processes need to be specified so that the changes are based on rational cost/benefit decisions • A team will need to maintain the taxonomy on a part-time basis • Taxonomy team reports to some other steering committee
Controlled Vocabulary Governance Environment (diagram) • 1: Syndicated Terminologies change on their own schedule • 2: CV Team decides when to update CVs • 3: Team adds value via mappings, translations, synonyms, training materials, etc. • 4: Updated versions of CVs published to consuming applications • Diagram components: Syndicated Terminologies (ISO 3166-1, Other External); Vocabulary Management System; CVs; Other Controlled Items; Custodians; Change Requests & Responses; Notifications; Published CVs and STs; Consuming Applications (Web CMS, Archives, Intranet Search, Intranet Nav., ERMS, ERP, DAM, Other Internal …)
Other Controlled Items • Taxonomy Team will have additional items to manage: • Charter, Goals, Performance Measures • Editorial rules • Team processes • Tagger training materials (manual and automatic) • Outreach & ROI • Communication plan • Website • Presentations • Announcements • Roadmap
Taxonomy governance | Generic team charter • Taxonomy Team is responsible for maintaining: • The Taxonomy, a multi-faceted classification scheme • Associated taxonomy materials, such as: • Editorial Style Guide • Taxonomy Training Materials • Metadata Standard • Team rules and procedures (subject to CIO review) • Team evaluates costs and benefits of suggested change • Taxonomy Team will: • Manage relationship between providers of source vocabularies and consumers of the Taxonomy • Identify new opportunities for use of the Taxonomy across the Enterprise to improve information management practices • Promote awareness and use of the Taxonomy
Editorial Rules • To ensure consistent style, rules are needed • Issues commonly addressed in the rules: • Sources of Terms • Abbreviations • Ampersands • Capitalization • Continuations (More… or Other…) • Duplicate Terms • Hierarchy and Polyhierarchy • Languages and Character Sets • Length Limits • “Other” – Allowed or Forbidden? • Plural vs. Singular Forms • Relation Types and Limits • Scope Notes • Serial Comma • Spaces • Synonyms and Acronyms • Term Order (Alphabetic or …) • Term Label Order (Direct vs. Inverted) • Must also address issue of what to do when rules conflict – which are more important?
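Several of the rules listed above are mechanical enough to check automatically before a term is accepted. A minimal Python sketch of such a checker; the specific house rules it enforces ('&' rather than 'and', a 40-character limit, no "Other" terms, initial capital, no duplicate siblings) are assumptions chosen for illustration, not recommendations from this deck:

```python
import re

MAX_LABEL_LENGTH = 40  # assumed limit; take the real value from your style guide


def check_label(label: str) -> list:
    """Return the editorial-rule violations found in one term label."""
    problems = []
    if label != " ".join(label.split()):
        problems.append("spaces: leading, trailing, or repeated whitespace")
    if re.search(r"\band\b", label, flags=re.IGNORECASE):
        problems.append("ampersands: use '&' instead of 'and'")
    if len(label) > MAX_LABEL_LENGTH:
        problems.append("length: label exceeds %d characters" % MAX_LABEL_LENGTH)
    if re.match(r"other\b", label.strip(), flags=re.IGNORECASE):
        problems.append("other: 'Other' terms are forbidden")
    if label[:1].islower():
        problems.append("capitalization: label must start with a capital")
    return problems


def check_siblings(labels) -> dict:
    """Check every label under one parent, including duplicate detection."""
    report = {}
    seen = set()
    for label in labels:
        problems = check_label(label)
        key = label.strip().lower()
        if key in seen:
            problems.append("duplicates: label repeats a sibling term")
        seen.add(key)
        if problems:
            report[label] = problems
    return report


report = check_siblings([
    "Business & Economy",
    "Science and Technology",
    "Other",
    "health  care",
    "Business & Economy",
])
```

A checker like this does not settle what to do when rules conflict; that ordering still has to be written into the style guide.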
Roles in Two Taxonomy Governance Teams
Team 1:
• Executive Sponsor: Advocate for the taxonomy team
• Business Lead: Keeps team on track with larger business objectives. Balances cost/benefit issues to decide appropriate levels of effort (specialists help in estimating costs). Obtains needed resources if those in team can’t accomplish a particular task
• Technical Specialist: Estimates costs of proposed changes in terms of amount of data to be retagged, additional storage and processing burden, software changes, etc. Helps obtain data from various systems
• Content Specialist: Team’s liaison to content creators. Estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc.
• Small-scale Metadata QA Responsibility
• Taxonomy Specialist: Suggests potential taxonomy changes based on analysis of query logs, indexer feedback. Makes edits to taxonomy, installs into system with aid of IT specialist
• Content Owner: Reality check on process change suggestions
Team structure at a different org.:
• Business Lead
• Custodians: Responsible for content in a specific CV
• Training Representative: Develops communications plan, training materials
• Work Practices Representative: Develops processes, monitors adherence
• IT Representative: Backups, admin of CV Tool
• Info. Mgmt. Representative: Provides CV expertise, tie-in with larger IM effort in the organization
Taxonomy governance | Where changes come from • Diagram components: Firewall; Application (UI, Application Logic, Content); Tagging (UI, Tagging Logic, Taxonomy); End User; Tagging Staff; Taxonomy Editor; Taxonomy Team • Sources of change: • Staff notes ‘missing’ concepts • Query log analysis • Taxonomy Editor experience • Requests from other parts of the organization • Recommendations by Editor: • Small taxonomy changes (labels, synonyms) • Large taxonomy changes (retagging, application changes) • New “best bets” content • Team considerations: • Business goals • Changes in user experience • Retagging cost
Processes • Different organizations will need to consider their own change processes. • Organization 1: A custodian is responsible for the content, but checks facts with department heads before making changes. • Organization 2: Analysts suggest changes, editors approve, copyeditors verify consistency. • Change process MUST also consider cost of implementing the change: • Retagging data • Reconfiguring auto-classifier • Retraining staff • Changes in user expectations • Taxonomy Change Cases: • Case 1. Renaming a term • Case 2. Adding a new leaf term • Case 3. Inserting a new term • Case 4. Splitting a term • Case 5. Deleting a leaf term or subtree • Case 6. Deleting a term • Case 7. Moving a subtree • Case 8. Merging terms • Case 9. Adding a CV • Case 10. Deleting a CV
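The retagging part of a change's cost can be roughed out from how many content items carry the affected terms. A minimal Python sketch, assuming a toy taxonomy stored as child-to-parent links plus a count of tagged items per term; the class, the change-case names, and the cost rules are hypothetical and deliberately crude (for instance, it assumes content is tagged by term ID, so a rename touches no content):

```python
from dataclasses import dataclass, field


@dataclass
class Taxonomy:
    """Toy taxonomy: child -> parent links plus number of items tagged per term."""
    parent: dict = field(default_factory=dict)        # term id -> parent id (None for roots)
    tagged_items: dict = field(default_factory=dict)  # term id -> count of tagged items

    def subtree(self, term):
        """Return the term and all of its descendants."""
        found = [term]
        for child, parent in self.parent.items():
            if parent == term:
                found.extend(self.subtree(child))
        return found

    def retag_estimate(self, change, term):
        """Estimate how many items need review for a given change case."""
        if change in ("rename", "add_leaf", "move_subtree"):
            # Assumption: tags reference term ids, so labels and positions
            # can change without touching tagged content.
            return 0
        if change in ("split", "delete_term", "merge"):
            # Every item tagged with the term itself must be looked at again.
            return self.tagged_items.get(term, 0)
        if change == "delete_subtree":
            return sum(self.tagged_items.get(t, 0) for t in self.subtree(term))
        raise ValueError("unknown change case: %s" % change)


# Made-up sample data.
tax = Taxonomy(
    parent={"vehicles": None, "cars": "vehicles", "trucks": "vehicles", "pickups": "trucks"},
    tagged_items={"vehicles": 10, "cars": 120, "trucks": 40, "pickups": 15},
)
split_cost = tax.retag_estimate("split", "cars")
delete_cost = tax.retag_estimate("delete_subtree", "trucks")
```

The other costs named above (auto-classifier reconfiguration, staff retraining, changed user expectations) are not modeled here and would need their own estimates.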
Taxonomy governance | Taxonomy maintenance workflow (swimlane diagram) • Lanes: Analyst, Editor, Copywriter, Sys Admin, Taxonomy Tool • Steps: Suggest new name/category; Review new name; Problem? (Yes/No); Copy edit new name; Problem? (Yes/No); Add to enterprise Taxonomy
Taxonomy editing tools vendors • Most popular taxonomy editor? MS Excel • Immature industry – no vendors in upper-right quadrant! • Quadrant chart: Ability to Execute (low to high) vs. Completeness of Vision; quadrants shown include Niche Players and Visionaries • Chart annotations: “High functionality, high cost ($100k!)”; “Widely used, cheap, single-user”
Sample Taxonomy Editor Functionality • Standard and Custom Fields • Standard and Custom Relations • Data Typing, Restrictions, and Inference • Flexible Reporting • Flexible Importing • Multiple Vocabulary Support • Inter-Vocabulary Relations • Unique IDs • ISO Codes not sufficient • Workflow • Voting • Change Request Management • Programmability • Screenshots: Term Editing, Hierarchy Browser
Where do I put the metadata? • Where can I store metadata? • In the content – HTML Headers, File properties, etc. • In a centralized repository – Search index, MDDB, etc. • In multiple systems – Common case • Where should I store metadata? • Consultant’s answer – “It depends.” • If you are moving files through a process, putting it in the file keeps it from getting dropped at system borders. • If you are doing search across multiple documents, it has to be at least copied out of the files. • If you make copies of files and modify them, consistent in-file metadata will be impossible. • Real question is not where to STORE the metadata, it is how to MAINTAIN the metadata. • Web CMS as an example. • Central Metadata Database is a very advanced practice.
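When metadata lives in HTML headers but search runs across many documents, the values have to be copied out into some central store. A minimal Python sketch using the standard library's `html.parser`; the `DC.`-prefixed meta names follow the common convention for embedding Dublin Core in HTML, while the sample page, the path, and the `central_index` dictionary are assumptions for illustration:

```python
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name, content = attributes.get("name"), attributes.get("content")
        if name and content is not None:
            # A field may repeat (e.g. several subjects), so keep a list.
            self.metadata.setdefault(name, []).append(content)


def extract_metadata(html_text):
    """Return {field name: [values]} for the named meta tags in a page."""
    parser = MetaTagExtractor()
    parser.feed(html_text)
    return parser.metadata


# Made-up sample page.
PAGE = """<html><head>
<title>Travel Policy</title>
<meta name="DC.title" content="Travel Policy">
<meta name="DC.subject" content="Travel">
<meta name="DC.subject" content="Expenses">
<meta charset="utf-8">
</head><body>...</body></html>"""

# Hypothetical central store keyed by document path.
central_index = {"/policies/travel.html": extract_metadata(PAGE)}
```

The copy in `central_index` goes stale as soon as the file is edited, which is the maintenance question the slide raises: something has to decide which copy is authoritative and when to re-extract.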
What Processes Should I Try to Institute? • Processes will vary from one organization to another. • Assessing the Organization’s state is the first step. • Determining the ROI and potential resources follows. • Plan on instituting processes over time, beginning with basic ones.
Search and Metadata Self-Assessment Form
Background
• Rate your organization’s search & metadata maturity from 1 to 10.
• What was the most recent change to your organization’s search & metadata processes?
• What is the next step for your organization’s search & metadata processes?
Basic
• Is there a process in place to examine query logs?
• Is there an organization-wide metadata standard, such as an extension of the Dublin Core, for use by search tools, multiple repositories, etc.?
Intermediate
• Is there an ongoing data cleansing procedure to look for ROT (Redundant, Obsolete, Trivial content)? If so, describe briefly.
• Does the search engine index more than 4 repositories around the organization?
• Are system features and metadata fields added based on cost/benefit analysis, rather than things that are easy to do with the current tools?
• Are tools only acquired after requirements have been analyzed, or are major purchases sometimes made to use up year-end money?
• Are there hiring and training practices especially for metadata and taxonomy positions? If so, describe briefly.
Advanced
• Are there established qualitative and quantitative measures of metadata quality? If so, describe briefly.
• Can the CEO explain the ROI for search and metadata?
Optional
• Your name:
• Organization:
• E-mail:
Contact information will not be used for marketing purposes. It will only be used to follow up and clarify issues around the survey.
Metadata Maturity Model • Taxonomy governance processes must fit the organization • As consultants, we notice different levels of maturity in the business processes around Content Management, Taxonomy, and Metadata • Honestly assess your organization’s metadata maturity in order to design appropriate governance processes • We are starting to define a maturity model, similar to the CMMI model in the software world.
Metadata Maturity Model • Shameless Plug: Tomorrow Morning at 9:45 • Call for Data: Leave Self-Assessments with us
Purpose of Maturity Model • Estimating the maturity of an organization’s information management processes tells us: • How involved the taxonomy development and maintenance process should be • Overly sophisticated processes will fail • What to recommend as next steps • Maturity is not a goal, it is a characterization of an organization’s methods for achieving particular goals. • Mature processes have expenses which must be justified by consequent cost savings or revenue gains. • Metadata Maturity may not be core to your business.
Overview of Best Practices in Metadata and Taxonomy • Avoid monolithic ‘subject’ taxonomies • May have a browsing taxonomy constructed from combined facets. • Use (or map to) Dublin Core for basic information. • Extend with custom elements for specific facts. • Use pre-existing, standard, vocabularies as much as possible. • Validate author names with LDAP directory • ISO country codes for locations • Product & service info from ERP system • Designate a team to manage the taxonomies and related materials • Taxonomy Editorial Rules, Processes, Training materials, Outreach & ROI • Design a Metadata QC Process • Start with an error-correction process, then get more formal on error detection. • In the future, large-scale ontologies like CYC may be valuable in automated error detection.
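Validating values against pre-existing vocabularies can start as a simple lookup at tagging time. A minimal Python sketch, assuming records use Dublin Core element names (`title`, `creator`, `coverage`) with list values; the small code set is only an excerpt of ISO 3166-1 alpha-2, the user set is a stand-in for a real LDAP directory lookup, and all function and variable names are hypothetical:

```python
# Tiny excerpts standing in for the real sources (assumptions for illustration).
ISO_3166_ALPHA2 = {"US", "CA", "GB", "IE", "CO", "SG"}  # subset of ISO 3166-1 codes
DIRECTORY_USERS = {"asmith", "bjones"}                  # stand-in for an LDAP lookup

VOCABULARIES = {
    "creator": DIRECTORY_USERS,
    "coverage": ISO_3166_ALPHA2,
}
REQUIRED_FIELDS = ("title", "creator")


def validate_record(record):
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not record.get(name):
            problems.append("missing required field: " + name)
    for name, allowed in VOCABULARIES.items():
        for value in record.get(name, []):
            if value not in allowed:
                problems.append("%s: %r is not in the controlled vocabulary" % (name, value))
    return problems


good = {"title": ["Travel Policy"], "creator": ["asmith"], "coverage": ["US"]}
# "UK" is a common slip; the assigned ISO 3166-1 alpha-2 code is "GB".
bad = {"title": ["Travel Policy"], "creator": ["asmith"], "coverage": ["UK"]}

good_problems = validate_record(good)
bad_problems = validate_record(bad)
```

Rejected values are useful input to the error-correction process: a value that is rejected often may point to a missing synonym rather than a careless tagger.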
Factor “Subject” into smaller facets • Size • DMOZ tries to organize all web content, has more than 600k categories! • Difficulty in navigating, maintaining • Hidden facet structure • “Classification Schemes” vs. “Taxonomies”
Facet Principles • Basic facets with identified items – people, places, projects, instruments, missions, organizations, … Note that these are not subjective “subjects”, they are objective “objects”. • Subjective views can be laid on top of the objective facts, but should be in a different namespace so they are clearly distinguishable. • For example, labels like “Anarchist” or “President” can be applied to the same person at different times (e.g. Nelson Mandela).
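The separation between objective facet values and subjective labels can be made explicit in the data model. A minimal Python sketch, assuming a `person:` prefix for identified people and a `view:` prefix for subjective labels; the prefixes, the entity, the labels, and the years are all made up for illustration:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Label:
    """A subjective label applied to an identified entity for a span of years."""
    entity_id: str                  # objective facet value, e.g. "person:0001"
    label: str                      # kept in the separate "view:" namespace
    start_year: int
    end_year: Optional[int] = None  # None means the label still applies


# Objective facet: identified people (hypothetical entry).
ENTITIES = {"person:0001": "Jane Example"}

# Subjective views laid on top of the objective facts (made-up data).
LABELS = [
    Label("person:0001", "view:Activist", 1980, 1994),
    Label("person:0001", "view:Head of State", 1995, 2000),
]


def labels_in_year(entity_id, year):
    """Return the subjective labels that applied to an entity in a given year."""
    return [
        item.label for item in LABELS
        if item.entity_id == entity_id
        and item.start_year <= year
        and (item.end_year is None or year <= item.end_year)
    ]
```

Content tagged with `person:0001` stays correctly tagged no matter how the labels change; only the `view:` layer has to be revised.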
Iterative Development Vision (more participants and tagged content at each iteration)
• Stages, in order: Plan & Prototype; Alpha Dev & Test; Beta D&T; Final D&T
• Participants, by stage: Project Team; Stakeholders and SMEs; Friendly Users; Audiences
• 1 Identify Objectives: Interview core team and stakeholders; Review tagged samples, default procedures; Interview alpha users; Interview beta users
• 2 Inventory Content: ID sources, spider assets & extract metadata; Gather additional sources, if any; Gather additional sources, if any
• 3 Specify Metadata: Define fields & purpose; Revise if needed, bake into alpha CMS; Modify CMS for beta; Modify for 1.0
• 4 Model Content: Define content chunks & XML DTDs; Revise if needed, bake into alpha CMS; Modify CMS for beta; Modify for 1.0
• 5 Specify Vocabularies: Compile controlled vocabularies; Revise, use in alpha CMS; Revise, use in beta CMS; Revise using team procedure
• 6 Specify Procedures: Start with UI sketches, off-the-shelf rules; Tailor the default materials, alpha workflows in CMS; Modify & extend workflows; Finalize procedure materials
• 7 Train Staff: Manually tag small sample; Use alpha CMS to tag larger sample; Use beta CMS to tag larger sample; Finalize training materials & train staff
Planning for Taxonomy Changes • Error Correction – What to do when end-users and tagging staff notice problems? • Provide for it in the Error Correction Process • Add Query Log Analysis to help detect user problems • How to answer questions re. things to add, delete, or rearrange in the taxonomy? • Keep a visible issue log • Discuss with SMEs, tag samples, use other testing methods • Per-facet changes: • Corporate reorganizations, Product lineup changes, Country splits & merges, … will happen. Prepare for them when deploying those facets • Long-term – what facets to create, when, and why • See Taxonomy Roadmap section
Additional Processes • Brief remarks on Measurements, ROI, Training, Roadmap
Measuring Metadata and Taxonomy Quality • Taxonomy development is an iterative process • Develop an organizational idea, then test it by tagging sample content • Elicit feedback via walk-throughs and card sorting exercises • Use both qualitative and quantitative methods • Time, budget, and availability of tagged data will determine what methods are possible.