Terminology for Statistics

Presentation Transcript


  1. Terminology for Statistics: How Can End Users Connect?
     Stephanie W. Haas
     School of Information and Library Science
     University of North Carolina at Chapel Hill
     stephani@ils.unc.edu

  2. Overview
     • Terminology and End User Searching
       • Characteristics of users and searches
       • Types of queries
       • Other sources of confusion
     • Ideas for Solutions
       • Goals
       • What needs to be solved
       • Possible tools and structures
     • Final Points
     Open Forum 2000

  3. Terminology and End User Searching
     • Characteristics of users and searches
     • Types of queries
     • Other sources of confusion

  4. Searching isn’t easy
     • “Query matching is effective only when the search is specific, the searcher knows precisely what he or she wants, and the request can be expressed adequately in the language of the system” (Borgman, 1996, p. 494)
     • If you don’t know what to call it, you can’t find it.
     • If you don’t know what it means, you can’t use it.

  5. The Mapping Problem
     [Diagram labels: Search; Data Element(s); Agency Term(s); User’s Term(s); User’s Information Need]

  6. Inside the System – Metadata Registry
     • Statistical experts’ understanding and usage
     • Crisp operational definitions (ideal)
     • Unambiguous terms (ideal)
     • Minimal or predictable contextual effects
     [Diagram labels: Data Element(s); Agency Term(s)]

  7. Outside the System
     • Choice of terms may depend on:
       • user’s domain knowledge
       • user’s search knowledge
       • user’s notion of what is available
       • terms seen elsewhere
       • luck?
     [Diagram labels: User’s Term(s); User’s Information Need]

  8. Users’ Knowledge
     Varying sophistication of questions
     • What is the universe for this survey question, given the questions leading up to it?
     • What is the current unemployment rate? Please send me the answer before my 9:00 class tomorrow.

  9. Types of Queries
     • Correct (matching) term
       consumer price index → consumer price index
     • Obvious synonym
       health care → medical care (CPI)
     • Conceptual cluster of synonyms/near synonyms
       woman, female, girls → women
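
The matching and synonym cases on this slide could be handled with a small lookup from user variants to the agency's preferred term. The following is a minimal Python sketch, not something described in the talk: `CLUSTERS` and `to_agency_term` are hypothetical names, and the entries only restate the slide's own examples.

```python
# Hypothetical synonym clusters, keyed by the agency's preferred term.
# The entries restate the slide's examples; a real table would be far larger.
CLUSTERS = {
    "consumer price index": {"consumer price index"},
    "medical care": {"medical care", "health care"},
    "women": {"women", "woman", "female", "girls"},
}

# Invert the clusters into a lookup from any variant to its agency term.
VARIANT_TO_AGENCY = {
    variant: agency
    for agency, variants in CLUSTERS.items()
    for variant in variants
}


def to_agency_term(user_term):
    """Return the agency term for a user term, or None if it is unmapped."""
    return VARIANT_TO_AGENCY.get(user_term.strip().lower())
```

A term that appears in no cluster returns `None`, which is the signal that the query falls outside the easy cases and needs some other kind of support.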

  10. Types of Queries (2)
      • “External” terms, common outside the agency, no direct data element equivalent inside the agency.
        inflation (generally use CPI or PPI)
        turnover (retention rate? job or profession tenure?)
        new jobs (first appearance on payroll?)

  11. Types of Queries (3)
      • “Trendy” terms. Subset of external terms.
        cyberjobs (from magazine article)
        Webmaster (recent coinage)
        reinvention

  12. Types of Queries (4)
      • Concept access
        “Give me everything you have about worker benefits”
        Good answer requires pulling together information from many sources (which may be more or less compatible).
        (See MapStats, for example: http://www.fedstats.gov/mapstats/)

  13. Contributing Factors
      • Confusion about basic statistical concepts
        seasonal adjustment
        “Indicates the adjustment of timeseries data to eliminate the effect of intrayear variations which tend to occur during the same period on an annual basis.” (BLS Selective Access)

  14. “To seasonally adjust a given economic time series is to eliminate that part of the change in the series which can be ascribed to the normal seasonal variation”
      “Seasonal adjustment is a mathematical process whereby the effects of recurring non-economic factors are removed from an economic time series.” (Dictionary of U.S. Government Statistical Terms, 1991)

  15. “A term applied to time series from which periodic oscillations with a period of one year have been removed.” (Cambridge Dictionary of Statistics, 1998)
      What is this number, and what does it mean?
      rate, index, ratio, value
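
As a rough illustration of what the quoted definitions of seasonal adjustment describe, the sketch below removes a repeating within-cycle pattern from a series. It is deliberately simplified and is an assumption of this note, not a method from the talk: it assumes the seasonal effect is additive and that the series spans whole cycles, and it is far simpler than the procedures used for official statistics. The function name `seasonally_adjust` is hypothetical.

```python
from collections import defaultdict


def seasonally_adjust(values, period=12):
    """Remove an additive seasonal pattern from a series (simplified sketch).

    values: observations in time order, ideally spanning whole cycles.
    period: number of observations per cycle (12 for monthly data).
    Returns a new list with each cycle position's average deviation
    from the overall mean subtracted.
    """
    overall_mean = sum(values) / len(values)

    # Group observations by their position within the cycle.
    by_position = defaultdict(list)
    for i, value in enumerate(values):
        by_position[i % period].append(value)

    # Seasonal effect = how far each position's mean sits from the overall mean.
    seasonal = {
        position: sum(group) / len(group) - overall_mean
        for position, group in by_position.items()
    }

    return [value - seasonal[i % period] for i, value in enumerate(values)]
```

For example, `seasonally_adjust([10, 20, 12, 22], period=2)` returns `[15.0, 15.0, 17.0, 17.0]`: the alternating swing is removed and the underlying rise from one cycle to the next remains.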

  16. Contributing Factors (2)
      • Major conceptual distinctions and when they apply.
        • Different levels of geographical regions, and the data available at each level (nation, region, state, metropolitan area, county)
        • Establishment data vs. household data
      • Note the importance of context in the use of these terms and data.

  17. Contributing Factors (3)
      • Inherent ambiguity: the pay concept
      • Carol Hert & John Fieber, search terms from FedStats Web Page (http://www.fedstats.gov/), 11/98, 28,248 unique queries
      • Agency terms used for pay concept include: income, compensation, earnings, wage, salary

  18. BLS/CPS Terms
      • Total combined income
        • “includes money from jobs, net income from business, farm or rent, pensions, dividends, interest, social security payments and any other money income received” (CPS)
      • Compensation
        • “sometimes used to encompass the entire range of wages and benefits” (BLS Glossary of Compensation Terms)

  19. BLS/CPS Terms (2)
      • Usual weekly earnings
        • “include any overtime pay, commissions, or tips usually received” (CPS concepts)
      • Hourly earnings
        • “hourly rate as stated by the employer…does not include tips, commissions, or any other non-hourly wages.” (CPS interviewer manual)

  20. What does this user want?
      correction officer, income
      • Monetary income received - including that unrelated to job
      • Compensation, including benefits - total job package
      • Usual weekly earnings - including regular overtime
      • Hourly earnings - excluding overtime
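
One way to surface this ambiguity is to return every candidate agency term with a short gloss and let the user choose. The sketch below is an illustration only: `PAY_CONCEPTS`, `USER_TO_AGENCY`, and `candidates` are hypothetical names, the glosses paraphrase the slides, and mapping "income" to all four concepts is an assumption made for the example.

```python
# Glosses paraphrased from the slides on BLS/CPS pay terms.
PAY_CONCEPTS = {
    "total combined income": "monetary income received, including income unrelated to the job",
    "compensation": "wages plus benefits: the total job package",
    "usual weekly earnings": "weekly earnings, including regular overtime",
    "hourly earnings": "hourly rate, excluding overtime",
}

# Hypothetical one-to-many mapping from a user term to agency terms.
USER_TO_AGENCY = {
    "income": [
        "total combined income",
        "compensation",
        "usual weekly earnings",
        "hourly earnings",
    ],
}


def candidates(user_term):
    """Return (agency term, gloss) pairs the user could choose among."""
    agency_terms = USER_TO_AGENCY.get(user_term.strip().lower(), [])
    return [(term, PAY_CONCEPTS[term]) for term in agency_terms]


if __name__ == "__main__":
    for term, gloss in candidates("income"):
        print(f"{term}: {gloss}")
```

Returning the full candidate list, rather than silently picking one term, keeps the choice with the user, who is the only one who knows which sense of "income" was intended.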

  21. Ideas for Solutions
      • Goals
      • What needs to be solved
      • Possible tools and structures

  22. Goals for Possible Solutions
      • Maintain the distinction between agency (authority) terms and user terms.
      • Note the distinction between a terminology and user vocabulary
      • Often lack of structure, stability, or context (although patterns do exist)

  23. Not equally weighted terminologies
      [Diagram labels: T1; T2; Data Element Concepts; Data Elements]

  24. Asymmetrical Structure
      [Diagram labels: Agency Terms; User Terms; Data Element Concepts; registry contents; Data Elements]

  25. Maintenance Issues
      • Indexing is not the primary function of the agency.
      • Less than total coverage will still help.
      • Can we assume:
        • Agency terms are adopted/defined slowly?
        • User terms are more volatile (especially the “trendy” ones)?
      • How often must mapping structures, procedures be updated?

  26. Easing Users’ Pain
      • No problem
        • same word(s), same meaning
        • different word(s), different meaning
      • Support needed (thesaurus, definitions, explanation)
        • different word(s), same meaning (synonyms)
        • same word(s) or different word(s), some relationship between meanings (e.g., BT, NT, part-of, domain specific)

  27. Same word(s) or different word(s), some undefined overlap in meaning
      • ??? Can these users be helped ???
      • Same word(s), different meaning (if unnoticed by user)
      • Same word(s) or different word(s), no relationship (wrong source of information?)

  28. Providing Agency Information
      • Substituting agency term(s) for user term(s) and/or expanding user term(s)
        • Hidden or overt?
        • Automatic or interactive?
      • Displaying conceptual term clusters (e.g., gender, race, occupation)
      • Facilitating browsing

  29. Giving definitions and examples
      • source?
      • “official” or basic?
      • Highlighting usage notes (the footnotes)
        • Who needs to see them?
        • When?

  30. Crosswalk
      • Mapping between agency and user terms
      • Asymmetrical, build from users’ side
      • 80/20 principle for coverage
      • Multiple sources of terms:
        • Search sessions
        • Interviews with consultants, intermediaries
        • Media reports, textbooks, other “public” sources
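
Building the crosswalk from the users' side with an 80/20 target could start from a query log: count how often each user term occurs, then map only the most frequent terms until they account for the target share of all queries. The sketch below is one possible reading of that idea, not the procedure from the talk. The function name `terms_for_coverage` is hypothetical, and the query log in the usage note is invented.

```python
from collections import Counter


def terms_for_coverage(query_terms, target=0.8):
    """Pick the most frequent user terms until `target` share of queries is covered.

    query_terms: one entry per logged query term (repeats expected).
    Returns the selected terms, most frequent first.
    """
    counts = Counter(query_terms)
    total = sum(counts.values())

    selected = []
    covered = 0
    for term, count in counts.most_common():
        # Stop once the terms chosen so far already reach the target share.
        if covered / total >= target:
            break
        selected.append(term)
        covered += count
    return selected
```

With an invented log of six "income" queries, three "salary" queries, and one "wage" query, `terms_for_coverage` returns `["income", "salary"]`: two of the three distinct terms cover 90% of the queries, so "wage" is left unmapped in the first pass.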

  31. Asymmetrical Structure
      [Diagram labels: Agency Terms; User Terms; Data Element Concepts; Crosswalk; Data Elements]

  32. “Enhanced Indexing”
      • Expanding agency pay terms, FedStats Web page (Hert & Haas, preliminary findings)
      • Assume that more overlap between terms increases users’ chances of success
      • Query sessions where 50% of terms were agency terms
        • Without expansion = 89%
        • With expansion = 73%

  33. Other Possibilities
      • Thesaurus, with relationships such as see and use for
      • Multilingual thesaurus or dictionary, treating terminologies as equal
      • Fully incorporate end-user terms into classification or data element concept entries (Desirable?)

  34. Final Points
      • Users are inventive in term use.
      • Users discourage easily.
      • Maintenance is a crucial concern.
      • Is the 80/20 principle useful?