Sampling in-library use

Statistics in practice: Measuring and managing
Sampling in-library use
Sebastian Mundt
sebastian.mundt@unibw-hamburg.de

Framework • Selection • Examples • Conclusions

Sampling in library statistics
In libraries sampling has traditionally been used ...
- for catalogue evaluation
- in user surveys
- in performance measurement (e.g. correct shelving).
“Official” library statistics so far only allowed the full count:
“Data referring to a period should cover the specified period in question, not the interval between two successive surveys.” (ISO 2789:1991)

Sampling in library statistics
Consequence: Important activities of use have previously not been reported in most countries.
Datasets per category (ISO 2789:1991):
  Category                 # datasets
  Libraries                35
  Collections              41
  Library use (lending)    19
  Library use (other)      4
  Expenditure              7
  Library staff            4
The full count of some measures would be ...
- too time consuming (costly),
- practically impossible,
- too monotonous.
The revised International Standard ISO 2789:2001 now recommends sampling methods for ...
- information requests
- in-house use
- visits (gate count).

Sampling
Sampling is selecting a subset of the population in question. A sample can be drawn randomly or not.
The “accuracy” of random samples can be measured in terms of error and confidence level. It depends on the sample size and the variance of the sample.
Sampling can be “selective” as regards ...
- time (reporting period)
- location (branch, service point)
- objects (media)
- persons (satisfaction, user behaviour)
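The relation between error, confidence level, sample size and variance can be sketched in a few lines. This is a minimal illustration under a normal approximation and without a finite population correction; the function names and the hourly counts are hypothetical and do not come from the presentation.

```python
from math import ceil, sqrt
from statistics import NormalDist, stdev


def margin_of_error(sample, confidence=0.95):
    """Half-width of the confidence interval for the sample mean
    (normal approximation, no finite population correction)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * stdev(sample) / sqrt(len(sample))


def required_sample_size(std_dev, max_error, confidence=0.95):
    """Smallest sample size whose margin of error does not exceed max_error,
    given an assumed standard deviation of the counts."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil((z * std_dev / max_error) ** 2)


# Hypothetical counts of information requests in eight sampled hours.
counts = [12, 9, 15, 11, 8, 14, 10, 13]
print(margin_of_error(counts, confidence=0.90))
print(required_sample_size(std_dev=10, max_error=2, confidence=0.95))
```

A higher confidence level or a larger variance widens the error; a larger sample narrows it.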

Selection procedure
Purposive (judgement) sampling
- highly dependent on staff experience
- requires minimum statistical knowledge
ISO/FDIS 2789:2001: “The annual total is to be established from a sample count. The sample should be taken in one or more normal weeks and grossed up.”
NISO Z39.7-2002 Draft Standard for Trial Use, Data Dictionary Version 2002a: “A “typical week” is a time that is neither unusually busy nor unusually slow. Avoid holidays, vacation periods, days when unusual events are taking place in the community or in the library. Choose a week in which the library is open its regular hours.”

“Typical week”
Visits per weekday (Münster UL):
  mon     tue     wed     thu     fri     sat
  100%    99.5%   96.8%   91.8%   85.1%   24.8%
Administration: weekwise count is easier to organize.
Cluster: weeks comprise days of different activity level.

“Typical week”
[Figure: % deviation of visits from annual mean (Münster UL), with the periods of average activity as estimated by reference staff]
“Typical” weeks can hardly be anticipated even from data collected over several years.

“Typical week”
[Figure: minimum/maximum values (Münster UL) for 1998, 1999 and 2000, with the series max (all), max (staff), min (staff) and min (all)]
Maximum values shown: +22.9%, +21.7%, +16.9%, +15.3%, +15.8%, +12.4%
Minimum values shown: -11.6%, -15.1%, -20.5%, -17.4%, -17.8%, -23.2%
Data collected by purposive (judgement) sampling are a weak foundation for comparisons.

Selection method: case 1
Louisiana State University Libraries (reference statistics)
- Randomly and individually selected hours of the year (simple random sample)
- A sample size of 52 hours (of 4,103 hours of service a year) was calculated given a confidence level of 90% and an error of +/- 11.23%
- Total estimated by linear extrapolation
Hourwise count is difficult to administer.
Maxstadt, J.M. (1988): A new approach to reference statistics, C&RL (Feb. 1988), p. 85-88
Similar (daywise): Bauer, K. (2000): Gathering ARL reference data, http://info.med.yale.edu/assessment/methods.html
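A simple random sample of service hours with linear extrapolation might look as follows. This is only a sketch, not the procedure used at Louisiana State University: the function names, the seed and the numbering of service hours from 0 to 4,102 are assumptions made for illustration.

```python
import random


def draw_hour_sample(service_hours, sample_size, seed=None):
    """Draw distinct service hours at random (simple random sample).
    Hours are assumed to be numbered 0 .. service_hours - 1."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(service_hours), sample_size))


def extrapolate_total(sample_counts, service_hours):
    """Linear extrapolation: mean count per sampled hour times all service hours."""
    return sum(sample_counts) / len(sample_counts) * service_hours


# 52 of 4,103 service hours, as in case 1; the seed is arbitrary.
hours = draw_hour_sample(4103, 52, seed=1)
print(hours[:5])

# Hypothetical result: 10 requests counted in every sampled hour.
print(extrapolate_total([10] * 52, 4103))
```

The drawn hour numbers still have to be mapped onto the opening schedule, which is where the administrative difficulty of an hourwise count arises.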

Selection method: case 2
New York University / Bobst Library (reference statistics)
- Based on reference data of previous year, weeks were “classified” in high, medium and low usage (stratified random sample).
- Sample size of 15 weeks was calculated given a confidence level of 95% and error of +/- 400 [10%].
- Linear extrapolation of weighted class means.
Additional information (past data) is used to improve the sample.
Separation of high and medium weeks difficult.
Kesselman, M.; Watstein, S.B.: The measurement of reference and information services, JAL (1987, 1), p. 24-30
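The extrapolation of weighted class means can be illustrated as below. The class sizes and weekly counts are invented for the example and are not the Bobst Library data; the function name is hypothetical.

```python
def stratified_total(strata):
    """Estimate the annual total from a stratified sample.

    strata maps a class name to (weeks in that class, sampled weekly counts).
    Each class mean is weighted by the number of weeks in the class.
    """
    total = 0.0
    for weeks_in_class, sampled_counts in strata.values():
        class_mean = sum(sampled_counts) / len(sampled_counts)
        total += weeks_in_class * class_mean
    return total


# Hypothetical classification of 52 weeks based on the previous year's data.
strata = {
    "high": (12, [900, 1000, 1100]),
    "medium": (25, [600, 700]),
    "low": (15, [300, 200, 250]),
}
print(stratified_total(strata))  # 12*1000 + 25*650 + 15*250 = 32000.0
```

The estimate is only as good as the classification: a week assigned to the wrong class shifts the weighted mean of both classes involved.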

Selection method: case 3
University of South Carolina / Thomas Cooper Library (reference statistics)
- Found extremely high correlation (.957) between reference activity and gate count.
- Extrapolation relative to boundary distribution (gate count)
Deals with “missing” days.
Allows small random sample of a few weeks once high correlation is confirmed.
Lochstet, G.; Lehman, D.H.: A correlation method for collecting reference statistics, C&RL (Jan. 1999), p. 45-53
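One way to read "extrapolation relative to the gate count" is a ratio estimate: the ratio of reference transactions to gate count observed in the sample is applied to the full-year gate count. The sketch below assumes that reading; the function names and all figures are hypothetical and are not taken from Lochstet and Lehman.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def ratio_estimate(sample_reference, sample_gate, annual_gate):
    """Scale the automatically recorded annual gate count by the
    reference-per-visit ratio observed on the sampled days."""
    return sum(sample_reference) / sum(sample_gate) * annual_gate


# Hypothetical daily figures for five sampled days.
reference = [40, 55, 38, 61, 47]
gate = [800, 1100, 750, 1250, 950]

print(pearson_r(reference, gate))
print(ratio_estimate(reference, gate, annual_gate=250_000))
```

Because the gate count is recorded for every day, a day without a manual reference count does not leave a hole in the estimate.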

Selection method: case 4
Münster University Library
Which data from the library system can be used as boundary distribution?
                  visits   reference  reserv.  reserv.  account  renewals  short
                                      inside   remote   info               loans
  visits          1.000
  reference       .876**   1.000
  reserv. inside  .802**   .751**     1.000
  reserv. remote  .437**   .347       .269**   1.000
  account info    .800**   .765**     .796**   .220**   1.000
  renewals        .523**   .512*      .568**   .156**   .759**   1.000
  short loans     .473**   .383       .558**   .117     .312**   .140*     1.000
  normal loans    .506**   .057       .656**   -.019    .508**   .283**    .483**
In branch libraries the same datasets are collected. These can be used to extrapolate the sample count for visits and information requests.
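The question of which system dataset to use can be approached by picking the one that correlates most strongly with the measure to be estimated. The sketch below uses the correlations with visits and reference from the Münster matrix; choosing purely by the highest coefficient is a simplifying assumption, and the function name is hypothetical.

```python
# Correlations of library-system datasets with the two sampled measures,
# copied from the case 4 correlation matrix (Münster University Library).
correlations = {
    "reserv. inside": {"visits": 0.802, "reference": 0.751},
    "reserv. remote": {"visits": 0.437, "reference": 0.347},
    "account info": {"visits": 0.800, "reference": 0.765},
    "renewals": {"visits": 0.523, "reference": 0.512},
    "short loans": {"visits": 0.473, "reference": 0.383},
    "normal loans": {"visits": 0.506, "reference": 0.057},
}


def best_proxy(target):
    """Return the system dataset with the highest correlation to the target."""
    return max(correlations, key=lambda name: correlations[name][target])


print(best_proxy("visits"))     # reserv. inside
print(best_proxy("reference"))  # account info
```

Both candidates are activities carried out inside the library, while remote reservations and normal loans correlate weakly with reference activity.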

Sampling locations
Does reference activity in different branches correlate significantly?
Münster University Library
              main     reading  Branch A
  main        1.000
  reading     .437*    1.000
  Branch A    .593*    .559*    1.000
University of the FAF / University Library, Hamburg
- 4 branch libraries (3 interconnected) with separate service points and entrances in one building
Over the first half of 2002 no relationship between branches was found:
              Branch 1  Branch 2  Branch 3  Branch 4
  Branch 1    1.000
  Branch 2    -.064     1.000
  Branch 3    .022      -.041     1.000
  Branch 4    .122      .058      .031      1.000

Conclusions
- From the point of view of data collection management it seems useful to choose a week as sampling unit.
- “Normal” weeks can hardly be anticipated even from data collected over several years.
- It is, however, likely that certain usage data show significant correlation and provide useful information for estimating totals.
- If data from automated systems are used for correlation, the workload of sampling can be reduced.
- In-library use activities correlate with in-library use of automated systems. Significant remote use should be correlated separately (e.g. frequent e-mail reference).
- Sampling locations might reduce the workload of data collection further. Results, however, are ambivalent.
