
Do It Yourself Primo Statistics: The Art of the (Relatively) Painless Extraction



Presentation Transcript


  1. Do It Yourself Primo Statistics: The Art of the (Relatively) Painless Extraction • Anne L. Highsmith, Director, Consortia Systems, Texas A&M University • hismith@tamu.edu • http://library.tamu.edu/directory/hismith

  2. Our Environment

  3. Our Primo Environment • Texas A&M University is a hosted, Direct customer, in production since June 2012. • As a hosted customer, we have a staging system as well as production. All program development for these extracts has been done on the production system. • We are currently on release 4.4.1.

  4. Our Reporting Environment • Report server with an Oracle database • Oracle is separately licensed, so we can do development on it • Contains SFX/MetaLib extracts and statistics and a full copy of the Voyager database, rebuilt nightly from backup

  5. Viewing the Views

  6. How to see what’s available • Log in as primo user • Execute: s+ RPT00 • Execute: SELECT VIEW_NAME FROM ALL_VIEWS WHERE OWNER LIKE '%RPT00' • CLICK_EVENTS • SEARCH_STATISTICS • SEARCH_STRINGS • To see view definition, execute: SELECT TEXT FROM ALL_VIEWS WHERE VIEW_NAME = 'CLICK_EVENTS'

  7. SELECT ID,
            SUMMARY_TIMESTAMP EVENT_DATE,
            CLICK_TYPE EVENT_TYPE,
            CASE WHEN CLICK_VALUE='N/A' THEN '' ELSE CLICK_VALUE END CLICK_VALUE,
            CLICK_COUNT,
            SOURCE_VIEW,
            SOURCE_INSTITUTION,
            SOURCE_ON_CAMPUS,
            SOURCE_USER_GROUP
     FROM P41_PRM00.S_CLICK_SUMMARIES
     WHERE CLICK_TYPE NOT IN ('File System', 'DB Listener', 'Load', 'Indexes',
                              'Table Space', 'Search Problem', 'IO Wait', 'Memory')

  8. View Definitions • All stats views seem to be based on the S_SEARCH_SUMMARIES & S_CLICK_SUMMARIES tables • Notice that CLICK_EVENTS excludes some system-type stats • SEARCH_STATISTICS is a subset of S_SEARCH_SUMMARIES, where SUMMARY_TYPE = 'SEARCH_COUNT' • SEARCH_STRINGS is a subset of S_SEARCH_SUMMARIES, where SUMMARY_TYPE = 'TOP_SEARCHES_SUMMARY'
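
The "subset of a summary table" pattern described on this slide can be tried out without touching Primo. The sketch below is an illustration only: it uses an in-memory SQLite database as a stand-in for Oracle, and every column other than SUMMARY_TYPE, as well as the 'OTHER' summary type and the sample rows, is hypothetical rather than taken from the real Primo schema.

```python
import sqlite3

# In-memory SQLite stand-in for the Oracle schema.
# Only summary_type comes from the slides; the other columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE s_search_summaries (
        id INTEGER PRIMARY KEY,
        summary_type TEXT,
        search_string TEXT,
        search_count INTEGER
    );
    CREATE VIEW search_statistics AS
        SELECT * FROM s_search_summaries WHERE summary_type = 'SEARCH_COUNT';
    CREATE VIEW search_strings AS
        SELECT * FROM s_search_summaries WHERE summary_type = 'TOP_SEARCHES_SUMMARY';
""")

# Made-up sample rows, including one summary type that neither view selects.
conn.executemany(
    "INSERT INTO s_search_summaries (summary_type, search_string, search_count) "
    "VALUES (?, ?, ?)",
    [
        ("SEARCH_COUNT", None, 120),
        ("TOP_SEARCHES_SUMMARY", "fluid mechanics", 5),
        ("OTHER", None, 1),
    ],
)


def count_rows(name):
    # name is one of the fixed identifiers below, never user input
    return conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]


view_counts = {
    name: count_rows(name)
    for name in ("s_search_summaries", "search_statistics", "search_strings")
}
print(view_counts)
```

Each view returns only the rows whose SUMMARY_TYPE matches its filter, so rows of any other type are visible in the base table but in neither view.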

  9. Pop quiz #1 • In 1745, settlers from the English colonies, assisted by the British fleet, invaded and captured the capital of one of the provinces of New France. Which one? • (Will accept the name of the French province, the modern Canadian province of which it is a part, the fortress, or the place name.)

  10. Data Anomalies

  11. SQL vs. BIRT Reports • Replicate BIRT report for Click Events

  12. SQL Selection Criteria Issues • Some tables contain “junk” • Out of 10M rows in the CLICK_EVENTS view, 36% had no institution name • Myriad variations in INSTITUTION_NAME

  13. Basic Selection Criteria
      SELECT event_type, click_value, click_count, institution,
             \"VIEW\" AS view_name, on_campus, user_group
      FROM p41_rpt00.click_events
      WHERE to_char(event_date,'YYYYMM') = '$previous_month'
        AND institution is not null
        AND lower(institution) not like 'primo%'
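
To make the effect of the three WHERE conditions concrete, here is a minimal Python sketch that applies the same tests to rows already in memory. It is not the extract program itself; the function name `keep_row`, the dictionary keys, and the sample rows are all assumptions made for illustration.

```python
from datetime import date


def keep_row(row, month):
    """Apply the same three tests as the WHERE clause to one row (a dict)."""
    institution = row["institution"]
    # Oracle treats an empty string as NULL, so both count as "no institution".
    if institution is None or institution == "":
        return False
    # lower(institution) NOT LIKE 'primo%'
    if institution.lower().startswith("primo"):
        return False
    # to_char(event_date, 'YYYYMM') = month
    return row["event_date"].strftime("%Y%m") == month


# Made-up sample rows
sample_rows = [
    {"event_date": date(2014, 2, 3), "institution": "TAMU"},
    {"event_date": date(2014, 2, 9), "institution": None},
    {"event_date": date(2014, 2, 9), "institution": "Primo Central"},
    {"event_date": date(2014, 3, 1), "institution": "TAMU"},
]

kept = [row for row in sample_rows if keep_row(row, "201402")]
print(len(kept))
```

Of the four sample rows, only the first passes: the second has no institution, the third starts with "primo", and the fourth falls in a different month.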

  14. Scope Names • Hoped that SCOPE_NAME would be equivalent to the Search Scope Name as it appears on the Search Scope List in the Primo Back Office. • Current default SCOPE_NAME appears as: • scope:("MSL"),scope:(libguides),scope:(archon),scope:(AMDB_VOYAGER),scope:(TAMU-SFX ),scope:(EVANS),scope:(tamu_dspace_qdc),primo_central_multiple_fe • Collected all known SCOPE_NAME values in a Perl module, TAMU_Primo.pm
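
One way to work with a compound SCOPE_NAME like the one above is to split it into its individual scope codes before comparing them with a list of known values. The sketch below is an assumption about how that could be done, not a description of TAMU_Primo.pm: stripping the quotes and stray spaces, and treating the trailing unwrapped token as a scope in its own right, are illustrative choices.

```python
import re

# Matches scope:(NAME), scope:("NAME") and scope:(NAME ) and captures NAME.
SCOPE_PATTERN = re.compile(r'scope:\(\s*"?([^")]*?)\s*"?\s*\)')


def split_scope_name(scope_name):
    """Return the individual scope codes contained in a SCOPE_NAME value."""
    scopes = SCOPE_PATTERN.findall(scope_name)
    # Whatever is left after removing the scope:(...) groups, for example
    # primo_central_multiple_fe, is kept as an extra entry.
    remainder = SCOPE_PATTERN.sub("", scope_name)
    extras = [part.strip() for part in remainder.split(",") if part.strip()]
    return scopes + extras


default_scope = (
    'scope:("MSL"),scope:(libguides),scope:(archon),scope:(AMDB_VOYAGER),'
    'scope:(TAMU-SFX ),scope:(EVANS),scope:(tamu_dspace_qdc),primo_central_multiple_fe'
)
scope_list = split_scope_name(default_scope)
print(scope_list)
```

Splitting first makes it possible to match each piece against a known-values list even when the same scopes appear in a different order or combination.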

  15. Scope Types • SEARCH_STATISTICS and SEARCH_STRINGS views contain an element called SCOPE_TYPE • SCOPE_TYPE in SEARCH_STRINGS should be limited to LOCAL/REMOTE • SCOPE_TYPE in SEARCH_STATISTICS should be limited to LOCAL/REMOTE/DS

  16. Scope Types (Continued) • SEARCH_STATISTICS – 16% of SCOPE_TYPE values are something other than LOCAL/REMOTE • SEARCH_STRINGS – 12% of SCOPE_TYPE values are something other than LOCAL/REMOTE/DS • If the retrieved value didn’t match the list of defined values, I set it to null.
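
The "set it to null" rule can be sketched in a few lines. This is a hedged illustration rather than the actual extract code: the allowed values per view are taken from the "Scope Types" slide, while the function name and the exact, case-sensitive comparison are assumptions.

```python
# Allowed values per view, as listed on the "Scope Types" slide.
ALLOWED_SCOPE_TYPES = {
    "SEARCH_STRINGS": {"LOCAL", "REMOTE"},
    "SEARCH_STATISTICS": {"LOCAL", "REMOTE", "DS"},
}


def clean_scope_type(view, value):
    """Return value if it is defined for the view, otherwise None (null)."""
    # Assumption: the comparison is exact and case-sensitive.
    return value if value in ALLOWED_SCOPE_TYPES[view] else None


print(clean_scope_type("SEARCH_STATISTICS", "DS"))
print(clean_scope_type("SEARCH_STRINGS", "DS"))
```

The first call keeps the value and the second returns None, because DS is only in the list for SEARCH_STATISTICS.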

  17. Data I Can’t Make Sense of • SEARCH_STRINGS has only 149,127 rows in the view • Are these unique strings? • If yes, why does the same string appear in different rows? • What do the numbers, such as AVERAGE_RESULTS and SEARCH_COUNT, really mean?

  18. Example • “Fluid mechanics” appears as a search string in the default scope 5 times in the period 1/18/2014-3/5/2014. • AVERAGE_RESULTS by date:
      Date         AVERAGE_RESULTS
      18-Jan-14    210677
      31-Mar-14    150528
      27-Feb-14    58544
      5-Mar-14     58576
      5-Mar-14     74119

  19. Pop quiz #2 • Which city in western Canada was the birthplace/hometown of the following personalities: • Deanna Durbin, actress • Anna Paquin, actress • Doug Henning, magician and entertainer • Sir William Stephenson, AKA Intrepid, spy • Guy Gavriel Kay, novelist and poet • Brett Hull, professional hockey player • Marshall McLuhan, media guru • Fred Turner, musician, Bachman-Turner Overdrive • Monty Hall, host of Let’s Make a Deal

  20. Perl Extract Programs

  21. Generalities • The extract and processing programs for the TAMU report server are written in Perl; the front end is written in PHP • The Primo stats extract programs I have written live on the production Primo server; they sftp output to the report server • The Perl programs use a local symlink from /exlibris/product/perl-5.8.9/bin/perl to /exlibris/primo/scripts/perl

  22. Generalities (Continued) • The Primo group consists of 5 Perl programs and 1 module • click_extract.pl, click_compile.pl, facets.pl, search_statistics.pl, search_strings.pl, TAMU_Primo.pm • click_extract.pl extracts data from the CLICK_EVENTS view and stores it in output files, which are mined by click_compile.pl & facets.pl to create useful output. • search_statistics.pl & search_strings.pl extract data from their corresponding views to an output file

  23. Generalities (Continued) • The programs are designed to run monthly from a cron job and cumulate the previous month’s data, but they can also be run from the command line with parameters that select an earlier month. • The programs that create output files also include a step to sftp the output to a different server; you have to set up sftp between the servers yourself.
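
The "previous month by default, another month on request" behavior might look like the following. This is a Python sketch of the idea only; the real programs are Perl, and the `--month` option name and the YYYYMM format of its argument are assumptions (the format is chosen to match the to_char mask used in the selection criteria).

```python
import argparse
import re
from datetime import date


def previous_month(today):
    """Return the month before `today` as a YYYYMM string."""
    if today.month == 1:
        return f"{today.year - 1:04d}12"
    return f"{today.year:04d}{today.month - 1:02d}"


def select_month(argv, today):
    """Use --month if given (hypothetical option), else the previous month."""
    parser = argparse.ArgumentParser(description="Monthly statistics extract")
    parser.add_argument("--month", help="month to extract, as YYYYMM")
    args = parser.parse_args(argv)
    if args.month is None:
        return previous_month(today)
    if not re.fullmatch(r"\d{4}(0[1-9]|1[0-2])", args.month):
        parser.error("--month must look like YYYYMM, for example 201402")
    return args.month


# Run from cron with no arguments: the previous month is used.
print(select_month([], date(2014, 1, 15)))
# Run by hand for a chosen month.
print(select_month(["--month", "201402"], date(2014, 4, 1)))
```

Passing `today` in explicitly keeps the January rollover (back to December of the prior year) easy to check.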

  24. A Few Specifics • facets.pl creates 2 sets of output files: one that cumulates all facet requests and a second that provides detail about certain facet types • For a domain, language, library, resource type, or top-level facet, it cumulates the individual values under each of those types, so you would know how many times the facet for English language or for the Thesis resource type was applied.
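
The two levels of cumulation described here can be sketched as follows. The input shape (facet type, facet value, click count) is hypothetical, since the slides do not show how facet clicks are encoded in the extract files, and the type labels are taken loosely from the slide wording.

```python
from collections import Counter, defaultdict

# Facet types that get a per-value breakdown (labels assumed from the slide).
DETAIL_TYPES = {"domain", "language", "library", "resource type", "top-level"}


def cumulate_facets(rows):
    """rows: iterable of (facet_type, facet_value, click_count) tuples."""
    totals = Counter()            # first output: all requests per facet type
    detail = defaultdict(Counter)  # second output: per-value counts
    for facet_type, facet_value, click_count in rows:
        totals[facet_type] += click_count
        if facet_type in DETAIL_TYPES:
            detail[facet_type][facet_value] += click_count
    return dict(totals), {ftype: dict(values) for ftype, values in detail.items()}


# Made-up sample rows
rows = [
    ("language", "English", 3),
    ("language", "English", 2),
    ("resource type", "Thesis", 4),
    ("creator", "Smith", 1),
]
totals, detail = cumulate_facets(rows)
print(totals)
print(detail)
```

In this sample the "creator" facet is counted in the overall totals but gets no per-value breakdown, because it is not one of the detail types.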

  25. Normalization • Contained in TAMU_Primo.pm • Defines variations in the institution value (code versus spelled-out name) and normalizes them all to the codes • Defines a list of valid view names • Normalizes the user groups • Defines a long list of valid scope_names • search_statistics.pl collects undefined scope_names and emails the list to a designated email account so that the list can be updated
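
A lookup-table approach to this kind of normalization could look like the sketch below. It is written in Python for illustration and does not reproduce TAMU_Primo.pm: the institution variants, the short list of known scopes, and the function names are all assumptions, and the emailing step is left out.

```python
# Hypothetical variants mapped to a single institution code.
INSTITUTION_CODES = {
    "tamu": "TAMU",
    "texas a&m university": "TAMU",
}

# A short, illustrative subset of known scope names.
KNOWN_SCOPES = {"MSL", "libguides", "archon", "EVANS"}


def normalize_institution(raw):
    """Map a code or spelled-out name to its code; None if unrecognized."""
    if raw is None:
        return None
    key = " ".join(raw.split()).lower()  # collapse whitespace, ignore case
    return INSTITUTION_CODES.get(key)


def find_undefined_scopes(scope_names, known=KNOWN_SCOPES):
    """Return the scope names not yet in the known list, sorted and de-duplicated."""
    return sorted({name for name in scope_names if name not in known})


print(normalize_institution("  Texas  A&M University "))
print(find_undefined_scopes(["EVANS", "new_scope", "MSL", "new_scope"]))
```

The list returned by `find_undefined_scopes` is the kind of report that could be sent to whoever maintains the known-values list.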

  26. Pop quiz #3 • In 1993, the National Hockey League changed its conference and division names to boring stuff like “Eastern Conference” and “Pacific Division”. Before that, the historic conference and division names were based on people who had something to do with hockey and the history of the NHL. Give one of those old conference or division names.
