
Workshop on Computer-assisted Indexing

Workshop on Computer-assisted Indexing. Alexander Nevyjel. 34th Consultative Meeting of INIS Liaison Officers, 2-5 November 2008, Vienna, Austria. Agenda: review of CAI procedures (workflow, formats, conventions); thesaurus extension (hidden terms tables); problems and how to overcome them.


Presentation Transcript


  1. Workshop on Computer-assisted Indexing. Alexander Nevyjel. 34th Consultative Meeting of INIS Liaison Officers, 2-5 November 2008, Vienna, Austria. 34th ILO Meeting

  2. Agenda
     • Review CAI procedures (workflow, formats, conventions)
     • Thesaurus extension: hidden terms tables
     • Problems and how to overcome them
     • Discussion and exchange of experiences
     • Hands-on training by INIS Subject Specialists (in their offices, open-ended this afternoon): tips, tricks, recommendations

  3. Objectives of Computer-assisted Indexing
     • Maintaining database quality
     • Saving subject-analysis manpower
     • Improving indexing consistency

  4. CAI Workflow (diagram; labels: Batch Mode, Interactive CAI Processing, Conventional Processing)

  5. CAI Batch and Online Processing
     • Input: MemSt-CC-yymmdd-xxxxxxxxxxx
     • Output: _MemSt-CC-yymmdd-xxxxxxxxxxx
       • MemSt is a standard prefix (meaning "member state")
       • CC is the country code
       • yymmdd is the date when the file was generated
       • xxxxxxxxxxx is any additional identification
     • Examples
       • MemSt-AR-041203-thisismytestfile
       • MemSt-FR-041212-fileidentification
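The naming convention above can be checked mechanically. The following is a minimal sketch, not part of the CAI system: the function names are hypothetical, and the two-letter upper-case country code and the free-form trailing identifier are assumptions read off the two examples on the slide.

```python
import re
from datetime import datetime

# Pattern inferred from the slide. A leading "_" marks a CAI output file.
# Assumptions: country code is two upper-case letters, identifier is free-form.
FILENAME_RE = re.compile(
    r"^(?P<output>_?)MemSt-(?P<country>[A-Z]{2})-(?P<date>\d{6})-(?P<ident>.+)$"
)


def parse_cai_filename(name):
    """Split a CAI file name into its parts, or raise ValueError."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a CAI file name: {name!r}")
    # yymmdd; strptime raises ValueError for impossible dates such as 041332.
    generated = datetime.strptime(match.group("date"), "%y%m%d").date()
    return {
        "is_output": match.group("output") == "_",
        "country": match.group("country"),
        "generated": generated,
        "ident": match.group("ident"),
    }


def output_name(input_name):
    """Name of the CAI output file that corresponds to an input file."""
    parts = parse_cai_filename(input_name)
    if parts["is_output"]:
        raise ValueError("already an output file name")
    return "_" + input_name
```

For example, `parse_cai_filename("MemSt-AR-041203-thisismytestfile")` yields country `AR` and the generation date 3 December 2004.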

  6. CAI Batch Processing
     • Output: _MemSt-CC-yymmdd-xxxxxxxxxxx
     • These files carry the CAI-suggested descriptors in tag 800, preceded by the string ##CAI suggestions##;
     • Example: 800^##CAI suggestions##; DESCRIPTOR1; DESCRIPTOR2; DESCRIPTOR3; …
     • The files are sent back to the member state for review
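A reviewer's script could separate the marker from the descriptors in such a tag 800 line. This is a sketch under the assumption that the line looks exactly like the example on the slide (tag, `^`, then `;`-separated values); the helper names are hypothetical and the real record format may carry more structure.

```python
MARKER = "##CAI suggestions##"


def split_tag_800(line):
    """Return (has_marker, descriptors) for a line of the form '800^...'."""
    tag, sep, value = line.partition("^")
    if tag.strip() != "800" or not sep:
        raise ValueError("not a tag 800 line")
    # Split on ';' and drop empty pieces such as a trailing separator.
    parts = [piece.strip() for piece in value.split(";")]
    parts = [piece for piece in parts if piece]
    has_marker = bool(parts) and parts[0] == MARKER
    if has_marker:
        parts = parts[1:]
    return has_marker, parts


def build_tag_800(descriptors):
    """Rebuild the line without the marker, as it would be after review."""
    return "800^" + "; ".join(descriptors)
```

`split_tag_800` followed by `build_tag_800` gives the reviewed line with the marker removed.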

  7. CAI Online
     • The file is loaded into CAI Online
     • All files of a Member State appear on the queue page as batch MemSt-XX
     • Please open only your own batch; do not touch other queues
     • Files in a queue are opened one after the other, in the sequence in which they were loaded

  8. CAI Batch Processing: Reviewing Process
     • Delete all suggested descriptors which are too general
     • Add relevant descriptors which were not found
       • numerical values, e.g. pressure ranges, temperature ranges, etc.
       • nuclear reactions
       • chemical compounds, alloys, etc.
     • CAI cleans up BT/NTs (broader/narrower terms) → clean up the BT/NTs that result from manual additions
     • Clean up suggestions from homographic terms
     • Delete "##CAI suggestions##"
     • Submit the file to the "INIS Input Box"
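The BT/NT clean-up after manual additions can be illustrated with a small sketch. It assumes that "cleaning up" means dropping a descriptor when one of its narrower terms is also assigned; the `BROADER` table is toy data for illustration and is not taken from the INIS Thesaurus.

```python
# Toy broader-term (BT) relations; illustrative only, not real thesaurus data.
BROADER = {
    "ALPHA DECAY": {"NUCLEAR DECAY"},
    "BETA DECAY": {"NUCLEAR DECAY"},
    "NUCLEAR DECAY": {"DECAY"},
}


def ancestors(term, broader=BROADER):
    """All direct and indirect broader terms of `term`."""
    seen = set()
    stack = [term]
    while stack:
        for parent in broader.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def drop_broader_terms(descriptors, broader=BROADER):
    """Remove descriptors that are broader terms of another assigned one."""
    redundant = set()
    for term in descriptors:
        redundant |= ancestors(term, broader)
    # Keep the original order of the remaining descriptors.
    return [term for term in descriptors if term not in redundant]
```

With this toy data, adding ALPHA DECAY by hand to a record that already carries NUCLEAR DECAY and DECAY leaves only ALPHA DECAY after the clean-up.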

  9. CAI Online: Reviewing Process
     • Delete all suggested descriptors which are too general
     • Add relevant descriptors which were not found
       • numerical values, e.g. pressure ranges, temperature ranges, etc.
       • nuclear reactions
       • chemical compounds, alloys, etc.
     • CAI cleans up BT/NTs (broader/narrower terms) → it gives warnings for BT/NTs that result from manual additions
     • Clean up suggestions from homographic terms
     • Export the file when finished
     • The file is exported to the INIS Production System (or sent back to the Member State for review if requested)

  10. CAI Thesaurus Extension
      "Hidden terms" are character patterns representing the different appearances in free text of a concept that is indexed by one or more descriptors.
      • handled similarly to "forbidden terms", with one or more USE relations
      • CAI internal only
        • not exported to the INIS production system
        • not exported to FIBRE
        • not printed in any appearance of the thesaurus
      • support the identification of descriptors in free text
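The role of hidden terms can be sketched as a lookup from text patterns to descriptors. The three entries below are taken from later slides; the matching strategy (case-insensitive, whole-word) and the function name are assumptions for illustration, not a description of how CAI matches text.

```python
import re

# Hidden term -> USE descriptor(s). Only the descriptors are ever output.
HIDDEN_TERMS = {
    "behaviour": ["BEHAVIOR"],
    "ivory coast": ["COTE D'IVOIRE"],
    "ckm matrix": ["KOBAYASHI-MASKAWA MATRIX"],
}


def suggest_descriptors(text, hidden_terms=HIDDEN_TERMS):
    """Return descriptors whose hidden terms occur as whole words in text."""
    lowered = text.lower()
    found = []
    for pattern, descriptors in hidden_terms.items():
        # (?<!\w) and (?!\w) keep the pattern from matching inside a word.
        if re.search(r"(?<!\w)" + re.escape(pattern) + r"(?!\w)", lowered):
            for descriptor in descriptors:
                if descriptor not in found:
                    found.append(descriptor)
    return found
```

Because a hidden term maps to a list, one pattern can carry several USE relations, as the slide describes.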

  11. Hidden Terms: Compounds

      | Descriptor | Hidden term | Free text |
      |---|---|---|
      | MAGNESIUM BORIDES | MgB_2 | MgB2 |
      | MAGNESIUM CARBONATES | MgCO_3 | MgCO3 |
      | MAGNESIUM HYDRIDES | MgH_2 | MgH2 |
      | MAGNESIUM HYDROXIDES | Mg(OH)_2 | Mg(OH)2 |
      | IRON BROMIDES | iron dibromide | |
      | IRON BROMIDES | iron tribromide | |
      | ARSENIC IONS | As"3"- | As3- |
      | ACETYLENE | C_2H_2 | C2H2 |
      | ACETALDEHYDE | C_2H_4O | C2H4O |
      | ACETIC ACID | C_2H_4O_2 | C2H4O2 |

      approx. 2,000 hidden terms (3,000 expected)

  12. Hidden Terms: Isotopes

      | Descriptor | Hidden term | Free text |
      |---|---|---|
      | CESIUM 137 | | Cesium 137, Cesium-137 |
      | | "1"3"7cs | 137Cs |
      | | 137 caesium | 137 Caesium, 137-Caesium |
      | | caesium 137 | Caesium 137, Caesium-137 |
      | | 137 cesium | 137 Cesium, 137-Cesium |
      | | 137 cs | 137 Cs, 137-Cs |
      | | 137cs | 137Cs |
      | | cs 137 | Cs 137, Cs-137 |
      | | cs"1"3"7 | Cs137 |
      | | cs137 | Cs137 |
      | CESIUM 138 | "1"3"8"mcs | 138mCs |
      | | cs"1"3"8"m | Cs138m |

      approx. 26,000 hidden terms
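The isotope table follows a regular pattern, which suggests that such hidden terms can be generated rather than typed. The sketch below reproduces the variants listed for CESIUM 137 and CESIUM 138; the function names are hypothetical, and whether the actual table was built this way is not stated on the slide.

```python
def superscript_markup(mass):
    """Prefix every character with '"', e.g. '137' -> '"1"3"7'."""
    return "".join('"' + ch for ch in mass)


def isotope_hidden_terms(names, symbol, mass):
    """Generate lower-case spelling variants for one isotope.

    names  -- element names, e.g. ["cesium", "caesium"]
    symbol -- element symbol, e.g. "Cs"
    mass   -- mass number as text, optionally with isomer flag, e.g. "138m"
    """
    symbol = symbol.lower()
    marked = superscript_markup(mass)
    variants = [
        marked + symbol,       # "1"3"7cs
        symbol + marked,       # cs"1"3"7
        mass + symbol,         # 137cs
        symbol + mass,         # cs137
        mass + " " + symbol,   # 137 cs
        symbol + " " + mass,   # cs 137
    ]
    for name in names:
        name = name.lower()
        variants.append(mass + " " + name)   # 137 caesium
        variants.append(name + " " + mass)   # caesium 137
    return sorted(set(variants))
```

The result for cesium also contains `cesium 137`, which is the descriptor itself and would not need a hidden-term entry.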

  13. Hidden Terms: Elementary Particles

      | Descriptor | Hidden term | Free text |
      |---|---|---|
      | B QUARKS | bottom quarks | |
      | T QUARKS | top quarks | |
      | ELECTRON NEUTRINOS | #nu#_e | νe |
      | MUON NEUTRINOS | #nu#_#mu# | νμ |
      | TAU NEUTRINOS | #nu#_#tau# | ντ |
      | RHO-770 MESONS | #rho#(770) | ρ(770) |
      | RHO-770 MESONS | #rho#-770 | ρ-770 |
      | OMEGA-782 MESONS | #omega#(782) | ω(782) |
      | OMEGA-782 MESONS | #omega#-782 | ω-782 |
      | KAONS NEUTRAL | K"0 | K0 |
      | KAONS NEUTRAL SHORT-LIVED | K"0_S | K0S |
      | KAONS NEUTRAL LONG-LIVED | K"0_L | K0L |

      approx. 300 hidden terms
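The hidden-term column uses a small markup: `#name#` for a Greek letter, `_` before a subscript character and `"` before a superscript character. The sketch below converts that markup to the plain form shown in the free-text column. The rules are inferred from the rows in these tables only, the Greek-letter map is limited to names that appear on the slides, and the function name is hypothetical.

```python
import re

# Greek letter names that occur on the slides.
GREEK = {
    "nu": "ν", "mu": "μ", "tau": "τ",
    "rho": "ρ", "omega": "ω", "gamma": "γ",
}


def markup_to_plain(term):
    """Render hidden-term markup as plain text, e.g. '#nu#_e' -> 'νe'."""
    def greek(match):
        # Unknown names are left untouched rather than guessed.
        return GREEK.get(match.group(1), match.group(0))

    text = re.sub(r"#([a-z]+)#", greek, term)
    # Drop the subscript and superscript markers, keep the characters.
    return text.replace("_", "").replace('"', "")
```

Applied to the compound and isotope examples, the same function turns `MgB_2` into `MgB2` and `As"3"-` into `As3-`.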

  14. Hidden Terms: UK/US Spellings

      | Descriptor | Hidden term |
      |---|---|
      | A CENTERS | a centres |
      | ACTIVITY METERS | activity metres |
      | ANALOG COMPUTERS | analogue computers |
      | ANALOG SYSTEMS | analogue systems |
      | ANESTHESIA | anaesthesia |
      | ARCHAEOLOGY | archeology |
      | AUSTRIAN ORGANIZATIONS | austrian organisations |
      | BALLISTIC MISSILE DEFENSE | ballistic missile defence |
      | BAYARD-ALPERT GAGES | bayard-alpert gauges |
      | BEAM ANALYZERS | beam analysers |
      | BEHAVIOR | behaviour |
      | CATALOGS | catalogues |

      approx. 800 hidden terms

  15. Hidden Terms: Diacritics and Countries

      | Category | Descriptor | Hidden term |
      |---|---|---|
      | Diacritics | BAECKLUND TRANSFORMATION | backlund transformation |
      | Diacritics | BRUECKNER METHOD | bruckner method |
      | Diacritics | BRUECKNER MODEL | bruckner model |
      | Diacritics | BRUNSBUETTEL REACTOR | brunsbuttel reactor |
      | Diacritics | MOESSBAUER EFFECT | mossbauer effect |
      | Country names | CAMBODIA | kampuchea |
      | Country names | COTE D'IVOIRE | ivory coast |
      | Country names | GREECE | hellas |
      | Country names | MYANMAR | burma |
      | Country names | SYRIA | syrian arab republic |
      | Country names | THAILAND | siam |

      approx. 250 hidden terms
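The diacritics rows map spellings without umlauts (for example "mossbauer") to descriptors that use the "oe/ue/ae" transliteration. One way to bring accented free text into that form is to strip combining marks before the lookup. This is a sketch using Python's standard `unicodedata` module; whether CAI folds diacritics this way is an assumption, and the function names are hypothetical.

```python
import unicodedata

# Hidden terms from the slide, keyed by their diacritic-free spelling.
HIDDEN_TERMS = {
    "backlund transformation": "BAECKLUND TRANSFORMATION",
    "brunsbuttel reactor": "BRUNSBUETTEL REACTOR",
    "mossbauer effect": "MOESSBAUER EFFECT",
}


def fold_diacritics(text):
    """Lower-case text and remove combining marks (accents, umlauts)."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def lookup(phrase, hidden_terms=HIDDEN_TERMS):
    """Return the descriptor for a phrase, or None if it is not listed."""
    return hidden_terms.get(fold_diacritics(phrase))
```

With this folding, both "Mössbauer effect" and "Mossbauer effect" resolve to MOESSBAUER EFFECT.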

  16. Hidden Terms: Other Spellings

      | Category | Descriptor | Hidden term |
      |---|---|---|
      | Singular/plural | FUNGI | fungus |
      | Singular/plural | FUNGI | funguses |
      | Singular/plural | G MATRIX | g matrices |
      | Singular/plural | G MATRIX | g matrixes |
      | Reverse sequence | ATOM-MOLECULE COLLISIONS | molecule-atom collisions |
      | Reverse sequence | ATOM-MOLECULE COLLISIONS | atom-molecule scattering |
      | Reverse sequence | ATOM-MOLECULE COLLISIONS | molecule-atom scattering |
      | Reverse sequence | ATOM-MOLECULE COLLISIONS | atom-molecule reactions |
      | Reverse sequence | ATOM-MOLECULE COLLISIONS | molecule-atom reactions |
      | Reverse sequence | ATOM-MOLECULE COLLISIONS | atom-molecule interactions |
      | Reverse sequence | ATOM-MOLECULE COLLISIONS | molecule-atom interactions |

      approx. 900 hidden terms
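The reverse-sequence rows are the combinations of two word orders with four near-synonymous nouns, minus the descriptor itself. A short sketch of that enumeration follows; the function name is hypothetical, and nothing on the slide says the entries were produced programmatically.

```python
from itertools import product


def reverse_sequence_variants(first, second, nouns, descriptor):
    """Combine both word orders with each noun, excluding the descriptor."""
    pairs = [f"{first}-{second}", f"{second}-{first}"]
    variants = [f"{pair} {noun}" for noun, pair in product(nouns, pairs)]
    return [v for v in variants if v.upper() != descriptor]


variants = reverse_sequence_variants(
    "atom",
    "molecule",
    ["collisions", "scattering", "reactions", "interactions"],
    "ATOM-MOLECULE COLLISIONS",
)
```

For ATOM-MOLECULE COLLISIONS this gives the seven hidden terms listed in the table.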

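  17. Hidden Terms: Other Spellings

      | Category | Descriptor | Hidden term |
      |---|---|---|
      | Grammatical variations | PERIODICITY | periodic |
      | Grammatical variations | PERIODICITY | periodical |
      | Grammatical variations | PERIODICITY | periodically |
      | Phrases versus compound terms | RADIOWAVE RADIATION | radio wave |
      | Phrases versus compound terms | SPACE-TIME | spacetime |
      | Phrases versus compound terms | WAVE FUNCTIONS | wavefunction |
      | Terminology | GAMMA SPECTROMETERS | #gamma#ray spectrometer |
      | Terminology | GAMMA SPECTROMETERS | #gamma#-ray spectrometer |
      | Terminology | GAMMA SPECTROMETERS | gammaray spectrometer |
      | Terminology | GAMMA SPECTROMETERS | gamma-ray spectrometer |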

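  18. Hidden Terms: Other Spellings

      | Category | Descriptor | Hidden term |
      |---|---|---|
      | Terminology | SU-2 GROUPS | su(2) theory |
      | Terminology | SU-2 GROUPS | su(2) symmetry |
      | Terminology | SU-3 GROUPS | su(3) theory |
      | Terminology | SU-3 GROUPS | su(3) symmetry |
      | Abbreviations | CARBON DIOXIDE LASERS | CO_2 laser |
      | Abbreviations | CARBON DIOXIDE LASERS | CO2 laser |
      | Abbreviations | KOBAYASHI-MASKAWA MATRIX | CKM matrix |
      | Abbreviations | KORTEWEG-DE VRIES EQUATION | kdv equation |
      | Numerical values | KEV RANGE | kev |
      | Numerical values | MEV RANGE | mev |
      | Numerical values | GEV RANGE | gev |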

  19. CAI Thesaurus Extension
      • Thesaurus
        • Valid descriptors: 21,147
        • Forbidden terms: 9,114
      • CAI
        • Hidden terms: 34,105
      • Total: 64,366 → Terminological Knowledge Base

  20. Terms which need special attention: numerical values, ranges
      • ENERGY RANGES
        • MEV RANGE
        • MEV RANGE 01-10
        • MEV RANGE 10-100
        • MEV RANGE 100-1000
      • PRESSURE RANGES
        • Recognize pressure ranges
        • Translate from atm, bar, torr to pascal
      • TEMPERATURE RANGES
        • Recognize temperature ranges
        • Translate from Celsius, Fahrenheit to kelvin
      • Attention: the forbidden term (since 1992) "high temperature USE TEMPERATURE RANGE 0400-1000 K" often leads to wrong results
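The unit translations named above are simple arithmetic, and picking an energy-range descriptor is a range lookup. The sketch below uses the standard conversion factors (1 atm = 101,325 Pa, 1 bar = 100,000 Pa, 1 torr = 1/760 atm). The treatment of range boundaries (lower bound inclusive, upper bound exclusive) is an assumption, since the slide does not say to which range a value such as exactly 10 MeV belongs, and the function names are hypothetical.

```python
# Pascal per unit; torr is defined as 1/760 of a standard atmosphere.
PASCAL_PER_UNIT = {
    "pa": 1.0,
    "bar": 1.0e5,
    "atm": 101325.0,
    "torr": 101325.0 / 760.0,
}


def to_pascal(value, unit):
    """Convert a pressure given in Pa, bar, atm or torr to pascal."""
    return value * PASCAL_PER_UNIT[unit.lower()]


def to_kelvin(value, unit):
    """Convert a temperature given in K, C or F to kelvin."""
    unit = unit.upper()
    if unit == "K":
        return value
    if unit == "C":
        return value + 273.15
    if unit == "F":
        return (value - 32.0) * 5.0 / 9.0 + 273.15
    raise ValueError(f"unknown temperature unit: {unit!r}")


# Descriptors from the slide; boundary handling is an assumption.
MEV_RANGES = [
    (1.0, 10.0, "MEV RANGE 01-10"),
    (10.0, 100.0, "MEV RANGE 10-100"),
    (100.0, 1000.0, "MEV RANGE 100-1000"),
]


def mev_range_descriptor(energy_mev):
    """Return the MEV RANGE descriptor covering the energy, or None."""
    for low, high, descriptor in MEV_RANGES:
        if low <= energy_mev < high:
            return descriptor
    return None
```

A text that mentions 14 MeV neutrons would then receive MEV RANGE 10-100, and a pressure of 2 bar would be compared against pressure-range descriptors as 200,000 Pa.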

  21. Terms which need special attention: multi-meaning
      • "+" and "-" signs
        • K+ → KAONS PLUS, KAONS MINUS, POTASSIUM IONS
      • Case sensitivity
        • TiN → TIN (instead of TITANIUM NITRIDES)
        • "… this can be …" → read as CaN → CALCIUM NITRIDES
        • gas → read as GaS → GALLIUM SULFIDES
        • "… who is the …" → WHO (World Health Organization)
      • Verbs versus nouns
        • "… this leads us to …" → LEAD
        • "… this leaves it …" → LEAVES
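The case-sensitivity problems above arise when chemical formulas are matched without regard to case. One possible remedy, sketched here as an assumption rather than as the method CAI uses, is to keep two tables: formulas that must match exactly, and ordinary words that match in any case. The table contents come from the slide; the function name is hypothetical.

```python
import re

# Formulas are matched exactly as written.
CASE_SENSITIVE = {
    "TiN": "TITANIUM NITRIDES",
    "CaN": "CALCIUM NITRIDES",
    "GaS": "GALLIUM SULFIDES",
}

# Ordinary words are matched regardless of case.
CASE_INSENSITIVE = {
    "tin": "TIN",
}


def match_terms(text):
    """Return descriptors for the alphabetic tokens of text, in order."""
    found = []
    for token in re.findall(r"[A-Za-z]+", text):
        if token in CASE_SENSITIVE:
            found.append(CASE_SENSITIVE[token])
        elif token.lower() in CASE_INSENSITIVE:
            found.append(CASE_INSENSITIVE[token.lower()])
    return found
```

With this split, "can" and "gas" in running text no longer trigger CALCIUM NITRIDES or GALLIUM SULFIDES, while "TiN" still yields TITANIUM NITRIDES. Sentence-initial words and all-caps titles remain ambiguous and would still need review.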

  22. Terms which need special attention: multi-meaning
      • MPA
        • MAXIMUM PERMISSIBLE ACTIVITY
        • megapascal (MPa)
      • GDP
        • GROSS DOMESTIC PRODUCT
        • GADOLINIUM PHOSPHIDES (GdP)
      • COBRA
        • → SNAKES
        • COBRA REACTOR → KBR-1 REACTOR
      • "… in isotopes …" → INDIUM ISOTOPES
      • "… at 195 deg K …" → ASTATINE 195

  23. Terms which need special attention
      • Homographic terms
        • Solutions → SOLUTIONS or MATHEMATICAL SOLUTIONS
        • Color → COLOR, COLOR CENTRES, COLOR MODEL
        • Flavor → FLAVOR, FLAVOR MODELS
        • Tunnel → TUNNELS, TUNNELING, TUNNEL EFFECT
      • Nuclear reactions, e.g. 14N(γ,α)10B
        • Targets
        • Beams
        • Reactions

  24. Terms which need special attention: terms which are often wrong
      • Production
        • BEAM PRODUCTION
        • HEAT PRODUCTION
        • HYDROGEN PRODUCTION
        • ISOTOPE PRODUCTION
        • PARTICLE PRODUCTION
        • PLASMA PRODUCTION
        • PRODUCTION
      • Transport
        • AIR TRANSPORT
        • ATOM TRANSPORT
        • BEAM TRANSPORT
        • CHARGED-PARTICLE TRANSPORT
        • ENVIRONMENTAL TRANSPORT
        • PHOTON TRANSPORT
        • RADIOACTIVITY TRANSPORT
        • TRANSPORT
      • Decay
        • NUCLEAR DECAY: ALPHA DECAY, BETA DECAY, …
        • PARTICLE DECAY: ELECTROMAGNETIC…, HADRONIC…, RADIATIVE…, WEAK…

  25. CAI hands-on training by Subject Specialists
