Information Science: Where does it come from and where is it going?
Tefko Saracevic, PhD
School of Communication, Information and Library Studies
Rutgers University, New Brunswick, New Jersey, USA
http://www.scils.rutgers.edu/~tefko
Gutenberg (1397-1468)
© Tefko Saracevic
Information science: a short definition
“the collection, classification, storage, retrieval, and dissemination of recorded knowledge treated both as a pure and as an applied science” (Merriam-Webster)
Organization of presentation
• Big picture – problems, solutions, social place
• Structure – main areas in research & practice
• Technology – information retrieval – largest part
• Information – representation; bibliometrics
• People – users, use, seeking, context
• Paradigm split – distancing of areas
• Relations – librarianship, computer science
• Digital libraries – whose are they anyhow?
• Conclusions – big questions for the future
Part 1. The big picture
Problems addressed
• Bit of history: Vannevar Bush (1890-1974), in 1945:
  • Defined the problem as “... the massive task of making more accessible of a bewildering store of knowledge.”
• Problem still with us & growing
… solution
• Bush suggested a machine: “Memex ... association of ideas ... duplicate mental processes artificially.”
• Technological fix to problem
• Still with us: technological determinant
At the base of information science: Problem
Trying to control content in:
• Information explosion
  • exponential growth of information artifacts, if not of information itself
PLUS today:
• Communication explosion
  • exponential growth of means and ways by which information is communicated, transmitted, accessed, used
Technological solution, BUT …
Applying technology to solving problems of effective use of information, BUT from a HUMAN & SOCIAL and not only a TECHNOLOGICAL perspective
[Diagram: People – Information – Technology, or a symbolic model]
Problems & solutions: SOCIAL CONTEXT
• Professional practice AND scientific inquiry related to:
  • effective communication of knowledge records (‘literature’) among humans in the context of social, organizational, & individual need for and use of information
• Taking advantage of modern information technology
… or as White & McCain (1998) put it: “modeling the world of publications with a practical goal of being able to deliver their content to inquirers [users] on demand.”
General characteristics
• Interdisciplinarity – relations with a number of fields, some more or less predominant
• Technological imperative – driving force, as in many modern fields
• Information society – social context and role in evolution, shared with many fields
Part 2. Structure
Composition of the field
• Like many fields, information science has different areas of concentration & specialization
• They change, evolve over time
  • grow closer, grow apart
  • ignore each other, more or less
  • sometimes fight
Most importantly, different areas…
• receive more or less funding & emphasis
  • producing great imbalances in work & progress
  • attracting different audiences & fields
• this includes
  • vastly different levels of support for research, and
  • huge commercial investments & applications
How to view structure?
• By decomposing areas & efforts in research & practice emphasizing Technology, Information, or People
Part 3. Technology
• Identified with information retrieval (IR)
  • by far the biggest effort and investment
  • international & global
  • commercial interest large & growing
Information Retrieval – definition & objective
“IR: ... intellectual aspects of description of information, ... search, ... & systems, machines...” (Calvin Mooers, 1951)
Calvin Mooers (1919-1994)
• How to provide users with relevant information effectively? For that objective:
  1. How to organize information intellectually?
  2. How to specify the search & interaction intellectually?
  3. What techniques & systems to use effectively?
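Mooers' first two questions, how to organize information and how to specify a search, can be illustrated with a toy inverted index answering a Boolean AND query. This is a minimal sketch under stated assumptions: the documents, IDs, and function names are hypothetical, and plain whitespace tokenization stands in for real intellectual indexing.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Organize: map each lowercased term to the set of document IDs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def boolean_and(index: dict[str, set[str]], query: str) -> set[str]:
    """Search: return IDs of documents that contain every query term."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    return set.intersection(*postings)


# Hypothetical mini-collection, for illustration only.
docs = {
    "d1": "information retrieval systems",
    "d2": "library information services",
    "d3": "retrieval evaluation with relevance judgments",
}
index = build_inverted_index(docs)
print(sorted(boolean_and(index, "information retrieval")))  # ['d1']
```

Boolean AND is only one way of specifying a search; ranked retrieval, the focus of later IR research, scores documents instead of filtering them.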
Streams in IR research & development
1. Information science:
  • services, users, use
  • human-computer interaction
  • cognitive aspects
2. Computer science:
  • algorithms, techniques
  • systems aspects; evaluation
3. Information industry:
  • products, services, Web
  • search engines – BIG!
  • market aspects
Problem:
• relative isolation – discussed later
IR research
• Started in the US through government support & in information science
• Now mostly done within computer science
  • e.g. Special Interest Group on IR, Association for Computing Machinery (SIGIR, ACM)
Gerard Salton (1927-1995)
Contemporary IR research
• Spread globally
  • e.g. major IR research communities emerged in China, Korea, Singapore
• Branched outside of information science – “everybody does information retrieval”
  • search engines, data mining, natural language processing, artificial intelligence, computer graphics …
Testing in IR
• Major component of IR; made it strong & affected innovation
• Long history – started with the Cranfield tests in the late 1950s
• Measures – precision & recall, based on relevance
Cyril Cleverdon (1914-1997)
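Precision and recall compare the set of documents a system retrieved with the set judged relevant. A minimal sketch, assuming binary relevance judgments and hypothetical document IDs; returning 0.0 when a set is empty is a convention chosen here for convenience.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = |retrieved & relevant| / |retrieved|;
    recall = |retrieved & relevant| / |relevant| (0.0 for empty sets)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search output and relevance judgments.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d7"}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

The two measures pull against each other: retrieving more documents tends to raise recall while lowering precision.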
Text REtrieval Conference (TREC)
• Major research, laboratory effort
• Started in 1992
• “support research within the IR community by providing the infrastructure necessary for large-scale evaluation”
• Methods
  • provides large test beds, queries, relevance judgments, comparative analyses
  • essentially using the Cranfield methodology of the 1960s
  • organized around tracks
    • various topics – changing over the years
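TREC-style comparative analysis scores each system's ranked output against the relevance judgments; one widely used summary measure is mean average precision (MAP). The sketch below assumes binary relevance and hypothetical query and document IDs; it only illustrates the measure and is not the official trec_eval program.

```python
def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Sum of precision@k at each rank k holding a relevant document,
    divided by the number of relevant documents (missed ones count as 0)."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: dict[str, list[str]],
                           qrels: dict[str, set[str]]) -> float:
    """Mean of per-query average precision; a query with no run scores 0."""
    if not qrels:
        return 0.0
    return sum(average_precision(runs.get(q, []), rel)
               for q, rel in qrels.items()) / len(qrels)


# Hypothetical judgments (qrels) and one system's ranked run for two queries.
qrels = {"q1": {"d1", "d3"}, "q2": {"d5"}}
runs = {"q1": ["d1", "d2", "d3"], "q2": ["d4", "d5"]}
print(round(mean_average_precision(runs, qrels), 3))  # 0.667
```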
TREC impact
• International – big impact on creating research communities
• Annual conferences
  • report and exchange results, foster cooperation
• Results
  • mostly in reports, available at http://trec.nist.gov/pubs.html
  • overviews provided as well
  • but only a fraction published in journals
• Book (2005):
  • TREC: Experiment and Evaluation in Information Retrieval, edited by Ellen M. Voorhees and Donna K. Harman
TREC tracks 2007: 116 groups from 20 countries
• Genomics
• Spam
• Blog
• Question answering
• Enterprise
• Million query (new)
• Legal
Previous tracks:
• ad hoc (1992-1999), routing (1992-1997), interactive (1994-2002), filtering (1995-2002), cross language (1997-2002), speech (1997-2000), Spanish (1994-1996), video (2000-2001), Chinese (1996-1997), query (1998-2000), and a few more run for two years only
Broadening of IR – a sample; ever changing, with new areas added
• Cross-language IR (CLIR)
• Natural language processing in IR (NLP IR)
• Music IR (MIR)
• Image, video, multimedia retrieval
• Spoken language retrieval
• IR for bioinformatics and genomics
• Summarization; text extraction
• Question answering
• Many human-computer interactions
• XML IR
• Web IR; Web search engines
• IR in context – big area for major search engines & newer research
Commercial IR
• Search engines based on IR
  • but added many elaborations & significant innovations
  • dealing with a HUGE number of pages fast
  • countering spamming & page rank games – adversarial IR, a combat of algorithms
  • adding context for searching
• Spread & impact worldwide
  • about 2000 engines in over 160 countries
  • English was dominant, but no longer
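Link-based ranking scores are one target of the page rank games mentioned above. As an illustration only, here is a power-iteration sketch of the published PageRank idea on a hypothetical four-page graph, using the commonly cited damping factor 0.85. Commercial engines combine many signals and their actual ranking methods are proprietary, so this is not how any particular engine works.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power iteration over a graph given as page -> list of outgoing links.
    Pages with no outlinks spread their score evenly over all pages."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical four-page web; "c" is linked from a, b, and d.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(links)
print(max(scores, key=scores.get))  # c
```

Adding pages that link to a target ("link farms") raises that target's score in exactly this calculation, which is why such scores attract spamming.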
Commercial IR: brave new world
• Large investments & a new economic sector
  • hope for big profits, as yet questionable
• Leading to proprietary, secret IR
  • aggressive hiring of the best talent
  • new commercial research centers in different countries (e.g. Microsoft in China)
• Academic research funding is changing
  • brain drain from academe to commercial search engines
• Commercial search engines face many challenges
IR has a long, proud history
IR successfully effected:
• Emergence & growth of the INFORMATION INDUSTRY
• Evolution of IS as a PROFESSION & SCIENCE
• Many APPLICATIONS in many fields
  • including on the Web – search engines
• Improvements in HUMAN-COMPUTER INTERACTION
• Evolution of INTERDISCIPLINARITY
Part 4. Information
• Several areas of investigation:
  • information as a basic phenomenon – not much progress
    • measures such as Shannon's not successful
    • concentrated on manifestations and effects
    • no recent progress in this basic research
  • information representation
    • large area connected with IR, librarianship
    • metadata
  • bibliometrics
    • structures of literature
What is information?
Intuitively well understood, but formally not well stated
• Several viewpoints, models emerged
• Shannon: source-channel-destination
  • signals, not content – not really applicable, despite many tries
• Cognitive: changes in cognitive structures
  • content processing & effects
• Social: context, situation
  • information seeking, tasks
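Why Shannon's measure deals with signals rather than content can be shown directly: entropy depends only on symbol frequencies, so scrambling a message leaves it unchanged even though the meaning is gone. A minimal sketch, assuming characters as symbols and observed frequencies as probabilities; the example strings are arbitrary.

```python
import math
from collections import Counter


def shannon_entropy(message: str) -> float:
    """Bits per symbol, H = -sum(p * log2(p)), from observed symbol frequencies."""
    if not message:
        return 0.0
    total = len(message)
    # Sorting the counts fixes the summation order, so equal distributions give equal floats.
    probs = [count / total for count in sorted(Counter(message).values())]
    return -sum(p * math.log2(p) for p in probs)


original = "information retrieval"
scrambled = "".join(sorted(original))  # same symbols, meaning destroyed
print(shannon_entropy(original) == shannon_entropy(scrambled))  # True
print(shannon_entropy("abab"))  # 1.0
```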
Information in information science: three senses (from narrowest to broadest)
• Information in terms of decisions involving little or no cognitive processing
  • signals, bits, straightforward data – e.g. information theory (Shannon), economics
• Information involving cognitive processing & understanding
  • understanding, matching texts (Brookes)
• Information also as related to context, situation, problem at hand
  • USERS, USE, TASK
For information science (including information retrieval), the third, broadest interpretation is necessary
Bibliometrics
“… the quantitative treatment of the properties of recorded discourse and behavior pertaining to it.” (Fairthorne, 1969)
• Many quantitative studies & some laws
  • Bradford’s law, Lotka’s law – regularities
  • quantity/yield distributions of journals, authors
• Related areas:
  • Scientometrics
    • covering science in general, not just publications
  • Informetrics
    • all information objects
  • Webmetrics or cybermetrics
    • using bibliometric techniques to study the Web
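Lotka's law, in its classic form, says that the number of authors with n publications is roughly the number of single-publication authors divided by n². A minimal sketch of the expected counts, assuming the classic exponent of 2 and a hypothetical field with 600 single-paper authors; real bibliographies are fitted to estimate the exponent rather than assuming it.

```python
def lotka_expected_authors(single_paper_authors: int, max_papers: int,
                           exponent: float = 2.0) -> dict[int, float]:
    """Expected number of authors with n papers: a1 / n**exponent (classic exponent 2)."""
    return {n: single_paper_authors / n ** exponent
            for n in range(1, max_papers + 1)}


# Hypothetical field in which 600 authors published exactly one paper.
expected = lotka_expected_authors(600, 5)
print(expected[2])  # 150.0
print(round(expected[3], 1))  # 66.7
```

The steep drop-off is the regularity the slide refers to: a small number of prolific authors account for a large share of the literature.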
Part 5. People
• Professional services
  • in organizations – moving toward knowledge management, competitive intelligence
  • in industry – vendors, aggregators, Internet
• Research
  • user & use studies
  • interaction studies
  • broadening to information seeking studies, social context, collaboration
  • relevance studies
  • social informatics
User & use studies
• Oldest area
  • covers many topics, methods, orientations
• Many studies related to IR
  • e.g. searching, multitasking, browsing, navigation
  • theoretical & experimental studies on relevance
• Branching into Web use studies
  • quantitative & qualitative studies
  • emergence of webmetrics
Interaction
• Traditional IR model concentrates on matching, not on the user side & interaction
• Several interaction models suggested
  • Ingwersen’s cognitive, Belkin’s episode, Saracevic’s stratified model
  • experiments & confirmation hard to obtain
• Considered key to providing
  • a basis for better design
  • understanding of the use of systems
• Web interactions: a major new area
Information seeking
• Concentrates on a broader context, not only IR or interaction: people as they move through life & work
• A number of models provided
  • e.g. Kuhlthau’s information search process, Järvelin’s information seeking
• Includes studies of ‘life in the round,’ sense making, information encountering, work life, information discovery
• Based on the concept of social construction of information
Part 6. Paradigm split in technology – people
• Split from the early 1980s to date into:
  • System-centered
    • algorithms, TREC, search engines
    • continue the traditional IR model
  • Human- (user-) centered
    • cognitive, situational, user studies
    • interaction models, some started in TREC
    • relevance studies
Human vs. system
• Human (user) side:
  • often highly critical, even one-sided
  • mantra of “implications for design”
  • but does not deliver concretely
• System side:
  • mostly ignores the user side & user studies
  • ‘tell us what to do & we will’
• The issue is NOT the H or the S approach
  • even less H vs. S
  • but how H AND S can work together
  • a major challenge for the future
Great separation
IR in computer science:
• completely technology oriented
• VERY international
• not aware at all of the other side
• SIGIR growing a lot (papers submitted / accepted / acceptance rate):
  • 2007: 490 / 85 / 17%
  • 2006: 399 / 74 / 19%
  • 1999: 135 / 33 / 24%
IR, user studies, services in information science:
• mostly people oriented
• aware of, but participating less with, the other side
• only a few LIS people come to SIGIR, even fewer SIGIR people to ASIST, none to ALA
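The SIGIR acceptance rates follow from the submission and acceptance counts on the slide; a quick arithmetic check (rounded to whole percent, the results match the 17%, 19%, and 24% shown).

```python
# Submission and acceptance counts as listed on the slide.
sigir = {2007: (490, 85), 2006: (399, 74), 1999: (135, 33)}

# Acceptance rate in percent = 100 * accepted / submitted.
rates = {year: 100 * accepted / submitted
         for year, (submitted, accepted) in sigir.items()}

# Prints 17.3% (2007), 18.5% (2006), 24.4% (1999).
for year, rate in rates.items():
    print(f"{year}: {rate:.1f}%")
```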
Calls vs. support
• Many calls for user-centered or human-centered design, approaches & evaluation
  • a number of works discuss it, but few propose concrete solutions
• But: most support goes to system work
  • in the digital age, support is for the digital
• Recent attempt at combining the two views:
  • Book: Ingwersen, P. and Järvelin, K. (2005). The Turn: Integration of Information Seeking and Retrieval in Context. Springer.
Part 7. Relations, alliances, competition
• With a number of fields...
• Strongest:
  1. Librarianship
  2. Computer science
Common grounds
IS & librarianship share:
• Social role in information society
• Concern with effective utilization of graphic & other types of records
• Research problems related to a number of topics
• Transfer to & from information retrieval
Differences
IS & librarianship differ in:
• Selection & definition of many problems addressed
• Theoretical questions & framework
• Nature & degree of experimentation
• Tools and approaches used
• Nature & strength of interdisciplinary relations
One field or two?
• Point of many debates
• Suggest: TWO fields in strong interdisciplinary relations
• Not a matter of “better” or “worse” – that matters little
  • common arguments between many fields
• Differences matter in:
  • problem selection & definition
  • agenda, paradigms
  • theory, methodology
  • practical solutions, systems
• Best example: IR & library automation
Which?
• Librarianship. Information science
• Library and information science
• Libraryandinformationscience
  • Michael Buckland’s suggestion
• Information science
• Information sciences
• Information
  • as in the “Information School”
IS & computer science
• CS primarily about algorithms
• IS primarily about information and its users and use
• Not in competition, but complementary
• Growing number of computer scientists active in IS – particularly in IR and digital libraries
• Concentrating on
  • advanced IR algorithms & techniques
  • digital library infrastructure & various domains
  • human-computer interaction
Interaction and IS
• Two streams:
  • computer-human interaction
  • human-computer interaction
• Many studies on:
  • machine aspects of interaction
  • human variables in interaction
• Problems: little feedback between the two
  • very hard to evaluate
• Web interactions: a major area
• Another interdisciplinary area
  • computer science, cognitive science, ergonomics
Part 8. Digital libraries
• LARGE & growing area
• “Hot” area in R&D
  • a number of large grants & projects in the US, European Union, & other countries
  • but “DIGITAL” big & “libraries” small
• “Hot” area in practice
  • building digital collections, hybrid libraries
  • many projects throughout the world
  • but in the US funding drying out
Technical problems
• Substantial – larger & more complex than anticipated:
  • representing, storing & retrieving library objects
    • particularly if originally designed to be printed & then digitized
  • operationally managing large collections – issues of scale
  • dealing with diverse & distributed collections
    • interoperability; federated searching
  • assuring preservation & persistence
  • incorporating rights management
Research issues
• understanding objects in DL
  • representing in many formats
  • metadata, cataloging, indexing
  • conversion, digitization
• organizing large collections
  • managing collections, scaling
  • preservation, archiving
  • interoperability, standardization
• accessing, using, searching
  • federated searching of distributed collections
• evaluation of digital libraries
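Federated searching has to merge ranked lists from independent collections whose relevance scores are on different scales. One simple approach, chosen here as an illustrative assumption rather than a method from the talk, is to min-max normalize each collection's scores to [0, 1] and sort the union. The collection names, document IDs, and scores are hypothetical.

```python
def normalize(results: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Rescale one collection's scores to [0, 1] (min-max normalization)."""
    if not results:
        return []
    scores = [s for _, s in results]
    low, high = min(scores), max(scores)
    span = high - low
    # If all scores are equal, treat every result as top-scoring.
    return [(doc, (s - low) / span if span else 1.0) for doc, s in results]


def federated_merge(per_collection: dict[str, list[tuple[str, float]]]) -> list[str]:
    """Merge results from several collections into one ranking.
    Doc IDs are prefixed with the collection name to keep them distinct."""
    merged = []
    for name, results in per_collection.items():
        merged.extend((f"{name}:{doc}", s) for doc, s in normalize(results))
    # Python's sort is stable, so ties keep the collection order given above.
    merged.sort(key=lambda item: item[1], reverse=True)
    return [doc for doc, _ in merged]


# Hypothetical results from two collections with different score scales.
results = {
    "museum": [("m1", 12.0), ("m2", 4.0), ("m3", 8.0)],
    "archive": [("a1", 0.9), ("a2", 0.3)],
}
print(federated_merge(results))
# ['museum:m1', 'archive:a1', 'museum:m3', 'museum:m2', 'archive:a2']
```

Min-max normalization ignores differences in collection size and quality; that limitation is one reason merging distributed results remains a research issue.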
DL projects in practice
• Heavily oriented toward institutions & their missions
  • in libraries, but also others
    • museums, societies, government, commercial
  • come in many varieties
• Spread globally
  • including digitization
  • U California, Berkeley’s Libweb “lists over 7700 pages from libraries in over 145 countries”
• Spending increasing significantly
  • often a trade-off for other resources