Graduate Curriculum for Biological Information Specialists: A Key to Integration of Scale in Biology. P. Bryan Heidorn Carole L. Palmer Dan Wright Melissa H. Cragin Graduate School of Library and Information Science University of Illinois at Urbana-Champaign
Graduate Curriculum for Biological Information Specialists: A Key to Integration of Scale in Biology. P. Bryan Heidorn, Carole L. Palmer, Dan Wright, Melissa H. Cragin. Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. 2nd International Digital Curation Conference: Digital Data Curation in Practice, 22 November 2006
Outline • Scientific Communication Initiative • Information challenges of biological data • Research foundation of our program • Biological Information Specialists • Data Curation Concentration in MSLIS • Partners and internships • Integration of research and practice
Scientific Communication Initiative (SCI) Given: • The ever-growing universe of information resources, informatics tools, and scholarly communication options that need to be understood, assessed, and coordinated SCI Aims: • Improve information transfer and integration, technology development and sustainability, and collaboration in the practice of science through • basic and applied research • education to train information specialists to work cooperatively with research scientists • Complement, not duplicate, the expertise of natural and computational scientists
Information challenges in biology • Emergent complexity is a hallmark of modern biology • Complexity of the data is of greater consequence for scientific discovery than the volume of the data • New data practices must ensure that data at many scales are interoperable to support this kind of research • Data is active and part of the scientific process in a changing producer-consumer economy • Scientific inquiry requires integration of lab data, procedures, literature, and reference work • There is a critical shortage of personnel trained to manage biological information and data
Research foundation Modeling and computational neuroscience • Information and Discovery in Neuroscience (Palmer, NSF IIS-0222848) Automated metadata extraction and inference • Automatic Museum Label Metadata Extraction (Heidorn, NSF DBI-9982849, NSF DBI-0345387) • Georeferencing Museum Specimen Sources (Heidorn, Moore 2005-2929-00) Terminology, schema, and ontology development • Plant Description Standards (Heidorn, IMLS NR-00-01-0017-01) Collaborative data collection • BioDiversity Survey Collaboration and Verification (Heidorn and Palmer, NSF BDI-0113918)
Modeling and computational neuroscience • Modelers are a large user group for experimental biological data, yet rarely (if ever) generate their own data sets • One of the communities making the most use of shared data repositories • Difficulties in re-purposing data collected under a specific set of experimental circumstances and constraints • Metadata is difficult to gather • Modelers' needs are generally not taken into account in the planning, collection, and storage of experimental data
Automated metadata extraction and inference • Historical collections in botany, zoology, and entomology have been curated for centuries, along with rich metadata (~2 billion specimen labels) • Manual extraction is making collections globally accessible and usable (Darwin Core/ABCD, DiGIR, TCS) • Automated metadata extraction: the HERBIS and BioGeomancer approach • Implication: predictive ecological modeling under climate change
Terminology, schema, and ontology development Serving the goals of knowledge representation, discovery, and data integration • Biodiversity: Informatics Core Ontology • Taxonomic Databases Working Group standards • Neuroscience: ontology and vocabulary development work aimed at integrating animal and human imaging data • Text mining to derive ontologies from texts
Biological Information Specialists At present: • Biologists at all degree levels are self-trained in information technology • Information technologists at all degree levels are self-trained in biology (both with gaps in knowledge lasting many months or years) • Differing roles of BIS in large and small
Master of Science in Biological Informatics • Part of the campus-wide bioinformatics master's program • Curriculum development funded by NSF/CISE/IIS, Education Research and Curriculum Development, 0534567 (Palmer, PI) • Degree program began September 2006 • Combines a Biology, Bioinformatics, and Computer Science core with LIS courses from GSLIS's long-standing, top-ranked program
What does a BIS need to know? Biological training and interest in solving biological research problems Information skills • Evaluation and implementation of information systems: user-based assessment and continual quality improvement for the development of tools that work and are used • Information acquisition, management, and dissemination: development of digital libraries, data archives, institutional repositories, and related tools • Information organization and integration: ontology development, structuring information for optimal use and sharing, and standards development
LIS orientation • LIS is the only field concerned with the full landscape of scientific information and the interactions among fields • Focus on the information needs of users, rather than internal criteria such as technical elegance • Tradition of training scientific information professionals as informationists “The informationist concept meets a critical need for an intermediary between the expanding information universe and practitioners and researchers. Successful informationists may come from a variety of backgrounds and perform a variety of roles, but must have knowledge about both a subject domain and the process of locating, analyzing, and synthesizing information.” (Giuse et al. 2005, p. 2; emphasis added)
UIUC bioinformatics core coursework Cross-disciplinary course distribution requirement Example courses include: • Bioinformatics: Computing in Molecular Biology; Algorithms in Bioinformatics; Principles of Systematics • Computer Science: Algorithms; Database Systems • Biology: Human Genetics; Introductory Biochemistry; Macromolecular Modeling
Sample existing LIS courses • Representing and Organizing Information • Interfaces to Information Systems • Building Digital Libraries • Indexing and Abstracting • Information Modeling • Architecture of Networked Information Systems • Information Sources and Services in the Sciences • Implementation of Information Retrieval Systems • Use and Users of Information • Electronic Publishing • Health Sciences Information Services and Resources
MSLIS Data Curation Concentration Data Curation Educational Program (DCEP) IMLS – Laura Bush 21st Century Librarian Program, RE-05-06-0036-06 (Heidorn, PI) Students with the DC concentration will be trained to add value to data and promote sharing across labs and disciplinary specializations
Integration of research and practice • Cooperating institutions: • Biomedical Informatics Research Network (UCSD) • Arrowsmith literature mining project (University of Illinois at Chicago, Neuroscience Dept.) • Smithsonian Institution • American Museum of Natural History • Missouri Botanical Garden • U.S. Army Strategic Environmental Management Program • MIT Data Services Librarian • Identify information problems and collect best practices from our partners to provide a broad understanding of information and data techniques, issues, and needs • Place students in internships with our partners at biological science institutions to gain real-world biological research experience • Cultivate new partners and new collaborative research
New research directions Focus on integration and scale Informatics infrastructure as competitive edge Sample areas of development • Landinformatics Group: atmospheric science, hydrology, nutrient balance, carbon cycle, ecology, agronomy • Critical Zones Observatory • Focus on data integration problems across a larger range of sciences
References • Giuse, N.B., Sathe, N., and Jerome, R. (2005). Envisioning the Information Specialist in Context (ISIC): A Multi-Center Study to Articulate Roles and Training Models. Medical Library Association. • Palmer, C.L., Cragin, M.H., and Hogan, T.P. (2004). Information at the Intersections of Discovery: Case Studies in Neuroscience. Proceedings of the American Society for Information Science and Technology annual meeting 41: 448-455. • Greenberg, J., Heidorn, P.B., and Seiberling, S. (2005). Growing Vocabularies for Plant Identification and Scientific Learning. International Conference on Dublin Core and Metadata Applications (DC-2005, Sept 15, 2005), Madrid, Spain.
Acknowledgements • Research grants: IIS-0222848, DBI-0345387 • GSLIS Research Writing Group • Scientific Communication Initiative: http://sci.lis.uiuc.edu/ (under development)