
Trends in Scholarly Communication




Presentation Transcript


  1. Trends in Scholarly Communication Alex D. Wade Director, Scholarly Communication Microsoft Research

  2. A bit about me…

  3. Microsoft Research Labs External Research Groups Technology Learning Labs Collaborative Institutes and Centers

  4. Microsoft External Research • Organization within Microsoft Research that engages in strong partnerships with academia, industry, and government to advance computer science, education, and research in fields that rely heavily upon advanced computing • Initiatives that focus on the research process and its role in the innovation ecosystem, including support for open access, open tools, open technology, and interoperability • Developers of advanced technologies and services to support every stage of the research process

  5. External Research Global Themes Advanced Research Tools and Services

  6. External Research Global Themes • Data Intelligence: Understanding web-scale data challenges • Cloud Computing: Researching cloud-service technologies • Device Oriented Computing: Recognizing the reality is Cloud+Client • Many/Multicore: Understanding how to best exploit the emerging trends in chip design/architecture

  7. External Research Global Themes • Visualizing and Experiencing E3 Data + Information: Provide a unique experience to reduce time to insight and knowledge through visualizing data and information • Accessible Data: Ensure E3 data (remote and local sensing) is easily accessible and consumable in the scientist's domain

  8. External Research Global Themes • Devices, Sensors and Mobility: Cellphone as a platform for healthcare in 2009; Proof points for the value of new modes of interaction with health data • Genomics in Healthcare: Bring the bioinformatics community to the Windows platform; Apply Microsoft research and tools to challenges in genomics • Modeling living systems (MSRC-led): Long-term healthcare impact in predictive/preventative medicine.

  9. External Research Global Themes • Scholarly Communication: Developing software tools for academics on top of MS technology to facilitate the full lifecycle of their day-to-day research workflow. Evolving modes of academic collaboration & dissemination to speed discovery • Education: Transform education through exploitation of novel uses of MS hardware/software (Gaming, Tablet PC, Surface, etc.)

  10. Mission • Tailor Microsoft software to meet the specific needs of the academic research community • Our approach: • Conduct applied projects to enhance academic productivity by evolving Microsoft’s scholarly communication offerings

  11. Why? • Listen and learn • Increase our relevance in academia • Near-term + long-term • Anticipate evolving requirements • Help researchers spend less time doing computer science and more time doing research

  12. Our Challenges • Audiences • Multi-audience problem • File formats & interoperability • Microsoft software not optimized for the specialized needs of academic researchers • IT Community in Academia • No rich ISV community filling the gap • DIY culture in academia • Open Source Software

  13. Who we work with

  14. Open Access Open Source Open Data Open Science “In order to help catalyze and facilitate the growth of advanced CI, a critical component is the adoption of open access policy for data, publications and software.” NSF Advisory Committee on Cyberinfrastructure (ACCI) • Microsoft Interoperability Principles • Open Connections to Microsoft Products • Support for Standards • Data Portability • Open Engagement http://www.microsoft.com/interop/

  15. “I [want] to clarify a common confusion that I hear from many colleagues: open source vs. open access. Although the terms are related in some ways (indeed, they derive from a very similar philosophy), they refer to two discrete concepts. • Open Access: Focuses on the unrestricted sharing of research results, typically through open access journals (PLoS ONE, PalaeontologiaElectronica, etc.). • Open Source: Computer software, typically (but not always) freely distributed, in which the source code is freely available.” http://openpaleo.blogspot.com/2009/10/its-open-access-week.html

  16. CodePlex Foundation http://www.codeplex.org/

  17. OSI Approved Open Source Licenses • Microsoft Public License (Ms-PL)http://opensource.org/licenses/ms-pl.html • Microsoft Reciprocal License (Ms-RL)http://opensource.org/licenses/ms-rl.html

  18. Themes • ‘Traditional’ Scholarly Communication • Original goals • Reactions to current state • Trends in Computing & Software • A New Paradigm in Research • Trends in Scholarly Communication

  19. Traditional Scholarly Communication

  20. The needs of academic researchers? • Discover and digest existing research • Conduct research • Experimentation (lots of domain specific stuff) • Data Collection & Analysis (computer programming) • Collaboration • Communicate research • Author • Disseminate • Measure Impact

  21. For the sake of Scientific Progress • Contract with Researchers • If you share • Then you get credit; science progresses more rapidly • Advantages • Registration • Validation & Quality • Discovery and Access • Aggregation • Dissemination • Reach, Speed, Efficiency

  22. Issues with the current system…

  23. Quality Control • Validation takes time & expertise • Is it fulfilling the promise with respect to the science? • Can this process keep pace with the growing volume and complexity of research? • Tension between timeliness and quality • Preprints in physics

  24. Discovery & Access • Impediments: Business models and IP ownership have led to barriers to access • Fragmentation: Information is channeled into silos and gated communities, making aggregation and analysis cumbersome (if not impossible) • Timeliness: Research cycles are slowed by the discovery → publication time lag • Products: System still anchored to the 'article' container to the exclusion of all else

  25. A Jumble “The truth is that [legal information is] an impossible jumble of materials: in various different formats - Word, pdf, HTML, text; some freely available, some only available for a fee; some only available through private vendors; some reachable by search engines, some not; some available in authenticated formats, some not; some timely, some not; some available in bulk downloadable formats for re-use, some not. The problem is not that legal information isn't available; it's that access to it is highly unstandardized.” https://clients.outsellinc.com/insights/?p=11087

  26. Research Library Expenditure Trends

  27. Registration and Measurement • Publish or Perish • “My Data” & “My Research” protectionism • Dark data and negative correlations • Citation Analysis and Impact Factor

  28. Trends inComputing & Software

  29. A Sea Change in Computing • Massive Data Sets: Federation, integration & collaboration. There will be more scientific data generated in the next five years than in the history of humankind • Evolution of Many-core & Multicore: Parallelism everywhere. What will you do with 100 times more computing power? • The Power of the Client + Cloud: Access anywhere, any time. Distributed, loosely-coupled applications at scale across all devices will be the norm

  30. Data Tidal Wave

  31. A Digital Data Deluge in Research • Data collection • Sensor networks, satellite surveys, high throughput laboratory instruments, observation devices, supercomputers, LHC … • Data processing, analysis, visualization • Legacy codes, workflows, data mining, indexing, searching, graphics … • Archiving • Digital repositories, libraries, preservation, … • SensorMap • Functionality: Map navigation • Data: sensor-generated temperature, video camera feed, traffic feeds, etc. • Scientific visualizations • NSF Cyberinfrastructure report, March 2007

  32. Wireless Sensor Networks • Uses 200 wireless (Intel) computers, with 10 sensors each, monitoring • Air temperature, moisture • Soil temperature, moisture, at least at two depths (5 cm, 20 cm) • Light (intensity, composition) • Soon gases (CO2, O2, CH4, …) • Long-term continuous data • Small (hidden) and affordable (many) • Less disturbance • >200 million measurements/year • Complex database of sensor data and samples With K. Szlavecz and A. Terzis at Johns Hopkins http://lifeunderyourfeet.org
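As a sanity check on the ">200 million measurements/year" figure, here is a back-of-envelope calculation in Python. The 5-minute sampling interval is an assumption for illustration, not something stated on the slide.

```python
# Back-of-envelope check of ">200 million measurements/year",
# assuming (hypothetically) each sensor samples once every 5 minutes.
nodes = 200
sensors_per_node = 10
sample_interval_s = 5 * 60              # assumed interval, not from the slide
seconds_per_year = 365 * 24 * 3600      # 31,536,000

samples_per_sensor = seconds_per_year // sample_interval_s   # 105,120
total_per_year = nodes * sensors_per_node * samples_per_sensor
print(f"{total_per_year:,} measurements/year")   # 210,240,000 measurements/year
```

At that assumed rate the network produces about 210 million measurements a year, consistent with the slide's claim.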

  33. Joe Hellerstein—UC Berkeley Blog: “The Commoditization of Massive Data Analysis” • We’re not even to the Industrial Revolution of Data yet… • “…since most of the digital information available today is still individually "handmade": prose on web pages, data entered into forms, videos and music edited and uploaded to servers. But we are starting to see the rise of automatic data generation "factories" such as software logs, UPC scanners, RFID, GPS transceivers, video and audio feeds. These automated processes can stamp out data at volumes that will quickly dwarf the collective productivity of content authors worldwide. Meanwhile, disk capacities are growing exponentially, so the cost of archiving this data remains modest. And there are plenty of reasons to believe that this data has value in a wide variety of settings. The last step of the revolution is the commoditization of data analysis software, to serve a broad class of users.” • How will this interact with the push toward data-centric web services and cloud computing? • Will users stage massive datasets of proprietary information within the cloud? • How will they get petabytes of data shipped and installed at a hosting facility? • Given the number of computers required for massive-scale analytics, what kinds of access will service providers be able to economically offer?

  34. The Future: An Explosion of Data • Sources: Experiments, simulations, archives, literature, instruments • Scale: Petabytes, doubling every 2 years • The Challenge: Enable discovery. Deliver the capability to mine, search and analyze this data in near real time • Enhance our lives: Participate in our own health care; augment experience with deeper understanding

  35. The Cloud • A model of computation and data storage based on “pay as you go” access to “unlimited” remote data center capabilities • A cloud infrastructure provides a framework to manage scalable, reliable, on-demand access to applications • A cloud is the “invisible” backend to many of our mobile applications • Historical roots in today’s Internet apps • Search, email, social networks • File storage (Live Mesh, MobileMe, Flickr, …)

  36. Types of Cloud Computing • Utility computing [infrastructure] • Amazon's success in providing virtual machine instances, storage, and computation at pay-as-you-go utility pricing was the breakthrough in this category, and now everyone wants to play. Developers, not end-users, are the target of this kind of cloud computing. • Platform as a Service [platform] • One step up from pure utility computing are platforms like Google App Engine and Salesforce's force.com, which hide machine instances behind higher-level APIs. Porting an application from one of these platforms to another is more like porting from Mac to Windows than from one Linux distribution to another. • End-user applications [software] • Any web application is a cloud application in the sense that it resides in the cloud. Google, Amazon, Facebook, Twitter, Flickr, and virtually every other Web 2.0 application is a cloud application in this sense. From: Tim O'Reilly, O'Reilly Radar (10/26/08)—”Web 2.0 and Cloud Computing”

  37. The Rationale for Cloud Computing in eResearch • We can expect research environments to follow trends similar to the commercial sector • Leverage computing and data storage in the cloud • Small organizations need access to large-scale resources • Scientists already experimenting with Amazon S3 and EC2 services • For many of the same reasons • Small, siloed research teams • Little/no resource-sharing across labs • High storage costs • Physical space limitations • Low resource utilization • Excess capacity • High costs of acquiring, operating and reliably maintaining machines are prohibitive • Little support for developers, system operators
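The "low resource utilization" and "high costs" bullets reduce to simple arithmetic. The sketch below makes that argument concrete; the dollar amounts and the 15% utilization figure are illustrative assumptions, not data from the slide or any vendor's real rates.

```python
# Toy cost comparison behind the "low resource utilization" argument.
# All prices and the utilization figure are illustrative assumptions.
hours_per_year = 8760

# Owned machine: paid for whether or not it is busy.
owned_cost_per_year = 3000            # assumed amortized hardware + operations
utilization = 0.15                    # assumed "low resource utilization"
owned_cost_per_useful_hour = owned_cost_per_year / (hours_per_year * utilization)

# Cloud instance: pay only for the hours actually used.
cloud_rate_per_hour = 0.50            # assumed pay-as-you-go rate
cloud_cost_per_useful_hour = cloud_rate_per_hour

print(f"owned: ${owned_cost_per_useful_hour:.2f} per useful hour")
print(f"cloud: ${cloud_cost_per_useful_hour:.2f} per useful hour")
```

Under these assumed numbers, an owned machine that sits idle 85% of the time costs several times more per hour of actual work than a pay-as-you-go instance, which is the economic core of the slide's rationale.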

  38. Cloud Landscape Still Developing • Tools are available • Flickr, SmugMug, and many others for photos • YouTube, SciVee, Viddler, Bioscreencast for video • Slideshare for presentations • Google Docs for word processing and spreadsheets • Data Hosting Services & Compute Services • Amazon’s S3 and EC2 offerings • Archiving / Preservation • “DuraCloud” Project (in planning by DuraSpace organization) • Developing business models • Service-provision (sustainability) • NSF’s “DataNet” – developing a culture, new organizations

  39. Semantic Computing

  40. A “Smart” Cyberinfrastructure for Research

  41. Why Semantic Computing http://cacm.acm.org/magazines/2009/12/52840-a-smart-cyberinfrastructure-for-research

  42. “Semantics-based computing” vs. “Semantic web” • There is a distinction between the general approach of computing based on semantic technologies (e.g. machine learning, neural networks, ontologies, inference, etc.) and the semantic web – used to refer to a specific ecosystem of technologies, like RDF and OWL • The semantic web is just one of the many tools at our disposal when building semantics-based solutions

  43. Towards a smart cyberinfrastructure? • Leveraging Collective Intelligence • If last.fm can recommend what song to broadcast to me based on what my friends are listening to, the cyberinfrastructure of the future should be able to recommend articles of potential interest based on what the experts I respect in the field are reading • Examples are emerging, but the process is presently more manual – e.g. Connotea, Faculty of 1000, etc. • Semantic Computing • Automatic correlation of scientific data • Smart composition of services and functionality • Leverage cloud computing to aggregate, process, analyze and visualize data
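The "recommend articles based on what the experts I respect are reading" idea can be sketched in a few lines. The experts, reading lists, and paper names below are all made up for illustration; production systems would of course use far richer signals than a simple vote count.

```python
# Minimal sketch: rank unread papers by how many respected experts read them.
# All names and reading lists are hypothetical.
from collections import Counter

reading_lists = {
    "expert_a": {"paper1", "paper2", "paper3"},
    "expert_b": {"paper2", "paper3", "paper4"},
    "expert_c": {"paper5"},
}
i_respect = {"expert_a", "expert_b"}   # my trusted subset of the community
already_read = {"paper1"}

votes = Counter()
for expert in i_respect:
    for paper in reading_lists[expert] - already_read:
        votes[paper] += 1

recommendations = [paper for paper, _ in votes.most_common()]
# paper2 and paper3 (read by both respected experts) outrank paper4;
# paper5 never appears because no respected expert reads it.
```

The design choice mirrored here is that trust is personal: filtering by `i_respect` rather than the whole community is what distinguishes this from plain popularity ranking.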

  44. A world where all data is linked… • Important/key considerations • Formats or “well-known” representations of data/information • Pervasive access protocols are key (e.g. HTTP) • Data/information is uniquely identified (e.g. URIs) • Links/associations between data/information • Data/information is inter-connected through machine-interpretable information (e.g. paper X is about star Y) • Social networks are a special case of ‘data networks’ Attribution: Richard Cyganiak; http://linkeddata.org/
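The "machine-interpretable information" bullet can be sketched with plain (subject, predicate, object) triples, the core data model behind RDF. The URIs and predicate names below are hypothetical, chosen only to mirror the slide's "paper X is about star Y" example.

```python
# Sketch of linked data as (subject, predicate, object) triples.
# All URIs and predicates are hypothetical.
triples = [
    ("http://example.org/paper/X",  "isAbout",    "http://example.org/star/Y"),
    ("http://example.org/paper/X",  "authoredBy", "http://example.org/person/Z"),
    ("http://example.org/person/Z", "memberOf",   "http://example.org/org/Q"),
]

def objects_of(subject, predicate):
    """Follow links: everything `subject` points to via `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# Because every entity has a unique identifier, links compose mechanically:
stars = objects_of("http://example.org/paper/X", "isAbout")
print(stars)   # ['http://example.org/star/Y']
```

The point of the uniform identifiers is that a machine can traverse from a paper to its star, its author, and the author's organization without any human interpretation, which is exactly what the slide means by machine-interpretable links.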

  45. …and stored/processed/analyzed in the cloud [Diagram: Vision of a future research environment with both software + services, including: visualization and analysis services; scholarly communications; domain-specific services; search; books; citations; blogs & social networking; reference management; instant messaging; identity; mail; project management; notification; document store; storage/data services; knowledge management; compute services; virtualization; knowledge discovery] The Microsoft Technical Computing mission to reduce time to scientific insights is exemplified by the June 13, 2007 release of a set of four free software tools designed to advance AIDS vaccine research. The code for the tools is available via CodePlex, an online portal created by Microsoft in 2006 to foster collaborative software development projects and host shared source code. Microsoft researchers hope that the tools will help the worldwide scientific community take new strides toward an AIDS vaccine.

  46. A New Research Paradigm • Digital technologies are completely revolutionizing the way that researchers work… in all subject areas. Sarah Porter, JISC http://www.jisc.ac.uk/whatwedo/campaigns/res3/video.aspx

  47. “Digital technologies are also facilitating collaboration between researchers; which means knowledge can be shared more quickly and more effectively, and the potential for progress is massively advanced.” JISC report http://www.jisc.ac.uk/whatwedo/campaigns/res3/video.aspx

  48. eResearch: data is easily shareable Sloan Digital Sky Server/SkyServer http://cas.sdss.org/dr5/en/
