1 / 21

Science-driven informatics for pcori pprn

Science-driven informatics for pcori pprn. Kristen Anton UNC Chapel Hill/ White River Computing Dan Crichton White River Computing February 3, 2014. From Information Design , Nathan Shedroff. White River Computing / UNC Chapel Hill. Architecture – what is it?. Architecture:.

anevay
Télécharger la présentation

Science-driven informatics for pcori pprn

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Science-driven informaticsfor pcoripprn Kristen Anton UNC Chapel Hill/ White River Computing Dan Crichton White River Computing February 3, 2014

  2. From Information Design, Nathan Shedroff White River Computing / UNC Chapel Hill

  3. Architecture – what is it? Architecture: • The fundamental organization of a system embodied in its components, their relationships to each other and to the environment, and the principles guiding its design and evolution (ANSI/IEEE Std. 1471-2000) Architecture is decomposed into four core pieces : • Process Architecture – describes the core processes for the system • Data Architecture – describes the information models and data standards for the system • Application Architecture – Portals, tools, etc. • Technology Architecture – Infrastructure elements White River Computing / UNC Chapel Hill

  4. Architecture Development Approach • Identify the drivers and requirements • Create an architectural description of the system – identified stakeholders, concerns and associated models • Identify core architectural principals • Separate the architecture into key viewpoints • Create a decomposition of the system identifying the elements and mapping to the requirements • Identified the high-level flows and analyze from the rpocess, information and application/technology perspectives • Generate the architectural models White River Computing / UNC Chapel Hill

  5. The view is what you see The viewpoint is where you look from Communicating an Architecture • One of the major challenges is communicating an architecture • Who are the PCORI stakeholders that care about the architecture? • How do we communicate their care-abouts? • Determine a useful view of the system for the stakeholder • Projects have suffered because a useful view wasn’t provided (Stakeholders) White River Computing / UNC Chapel Hill

  6. Software Development • The organization, implementation and deployment of the software should follow the identification of an architecture which aligns with the principles and needs of the stakeholders • The separation of the architecture into concerns will let us determine what capabilities exist and what capabilities need to be developed • Ultimately this will help to ensure that a system is deployed which will integrate White River Computing / UNC Chapel Hill

  7. Recommended Software Development Approach Project Formulation Jan 2014 – Mar 2014 Project Organization, Objectives, High Level Schedule and Project Plan System Formulation/ Architecture High-Level Architecture for System and Data, Architecture, Data Flows, Initial Data Structure, etc Feb 2014 – June 2014 June 2014 – June 2015 Site Development Development and deployment of the infrastructure and architecture; development of the core data model/ consistent with PCORnet “universal” data model? White River Computing / UNC Chapel Hill

  8. Supporting science-driven research needs: Case Study – Early Detection Research Network (EDRN) • Research network of collaborating scientists from more than 40 institutions – international network of networks • Focus on identifying and validating biomarkers of cancer at early stage/ preclinical Bioinformatics challenges in EDRN:Developing computing infrastructure that is “biomarker-centric.” Improve research capability by enabling real-time access to a variety of information that crosses institutional boundaries. White River Computing / UNC Chapel Hill

  9. Bioinformatics – Goals Supporting science-driven research needs • Coordinated discovery and validation of biomarkers across cancer research centers to increase accuracy of the results of studies • Accommodating various data types • Facilitation of analytics through data integration and single-point access • Support workflows associated with various types of information • Encouraging and supporting collaboration White River Computing / UNC Chapel Hill

  10. Bioinformatics – Goals Supporting science-driven research needs • Linking highly diverse systems together to integrate and present data for analytics • Defining a comprehensive information model for describing the problem space/ ontology • Providing software interfaces for capture, discovery, and access of data resources • Providing a secure transfer and distribution infrastructure • Enabling all data sources to be heterogeneous and distributed • Providing integrated portal for access to distributed data • Providing bioinformatics tools/ pipelines for uniform data processing White River Computing / UNC Chapel Hill

  11. Bioinformatics EDRN Knowledge Environment Functional architecture: Services • Data capture • Data discovery • Data access • Data retrieval • Data processing • Data distribution White River Computing / UNC Chapel Hill

  12. Bioinformatics EDRN Knowledge Environment Information architecture: Data Model across EDRN projects (“universal” data model) • Representation of information associated with data objects managed within the knowledge system • Models for: • Biomarkers • Studies • Participants • Organs • Data generated from instruments (e.g. mass spec, arrays) White River Computing / UNC Chapel Hill

  13. Bioinformatics EDRN Knowledge Environment Information architecture: Data Model • Relationships between and among objects • Standard set of metadata elements that can be used for annotating objects • Multiple metadata schemata for machine usable explanations of the metadata descriptions • Metadata descriptions describe the inception and composition of data • Common language for describing data and associated attributes: Common Data Elements (CDEs) • CDE has a Uniform Resource Identifier (URI) – URL form points to CDE definition page – used in XML standards White River Computing / UNC Chapel Hill

  14. EDRN Knowledge Environment Public Portal BIOINFORMATICS TOOLS EDRN science data results (protocol_id, participant_id) (protocol_id, participant_id) eCAS Science Warehouse ERNE Participant DB Participants and their characteristics Protocol DB EDRN science data results (local, distributed and varying degrees of validation) Distributed Specimen Databases Protocols and their descriptions (protocol_id, participant_id) VSIMS CDE Repository (protocol_id) Biomarker_DB • Descriptions of EDRN studies • Participants • Specimen tracking, etc Descriptions of biomarkers and their use (protocol_id) Data elements and their descriptions

  15. EDRN Knowledge Environment Success? • Biomarker Database holds 850 curated biomarkers, including panels/ signatures of biomarkers • Biomarker Database modeled to reflect the data model: activity in multiple organs, protocols, data files – facilitate single-point data access • eSIS contains 165 protocols • eCAS holds 56 data sets, with many files in each set, and more added daily – standard metadata around each set and each product • Two bioinformatics tools implemented: Proteomics “pipeline” (generating standardized biomarker identification files); REDCap(standardized data definition and capture at the project level) – additional in progress • Common Data Elements (CDEs) contributed to the NCI repository • CDE has a Uniform Resource Identifier (URI) – URL form points to CDE definition page – used in XML standards • Portal facilitates authorized access to almost 200,000 specimens • Publications and Resources White River Computing / UNC Chapel Hill

  16. EDRN Knowledge Environment Technology • Iterative development • Open Source philosophy and tools • Apache OODT (Object Oriented Data Technology) Software components developed independent of any data model: EDRN’s computing infrastructure can be replicated White River Computing / UNC Chapel Hill

  17. EDRN Knowledge Environment Technology White River Computing / UNC Chapel Hill

  18. Bioinformatics – Goals Supporting science-driven research needs: SHARE White River Computing / UNC Chapel Hill

  19. Bioinformatics – Goals Supporting science-driven research needs: SHARE Geisel School of Medicine at Dartmouth / UNC Chapel Hill

  20. Supporting science-driven research needs: PCORI PPRN Geisel School of Medicine at Dartmouth / UNC Chapel Hill

  21. Opportunity to offer our architecture to PCORnet? Synergy in data model Query across CCFA PPRN network …network of networks? Geisel School of Medicine at Dartmouth / UNC Chapel Hill

More Related