GSK's Evolving Approach to Clinical Data Standards and SHARE Implementation

SHARE ESUG Teleconference on 22-Mar-2011

What am I going to cover? • GSK’s current approach to standards and the need for change • Our plans / ongoing work and the similarity to CDISC SHARE • Information Model Technicalities (will probably skip) • Making the information model real … • SHARE content versus GSK content • What do you have to do in order to gain maximal benefits from SHARE? • Flexibility in practice • Creating an eCRF • Slide pack on BRIDG and ISO21090 [included in the slide pack but will not be covered]

GSK’s Current Approach to standards and the need for change

What is the current GSK approach? • Current approach to standards is based on standard dataset definitions which combine terminology, rules and structure • The standards processes are managed through a Lotus Notes database solution and are made available to teams through multiple electronic solutions (an in-house Dataset Manager tool, an in-house study specification tool, InForm libraries etc) • Standards are available at both the global (core – all therapy areas) and therapy area levels. Some standards have been defined at indication level within the therapy area standards • We align standard objects (CRFs, data extraction programs, statistical displays, algorithms etc) to standard dataset definitions – a general rule is one eCRF module/page per dataset • Lots of documentation, but not integrated with the standards • Study teams are required to apply for changes or exemptions when they need to do something different for captured data

Some of our current issues • GSK standards, based on SDS 2.1 (the predecessor to SDTM), have limitations • duplicate variables and datasets • ambiguity (what is this? how am I meant to use this?) • different datasets employ different structures … hard to become familiar • Data Management and Stats want different data structures in order to do their work • little opportunity for automation • hard to aggregate and reuse data other than the core standards (AEs, labs, vitals etc) • Lots of problems mapping our standards to SDTM • extra variables which don’t fit the domains • multiple different uses for an individual variable (some subtle differences but others not so subtle) • SDTM seen as an add-on deliverable … we don’t want to build our standards and tools around it • not an operational standard • doesn’t fit with our current complex toolset • doesn’t seem to fit with ADaM or our reporting process/macros • doesn’t do much to help with data aggregation • Standards too tied to our toolset • hard to automate across the study process • painful whenever a tool is replaced

Drivers for change • Regulatory requirements for clinical data are changing • new FDA requirements (i.e. CDISC) on their way • uncertainty about the future (e.g. HL7 v3) • Need to be able to share data more easily with development partners • We need more flexibility in using standards (from the study & project team perspective) whilst maintaining/increasing the benefits of standardisation • Want to minimise the effort associated with transforming data to standards, or using more than one standard • Need a less complex clinical computing environment/toolset • Need to be able to do more work with fewer resources • Currently replacing most of our clinical trial toolset … if we are going to change our standards, we have to change them now

GSK Long-term vision Regulatory/legal/public mandate: • GSK is well prepared to provide regulators and others with the data they require, in the format required • always able to respond to regulatory queries quickly Operational efficiency: • increase operational efficiency through the implementation of a metadata driven approach • provide study teams with the flexibility to capture and process the data in an optimal way (study teams to have the ability to decide on structure and grouping of their data) • variables much more clearly defined: less ambiguity, less confusion Data Reuse: • ability to combine and analyse data across studies, indications and more broadly, with little effort Traceability: • ability to trace all the way back from a result in a clinical report (e.g. a mean value or a p-value) to the value that was first entered in the CRF/eCRF … with an understanding, at each step, of what data/variables were used and what algorithms were applied

Our Plans / Ongoing work and the similarity to CDISC SHARE

So what are we doing? • Long term, we want to use SHARE content • Cannot wait for SHARE before changing our standards as we’re replacing systems now • Developed an Information Model which all our standards will follow, together with an implementation plan for this • standards being developed independently of our systems • new systems built to work with / take advantage of the new standards • Critically, our information model is based on the same industry standards as the SHARE information model

So what are we doing? • Metadata driven approach to developing, executing and reporting clinical trials • eProtocol tool • metadata repository • many systems consuming the metadata: eCRF tool, reporting tools … • Metadata Repository • structured based on our information model • houses all the clinical data definitions • houses operational metadata (information needed to create eCRFs, datasets, SDTM datasets etc)

Information Model Technicalities

Information Model Details • The information model is a combination of three industry standards: • the BRIDG model (a collaborative piece of work between CDISC, HL7, FDA and the US National Cancer Institute (NCI)) • the ISO21090 datatype standard (applicable across Healthcare, not just regulated clinical research) … very similar to the HL7 abstract datatypes • the ISO11179 metadata registry standard

Simple explanation of these 3 standards • BRIDG is a standard way of representing the world of clinical research: it doesn’t take us right down to variables, but it does take us down to meaningful objects such as “anatomic location”, “result”, “date” etc • ISO21090 datatypes are a standard way of representing particular types of data: these take us from the BRIDG meaningful objects such as “result” to individual variables like “value”, “unit”, “code” • The link between BRIDG and ISO21090 is that all the BRIDG meaningful objects have an ISO21090 datatype • ISO11179 is a standard way of recording metadata in a metadata registry: we want to be compliant with it, but it isn’t something that operational folk need to understand or worry about

Sources of Information • BRIDG site: http://www.bridgmodel.org/ (we are using 3.0.3) • ISO21090 standard: http://gforge.hl7.org/svn/hl7v3/trunk/dt/iso/index.htm (logon with username= anonymous and blank password) … the 2011 published version is on the ISO website • Enterprise Architect is the modelling software used by BRIDG. Here is a link to a free viewer: http://www.sparxsystems.com/bin/EALite.exe • I have included a simple to understand slide set on BRIDG and ISO21090 (15 easy slides) at the bottom of this slide pack for those who want to understand more

Making the information model real …

What does this Information Model approach give us? • A well developed modelling of clinical research … there shouldn’t be anything missing • so we model clinical data in a consistent and formalised manner • A templated approach to the development of our standards • we end up selecting variables from a short list rather than manually creating them

And the usual inevitable downsides? • BRIDG model is complicated • but this is because clinical research and clinical data are complicated • use of a templated approach to implementation removes much of the complexity • you do need to train people (as always) • you need to take advantage of the capabilities to reap the biggest benefits • ISO21090 datatype standard has been accused of being too complicated • without tools to help you, I’m sure that is true • but it is the complexity that allows the development of a templated approach to standards creation • you need to train people … but mainly with regards to choices they have to make

So what does content look like? [Diagram: BRIDG based modelling of the clinical data. Concepts are linked by BRIDG based associations; the association wording, shown in blue on the slide, describes things from the bottom up] • Blood Specimen Collection: Fasting status indicator value = true; Date Range low value = 23-Apr-2010 • Blood Specimen (is a result of Blood Specimen Collection): Accession Number Text value = 01876288485; Condition Code item code = CC51, display name value = haemolysed • Haemoglobin Test (is a test performed on Blood Specimen): Category Code code = HAEM, display name value = Haematology • Haemoglobin Result (is a result of Haemoglobin Test): Result value = 151, unit = g/L

So what does content look like? [Same diagram, annotated with three layers: concept attributes from BRIDG; ISO21090 decomposition: “pre-variable attributes”; ISO21090 decomposition: variables (shown with example values)] • Blood Specimen Collection: Fasting status indicator value = true; Date Range low value = 23-Apr-2010 • Blood Specimen (is a result of Blood Specimen Collection): Accession Number Text value = 01876288485; Condition Code item code = CC51, display name value = haemolysed • Haemoglobin Test (is a test performed on Blood Specimen): Category Code code = HAEM, display name value = Haematology • Haemoglobin Result (is a result of Haemoglobin Test): Result value = 151, unit = g/L

What we get from the metadata • Concepts – clear definitions of clinical information (e.g. height, systolic blood pressure, weight result) • Associations – how the concepts connect together, rules for the use of concepts • BRIDG attributes – meaningful attributes for a piece of clinical data (e.g. method, date, anatomic site, result) … some may have codelists • ISO21090 decomposition: “pre-variable attributes” – various levels of clumping of variables; some may have codelists • Variables – clear, model based, unambiguous variables
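
The layers listed above can be pictured as a small data structure. The following is a minimal Python sketch, not the actual GSK or SHARE schema: the class names (`Attribute`, `Concept`, `Association`) and the `flatten` helper are hypothetical, and the example values are taken from the haemoglobin slide.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    """A BRIDG-style attribute decomposed into variables (hypothetical shape)."""
    name: str          # e.g. "Result"
    variables: dict    # variable name -> example value


@dataclass
class Concept:
    """A clinical concept together with its attributes."""
    name: str
    attributes: list = field(default_factory=list)


@dataclass
class Association:
    """A link between two concepts, read from source to target."""
    source: str
    wording: str
    target: str


def flatten(concept):
    """Return variables keyed as concept.attribute.variable, so each name is unambiguous."""
    return {
        f"{concept.name}.{attr.name}.{var}": value
        for attr in concept.attributes
        for var, value in attr.variables.items()
    }


result = Concept("Haemoglobin Result",
                 [Attribute("Result", {"value": 151, "unit": "g/L"})])
test = Concept("Haemoglobin Test",
               [Attribute("Category Code", {"code": "HAEM", "display name": "Haematology"})])
link = Association(result.name, "is a result of", test.name)

print(flatten(result))
print(f"{link.source} {link.wording} {link.target}")
```

Keying each variable by its concept and attribute is one way to read "clear, model based, unambiguous variables": the same `value` variable means something different under each concept, and the full path records which.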

Steps needed to create that information? • Choose which clinical scenario template we need (in this case, one containing specimen, lab test & lab result) • Enter information about each concept (a name, a description, a definition …) • Choose which of the BRIDG attributes we will need • Choose which associations are needed • Choose which bits of the ISO21090 decomposition we need • Enter the name of codelists when prompted (and select the set of codes in that codelist that you want to make available for this concept)

SHARE content versus GSK content

What we do expect from SHARE • We expect SHARE to provide us with these model based definitions (the concepts, concept attributes and decomposition together with the associations between concepts and the terminology) • We expect SHARE to provide us with the information needed to represent these definitions in the form of SDTM domains • There will be a SHARE metadata repository • GSK expect to import all the SHARE metadata into the GSK metadata repository

What we don’t expect from SHARE • We do not expect SHARE to provide us with all the rules that GSK will want to apply • We do not expect SHARE to provide us with all the operational metadata we need to create study objects (GSK datasets, GSK eCRFs) • GSK expect to add additional metadata to the GSK repository … we want to augment the SHARE content, not change it

Choices • Just use the SHARE variables and forget about the rest of the metadata • you get consistent industry standard variables • you can keep your own processes • but you may not use the variables in such a way that you can aggregate your data with that of others • you miss out on the additional benefits • Use the SHARE metadata to the full and augment with additional company metadata [the GSK approach] • you get all the benefits of using the SHARE metadata • you get additional capability to automate downstream processes

Creating a GSK standard using SHARE content • Rules … • define which variables are mandatory, optional, conditional in a study specification • define the conditionality rules e.g. either have to include variables for total daily dose/dose units or dose/dose unit and frequency • define which variables have to be populated if used in a study • (in fact, we may apply rules to associations, BRIDG based attributes, “pre-variable attributes” and codes as well as to variables)
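
The dose conditionality rule above could be checked against a study specification along the following lines. This is a sketch under assumptions: the variable names and the `check_dose_rule` function are hypothetical, and the two options are treated as mutually exclusive (the "but not both" wording appears on the later "Creating a smart eCRF" slide).

```python
# Hypothetical variable names for the two ways of recording dose.
TOTAL_DAILY = frozenset({"total_daily_dose", "total_daily_dose_unit"})
SINGLE_DOSE = frozenset({"dose", "dose_unit", "dose_frequency"})


def check_dose_rule(selected):
    """Return a list of problems with the dose variables selected for a study."""
    selected = set(selected)
    uses_total = bool(selected & TOTAL_DAILY)
    uses_single = bool(selected & SINGLE_DOSE)
    problems = []
    if uses_total and uses_single:
        problems.append("choose total daily dose or single dose + frequency, not both")
    elif not uses_total and not uses_single:
        problems.append("no dose variables selected")
    elif uses_total and not TOTAL_DAILY <= selected:
        problems.append("total daily dose option is incomplete")
    elif uses_single and not SINGLE_DOSE <= selected:
        problems.append("single dose option is incomplete")
    return problems


print(check_dose_rule({"total_daily_dose", "total_daily_dose_unit"}))  # []
print(check_dose_rule({"dose", "dose_unit"}))
```

Returning a list of problems rather than raising keeps the check usable in a specification tool that wants to show every issue at once.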

Rules example: Subject Disposition • Tick this … and you MAY tick none, one or many of these • Tick this … and you MUST enter text here • If the study includes pre-specified subreasons, an “other specify” subreason MUST be included and, if ticked, MUST be populated • If the study does not include subreasons, the “specify” MUST be included and populated • We should not expect SHARE to deliver these company specific rules
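
The two "specify" rules for subject disposition can be expressed as a small check. The sketch below is illustrative only: the function name, its arguments and the example subreason "adverse event" are hypothetical, and it covers only the rules written out on the slide.

```python
OTHER = "other specify"


def check_disposition_entry(study_subreasons, ticked, specify_text):
    """Check one subject's disposition entry against the slide's rules.

    study_subreasons: subreasons pre-specified for the study (may be empty)
    ticked:           subreasons ticked for this subject (none is allowed)
    specify_text:     free text entered in the "specify" field
    """
    problems = []
    text_given = bool(specify_text and specify_text.strip())
    if study_subreasons:
        if OTHER not in study_subreasons:
            problems.append('study must include an "other specify" subreason')
        if OTHER in ticked and not text_given:
            problems.append('"other specify" is ticked but has no text')
    elif not text_given:
        problems.append('"specify" text must be populated')
    return problems


print(check_disposition_entry(["adverse event", OTHER], [], ""))  # []
print(check_disposition_entry([], [], ""))
```

The first branch mixes a study-level check (is the subreason included?) with a data-level check (is it populated?); a real implementation would probably run these at different points in the process.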

What extra metadata would we add? • Mappings from other standards to concepts & concept variables • legacy data • development partners • Mappings from SHARE terminology to GSK terminology and vice versa (mapping codes) • we want to use SHARE terminology as much as we can but there are always going to be cases where, for some reason, we need to deviate

Central role of concept metadata [Diagram: Concept Definitions sit at the centre] • Inputs, each with a mapping to concepts: Non-GSK metadata; GSK legacy metadata • Outputs: render as an eCRF (eCRF); render in registry form (Registry); render as SDTM (SDTM)

Central role of concept aligned data [Diagram: Concept aligned data sits at the centre] • Inputs, each mapped using metadata: Non-GSK data; GSK legacy data • Outputs: represent as SDTM dataset (SDTM); represent as registry format dataset (Registry); Aggregations (“Aggregate anything”)

So what operational metadata would we add? • Metadata needed to render the definitions in a particular form e.g. an eCRF, a GSK dataset • length and precision for variables • whether a coded field should be represented as a drop down box or a radio button • and more • A study specification

Setting Up Studies • For each study, we will produce a fully detailed study specification • We will be doing this using the BRIDG modelling • key to taking full advantage of the concept metadata • This will be done at a fully detailed level • including which variables will be collected at which visits/timepoints • including which set of codes are available for use at that visit/timepoint (when codelisted) • all the inherent structure of the metadata will be utilised to the full

Setting Up Studies • Benefits of utilising the BRIDG trial design modelling • the study time and events are modelled using study design concepts (visits, timepoints, cycles, arms, epochs, treatment strategies & elements) and data collection concepts, which makes for a fully integrated approach • BRIDG modelling provides metadata/data driven navigation capability, guiding study investigators through sometimes very complex study procedures • We can use the richness of the metadata included in the study specification to help with the creation of operational objects

SDTM • We expect to get totally consistent SDTM “for free” • concepts are associated with SDTM domains • concept variables are generated from BRIDG and ISO21090 • we expect there to be a mapping from BRIDG attributes/ISO decomposition to SDTM variables • We expect to standardise/eliminate the inherent SDTM wiggle-room through this process
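
As an illustration of the expected mapping from BRIDG attributes/ISO decomposition to SDTM variables, the sketch below renders the haemoglobin example as part of an LB (laboratory) record. The variable names `LBCAT`, `LBORRES`, `LBORRESU` and `LBFAST` are SDTM LB variables, but the mapping table itself, the function names and the decision to carry values over unchanged are assumptions made for illustration, not something the presentation specifies.

```python
# Hypothetical mapping: (concept, BRIDG attribute, variable) -> SDTM LB variable.
CONCEPT_TO_SDTM = {
    ("Haemoglobin Test", "Category Code", "display name"): "LBCAT",
    ("Haemoglobin Result", "Result", "value"): "LBORRES",
    ("Haemoglobin Result", "Result", "unit"): "LBORRESU",
    ("Blood Specimen Collection", "Fasting status indicator", "value"): "LBFAST",
}


def to_sdtm_value(value):
    """Render a value as text: booleans become Y/N, everything else a string."""
    if isinstance(value, bool):
        return "Y" if value else "N"
    return str(value)


def build_lb_record(concept_data):
    """Build a partial LB record from concept aligned data using the mapping."""
    return {
        CONCEPT_TO_SDTM[key]: to_sdtm_value(value)
        for key, value in concept_data.items()
        if key in CONCEPT_TO_SDTM
    }


concept_data = {
    ("Blood Specimen Collection", "Fasting status indicator", "value"): True,
    ("Haemoglobin Test", "Category Code", "display name"): "Haematology",
    ("Haemoglobin Result", "Result", "value"): 151,
    ("Haemoglobin Result", "Result", "unit"): "g/L",
}
print(build_lb_record(concept_data))
```

Because the mapping is metadata rather than per-study code, every study that uses the same concepts would produce the same SDTM variables, which is one reading of getting consistent SDTM "for free".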

What do you have to do in order to gain maximal benefits from SHARE?

Important Actions • Always maintain a link back from operational objects to the SHARE definitions • Use the SHARE objects right from the design stage of a study • Augment the SHARE metadata with company specific metadata, for example • rules (e.g. use this object or that object but not both) • additional metadata to permit automation of eCRF screens (somewhat tool dependent)

Flexibility in Practice

System independent standards which are not tied to specific objects (e.g. dataset) • This is the GSK standard for a dataset … [figure] • Any variation requires an exemption or a new standard • In the new standards each coloured block is a “standard” or “building block” and they can be combined in different ways to make objects (e.g. datasets)

Flexibility In Practice – Dataset Content • An AE eCRF screen may look like this … [figure] • With the new standards it can also look like this … [figure] • There will still be standard objects (e.g. datasets) to provide the benefits of standardisation but also more flexibility (fewer exemptions required)

Flexibility In Practice – Datasets • An existing GSK dataset may look like this … [figure] • With the new standards the same data can also look like this … Or this … Or this … Or this … [figures] • It all comes from the same building blocks (no exemptions required)

Flexibility In Practice – Transforming Non-Standard Data [Diagram items: CDISC SDTM datasets, GSK standard, Partner standard, CDISC ADaM datasets, Vendor X, GSK Operational datasets] • Without building blocks … 9 mappings required • With building blocks … 6 mappings required • In-licensed Compound … 1 new mapping • New regulatory requirement … 1 new mapping
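
The mapping counts on this slide follow from simple arithmetic. The sketch below assumes, since the transcript does not say which items are inputs and which are outputs, that GSK standard, Partner standard and Vendor X are three sources and that SDTM, ADaM and GSK Operational datasets are three targets; the function names are hypothetical.

```python
def point_to_point(sources, targets):
    """Every source needs its own mapping to every target."""
    return len(sources) * len(targets)


def via_building_blocks(sources, targets):
    """Every source and every target needs one mapping to the shared building blocks."""
    return len(sources) + len(targets)


# Assumed split of the six items shown on the slide.
sources = ["GSK standard", "Partner standard", "Vendor X"]
targets = ["CDISC SDTM datasets", "CDISC ADaM datasets", "GSK Operational datasets"]

print(point_to_point(sources, targets))       # 9
print(via_building_blocks(sources, targets))  # 6

# Adding an in-licensed compound as a new source costs one mapping with
# building blocks, but one per target without them.
with_new = sources + ["In-licensed Compound"]
print(via_building_blocks(with_new, targets) - via_building_blocks(sources, targets))  # 1
print(point_to_point(with_new, targets) - point_to_point(sources, targets))            # 3
```

Under this assumption the counts match the slide (9 versus 6), and the gap widens with every source or target added.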

Creating an eCRF

Creating a smart eCRF • SHARE will provide metadata about clinical information • SHARE will provide multiple levels of clumping of objects e.g. • value and unit • test and test result • albumin test is done using serum specimen • Your company will add additional metadata to create company-specific standard combinations of the SHARE content e.g. • either total daily dose object will be used or single dose object + dose frequency object will be used (but not both) • Your company will add additional metadata to indicate whether repeat values are allowed • only one primary reason for discontinuation is allowed (and must be provided) but multiple sub-reasons are permitted (and it is OK not to choose any) • Your company will add additional metadata and/or define rules to facilitate the automation of eCRF creation e.g. • represent this codelist as a radio button if it has fewer than 6 possible values and as a drop-down if it has 6 or more possible values • Some metadata will need to be created at a study level e.g. • is this a collected field or a hard coded field
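
The codelist rendering rule above is simple enough to state as a function. This is a minimal sketch: the function name, the `threshold` parameter and the example codelist are hypothetical, and a real eCRF tool would have its own way of expressing such a rule.

```python
def choose_widget(codelist, threshold=6):
    """Pick a control for a coded field: radio buttons below the threshold, otherwise a drop-down."""
    return "radio" if len(codelist) < threshold else "dropdown"


print(choose_widget(["Y", "N", "U"]))                     # radio
print(choose_widget(["A", "B", "C", "D", "E", "F"]))      # dropdown
```

Keeping the threshold as a parameter reflects the slide's point that this is company-level metadata: a different company, or a different tool, could pick a different cut-off without touching the SHARE content.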

Creating a smart eCRF • Two component parts • creating individual pages … need metadata to: • differentiate between hard coded information and collected information [study level metadata] • drive pop-ups (e.g. pregnancy test details if subject is female) [company and/or study level metadata] • allow repeat fields (e.g. medical history) [study may deviate from company level rule] • rules (get investigator to confirm values that are outside certain limits) [company and/or study level metadata] • navigation through the complete eCRF • general flow • exceptional flows e.g. if a particular event occurs, additional tests/visits necessitated [BRIDG contains functionality to record this as computable metadata]
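
Two of the page-level behaviours above, pop-ups and confirmation of out-of-limit values, can be sketched as follows. All names here (`visible_fields`, `needs_confirmation`, the field names and the numeric limits) are hypothetical and chosen only to make the example concrete.

```python
def visible_fields(subject, base_fields):
    """Add metadata-driven pop-up fields; here, pregnancy test details for female subjects."""
    fields = list(base_fields)
    if subject.get("sex") == "F":
        fields.append("pregnancy_test_details")
    return fields


def needs_confirmation(value, low, high):
    """True when a value falls outside the limits, so the investigator should confirm it."""
    return not (low <= value <= high)


print(visible_fields({"sex": "F"}, ["visit_date"]))
print(needs_confirmation(151, low=115, high=165))  # False
print(needs_confirmation(40, low=115, high=165))   # True
```

In a metadata driven approach the condition (`sex == "F"`) and the limits would themselves be stored as company or study level metadata rather than written into code as they are here.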

Will not cover the following slides during the training. They are for people to view after the meeting.

Two industry standards: BRIDG and ISO21090 • A simple explanation of what these are and what they provide

Information Model? • An information model is a combination of structure and nomenclature • modelling the structure of data • employing a set of terms to describe the objects • A good information model will ensure that nothing is glossed over and that similar things will be described in a similar manner

GSK’s rationale for using BRIDG and ISO21090 • We developed GSK standards with no underlying information model • these have the right content (the info we need in GSK’s clinical trials) • but consistency of approach, avoidance of duplication and ambiguity is not as good as we would like • In 2009 we started to develop an information model based approach to representing GSK’s clinical trial standards, in order to gain bigger benefits from standardisation • our original intention was not to implement BRIDG, but rather to use it as a tool … to guide us • we ran into various issues requiring solutions … some of these we addressed using our own solutions • at year end, we came to recognise that within BRIDG lies all the functionality we need to provide solutions to all our issues • in January 2010, we took the decision to implement BRIDG and an ISO datatype standard as we felt this is the optimal approach • using these we can address all our issues • and, we can develop a solution that will be at least similar to that of SHARE • and we will be using standards employed in the healthcare world

CDISC SHARE Project • In the early days of the SHARE project, it was agreed that SHARE would use the BRIDG model, the ISO21090 datatype standard and the ISO11179 metadata registry standard as its information model • Although SHARE could decide to implement these differently from GSK, currently the GSK and SHARE information models are very similar

BRIDG • An information model • Targeted at protocol driven research • Reasonably mature • Key collaborators: CDISC, HL7, NCI, FDA
