REMBRANDT Empowering Translational Research…

REpository of Molecular BRAin Neoplasia DaTa (REMBRANDT) • HL7 Clinical Genomics SIG • Atlanta, September ’04

Agenda • Translational Research – Why do we care? • GMDI – How we got here? • Conceptual Model • Gene Expression Use Case analysis • Gene Expression Data analysis • Wire Frames • System Architecture • Object Model • Data warehouse design

Translational Research – Why do we care? • Iressa Drug Case Study (at Harvard Medical School) • Targeted towards lung cancer • Phase II trial – A minority of patients showed dramatic tumor shrinkage • Phase III randomized trial – No survival improvement • Patients with mutations in Iressa’s target, EGFR, responded to the drug • The future of pharmacogenomics depends on translational research • Reference: Clinical Pharmacogenomics: Almost a reality; Modern Drug Discovery, August 2004

Scientific goals of GMDI • Develop a molecular classification schema that is both clinically and biologically meaningful, based on gene expression and genomic data from tumors (gliomas) of patients who will be followed prospectively through the natural history and treatment phases of their illness

[Diagram: Expression array data, SNP array data, Clinical data, Proteomics data → Rembrandt Knowledgebase → Better understanding → Better treatments]

REMBRANDT Project Goals • Produce a national molecular/genetic/clinical database of several thousand primary brain tumors that is fully open and accessible to all investigators (including intramural and extramural) • Provide informatics support to molecularly characterize a large number of adult and pediatric primary brain tumors and to correlate those data with extensive retrospective and prospective clinical data

Functional genomics data in the knowledge-base • DNA (Copy No., LOH): 100K SNP array, ArrayCGH • RNA (Gene Expression Analysis): Affy/Oligo Arrays, cDNA/GenePix Arrays, Real time RT-PCR, Tissue Arrays for ISH • Protein: Tissue Arrays (IHC), Proteomics (Mass Spec)

Conceptual Model [Diagram; labels in extraction order: Demographics, Prior_Therapy, Survival, Outcome, Time course, C3D, Patient, Trial, Pathology, User Input, Sample, CaCore, Expr_Expt, CGH_Expt, SNP_Expt, Change_Status, Map_Location, caArray, Abnorm_Status, Gene, BAC_ID, SNP, E-value, Abnorm_Status, Call]

REMBRANDT will Leverage NCICB and caBIG Infrastructure Components • Aligns with caBIG principles: • Open source • Open access • Syntactic and semantic interoperability • Federated data • NCICB Infrastructure • caArray gene expression data repositories and analysis tools • Cancer Genome Anatomy Project (CGAP) genomic tools • C3D Clinical Informatics System • caCORE Infrastructure (caBIO, EVS, caDSR) • caBIG Infrastructure being delivered by caBIG workspaces

Typical Rembrandt Search • Show me the tumor samples that have amplification and over-expression of the genes EGFR and Cyclin D1. • Restrict the search to cases with • amplification confirmed by SNP Chip and CGH, • and over-expression confirmed by Oligo and cDNA Arrays • Presentation of Results • Which genes are under-expressed with respect to normal? • Does this subset of tumors have better survival? • Do they segregate to a certain age group, geographical area or ethnicity?
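
The "confirmed by" restriction above can be read as a set intersection across platforms. A minimal sketch under that reading follows; the sample IDs, the per-platform dictionaries, and the `confirmed_samples` helper are hypothetical and are not part of the Rembrandt code base.

```python
def confirmed_samples(calls_by_platform, required_platforms):
    """Return the samples whose call appears on every required platform."""
    if not required_platforms:
        raise ValueError("at least one platform is required")
    sample_sets = [set(calls_by_platform[p]) for p in required_platforms]
    return set.intersection(*sample_sets)

# Hypothetical per-platform hits for one gene (illustrative sample IDs only).
amplified = {"SNP": {"S1", "S2", "S3"}, "CGH": {"S2", "S3"}}
overexpressed = {"Oligo": {"S2", "S3", "S4"}, "cDNA": {"S3"}}

# Amplification confirmed by SNP Chip and CGH, and
# over-expression confirmed by Oligo and cDNA arrays.
hits = (confirmed_samples(amplified, ["SNP", "CGH"])
        & confirmed_samples(overexpressed, ["Oligo", "cDNA"]))
```

Here only sample `S3` satisfies all four conditions; the follow-up questions (survival, age group, and so on) would then be asked of that subset.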

True Measure of Translational Research • To present all down-regulated genes within each sample in the result set, we have to pivot the result set on its Gene Expression axis. • All translational queries should make it easy to pivot between: • Disease View • Patient / Sample View • Experiment / Annotations View • Time Course View
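
As an illustration of pivoting, the sketch below regroups one flat result set into a sample view and a gene view. The row layout, the gene and sample names, and the `pivot` helper are assumptions made for this example only.

```python
from collections import defaultdict

# Hypothetical flat result set: (sample_id, gene_symbol, regulation call).
rows = [
    ("S1", "EGFR", "UP"),
    ("S1", "PTEN", "DOWN"),
    ("S2", "EGFR", "UP"),
    ("S2", "PTEN", "DOWN"),
    ("S2", "TP53", "DOWN"),
]

def pivot(rows, key_index, value_index, regulation=None):
    """Group one column of the result set by another, optionally filtering on the call."""
    view = defaultdict(list)
    for row in rows:
        if regulation is None or row[2] == regulation:
            view[row[key_index]].append(row[value_index])
    return dict(view)

# Sample view: down-regulated genes within each sample.
down_by_sample = pivot(rows, key_index=0, value_index=1, regulation="DOWN")
# Gene view: the same rows pivoted on the gene expression axis.
samples_by_down_gene = pivot(rows, key_index=1, value_index=0, regulation="DOWN")
```

Both views come from the same rows; only the grouping key changes, which is the property the slide asks of every translational query.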

High-level Search Use cases

Gene Expression Search Use cases

Gene Expression data analysis • Binary chp files from GCOS • Convert chp files to txt files using GDAC SDK • Calculate ratio of individual tumor intensity to Normal pool • Calculate ratio of average intensities between tumor pool and Normal pool • Calculate statistical significance for comparison, using Permax R module
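
The two ratio steps can be sketched as follows. This assumes the Normal pool is summarized by its arithmetic mean and that intensities have already been extracted to plain numbers; the function names and values are hypothetical. The chp conversion and the Permax significance step are not reproduced here.

```python
from statistics import mean

def individual_ratios(tumor_intensities, normal_pool):
    """Ratio of each tumor sample's intensity to the mean of the normal pool."""
    normal_mean = mean(normal_pool)
    return {sample: value / normal_mean
            for sample, value in tumor_intensities.items()}

def pooled_ratio(tumor_intensities, normal_pool):
    """Ratio of the average tumor-pool intensity to the average normal-pool intensity."""
    return mean(tumor_intensities.values()) / mean(normal_pool)

# Hypothetical intensities for a single probe set.
tumor = {"T1": 400.0, "T2": 800.0}
normal = [100.0, 300.0]

per_sample = individual_ratios(tumor, normal)
overall = pooled_ratio(tumor, normal)
```

With these illustrative numbers the normal-pool mean is 200, so the per-sample ratios are 2.0 and 4.0 and the pooled ratio is 3.0.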

cDNA data handling • Technical replicates: compute the Pearson correlation between one spot across all arrays and another spot for the same clone across all arrays • Is correlation > 0.7? • Yes: for each array, calculate the average of the expression measurements • No: an “inconsistent” call is made and no e-value is computed for that clone
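
The replicate check above can be sketched as below. The 0.7 cutoff comes from the slide; the function names, the use of `None` for an "inconsistent" call, and the handling of constant spots are assumptions.

```python
from math import sqrt

CORRELATION_THRESHOLD = 0.7  # cutoff stated on the slide

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant spot as uncorrelated
    return cov / sqrt(var_x * var_y)

def combine_replicates(spot_a, spot_b):
    """Per-array averages if the replicate spots agree, else None ("inconsistent")."""
    if pearson(spot_a, spot_b) > CORRELATION_THRESHOLD:
        return [(a + b) / 2 for a, b in zip(spot_a, spot_b)]
    return None
```

Each argument holds one spot's measurements across all arrays, in the same array order, so the returned list has one averaged value per array.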

UI Wire Frames

Architecture

Object Model • DomainElement • Represents the basic elements involved in translational research space. • All queries, views and presentation objects are composed of domain elements • Provides strong type checking and validations
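
One way to picture domain elements with type checking and validation is sketched below. Only the name `DomainElement` comes from the slide; the subclasses, fields, and validation rules are hypothetical and do not describe the actual Rembrandt object model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DomainElement:
    """Base class for typed, validated values used to compose queries."""
    value: object

    def __post_init__(self):
        self.validate()

    def validate(self):
        raise NotImplementedError

@dataclass(frozen=True)
class GeneSymbol(DomainElement):  # hypothetical domain element
    def validate(self):
        if not isinstance(self.value, str) or not self.value.strip():
            raise ValueError("gene symbol must be a non-empty string")

@dataclass(frozen=True)
class FoldChange(DomainElement):  # hypothetical domain element
    def validate(self):
        if not isinstance(self.value, (int, float)) or self.value <= 0:
            raise ValueError("fold change must be a positive number")

@dataclass
class GeneExpressionQuery:
    """A query composed only of domain elements."""
    gene: GeneSymbol
    min_fold_change: FoldChange

    def __post_init__(self):
        if not isinstance(self.gene, GeneSymbol):
            raise TypeError("gene must be a GeneSymbol")
        if not isinstance(self.min_fold_change, FoldChange):
            raise TypeError("min_fold_change must be a FoldChange")
```

For example, `GeneExpressionQuery(GeneSymbol("EGFR"), FoldChange(2.0))` is accepted, while passing a bare string for the gene raises `TypeError` before any query is run.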

Database Schema • Star schema • Is a generic, query optimized schema • A star schema consists of Fact tables and dimensions • Provides a highly de-normalized view of the data • Provides a data neutral framework from which queries can be executed with very fast results • Prototype usage will help us validate our approach

Database Schema • Fact Table • Contains key performance indicators • Helps eliminate expensive joins from queries • In the future, if multi-dimensional measures are required, then our schema is extensible to allow us to perform OLAP queries • Dimension • Dimensions are the categories of data analysis • When a report is requested "by" something, that something is usually a dimension. • For example, in a gene expression query, the two dimensions needed are genes (GENE_DM) and samples (BIOSPECIMEN_DM)
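
A minimal star-schema sketch using an in-memory SQLite database follows. `GENE_DM` and `BIOSPECIMEN_DM` are the dimension names from the slide; the fact table `GENE_EXPR_FACT`, all column names, and the rows are assumptions for illustration, not the actual Rembrandt warehouse design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GENE_DM (gene_id INTEGER PRIMARY KEY, gene_symbol TEXT);
CREATE TABLE BIOSPECIMEN_DM (biospecimen_id INTEGER PRIMARY KEY, sample_name TEXT);
-- Hypothetical fact table: one expression measure per gene per sample.
CREATE TABLE GENE_EXPR_FACT (
    gene_id INTEGER REFERENCES GENE_DM(gene_id),
    biospecimen_id INTEGER REFERENCES BIOSPECIMEN_DM(biospecimen_id),
    expression_ratio REAL
);
INSERT INTO GENE_DM VALUES (1, 'EGFR'), (2, 'CCND1');
INSERT INTO BIOSPECIMEN_DM VALUES (10, 'TUMOR_A'), (11, 'TUMOR_B');
INSERT INTO GENE_EXPR_FACT VALUES (1, 10, 4.2), (1, 11, 0.9), (2, 10, 3.1);
""")

def overexpressed_samples(conn, gene_symbol, min_ratio):
    """Report samples "by" gene: join the fact table to its two dimensions."""
    cursor = conn.execute(
        """
        SELECT b.sample_name, f.expression_ratio
        FROM GENE_EXPR_FACT f
        JOIN GENE_DM g ON g.gene_id = f.gene_id
        JOIN BIOSPECIMEN_DM b ON b.biospecimen_id = f.biospecimen_id
        WHERE g.gene_symbol = ? AND f.expression_ratio >= ?
        ORDER BY b.sample_name
        """,
        (gene_symbol, min_ratio),
    )
    return cursor.fetchall()
```

Calling `overexpressed_samples(conn, "EGFR", 2.0)` joins the single fact table to each dimension once, which is the query shape a star schema is meant to keep cheap.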

Database Schema

Problem we are trying to solve • A typical Rembrandt data portal search: • Show me all tumor samples that have amplification of 13q11.3, deletion of 10p21, D7S522 and the FHIT region confirmed by SNP chips and CGH analysis. • Display regions with LOH for these samples. • Which genes are under-expressed in these tumor samples with respect to normal? • Does this subset of tumors have better survival? • Do they segregate to a certain age group, geographical area or ethnicity?

To solve this problem • Fact: Cancer develops as a result of chromosomal aberrations • Duplications • Deletions • Somatic mutations • We need to measure chromosomal aberrations • [Figure: Chrom N, Copy 1 and Chrom N, Copy 2, illustrating complete loss, duplication, and LOH]

How to measure aberrations? • CGH • SNP Arrays • Have higher resolution than CGH • Analyze chromosomal copy number and genotype in one experiment • Comparing a normal blood sample with a tumor sample, SNP arrays help identify the following transitions: • Heterozygous to Homozygous: Loss of one allele • Heterozygous to No Call: Partial loss of one allele / No Call • Homozygous to Homozygous: Unchanged / Loss of one allele
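
The normal-versus-tumor comparison can be sketched as a small lookup. The three interpretations are taken from the slide; the two-letter genotype encoding ("AA", "AB", "BB"), the use of `None` for a No Call, and the labels for cases the slide does not cover are assumptions.

```python
def is_heterozygous(genotype):
    """True for a two-allele call whose alleles differ, e.g. "AB"."""
    return len(genotype) == 2 and genotype[0] != genotype[1]

def classify_snp(normal, tumor):
    """Interpret one SNP's normal-vs-tumor genotype pair; None stands for a No Call."""
    if normal is None:
        return "uninformative"  # assumption: case not covered on the slide
    if is_heterozygous(normal):
        if tumor is None:
            return "partial loss of one allele / no call"
        if not is_heterozygous(tumor):
            return "loss of one allele"
        return "retained heterozygosity"  # assumption: case not covered on the slide
    if tumor is not None and not is_heterozygous(tumor):
        return "unchanged / loss of one allele"
    return "uninformative"  # assumption: case not covered on the slide
```

Note that a homozygous normal genotype cannot distinguish "unchanged" from "loss of one allele", which is why that case keeps the slide's combined label.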

Genotype model for Rembrandt • Model basic science • Model SNPs in relation to chromosomal aberrations and as markers on the genome • Model to include annotations and external cross-references • Model Experimental observations • Capture observations such as LOH in relation to SNPs and chromosomal aberrations (CGH data) • Capture expression value for SNP elements on arrays to correlate with DNA copy number

Translational Research use case • The Clinical Genomics model should serve the translational research use case • Model should allow for associations between: • Basic science / molecular observation data (gene expression, SNP, pathway, etc.) • Clinical science data (prior therapy, outcome, demographics, etc.)

Translational Research Space [Diagram; components shown: EVS, caBIO, caMOD, caDSR, MAGE-OM/caArray, Genotype Model, Clinical Trial Model]

Next Steps • Reviewing the HL7 Re-usable genotype R-MIM as a starting point to build a clinical genomics object model • Translating the genotype R-MIM into UML to establish relationships and cardinalities between various scientific “observations” • For REMBRANDT, Extending the caBIO Object Model • Developing a data warehouse infrastructure for REMBRANDT to define relevant translational spaces and relationships between them • Future: We plan to merge our clinical objects with the HL7 Clinical model

The Rembrandt Team! Internal Advisors • Ken Buetow • Peter Covitz • Sue Dubman • Mervi Heiskanen • Carl Schaefer • Christo Andonyadis • Scott Gustafson • Sharon Settnek External Advisors • Jean-Claude Zenklusen • Yuri Kotliarov • Howard Fine • Tracy Lugo • Bob Finkelstein • Ram Bhattaru • James Luo • Alex Jiang • Prashant Shah • Ryan Landy • Kevin Rosso • Jyotsna Chilukuri • Dana Zhang • Nick Xiao • Smita Hastak • Himanso Sahni • Subha Madhavan

I am done • Questions
