Features of the AE object model
• MIAME-compliant
• able to import MAML-formatted data
• supports raw and processed data
• independence of:
  • experimental platforms
  • image analysis methods
  • data normalization methods
• object model-based query mechanism
• will support the upcoming OMG standard for expression data

Key constructs in the AE object model
• notion of ExpressionValueSet
• structured sample descriptions
• several dimensions for ExpressionValues
• Transformations working on ExpressionValueSets and their dimensions

ArrayExpress – a public repository for microarray data
Alvis Brazma, Ugis Sarkans, Helen Parkinson, Alan Robinson, Mohammadreza Shojatalab, Jaak Vilo
EBI, European Molecular Biology Laboratory Outstation – Hinxton (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
http://www.ebi.ac.uk/arrayexpress/
http://www.ebi.ac.uk/microarray/

The European Bioinformatics Institute is establishing a public repository for microarray-based gene expression data

ArrayExpress
Alvis Brazma, Johan Rung, Ugis Sarkans, Thomas Schlitt, Jaak Vilo
European Bioinformatics Institute (EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK

Microarrays, one of the latest breakthroughs in experimental molecular biology, are producing considerable amounts of gene expression and other functional genomics data. The handling, storage, and analysis of these data are becoming the major bottlenecks in the use of microarray technology. Storing and annotating these data is not trivial, for several reasons. The raw microarray data are images, which have to be transformed into gene expression matrices. These are tables in which rows represent genes, columns represent samples such as different tissues, and the value at each position characterizes the expression level of a particular gene in a particular sample.
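The poster names the key constructs of the AE object model without defining them. These are ExpressionValueSet, several dimensions for ExpressionValues, and Transformations acting on those dimensions. The following is a hypothetical Python illustration, not the actual AE model. It assumes an ExpressionValueSet is a collection of values indexed by named dimensions (here gene, sample and replicate spot). It then shows a Transformation that averages over one dimension, which yields the two-dimensional gene expression matrix described above (rows are genes, columns are samples). The class and function names (`ExpressionValueSet`, `collapse`, `to_matrix`) and the data values are invented for this example. Averaging is only one simple way to combine replicates.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ExpressionValueSet:
    """Hypothetical sketch: expression values indexed by named dimensions."""
    dimensions: tuple[str, ...]
    values: dict[tuple, float] = field(default_factory=dict)

    def add(self, value: float, **coords: str) -> None:
        # Build the key in the same order as self.dimensions.
        key = tuple(coords[d] for d in self.dimensions)
        self.values[key] = value


def collapse(evs: ExpressionValueSet, dim: str) -> ExpressionValueSet:
    """Hypothetical Transformation: average values over one dimension."""
    i = evs.dimensions.index(dim)
    kept = evs.dimensions[:i] + evs.dimensions[i + 1:]
    groups = defaultdict(list)
    for key, value in evs.values.items():
        # Drop the collapsed coordinate and pool values that share the rest.
        groups[key[:i] + key[i + 1:]].append(value)
    return ExpressionValueSet(kept, {k: mean(v) for k, v in groups.items()})


def to_matrix(evs: ExpressionValueSet, row_dim: str, col_dim: str):
    """Lay out a two-dimensional set as a rows x columns table (None = missing)."""
    if len(evs.dimensions) != 2:
        raise ValueError("to_matrix needs exactly two dimensions")
    r, c = evs.dimensions.index(row_dim), evs.dimensions.index(col_dim)
    rows = sorted({k[r] for k in evs.values})
    cols = sorted({k[c] for k in evs.values})
    lookup = {(k[r], k[c]): v for k, v in evs.values.items()}
    grid = [[lookup.get((g, s)) for s in cols] for g in rows]
    return rows, cols, grid


# Made-up values: geneA was measured on two replicate spots in "liver".
raw = ExpressionValueSet(("gene", "sample", "spot"))
raw.add(2.0, gene="geneA", sample="liver", spot="1")
raw.add(4.0, gene="geneA", sample="liver", spot="2")
raw.add(1.0, gene="geneA", sample="brain", spot="1")
raw.add(3.0, gene="geneB", sample="liver", spot="1")

matrix = collapse(raw, "spot")  # dimensions are now ("gene", "sample")
rows, cols, grid = to_matrix(matrix, "gene", "sample")
print(rows, cols, grid)
# prints: ['geneA', 'geneB'] ['brain', 'liver'] [[1.0, 3.0], [None, 3.0]]
```

In this sketch the replicate dimension stays in the raw set and is removed only by an explicit Transformation. That keeps the raw values available alongside the processed matrix, in the spirit of the "supports raw and processed data" feature.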
Transforming images into expression matrices is not straightforward because of replicate measurements, replicate spots, different oligos reporting the expression level of the same gene, problems with sequence homology and potential cross-hybridisation, cross-platform comparisons, and so forth. The resulting high-level gene expression matrices, which represent genes and their expression levels, also have to be integrated with other genomic data and analysed further if any knowledge about the underlying biological processes is to be extracted (see [1]).

The European Bioinformatics Institute initiated an international effort to establish standards for microarray data representation, annotation and exchange [2]. The MIAME recommendations (Minimum Information About a Microarray Experiment) specify the minimum information that must be reported about a gene expression monitoring experiment based on microarrays (or any DNA arrays). This information ensures that the results are interpretable and can potentially be verified by third parties. An XML-based data exchange format, Microarray Markup Language (MAML), is being developed in collaboration with the Microarray Gene Expression Database (MGED) Group (see www.mged.org). EBI is establishing ArrayExpress, a public repository for microarray data, which will accept data in MAML format.

Expression Profiler, a set of online tools for gene expression data analysis, has been developed at the EBI and is available for public use (www.ebi.ac.uk/microarray). The analysis software in Expression Profiler supports clustering, exploration, and visualization of gene expression data, and links the analysis results to tools and databases elsewhere. Expression Profiler also includes tools that assist with analysing expression data in connection with other data types.
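The poster does not say which clustering or similarity methods Expression Profiler implements. As a hedged illustration of this kind of exploration, the sketch below ranks genes by the Pearson correlation between their expression profile (a row of a gene expression matrix) and a query gene's profile. This is a common first step before clustering. The function names `pearson` and `most_similar` and all gene names and values are hypothetical. This is not Expression Profiler's code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: correlation undefined, treated as 0 here
    return cov / (sx * sy)


def most_similar(matrix, query, k=2):
    """Return the k genes whose profiles correlate best with the query gene."""
    target = matrix[query]
    scores = [(g, pearson(target, p)) for g, p in matrix.items() if g != query]
    scores.sort(key=lambda gs: gs[1], reverse=True)
    return scores[:k]


# Hypothetical gene expression matrix: one row (profile) per gene, one column per sample.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # same shape as geneA, twice the level
    "geneC": [4.0, 3.0, 2.0, 1.0],  # opposite trend
    "geneD": [1.0, 3.0, 2.0, 4.0],  # similar overall trend, noisier
}
hits = most_similar(expression, "geneA", k=2)
print(hits)
```

Correlation is used here rather than Euclidean distance because it compares the shape of profiles regardless of absolute level. In this example geneB ranks first even though its values are twice those of geneA.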
Expression Profiler can currently analyse and visualize DNA sequence data alongside expression data, allowing users to discover, study, and visualize putative transcription factor binding sites [3].

One prospect of analysing microarray data is the reverse engineering of gene regulatory networks from gene expression and other genomics data. We have been successfully using our tools for in silico prediction of transcription factor binding sites [3]. Furthermore, we are developing models for describing gene regulatory networks. We use this modelling approach to gain insight into how gene expression is regulated in response to the activity of other molecules in the cell and to extracellular signals.

• MIAME: Minimum Information About a Microarray Experiment, defined by the MGED group to ensure the interpretability and reproducibility of an experiment (www.mged.org)
• MGED: The MGED group is an open discussion group established at the Microarray Gene Expression Database meeting MGED 1 (1999). The goal of the group is to facilitate the adoption of standards for DNA-array experiment annotation and data representation, as well as the introduction of standard experimental controls and data normalisation methods. The underlying goal is to facilitate the establishment of gene expression data repositories, the comparability of gene expression data from different sources, and the interoperability of different gene expression databases and data analysis software (www.mged.org)
• MAML: MicroArray Markup Language, an XML-based data exchange format able to capture MIAME. It was developed by MGED and submitted to OMG in November 2000 by the EBI on behalf of MGED (www.mged.org)
• OMG: Object Management Group, an international organisation providing a framework for developing software standards for industry and academia. Its Life Sciences Research domain task force issued a call for proposals for gene expression data.
  Rosetta Inpharmatics, EBI and NetGenics responded with proposals, including MAML (EBI on behalf of MGED) and GEML (Rosetta) (www.omg.org/lsr/)
• MAGE-OM and MAGE-ML: unification of MAML and GEML into a joint OMG standard (www.geml.org/omg.htm)

The state of the art
• The ArrayExpress object model for representing microarray data in line with MIAME requirements has been developed [2]. The Rational Rose MDL files and documentation are available on request from arrayexpress@ebi.ac.uk. The data model is implemented in Oracle (SQL scripts available)
• A data loader for the MAML file format (an XML-based format developed by MGED) has been implemented
• ArrayExpress can accept only a restricted number of data submissions in MAML format, because resources for processing and storage are currently limited (for enquiries about data loading contact arrayexpress@ebi.ac.uk)
• An online tool for microarray data analysis (Expression Profiler) has been developed

Ongoing activities, future plans and expected schedule
• New staff are being recruited – visit the EBI stand at the Job Fair
• A web-based data query interface is under development (expected release October 2001)
• A web-based data submission/annotation tool is under development (expected release October 2001)
• EBI participates in developing MAGE-OM (MicroArray Gene Expression data Object Model) within the OMG Life Sciences Research task force. The final submission to OMG is due on 20 August 2001.
  The submission will include the Microarray Gene Expression Markup Language (MAGE-ML)
• MAGE-OM will replace the current ArrayExpress model, and MAGE-ML will replace MAML as the data submission format, in September 2001
• We expect ArrayExpress to begin accepting unrestricted data submissions in MAGE-ML format at the beginning of 2002
• Expression Profiler will be linked to ArrayExpress

EBI is organising an EMBO course, Analysis and Informatics of DNA-Array Gene Expression Data (October 29 – November 3). For more information and applications see www.ebi.ac.uk/microarray/

Prototype of AE query interface
Expression Profiler: a web-based tool for microarray data analysis (see poster by J. Vilo)
ArrayExpress model: conceptual design
A simplified version of the AE object model

References
1. A. Brazma, A. Robinson, G. Cameron, M. Ashburner. One-stop shop for microarray data. Nature, Vol. 403 (2000), 699-700.
2. A. Brazma, U. Sarkans, A. Robinson, J. Vilo, M. Vingron, J. Hoheisel, K. Fellenberg. Microarray data representation, annotation and storage. Chip Technology, in series Advances in Biochemical Engineering/Biotechnology, Springer (in press).
3. A. Brazma, J. Vilo. Gene Expression Data Analysis. FEBS Letters, Vol. 480 (2000), 17-24.
4. J. Vilo, A. Brazma, I. Jonassen, A. Robinson, E. Ukkonen. Mining for Putative Regulatory Elements in the Yeast Genome Using Gene Expression Data. ISMB 2000, AAAI Press (August 2000), 384-394.