DSL for Pedigree Rearrangements • CSI5112 Software Engineering • Team: Andrei Anisenia, Margi Fumtiwala
Agenda • DSL overview • Goals of DSL • Tool support for the DSL • DSL creation technology • Sample usage • Foreseen impact of language evolution • Potential for analysis • Conclusions
DSL Overview • A Domain-Specific Language (DSL) is a language designed to solve problems in a particular domain • It is not intended for use outside that domain • Its design and implementation serve very specific goals
DSL for Pedigree Rearrangements • Genetic analysis of biological data is one of the most important research directions in modern bioinformatics • The data is supplied in various textual formats by hospitals and researchers and may be analyzed by different bioinformatics tools • Our focus is on a type of genetic data called pedigree data, presented in so-called PEDFILEs • Typical pedigree data describes a family structure • It includes persons and the child-parent relations between them
A simple pedigree example • [Figure: graphical representation of the pedigree data] • [Figure: textual representation of the pedigree data (PEDFILE)] • Each line of a PEDFILE holds the pedigree data and the biological data for one person
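To make the line layout concrete, here is a minimal Python sketch of reading one PEDFILE line. It assumes, as the example files later in this deck suggest, that the columns are pedigree ID, person name, father, mother (0 meaning unknown), sex (1 = male, 2 = female), followed by the biological data. The names `PedRecord` and `parse_line` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PedRecord:
    """One PEDFILE line (column layout assumed from the example files)."""
    pedigree_id: str
    name: str
    father: Optional[str]  # None when the file says 0 (unknown parent)
    mother: Optional[str]
    sex: int               # assumed coding: 1 = male, 2 = female
    bio_data: List[str]    # remaining columns, kept as raw tokens


def parse_line(line: str) -> PedRecord:
    """Split a whitespace-separated PEDFILE line into its fields."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}: {line!r}")
    ped_id, name, father, mother, sex = fields[:5]
    return PedRecord(
        pedigree_id=ped_id,
        name=name,
        father=None if father == "0" else father,
        mother=None if mother == "0" else mother,
        sex=int(sex),
        bio_data=fields[5:],
    )


rec = parse_line("Ped1 child1 father mother 1 1 2 1 2 1 2 1 2 1 2")
print(rec.name, rec.father, rec.mother, rec.sex, len(rec.bio_data))  # child1 father mother 1 10
```

The biological columns are kept as raw strings because their meaning depends on the downstream analysis tool.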
Why do we need a graphical tool? • The usual workflow of bioinformaticians involves extensive editing of PEDFILEs • Initial PEDFILEs received from hospitals may contain pedigrees that are too large, or too much biological data, to be analyzed by existing software • Such files must be split into smaller sub-pedigrees and/or have persons removed before they can be analyzed • Rearranging a pedigree requires multiple point changes within the pedigree data • Making these changes by hand in a text file is not feasible • Bioinformaticians therefore need a visual graphical environment for presenting and rearranging pedigrees
Scope of tool support for the DSL • Provides a visual graphical environment for presenting and rearranging pedigrees • Makes it possible to save the constructed and rearranged pedigrees to textual PEDFILEs
Overview of the approach • Meta-model: the definition of the objects, relations, constraints and actions used to model a predefined class of problems • For the Pedigree Rearrangements problem, the meta-model is defined as follows: • Objects: • Person – represents a single person in a pedigree and holds the person's name (unique within a pedigree), sex and biological data • Pedigree – a set of persons and the relations among them; each pedigree has a unique ID (Pedigree ID) • Relations: • Person has a father • Person has a mother • Pedigree consists of persons
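The tool defines this meta-model in the Eclipse modeling stack; purely as an illustration, the objects and relations above can be mirrored in Python as follows. The class and method names are hypothetical, and the single-valued `father`/`mother` fields are one assumed way of encoding the relations.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Person:
    name: str                          # unique within its pedigree
    sex: int                           # assumed coding: 1 = male, 2 = female
    bio_data: List[str] = field(default_factory=list)
    father: Optional["Person"] = None  # relation "Person has a father"
    mother: Optional["Person"] = None  # relation "Person has a mother"


@dataclass
class Pedigree:
    pedigree_id: str
    # relation "Pedigree consists of persons", keyed by person name
    persons: Dict[str, Person] = field(default_factory=dict)

    def add(self, person: Person) -> Person:
        # Enforce unique names within one pedigree.
        if person.name in self.persons:
            raise ValueError(f"duplicate person {person.name!r} in {self.pedigree_id}")
        self.persons[person.name] = person
        return person


ped = Pedigree("Ped1")
gf = ped.add(Person("grandfather1", sex=1))
gm = ped.add(Person("grandmother1", sex=2))
dad = ped.add(Person("father", sex=1, father=gf, mother=gm))
print(dad.father.name, dad.mother.name)  # grandfather1 grandmother1
```

Because each parent is a single optional reference, a person structurally cannot have two fathers or two mothers.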
Constraints • A person can have at most one father and one mother • A person belongs to exactly one pedigree • A person cannot be connected to persons in another pedigree • Adding a new child-parent relation must not create a directed cycle in the pedigree • In simple words, relationships cannot be circular: for example, a mother cannot be the child of her own child
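The no-cycle constraint reduces to a reachability test: a new child-to-parent link closes a cycle exactly when the child is already the parent or one of the parent's ancestors. A minimal Python sketch, assuming a hypothetical in-memory form where each name maps to its (father, mother) pair:

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation of one pedigree: name -> (father, mother); None = unknown.
Parents = Dict[str, Tuple[Optional[str], Optional[str]]]


def is_ancestor(parents: Parents, candidate: str, person: str) -> bool:
    """True if `candidate` is `person` or can be reached from it via parent links."""
    stack = [person]
    seen = set()
    while stack:
        current = stack.pop()
        if current == candidate:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(p for p in parents.get(current, (None, None)) if p is not None)
    return False


def can_add_parent(parents: Parents, child: str, parent: str) -> bool:
    """A child -> parent link is allowed only if it does not close a directed cycle."""
    return not is_ancestor(parents, child, parent)


parents: Parents = {
    "grandfather1": (None, None),
    "grandmother1": (None, None),
    "father": ("grandfather1", "grandmother1"),
    "child1": ("father", "mother"),
}
print(can_add_parent(parents, "grandfather1", "child1"))    # False: grandfather1 would be his own ancestor
print(can_add_parent(parents, "stranger", "grandmother1"))  # True
```

An editor would run a check like this before accepting a new parent link.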
Example – input PEDFILE
Ped1 grandfather1 0 0 1 1 2 1 1 2 2 1 2 3
Ped1 grandmother1 0 0 2 1 1 1 2 3 4 2 12 3 5
Ped1 grandfather2 0 0 1 1 2 3 2 2 2 2 2 2 2 2
Ped1 grandmother2 0 0 2 4 4 4 4 423 2 3 2 4
Ped1 grandfather3 0 0 1 1 2 1 2 4 5 6 3 2
Ped1 father grandfather1 grandmother1 1 12 1 3 4 56 4 3 56 7 8
Ped1 mother grandfather2 grandmother2 2 1 2 4 5 6 3 4 2 45 32
Ped1 stranger grandfather3 grandmother2 1 2 3 4 5 1 3 4 6 7 8 56
Ped1 mother2 grandfather3 grandmother2 2 1 2 3 4 6 4 5 6 7 8
Ped1 child1 father mother 1 1 2 1 2 1 2 1 2 1 2
Ped1 child2 stranger mother2 2 1 2 34 54 6 7 8
Example – output PEDFILE
Ped1 grandfather1 0 0 1 1 2 1 1 2 2 1 2 3
Ped1 grandmother1 0 0 2 1 1 1 2 3 4 2 12 3 5
Ped1 grandfather2 0 0 1 1 2 3 2 2 2 2 2 2 2 2
Ped1 grandmother2 0 0 2 4 4 4 4 423 2 3 2 4
Ped1 father grandfather1 grandmother1 1 12 1 3 4 56 4 3 56 7 8
Ped1 mother grandfather2 grandmother2 2 1 2 4 5 6 3 4 2 45 32
Ped1 child1 father mother 1 1 2 1 2 1 2 1 2 1 2
Ped2 grandfather3 0 0 1 1 2 1 2 4 5 6 3 2
Ped2 grandmother2 0 0 2 4 4 4 4 423 2 3 2 4
Ped2 stranger grandfather3 grandmother2 1 2 3 4 5 1 3 4 6 7 8 56
Ped2 mother2 grandfather3 grandmother2 2 1 2 3 4 6 4 5 6 7 8
Ped2 child2 stranger mother2 2 1 2 34 54 6 7 8
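In this example the input pedigree is split into Ped1 (child1 and its ancestors) and Ped2 (child2 and its ancestors), and grandmother2, an ancestor of both children, is copied into each. The tool performs such rearrangements graphically; the Python sketch below only reproduces this particular split with a hypothetical "proband plus all ancestors" rule. `split_pedfile` and its `groups` argument are illustrative names, not part of the tool.

```python
from typing import Dict, List, Set


def ancestors_and_self(parents: Dict[str, tuple], person: str) -> Set[str]:
    """Collect `person` and every ancestor reachable through father/mother links."""
    result, stack = set(), [person]
    while stack:
        current = stack.pop()
        if current in result:
            continue
        result.add(current)
        stack.extend(p for p in parents.get(current, ()) if p != "0")
    return result


def split_pedfile(lines: List[str], groups: Dict[str, List[str]]) -> List[str]:
    """Hypothetical split rule: each new pedigree holds the listed probands plus all of
    their ancestors; a shared ancestor is copied into every group that needs it."""
    records = [line.split() for line in lines]
    parents = {r[1]: (r[2], r[3]) for r in records}
    out = []
    for new_id, probands in groups.items():
        keep = set()
        for proband in probands:
            keep |= ancestors_and_self(parents, proband)
        # Keep input order and rewrite only the pedigree ID column.
        out.extend(" ".join([new_id] + r[1:]) for r in records if r[1] in keep)
    return out


INPUT = [
    "Ped1 grandfather1 0 0 1 1 2 1 1 2 2 1 2 3",
    "Ped1 grandmother1 0 0 2 1 1 1 2 3 4 2 12 3 5",
    "Ped1 grandfather2 0 0 1 1 2 3 2 2 2 2 2 2 2 2",
    "Ped1 grandmother2 0 0 2 4 4 4 4 423 2 3 2 4",
    "Ped1 grandfather3 0 0 1 1 2 1 2 4 5 6 3 2",
    "Ped1 father grandfather1 grandmother1 1 12 1 3 4 56 4 3 56 7 8",
    "Ped1 mother grandfather2 grandmother2 2 1 2 4 5 6 3 4 2 45 32",
    "Ped1 stranger grandfather3 grandmother2 1 2 3 4 5 1 3 4 6 7 8 56",
    "Ped1 mother2 grandfather3 grandmother2 2 1 2 3 4 6 4 5 6 7 8",
    "Ped1 child1 father mother 1 1 2 1 2 1 2 1 2 1 2",
    "Ped1 child2 stranger mother2 2 1 2 34 54 6 7 8",
]
result = split_pedfile(INPUT, {"Ped1": ["child1"], "Ped2": ["child2"]})
print("\n".join(result))
```

This yields the same twelve records as the output PEDFILE; only the order of the two Ped2 founders differs, because the sketch keeps input order.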
Model-to-text (M2T) transformation technique used • Model-to-text (M2T) transformations generate textual artifacts from models • New PEDFILEs are generated using Xpand • Xpand is a language specialized in code (text) generation from EMF models • It provides expressions with OCL-like semantics and a Java-like syntax
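The tool generates PEDFILEs with Xpand templates over its EMF model. To keep all examples in one language, the same model-to-text idea is sketched below in plain Python. This is not Xpand syntax, and the classes are hypothetical stand-ins for the EMF meta-model: the generator walks the model and emits one line per person.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    name: str
    sex: int
    father: Optional[str] = None  # parent names; None is written as 0
    mother: Optional[str] = None
    bio_data: List[str] = field(default_factory=list)


@dataclass
class Pedigree:
    pedigree_id: str
    persons: List[Person] = field(default_factory=list)


def to_pedfile(pedigrees: List[Pedigree]) -> str:
    """Model-to-text step: one PEDFILE line per person, pedigree by pedigree."""
    lines = []
    for ped in pedigrees:
        for p in ped.persons:
            cols = [ped.pedigree_id, p.name, p.father or "0", p.mother or "0", str(p.sex)]
            lines.append(" ".join(cols + p.bio_data))
    return "\n".join(lines) + "\n"


# Biological data shortened to two columns for readability.
ped = Pedigree("Ped2", [
    Person("grandfather3", 1, bio_data=["1", "2"]),
    Person("grandmother2", 2, bio_data=["4", "4"]),
    Person("stranger", 1, "grandfather3", "grandmother2", ["2", "3"]),
])
text = to_pedfile([ped])
print(text, end="")
```

Keeping generation separate from the model means the same model could later feed other generators, such as one for Bayesian network files.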
Language evolution – GUI adjustments • Different shapes and/or colors for person items according to sex • Persons dropped onto the pedigree “canvas” instead of being linked to it by arrows • These require only minor changes to the GMF (Graphical Modeling Framework) components • Under consideration for the upcoming release on the 18th of April
Language evolution – functionality extensions • Pedigree layout is a crucial factor in the visual analysis of pedigree structure • Different auto-layout algorithms should provide different views of a pedigree structure • Building a pedigree manually from a given PEDFILE takes a lot of time • Automatically constructing a pedigree simply by opening a PEDFILE would be helpful • These features are hard to implement with the existing DSL toolkit and require extensions to the framework
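The deck does not specify an auto-layout algorithm. A common first step when drawing family trees is to assign each person a generation and draw one row per generation. The Python sketch below does only that. It assumes the no-cycle constraint holds, since the recursion would not terminate on a cyclic pedigree. The function names are hypothetical, and ordering within a row (to reduce edge crossings) is left out.

```python
from typing import Dict, List, Optional, Tuple

Parents = Dict[str, Tuple[Optional[str], Optional[str]]]


def generations(parents: Parents) -> Dict[str, int]:
    """Founders get generation 0; everyone else sits one row below their deepest parent."""
    memo: Dict[str, int] = {}

    def gen(name: str) -> int:
        if name not in memo:
            ps = [p for p in parents.get(name, (None, None)) if p is not None]
            memo[name] = 1 + max(gen(p) for p in ps) if ps else 0
        return memo[name]

    for name in parents:
        gen(name)
    return memo


def layers(parents: Parents) -> List[List[str]]:
    """Group persons by generation: one horizontal row per generation in the diagram."""
    gens = generations(parents)
    rows: List[List[str]] = [[] for _ in range(max(gens.values(), default=-1) + 1)]
    for name, g in gens.items():
        rows[g].append(name)
    return rows


parents: Parents = {
    "grandfather1": (None, None), "grandmother1": (None, None),
    "grandfather2": (None, None), "grandmother2": (None, None),
    "father": ("grandfather1", "grandmother1"),
    "mother": ("grandfather2", "grandmother2"),
    "child1": ("father", "mother"),
}
rows = layers(parents)
print(rows)
# [['grandfather1', 'grandmother1', 'grandfather2', 'grandmother2'], ['father', 'mother'], ['child1']]
```

A full layout would then assign x positions within each row.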
Potential for analysis – Bayesian networks • Bayesian networks (BNs) are statistical models used to analyze the biological data stored in pedigrees • They are provided as input to various statistical bioinformatics programs (Superlink, GeneHunter, Allegro, etc.) • BNs are graphs with specific properties, constructed from the pedigree structure and the biological data • BNs are much more complex to construct than pedigrees (more nodes, more links)
Potential for analysis – Bayesian networks, example pedigree
Potential for analysis – Bayesian networks, transformation • BNs are stored in structured textual files, just like PEDFILEs • Define a meta-model for Bayesian networks • Define a model-to-model (M2M) transformation from the Pedigree model to the Bayesian network model • Define a model-to-text (M2T) transformation for the Bayesian network model • Run BN analysis software and find disease gene locations!
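The deck does not give the Bayesian network meta-model. A minimal sketch of the M2M step follows. It assumes a simplified version of the genetic-linkage networks used by tools such as Superlink: per person and locus, a paternal and a maternal allele variable feed a phenotype variable, and non-founders also get selector variables that say which parental allele was inherited. This structure is an assumption. Conditional probability tables and the BN file format are omitted, all names are hypothetical, and every parent is assumed to be listed in the pedigree. Even a three-person pedigree grows quickly.

```python
from typing import Dict, List, Optional, Set, Tuple

Parents = Dict[str, Tuple[Optional[str], Optional[str]]]
Edge = Tuple[str, str]


def pedigree_to_bn(parents: Parents, n_loci: int) -> Tuple[Set[str], List[Edge]]:
    """Build the nodes and edges of a simplified genetic-linkage Bayesian network.

    Per person i and locus j (assumed structure):
      Gp/Gm[i,j]  paternal / maternal allele
      P[i,j]      phenotype, child of Gp and Gm
      Sp/Sm[i,j]  selectors, created only for sides with a known parent
    """
    nodes: Set[str] = set()
    edges: List[Edge] = []
    for i, (father, mother) in parents.items():
        for j in range(n_loci):
            gp, gm, ph = f"Gp[{i},{j}]", f"Gm[{i},{j}]", f"P[{i},{j}]"
            nodes |= {gp, gm, ph}
            edges += [(gp, ph), (gm, ph)]
            for parent, g_child, kind in ((father, gp, "Sp"), (mother, gm, "Sm")):
                if parent is None:
                    continue  # founder side: allele gets a prior, no parent edges
                sel = f"{kind}[{i},{j}]"
                nodes.add(sel)
                # Inherited allele depends on both parental alleles and the selector.
                edges += [(f"Gp[{parent},{j}]", g_child), (f"Gm[{parent},{j}]", g_child), (sel, g_child)]
                if j > 0:
                    # Selectors of adjacent loci are linked (models recombination).
                    edges.append((f"{kind}[{i},{j - 1}]", sel))
    return nodes, edges


trio: Parents = {"father": (None, None), "mother": (None, None), "child": ("father", "mother")}
nodes, edges = pedigree_to_bn(trio, n_loci=2)
print(len(nodes), len(edges))  # 22 26
```

Three persons and two parent links become 22 nodes and 26 edges for just two loci, which illustrates why BNs are much harder to build by hand than pedigrees.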
Conclusions • Our tool supports the graphical representation of biological data in the form of pedigrees • Existing DSL creation tools do not cover all the needs of a DSL editor • The data to be manipulated is very complex, so automating transformations between different models becomes essential • Software engineering should support ongoing research in biology; without it, that research cannot keep up with the exponentially growing amount of biological data