MIND: Models in decision making & data @nalysis
Enza Messina and Francesco Archetti
Main Activities
Research Areas:
• Machine Learning Algorithms
• Probabilistic and Relational Models
• Optimization Under Uncertainty
Application Domains:
• World Wide Web
• Life Sciences
• Ambient Intelligence
• Finance
Faculty: Francesco Archetti, Enza Messina, Guglielmo Lulli
Post Doc: Elisabetta Fersini, Daniele Toscani
PhD Students: Ilaria Giordani, Cristina Elena Manfredotti
Others: Gaia Arosio, Irene Sberna, Francesca Bargna
Statistical Learning and Relational Data
• Traditional learning methods are consistent with the classical statistical inference problem formulation: instances are assumed to be independent and identically distributed (i.i.d.)
• This assumption does not reflect the real world!
• We need a solution able to deal with relationships and with uncertainty in more general terms
[Figure: statistical learning (SL) combines probabilistic models and learning techniques; statistical relational learning (SRL) adds a relational representation]
Machine Learning and Relational Data
Traditional learning approaches:
• work well with flat representations
• use fixed-length attribute-value vectors
• assume an independent (IID) sample
Problems when relational data are flattened:
• introduces statistical skew
• loses relational structure
• incapable of detecting link-based patterns
• must fix attributes in advance
[Figure: Patient and Contact tables flattened into a single table]
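The effect of flattening can be illustrated with a minimal sketch. The patient records, the contact list and both function names below are hypothetical and chosen only for illustration; they are not taken from the slides. The flat vector keeps a contact count, while a link-based feature (how many infected neighbours a patient has) needs the relational structure itself.

```python
from collections import defaultdict

# Hypothetical relational data: patients and the contacts between them.
patients = {
    "p1": {"age": 34, "infected": True},
    "p2": {"age": 51, "infected": False},
    "p3": {"age": 27, "infected": False},
}
contacts = [("p1", "p2"), ("p1", "p3")]  # undirected links


def flatten(patients, contacts):
    """Turn each patient into a fixed-length vector [age, number_of_contacts]."""
    degree = defaultdict(int)
    for a, b in contacts:
        degree[a] += 1
        degree[b] += 1
    return {pid: [attrs["age"], degree[pid]] for pid, attrs in patients.items()}


def infected_neighbours(patients, contacts):
    """A link-based feature: it cannot be computed from the flat vectors alone."""
    count = defaultdict(int)
    for a, b in contacts:
        if patients[a]["infected"]:
            count[b] += 1
        if patients[b]["infected"]:
            count[a] += 1
    return {pid: count[pid] for pid in patients}


flat = flatten(patients, contacts)
linked = infected_neighbours(patients, contacts)
```

In the flat table p2 and p3 differ only by age, and nothing records that both are linked to the infected patient p1; that information survives only in the relational view.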
Machine Learning and Relational Data
• Bayesian nets use a propositional representation
• The real world has objects, related to each other
• These “instances” are not independent
[Figure: Grade depends on Intelligence and Difficulty; the ground network contains Intell_Jane, Intell_George, Diffic_CS101, Diffic_Geo101, Grade_Jane_CS101, Grade_George_CS101 and Grade_George_Geo101. Source: Daphne Koller, 2003]
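The student/course example can be sketched as a template unrolled into a ground network. This is only an illustrative sketch: the registration list is read off the variable names on the slide, and the function names are assumptions. It shows why the grade "instances" are not independent: they share parent variables.

```python
# Registrations read off the variable names on the slide.
registrations = [("Jane", "CS101"), ("George", "CS101"), ("George", "Geo101")]


def ground_network(registrations):
    """Unroll the template Grade <- (Intelligence, Difficulty) for every registration."""
    parents = {}
    for student, course in registrations:
        grade = f"Grade_{student}_{course}"
        parents[grade] = [f"Intell_{student}", f"Diffic_{course}"]
    return parents


def shared_parents(parents):
    """Return the parent variables that influence more than one grade variable."""
    children = {}
    for child, pars in parents.items():
        for p in pars:
            children.setdefault(p, []).append(child)
    return {p: c for p, c in children.items() if len(c) > 1}


network = ground_network(registrations)
shared = shared_parents(network)
```

Here `shared` contains `Diffic_CS101` (shared by Jane's and George's CS101 grades) and `Intell_George` (shared by George's two grades), which is exactly the dependence a flat i.i.d. treatment would ignore.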
Probabilistic Relational Models
• Integrate uncertainty with the relational model
• Convenient language for specifying complex models
• “Web of influence”: subtle & intuitive reasoning
• Framework for incorporating heterogeneous data by connecting related entities (considering also uncertainty over relations)
• New problems:
  • Relational clustering
  • Collective classification
• Open problems: inference and learning
[Figure: heterogeneous information feeding a learner and an inference step; example entities include Gene, Cluster, Exp. type, Exp. cluster, Level, GCN4, HSF, Lipid and Endoplasmatic]
Some Applications
• Document Analysis
• Life Sciences
• Ambient Intelligence
Document Analysis: The Web Case
Relational representation of instances for enhancing:
• Web Document Classification
• Web Document Ranking
Enhancing the document representation used to induce traditional learning algorithms
Document Analysis: The Web Case
• Learning Models for Relational Data:
  • Relational Clustering
    1. Constraint Learning
    2. Objective Function Adaptation
  • Relational Classification:
    • Probabilistic Relational Models with Relational Uncertainty
[Figure: relational schema with fields document_id, class, #origin_ref, #destination_ref]
Document Analysis: E-Forensics
• Information Extraction
• Hearing Summarization
• Emotion Recognition
JUdicial MAnagement by Digital Libraries Semantics
Recent Publications
Journal Papers
• E. Fersini, E. Messina, F. Archetti, A probabilistic relational approach for web document clustering, to appear in Journal of Information Processing and Management.
• E. Fersini, E. Messina, F. Archetti, Enhancing Web Page Classification using Visual Block Analysis, to appear in Journal of Information Processing and Management.
Conference Papers
• F. Archetti, G. Arosio, E. Fersini, E. Messina, Emotion recognition in judicial domain: a multilayer SVM approach, Lecture Notes in Artificial Intelligence, Machine Learning and Data Mining, Leipzig, 2009.
• E. Fersini, E. Messina, F. Archetti, Probabilistic relational models with relational uncertainty: an early study in web page classification, IEEE WI-IAT Workshop, 2009.
• F. Archetti, G. Arosio, E. Fersini, E. Messina, Audio-based Emotion Recognition for Advanced Automatic Retrieval in Judicial Domain, Proc. ICT4JUSTICE, 1st Int. Conf. on ICT Solutions for Justice, Greece, 2008.
• F. Archetti, E. Fersini, E. Messina, Granular modeling of web document: impact on information retrieval systems, Tenth International Workshop on Web Information and Data Management (WIDM 2008).
• F. Archetti, E. Fersini, P. Campanelli, E. Messina, A Hierarchical Document Clustering Environment Based on the Induced Bisecting k-Means, LNCS, Flexible Query Answering Systems, 2006.
Relational clustering
Find a partition of a given set of instances using additional information coming from the relationships between instances. Two approaches:
• Constraint Learning
• Modify the distance measure in the clustering objective function
This is a SEMI-SUPERVISED LEARNING METHOD in which relations can be represented by pair-wise constraints on some of the instances (specifying whether two instances should be in the same cluster or in different clusters)
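The idea of adapting the clustering objective with pair-wise constraints can be sketched as follows. This is a minimal, generic penalized k-means written for illustration, not the method of the cited papers: the penalty `weight`, the deterministic initialisation from the first `k` points and the toy data are all assumptions. Each assignment cost is the squared distance to the centroid plus a penalty for every must-link or cannot-link constraint the assignment would violate.

```python
import math


def violations(i, c, labels, must_link, cannot_link):
    """Count constraints on instance i that assigning it to cluster c would violate."""
    count = 0
    for a, b in must_link:
        if i in (a, b):
            j = b if a == i else a
            if labels[j] is not None and labels[j] != c:
                count += 1
    for a, b in cannot_link:
        if i in (a, b):
            j = b if a == i else a
            if labels[j] == c:
                count += 1
    return count


def penalized_kmeans(points, k, must_link=(), cannot_link=(), weight=100.0, iters=20):
    """k-means whose assignment cost adds `weight` per violated pair-wise constraint."""
    centroids = [list(p) for p in points[:k]]  # assumption: first k points as seeds
    labels = [None] * len(points)
    for _ in range(iters):
        changed = False
        for i, p in enumerate(points):
            best, best_cost = None, math.inf
            for c in range(k):
                cost = sum((a - b) ** 2 for a, b in zip(p, centroids[c]))
                cost += weight * violations(i, c, labels, must_link, cannot_link)
                if cost < best_cost:
                    best, best_cost = c, cost
            if labels[i] != best:
                labels[i], changed = best, True
        for c in range(k):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if not changed:
            break
    return labels


# Toy data: instance 4 is geometrically closer to the first group.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0), (4.0, 4.0)]
plain = penalized_kmeans(points, k=2)
constrained = penalized_kmeans(points, k=2, must_link=[(1, 4)])
```

Without constraints instance 4 joins the cluster of instance 0; with the must-link constraint (1, 4) it is pulled into the cluster of instance 1, even though that cluster is further away.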
Systems Biology Applications
• Learning gene regulatory networks
• Regulatory modules
• Modelling the pharmacology of cancer
• Gene-drug interaction: identification of a drug treatment for a given cell line based on both the drug activity pattern and the gene expression profile
[Figure: gene expression diagram with labels DNA, Gene, Control, Coding, Transcription, TF, RNA single strand, Gene expression, Drug Activity, Human cancer]
Collaborations: [partner logos]
Recent Publications
Journal Papers
• E. Messina, M. Sanguineti (eds.), Special Issue on OR and data mining for biological data, Computers and OR, to appear.
• F. Archetti, I. Giordani, L. Vanneschi, Genetic Programming for Anticancer Therapeutic Response Prediction using the NCI-60 Dataset, to appear in Computers and Operations Research, 2009.
• L. Vanneschi, F. Archetti, M. Castelli, I. Giordani, Classification of Oncologic Data with Genetic Programming, to appear in Journal of Artificial Evolution and Applications, 2009.
• G. Lulli, M. Romauch, A Mathematical Program to Refine Gene Regulatory Networks, Discrete Applied Mathematics, 157 (10), 2009.
• F. Archetti, S. Lanzeni, E. Messina, Graph Models and Mathematical Programming in Biochemical Networks Analysis and Metabolic Engineering Design, Computers & Mathematics with Applications, Vol. 55, n. 5, pp. 970-983, 2008.
• S. Lanzeni, E. Messina, F. Archetti, Towards metabolic networks phylogeny using Petri Net-based expansional analysis, BMC Systems Biology, 2007.
• F. Archetti, S. Lanzeni, E. Messina, L. Vanneschi, Genetic Programming for Computational Pharmacokinetics in Drug Discovery and Development, Genetic Programming and Evolvable Machines, vol. 8 (4), 2007.
Conference Papers
• F. Archetti, I. Giordani, D. Mari, E. Messina, G. Ogliari, A Systems Biology Approach to oral anticoagulation therapy, SysBioHealth Symposium, 2008.
• I. Giordani, L. Vanneschi, E. Fersini, Modelling the Relationship between the Microarray Data of the NCI-60 Anticancer Dataset with Therapeutic Responses by Genetic Programming, SysBioHealth Symposium (ISBN: 978-88-903154-0-4), 2007.
• E. Fersini, C. Manfredotti, E. Messina, F. Archetti, Relational Clustering for Gene Expression Profiles and Drug Activity Pattern Analysis, SysBioHealth Symposium (ISBN: 978-88-903154-0-4), 2007.
• F. Archetti, S. Lanzeni, E. Messina, L. Vanneschi, Genetic Programming and other Machine Learning approaches to predict Median Oral Lethal Dose (LD50) and Plasma Protein Binding levels (%PPB) of drugs, Lecture Notes in Computer Science, EvoBIO 2007.
Submitted Papers
• Archetti, Giordani, Messina, Mauri, A new clustering approach for learning transcriptional regulatory networks, submitted to Int. Journal of Data Mining and Bioinformatics.
• F. Archetti, S. Lanzeni, G. Lulli, E. Messina, A mathematical model for optimal functional disruption of biochemical networks, submitted to Journal of Mathematical Modelling and Algorithms.
• E. Fersini, C. Manfredotti, E. Messina, F. Archetti, Relational K-Means for Gene Expression Profiles and Drug Activity Pattern Analysis, submitted to Int. Journal of Mathematical Modelling and Algorithms.
Pharmacogenomics
Application: predict drug response to oral anticoagulation therapy (OAT)
Grouping (profiling) patients based on their clinical and genotypic features in order to suggest the correct drug dosage
[Figure: dosage scale between haemorrhagic risk and thrombotic risk]
• Data on more than 1000 patients:
  • Clinical and therapeutic data: personal patient data, medical diagnosis, therapy, INR and dosage measurements
  • Genetic data: polymorphisms of two genes, CYP2C9 and VKORC1, that contribute to differences in patients’ response
In collaboration with: [partner logo]
Inference and Decision Problems
Tracking the (hidden) state of a system as it evolves over time from sequentially arriving (noisy or ambiguous) observations
Dynamic State Space Model:
• State: a vector of variables, some of which are not observable
• Transition Model: p(x_t | x_{t-1}, a_t)
• Observation Model: p(z_t | x_t)
• A set of possible actions given a belief state distribution
[Figure: loop in which State Estimation turns an observation into a belief and Action Selection turns the belief into an action]
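The state estimation step can be sketched for a discrete state space as a standard Bayes filter: predict with the transition model p(x_t | x_{t-1}, a_t), then correct with the observation model p(z_t | x_t). The two-state system, the action name "stay" and all probabilities below are made-up values used only for illustration.

```python
def bayes_filter_step(belief, action, observation, transition, observation_model):
    """One belief update: prediction with the transition model, then correction."""
    n = len(belief)
    # Prediction: sum over previous states of p(x_t | x_{t-1}, a_t) * belief(x_{t-1}).
    predicted = [
        sum(transition[action][x_prev][x] * belief[x_prev] for x_prev in range(n))
        for x in range(n)
    ]
    # Correction: weight by p(z_t | x_t) and normalise.
    unnormalised = [observation_model[x][observation] * predicted[x] for x in range(n)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical two-state system with a single action.
transition = {"stay": [[0.9, 0.1], [0.2, 0.8]]}   # transition[a][x_prev][x_next]
observation_model = [[0.8, 0.2], [0.3, 0.7]]      # observation_model[x][z]

belief = [0.5, 0.5]
posterior = bayes_filter_step(belief, "stay", 0, transition, observation_model)
```

Starting from a uniform belief, observing z = 0 shifts the belief towards state 0. An action selection step would then choose among the available actions using this posterior.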
Multi-target tracking
Multi-target tracking: finding the tracks of an unknown number of moving targets from noisy observations.
Track: the sequence of “states” travelled by a target, which needs to be estimated (we deal with on-line problems).
Requires data association: particle filters (PF) that track objects individually lack a consistent way to resolve the ambiguities that arise in associating objects with measurements
• Exploiting relations can improve the efficiency of the tracker
• Monitoring relations can be a goal in itself
We model the transition probability of the system with a Relational Dynamic Bayesian Network (RDBN).
In collaboration with: [partner logo]
The main research topics we propose:
• A new representation modelling not only objects but also their relations (exploiting relations can improve the efficiency of the tracker)
• A new computational strategy based on a family of Sequential Monte Carlo methods called Relational Particle Filter
• Statistical techniques for the detection of anomalous behaviours
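For readers unfamiliar with Sequential Monte Carlo, the basic mechanism can be sketched with a plain bootstrap particle filter for a single one-dimensional target. This is not the Relational Particle Filter named above: the random-walk transition model, the Gaussian observation model, the noise levels and the seed are assumptions made only for the example.

```python
import math
import random


def particle_filter(observations, n_particles=500, process_std=1.0, obs_std=1.0, seed=0):
    """Bootstrap particle filter for a 1-D random walk observed with Gaussian noise."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # assumed prior
    estimates = []
    for z in observations:
        # Predict: sample each particle from the (random-walk) transition model.
        particles = [x + rng.gauss(0.0, process_std) for x in particles]
        # Weight: likelihood of the observation under the Gaussian observation model.
        weights = [math.exp(-0.5 * ((z - x) / obs_std) ** 2) for x in particles]
        total = sum(weights)
        # Estimate: weighted mean of the particles.
        estimates.append(sum(w * x for w, x in zip(weights, particles)) / total)
        # Resample: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


estimates = particle_filter([5.0] * 20)
```

With repeated observations around 5, the estimate moves from the prior near 0 towards 5. Tracking several targets this way, one filter per object, is where the data association ambiguities mentioned on the previous slide arise.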
Wireless Sensor Networks
• Bayesian abstractions for virtual sensing through low-cost data aggregation and net-wide anomaly detection
• Modelling Cluster Heads (CH) as nodes of a Bayesian network (BN)
• Inference to recover sensor values even in the presence of temporary faults:
  • Lack of communication (sensor failure or sleep)
  • Outliers due to sensor malfunctioning
[Figure: WSN with cluster heads CH1 to CH5 and a sink, mapped to a BN]
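Virtual sensing and outlier detection can be illustrated with the smallest possible linear-Gaussian network: one cluster head as parent, one as child. This is a sketch under stated assumptions, not the model of the cited papers; the readings, the function names and the 3-sigma threshold are hypothetical.

```python
import math


def fit_linear_gaussian(parent, child):
    """Fit child = a + b * parent + noise by least squares; return (a, b, residual_std)."""
    n = len(parent)
    mean_p = sum(parent) / n
    mean_c = sum(child) / n
    var_p = sum((p - mean_p) ** 2 for p in parent) / n
    cov = sum((p - mean_p) * (c - mean_c) for p, c in zip(parent, child)) / n
    b = cov / var_p
    a = mean_c - b * mean_p
    residuals = [c - (a + b * p) for p, c in zip(parent, child)]
    std = math.sqrt(sum(r * r for r in residuals) / n)
    return a, b, std


def virtual_reading(model, parent_value):
    """Expected child value when its own reading is missing (failure or sleep)."""
    a, b, _ = model
    return a + b * parent_value


def is_outlier(model, parent_value, reading, k=3.0):
    """Flag a reading further than k residual standard deviations from its prediction."""
    a, b, std = model
    return abs(reading - (a + b * parent_value)) > k * std


# Hypothetical temperature histories of two neighbouring cluster heads.
ch1 = [20.0, 21.0, 22.0, 23.0, 24.0]
ch2 = [21.1, 21.9, 23.0, 24.1, 24.9]
model = fit_linear_gaussian(ch1, ch2)
estimate = virtual_reading(model, 25.0)
```

When CH2 stops communicating, `estimate` stands in for its reading; when CH2 does report, `is_outlier` checks the value against what CH1 implies.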
Transportation & Logistics
[Figure: network diagram with labelled nodes and arcs, including orig_f and dest_f, linking Models, Decisions and Data]
In collaboration with: [partner logos]
Recent Publications
Journal Papers
• F. Archetti, M. Frigerio, E. Messina, D. Toscani, IKNOS - Inference and Knowledge in Networks of Sensors, to appear in Int. Journal of Sensor Networks, 2009.
• F. Chiti, R. Fantacci, F. Archetti, E. Messina, D. Toscani, An integrated Communications Framework for Context aware Continuous Monitoring with Body Sensor Networks, IEEE Journal on Selected Areas in Communications, Vol. 27, No. 4, pp. 379-386, 2009.
• P. Dell’Olmo, A. Iovanella, G. Lulli, B. Scoppola, Exploiting Incomplete Information to manage multiprocessor tasks with variable arrival rates, Computers and Operations Research, Vol. 35, no. 5, 2008.
• G. Andreatta, G. Lulli, A Multi-period TSP with Stochastic Regular and Urgent Demands, European Journal of Operational Research, 2008.
• D. Bertsimas, G. Lulli, A. Odoni, The ATFM Problem: An Integer Optimization Approach, Integer Programming and Combinatorial Optimization, LNCS 5035, 2008.
• K.F. Doerner, W.J. Gutjahr, R.F. Hartl, G. Lulli, Stochastic Local Search Procedures for the Probabilistic Two-Day Vehicle Routing Problem, Advances in Computational Intelligence in Transportation and Logistics (A. Fink, F. Rothlauf, eds.), Springer Series on Studies in Computational Intelligence, pp. 153-168, 2008.
• G. Lulli, S. Sen, A Heuristic Algorithm for Stochastic Integer Program with Complete Recourse, European Journal of Operational Research, 2006.
Conference Papers
• C. Manfredotti, Modeling and Inference with RDBNs, Canadian Artificial Intelligence Conference, Graduate Student Symposium, May 2009.
• C. Manfredotti, E. Messina, F. Archetti, Improving Multiple Target Tracking with RDBNs, working paper presented at AIROWinter 2009, International Conference of the Italian Operations Research Society, January 2009.
• F. Archetti, E. Messina, D. Toscani, M. Frigerio, KOINOS - Knowledge from observations and inference in networks of sensors, Proceedings of IASTED International Conference on Sensor Networks, 2008.
• F. Archetti, C. Manfredotti, M. Matteucci, E. Messina, D. G. Sorrenti, Multiple Hypothesis Markov Chains For On-Line Anomaly Detection in Traffic Video Surveillance, Proceedings ICDP 2006: Imaging for Crime Detection and Prevention, 13-14 June 2006.
• F. Archetti, C.E. Manfredotti, E. Messina, D. G. Sorrenti, Foreground-to-ghost Discrimination in Single-difference Pre-processing, Lecture Notes in Computer Science: Advanced Concepts for Intelligent Vision Systems, ACIVS’06, 263-274, 2006.
Submitted Papers
• D. Toscani, F. Archetti, E. Messina, M. Frigerio, F. Chiti, R. Fantacci, SIFNOS - Statistical Inference and Filtering in Networks of Sensors, submitted to IEEE Journal on Selected Areas in Communications - Simple WSN Solutions, 2009.
Ambient Intelligence: Currently Active Projects
• LENVIS - Localised environmental and health information services for all (EU-FP7)
• LIMNOS - Logistics and Informatics for Mobility and Network OptimiSation (MIUR)
In collaboration with SAL Lab.
• INSYEME - Integrated Systems for Emergencies (MIUR - FIRB)
• GREIS - Gestione del Risparmio Energetico attraverso Informazioni di Sicurezza (Energy Saving Management through Security Information) (MIUR)
In collaboration with NOMADIS Lab.
• H-CIM - Health Care through Intelligent Monitoring (MIUR)
Dynamic State Space Models for Scenario Generation
• Regime Switching Models
  • Observations: prices
  • Hidden variable: regime
• Transition Model: Markov Chain
• Observation Model: Mixture of Gaussians (Autoregressive Process)
• Resulting model: (Autoregressive) Hidden Markov Model
Recent Publications
• Messina, E., Toscani, D., Hidden Markov models for scenario generation, IMA Journal of Management Mathematics, Vol. 4, pp. 379-401, 2008.
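A regime-switching scenario generator can be sketched as follows. This is a simplified illustration, not the model of the cited paper: each regime emits a single Gaussian log-return rather than an autoregressive mixture, and the two regimes, their parameters, the transition matrix and the seed are hypothetical.

```python
import math
import random


def generate_scenarios(p0, transition, regimes, horizon, n_scenarios, seed=0):
    """Simulate price paths whose log-returns depend on a hidden Markov regime."""
    rng = random.Random(seed)
    states = range(len(transition))
    scenarios = []
    for _ in range(n_scenarios):
        state, price, path = 0, p0, [p0]   # assumption: every path starts in regime 0
        for _ in range(horizon):
            # Hidden regime evolves as a Markov chain.
            state = rng.choices(states, weights=transition[state])[0]
            # Observation: Gaussian log-return with regime-specific mean and std.
            mean, std = regimes[state]
            price *= math.exp(rng.gauss(mean, std))
            path.append(price)
        scenarios.append(path)
    return scenarios


# Hypothetical calm and turbulent regimes: (mean, std) of the log-return.
regimes = [(0.001, 0.01), (-0.002, 0.04)]
transition = [[0.95, 0.05], [0.10, 0.90]]
scenarios = generate_scenarios(100.0, transition, regimes, horizon=10, n_scenarios=5)
```

Each path is one scenario; a scenario tree for optimization under uncertainty would be built from many such paths.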
Perspectives
• Extend state space models to more general Relational Dynamic Bayesian Networks to account not only for prices but also for “exogenous” economic factors and unstructured information
• Algorithms for managing risk and tracking portfolios using all available evidence and taking into account all uncertainties
Markets are good at gathering information from many heterogeneous sources and combining it appropriately; we would expect the same from models.
Projects & Collaborations
• PRIN 2007 “Probabilistic Models for representing uncertainty in portfolio optimization problems” (with Università di Bergamo and Università della Calabria)
• Collaboration with Brunel University and CARISMA Research Centre
A cooperation network for research projects and student mobility
CARISMA Research Center
Norwegian University of Science and Technology
Brunel University
University of Toronto
Aachen University
Hungarian Academy of Sciences
Massachusetts Institute of Technology
Centre of Research and Technology Hellas
• TXT e-Solutions
• Siemens
• Project Automation
• Aegate Ltd
• OptiRisk
• Astra Zeneca
• DELOS
• Comerson