This project focuses on analyzing low replicate data with varying structure and technical variability, specifically in medical and ecological contexts. The study uses PCA (principal component analysis) to distinguish between healthy and sick patients from human biopsy samples. Using Z-scores and PDE modelling, the analysis identifies patterns and important features in the data. The work also addresses the difficulty of sampling and resampling, and examines how variation in the detected features relates to biological evidence and ecological dynamics.
A blind search for patterns: Unravelling low replicate data
Data: Structure and variability • Structure • Between 500 and 10,000+ features • Each feature has an associated ion count for each aligned sample • Data is not normally distributed • Variability • Up to 30% technical variability • Each feature is affected differently
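Where technical replicates exist, the per-feature technical variability can be measured directly. A minimal sketch, assuming a hypothetical input layout (a dict mapping each feature to its ion counts across technical replicates of one sample) and using the coefficient of variation (standard deviation divided by mean) as the measure. The 30% cut-off comes from the slide; the feature names and counts are invented for illustration.

```python
import statistics


def coefficient_of_variation(replicate_counts):
    """CV (standard deviation / mean) of one feature across technical
    replicates, or None if the mean is zero. Needs at least two replicates."""
    mean = statistics.mean(replicate_counts)
    if mean == 0:
        return None
    return statistics.stdev(replicate_counts) / mean


def flag_noisy_features(features, threshold=0.30):
    """Return names of features whose technical CV exceeds `threshold`.

    `features` maps a feature name to its ion counts across technical
    replicates of the same sample (hypothetical input layout).
    """
    noisy = []
    for name, counts in features.items():
        cv = coefficient_of_variation(counts)
        if cv is not None and cv > threshold:
            noisy.append(name)
    return noisy


# Hypothetical ion counts for three technical replicates of one sample.
example = {
    "m/z 301.1": [1000.0, 1050.0, 980.0],
    "m/z 455.3": [200.0, 420.0, 90.0],
}
print(flag_noisy_features(example))
```

Because each feature is affected differently, a per-feature CV is more informative than a single variability figure for the whole dataset.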
Data: Structure and variability The majority of features that are detected are singletons.
Low Replicate data • “Suck it and see” • One-off projects • Pump-priming projects • Medical samples • Biopsy • Difficult to access • Ecological data • Resampling is difficult
Methods • Fingerprinting • PCA • Basic scoring • PDE model • Gradient search • Differential analysis
PCA • Very simple • Can be highly informative • Depends on the data • Used in the pipeline for data quality checks
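To illustrate how simple PCA is to run, the sketch below computes principal component scores with NumPy's SVD. It assumes a samples-by-features matrix of ion counts and applies a `log1p` transform because the data is not normally distributed. That transform, the function name `pca_scores` and the example matrix are assumptions, not the pipeline's actual preprocessing.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project samples onto their first principal components.

    X has shape (n_samples, n_features) and holds raw ion counts.
    Counts are log-transformed with log1p (an assumed choice for skewed,
    non-normal intensities) and mean-centred per feature before the SVD.
    Returns (scores, explained_variance_ratio).
    """
    Z = np.log1p(np.asarray(X, dtype=float))
    Z -= Z.mean(axis=0)  # centre each feature on its mean
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    variance = S ** 2
    return scores, variance[:n_components] / variance.sum()


# Hypothetical data: 6 samples x 3 features; the first feature is
# roughly tenfold higher in the last three samples.
X = np.array([
    [100, 100, 10],
    [110, 95, 12],
    [105, 105, 11],
    [1000, 100, 10],
    [1100, 95, 12],
    [1050, 105, 11],
])
scores, ratio = pca_scores(X)
print(np.round(scores[:, 0], 2), np.round(ratio, 3))
```

In this toy matrix the first three and the last three samples land on opposite sides of PC1, which carries almost all of the variance. The sign of a principal component is arbitrary, so only the split matters, not which group is positive.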
Brno Project: PCA Analysis • Samples: • Human biopsy • Replication: biopsy cut into equal parts
PCA Analysis • N group: non-cancer biopsy • T group: cancer biopsy • Using PCA clustering, we are able to distinguish between healthy and sick patients
PCA Analysis • PCA revealed profile similarity that correlated with biological evidence
PCA Analysis • Human Urine project • 22 patients sampled • 11 healthy and 11 sick patients • Sample labels dropped
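With the sample labels dropped, groups have to be found without supervision and compared with the true labels only afterwards. The slides do not say which clustering method was used. The sketch below swaps in plain k-means with k = 2 (Lloyd's algorithm) on PCA scores, with a deterministic farthest-point initialisation. The function name `two_means` and the score values are hypothetical.

```python
import numpy as np


def two_means(points, n_iter=100):
    """Split samples into two clusters with Lloyd's k-means (k = 2).

    points: (n_samples, n_dims) array, e.g. PCA scores. Initial centres
    are the first sample and the sample farthest from it, so the result
    is deterministic. Returns an integer label (0 or 1) per sample.
    """
    P = np.asarray(points, dtype=float)
    first = P[0]
    far = P[np.argmax(np.linalg.norm(P - first, axis=1))]
    centres = np.vstack([first, far])
    labels = np.zeros(len(P), dtype=int)
    for i in range(n_iter):
        # Distance from every sample to each of the two centres.
        dists = np.linalg.norm(P[:, None, :] - centres[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if i > 0 and np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        for k in range(2):
            if np.any(labels == k):  # keep the old centre if a cluster empties
                centres[k] = P[labels == k].mean(axis=0)
    return labels


# Hypothetical PC1/PC2 scores for six unlabelled samples.
scores = np.array([
    [-2.1, 0.1], [-1.9, -0.2], [-2.0, 0.0],
    [2.0, 0.1], [1.8, -0.1], [2.2, 0.2],
])
labels = two_means(scores)
print(labels)
```

Once the labels are restored, the cluster assignments can be cross-tabulated against healthy/sick status to see whether the blind grouping recovers it.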
PCA Analysis • Ecological data • Large number of samples without clear replication
PCA Analysis • Cluster pattern: • Find the features that carry the cluster pattern
PCA Analysis • Using PCA and profile similarity analysis, a subset of features of interest was found
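One way to read "profile similarity" is to correlate each feature's abundance profile across samples with the cluster pattern itself. The sketch below assumes that reading. It encodes cluster membership as a 0/1 profile and keeps features whose Pearson correlation with it has magnitude at least `min_r`. The threshold, function names and data are illustrative assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def features_matching_pattern(features, pattern, min_r=0.8):
    """Return (name, r) for features whose profile tracks `pattern`.

    Both positive and negative correlations are kept (|r| >= min_r),
    sorted by |r| from strongest to weakest.
    """
    hits = []
    for name, profile in features.items():
        r = pearson(profile, pattern)
        if abs(r) >= min_r:
            hits.append((name, r))
    return sorted(hits, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical cluster membership for six samples (0 = first cluster, 1 = second).
pattern = [0, 0, 0, 1, 1, 1]
features = {
    "A": [10, 12, 11, 50, 52, 49],  # higher in the second cluster
    "B": [30, 29, 31, 30, 28, 32],  # no cluster pattern
    "C": [40, 41, 39, 5, 6, 4],     # lower in the second cluster
}
for name, r in features_matching_pattern(features, pattern):
    print(name, round(r, 3))
```

Keeping negative correlations matters here: a feature that drops in one cluster carries the pattern just as much as one that rises.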
Basic Scoring • Use Z-scores to sort the data • Use the scores to pull out important features • Control vs. Exp • For a two-class problem we can use PDE modelling
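The slides do not give the exact Z-score definition. A minimal sketch, assuming the score for each feature is the distance of the experimental mean from the control mean measured in control standard deviations, with features ranked by the absolute score. The function names and counts are hypothetical.

```python
import statistics


def control_z_score(control, experiment):
    """Distance of the experimental mean from the control mean, in units of
    the control standard deviation (assumed definition). Returns None when
    the control replicates have zero spread."""
    sd = statistics.stdev(control)
    if sd == 0:
        return None
    return (statistics.mean(experiment) - statistics.mean(control)) / sd


def rank_by_z(data):
    """Rank features by |z|, largest first; `data` maps name -> (control, experiment)."""
    scored = []
    for name, (control, experiment) in data.items():
        z = control_z_score(control, experiment)
        if z is not None:
            scored.append((name, z))
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical ion counts: three control and three experimental replicates.
data = {
    "f1": ([100, 110, 90], [300, 310, 290]),
    "f2": ([50, 55, 45], [52, 50, 54]),
    "f3": ([200, 220, 180], [100, 90, 110]),
}
for name, z in rank_by_z(data):
    print(name, round(z, 2))
```

Ranking by the absolute value surfaces both increases (positive z) and decreases (negative z) relative to the control.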
Basic Scoring: PDE modelling • Multi-class problem • Plants • Wild type • act ko mutant • Treatments • Normal light • High light
Gradient Analysis • Use the rate of change of abundance to: • Mine data for specific trends • Find features of interest • Use PDE modelling of rates
Gradient Analysis • Mining for features that showed a rapid increase due to a specific treatment
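A minimal sketch of mining for rapid increases, assuming each feature is measured at the same time points under a treatment and a control. Rates of change are approximated by finite differences between consecutive time points. A feature is flagged when its largest treated rate exceeds its largest control rate by at least `min_gap`. This is a simple stand-in and does not reproduce the PDE modelling of rates. The time points, feature names and threshold are hypothetical.

```python
def rates(times, values):
    """Finite-difference rate of change between consecutive time points."""
    return [
        (v1 - v0) / (t1 - t0)
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    ]


def rapid_responders(times, treated, control, min_gap):
    """Flag features whose steepest rise under treatment beats the steepest
    rise under control by at least `min_gap` (abundance units per time unit)."""
    hits = []
    for name, series in treated.items():
        gap = max(rates(times, series)) - max(rates(times, control[name]))
        if gap >= min_gap:
            hits.append((name, gap))
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical sampling times (hours) and ion counts per feature.
times = [0, 1, 2, 4]
treated = {"f1": [10, 12, 40, 45], "f2": [10, 11, 12, 13]}
control = {"f1": [10, 11, 12, 13], "f2": [10, 11, 12, 13]}
print(rapid_responders(times, treated, control, min_gap=5))
```

Dividing by the time step keeps the rates comparable when sampling intervals are uneven, as with the final two-hour gap here.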
Data Provided by: • Ecological data • Dave Hodgson • Nicole Goody • Gradient analysis • John Love • Data scoring • Nicholas Smirnoff • Mike Page • Brno • Ted Hupp • Rob O’Neill • Urine study • Steve Michell • John Mcgrath
Metabolomics and Proteomics Mass Spectrometry Facility @ The University of Exeter http://biosciences.exeter.ac.uk/facilities/spectrometry/ http://bio-massspeclocal.ex.ac.uk/ Nick Smirnoff (Director of Mass Spectrometry) N.Smirnoff@exeter.ac.uk Hannah Florance (MS Facility Manager) H.V.Florance@exeter.ac.uk Venura Perera (Bioinformatics and Mathematical Support) V.Perera@exeter.ac.uk
About me • Background • Applied Maths • Untargeted metabolite profiling • Research interests • Data driven modelling • Small molecule profiling • Gene regulatory network modelling • Application of mathematical methods • Metabolite identification using LC-MS/MS