Dr Paul Lewis, a lecturer in Bioinformatics at Cardiff University, is based in the Biostatistics and Bioinformatics Unit (BBU), which is supported by a £1.5 million grant from the Higher Education Funding Council for Wales. This talk gives an overview of microarray bioinformatics and presents his microarray analysis software, covering data normalisation, differential gene expression, pattern discovery and fuzzy clustering. The BBU provides bioinformatics resources for institutions across Wales and offers MSc, Postgraduate Diploma and Postgraduate Certificate programmes in Bioinformatics and in Genetic Epidemiology and Bioinformatics.
Dr Paul Lewis • Lecturer in Bioinformatics • Cardiff University • Biostatistics & Bioinformatics Unit
Biostatistics & Bioinformatics Unit (BBU)
• Bioinformatics resource for institutions across Wales
• Backed by the Higher Education Funding Council for Wales
  - £1.5 million grant through the Research Capacity Development Fund
  - 13 new posts in statistics & bioinformatics
• UWCM, Cardiff University, Aberystwyth
• MSc / Postgraduate Diploma / Postgraduate Certificate in:
  - Bioinformatics
  - Genetic Epidemiology and Bioinformatics
• Brief Overview of Microarray Bioinformatics
• My Microarray Research Interests
• My Microarray Analysis Software
Bioinformatics in a Microarray Experiment
• Experimental Design
• Hybridisation
• Data Normalisation
• Differential Gene Expression
• Pattern Discovery
• Class Prediction
• Annotation
Normalisation
Remove non-biological influences (systematic variation) from the data. There are three categories:
• Normalisation: transform the data so that it more closely follows a normal distribution (log, lowess, linlog)
• Standardisation: expand or contract the distribution so that data from different experiments can be compared (calculate Z-scores)
• Centralisation: shift the distribution so that it is centred on the expected mean (mean, median or trimmed-mean centring)
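The three categories above can be illustrated on the values from a single array. This is a minimal sketch using only the Python standard library; the function names are hypothetical (not taken from EASI), and lowess and linlog normalisation are not shown.

```python
import math
import statistics

def log2_transform(values):
    """Normalisation: log2-transform positive ratios or intensities."""
    return [math.log2(v) for v in values]

def z_scores(values):
    """Standardisation: subtract the mean, divide by the sample standard deviation."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

def median_centre(values):
    """Centralisation: shift the values so that their median is zero."""
    med = statistics.median(values)
    return [v - med for v in values]

def trimmed_mean_centre(values, trim=0.1):
    """Centralisation on a trimmed mean: the lowest and highest `trim`
    fraction of values are ignored when computing the centre."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    core = ordered[k:len(ordered) - k] if k else ordered
    mu = statistics.mean(core)
    return [v - mu for v in values]
```

For example, the ratios `[0.5, 1.0, 2.0, 4.0]` become `[-1.0, 0.0, 1.0, 2.0]` after the log2 transform, which makes two-fold up- and down-regulation symmetric around zero.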
Find Differentially Expressed Genes
Is the fold change significant? With replicates:
• Parametric tests
  - t-test (ANOVA): J. Comput. Biol. 2000 7: 817-838
  - Bayesian t-test: Bioinformatics 2001 17: 509-519
  - Mixture modelling & bootstrapping (SAM): P.N.A.S. 2001 98: 5116-5121
  - Regression modelling: Genome Res. 2001 11: 1227-1236
  - All give similar results, but SAM reduces false positives
• Non-parametric tests
  - Wilcoxon rank sum test: Bioinformatics 2002 18: 1454-1461
  - Non-parametric t-test: Bioinformatics 2002 18: 1454-1461
  - Ideal discriminator method: Bioinformatics 2002 18: 1454-1461
  - Low false-positive rate, but less power
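As a rough illustration of the two families of tests listed above, the sketch below computes, for one gene with replicate log2 values in two conditions, the log2 fold change, Welch's t statistic (a parametric test allowing unequal variances) and the Wilcoxon rank-sum statistic W. It is a minimal standard-library sketch under those assumptions: it returns test statistics only, not p-values, and does not reproduce SAM or the Bayesian t-test.

```python
import math
import statistics

def log2_fold_change(group_a, group_b):
    """On log2-scale data, the difference in means is the log2 fold change."""
    return statistics.mean(group_a) - statistics.mean(group_b)

def welch_t(group_a, group_b):
    """Welch's two-sample t statistic (unequal variances) for one gene."""
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    se = math.sqrt(va / len(group_a) + vb / len(group_b))
    return log2_fold_change(group_a, group_b) / se

def rank_sum(group_a, group_b):
    """Wilcoxon rank-sum statistic W: the sum of group_a's ranks in the pooled
    sample. Tied values share the average of the ranks they span."""
    pooled = [(v, 0) for v in group_a] + [(v, 1) for v in group_b]
    pooled.sort(key=lambda item: item[0])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    return sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
```

The rank-sum statistic depends only on the ordering of the values, which is why the non-parametric tests are robust to outliers but, with few replicates, have less power than the t-test.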
Pattern Discovery & Class Prediction
Explore how genes or samples group (clustering):
• Hierarchy: Hierarchical Cluster Analysis
• Partition: K-Means, Self-Organising Maps (SOM), Fuzzy ART
• Reduction: Principal Components Analysis (PCA), Multidimensional Scaling (MDS), Correspondence Analysis (CoA)
Assign genes to known groupings (classification):
• Logistic regression
• Neural networks
• Linear discriminant analysis
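Hierarchical cluster analysis can be sketched as agglomerative clustering: start with every gene in its own cluster and repeatedly merge the closest pair. The sketch below assumes average linkage and a correlation distance (1 minus the Pearson correlation), which are common choices for expression profiles but are not specified on the slide; the function names are hypothetical. Stopping at a chosen number of clusters is equivalent to cutting the dendrogram at that level.

```python
import math
from itertools import combinations

def pearson_distance(x, y):
    """1 - Pearson correlation of two expression profiles (0 = same shape).
    Undefined (division by zero) for a constant profile."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)

def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with average linkage, stopped at n_clusters."""
    dist = {(i, j): pearson_distance(profiles[i], profiles[j])
            for i, j in combinations(range(len(profiles)), 2)}

    def cluster_distance(a, b):
        # Mean of all gene-to-gene distances between clusters a and b
        return sum(dist[min(i, j), max(i, j)] for i in a for j in b) / (len(a) * len(b))

    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        # Naive search for the closest pair of clusters; fine for small examples
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: cluster_distance(clusters[pair[0]], clusters[pair[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]
```

With a correlation distance, genes 0 and 2 in `[[1, 2, 3, 4], [4, 3, 2, 1], [2, 4, 6, 8], [8, 6, 4, 2]]` group together because their profiles rise in parallel, even though their absolute levels differ.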
Partitioning Clustering Methods: K-Means & SOM
• The number of clusters must be specified in advance
• Genes are partitioned into clusters
• But what are the relationships between the clusters?
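The partitioning idea, and the need to choose the number of clusters up front, can be seen in a minimal k-means sketch (Lloyd's algorithm). It assumes squared Euclidean distance and, for reproducibility, uses the first `k` genes as the initial centroids; real implementations usually use random or k-means++ initialisation. SOM is not shown.

```python
def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, max_iter=100):
    """Lloyd's k-means: returns (cluster label per point, centroids).
    k must be supplied by the caller."""
    centroids = [list(p) for p in points[:k]]  # naive deterministic initialisation
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda c: squared_euclidean(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Each gene ends up with exactly one label, and nothing in the output says how the resulting clusters relate to one another, which is the limitation raised on the slide.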
2D & 3D Mapping Methods: PCA, MDS, CoA
• Data are projected onto 2 or 3 dimensions
• But... what are the cluster boundaries?
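Of the mapping methods above, PCA is the simplest to sketch: centre the data, find the directions of greatest variance (eigenvectors of the covariance matrix) and project each gene or sample onto the first two. The sketch below assumes power iteration with deflation to find the eigenvectors, which is adequate for small examples; MDS and CoA are not shown, and the signs of the components are arbitrary.

```python
import math
import random

def centre_and_covariance(data):
    """Column-centre the rows and return (centred rows, sample covariance matrix)."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    return centred, cov

def top_eigenpair(matrix, iters=500, seed=0):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix."""
    d = len(matrix)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # random positive start vector
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left in this direction
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v gives the eigenvalue (v has unit length)
    eigval = sum(v[i] * matrix[i][j] * v[j] for i in range(d) for j in range(d))
    return eigval, v

def pca_project(data, n_components=2):
    """Project each row onto the first n principal components."""
    centred, cov = centre_and_covariance(data)
    d = len(cov)
    components = []
    for _ in range(n_components):
        lam, v = top_eigenpair(cov)
        components.append(v)
        # Deflation: subtract the found component so the next one dominates
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in centred]
```

The projection gives each gene a position on the map but no cluster assignment, so any boundaries between groups still have to be drawn separately.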
Annotation
Online tools:
• ARROGANT http://lethargy.swmed.edu/
• DAVID http://apps1.niaid.nih.gov/david/
• DRAGON http://207.123.190.10/dragon.htm
• EASE http://apps1.niaid.nih.gov/david/
• FANTOM http://www.gsc.riken.go.jp/e/FANTOM/
• GoMiner http://discover.nci.nih.gov/gominer/
• MatchMiner http://discover.nci.nih.gov/matchminer/
• Onto-Express http://vortex.cs.wayne.edu/Projects.html
• RESOURCERER http://pga.tigr.org/tigr-scripts/magic/r1.pl
• Affymetrix GO http://www.affymetrix.com
Databases:
• Gene Ontology http://www.geneontology.org/
• OMIM http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
• LocusLink http://www.ncbi.nlm.nih.gov/LocusLink/
• UniGene
My Research Interests
• Pattern discovery: take 2D & 3D mapping methods and define cluster boundaries
• Make these methods fuzzy
• Algorithm development
• 2D & 3D visualisation tools
• EASI: biologist-friendly software tools
Cluster Boundaries: figure showing cluster boundaries on MDS, CoA and PCA maps
Fuzzy Clustering
• Differs from standard clustering by assigning each gene a membership value for every cluster
• Shows how strongly each gene is associated with each cluster
• Can estimate the number of clusters for partitioning methods (Fuzzy ART)
• Helps combine clusters
• Helps resolve ambiguous assignments
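Membership values of this kind can be sketched with fuzzy c-means, the standard fuzzy counterpart of k-means (it also appears in the EASI menu). This is not Fuzzy ART, which is a different, neural-network-based method. The sketch assumes Euclidean distance, the common fuzzifier value m = 2 and caller-supplied initial centres; the function names are hypothetical.

```python
import math

def memberships(point, centres, m=2.0):
    """Fuzzy membership of one gene in every cluster; the values sum to 1."""
    dists = [math.dist(point, c) for c in centres]
    for i, d in enumerate(dists):
        if d == 0.0:  # gene lies exactly on a centre: full membership there
            return [1.0 if j == i else 0.0 for j in range(len(centres))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dists[i] / dists[j]) ** exponent for j in range(len(centres)))
            for i in range(len(centres))]

def fuzzy_c_means(points, initial_centres, m=2.0, iters=100):
    """Alternate membership and centre updates; returns (memberships, centres)."""
    centres = [list(c) for c in initial_centres]
    dim = len(points[0])
    for _ in range(iters):
        u = [memberships(p, centres, m) for p in points]
        for c in range(len(centres)):
            # Each centre is the mean of all genes weighted by membership ** m
            weights = [row[c] ** m for row in u]
            total = sum(weights)
            centres[c] = [sum(w * p[k] for w, p in zip(weights, points)) / total
                          for k in range(dim)]
    return [memberships(p, centres, m) for p in points], centres
```

In a small example with two tight groups and one gene halfway between them, that gene receives a membership of about 0.5 in each cluster, flagging it as ambiguous, while genes inside either group score close to 1 for their own cluster.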
Fuzzy Mapping: add the membership values of each gene for each cluster
Fuzzy Partitioning: K-Means & SOM
EASI: data reduction and visualisation (screenshots)
EASI: BBU Microarray Pattern Discovery
• Need for a comprehensive pattern discovery software suite
• Fuzzy data analysis suite
• Visualisation tools to explore data
• Easy to use
• Free
• Web-based version
• Service provided by the BBU
• Increase traffic to the BBU web site
• Establish the BBU for microarray analysis
• Cross-platform
EASI Interface
• Normalisation: Log, Normalise, Mean Centre, Median Centre
• Differential Gene Expression: t-test, ANOVA, Regression
• Pattern Discovery: Hierarchical Cluster Analysis, SOM, K-Means, Fuzzy ART, PCA, MDS, CoA, Fuzzy C-Means
• Utilities
Contact lewispd@cf.ac.uk http://bbu.uwcm.ac.uk
Acknowledgements
• Pete Kille
• Alan Clarke
• Gareth Hughes (EASI team)
• Karen Reed (data)
• Lesley Jones (data and EASI collaborator)
• BBU