Normalizing Microarray Data

1. Normalizing Microarray Data, by Michael Nuhn

2. Motivation: In an ideal experiment, no normalization would be needed. In real experiments, however, there is always a certain amount of technical variation.

3. Technical Variation: The measured values for the same gene vary slightly from one hybridization to the next. This is partly because cells do not always express genes at exactly the same level, but also because of the experiment itself:
- The dyes fluoresce with different intensities.
- The dyes are incorporated into the DNA with different efficiencies.
- Individual pins spot the oligos onto the slide slightly differently.
- Intensity can depend on the position on the slide if the slide surface is not coated uniformly.
(p. 62: scanning properties, scanning parameters, labeling efficiency, print tip, spatial effects)

4. Technical Variation: These technical factors should influence the statistical analysis as little as possible. Methods that estimate these factors and remove them from the data are called normalization methods. The biological part of the signal should be preserved as far as possible. The measurements of the individual dye channels are therefore normalized before they are compared with one another. A frequently used method is loess normalization. It rests on the following assumption: for most pairs of genes g1 from organism 1 and g2 from organism 2, where g1 and g2 are orthologous, g1 and g2 are expressed at the same level.
p. 62, http://discover.nci.nih.gov/microarrayAnalysis/Affymetrix.Preprocessing.jsp
Why Normalize? Biologists have long experience coping with systematic variation between experimental conditions (technical variation) that is unrelated to the biological differences they seek. Normalization is the attempt to compensate for systematic technical differences between chips, to see more clearly the systematic biological differences between samples. Differences in treatment of two samples, especially in labelling and in hybridization, bias the relative measures on any two chips. The performance of expression arrays can vary in more ways than measures such as RT-PCR. Normalization methods that have worked well for these types of measures do not perform as well for microarray data. Affymetrix introduced a new approach for their 133 series chips, using a set of 100 'housekeeping genes': the chips are re-scaled so the average values of these housekeeping genes are equal across all chips. This is much better than using a single housekeeping gene, and probably adequate for about 80% of chips in practice.
Most approaches to normalizing expression levels assume that the overall distribution of RNA numbers doesn't change much between samples, and that most individual genes change very little across the conditions. This seems reasonable for most laboratory treatments, although treatments affecting the transcription apparatus have large systemic effects, and malignant tumours often have dramatically different expression profiles. If most genes are unchanged, then the mean transcript levels should be the same for each condition. An even stronger version of this idea is that the distributions of gene abundances must be similar.
Statisticians use the term 'bias' to describe systematic errors, which affect a large number of genes. Keep in mind that normalization, like any form of data 'fiddling', adds noise (random error) to the expression measures. You never really identify the true source or nature of a systematic bias; rather, you identify some feature that correlates with the systematic error. When you 'correct' for that feature, you add some error to those samples where the feature you have observed doesn't correspond well with the true underlying source of bias. Statisticians try to balance bias and noise, and their rule of thumb is that it's better to under-correct for systematic biases than to compensate fully.
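The housekeeping-gene rescaling described in the notes can be sketched as below. This is a minimal illustration, not Affymetrix's actual procedure: the function name `scale_to_housekeeping`, the dict-per-chip input format, the use of the arithmetic mean, and the choice of common target (the mean of the per-chip housekeeping averages) are all assumptions.

```python
def scale_to_housekeeping(chips, housekeeping):
    """Rescale each chip so its housekeeping-gene mean equals a common target.

    chips: list of dicts mapping gene id -> intensity, one dict per chip
           (hypothetical input format).
    housekeeping: ids of the housekeeping genes.
    """
    # Average intensity of the housekeeping genes on each chip.
    hk_means = [sum(chip[g] for g in housekeeping) / len(housekeeping)
                for chip in chips]
    # Assumed common target: the mean of the per-chip housekeeping averages.
    target = sum(hk_means) / len(hk_means)
    # Every value on a chip is multiplied by one chip-specific factor.
    return [{gene: value * target / hk_mean for gene, value in chip.items()}
            for chip, hk_mean in zip(chips, hk_means)]


chips = [
    {"hk1": 100.0, "hk2": 200.0, "geneA": 50.0},
    {"hk1": 200.0, "hk2": 400.0, "geneA": 80.0},
]
scaled = scale_to_housekeeping(chips, ["hk1", "hk2"])
```

In this toy example both chips end up with a housekeeping mean of 225; `geneA` becomes 75.0 on the first chip and 60.0 on the second, so the apparent difference in `geneA` reverses once the chip-wide scale difference is removed.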

5. Loess Normalization: A scatterplot of the intensities gives the following picture:

6. Loess Normalization: Ideally, the point cloud should look like this. Before the transformation there are two intensities (one per dye channel); after the transformation there is only a single overall intensity.
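The slide does not name the transformation; a common choice for two-channel arrays, assumed here, is the MA transform. For red and green intensities R and G, M = log2(R/G) is the log ratio and A = 0.5 · log2(R · G) is the average log intensity, which plays the role of the single overall intensity. The function name `ma_transform` is hypothetical.

```python
import math


def ma_transform(red, green):
    """Map paired channel intensities (R, G) to (M, A) values.

    M = log2(R / G): log ratio between the two channels.
    A = 0.5 * log2(R * G): average log2 intensity of the spot.
    """
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


m, a = ma_transform([1024.0, 256.0], [256.0, 256.0])
```

The first spot is four times brighter in red, giving M = 2 and A = 9; the second has equal intensities, giving M = 0 and A = 8. Without technical bias, the cloud of (A, M) points should therefore scatter around the horizontal line M = 0.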

7. Normalization – Global Loess: To normalize, a regression curve (loess) is fitted through the point cloud:

8. Normalization – Global Loess: The value of the regression curve is subtracted from each point.
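A minimal sketch of global loess normalization, assuming the data are already on the M (log ratio) and A (average log intensity) scale. `loess_fit` is a simplified, hypothetical loess: a tricube-weighted local linear regression with no robustness iterations, so it is not the exact routine used by microarray packages. The default span of 0.3 and the minimum window of 4 points are assumptions.

```python
import math


def loess_fit(x, y, span=0.3):
    """Simplified loess: tricube-weighted local linear fit evaluated at each x[i].

    No robustness iterations are performed (a simplification).
    """
    n = len(x)
    k = min(n, max(4, math.ceil(span * n)))  # points in each local window
    fitted = []
    for x0 in x:
        # Bandwidth: distance from x0 to its k-th nearest neighbour.
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        # Tricube weights; points at or beyond the bandwidth get weight 0.
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
        # Weighted least-squares slope of the local line.
        sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ym + slope * (x0 - xm))
    return fitted


def global_loess_normalize(m, a, span=0.3):
    """Subtract the fitted loess curve M ~ A from every M value."""
    trend = loess_fit(a, m, span)
    return [mi - ti for mi, ti in zip(m, trend)]


# Hypothetical example: an intensity-dependent dye bias M = 0.5 * A - 4.
a = [float(v) for v in range(6, 16)]
m = [0.5 * ai - 4.0 for ai in a]
m_norm = global_loess_normalize(m, a)
```

Because a purely linear trend is reproduced exactly by each local linear fit, every normalized M value here is zero up to rounding. With real data the curve follows the intensity-dependent bias and the residuals keep the biological differences.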

9. Normalization – Print-Tip Loess: A separate regression curve is fitted for each pin (print tip). Otherwise the procedure is the same as global loess.
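Print-tip loess only changes how the spots are grouped before fitting. The sketch below assumes every spot carries a print-tip label and fits one curve per group with a simplified, hypothetical `loess_fit` (tricube-weighted local linear regression, no robustness iterations). The function names, the span, and the example data are assumptions for illustration.

```python
import math
from collections import defaultdict


def loess_fit(x, y, span=0.3):
    """Simplified loess: tricube-weighted local linear fit at each x[i]."""
    n = len(x)
    k = min(n, max(4, math.ceil(span * n)))  # points in each local window
    fitted = []
    for x0 in x:
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12  # bandwidth
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ym + slope * (x0 - xm))
    return fitted


def print_tip_loess_normalize(m, a, tips, span=0.3):
    """Fit and subtract a separate loess curve M ~ A within each print-tip group."""
    groups = defaultdict(list)
    for i, tip in enumerate(tips):
        groups[tip].append(i)
    m_norm = [0.0] * len(m)
    for idx in groups.values():
        trend = loess_fit([a[i] for i in idx], [m[i] for i in idx], span)
        for i, t in zip(idx, trend):
            m_norm[i] = m[i] - t
    return m_norm


# Hypothetical example: two print tips with different dye biases.
a = [float(v) for v in range(6, 14)] * 2
tips = [1] * 8 + [2] * 8
m = [0.3 * ai + 1.0 if t == 1 else -0.2 * ai - 0.5 for ai, t in zip(a, tips)]
m_norm = print_tip_loess_normalize(m, a, tips)
```

A single global curve could not follow both tip-specific trends at once; fitting per group removes each of them, so all normalized values here are zero up to rounding.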

10. Normalization Within an Array: Density after normalization with print-tip loess.
A simpler method of describing spatial patterns is to focus attention on the print-tip groups. There may be slight physical differences between the print tips, perhaps differences in length or in the size of the opening, or deformations after many hours of printing. Even in the absence of differences between the pins, the print-tip groups can be used as a surrogate for more general spatial variation across the array.

11. Normalizing Multiple Arrays: When several slides are used, they must also be normalized against one another because of:
- differences in preparation
- different degrees of mRNA degradation
- spots differing from slide to slide (amount of fixed cDNA)
- different amounts of dye possibly used when labelling again
- hybridization parameters: temperature, times, amount of molecules available for hybridization
A frequently used method for this is quantile normalization (Bolstad et al., 2003).
The between-array step addresses the comparability of the distributions of log intensities between arrays. log(2x) = log(2) + log(x), i.e. a constant scaling factor between two arrays becomes a constant additive shift on the log scale.

12. Quantile Normalization: Quantile normalization assumes that the distribution of gene expression is approximately the same in all replicates. Its goal is for the measured values on all slides to have the same distribution as well.

13. Quantile Normalization: The intensity values of each slide are written into a vector (one column per slide) and each vector is sorted in ascending order. A new vector is then created that holds, for each row, the mean of that row across all slides.

14. Quantile Normalization: For each value x, the normalized value x' is the value at the same quantile (rank) in the vector of means as x occupies within its own slide.
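The steps on slides 13 and 14 can be sketched as follows, assuming all slides have the same number of spots. The function name is hypothetical, and tied values are simply broken by position, which is an assumption; some implementations give tied values the average of their row means instead.

```python
def quantile_normalize(slides):
    """Quantile-normalize equally long lists of intensities, one list per slide."""
    n = len(slides[0])
    # Sort each slide; row r then holds the r-th smallest value of every slide.
    sorted_slides = [sorted(s) for s in slides]
    # Mean of each row across slides: the shared reference distribution.
    row_means = [sum(s[r] for s in sorted_slides) / len(slides) for r in range(n)]
    normalized = []
    for s in slides:
        # Rank of each value within its own slide (ties broken by position).
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = row_means[rank]  # value at the same quantile of the means
        normalized.append(out)
    return normalized


slides = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 6.0, 2.0],
    [3.0, 4.0, 7.0, 10.0],
]
result = quantile_normalize(slides)
```

The row means are [2.0, 3.0, 5.0, 7.0], so every slide ends up with exactly these values while each spot keeps its rank within its own slide; the first slide, for example, becomes [7.0, 2.0, 3.0, 5.0].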

15. Quantile Normalization: imposes the same empirical distribution of intensities on multiple arrays (p. 21).
