
Introduction to biostatistics Lecture plan

Introduction to biostatistics: lecture plan. Basics. Variable types. Descriptive statistics: categorical data, numerical data. Inferential statistics: confidence intervals, hypothesis testing. Definitions.

nikki

Presentation Transcript


  1. Introduction to biostatistics: Lecture plan • Basics • Variable types • Descriptive statistics: categorical data, numerical data • Inferential statistics: confidence intervals, hypothesis testing

  2. DEFINITIONS STATISTICS can mean 2 things: - the numbers we get when we measure and count things (data) - a collection of procedures for describing and analysing data. BIOSTATISTICS – the application of statistics in the natural sciences, when biomedical problems are analysed.

  3. Why do we need statistics? • ????

  4. Basic parts of statistics: • Descriptive • Inferential

  5. Terminology • Population • Sample • Variables

  6. Variable types • Categorical (qualitative) • Numerical (quantitative) • Combined

  7. Categorical data Nominal • 2 categories • >2 categories Ordinal

  8. Numerical data • Continuous • Discrete

  9. Description of categorical data • Arranging data • Frequencies, tables • Visualization (graphical presentation)

  10. Frequencies and contingency tables Of those who were unsatisfied, 4 were males and 6 were females.

  11. Graphical presentation

  12. Graphical presentation

  13. Graphical presentation

  14. Graphical presentation

  15. Graphical presentation • Other: - Maps - Chernoff faces - Star plots, etc.

  16. Description of numerical data • Arranging data • Frequencies (relative and cumulative), graphical presentation • Measures of central tendency and variance • Assessing normality

  17. Grouping • Sorting data • Groups (5-17 groups) according to the researcher’s criteria. Used to assess the distribution and for graphical presentation in Excel

  18. Frequencies, their comparison and calculation 197 students were asked about the amount of money (litas) they had in cash at the moment.
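
The absolute, relative and cumulative frequencies mentioned on the slides above can be sketched in a few lines of Python. The cash amounts below are hypothetical values made up for illustration, not the data from the 197 students, and the function name `frequency_table` is likewise an assumption.

```python
from collections import Counter
from itertools import accumulate


def frequency_table(values):
    """Return rows (value, absolute, relative, cumulative relative), sorted by value."""
    counts = Counter(values)
    n = len(values)
    keys = sorted(counts)
    # Running total of absolute frequencies, in the same order as keys
    running = accumulate(counts[k] for k in keys)
    return [(k, counts[k], counts[k] / n, c / n) for k, c in zip(keys, running)]


# Hypothetical grouped cash amounts (litas); NOT the survey data from the slide
cash_groups = [10, 10, 20, 20, 20, 50, 50, 100, 100, 100]
for row in frequency_table(cash_groups):
    print(row)
```

The last row always has a cumulative relative frequency of 1, which is a quick check that no observation was lost while grouping.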

  19. Graphical presentation of frequencies

  20. Normal distributions • Most observations lie around the center • Fewer observations above and below the central values, in approximately equal proportions • Most often the Gaussian distribution

  21. Not normal distributions • More observations in one part.

  22. Asymmetrical distribution

  23. How would you describe/present your respondents if the data are numeric? 2 groups of measures: • Central tendency (central value, average) • Variance

  24. MEASURES OF CENTRAL TENDENCY • Means/averages (arithmetic, geometric, harmonic, etc.) • Mode • Median • Quartiles

  25. MEASURES OF CENTRAL TENDENCY • Arithmetic mean (X, μ)

  26. MEASURES OF CENTRAL TENDENCY Median (Me) – the middle value or 50th percentile (the value of the observation that divides the sorted data into two almost equal parts). It is found this way: • When n is odd: the median is the middle observation • When n is even: the median is the average of the values of the two middle observations
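
The odd/even rule for the median can be written out directly. This is a minimal sketch; the function name `median` and the sample values are chosen here for illustration.

```python
def median(values):
    """Median by the odd/even rule: middle value, or mean of the two middle values."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # n odd: the single middle observation
        return data[mid]
    # n even: average of the two middle observations
    return (data[mid - 1] + data[mid]) / 2


print(median([5, 1, 3]))     # n odd
print(median([4, 1, 3, 2]))  # n even
```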

  27. MEASURES OF CENTRAL TENDENCY • Mode (Mo) – the most common values • Can be more than one mode
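
Because a sample can have more than one mode, a function that returns all of them is safer than one that returns a single value. A small sketch using `statistics.multimode` from the Python standard library (Python 3.8 or later), with made-up data:

```python
from statistics import multimode

# Hypothetical observations: 2 and 3 both occur twice
observations = [1, 2, 2, 3, 3, 4]

# multimode returns every most-common value, in order of first appearance
modes = multimode(observations)
print(modes)
```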

  28. MEASURES OF CENTRAL TENDENCY • Quartiles (Q1, Q2, Q3) – cut points that divide the sorted sample into 4 equal parts, with 25% of observations in each of them.

  29. Is a measure of central tendency enough to describe respondents?
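
Quartiles can be computed with `statistics.quantiles` from the Python standard library. Several conventions for interpolating quartiles exist and statistical packages differ slightly; the sketch below assumes the "inclusive" method and uses made-up data.

```python
from statistics import quantiles

# Hypothetical sample, already sorted for readability
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 asks for the three cut points that split the data into four parts
q1, q2, q3 = quantiles(sample, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range

print(q1, q2, q3, iqr)
```

Q2 is the median, so it agrees with the odd/even rule for this sample.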

  30. MEASURES OF VARIANCE • Min and max • Range • Standard deviation (SD) – the square root of the variance • Variance: V = ∑(xᵢ − x̄)² / (n − 1) • Interquartile range (IQRT): Q3 − Q1, or 75th − 25th percentile
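
The measures on this slide follow directly from their definitions. The sketch below uses the n − 1 denominator from the variance formula; the function names and the data are illustrative assumptions.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))


def value_range(values):
    """Range: maximum minus minimum."""
    return max(values) - min(values)


# Hypothetical measurements with mean 5
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data), sample_sd(data), value_range(data))
```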

  31. What measures are to be used for sample description? If distribution is NORMAL • Mean • Variance (or standard deviation) If distribution is NOT NORMAL • Median • IQRT or min/max These measures are also used with numeric ordinal data

  32. X, Mo, Me • Mean ≈ Median ≈ Mode • SD and the empirical rule

  33. EMPIRICAL RULE Percentage of observations within 1, 2 and 2.5 SD of the mean if the distribution is normal

  34. Example: X = 8, SD = 2.5 (figure marks X, −2SD and +2SD)
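
The percentages behind the empirical rule can be recovered from the standard normal distribution, and the example with a mean of 8 and an SD of 2.5 gives concrete interval limits. This is a sketch using `statistics.NormalDist` from the standard library; the helper names are assumptions.

```python
from statistics import NormalDist


def share_within(k):
    """Share of a normal distribution lying within k SD of the mean."""
    std_normal = NormalDist()  # mean 0, SD 1
    return std_normal.cdf(k) - std_normal.cdf(-k)


def sd_interval(mean, sd, k):
    """Limits mean - k*SD and mean + k*SD."""
    return (mean - k * sd, mean + k * sd)


for k in (1, 2, 2.5):
    print(k, round(share_within(k), 3))  # about 0.683, 0.954, 0.988

# Slide example: mean 8, SD 2.5, limits at -2SD and +2SD
print(sd_interval(8, 2.5, 2))  # (3.0, 13.0)
```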

  35. Normality assessment: Summary • Graphical • Comparison of measures of central tendency; empirical rule (mean and standard deviation) • Skewness and kurtosis (if Gaussian = 0) • Kolmogorov-Smirnov test

  36. Boxplot (figure labels: 75th percentile, mean (*), median, 25th percentile, outliers)
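
Skewness and kurtosis can be computed from the central moments of the sample. The sketch below assumes the simple moment-based definitions, with 3 subtracted from kurtosis so that a Gaussian distribution gives 0; statistical packages often apply small-sample corrections, so their output may differ slightly.

```python
def central_moment(values, order):
    """Average of (x - mean) ** order over the sample."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** order for x in values) / n


def skewness(values):
    """Moment-based skewness: 0 for a perfectly symmetric sample."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a Gaussian distribution gives 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


# Hypothetical symmetric sample: skewness is 0, tails are lighter than Gaussian
symmetric = [1, 2, 3, 4, 5]
print(skewness(symmetric), excess_kurtosis(symmetric))
```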

  37. Boxplot example

  38. Central limit theorem
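
A small simulation can illustrate the central limit theorem: means of repeated samples cluster around the population mean, with a spread close to σ/√n, even when the population itself is not normal. The sketch below assumes a uniform population on [0, 1] (mean 0.5, σ = √(1/12)); the sample size, number of samples and seed are arbitrary choices.

```python
import math
import random
import statistics


def sample_means(sample_size, n_samples, seed=0):
    """Means of n_samples independent samples drawn from a uniform(0, 1) population."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistics.fmean(rng.random() for _ in range(sample_size))
        for _ in range(n_samples)
    ]


means = sample_means(sample_size=30, n_samples=2000)
expected_se = math.sqrt(1 / 12) / math.sqrt(30)  # sigma / sqrt(n)

print(statistics.fmean(means))   # close to the population mean 0.5
print(statistics.stdev(means))   # close to expected_se
print(expected_se)
```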

  39. Inferential statistics • Confidence intervals • Hypothesis testing

  40. Confidence intervals Interval where the “true” value most likely could occur.

  41. The variance of samples and their measures (figure: population with μ, σ, p0; samples with X1, SD1, p1; X2, SD2, p2; X3, SD3, p3; X4, SD4, p4)

  42. The variance of samples and confidence intervals μ, p0

  43. Confidence interval • Statistical definition: If the study were carried out 100 times, giving 100 results and 100 CIs, the “true” value would lie within the interval in 95 of the 100 cases and outside it in the remaining 5.

  44. Confidence intervals (general, most common calculation) • 95% CI for a mean: X ± 1.96 SE, giving (Xmin; Xmax). Note: for a normal distribution, when n is large • 95% CI for a proportion: p ± 1.96 SE, giving (pmin; pmax). Note: when p and 1-p > 5/n

  45. Standard error (SE)
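
The interval "estimate ± 1.96 SE" can be sketched as follows. The standard error formulas assumed here are the usual large-sample ones, SD/√n for a mean and √(p(1 − p)/n) for a proportion. The mean of 8 and SD of 2.5 come from the earlier example; the sample size of 100 and the proportion of 0.4 are hypothetical.

```python
import math


def se_mean(sd, n):
    """Standard error of a mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)


def se_proportion(p, n):
    """Standard error of a proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def ci95(estimate, se, z=1.96):
    """Large-sample 95% confidence interval: estimate +/- z * SE."""
    return (estimate - z * se, estimate + z * se)


# Mean 8 and SD 2.5 as in the earlier example; n = 100 is an assumed sample size
low, high = ci95(8, se_mean(2.5, 100))
print(round(low, 2), round(high, 2))  # 7.51 8.49

# Hypothetical proportion 0.4 observed in n = 100
print(ci95(0.4, se_proportion(0.4, 100)))
```

Because SE shrinks with √n, quadrupling the sample size halves the width of the interval.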

  46. Width of confidence interval depends on: • Sample size; • Confidence level (usually 95%, but any % can be chosen); • Dispersion.

  47. Hypothesis testing H0: μ1=μ2; p1=p2; (RR=1, OR=1, difference=0) HA: μ1≠μ2; p1≠p2 (two sided, one sided)

  48. Hypothesis testing Significance level α (by convention 0.05). Test for the P value (t-test, χ2, etc.). The P value is the probability of getting a difference (association) at least as large as the one observed, if the null hypothesis is true. OR: the P value is the probability of getting such a difference (association) due to chance alone, when the null hypothesis is true.

  49. Statistical agreements • If P<0.05, we say that the results can’t be explained by chance alone, therefore we reject H0 and accept HA. • If P≥0.05, we say that the difference found can be due to chance alone, therefore we don’t reject H0.
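
The decision rule above can be illustrated with a two-sided z test for the difference between two means. This is a sketch assuming large samples, so that the normal approximation is reasonable; the summary statistics and function names are hypothetical.

```python
import math
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided P value for a z statistic under the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def z_test_two_means(mean1, sd1, n1, mean2, sd2, n2):
    """z statistic and two-sided P value for H0: mu1 = mu2 (large samples assumed)."""
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / se_diff
    return z, two_sided_p(z)


def decide(p_value, alpha=0.05):
    """Apply the convention: reject H0 only when P < alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"


# Hypothetical group summaries: means 8 and 7, SD 2.5, 100 observations each
z, p = z_test_two_means(8, 2.5, 100, 7, 2.5, 100)
print(round(z, 2), round(p, 4), decide(p))
```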

  50. Tests The choice of test depends on: • Study design • Variable type • Distribution • Number of groups, etc. Tests (probability distributions): • z test • t test (one sample, two independent samples, paired) • χ2 (+ trend) • F test • Fisher exact test • Mann-Whitney • Wilcoxon and others.
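
As one example from the list, the χ2 test for a 2×2 contingency table can be computed by hand. The sketch below assumes no continuity correction and uses the fact that, with 1 degree of freedom, the P value equals the two-sided normal tail of √χ2. The counts are hypothetical; with small expected counts the Fisher exact test would be the usual choice instead.

```python
import math
from statistics import NormalDist


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and P value for table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, chi2 is the square of a standard normal z
    p_value = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
    return chi2, p_value


# Hypothetical counts: rows are satisfied/unsatisfied, columns are males/females
chi2, p_value = chi_square_2x2(20, 30, 30, 20)
print(chi2, round(p_value, 4))
```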