Elementary statistics for foresters Lecture 1 Socrates/Erasmus Program @ WAU Spring semester 2005/2006
Instead of introduction: a few quotes from smart people
Statistics is a pain • Every normal person who takes it knows that it is (almost always) badly taught, unreadable, and even when you follow the idea, you can't imagine where to apply it.
So, why learn it? • The reason to learn this stuff is that it is terribly useful in very practical ways. • The difference between 50 and 100 plots established in the field may not seem important to someone in a warm, dry office, but it matters to somebody on a wet, cold, 60% slope. • The reason to understand this stuff is that it can save real money, real time and real sweat.
Good news about statistics • Normal people can learn it, with little math background or attitude. • It's true that many statisticians come from mathematics, but it isn't necessary or useful for most applications. • Ordinary people who can see statistics in perspective are often the most innovative and credible users of statistics.
Statistics has little to do with math • There are some exact probabilities that are used, and you may need math to calculate them, but this is a detail. • All the powerful ideas are logical ideas. • Calculators (or computers) now do all the math, you only have to work with the logical ideas.
Statistics has little to do with math • There are only a few important ideas, even though there is a mass of names and symbols swirling around them. • There are no more than a handful of important equations too, they just have lots of special forms.
Statistics is a matter of practice • Besides, statistics is a normal procedure many of you are expected to know how to use. • Just like driving a truck, it's a necessary part of doing the job. • Like using a chain saw or juggling, it's a matter of practice and the right approach.
Don't be intimidated! Look at some of the people that do this work - if they can learn it, so can you.
Basic concepts and definitions • However, before we start a real statistical adventure, we have to introduce a few definitions and basic ideas so that we can communicate more easily throughout the course. • Let's start with a few definitions of statistics itself.
Statistics • Statistics is the science and practice of developing human knowledge through the use of empirical data. • Statistics is a method of analysis concerned with • collecting the data, • summarizing the data, and • drawing conclusions based on the data.
Statistics • Statistics is a discipline which deals with the • collection, • organization, and • interpretation of data. • Statistics is a collection of methods for • planning experiments, • obtaining data, • and then organizing, summarizing, presenting, analyzing, interpreting and drawing conclusions based on that data.
Descriptive statistics • Descriptive statistics are used to summarize or describe characteristics of a known set of data. • Used if we want to describe or summarize data in a clear and concise way using graphical and/or numerical methods.
Descriptive statistics • For example: we can consider everybody in the class as a group to be described. Each person can be a source of data for such an analysis. • A characteristic of this data may be for example age, weight, height, sex, country of origin, etc.
Descriptive statistics • Closer-to-forestry example: we can consider all pine stands in central Poland as a group to be characterized. • Each stand can be described by its area, age, site index, average height, QMD, volume per hectare, volume increment per hectare per year, amount of carbon sequestered, species composition, damage index, ...
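A descriptive summary of such a group can be sketched in a few lines of Python. The stand records below are made-up values used only for illustration, and the choice of summary measures (minimum, maximum, mean, median, standard deviation) is an assumption, not a list taken from the lecture.

```python
import statistics

# Hypothetical stand records (illustrative values only):
# (area in ha, age in years, volume in m3/ha)
stands = [
    (4.2, 45, 210.0),
    (7.5, 60, 295.0),
    (3.1, 38, 180.0),
    (5.8, 72, 340.0),
    (6.4, 55, 275.0),
]

# Pick one characteristic (volume per hectare) to describe.
volumes = [volume for _, _, volume in stands]

summary = {
    "n": len(volumes),
    "min": min(volumes),
    "max": max(volumes),
    "mean": statistics.mean(volumes),
    "median": statistics.median(volumes),
    "stdev": statistics.stdev(volumes),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Nothing here goes beyond the data at hand: the numbers describe these five stands and nothing else.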
Graphical description of data • Pictures are very informative and can tell the entire story about the data. • We can use different plots for different sorts of variables, for example bar plots, histograms, pie charts, box plots, ...
Numerical description of data • Numerical description is used to capture the main features of the data quickly, using special values: statistical measures. • Graphical and numerical methods will be discussed later (during a lecture on descriptive statistics).
Inferential statistics • Inferential statistics goes far beyond the simple description of the data. • It means the use of a sample to make inferences about a larger set of data from which the sample was chosen.
Inferential statistics • For example: we can consider participants of this course as a sample of all Socrates students taking part in all courses at the WAU this academic year and calculate the average age of all of us. • Then we could state that the average age of all Socrates students is the same as ours (as in our sample).
Inferential statistics • Closer-to-forestry example: we can consider pine stands used for the large-scale inventory in Poland and calculate eg. an average volume per hectare. • Then we could state that the average volume per hectare of all pine stands in Poland is the same as in our sub-population (our sample).
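The idea can be tried out on simulated data. The sketch below assumes a hypothetical population of 10,000 stands with volumes drawn from a normal distribution (mean 250, standard deviation 60 m3/ha); these numbers are invented for the demonstration. In practice the sample mean is close to the population mean but rarely exactly equal to it.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: volume per hectare (m3/ha) of 10,000 stands.
population = [random.gauss(250, 60) for _ in range(10_000)]

# The inventory measures only a random sample of 100 stands.
sample = random.sample(population, 100)

sample_mean = statistics.mean(sample)
population_mean = statistics.mean(population)  # unknown in real life

print(f"sample mean:     {sample_mean:.1f} m3/ha")
print(f"population mean: {population_mean:.1f} m3/ha")
```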
Population • A population is the complete collection of elements to be studied. • A population is what we are interested in. As in the previous examples, our population of interest could be all pine stands in Poland, all trees in a given forest tract, all Socrates students at WAU this year, ...
Population consists of individuals • A population is a collection of individuals, which can be described with data. • The set of all pine stands in Poland in fact consists of single stands. Each stand can be described by some characteristics such as its area, age, site index, average height, QMD, volume per hectare, volume increment per hectare per year, amount of carbon sequestered, species composition, damage index, ...
Sample • A sample is a part of a population, a subset of elements drawn from the population.
Parameter • A parameter is a characteristic of the entire population. • For example: an average age (characteristic) of all spruce stands in Finland (population) is a parameter.
Statistic • A statistic is a characteristic of a sample (note the second meaning of the word "statistics"). • For example: an average age (characteristic) of spruce stands chosen for measurement during an inventory (sample) is a statistic.
Estimator • An estimator is a statistic (coming from a sample) used to make inferences about a parameter (of the entire population).
Inferential statistics • Taking these together: inferential statistics are used to figure out parameters (characteristics of the population) based on statistics (characteristics coming from a sample), which serve as estimators. • This is a major way of performing statistical analyses.
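The difference between a parameter and a statistic shows up clearly when several samples are drawn from the same population: the parameter stays fixed, while the statistic changes from sample to sample. The following sketch assumes a hypothetical population of 2,000 spruce stands with ages generated at random between 20 and 120 years; the sample size of 50 is also an arbitrary choice.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population: ages (years) of 2,000 spruce stands.
population = [random.randint(20, 120) for _ in range(2_000)]

# Parameter: one fixed number describing the whole population.
parameter = statistics.mean(population)

# Statistic: computed from a sample, so each new sample gives a new value.
estimates = [
    statistics.mean(random.sample(population, 50))
    for _ in range(5)
]

print(f"parameter (population mean age): {parameter:.1f}")
for k, estimate in enumerate(estimates, start=1):
    print(f"sample {k}: estimated mean age = {estimate:.1f}")
```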
Variable • A variable is a value or characteristic observed on units in the sample that can vary from unit to unit in the sample. • Variables are just attributes of an individual. • Example: people in our lab can be described using various characteristics, such as sex, hair colour, country, height, weight, IQ, ...
Variable • Closer-to-forestry example: all trees in one of the forest tracts in the Rogów forest district are somehow similar (they have their bole, branches, crown, leaves, ...), but • they can be described by a whole bunch of characteristics (variables, attributes) such as species, height, DBH, crown length, crown ratio, volume, taper, form factor, ...
Variables • qualitative (which means: describing membership in a group or category, eg. sex, hair color, tree species) • quantitative (which means: measurable on a numerical scale, with numeric values for which addition and averaging make sense, eg. DBH, height, crown ratio, ...).
Variables • if variables can take only a finite (or countable) set of separate values, we are talking about discrete variables (eg. age in years, DBH class, ...) • if variables can take any value (or any value from a given interval), we are talking about continuous variables (eg. height, DBH, ...) • In many cases, due to measurement limitations or simplifications, continuous variables can be treated as discrete (eg. DBH rounded to the nearest 1 mm)
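Turning a continuous variable into a discrete one can be illustrated with diameter classes. The helper `dbh_class` below is hypothetical, and the 4 cm class width is an assumption chosen for the example, not a value given in the lecture.

```python
def dbh_class(dbh_cm: float, width_cm: float = 4.0) -> float:
    """Return the midpoint of the diameter class that contains dbh_cm.

    Classes are [0, width), [width, 2*width), ... (an assumed convention).
    """
    lower = (dbh_cm // width_cm) * width_cm  # lower limit of the class
    return lower + width_cm / 2              # class midpoint


# Continuous measurements (cm) mapped to a small set of class midpoints.
measurements = [19.9, 23.7, 24.0, 31.2]
classes = [dbh_class(d) for d in measurements]
print(classes)
```

After this step, DBH can take only a limited set of values (the class midpoints), so it behaves like a discrete variable.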
How to measure / arrange variables? • nominal scale: data consist of names, labels or categories, with no particular ordering scheme (eg. species) • ordinal scale: data can be arranged in some order, but differences are meaningless in terms of values (eg. damage index) • interval scale: data can be arranged in some order with meaningful differences between values (eg. DBH class) • other
Distribution • Every variable has a distribution. • The distribution of a variable gives the values the variable can take and how often it takes each value. • We'll talk more about variable distributions during the lecture on descriptive statistics, and more about theoretical distributions during the lecture on distributions.
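For a qualitative variable, a distribution is simply a table of values and their frequencies. A minimal sketch, assuming a made-up list of tree species recorded on one plot:

```python
from collections import Counter

# Hypothetical species recorded for eight trees on one plot.
species = ["pine", "pine", "spruce", "oak", "pine", "spruce", "pine", "birch"]

counts = Counter(species)  # how often each value occurs
n = len(species)
relative = {sp: c / n for sp, c in counts.items()}  # share of each value

for sp, c in counts.most_common():
    print(f"{sp:7s} count={c}  share={relative[sp]:.3f}")
```

The counts are absolute frequencies; dividing by the number of trees gives relative frequencies, which always sum to 1.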
Values and equations • population parameters: μ, σ², σ, α, β, δ • sample statistics: x̄, s², s, a, b, ... • Indices/subscripts: i, j, xᵢ, yᵢ • Not very informative to use just i, j, k • Sum: ∑ with index, ∑ without index • Using brackets
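As an illustration of this notation, the usual definitions of the mean and variance can be written with an indexed sum. The symbols N (population size) and n (sample size) are assumptions introduced here; they do not appear on the slide.

```latex
% Population parameters (Greek letters), N individuals in the population
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i ,
\qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} \left(x_i - \mu\right)^2

% Sample statistics (Latin letters), n individuals in the sample
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i ,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2

% Brackets matter: these two expressions are generally different
\sum_{i=1}^{n} x_i^2 \;\neq\; \left(\sum_{i=1}^{n} x_i\right)^2
```

When the sum is written without an index, it is usually understood to run over all observations.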
Sample • A sample is a part (subset) of the population • The sample (individuals to be measured) has to be chosen in a specific way • The sample is used to make inferences about the entire population (using a special procedure called estimation)
Why sample? • Sometimes all the individuals are measured • This is referred to as "a census" or "a 100% sample" • When do we measure all items? • when the area is small • when the values are high • when credibility is a value in itself (legal cases) • when the sampling process causes problems (boundary)
Why sample? • Population size • Cost • Time constraints • Destruction and disruption (eg. explosives) • Improving the accuracy (!) • Accurate and careful measurements • Acceptable sampling error • Means: eliminate measurement error and accept sampling error instead
Why sample? • Relative answers (eg. when volume per hectare is more important than the total volume of the entire area; also when it is not possible to delineate the total area of interest)
Good sample? • Randomly selected • Probability of selecting a particular individual from the population is known, but not necessarily equal to the probability of selecting another • Accurate, precise, and unbiased • Wise and efficient • Flexible
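Selection with known but unequal probabilities can be sketched as follows. The example assumes three hypothetical stands selected with probability proportional to their area, and it draws with replacement for simplicity; both choices are illustrative assumptions rather than a method prescribed in the lecture.

```python
import random

random.seed(7)  # fixed seed so the run is repeatable

# Hypothetical stands with their areas in hectares.
areas = {"A": 2.0, "B": 6.0, "C": 12.0}
total = sum(areas.values())

# Known (but unequal) probability of selecting each stand in a single draw.
prob = {name: area / total for name, area in areas.items()}

# 1,000 draws with replacement, selection proportional to area.
draws = random.choices(list(areas), weights=list(areas.values()), k=1_000)

# Observed share of draws per stand, to compare with the known probability.
share = {name: draws.count(name) / len(draws) for name in areas}

for name in areas:
    print(f"stand {name}: probability={prob[name]:.2f}  observed share={share[name]:.2f}")
```

The sample is still random: what matters is that the selection probability of every individual is known in advance.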
Accuracy and precision • Accuracy is the degree of conformity of a measured/calculated quantity to its actual (true) value. • Precision is the degree to which further measurements or calculations will show the same or similar results.
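The two notions can be put into numbers. The sketch below assumes a tree with a hypothetical true height of 25.0 m and two invented series of repeated measurements. It uses the mean deviation from the true value (bias) as one common way to express accuracy, and the standard deviation of the repeated measurements as one common way to express precision; the helper names `bias` and `spread` are hypothetical.

```python
import statistics

true_height = 25.0  # hypothetical true tree height in metres

# Two hypothetical series of repeated measurements of the same tree.
instrument_a = [24.9, 25.2, 24.8, 25.1, 25.0]  # centred on the true value
instrument_b = [26.4, 26.5, 26.6, 26.5, 26.5]  # tightly grouped, but off target


def bias(values, truth):
    """Mean deviation from the true value (closer to 0 = more accurate)."""
    return statistics.mean(values) - truth


def spread(values):
    """Sample standard deviation (smaller = more precise)."""
    return statistics.stdev(values)


for name, values in [("A", instrument_a), ("B", instrument_b)]:
    print(f"instrument {name}: bias={bias(values, true_height):+.2f} m, "
          f"spread={spread(values):.2f} m")
```

With these made-up numbers, instrument A is the more accurate one and instrument B the more precise one, which shows that the two properties are independent of each other.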
The target analogy • Repeated measurements are compared to arrows that are fired at a target.
Accuracy • Accuracy describes the closeness of arrows to the bullseye at the target center. Arrows that strike closer to the bullseye are considered more accurate. • The closer the measurements are to the accepted value, the more accurate they are considered to be.