
Univariate & Bivariate Geostatistical Analysis

Univariate & Bivariate Geostatistical Analysis. Mirza Muhammad Waqar. Contact: mirza.waqar@ist.edu.pk, +92-21-34650765-79 EXT: 2257. RG712. Course: Special Topics in Remote Sensing & GIS.


Presentation Transcript


  1. Univariate & Bivariate Geostatistical Analysis • Mirza Muhammad Waqar • Contact: mirza.waqar@ist.edu.pk, +92-21-34650765-79 EXT: 2257 • RG712 • Course: Special Topics in Remote Sensing & GIS

  2. What is Statistics About? • Statistics is the science of collecting, organizing, analyzing and interpreting data in order to make decisions • Statistics is the science of data-based decision making in the case of uncertainty

  3. Statistical Analysis • The statistical cycle: Problem → Plan → Data → Analysis → Conclusion

  4. Problem • "I wonder if there are differences between..." • What information will you need to answer the question? • Identify two or more sub-groups of the population to compare. • What variables are likely to show differences?

  5. Plan • If collecting data, you will need to plan a survey or questionnaire. • Using available data sets is recommended. • If using a data set, decide which sub-groups of the data are needed and choose from the available variables (choose carefully so you can answer the problem).

  6. Data • Collect data with a survey or questionnaire, OR take a sample (at least 30 values) from a large data set, for example Census data. • Clean the data set before continuing.

  7. Analysis • Analyze the data to find similarities and differences. • You will need measures of central tendency (mean, median, mode) AND measures of spread (range, interquartile range, standard deviation). • Use technology to calculate the statistics: a calculator or Excel.

  8. Conclusion • Remember that you are analysing and comparing data from a SAMPLE of a population. • Is there a difference between the subgroups? • Comparisons made from a box-and-whisker graph • Comparisons based on measures of central tendency • Comparisons based on measures of spread

  9. Role of Statistics in GIS • To describe and summarize spatial data. • To make generalizations concerning complex spatial patterns. • To use samples of geographic data to infer characteristics for a larger set of geographic data. • To determine if the magnitude or frequency of some phenomenon differs from one location to another. • To learn whether an actual spatial pattern matches some expected pattern.

  10. What is Geostatistics? • Applies the theories of statistical inference to geographic phenomena. • Methods of geostatistics are used in petroleum geology, hydrogeology, hydrology, meteorology, oceanography and geochemistry. • A way of describing spatial continuity, an essential feature of natural phenomena. • Recognized to have emerged in the early 1980s as a hybrid of mathematics, statistics, and mining engineering.

  11. Some Useful Definitions • Data – information coming from observations, counts, measurements or responses. • The data you will be analyzing will almost always be a sample from a population. • Population – the collection of all outcomes, responses, measurements or counts that are of interest. • Sample – a subset of a population. • We will almost always be dealing with samples and hoping to make inferences about the population.

  12. Some Useful Definitions • Parameter – a numerical description of a characteristic of the population. • Statistic – a numerical description of a characteristic of the sample. • We will often wish to make inferences about a parameter based on statistics.

  13. Some Useful Definitions • Descriptive Statistics – relate to organizing, summarizing and displaying data. • Inferential Statistics – relate to using a sample to draw conclusions about a population. • Inferential statistics involves drawing a conclusion from some data.

  14. Inferences vs. Descriptive • Consider: • Average length of females and males: 90cm and 100cm respectively. • Descriptive statistics: the values. • Inference: males are (in general) taller than females.

  15. Descriptive Statistics • 3 categories of descriptive statistics in geostatistics • Univariate Descriptive Statistics • Used to describe and summarize a single variable • Bivariate Descriptive Statistics • Used to describe the relationship between two variables • Spatial Descriptive Statistics • Describe data in terms of space and time

  16. Univariate Description • Describes and summarizes a single variable • Graphical methods • Histogram • Cumulative Frequency • Numerical methods fall into three categories • Measurement of location • Measurement of spread • Measurement of shape

  17. Univariate Description • Measurement of location • Measurement of center location • Mean • Median • Mode • Measurement of other parts • Quantile • Quartile • Percentile • Measurement of spread (variability) • Variance • Standard Deviation • Interquartile range • Measurement of shape (symmetry & length) • Coefficient of skewness • Coefficient of Variation

  18. Frequency Table and Histogram • Histogram – a bar graph that plots the frequency distribution of a dataset. • The horizontal scale represents the classes (bins). • The vertical scale measures the frequencies of the classes. • Consecutive bars must touch.

  19. Ideal Histogram for Image Analysis • [Figure: histogram of Band A with frequency (f) on the vertical axis; labelled classes: Vegetation, Urban Area, Soil, Water]

  20. Actual Histogram from Image Analysis • [Figure: histogram of Band A with frequency (f) on the vertical axis; labelled classes: Vegetation, Urban Area, Soil, Water]

  21. Histogram from Image Analysis • A very informative tool for analysis. • The histogram shows the contrast of a satellite image. • The wider the range of brightness values (BVs), the higher the contrast. • [Figures: low-contrast histogram; high-contrast histogram]

  22. Histogram from Image Analysis • The histogram can also identify the largest land cover in a satellite image. • A rough quantification of land covers can be made from the histogram. • This rough quantification is a starting point for a more accurate quantification. • From the histogram, the BV range of a particular land cover can be identified.

  23. Frequency Table • To develop a histogram a frequency table is used. • Frequency table: records how often observed values fall within certain intervals or classes.

  24. Constructing a Frequency Distribution • Decide on the number of classes to include in the frequency distribution. • Find the class width as follows: • Determine the range of the data • Divide the range by the number of classes and round up to the next convenient number • Find the class limits: • Start with the lowest value as the lower limit of the first class, add the class width to this to obtain the lower limit for the second class, etc. • Place a mark in the row for the class corresponding to each data point • Count the number of marks in each class.
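
The steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the course's own procedure: the function name `frequency_table`, the choice of five classes, and the half-open class convention (lower limit included, upper limit excluded) are all assumptions made for illustration. The data are the raw values from the percentile example later in the deck.

```python
import math

def frequency_table(data, num_classes):
    """Return (lower_limit, upper_limit, count) rows, one per class."""
    lowest = min(data)
    data_range = max(data) - lowest
    # Divide the range by the number of classes and round up to the next
    # whole number; floor + 1 keeps the maximum inside the last class even
    # when the division is exact.
    class_width = math.floor(data_range / num_classes) + 1
    counts = [0] * num_classes
    for value in data:
        # Classes are half-open: lower <= value < lower + class_width
        counts[int((value - lowest) // class_width)] += 1
    return [
        (lowest + k * class_width, lowest + (k + 1) * class_width, counts[k])
        for k in range(num_classes)
    ]

raw_data = [14, 12, 19, 23, 5, 13, 28, 17]
table = frequency_table(raw_data, num_classes=5)
for lower, upper, count in table:
    # One tally mark per data point in the class, then the count
    print(f"{lower:>3} to <{upper:<3} {'|' * count} ({count})")
```

Rounding the width up to a whole number is one reading of "the next convenient number"; a different width or starting limit gives a different, equally valid table.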

  25. Frequency Table

  26. Cumulative Frequency Table and Histogram • Cumulative frequency of a class is the sum of the frequency of that class and all previous classes. • The cumulative frequency for the last class is always n.
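
As a small illustration of the two statements above, the running total can be computed with `itertools.accumulate`. The class frequencies below are hypothetical values chosen for the example, not data from the slides.

```python
from itertools import accumulate

# Hypothetical class frequencies, in class order (lowest class first)
frequencies = [1, 3, 2, 1, 1]
n = sum(frequencies)

# Running total: each entry is this class's frequency plus all previous ones
cumulative = list(accumulate(frequencies))

for freq, cum in zip(frequencies, cumulative):
    print(f"frequency={freq}  cumulative={cum}")

# The cumulative frequency of the last class equals the number of observations
print(cumulative[-1] == n)
```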

  27. Cumulative Frequency Tables

  28. Cumulative Histogram

  29. Measure of Location • Tells us where the various parts of the data lie • The center of the data can be found by • Mean • Median • Mode • The locations of other parts of the data are given by the quantiles

  30. Mean Median Mode • Mean – average of all the data points in the dataset/distribution • Unique and unbiased • Based on every data point in the dataset • Can be sensitive to outlying observations • Median – middle value in an ordered array of numbers. • Unaffected by extremely large and extremely small values. • Mode – the most frequently occurring value in a dataset. • Unlike the mean and median, the mode is not always uniquely defined. • Bimodal – two values share the highest number of occurrences in the data • Multimodal – three or more values share the highest number of occurrences
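
The properties listed above can be checked with Python's standard `statistics` module. The observations are hypothetical, chosen so that the data are bimodal and so that adding one extreme value shows how the mean moves while the median barely changes.

```python
import statistics

values = [2, 3, 3, 4, 5, 5, 6]            # hypothetical observations

mean_value = statistics.mean(values)       # uses every data point
median_value = statistics.median(values)   # middle value of the ordered array
modes = statistics.multimode(values)       # every most-frequent value

# Append one outlying observation and recompute
with_outlier = values + [100]
mean_with_outlier = statistics.mean(with_outlier)
median_with_outlier = statistics.median(with_outlier)

print("mean:", mean_value, "->", mean_with_outlier)
print("median:", median_value, "->", median_with_outlier)
print("modes:", modes)
```

`statistics.multimode` returns all tied modes, which makes it a safer choice than `statistics.mode` when the mode is not uniquely defined.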

  31. Univariate Statistics for Image Analysis • The histogram of a satellite image is typically multimodal rather than unimodal. • The number of modes indicates how many land covers exist in the satellite image. • Transition zones between land covers cannot be resolved from the histogram alone.

  32. Univariate Statistics for Image Analysis • [Figures: two histograms of Band A with frequency (f) on the vertical axis; labelled classes in each: Vegetation, Urban Area, Soil, Water]

  33. Which Measure is Best? • No clear answer to this question. • The mean can be influenced by outliers, while the mode may not be a particularly “typical” central value. • Statistical inference based on the median and the mode is difficult.

  34. Percentiles • Divide a group of data into 100 parts • At least n% of the data lie below the nth percentile, and at most (100-n)% of the data lie above the nth percentile. • Example – the 90th percentile indicates that at least 90% of the data lie below it, and at most 10% of the data lie above it. • The median and the 50th percentile have the same value.

  35. Percentiles (i): Computational Procedure • Organize the data into an ascending ordered array. • Calculate the percentile location $i = \frac{P}{100}\,n$, where P is the percentile of interest and n is the number of values. • Determine the percentile’s location and its value. • If i is a whole number, the percentile is the average of the values at the i and (i+1) positions. • If i is not a whole number, the percentile is at the position given by the whole-number part of (i+1) in the ordered array.

  36. Percentiles: Example • Raw Data: 14, 12, 19, 23, 5, 13, 28, 17 • Ordered Array: 5, 12, 13, 14, 17, 19, 23, 28 • Location of the 30th percentile: $i = \frac{30}{100}(8) = 2.4$ • The location index, i, is not a whole number; i+1 = 2.4+1 = 3.4; the whole-number portion is 3; the 30th percentile is at the 3rd location of the array; the 30th percentile is 13.
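
The location rule from the computational procedure can be sketched as a short function and checked against this example. The function name `percentile` and the restriction to whole-number percentiles between 1 and 99 are assumptions for illustration; integer arithmetic is used so the whole-number test is not affected by floating-point rounding.

```python
def percentile(data, p):
    """p-th percentile (integer p, 0 < p < 100) using the slide's location rule."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(data)
    n = len(ordered)
    # i = (p / 100) * n, split into its whole part and the leftover
    whole_part, remainder = divmod(p * n, 100)
    if remainder == 0:
        # i is a whole number: average the values at positions i and i + 1
        # (1-based positions, hence the index shift by one)
        return (ordered[whole_part - 1] + ordered[whole_part]) / 2
    # i is not a whole number: take the value at the whole-number part of
    # i + 1 (1-based), which is index whole_part in a 0-based list
    return ordered[whole_part]

raw_data = [14, 12, 19, 23, 5, 13, 28, 17]
p30 = percentile(raw_data, 30)
p50 = percentile(raw_data, 50)
print(p30, p50)
```

Other conventions exist (for example, interpolating between neighbouring values), so a spreadsheet or library function may return a slightly different number for the same data.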

  37. Quartiles

  38. Formulae in EXCEL • Calculating Mean: AVERAGE(data) • Calculating Median: MEDIAN(data) • Calculating Mode: MODE(data) • Calculating Minimum: MIN(data) • Calculating Maximum: MAX(data) • Calculating Quartile: QUARTILE(data,quart) • Calculating Percentile: PERCENTILE(array,k)

  39. Measure of Spread/Variation • Measures of variability describe the spread or dispersion of a dataset. • Common measures of variability • Range • Interquartile Range • Mean Absolute Deviation • Variance • Standard Deviation • Coefficient of Variation

  40. Range • The difference between the largest and the smallest values in a set of data • Simple to compute • Ignores all data points except the two extremes • Range = Maximum – Minimum • The range tells us about the spread of the data. • The range can be very misleading when outliers exist in the data.

  41. Interquartile Range • Range of values between the first and third quartiles • Less influenced by extremes • Interquartile Range = Q3 – Q1
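
A minimal sketch of both measures, using the raw data from the percentile example. The quartiles here come from `statistics.quantiles` with its default "exclusive" method, which is an assumption of this sketch: it interpolates between neighbouring values, so it need not agree with the location rule on the percentile slides or with Excel's QUARTILE.

```python
import statistics

raw_data = [14, 12, 19, 23, 5, 13, 28, 17]

# Range = Maximum - Minimum: depends only on the two extremes
data_range = max(raw_data) - min(raw_data)

# Quartiles split the ordered data into four parts; n=4 returns Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(raw_data, n=4)

# Interquartile Range = Q3 - Q1: the spread of the middle half of the data
iqr = q3 - q1

print("range:", data_range)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```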

  42. Deviation, Variance and Standard Deviation • The deviation of a data entry x in a population data set is the difference between x and the population mean µ, i.e. Deviation of x = x - µ • The sum of the deviations over all entries is zero.
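
The zero-sum property follows directly from the definition of the mean. In the short derivation below, the index i and the population size N are notation introduced here for illustration; the slide itself only uses x and µ.

```latex
\sum_{i=1}^{N} (x_i - \mu)
  = \sum_{i=1}^{N} x_i - N\mu
  = N\mu - N\mu
  = 0,
\qquad \text{since } \mu = \frac{1}{N}\sum_{i=1}^{N} x_i .
```

This is why spread is measured with absolute or squared deviations: the raw deviations always cancel.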

  43. Mean Absolute Deviation • Average of the absolute deviations from the mean: $\text{M.A.D.} = \frac{\sum |x - \mu|}{N}$ • In the slide's worked example, M.A.D. = 4.8

  44. Variance • The population variance is the sum of squared deviations over all entries, divided by the number of entries N: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

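  45. Population Variance • Average of the squared deviations from the arithmetic mean: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$ • In the slide's worked example, $\sigma^2 = 26.0$ • Sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$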
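
The mean absolute deviation and the two variances can be computed as follows. The data set is an assumption: the transcript does not preserve the slide's values, so five illustrative numbers were chosen that happen to reproduce the quoted results of 4.8 and 26.0. The sample variance divides by n - 1 instead of N.

```python
import statistics

# Assumed illustrative values; the slide's own data are not in the transcript
data = [5, 9, 16, 17, 18]

mu = statistics.mean(data)

# Mean absolute deviation: average distance from the mean
mad = sum(abs(x - mu) for x in data) / len(data)

# Population variance: average squared deviation (divide by N)
population_variance = statistics.pvariance(data, mu)

# Sample variance: squared deviations divided by n - 1
sample_variance = statistics.variance(data)

print("mean:", mu)
print("M.A.D.:", mad)
print("population variance:", population_variance)
print("sample variance:", sample_variance)
```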

  46. Variance for Image Analysis • Variance is used comparatively. • Comparing the variance of all bands shows which band has the greatest dispersion.

  47. Variance for Image Analysis • The lower the variance, the more homogeneous the data. • Outliers can inflate the variance.
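
A minimal sketch of such a band comparison, assuming each band is available as a flat list of brightness values. The band names and pixel values are hypothetical; a real image would be read with a raster library and would contain far more pixels.

```python
import statistics

# Hypothetical brightness values sampled from two bands of the same image
bands = {
    "band_1": [50, 52, 51, 49, 50, 48],
    "band_2": [20, 80, 45, 110, 60, 15],
}

# Population variance per band: lower variance means more homogeneous values
variances = {name: statistics.pvariance(bvs) for name, bvs in bands.items()}
most_dispersed = max(variances, key=variances.get)

# A single outlier (for example a saturated pixel at 255) inflates the variance
variance_with_outlier = statistics.pvariance(bands["band_1"] + [255])

for name, value in variances.items():
    print(f"{name}: variance = {value:.2f}")
print("most dispersed band:", most_dispersed)
print(f"band_1 with one outlier: variance = {variance_with_outlier:.2f}")
```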

  48. Standard Deviation • The population standard deviation is the square root of the population variance, i.e. $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

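  49. Standard Deviation • Square root of the variance: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$ • Standard deviation of a sample: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$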
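
Both standard deviations are available in Python's `statistics` module. The data set is an assumption (the slide's values are not in the transcript); it was chosen because its population variance equals the 26.0 quoted on the population variance slide.

```python
import math
import statistics

# Assumed illustrative values; the slide's own data are not in the transcript
data = [5, 9, 16, 17, 18]

# Population standard deviation: square root of the population variance
sigma = statistics.pstdev(data)

# Sample standard deviation: square root of the sample variance (n - 1 divisor)
s = statistics.stdev(data)

print("population standard deviation:", round(sigma, 3))
print("sample standard deviation:", round(s, 3))
print("matches sqrt of variance:", math.isclose(sigma, math.sqrt(statistics.pvariance(data))))
```

The sample value is always slightly larger than the population value for the same data, because it divides by n - 1 rather than N.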

  50. Empirical Rule • Applies when the data are normally distributed (or approximately normally distributed)
