MARV4/2

MARV4/2: k-NN project in brief
Reija Haapanen

k-NN inventory: objectives
• Learn the different stages of the multi-source inventory method
• Produce a forest resource dataset for the target area, based on multi-source inventory, with accuracy estimates
• Study in more detail how well the method works, e.g.:
  • whether the accuracy varies between forest types
  • how the number of field plots and the value of k affect the results
  • how the remote sensing material affects the results
  • how the results can be used in further calculations (long-term planning, biomass estimates, carbon estimates)
Answers to this list will come from the groups' separate assignments, i.e. not everyone does everything (but everyone gets to see the others' results).

Some background
• Since 1990 (NFI8), Metla has operationally calculated forest resources using satellite images and digital maps as auxiliary data
• The process is called the Multi-Source National Forest Inventory (MSNFI)

Why?
• By using inexpensive auxiliary data, results can be extended to smaller areal units:
  • NFI field plots → reliable results for Forestry Centres (ca. 200 000 ha)
  • Multi-Source NFI → reliable results for a municipality or a large forest estate. Results are simultaneously obtained for a single pixel (25 x 25 m).

The process of MSNFI
• Inputs: field measurements, original satellite image, original map data
• Preparation: checks and models, pre-processing, transformations
• Prepared data: pre-processed field data, pre-processed image data
• Image analysis: estimation/classification (k-NN), post-processing
• Outputs: statistics and accuracy estimates for large areas, small area statistics, thematic maps

Applied algorithm: k-NN
1. For each satellite image pixel, the k most similar field plots are searched for.
2. Forest variables are calculated for the pixel as weighted averages of the field measurements on those plots.
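The two steps can be sketched in a few lines of Python. This is only a minimal illustration, not the MSNFI implementation: the function name `knn_estimate`, the data layout and the toy plot values are assumptions, and the neighbours are given equal weights for simplicity.

```python
import math

def knn_estimate(pixel, plots, k):
    """Estimate forest variables for one pixel from its k nearest field plots.

    pixel: band values of the target pixel.
    plots: list of (band_values, field_variables) pairs, one per field plot.
    Returns the equal-weight mean of each field variable over the k nearest plots.
    """
    # Step 1: rank the plots by Euclidean distance in the image band space.
    ranked = sorted(plots, key=lambda p: math.dist(pixel, p[0]))
    nearest = ranked[:k]
    # Step 2: average each field variable over the selected plots.
    n_vars = len(nearest[0][1])
    return [sum(p[1][j] for p in nearest) / len(nearest) for j in range(n_vars)]

# Hypothetical toy data: two bands; variables are (volume m3/ha, basal area m2/ha).
plots = [
    ((40, 60), (200.0, 24.0)),
    ((42, 58), (180.0, 22.0)),
    ((80, 90), (20.0, 4.0)),
]
estimate = knn_estimate((41, 59), plots, k=2)
```

Because every variable is averaged over the same neighbours, all variables for a pixel come from the same set of measured plots.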

k-NN…
• With the method, we can simultaneously produce results for all measured or derived variables.
• The consistency between the variables (DBH, H, Vol, BA) is preserved within a single pixel. In regression this may be a problem, because every variable needs its own model.

When selecting the neighbours
• The similarity is determined by known auxiliary variables. One way to assess the similarity is to measure the distance from the target to the observed sample plots in the auxiliary data space.
• The measure is often the simple Euclidean distance:
  $d_{t,r} = \sqrt{\sum_{i=1}^{m} (x_{t,i} - x_{r,i})^2}$
  where m is the number of image bands, x_t is the grey value at the target pixel (where the forest variables are to be estimated) and x_r is the grey value at the reference field plot (where they have been measured).
• The neighbours can be weighted according to the distance: the closer the neighbour, the heavier its weight.
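The weighting formula itself is not given in the text. One common choice, assumed here purely for illustration, is inverse distance weighting, where each neighbour's weight is proportional to 1/d^p and the weights are normalised to sum to one. The names `euclidean` and `inverse_distance_weights`, the `power` parameter and the small `eps` guard against division by zero are all hypothetical.

```python
import math

def euclidean(xt, xr):
    """Euclidean distance between target and reference grey values over all bands."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xt, xr)))

def inverse_distance_weights(distances, power=1.0, eps=1e-9):
    """Weights proportional to 1 / d**power, normalised to sum to 1 (assumed scheme)."""
    raw = [1.0 / (d + eps) ** power for d in distances]
    total = sum(raw)
    return [w / total for w in raw]

# Hypothetical three-band target pixel and two reference plots.
target = (10, 20, 30)
references = [(10, 20, 31), (10, 20, 34)]
distances = [euclidean(target, r) for r in references]
weights = inverse_distance_weights(distances)
```

With `power=1.0` the plot at distance 1 receives roughly four times the weight of the plot at distance 4; a larger `power` would concentrate the weight further on the closest neighbour.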

Efficiency, information content
• In large area inventories the advantages are obvious: sparse field data can be extended to unvisited areas
• In small areas, wall-to-wall maps can also be produced by standwise inventory or by measuring every tree (expensive). However, the costs of the remote sensing material, the analysis costs and the information needs must be taken into account.
• Almost no knowledge of diseases, needs for silvicultural treatments, or timber quality can be obtained via remote sensing

Where does this work best?
• In forests typical of the region in question (= many observations)
• In homogeneous forests
• In single tree species forests
• In forests with one crown layer
NOTE! These are not k-NN specific properties!

Parameters
• Needed to calibrate the method for each image/data combination
• Important parameters are e.g.:
  • Number of neighbours (k)
  • Bands to be used, distance measure to be used
  • Geographical restriction of the nearest neighbour search
  • Weighting of neighbours, weighting of bands
  • Use of stratification
  • Parameters for topographical correction

Parameters
• The size of k (how many neighbours):
  • small (k = 1-2) → the variation in the field data is best preserved, but the noise is large
  • large (k = 10-20 or more) → the RMSE (average error) is smaller, but the estimates are also closer to the average (smoothing)
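The smoothing effect of a large k can be seen on a toy example. The sketch below uses made-up plot data with a single image feature and compares the range of the estimates for k = 1 and k = 5; the names `knn_mean` and `spread` and the volume values are hypothetical.

```python
def knn_mean(x, plots, k):
    """Equal-weight k-NN estimate at feature value x from (feature, value) plots."""
    nearest = sorted(plots, key=lambda p: abs(x - p[0]))[:k]
    return sum(v for _, v in nearest) / k

# Hypothetical plots: feature values 0..9 paired with noisy stand volumes (m3/ha).
volumes = [10, 300, 50, 250, 80, 220, 120, 200, 150, 180]
plots = list(enumerate(volumes))

def spread(k):
    """Range (max - min) of the estimates for targets placed at each plot's feature value."""
    estimates = [knn_mean(x, plots, k) for x, _ in plots]
    return max(estimates) - min(estimates)

spread_k1 = spread(1)  # each target simply copies its own plot, so the full variation is kept
spread_k5 = spread(5)  # averaging five plots pulls the estimates towards the mean
```

Note that the targets here coincide with the plots themselves, so k = 1 reproduces the field data exactly. The example only illustrates the loss of variation with growing k, not the accuracy of either setting.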

Parameters
• Sample size: decided before going to the field!
• In Scandinavian conditions, it has been recommended to have at least 400-600 forested field plots available for estimating forest variables from the area of one satellite image (Katila & Tomppo 2001).

Features
Generally, a compromise must be made between the coverage, spatial resolution and spectral resolution of the applied images:
• large area coverage and wide spectral range of Landsat-type satellite images
• superior spatial resolution of aerial photographs and VHR satellite images, allowing the utilization of textural information
• intermediate image types in between

Generally, adding more features improves the results, but:
• in k-NN and other methods employing distance measures, the curse of dimensionality enters the picture when the number of bands increases
• detrimental features should not be used, since they can pull the estimates in the wrong direction
• some features are more important than others → apply weights

Feature weighting
• It is difficult to control the weighting of the different input variables:
  • Features with large variation receive the highest weights, unless the image features are standardized
  • After standardization, the weights still don't reflect the potential of the features
  • Their variation and usefulness for the estimation should be taken into account simultaneously
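Standardization and explicit feature weights can be sketched as follows. This is a minimal illustration under assumptions: z-score standardization is one common way to remove the implicit weighting caused by different value ranges, and the names `standardize` and `weighted_distance` and the two example features are hypothetical.

```python
import math
import statistics

def standardize(columns):
    """Z-score each feature column: subtract the mean, divide by the standard deviation.

    Assumes no column is constant (a zero standard deviation would divide by zero).
    """
    out = []
    for col in columns:
        mean = statistics.fmean(col)
        sd = statistics.pstdev(col)
        out.append([(v - mean) / sd for v in col])
    return out

def weighted_distance(a, b, weights):
    """Euclidean distance with a per-feature weight applied to each squared difference."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

# Hypothetical features on very different scales: a raw band and a band ratio.
band = [100.0, 140.0, 180.0]
ratio = [0.2, 0.5, 0.8]
z_band, z_ratio = standardize([band, ratio])
```

Without standardization the raw band would dominate the distance simply because its values are larger. After standardization both features contribute equally, and any difference in importance has to be expressed through the explicit `weights`.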

Selection of image features for estimation
• Correlation between image features and field variables is generally a good indicator, BUT:
  • Image features are often highly correlated with each other as well
  • Two correlated features may or may not complement each other
  • Features showing weak correlation with the field variable may be useful together with other features (extraneous variables)
  • Even two seemingly useless features may be useful together
• Correlation analysis, or filters based on correlation coefficients, are not sufficient!
→ Subset selection algorithms that test combinations of features instead of single features are needed
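As a simple stand-in for such an algorithm, the sketch below uses greedy forward selection scored by the leave-one-out RMSE of k-NN. It is an assumption chosen for brevity and not necessarily the algorithm used in any cited study. The names `loo_rmse` and `forward_select` and the toy data are hypothetical.

```python
import math

def loo_rmse(features, values, subset, k=1):
    """Leave-one-out RMSE of equal-weight k-NN using only the feature indices in subset."""
    sq_err = 0.0
    for i, target in enumerate(features):
        others = [j for j in range(len(features)) if j != i]
        # Squared Euclidean distance restricted to the candidate feature subset.
        others.sort(key=lambda j: sum((target[f] - features[j][f]) ** 2 for f in subset))
        nearest = others[:k]
        predicted = sum(values[j] for j in nearest) / len(nearest)
        sq_err += (predicted - values[i]) ** 2
    return math.sqrt(sq_err / len(features))

def forward_select(features, values, k=1):
    """Add, one at a time, the feature that lowers the RMSE most; stop when none does."""
    remaining = set(range(len(features[0])))
    chosen, best = [], float("inf")
    while remaining:
        score, f = min((loo_rmse(features, values, chosen + [f], k), f)
                       for f in sorted(remaining))
        if score >= best:
            break
        chosen.append(f)
        remaining.remove(f)
        best = score
    return chosen, best

# Hypothetical data: feature 0 tracks the field variable, feature 1 is unrelated noise.
values = [10, 20, 30, 40, 50, 60]
features = [(1, 5), (2, 1), (3, 6), (4, 2), (5, 4), (6, 3)]
chosen, best = forward_select(features, values)
```

Forward selection evaluates each candidate together with the features already chosen, so it goes beyond single-feature correlation. Being greedy, it can still miss pairs of features that are only useful together, which is why more exhaustive or stochastic subset searches are sometimes preferred.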

Haapanen & Tuominen 2007

Masks and auxiliary data
• Using digital masks generally improves estimation accuracy:
  • Land-use classes
    • Separate forest from other land uses → avoid misclassifications, get estimates only for forest pixels
    • Separate mineral soils from peatlands → improves the accuracy of both strata
  • Digital elevation model
    • Topographical correction for differences in illumination, constraints for nearest neighbours, removal of illogical estimates
An important task: delineate the area of interest
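One way to combine a land-use mask with stratification is to skip non-forest pixels and to search neighbours only among plots in the pixel's own stratum. The sketch below assumes this scheme for illustration; the function name `masked_knn`, the class labels and the plot values are hypothetical.

```python
import math

def masked_knn(pixel_bands, pixel_class, plots, k, nodata=None):
    """k-NN estimate restricted by a land-use mask and a soil stratum.

    pixel_class: 'mineral', 'peatland' or 'non-forest' (hypothetical mask labels).
    plots: list of (band_values, stratum, value) tuples.
    """
    # The land-use mask: no estimate is produced outside forest land.
    if pixel_class == "non-forest":
        return nodata
    # Stratification: only plots from the pixel's own stratum may be neighbours.
    candidates = [p for p in plots if p[1] == pixel_class]
    candidates.sort(key=lambda p: math.dist(pixel_bands, p[0]))
    nearest = candidates[:k]
    if not nearest:
        return nodata
    return sum(p[2] for p in nearest) / len(nearest)

# Hypothetical plots with volume in m3/ha; the peatland plot is spectrally closest.
plots = [
    ((50, 50), "mineral", 150.0),
    ((50, 51), "peatland", 40.0),
    ((60, 60), "mineral", 200.0),
]
on_mineral = masked_knn((50, 51), "mineral", plots, k=1)
on_peatland = masked_knn((50, 51), "peatland", plots, k=1)
outside = masked_knn((50, 51), "non-forest", plots, k=1)
```

The same spectral values lead to different estimates depending on the stratum, which is the point of separating mineral soils from peatlands.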

Problems, error sources
• Measurement errors
• Insufficient field sample that does not cover all the variation present in the image: rare habitats, mixels (mixed pixels)
• Low correlation between field data and image values due to:
  • Positioning errors (image rectification, mislocated sample plots)
  • Noise caused by atmosphere, illumination, topography
  • Mixels, insufficient spatial resolution
  • Insufficient radiometric resolution: image values saturate in large stands, mature spruce forests and wet treeless peatlands get mixed, etc.
• Wrong type of reference data due to changes in vegetation zones
• Estimator works badly
• No up-to-date imagery is available due to clouds etc.
NOTE! None of these is exclusively related to k-NN or satellite imagery!

Error estimation
• We can get an idea of the error in each produced pixel by performing the k-NN estimation within the field sample → cross-validation
• This means that each field plot is left out, one at a time, and its forest variables are estimated using the other plots
• In our exercise, the REFE program first calculated this estimate of the RMSE and then produced the image with k-NN. NOTE: these were separate tasks!
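Leave-one-out cross-validation can be sketched as below. This is not the REFE program, only an illustration under assumptions: equal neighbour weights, Euclidean distance, and hypothetical names (`leave_one_out`) and plot values.

```python
import math

def leave_one_out(plots, k):
    """Return (rmse, bias) from leave-one-out k-NN estimation over the field plots.

    plots: list of (band_values, observed_value) pairs.
    """
    residuals = []
    for i, (bands, observed) in enumerate(plots):
        # Leave plot i out and estimate it from the remaining plots.
        others = plots[:i] + plots[i + 1:]
        others.sort(key=lambda p: math.dist(bands, p[0]))
        nearest = others[:k]
        predicted = sum(p[1] for p in nearest) / len(nearest)
        residuals.append(predicted - observed)
    n = len(residuals)
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    bias = sum(residuals) / n
    return rmse, bias

# Hypothetical field plots: two bands and an observed volume (m3/ha).
plots = [
    ((10, 10), 100.0),
    ((12, 11), 120.0),
    ((30, 28), 300.0),
    ((31, 30), 320.0),
]
rmse, bias = leave_one_out(plots, k=1)
```

Repeating the call for several values of k and comparing the resulting RMSE and bias is a straightforward way to calibrate k for a given image and field data combination.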
