Andreja Naumoski, PhD student,

NOVEL MEMBERSHIP FUNCTION IN PROCESS OF BUILDING PATTERN TREES FROM DIATOMS COMMUNITY IN LAKE PRESPA. Presented by: Andreja Naumoski, PhD student, Teaching and Research Assistant at the Faculty of Electrical Engineering and Information Technologies, Skopje, Republic of Macedonia.

Presentation Transcript


  1. NOVEL MEMBERSHIP FUNCTION IN PROCESS OF BUILDING PATTERN TREES FROM DIATOMS COMMUNITY IN LAKE PRESPA Presented by: Andreja Naumoski, PhD student, Teaching and Research Assistant at the Faculty of Electrical Engineering and Information Technologies, Skopje, Republic of Macedonia.

  2. Outline • Modelling the physico-chemical parameters • Bio-indicator data (diatom species abundance data). • Data description • Pattern tree methodology • Water Quality Models • Conclusion • Q&A Section

  3. Modelling the physico-chemical parameters • Usual approaches to water quality evaluation are divided into two main categories: • one based on physical and chemical methods, and • the other based on evaluation of the biological community. • Physical and chemical monitoring reflects only instantaneous measurements. • Biotic parameters, on the other hand, provide a better evaluation of environmental changes, • because diatom survival and development integrate conditions over a period of time, reflecting conditions that may no longer be present at the time of sampling and analysis.

  5. Water quality classes (WQC) • In this paper, we build water quality models (WQMs) that focus on predicting the diatom-indicator relationship, using diatoms as bio-indicators of the ecological status of the lake. • WQMs take into account only the specified target abiotic factors of the environment, although some temporal aspects may still be taken into account. • The obtained models reveal interesting connections between the diatom species and water quality and, conversely, use the WQCs to classify the diatoms into one of the WQ classes.

  6. WQM using diatoms as bio-indicators • Changes in the physico-chemical environment are well detected through the bio-indicator properties of diatoms. • The relationship between the presence/abundance of these diatoms and specific abiotic factors can be studied using machine learning techniques. • This is done under the implicit assumption that both are observed at a single point in time for a given spatial unit.

  7. Machine Learning Methodology – Pattern Trees & Data Description

  8. Pattern Trees • Pattern trees are hierarchical structures that we use to classify the diatoms into one WQC. • The main question is: why use pattern trees (PTs) for diatom classification? • First, pattern trees are robust to overfitting, which is not the case for classical methods and decision trees. • Second, they produce a compact structure, which is essential for representing the knowledge gained from the biological data. • Third, these models can achieve high classification accuracy. • One reason this method performs better than previous ones is its use of different fuzzy membership functions.
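In the usual pattern-tree formulation, the leaves are fuzzy terms (the membership degree of one input attribute in one fuzzy set), internal nodes combine their children with fuzzy aggregation operators, one tree is built per class, and a sample is assigned to the class whose tree returns the highest degree. The minimal Python sketch below assumes only min (AND) and max (OR) aggregations and uses hypothetical "attribute:term" keys; the slides do not state which aggregators this work uses, and the tree-induction (search) step is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# Fuzzified sample: membership degree in [0, 1] for each hypothetical "attribute:term" key.
FuzzySample = Dict[str, float]


@dataclass
class Leaf:
    key: str  # e.g. "DMAU:high" (illustrative name, not from the paper)

    def evaluate(self, sample: FuzzySample) -> float:
        return sample.get(self.key, 0.0)


@dataclass
class Node:
    op: Callable[[List[float]], float]  # fuzzy aggregation: min acts as AND, max as OR
    children: List["Tree"]

    def evaluate(self, sample: FuzzySample) -> float:
        return self.op([child.evaluate(sample) for child in self.children])


Tree = Union[Leaf, Node]


def classify(trees: Dict[str, Tree], sample: FuzzySample) -> str:
    """Return the class whose pattern tree gives the highest degree."""
    return max(trees, key=lambda label: trees[label].evaluate(sample))


# Hypothetical two-class example; the structure is illustrative, not a learned model.
trees = {
    "fresh": Node(max, [Leaf("DMAU:high"), Leaf("CPLA:high")]),
    "brackish": Node(min, [Leaf("COCE:high"), Leaf("NSROT:low")]),
}
sample = {"DMAU:high": 0.7, "CPLA:high": 0.4, "COCE:high": 0.2, "NSROT:low": 0.9}
print(classify(trees, sample))  # fresh (0.7 vs 0.2)
```

Because each node only needs a list of child degrees, swapping in other aggregators (for example weighted averages) changes the operator passed to `Node`, not the tree structure.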

  9. Proposed Log-normal membership function • In this paper we introduce a modified log-normal membership function, which in general is specified by three parameters, a, b and c, where a and c are usually positive and b is located at the centre of the curve. We also propose a modification that takes the mean (μ) and standard deviation (σ) of the given data range into account. In this way, each fuzzy term reflects the nature of the tested dataset, with the log-normal MFs distributed over the entire range.
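The exact three-parameter formula is not reproduced here, so the following Python sketch assumes one common log-normal-shaped membership function, μ(x) = exp(−ln²((x + s)/(b + s)) / (2w²)), which equals 1 at x = b. The names `centre` (b), `width` (w) and `shift` (s, a positive offset so that zero abundances stay valid) are hypothetical and do not correspond to the paper's a, b, c. As one assumed way of taking the data's mean and standard deviation into account, the width is set to √(ln(1 + σ²/μ²)), the log-scale spread of a log-normal distribution with that mean and standard deviation.

```python
import math
import statistics
from typing import Sequence


def lognormal_mf(x: float, centre: float, width: float, shift: float = 1.0) -> float:
    """Assumed log-normal-shaped membership degree in [0, 1].

    Equals 1.0 at x == centre and decays symmetrically on a log scale of x + shift.
    `shift` (> 0) moves the curve so that x = 0 (e.g. an absent diatom) stays valid.
    """
    if x + shift <= 0:
        return 0.0
    z = math.log((x + shift) / (centre + shift))
    return math.exp(-(z ** 2) / (2 * width ** 2))


def width_from_data(values: Sequence[float], shift: float = 1.0) -> float:
    """Log-scale width of a log-normal with the (shifted) data's mean and std (assumption)."""
    shifted = [v + shift for v in values]
    mu = statistics.mean(shifted)
    sigma = statistics.pstdev(shifted)
    # Floor avoids a zero width (division by zero) when all values are equal.
    return max(math.sqrt(math.log(1 + (sigma / mu) ** 2)), 1e-6)


abundances = [0.0, 1.0, 2.0, 4.0, 9.0]  # made-up values for illustration
w = width_from_data(abundances)
print(lognormal_mf(2.0, centre=2.0, width=w))  # 1.0 at the centre
```

Unlike a Gaussian MF, this curve is asymmetric in x (longer tail towards high values), which is one plausible reason a log-normal shape suits skewed abundance data.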

  10. Data description • The data were acquired within the monitoring programme of the EU project TRABOREMA: • Measurement period of 16 months • Physical/chemical and biological analyses were performed • The physico-chemical properties of the samples provided the environmental variables for the habitat models, while the biological samples provided information on the relative abundance of the studied diatoms. • The following physico-chemical properties of the water samples were measured: temperature, dissolved oxygen, Secchi depth, conductivity, pH, nitrogen compounds (NO2, NO3, NH4, inorganic nitrogen), SO4, and sodium (Na), potassium (K), magnesium (Mg), copper (Cu), manganese (Mn) and zinc (Zn) content. • Three water quality classes and the TOP10 diatoms are input to the pattern tree algorithm.

  11. TOP10 most abundant diatoms

  12. Water Quality Classes definition (physico-chemical parameter: name of the WQC (parameter range)) • Saturated Oxygen: oligosaprobous (SatO > 85), β-mesosaprobous (70–85), α-mesosaprobous (25–70), α-meso / polysaprobous (10–25) • pH: acidobiontic (pH < 5.5), acidophilous (pH > 5.5), circumneutral (pH > 6.5), alkaliphilous (pH > 7.5), alkalibiontic (pH > 8), indifferent (pH > 9) • Conductivity: fresh (< 20), fresh brackish (< 90), brackish fresh (90–180), brackish (180–900) • Sources: Saturated Oxygen [*], Conductivity [**] and pH [*, **]. [*] Krammer, K., and Lange-Bertalot, H. 1986. Die Süßwasserflora von Mitteleuropa 2: Bacillariophyceae. 1. Teil, 876 pp. Stuttgart: Gustav Fischer Verlag. [**] Van Der Werff, A., and Huls, H. 1957–1974. Diatomeeënflora van Nederland. Abcoude: De Hoef.
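The class boundaries in this table can be applied mechanically to measured values. The Python sketch below encodes the tabulated thresholds; the function names are hypothetical, and the assignment of boundary values (for example SatO = 85 or pH = 5.5) is an assumption, since the table leaves them ambiguous. The pH "> x" entries are read as lower bounds of consecutive ranges, and values outside the tabulated ranges return None.

```python
from typing import Optional


def saturated_oxygen_class(sat_o: float) -> Optional[str]:
    """Saturated-oxygen WQC; SatO = 85 is treated as β-mesosaprobous (assumption)."""
    if sat_o > 85:
        return "oligosaprobous"
    if sat_o > 70:
        return "β-mesosaprobous"
    if sat_o > 25:
        return "α-mesosaprobous"
    if sat_o >= 10:
        return "α-meso/polysaprobous"
    return None  # below the tabulated range


def ph_class(ph: float) -> str:
    """pH WQC, reading the '>' thresholds as lower bounds of consecutive ranges."""
    if ph > 9:
        return "indifferent"
    if ph > 8:
        return "alkalibiontic"
    if ph > 7.5:
        return "alkaliphilous"
    if ph > 6.5:
        return "circumneutral"
    if ph > 5.5:
        return "acidophilous"
    return "acidobiontic"  # pH < 5.5; pH == 5.5 is not covered by the table (assumption)


def conductivity_class(cond: float) -> Optional[str]:
    """Conductivity WQC; units are not stated on the slide."""
    if cond < 20:
        return "fresh"
    if cond < 90:
        return "fresh brackish"
    if cond <= 180:
        return "brackish fresh"
    if cond <= 900:
        return "brackish"
    return None  # above the tabulated range


print(ph_class(7.0), conductivity_class(150), saturated_oxygen_class(90), sep=" | ")
# circumneutral | brackish fresh | oligosaprobous
```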

  13. Experimental Design • The experiments are configured as follows: • 1) A simple fuzzification method, based on three evenly distributed membership functions (including the modified log-normal function) for each input variable, transforms the crisp values into fuzzy values (Train). • 2) Two experiments are carried out: the first (Exp2 – odd-even) uses the odd-labelled data as the training set and the even-labelled data as the test set, and the second (Exp3 – even-odd) uses the even-labelled data as the training set and the odd-labelled data as the test set. • 3) Standard 10-fold cross-validation (xVal) is used to test the prediction accuracy of the algorithm against standard crisp classifiers.
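The three evaluation set-ups can be sketched as follows. This Python sketch assumes samples are labelled 1..n in measurement order and that the three fuzzy terms per variable have centres evenly spaced over the variable's [min, max] range; the function names are hypothetical. The 10-fold split assigns labels to folds by striding, without shuffling, for determinism; practical set-ups often shuffle or stratify.

```python
from typing import List, Sequence, Tuple


def term_centres(values: Sequence[float], n_terms: int = 3) -> List[float]:
    """Centres of n_terms (>= 2) evenly distributed fuzzy terms over [min, max]."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (n_terms - 1)
    return [lo + i * step for i in range(n_terms)]


def odd_even_split(n: int) -> Tuple[List[int], List[int]]:
    """Exp2: odd labels (1, 3, 5, ...) form the training set, even labels the test set."""
    labels = range(1, n + 1)
    train = [i for i in labels if i % 2 == 1]
    test = [i for i in labels if i % 2 == 0]
    return train, test


def even_odd_split(n: int) -> Tuple[List[int], List[int]]:
    """Exp3: the same split with the roles of training and test sets swapped."""
    train, test = odd_even_split(n)
    return test, train


def k_fold(n: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """k-fold CV over labels 1..n: every label appears in exactly one test fold."""
    labels = list(range(1, n + 1))
    folds = [labels[f::k] for f in range(k)]
    splits = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in labels if i not in held_out]
        splits.append((train, fold))
    return splits


print(term_centres([0.0, 4.0, 10.0]))  # [0.0, 5.0, 10.0]
print(odd_even_split(6))  # ([1, 3, 5], [2, 4, 6])
```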

  14. Experimental Results

  15. Pattern tree generated for the fresh - Conductivity WQC. • Rule1 shows that NSROT, CPLA and DMAU, especially DMAU and CPLA, which are more abundant than NSROT, can be found in water whose Conductivity WQC is fresh. From the generated rule we can immediately read the mean and standard deviation of the abundance of these diatoms in the measured samples. • According to the model tree, these diatoms live in these waters together with other diatoms, but in the absence of the COCE diatom. Rule1: If (COCE has (µ; σ) = 0.0±3.82 or CPLA has (µ; σ) = 4.4±1.88) or (DMAU has (µ; σ) = 4.22±0.89 or NSROT has (µ; σ) = 3.44±1.46) then the Conductivity WQ class is fresh (with confidence of 0.4249).
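Rule1 can be read as a fuzzy expression: each "X has (µ; σ)" condition is the degree to which the measured abundance of diatom X matches a fuzzy term centred at µ with spread σ, and "or" is a fuzzy disjunction. Purely for illustration, the Python sketch below assumes a Gaussian-shaped term and the max operator for "or"; the slides use the log-normal MF and do not state which aggregation operators the tree uses. Because max is associative, the rule's two parenthesized groups can be flattened. The function names and the sample abundances are hypothetical.

```python
import math
from typing import Dict

# (µ, σ) pairs read from Rule1; the Gaussian term shape is an assumption.
RULE1_TERMS = {"COCE": (0.0, 3.82), "CPLA": (4.4, 1.88), "DMAU": (4.22, 0.89), "NSROT": (3.44, 1.46)}


def term_degree(x: float, mu: float, sigma: float) -> float:
    """Gaussian-shaped membership of abundance x in a term centred at mu with spread sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))


def rule1_degree(abundance: Dict[str, float]) -> float:
    """Degree to which a sample satisfies Rule1, with every 'or' taken as max."""
    return max(term_degree(abundance[d], *RULE1_TERMS[d]) for d in RULE1_TERMS)


sample = {"COCE": 6.0, "CPLA": 1.0, "DMAU": 4.22, "NSROT": 0.5}  # made-up abundances
print(rule1_degree(sample))  # 1.0, because DMAU sits exactly at its term centre
```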

  16. Pattern tree generated for the circumneutral - pH WQC • This rule has a higher confidence factor than Rule1. The model tree shows that several diatoms can be found in circumneutral waters, with the exception of DMAU. • According to the tree model, NPRE is more likely to be found together with NSROT and APED in circumneutral waters. NSROT and APED are also important inhabitants of these waters but, according to the model tree, they are less abundant than NPRE. • The entire model tree has a confidence factor of 0.6874. Rule2: If (APED has (µ; σ) = 2.89±0.61 and DMAU has (µ; σ) = 0.0±0.56) or NPRE has (µ; σ) = 4.22±0.89 and NSROT has (µ; σ) = 3.44±1.46, then the pH WQ class is circumneutral (with confidence of 0.6874).
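Rule2 mixes conjunctions and disjunctions. The Python sketch below assumes the usual precedence, where "and" binds tighter than "or", giving (APED and DMAU) or (NPRE and NSROT), with "and" as min and "or" as max. As with any illustration here, the Gaussian term shape, the operators, the function names and the sample abundances are assumptions rather than the paper's method. The DMAU term centred at 0.0 with a small spread shows how the rule expresses the absence of DMAU.

```python
import math
from typing import Dict

# (µ, σ) pairs read from Rule2; the Gaussian term shape is an assumption.
RULE2_TERMS = {"APED": (2.89, 0.61), "DMAU": (0.0, 0.56), "NPRE": (4.22, 0.89), "NSROT": (3.44, 1.46)}


def term_degree(x: float, mu: float, sigma: float) -> float:
    """Gaussian-shaped membership of abundance x in a term centred at mu with spread sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))


def rule2_degree(abundance: Dict[str, float]) -> float:
    d = {name: term_degree(abundance[name], *ms) for name, ms in RULE2_TERMS.items()}
    # Assumed grouping: (APED and DMAU) or (NPRE and NSROT); 'and' = min, 'or' = max.
    return max(min(d["APED"], d["DMAU"]), min(d["NPRE"], d["NSROT"]))


print(rule2_degree({"APED": 2.89, "DMAU": 0.0, "NPRE": 0.0, "NSROT": 3.44}))  # 1.0
```

If DMAU becomes abundant (for example 3.0) while NPRE is absent, both conjunctions collapse and the degree drops close to zero, which matches the reading that circumneutral waters exclude DMAU.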

  17. Pattern tree generated for the oligosaprobous Saturated Oxygen WQC • This model tree indicates that the COCE diatom is most likely to be found in oligosaprobous waters, as are CSCU diatoms, which are more abundant than the NSROT diatom. According to the model tree, these waters are not suitable for the NSROT diatom. • The rule has a medium confidence factor of 0.5072. Rule3: If (NSROT has (µ; σ) = 0.00±1.463 or CSCU has (µ; σ) = 2.89±0.61) or COCE has (µ; σ) = 9.00±3.82 then the Saturated Oxygen WQ class is oligosaprobous (with confidence of 0.5072).

  18. Average prediction accuracy per WQC (in %)

  19. Comparison with classical classifiers: 10-fold cross-validation classification accuracy of crisp classification algorithms compared with the proposed log-normal membership function

  20. Conclusion • Experiments on the diatom WQC datasets show that pattern trees using the modified log-normal MF outperform pattern trees that use trapezoidal, triangular or Gaussian MFs in terms of prediction accuracy. • More important is the interpretability of the models: pattern trees are more interpretable than PCA, CCA, DCA and other methods previously used by biological experts. • The experiments on the diatom datasets show that the average prediction accuracy with the modified log-normal membership functions is higher than that of the classical classifiers.

  21. Future Work • Further research on developing more membership functions for building pattern trees is necessary. • The current version also needs to be updated with new similarity metrics. • Other fuzzy aggregations and similarity metrics may be better suited to the diatom community dataset and could therefore lead to higher accuracy. • Trophic state indices [***] could be a good indicator of the diatom-eutrophication process. [***] Carlson, R. E., and Simpson, J. 1996. A Coordinator's Guide to Volunteer Lake Monitoring Methods. North American Lake Management Society, 96.

  22. Q&A Section Any Questions? Thank you for your attention
