Comparative Study: Data Mining and Machine Learning for Weather Forecasting

This study explores the application of data mining and machine learning techniques for accurate weather forecasting. The study compares the performance of support vector regression and artificial neural network models in predicting rainfall and temperature.

mspillman

Presentation Transcript


  1. Application of Data Mining and Machine Learning for Weather Forecasting: A Comparative Study Nasimul Hasan C121046 Nayan Chandra Nath C121038 Department of CSE International Islamic University Chittagong

  2. Outline • Introduction • Motivation and Goal • Methodology • Experiment Design • Result analysis • Conclusion

  3. Introduction • Weather: • Has great significance for agriculture • Is a deterministically chaotic system • Lack of proper data • Continuous change of climate

  4. Problem • The main challenge is to predict the weather as accurately as possible • Much work has been done before • Change of seasons

  5. Previous Work • A. Mellit, A. Massi Pavan & M. Benghanem developed an SVM model that can produce predictions up to 99% accurate for different models. • Hall and Tony proposed a neural network model that uses input from the Eta model and upper air soundings to produce the probability of precipitation (PoP) and the quantitative precipitation forecast (QPF) for the Dallas-Fort Worth, Texas area. Over 70% of their model's PoP forecasts were less than 5% or greater than 95%.

  6. Motivation & Goal • Motivation: • SVR and ANN are powerful machine learning techniques for pattern recognition • Using different kinds of windowing functions for data preprocessing is a new idea • Combining a windowing function with support vector regression can make a good model for time series prediction. • Goal: • Propose a good machine learning model to predict rainfall and temperature.

  7. Methodology • Support Vector Regression • Support vector machine (SVM) is a novel artificial intelligence-based method developed from statistical learning theory • SVM has two major variants: classification (SVC) & regression (SVR). • In SVM regression, the input is first mapped onto an m-dimensional feature space using some fixed (nonlinear) mapping, and then a linear model is constructed in this feature space. • A margin of tolerance (epsilon) is set in the approximation. • This type of function is often called the epsilon-insensitive loss function. • Slack variables are used to overcome noise in the data and non-separability
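
The margin of tolerance described on this slide can be illustrated with a small sketch of the epsilon-insensitive loss. The function and argument names are illustrative assumptions, not taken from the presentation.

```python
def epsilon_insensitive_loss(actual, predicted, epsilon):
    """Return 0 inside the epsilon tube, otherwise the distance beyond it."""
    # Errors no larger than epsilon are ignored; larger errors are
    # penalised linearly by how far they exceed the tolerance.
    return max(0.0, abs(actual - predicted) - epsilon)


# An error of 0.5 with epsilon = 1.0 falls inside the tube.
inside = epsilon_insensitive_loss(3.0, 2.5, 1.0)
# An error of 2.0 with epsilon = 0.5 exceeds the tube by 1.5.
outside = epsilon_insensitive_loss(3.0, 1.0, 0.5)
```

In SVR, the slack variables measure exactly this excess for each training point.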

  8. Methodology (cont..) The regression problem of SVM can be expressed as the following optimization problem. Minimize: Subject to:
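
The formulas from this slide are not in the transcript. For reference, the standard primal form of epsilon-SVR from the literature is given below; it is presumably close to what the slide showed, but the notation ($w$, $b$, $C$, $\xi_i$, $\xi_i^*$, $\varphi$) is an assumption rather than the authors' own.

```latex
\begin{aligned}
\text{Minimize:}\quad & \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^{*} \right) \\
\text{Subject to:}\quad & y_i - \langle w, \varphi(x_i) \rangle - b \le \varepsilon + \xi_i, \\
& \langle w, \varphi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^{*}, \\
& \xi_i,\ \xi_i^{*} \ge 0, \qquad i = 1, \dots, n.
\end{aligned}
```

Here $\varphi$ is the fixed nonlinear mapping into the feature space, $\varepsilon$ is the margin of tolerance, and $C$ trades off the flatness of the model against the total slack.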

  9. Methodology (cont..) Artificial Neural Network The term neural network has its origins in attempts to find mathematical representations of information processing in biological systems [31]. It has since been used very broadly to cover a wide range of different models, many of which have been the subject of exaggerated claims regarding their biological plausibility. From the viewpoint of pattern recognition applications, however, biological realism would impose entirely unnecessary constraints.

  10. The ANN Network

  11. Methodology (cont..) • Parameters: • Horizon (h) • Window size • Step size • Training window width • Testing window width • Windowing operator: • Transform the time series data into a generic data set • Convert the last row of a window within the time series into a label or target variable • Feed the cross-sectional values as inputs to a machine learning technique such as linear regression, neural network, support vector machine and so on.
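
The windowing idea on this slide can be sketched as follows. This is a minimal illustration, not the operator the authors used: the function name `make_windows` is hypothetical, and it assumes the label is the value `horizon` steps after the last input of each window.

```python
def make_windows(series, window_size, horizon=1, step_size=1):
    """Turn a time series into (inputs, labels) rows for a regressor."""
    values = list(series)
    inputs, labels = [], []
    # Last start index that still leaves room for the window and its label.
    last_start = len(values) - window_size - horizon
    for start in range(0, last_start + 1, step_size):
        end = start + window_size
        inputs.append(values[start:end])          # cross-sectional inputs
        labels.append(values[end + horizon - 1])  # target `horizon` steps ahead
    return inputs, labels


X, y = make_windows([1, 2, 3, 4, 5, 6], window_size=3)
# X is [[1, 2, 3], [2, 3, 4], [3, 4, 5]] and y is [4, 5, 6]
```

Each row of `X` can then be fed to any regression model, with the matching entry of `y` as its target.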

  12. Methodology (cont..) Moving Average:
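
The moving-average formula from this slide is not in the transcript. A minimal sketch, assuming a simple trailing moving average over the last `k` values; the window length `k` and the function name are assumptions, not values from the study.

```python
def moving_average(values, k):
    """Simple trailing moving average over windows of k consecutive values."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    averages = []
    # The first average is available once k values have been seen.
    for i in range(k - 1, len(values)):
        window = values[i - k + 1 : i + 1]
        averages.append(sum(window) / k)
    return averages


smoothed = moving_average([2, 4, 6, 8], k=2)
# smoothed is [3.0, 5.0, 7.0]
```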

  13. Experiment Design • Data • The experimental dataset was collected from the Meteorological Department, Bangladesh. • Seven years of historical data (2008-2014) for Chittagong were collected. • Six attributes (Date, total, avg, max, min, MA) were used in the experiment.

  14. Experiment Design • Data Preprocessing • Prepared for ML using • Missing value replacement • 80% for training and 20% for testing
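
The two preprocessing steps can be sketched as below. The slide does not say how missing values were replaced or how the split was made, so this sketch assumes mean replacement and a chronological 80/20 split; both choices and both function names are assumptions.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to compute a mean from")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def chronological_split(rows, train_fraction=0.8):
    """Keep time order: the first part trains, the remainder tests."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


filled = fill_missing_with_mean([1.0, None, 3.0])
train, test = chronological_split(list(range(10)))
# filled is [1.0, 2.0, 3.0]; train has 8 rows and test has 2
```

A chronological split avoids training on observations that come after the test period, which matters for time series.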

  15. Experiment Design Rectifier

  16. Experiment Flowchart Training Test

  17. Experiment Result Result evaluation technique: Here, $y_t$ = the original value of a point for a given time period t, n = the total number of fitted points, $\hat{y}_t$ = the fitted forecast value for the time period t

  18. Correlation between features using Pearson Correlation matrix

  19. $x_i$ = the actual observations time series, $\hat{x}_i$ = the estimated or forecasted time series, SAE = the sum of the absolute errors (or deviations), N = the number of non-missing data points.
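
The quantities defined on this slide match the usual mean absolute error, MAE = SAE / N. The formula itself is not in the transcript, so the sketch below is an assumption about what was shown; the function name is illustrative.

```python
def mean_absolute_error(actual, forecast):
    """MAE = SAE / N, skipping pairs where either value is missing (None)."""
    pairs = [(a, f) for a, f in zip(actual, forecast)
             if a is not None and f is not None]
    if not pairs:
        raise ValueError("no non-missing data points")
    # SAE: sum of the absolute errors over the non-missing points.
    sae = sum(abs(a - f) for a, f in pairs)
    return sae / len(pairs)


mae = mean_absolute_error([1, 2, 3], [2, 2, 5])
# mae is 1.0, since the absolute errors are 1, 0 and 2
```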

  20. SVM produced its best result with almost 98.65% accuracy for rainfall and 95% for temperature prediction. ANN produced its best result with almost 97.45% accuracy for rainfall and 96.7% for temperature prediction.

  21. Results for different models using SVR

  22. ANN Monthly Rainfall Horizon 1

  23. ANN Monthly Temperature Horizon 1

  24. SVM Monthly Rainfall Horizon 1

  25. SVM Monthly Temperature Horizon 1

  26. Conclusion • Discussion: • Different windowing functions can produce different prediction results. • Limitations & Future work: • Only Moving Average and windowing operators were used. • Data from only one station were used in the experiments. • The models were not compared with other machine learning techniques. • In future, we will apply our model to other rainfall data sets and will also compare our results with other types of data mining techniques.

  27. Thank You
