
Presented by Vasileios Zois, CS at USC, 09/20/2013



Presentation Transcript


  1. Introducing Scalability into Smart Grid presented by Vasileios Zois CS at USC 09/20/2013

  2. Smart Grid Project Services • Manage Data • Sparse Data • Heterogeneous Data • Semantic Representation • Train Prediction Models • Data Intensive Application • On Demand Procedure • Make Prediction & Update Models • Fast Access to Trained Models • Update with New Values

  3. Steps to Scalability • Management of Data • Choose Underlying Technology • Evaluate Provided Services • Training of Models • Design Training Tools • Take Advantage of Infrastructure • Give Efficient Solutions to Training • Access & Update Training Models • Update: Change Invariants that Affect Prediction • Do it Efficiently

  4. Managing Data • Requirements • Efficient Usage of Storage • Client Access to Data • Semantic Organization of Data • Possible Solutions • Distributed File System (HDFS) • Raw Data • Work out a Structure (XML, Ontology Schemas) • Column Oriented NoSQL Systems (HBase, Cassandra) • Structure Offered: Column Families • Implemented Operations • Still Needs Reasoning Operations

  5. Prediction Models • Regression Tree • Support Features • Tree Building • Scalable Implementation OpenPlanet • ARIMA Model • Short Term Prediction • Does Not Support Features? • On Demand Training • Small Prediction Window

  6. Scalable Prediction • Brute Force • Efficient Use of Resources • Build a System from Scratch • Decrease Problem Size • Group Data and Pick Representatives • Clustering of Data with Similar Features • Introduce Features into ARIMA Model • Use Features to Cluster Data • Execute Model on Clustered Data • Customer → SuperCustomer
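
The "Customer → SuperCustomer" idea on this slide can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes customers with identical feature values fall into one group (a stand-in for real clustering), and the data layout, feature names, and the function `build_super_customers` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def build_super_customers(customers):
    """Group customers by feature key and average their load series.

    `customers` maps a customer id to (features, series). Both the layout
    and the feature values are hypothetical, chosen only for illustration.
    """
    groups = defaultdict(list)
    for _cid, (features, series) in customers.items():
        groups[features].append(series)
    # One representative ("super-customer") series per feature group:
    # the element-wise mean of the member series.
    return {
        key: [mean(values) for values in zip(*members)]
        for key, members in groups.items()
    }


customers = {
    "c1": (("office", "large"), [10.0, 12.0, 11.0]),
    "c2": (("office", "large"), [14.0, 16.0, 13.0]),
    "c3": (("dorm", "small"), [3.0, 4.0, 5.0]),
}
super_customers = build_super_customers(customers)
```

A short-term model such as ARIMA would then be trained once per super-customer series instead of once per customer, which is what reduces the problem size.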

  7. Parallel Clustering • Problem • Computationally Expensive • High Dimensional • Inevitable Parallelization • Challenges to Parallelization • Partitioning of Data to Achieve Load Balance • Reduction of the Communication Cost • Approaches • Hierarchical Clustering: PBirch • Evolutionary Strategies Clustering • Density Based Clustering: PDBSCAN • Model Based Clustering: AutoClass System

  8. Parallel Hierarchical Clustering • PBirch • Single Program Multiple Data (SPMD) • Message Passing Interface (MPI) • Steps • Distribute Data Equally • Build Tree on Each Processor • Execute Clustering on Leaf Nodes: Parallel K-means • Results • Linear Speedup • Increased Communication Latency • http://www.cs.gsu.edu/~wkim/index_files/papers/pbirch.pdf
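
One reason the "distribute, summarize locally, then cluster" steps parallelize well is that BIRCH summarizes points with clustering features (N, linear sum, squared sum), which are additive. The sketch below simulates the processors serially in plain Python on one-dimensional data; it does not use MPI or build a CF-tree, and the helper names are hypothetical.

```python
def split_equally(points, workers):
    """Round-robin split so partition sizes differ by at most one."""
    return [points[i::workers] for i in range(workers)]


def clustering_feature(points):
    """Return the BIRCH clustering feature (N, LS, SS) of 1-D points."""
    return (len(points), sum(points), sum(p * p for p in points))


def merge(cf_a, cf_b):
    """Clustering features are additive, so merging is component-wise."""
    return tuple(a + b for a, b in zip(cf_a, cf_b))


def centroid(cf):
    """Centroid recovered from the summary alone: LS / N."""
    n, ls, _ss = cf
    return ls / n


data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0]
parts = split_equally(data, 3)               # step 1: distribute data equally
local = [clustering_feature(p) for p in parts]  # step 2: summarize per worker
total = local[0]
for cf in local[1:]:                         # step 3: combine the summaries
    total = merge(total, cf)
```

Because only the small summaries are exchanged, the merged result matches what a single machine would compute on the full data set.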

  9. Clustering with Evolutionary Strategies • Model • Stochastic Optimization • Biological Evolution Concepts • Recombination, Mutation • Motive: Huge Range of Possible Solutions • Parallelization Techniques • Master – Slave Model • Master in Charge of Parent Solutions • Slave in Charge of Recombination and Mutation • Fits into MapReduce Model • Proposed Solution • http://www.cs.gsu.edu/~wkim/index_files/papers/clusteringwithes.pdf
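
The division of labor on this slide can be sketched as a small (mu + lambda) evolution strategy that searches for cluster centers. This is an illustrative toy under stated assumptions, not the cited paper's algorithm: the data are one-dimensional, the worker calls run serially in one process, and `fitness`, `worker_offspring`, `evolve`, and all parameter defaults are hypothetical choices.

```python
import random


def fitness(centers, data):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min((x - c) ** 2 for c in centers) for x in data)


def worker_offspring(parent_a, parent_b, rng, sigma=0.5):
    """Worker task: recombine two parents, then mutate the child."""
    child = [rng.choice(pair) for pair in zip(parent_a, parent_b)]
    return [c + rng.gauss(0.0, sigma) for c in child]


def evolve(data, k=2, mu=4, lam=12, generations=30, seed=0):
    """Master loop: keeps the parent solutions and selects survivors."""
    rng = random.Random(seed)
    parents = [[rng.choice(data) for _ in range(k)] for _ in range(mu)]
    history = []
    for _ in range(generations):
        # In a real master-worker setup these calls would run remotely.
        offspring = [
            worker_offspring(*rng.sample(parents, 2), rng)
            for _ in range(lam)
        ]
        # (mu + lambda) selection: parents compete with their offspring,
        # so the best fitness can never get worse between generations.
        pool = sorted(parents + offspring, key=lambda s: fitness(s, data))
        parents = pool[:mu]
        history.append(fitness(parents[0], data))
    return parents[0], history


data = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]
best, history = evolve(data)
```

Generating offspring is independent per child, which is why that part maps naturally onto workers (or a map phase), while selection stays with the master (or a reduce phase).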

  10. Parallel Density Based Clustering • PDBSCAN • Based on Original DBSCAN Algorithm • Shared Nothing Architecture • Execution • Divide Input into Several Partitions • Concurrently Cluster Data Locally with DBSCAN • Merge Local Clusters into Global Clusters • dR*-Tree Introduced • Decreased Communication Cost, Efficient Access of Data • Distributed Data Pages • Replicated Indices on All Machines • Results • Near-Linear Speedup with the Number of Machines • http://www.cs.gsu.edu/~wkim/index_files/papers/fastParallel_XU.pdf
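
The partition, local clustering, and merge steps can be illustrated with a deliberately simplified toy. It assumes one-dimensional data and a minimum-points threshold of 1, in which case a density-based cluster is simply a run of sorted points whose consecutive gaps are at most `eps`. It omits the dR*-tree entirely and runs the "machines" serially; the function names and boundaries are hypothetical.

```python
def local_clusters(points, eps):
    """Cluster 1-D points: a gap larger than eps starts a new cluster.

    In one dimension this matches DBSCAN with min_pts = 1.
    """
    clusters = []
    for p in sorted(points):
        if clusters and p - clusters[-1][-1] <= eps:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    return clusters


def partition(points, boundaries):
    """Split points into value ranges, one per simulated machine."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for p in points:
        parts[sum(p >= b for b in boundaries)].append(p)
    return parts


def merge_clusters(per_partition, eps):
    """Join clusters from neighbouring partitions that lie within eps."""
    merged = []
    for clusters in per_partition:
        for cluster in clusters:
            if merged and cluster[0] - merged[-1][-1] <= eps:
                merged[-1].extend(cluster)
            else:
                merged.append(list(cluster))
    return merged


data = [0.1, 0.3, 0.8, 1.1, 3.0, 3.2, 5.0]
eps = 0.4
parts = partition(data, boundaries=[1.0, 4.0])
per_partition = [local_clusters(p, eps) for p in parts]
global_clusters = merge_clusters(per_partition, eps)
```

The points 0.8 and 1.1 land on different simulated machines but end up in the same global cluster, which is the case the merge step exists to handle.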

  11. Parallel Model Based Clustering • AutoClass System • Bayesian Classification • Probability of an Instance Belonging to a Class • Approach • SIMD: Single Instruction Multiple Data • Divide Input among Processors • Update Parameters for Classification Locally • No Need for Load Balancing • Results • Good Scaling • After a certain threshold the communication starts to hinder the performance
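
The two ideas on this slide, class membership probabilities and local parameter updates, can be sketched with a single update step of a one-dimensional Gaussian mixture. This is only a rough analogue of model-based clustering under stated assumptions (fixed class weights and spreads, serial simulation of the processors), not the AutoClass system itself; every name and value below is hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def memberships(x, classes):
    """Posterior probability of x belonging to each class (Bayes' rule)."""
    joint = [w * gaussian_pdf(x, m, s) for (w, m, s) in classes]
    total = sum(joint)
    return [j / total for j in joint]


def local_statistics(chunk, classes):
    """Per-processor step: accumulate weighted counts and weighted sums."""
    stats = [[0.0, 0.0] for _ in classes]
    for x in chunk:
        for k, p in enumerate(memberships(x, classes)):
            stats[k][0] += p
            stats[k][1] += p * x
    return stats


def combine(all_stats):
    """Global reduction: add the statistics, then re-estimate class means."""
    n_classes = len(all_stats[0])
    totals = [
        [sum(s[k][i] for s in all_stats) for i in range(2)]
        for k in range(n_classes)
    ]
    return [wx / w for w, wx in totals]


# Each class is (weight, mean, std); values are illustrative assumptions.
classes = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]
chunks = [[-0.2, 0.1, 5.1], [4.8, 0.3, 5.3]]
new_means = combine([local_statistics(c, classes) for c in chunks])
```

Each processor only sends a few numbers per class, so the communication volume does not grow with the chunk size; it does grow with the number of processors, which is consistent with the slide's remark about communication eventually hindering performance.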

  12. Clustering By Sorting Potential Values • Main Idea • Potential Model • Derived from Gravitational Force Model in Euclidean Space • Parameters: • Gravitational Constant • Bandwidth Distance B (Max Distance from Center of Cluster) • δ Threshold Distance (Avoids Singularity Problem) • Execution • Calculate Potential at Each Point • Sort Points According to the Calculated Potential • Choose Cluster Centers by Iterating over Sorted Array • If Distance between Two Points in Array > B, Create New Cluster • Results • Near Optimal Solution • http://www.sciencedirect.com/science/article/pii/S0031320312001136
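
The execution steps above can be sketched as follows. This is one simplified reading of the slide rather than the published algorithm: it assumes unit masses, a potential of the form -G / max(distance, δ) summed over all other points, and a rule that a point becomes a new center when it lies farther than B from every existing center. The function and parameter names are hypothetical.

```python
import math


def potentials(points, g=1.0, delta=0.1):
    """Total potential at each point; distances below delta are clamped
    to delta to avoid the singularity at zero distance."""
    result = []
    for i, p in enumerate(points):
        total = 0.0
        for j, q in enumerate(points):
            if i != j:
                total -= g / max(math.dist(p, q), delta)
        result.append(total)
    return result


def cluster_by_potential(points, bandwidth, g=1.0, delta=0.1):
    """Visit points from lowest potential upward and assign cluster labels."""
    phi = potentials(points, g, delta)
    order = sorted(range(len(points)), key=lambda i: phi[i])
    centers, labels = [], [None] * len(points)
    for i in order:
        dists = [math.dist(points[i], points[c]) for c in centers]
        if not dists or min(dists) > bandwidth:
            centers.append(i)              # too far from all centers: new cluster
            labels[i] = len(centers) - 1
        else:
            labels[i] = dists.index(min(dists))  # join the nearest center
    return centers, labels


points = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2),
          (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
centers, labels = cluster_by_potential(points, bandwidth=1.0)
```

Points in dense regions have the lowest (most negative) potential, so sorting puts likely cluster centers at the front of the array. The quadratic potential computation is the expensive part and is also the part that splits naturally across machines.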

  13. Any Questions?

  14. Thank you for your attention! Vasilis Zois vzois@usc.edu
