Enhancing Smart Grid Scalability through Data Management and Prediction Models

Introducing Scalability into Smart Grid • Presented by Vasileios Zois, CS at USC • 09/20/2013

Smart Grid Project Services • Manage Data • Sparse Data • Heterogeneous Data • Semantic Representation • Train Prediction Models • Data Intensive Application • On Demand Procedure • Make Prediction & Update Models • Fast Access to Trained Models • Update with New Values

Steps to Scalability • Management of Data • Choose Underlying Technology • Evaluate Provided Services • Training of Models • Design Training Tools • Take Advantage of Infrastructure • Give Efficient Solutions to Training • Access & Update Training Models • Update: Change Invariants that Affect Prediction • Do it Efficiently

Managing Data • Requirements • Efficient Usage of Storage • Client Access to Data • Semantic Organization of Data • Possible Solutions • Distributed File System (HDFS) • Raw Data • Work out a Structure (XML, Ontology Schemas) • Column Oriented NoSQL Systems (HBase, Cassandra) • Structure Offered – Column Families • Implemented Operations • Still Needs Reasoning Operations

Prediction Models • Regression Tree • Support Features • Tree Building • Scalable Implementation: OpenPlanet • ARIMA Model • Short Term Prediction • Does Not Support Features? • On Demand Training • Small Prediction Window
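To make the "short term prediction, small prediction window" point concrete, here is a minimal sketch of an ARIMA(1,1,0)-style forecast: difference the series once, fit a single autoregressive coefficient by least squares, and roll the forecast forward. The function names and the choice of order (1,1,0) are assumptions for illustration; the slides do not say which ARIMA order or library was used.

```python
def fit_ar1(series):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] (no intercept)."""
    num = sum(a * b for a, b in zip(series[:-1], series[1:]))
    den = sum(a * a for a in series[:-1])
    return num / den if den else 0.0


def forecast_arima_110(series, steps=1):
    """Hand-rolled ARIMA(1,1,0) sketch: AR(1) fitted on first differences.

    Assumes at least three observations. Illustrative only, not a
    replacement for a full ARIMA implementation.
    """
    diffs = [b - a for a, b in zip(series[:-1], series[1:])]
    phi = fit_ar1(diffs)
    last_value, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff      # predicted next difference
        last_value = last_value + last_diff
        forecasts.append(last_value)
    return forecasts
```

Because each step feeds on the previous forecast, errors compound quickly, which is one reason to keep the prediction window small.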

Scalable Prediction • Brute Force • Efficient use of resources • Build a system from scratch • Decrease Problem Size • Group Data and Pick Representatives • Clustering of Data with Similar Features • Introduce Features into ARIMA model • Use features to cluster Data • Execute Model on Clustered Data • Customer → SuperCustomer
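The "Customer → SuperCustomer" idea can be sketched as follows: customers with similar features are grouped, and each group is replaced by one representative load series on which the model is then run. In this sketch grouping by an exact feature value stands in for a real clustering step, and the record layout (`"type"`, `"load"`) and function name are hypothetical.

```python
from collections import defaultdict


def build_super_customers(customers, key):
    """Collapse customers into one averaged load series per group.

    customers: list of dicts with a "load" list (equal lengths assumed).
    key: function mapping a customer to its group label; a stand-in
         for the cluster assignment produced by a clustering step.
    """
    groups = defaultdict(list)
    for customer in customers:
        groups[key(customer)].append(customer["load"])
    supers = {}
    for label, series_list in groups.items():
        n = len(series_list)
        # Element-wise mean across all series in the group.
        supers[label] = [sum(values) / n for values in zip(*series_list)]
    return supers


customers = [
    {"type": "residential", "load": [1.0, 2.0]},
    {"type": "residential", "load": [3.0, 4.0]},
    {"type": "office", "load": [10.0, 12.0]},
]
supers = build_super_customers(customers, key=lambda c: c["type"])
```

The prediction model then needs to be trained once per super-customer instead of once per customer, which is where the reduction in problem size comes from.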

Parallel Clustering • Problem • Computationally Expensive • High Dimensional • Inevitable Parallelization • Challenges to Parallelization • Partitioning of Data to achieve Load Balance • Reduction of the Communication Cost • Approaches • Hierarchical Clustering: PBirch • Evolutionary Strategies Clustering • Density Based Clustering: PDBSCAN • Model Based Clustering: AutoClass System

Parallel Hierarchical Clustering • PBirch • Single Program Multiple Data (SPMD) • Message Passing Interface (MPI) • Steps • Distribute Data Equally • Build Tree on Each Processor • Execute Clustering on Leaf Nodes - Parallel K-means • Results • Linear Speedup • Increased Communication Latency • http://www.cs.gsu.edu/~wkim/index_files/papers/pbirch.pdf
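The last step, parallel K-means, can be sketched without MPI by simulating the processors in a loop: every partition computes per-center sums and counts locally, and only those small summaries are combined, which is the reduction a message-passing implementation would perform. This is an assumption-laden sketch: it omits the BIRCH tree entirely and runs K-means on raw points, and all function names are made up for illustration.

```python
def split_evenly(points, n_parts):
    """Round-robin split so partition sizes differ by at most one."""
    return [points[i::n_parts] for i in range(n_parts)]


def partition_stats(part, centers):
    """Work done on one processor: assign points, return sums and counts."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for p in part:
        nearest = min(
            range(len(centers)),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    return sums, counts


def kmeans_step(parts, centers):
    """Combine the per-partition summaries into new centers."""
    dim = len(centers[0])
    total = [[0.0] * dim for _ in centers]
    count = [0] * len(centers)
    for part in parts:  # in an SPMD program this loop runs concurrently
        sums, counts = partition_stats(part, centers)
        for k in range(len(centers)):
            count[k] += counts[k]
            for d in range(dim):
                total[k][d] += sums[k][d]
    return [
        tuple(t / count[k] for t in total[k]) if count[k] else tuple(centers[k])
        for k in range(len(centers))
    ]
```

Only `k * (dim + 1)` numbers per partition cross the network in each iteration, so communication cost depends on the number of centers rather than the number of points.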

Clustering with Evolutionary Strategies • Model • Stochastic Optimization • Biological Evolution Concepts • Recombination, Mutation • Motive: Huge Range of Possible Solutions • Parallelization Techniques • Master – Slave Model • Master in charge of parent solutions • Slave in charge of recombination and mutation • Fits into MapReduce model • Proposed Solution • http://www.cs.gsu.edu/~wkim/index_files/papers/clusteringwithes.pdf
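A minimal sketch of the model, assuming a (mu + lambda) evolution strategy over candidate sets of cluster centers for 1-D data: the inner loop that recombines and mutates is the part a slave (or map task) would run, and the selection line is the master's (or reducer's) job. The parameter names, the fitness function, and the intermediate recombination rule are illustrative choices, not taken from the linked paper.

```python
import random


def sse(centers, data):
    """Fitness: sum of squared distances from each point to its nearest center."""
    return sum(min((x - c) ** 2 for c in centers) for x in data)


def evolve_centers(data, k, mu=4, lam=8, generations=30, sigma=1.0, seed=0):
    """(mu + lambda) evolution strategy; requires mu >= 2."""
    rng = random.Random(seed)
    # Master: initial parent population, each parent is a list of k centers.
    parents = [[rng.choice(data) for _ in range(k)] for _ in range(mu)]
    for _ in range(generations):
        children = []
        for _ in range(lam):
            # Slave work: recombination (average of two parents) plus mutation.
            a, b = rng.sample(parents, 2)
            children.append(
                [(x + y) / 2 + rng.gauss(0, sigma) for x, y in zip(a, b)]
            )
        # Master: keep the mu best of parents and children (elitist selection).
        parents = sorted(parents + children, key=lambda c: sse(c, data))[:mu]
    return parents[0]
```

Because parents compete with their children, the best fitness never gets worse from one generation to the next; the children of one generation are independent of each other, which is what makes the map-style distribution possible.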

Parallel Density Based Clustering • PDBSCAN • Based on original DBSCAN Algorithm • Shared Nothing Architecture • Execution • Divide Input into Several Partitions • Concurrently Cluster Data Locally with DBSCAN • Merge Local Clusters into Global Clusters • dR*-Tree Introduced • Decreased Communication Cost – Efficient Access of Data • Distributed Data Pages • Replicated Indices on all Machines • Results • Near Linear Speedup to the number of Machines • http://www.cs.gsu.edu/~wkim/index_files/papers/fastParallel_XU.pdf
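The three execution steps (partition, cluster locally, merge) can be sketched as below. This is a deliberately simplified stand-in, not PDBSCAN itself: partitions are contiguous chunks of the sorted input, there is no dR*-tree, and the merge rule (join two local clusters from different partitions if any pair of their points lies within `eps`) is an assumption. In particular, a point near a partition boundary can be dropped as noise here even though a global DBSCAN would keep it.

```python
import math


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns clusters as lists of points, noise is dropped."""
    def neighbors(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)  # None = unvisited, -1 = noise
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        labels[i] = cid
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid        # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cid
            nb = neighbors(j)
            if len(nb) >= min_pts:     # only core points expand the cluster
                queue.extend(nb)
        cid += 1
    clusters = [[] for _ in range(cid)]
    for p, label in zip(points, labels):
        if label >= 0:
            clusters[label].append(p)
    return clusters


def merge_clusters(local, eps):
    """Union local clusters from different partitions that touch within eps."""
    parent = list(range(len(local)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a in range(len(local)):
        for b in range(a + 1, len(local)):
            if local[a][0] != local[b][0] and any(
                math.dist(p, q) <= eps for p in local[a][1] for q in local[b][1]
            ):
                parent[find(a)] = find(b)
    merged = {}
    for i, (_, cluster) in enumerate(local):
        merged.setdefault(find(i), []).extend(cluster)
    return list(merged.values())


def pdbscan_sketch(points, eps, min_pts, n_parts):
    ordered = sorted(points)
    size = max(1, -(-len(ordered) // n_parts))  # ceiling division
    parts = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    # Each partition could be clustered on a separate machine.
    local = [(pid, c) for pid, part in enumerate(parts)
             for c in dbscan(part, eps, min_pts)]
    return merge_clusters(local, eps)
```

The all-pairs check in `merge_clusters` is the expensive part of this sketch; the replicated index described on the slide exists to avoid exactly that kind of cross-machine scan.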

Parallel Model Based Clustering • AutoClass System • Bayesian Classification • Probability of an Instance belonging to a class • Approach • SIMD: Single Instruction Multiple Data • Divide Input into Processors • Update Parameters for Classification Locally • No Need for Load Balancing • Results • Good Scaling • After a certain threshold the communication starts to hinder the performance
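The two ideas on this slide, class-membership probabilities and local parameter updates, can be sketched with a plain EM step for a 1-D Gaussian mixture. This is a stand-in chosen for brevity: AutoClass uses a richer Bayesian model, and the class representation `(weight, mean, variance)` and function names are assumptions. Each partition computes weighted sums locally and only those sums are combined.

```python
import math


def posteriors(x, classes):
    """P(class | x) for each class; a class is (weight, mean, variance)."""
    likelihoods = [
        w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
        for w, m, v in classes
    ]
    total = sum(likelihoods)
    return [l / total for l in likelihoods]


def class_stats(part, classes):
    """Local work on one processor: weighted count, sum, and sum of squares."""
    stats = [[0.0, 0.0, 0.0] for _ in classes]
    for x in part:
        for k, r in enumerate(posteriors(x, classes)):
            stats[k][0] += r
            stats[k][1] += r * x
            stats[k][2] += r * x * x
    return stats


def em_step(parts, classes, min_var=1e-6):
    """Combine per-partition statistics and re-estimate every class."""
    total = [[0.0, 0.0, 0.0] for _ in classes]
    for part in parts:  # each iteration could run on its own processor
        for k, s in enumerate(class_stats(part, classes)):
            for i in range(3):
                total[k][i] += s[i]
    n = sum(t[0] for t in total)
    updated = []
    for r, sx, sxx in total:
        mean = sx / r
        var = max(sxx / r - mean ** 2, min_var)  # floor avoids zero variance
        updated.append((r / n, mean, var))
    return updated
```

Every point costs the same amount of work regardless of its value, so equal-sized partitions are already balanced; the combine step is the communication that eventually limits scaling.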

Clustering By Sorting Potential Values • Main Idea • Potential Model • Derived from Gravitational Force Model in Euclidean Space • Parameters: • Gravitational Constant • Bandwidth Distance B (Max Distance from center of cluster) • δ threshold distance (avoid singularity problem) • Execution • Calculate Potential at each Point • Sort Points According to the Calculated Potential • Choose Cluster Centers by iteration over sorted array • If distance between two points in array > B create new cluster • Results • Near optimal Solution • http://www.sciencedirect.com/science/article/pii/S0031320312001136
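One possible reading of the execution steps, as a sketch: the potential at a point is the sum of `-G / max(distance, δ)` over all other points, points are visited from lowest potential upward, and a point starts a new cluster when its nearest already-visited point is farther away than B. The exact potential formula and the "nearest visited point" rule are assumptions made to turn the bullet list into runnable code; the linked paper is the authority on the actual method.

```python
import math


def potentials(points, g=1.0, delta=1e-3):
    """Gravity-like potential at each point; delta caps the 1/r singularity."""
    result = []
    for i, p in enumerate(points):
        total = 0.0
        for j, q in enumerate(points):
            if i != j:
                total -= g / max(math.dist(p, q), delta)
        result.append(total)
    return result


def cluster_by_potential(points, bandwidth, g=1.0, delta=1e-3):
    """Return one cluster label per point, in input order."""
    pots = potentials(points, g, delta)
    # Lowest potential first: those points sit in the densest regions.
    order = sorted(range(len(points)), key=lambda i: pots[i])
    labels = {}
    n_clusters = 0
    for i in order:
        nearest = min(
            labels,
            key=lambda j: math.dist(points[i], points[j]),
            default=None,
        )
        if nearest is None or math.dist(points[i], points[nearest]) > bandwidth:
            labels[i] = n_clusters      # too far from everything seen: new cluster
            n_clusters += 1
        else:
            labels[i] = labels[nearest]
    return [labels[i] for i in range(len(points))]
```

The potential calculation is quadratic in the number of points, so for large inputs it is the natural step to distribute.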

Any Questions?

Thank you for your attention! Vasilis Zois vzois@usc.edu
