
Data Parallel Application Development and Performance with Windows Azure

Advisor: Professor Gagan Agrawal. Presented by: Yu Zhang. Agenda: Introduction to Windows Azure, Parallel Model in Azure, Implementation with Queue, Implementation with WCF, Experimental Evaluation.


Presentation Transcript


  1. Data Parallel Application Development and Performance with Windows Azure • Advisor: Professor Gagan Agrawal • Presented by: Yu Zhang

  2. Agenda • Introduction to Windows Azure • Parallel Model in Azure • Implementation with Queue • Implementation with WCF • Experimental Evaluation • Conclusion

  3. Motivation • Emergence of cloud computing: Windows Azure, Amazon EC2, Google App Engine • Main target of clouds: changing the way we provision hardware and software for on-demand capacity fulfillment, and hosting web services • Interest from the scientific community

  4. Goals • Show that developing data parallel applications on Azure is feasible • How can parallel applications be developed on Azure? • What is the resulting performance? • Specific aims: simulate MPI reduce and all-reduce on Azure, and build data parallel applications

  5. Introduction to Windows Azure

  6. What Is Windows Azure? • It is an operating system for the cloud • It is designed for utility computing • It has four primary features: • Write your apps (developer experience) • Host your apps (compute) • Manage your apps (service management) • Store your data (storage)

  7. Windows Azure Components

  8. The Windows Azure Service Model • A Windows Azure application is called a “service” and consists of: • Definition information • Configuration information • At least one “role” • The service definition is in ServiceDefinition.csdef • Defines aspects of a service that cannot be changed without redeployment • Types of roles and static role configuration • Set of configuration settings for a role • Contract with the environment in which the code runs

  9. The Windows Azure Service Model • The service configuration is in ServiceConfiguration.cscfg • Defines values for properties that can be dynamically updated for a running deployment • Values of configuration parameters • Number of running instances

  10. The Windows Azure Service Model: Role Content • Definition: • Role name • Role type • VM size (e.g. small, medium, etc.) • Network endpoints • Code: • Web/Worker Role: hosted DLL and other executables • VM Role: VHD • Configuration: • Number of instances • Number of update and fault domains

  11. Desktop and Related Azure Concepts (Desktop → Windows Azure) • EXE → Service package • Application configuration → Service configuration • Manifest → Service definition • DLL → Service role • Windows Forms library → Web role • Windows service → Worker role • Local data stores → Internet data stores

  12. Web Role • Handles requests from the Internet • IIS7 hosted web core • Hosts ASP.NET • XML-based configuration of IIS7 • Integrated managed pipeline • Supports SSL • (Diagram: Public Internet, Load Balancer, Web Role, Storage Services)

  13. Worker Role • No inbound network connections • Can read requests from a queue in storage • or through Windows Communication Foundation (WCF) • (Diagram: one Web Role, several Worker Roles, Storage Service)

  14. Windows Azure Storage Abstractions • Blobs – provide a simple interface for storing named files along with metadata for the file • Tables – provide structured storage. A table is a set of entities, which contain a set of properties • Queues – provide reliable storage and delivery of messages for an application

  15. Windows Azure Queues • Queues are highly scalable and available, and provide reliable message delivery • Simple, asynchronous work dispatch • A storage account can create any number of queues • 8 KB message size limit and default expiry of 7 days • Programming semantics ensure that a message is processed at least once (so it may be processed more than once): • Get message makes the message invisible • Delete message removes the message
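The get/delete semantics above can be modeled without any Azure dependency. The following is a minimal in-memory sketch, assuming a caller-supplied clock so that the behavior is deterministic; `SimQueue` and `SimQueueMessage` are hypothetical names, not Azure SDK types. It shows why a message that is retrieved but never deleted is delivered again, and hence why processing must tolerate at-least-once delivery.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model of Azure queue semantics (NOT the Azure SDK).
public sealed class SimQueueMessage
{
    public string Id { get; }
    public string Content { get; }
    public DateTime VisibleAt { get; set; }   // message is hidden until this time
    public int DequeueCount { get; set; }     // how many times it has been handed out

    public SimQueueMessage(string id, string content)
    {
        Id = id;
        Content = content;
        VisibleAt = DateTime.MinValue;        // visible immediately
    }
}

public sealed class SimQueue
{
    private readonly List<SimQueueMessage> _messages = new List<SimQueueMessage>();
    private int _nextId;

    public int Count => _messages.Count;

    public void AddMessage(string content) =>
        _messages.Add(new SimQueueMessage((_nextId++).ToString(), content));

    // Returns the first visible message and hides it until now + visibilityTimeout.
    public SimQueueMessage GetMessage(DateTime now, TimeSpan visibilityTimeout)
    {
        foreach (var m in _messages)
        {
            if (m.VisibleAt <= now)
            {
                m.VisibleAt = now + visibilityTimeout;
                m.DequeueCount++;
                return m;
            }
        }
        return null;                          // nothing visible at this moment
    }

    // Only an explicit delete removes the message for good.
    public void DeleteMessage(SimQueueMessage m) => _messages.Remove(m);
}
```

A consumer that crashes between `GetMessage` and `DeleteMessage` simply lets the timeout expire, and another consumer receives the same message with a higher `DequeueCount`.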

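  16. Queue Tips • Messages > 8 KB: store the data in a blob or table and put a reference to the blob or table entity in the message • VisibilityTimeout: a message that has been retrieved but not deleted reappears in the queue after the visibility timeout (default 30 sec) • (Diagram: Queue Usage Example, producers P1 and P2 add numbered messages to a queue from which consumers C1 and C2 retrieve them)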
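The large-message tip can be sketched in the same spirit. In this hypothetical helper a `Dictionary` stands in for blob storage and a `List<string>` for the queue; the `inline:`/`blob:` tags are an assumed encoding that lets the receiver tell the two cases apart. The 8 KB limit is the one quoted on the slide.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch: small payloads travel in the message, large ones in "blob storage".
public static class LargeMessageHelper
{
    public const int MaxMessageBytes = 8 * 1024;   // queue message size limit from the slide
    private const string InlineTag = "inline:";
    private const string BlobTag = "blob:";

    public static void Send(string payload, List<string> queue, Dictionary<string, string> blobStore)
    {
        string inline = InlineTag + payload;
        if (Encoding.UTF8.GetByteCount(inline) <= MaxMessageBytes)
        {
            queue.Add(inline);                       // fits: send the data itself
        }
        else
        {
            string blobName = Guid.NewGuid().ToString("N");
            blobStore[blobName] = payload;           // too large: store the data separately...
            queue.Add(BlobTag + blobName);           // ...and enqueue only a reference to it
        }
    }

    public static string Receive(string message, Dictionary<string, string> blobStore)
    {
        if (message.StartsWith(InlineTag, StringComparison.Ordinal))
            return message.Substring(InlineTag.Length);
        if (message.StartsWith(BlobTag, StringComparison.Ordinal))
            return blobStore[message.Substring(BlobTag.Length)];   // follow the reference
        throw new FormatException("Unknown message format: " + message);
    }
}
```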

  17. MPI programming model: communicating sequential processes • Each process runs in its own local address space • Processes exchange data and synchronize via message passing (usually, but not always, the same code is executed by all processes) • Locality must be managed to achieve performance, and message passing does this explicitly

  18. Azure Parallel Programming Model • The web role hosts the IIS service to accept outside requests • The web role distributes the workload to the worker roles • The worker roles compute simultaneously • Communication between roles: Queue or WCF • (Diagram: load balancer (LB) in front of VMs running the Web Role (IIS) and Worker Roles, connected by Queue or WCF)

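  19. Simulation of MPI_Reduce in Azure • MPI_Reduce(inbuf, outbuf, count, type, op, root, comm) • inbuf: address of input buffer • outbuf: address of output buffer • count: number of elements in input buffer • type: datatype of input buffer elements • op: reduction operation • root: process id of the root process • comm: communicator
  Worker role (computes its part, then posts a message to its own queue):

      public class WorkerRole : RoleEntryPoint
      {
          public override void Run()
          {
              DoWork();                                  // compute the local partial result
              var msg = new CloudQueueMessage(result);   // result: serialized partial result (placeholder)
              queue.AddMessage(msg);
          }
      }

  Web role (root; polls every worker queue and combines the partial results):

      while (true)
      {
          if (queue1.Exists())
          {
              var msg = queue1.GetMessage();
              if (msg != null)
              {
                  DoWork();                  // fold this worker's partial result into the total
                  queue1.DeleteMessage(msg);
              }
          }
          if (queue2.Exists())
          {
              var msg = queue2.GetMessage();
              if (msg != null)
              {
                  DoWork();
                  queue2.DeleteMessage(msg);
              }
          }
          // ... one block per worker queue
          if (!queue1.Exists() && !queue2.Exists() && !queue3.Exists() /* && ... */)
          {
              break;                         // stop once no worker queue exists any more
          }
      }
      Compute();
      // ...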
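The reduce pattern can be exercised locally. The sketch below is an assumption-laden stand-in, not the Azure deployment: `ConcurrentQueue<long>` replaces the Azure queues, tasks replace worker roles, the reduction operation is a sum, and a per-worker "received" flag replaces the `Exists()` check as the termination condition. `QueueReduce` is a hypothetical name.

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// In-memory sketch of the queue-based MPI_Reduce simulation (not Azure code).
public static class QueueReduce
{
    // partitions[k] is the local data of worker k; returns the sum over all workers.
    public static long Reduce(long[][] partitions)
    {
        int n = partitions.Length;
        // One queue per worker, like queue1, queue2, ... on the slide.
        var queues = Enumerable.Range(0, n).Select(_ => new ConcurrentQueue<long>()).ToArray();

        // Worker roles: DoWork() on the local partition, then AddMessage(partial result).
        var workers = Enumerable.Range(0, n)
            .Select(k => Task.Run(() => queues[k].Enqueue(partitions[k].Sum())))
            .ToArray();

        // Root (web role): poll every queue until each worker has reported once.
        long total = 0;
        var received = new bool[n];
        int remaining = n;
        while (remaining > 0)
        {
            for (int k = 0; k < n; k++)
            {
                if (!received[k] && queues[k].TryDequeue(out long partial))
                {
                    total += partial;      // op = sum (assumed reduction operation)
                    received[k] = true;    // replaces the slide's Exists() termination check
                    remaining--;
                }
            }
            if (remaining > 0) Thread.Yield();   // do not spin too tightly while waiting
        }
        Task.WaitAll(workers);
        return total;
    }
}
```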

  20. Simulation of MPI_Allreduce in Azure • MPI_Allreduce(inbuf, outbuf, count, type, op, comm) • inbuf: address of input buffer • outbuf: address of output buffer • count: number of elements in input buffer • type: datatype of input buffer elements • op: reduction operation
  Worker role:

      public class WorkerRole : RoleEntryPoint
      {
          public override void Run()
          {
              if (queue.Exists())
              {
                  var msg = queue.GetMessage();
                  if (msg != null)
                  {
                      DoWork();
                      queue.DeleteMessage(msg);   // delete from the queue the message was read from
                  }
              }
              DoWork();
              var reply = new CloudQueueMessage(result);   // result: serialized partial result (placeholder)
              queue.AddMessage(reply);
          }
      }

  Web role: the same polling loop as in slide 19, followed by a broadcast of the reduced result:

      while (true)
      {
          // ... poll queue1, queue2, ... as in slide 19; break when no worker queue exists
      }
      Compute();
      var broadcast = new CloudQueueMessage(result);   // result: reduced value (placeholder)
      queue1.AddMessage(broadcast);                    // send to every worker
      queue2.AddMessage(broadcast);
      // ...
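A comparable in-memory sketch of the all-reduce, under the same assumptions (sum as the operation, in-memory queues, tasks as workers; `QueueAllReduce` is a hypothetical name). It deliberately uses two queues per worker, one upward and one downward, so that a worker never dequeues its own message; this is a design choice of the sketch, not necessarily how the slide's deployment was configured.

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// In-memory sketch of the queue-based MPI_Allreduce simulation (not Azure code).
public static class QueueAllReduce
{
    // localValues[k] is worker k's contribution; returns the value each worker ends up with.
    public static long[] AllReduce(long[] localValues)
    {
        int n = localValues.Length;
        var up = Enumerable.Range(0, n).Select(_ => new ConcurrentQueue<long>()).ToArray();
        var down = Enumerable.Range(0, n).Select(_ => new BlockingCollection<long>()).ToArray();

        // Worker roles: post the local value, then wait for the root's broadcast.
        var workers = Enumerable.Range(0, n).Select(k => Task.Run(() =>
        {
            up[k].Enqueue(localValues[k]);
            return down[k].Take();          // blocks until the broadcast message arrives
        })).ToArray();

        // Root: take exactly one message from each worker queue and reduce (op = sum).
        long total = 0;
        for (int k = 0; k < n; k++)
        {
            long v;
            while (!up[k].TryDequeue(out v)) Thread.Yield();
            total += v;
        }

        // Broadcast: add the reduced value to every worker's downward queue.
        foreach (var q in down) q.Add(total);

        return workers.Select(t => t.Result).ToArray();
    }
}
```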

  21. Matrix Multiplication • Every worker role reads all of matrix B • Matrix A is partitioned into n parts, where n is the number of worker roles • For an N×N matrix, each worker role therefore holds two data sets: matrix B and its part of matrix A, A_k (1 ≤ k ≤ n) • Each worker role computes A_k × B and adds the result to its queue • The web role performs the reduce operation to obtain the final result • (Diagram: Matrix A, Matrix B)
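The row-block decomposition can be sketched as follows, assuming in-memory arrays and `Parallel.For` in place of worker roles; `RowBlockMatMul` is a hypothetical name. Each block is tagged with its starting row so that the root can reassemble C = A × B regardless of the order in which results arrive.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// In-memory sketch of row-block matrix multiplication with a gather step.
public static class RowBlockMatMul
{
    public static double[,] Multiply(double[,] a, double[,] b, int workers)
    {
        int rows = a.GetLength(0), inner = a.GetLength(1), cols = b.GetLength(1);
        if (inner != b.GetLength(0)) throw new ArgumentException("Inner dimensions differ.");
        var results = new ConcurrentQueue<(int Start, double[,] Block)>();

        // Worker k owns rows [start, end) of A (A_k) and all of B.
        Parallel.For(0, workers, k =>
        {
            int start = (int)((long)rows * k / workers);
            int end = (int)((long)rows * (k + 1) / workers);
            var block = new double[end - start, cols];
            for (int i = start; i < end; i++)
                for (int j = 0; j < cols; j++)
                {
                    double sum = 0;
                    for (int t = 0; t < inner; t++) sum += a[i, t] * b[t, j];
                    block[i - start, j] = sum;
                }
            results.Enqueue((start, block));     // "adds the result to its queue"
        });

        // Root: place every block back at its row offset.
        var c = new double[rows, cols];
        while (results.TryDequeue(out var r))
            for (int i = 0; i < r.Block.GetLength(0); i++)
                for (int j = 0; j < cols; j++)
                    c[r.Start + i, j] = r.Block[i, j];
        return c;
    }
}
```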

  22. K Means 1. The web role calculates the initial means 2. Broadcast the k centroids to all worker roles 3. Each worker role computes the distance of each local document vector to the centroids 4. Assign points to the closest centroid and compute the local MSE (Mean Squared Error) 5. Perform a reduction for the global centroids and the global MSE value 6. The web role broadcasts the new centroids to all worker roles; repeat until no points move
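One round of steps 3 to 5 might look like this, assuming 1-D points instead of document vectors to keep the sketch short; `KMeansRound.Step` is a hypothetical name. Each worker reports per-cluster sums, counts and its local squared error; the reduction turns these into new centroids and a global MSE (measured against the centroids used for assignment). The web role would call `Step` repeatedly and stop when the centroids no longer change.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Sketch of one distributed K-means round on 1-D points.
public static class KMeansRound
{
    public static (double[] Centroids, double Mse) Step(double[][] partitions, double[] centroids)
    {
        int k = centroids.Length, w = partitions.Length;
        var sums = new double[w][];
        var counts = new int[w][];
        var sse = new double[w];

        // Steps 3-4 on each worker: nearest-centroid assignment plus local statistics.
        Parallel.For(0, w, p =>
        {
            sums[p] = new double[k];
            counts[p] = new int[k];
            foreach (double x in partitions[p])
            {
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (Math.Abs(x - centroids[c]) < Math.Abs(x - centroids[best])) best = c;
                sums[p][best] += x;
                counts[p][best]++;
                sse[p] += (x - centroids[best]) * (x - centroids[best]);
            }
        });

        // Step 5 on the web role: reduce local sums/counts into global centroids and MSE.
        var next = new double[k];
        for (int c = 0; c < k; c++)
        {
            double s = 0;
            int n = 0;
            for (int p = 0; p < w; p++) { s += sums[p][c]; n += counts[p][c]; }
            next[c] = n > 0 ? s / n : centroids[c];   // keep an empty cluster's old centroid
        }
        int total = partitions.Sum(part => part.Length);
        double mse = total > 0 ? sse.Sum() / total : 0;
        return (next, mse);
    }
}
```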

  23. KNN • The web role is the master; the other N worker roles are slaves • The master divides the training samples into N subsets and distributes one subset to each worker role • Each worker role computes the distance measures independently and stores the computed measures in a local array • When a worker role finishes its distance calculation, it sends the web role a message indicating the end of processing • The web role then notes the end of processing for the sender and acquires the computed measures by reduction • After the web role has collected the distance measures from all worker roles, it performs the following steps: • Sort all distance measures in ascending order • Select the top k measures • Count the number of occurrences of each class among the top k measures • The input element is assigned to the class with the highest count among the top k measures
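The master/worker KNN flow can be sketched on 2-D points (an assumption; `ParallelKnn` is a hypothetical name). Workers compute Euclidean distances on their own subsets in parallel; the master merges the pairs, sorts them in ascending order, keeps the top k and takes a majority vote. Because `GroupBy` keeps first-occurrence order and `OrderByDescending` is stable, a tie in the vote goes to the class whose nearest neighbor is closest.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Sketch of master/worker k-nearest-neighbor classification.
public static class ParallelKnn
{
    public static string Classify(
        (double X, double Y, string Label)[][] subsets, (double X, double Y) query, int k)
    {
        // Each worker fills its own local array of (distance, class) pairs.
        var local = new List<(double Dist, string Label)>[subsets.Length];
        Parallel.For(0, subsets.Length, w =>
        {
            local[w] = subsets[w]
                .Select(s => (Dist: Math.Sqrt((s.X - query.X) * (s.X - query.X) +
                                              (s.Y - query.Y) * (s.Y - query.Y)),
                              Label: s.Label))
                .ToList();
        });

        // Master: merge, sort ascending, take the top k, return the majority class.
        return local
            .SelectMany(list => list)
            .OrderBy(p => p.Dist)
            .Take(k)
            .GroupBy(p => p.Label)
            .OrderByDescending(g => g.Count())
            .First()
            .Key;
    }
}
```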

  24. An Optimized Solution: WCF • What is Windows Communication Foundation (WCF)? • WCF is Microsoft's implementation of industry standards that provides a communication subsystem enabling applications on one machine (across process boundaries) or on multiple machines to communicate • WCF is a core component of the .NET Framework 3.0 and later versions, which are included with Windows Vista and Windows 7 as well as later versions of Windows Server • The WCF API unifies ASMX Web Services, .NET Remoting, distributed transactions and messaging into a single service-oriented programming model • Fundamental to the .NET Framework

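  25. WCF: Address, Binding, Contract • WCF services are deployed, discovered and consumed as endpoints • Client and service exchange messages through endpoints • Each endpoint combines an Address (Where?), a Binding (How?) and a Contract (What?) • (Diagram: a client and a service, each exposing endpoints labeled A, B, C)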

  26. WCF: Endpoint • Address: an address uniquely identifies a service. It provides the transport protocol, the name of the target machine (host) and the port if applicable, expressed as an explicit path or URI: [transport]://[machine][:optional port], e.g. http://localhost:8081/Service or net.tcp://localhost:8082/Service • Binding: bindings provide "canned" settings for the transport protocol, message encoding, communication pattern, reliability and security policies, i.e. the WCF features required to support the design goals of the service. Some common bindings include BasicHttpBinding, NetTcpBinding and WSHttpBinding • Contract: all services expose a contract. WCF uses 5 types of contracts: Service Contract (exposes the service), Operation Contract (exposes the service members), Data Contract (describes service parameters), …
  Configuration file for the endpoint (the address scheme must match the binding, so netTcpBinding uses a net.tcp:// address):

      <!-- configuration file used by above code -->
      <configuration xmlns="http://schemas.microsoft.com/.NetConfiguration/v2.0">
        <system.serviceModel>
          <services>
            <!-- service element references the service type -->
            <service name="MM">
              <!-- endpoint element defines the ABC's of the endpoint -->
              <endpoint address="net.tcp://localhost:8082/MM/Ep1"
                        binding="netTcpBinding"
                        contract="IMM"/>
            </service>
          </services>
        </system.serviceModel>
      </configuration>
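The address format can be checked with `System.Uri`, which splits a URI into scheme (transport), host (machine), port and path. `EndpointAddressInfo.Parse` is a hypothetical helper; the URIs are the examples from the slide. When the port is omitted, `Uri.Port` reports the scheme's default (80 for http).

```csharp
using System;

// Splits a WCF-style endpoint address into [transport]://[machine][:port]/path.
public static class EndpointAddressInfo
{
    public static (string Transport, string Machine, int Port, string Path) Parse(string address)
    {
        var uri = new Uri(address);   // throws UriFormatException for malformed addresses
        return (uri.Scheme, uri.Host, uri.Port, uri.AbsolutePath);
    }
}
```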

  27. WCF in Azure
  • Worker role (hosts the service):

      [ServiceContract]
      public interface IService
      {
          [OperationContract]
          string compute();
      }

      // ServiceImpl is a placeholder for the class that implements IService;
      // ServiceHost takes the implementation type, not the contract interface.
      ServiceHost sh = new ServiceHost(typeof(ServiceImpl));
      // Use the AddServiceEndpoint helper to create the ServiceEndpoint and add it to the ServiceDescription.
      sh.AddServiceEndpoint(
          typeof(IService),                        // contract type
          new NetTcpBinding(SecurityMode.None),    // built-in binding; security mode must match the client
          "net.tcp://localhost/IService/Ep1");     // endpoint address (net.tcp scheme for NetTcpBinding)
      sh.Open();                                   // start listening

  • Web role (client):

      NetTcpBinding b = new NetTcpBinding(SecurityMode.None);
      b.MaxBufferSize = 10485760;                  // 10 MB
      b.MaxReceivedMessageSize = 10485760;         // 10 MB
      var factory = new ChannelFactory<WorkerRole.IService>(b);
      var channel = factory.CreateChannel(GetEndpoint());
      channel.compute();                           // call the service hosted on the worker role

  28. From Objects to Services • 1980s, Object-Oriented (C & C++ with MPI): polymorphism, encapsulation, subclassing • 1990s, Component-Based (Queue with Azure): interface-based, dynamic loading, runtime metadata • 2000s, Service-Oriented (WCF with Azure): message-based, schema + contract, binding via policy

  29. Experimental Evaluation • (Charts: execution time (sec) for Matrix Multiplication, Kmeans and KNN) • Queue performance: fastest read 31 ms, slowest read 203 ms; fastest write 31 ms, slowest write 234 ms; fastest delete 0 ms, slowest delete 593 ms • The queue is simply a reliable method of delivering messages between processes

  30. Azure vs. Traditional Cluster • Hardware • Operating system: the OS running on Glenn (the traditional cluster) is Linux, whose lightweight kernel can make full use of the hardware resources • Programming language: C is only one level of abstraction away from machine language, while C# running on the .NET Framework is at least three levels of abstraction away from assembler

  31. Conclusion • MPI applications can harness the advantages of cloud computing • Applications running on the cloud can achieve high efficiency by simulating MPI parallelization on the Windows Azure platform • We introduced different inter-role communication methods for parallel execution, which can be considered a prototype of an Azure MPI library that is likely to be developed and used in the near future
