
Applying Twister for Scientific Applications



Presentation Transcript


  1. Applying Twister for Scientific Applications. Judy Qiu, School of Informatics and Computing, Indiana University, http://salsahpc.indiana.edu. NSF Cloud PI Workshop, March 17, 2011.

  2. Twister v0.9 (March 15, 2011): New Infrastructure for Iterative MapReduce Programming, SALSA Group
  • Auto-generation of partition files and configureMaps
  • Auto-configuration to set up the Twister environment automatically on a cluster
  • Concurrent file loading in the Mapper configuration phase, with file-loading balancing
  • Performance improvements (e.g., JVM tuning)
  • Scalability
  Reference: Bingjing Zhang, Yang Ruan, Tak-Lon Wu, Judy Qiu, Adam Hughes, Geoffrey Fox, "Applying Twister to Scientific Applications," Proceedings of the IEEE CloudCom 2010 Conference, Indianapolis, November 30 - December 3, 2010.

  3. K-Means Clustering
  • An iteratively refining operation: the map phase computes the distance from each data point to each cluster center and assigns points to cluster centers; the reduce phase computes the new cluster centers, which the user program feeds into the next iteration.
  • Typical MapReduce runtimes incur extremely high overheads here: new maps/reducers/vertices are created in every iteration, and communication goes through the file system.
  • Long-running tasks and faster communication in Twister enable it to perform close to MPI (timings reported for 20 iterations).
  A minimal sketch of one such iteration appears below.
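The following is a minimal, framework-free Java sketch of one K-Means iteration split into a map-style assignment step and a reduce-style center update, mirroring the data flow described above. The class and method names are illustrative assumptions, not the Twister or SALSA implementation.

import java.util.*;

public class KMeansIterationSketch {

    // "Map": assign each point in this partition to its nearest center and
    // accumulate (centerIndex -> [coordinate sums..., count]).
    static Map<Integer, double[]> mapPartition(List<double[]> points, double[][] centers) {
        int d = centers[0].length;
        Map<Integer, double[]> partial = new HashMap<>();
        for (double[] p : points) {
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int c = 0; c < centers.length; c++) {
                double dist = 0;
                for (int j = 0; j < d; j++) {
                    double diff = p[j] - centers[c][j];
                    dist += diff * diff;
                }
                if (dist < bestDist) { bestDist = dist; best = c; }
            }
            double[] acc = partial.computeIfAbsent(best, idx -> new double[d + 1]);
            for (int j = 0; j < d; j++) acc[j] += p[j];
            acc[d] += 1; // count of points assigned to this center
        }
        return partial;
    }

    // "Reduce": merge the partial sums from all map tasks into new centers.
    static double[][] reduce(List<Map<Integer, double[]>> partials, int k, int d) {
        double[][] newCenters = new double[k][d];
        double[] counts = new double[k];
        for (Map<Integer, double[]> part : partials) {
            for (Map.Entry<Integer, double[]> e : part.entrySet()) {
                double[] acc = e.getValue();
                for (int j = 0; j < d; j++) newCenters[e.getKey()][j] += acc[j];
                counts[e.getKey()] += acc[d];
            }
        }
        for (int c = 0; c < k; c++)
            for (int j = 0; j < d; j++)
                if (counts[c] > 0) newCenters[c][j] /= counts[c];
        return newCenters;
    }
}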

  4. Motivation
  • Data deluge, experienced in many domains.
  • MapReduce: data-centered, QoS. Classic parallel runtimes (MPI): efficient and proven techniques.
  • Goal: expand the applicability of MapReduce to more classes of applications, along a spectrum from Map-only and MapReduce to Iterative MapReduce and further extensions.

  5. A Programming Model for Iterative MapReduce
  • Distributed data access
  • In-memory MapReduce
  • Distinction between static data and variable data (data flow vs. δ flow)
  • Cacheable map/reduce tasks (long-running tasks)
  • Combine operation
  • Support for fast intermediate data transfers
  The user program configures the static data once (Configure()), then iterates Map(Key, Value), Reduce(Key, List<Value>), and Combine(Map<Key, Value>) over the variable δ flow until convergence, and finally calls Close(). A sketch of a cacheable map task follows.
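As a rough illustration of the cacheable, long-running task idea, the Java sketch below keeps the static partition in memory across iterations and only receives the small variable (δ) data each time. The class and method names are assumptions made for this sketch, not Twister's actual task classes.

import java.util.ArrayList;
import java.util.List;

public class CacheableMapTaskSketch {
    // Static data: loaded once in the configure phase and reused in every iteration.
    private List<double[]> cachedPartition;

    // Called once per job (Configure()): load and cache the static partition.
    public void configure(List<double[]> partition) {
        this.cachedPartition = partition;
    }

    // Called once per iteration (Map(Key, Value)): only the small variable data
    // (e.g., current cluster centers) is transferred over the network.
    public List<double[]> map(double[][] variableData) {
        List<double[]> intermediate = new ArrayList<>();
        for (double[] point : cachedPartition) {
            // ... compute against variableData and emit intermediate values ...
            intermediate.add(point); // placeholder emission
        }
        return intermediate;
    }

    // Called once at the end of the job (Close()): release cached resources.
    public void close() {
        cachedPartition = null;
    }
}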

  6. Cloud Technologies and Their Applications
  • SaaS applications/workflow: data mining services in the cloud, such as Smith-Waterman dissimilarities, PhyloD using DryadLINQ, clustering, multidimensional scaling, generative topographic mapping, etc.
  • Higher-level languages: Apache Pig Latin / Microsoft DryadLINQ / Google Sawzall
  • Cloud platform: Apache Hadoop / Twister, Microsoft Dryad / Twister
  • Cloud infrastructure: Nimbus, Eucalyptus, OpenStack, OpenNebula
  • Hypervisor/virtualization: Linux and Windows virtual machines on Xen, KVM
  • Hardware: bare-metal nodes

  7. MPI & Iterative MapReduce papers
  • MapReduce on MPI: Torsten Hoefler, Andrew Lumsdaine and Jack Dongarra, "Towards Efficient MapReduce Using MPI," Recent Advances in Parallel Virtual Machine and Message Passing Interface, Lecture Notes in Computer Science, 2009, Volume 5759/2009, pp. 240-249.
  • MPI with generalized MapReduce: Jaliya Ekanayake, Hui Li, Bingjing Zhang, Thilina Gunarathne, Seung-Hee Bae, Judy Qiu, Geoffrey Fox, "Twister: A Runtime for Iterative MapReduce," Proceedings of the First International Workshop on MapReduce and its Applications of the ACM HPDC 2010 conference, Chicago, Illinois, June 20-25, 2010.
  • Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski, "Pregel: A System for Large-Scale Graph Processing," Proceedings of the 2010 International Conference on Management of Data, Indianapolis, Indiana, USA, pp. 135-146, 2010.
  • Yingyi Bu, Bill Howe, Magdalena Balazinska, Michael D. Ernst, "HaLoop: Efficient Iterative Data Processing on Large Clusters," Proceedings of the VLDB Endowment, Vol. 3, No. 1, The 36th International Conference on Very Large Data Bases, September 13-17, 2010, Singapore.
  • Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, Ion Stoica, "Spark: Cluster Computing with Working Sets," poster at http://radlab.cs.berkeley.edu/w/upload/9/9c/Spark-retreat-poster-s10.pdf
  • Russell Power, Jinyang Li, "Piccolo: Building Fast, Distributed Programs with Partitioned Tables," OSDI 2010, Vancouver, BC, Canada.

  8. Features of Existing Architectures (Google MapReduce, Apache Hadoop, Sector/Sphere, Dryad/DryadLINQ (DAG-based))
  • Programming model (SPMD): MapReduce (optionally "map-only"); focus on single-step MapReduce computations (DryadLINQ supports more than one stage).
  • Input and output handling: distributed data access (HDFS in Hadoop, Sector in Sphere, shared directories in Dryad); outputs normally go to the distributed file system.
  • Intermediate data: transferred via file systems (local disk -> HTTP -> local disk in Hadoop); this makes fault tolerance easy to support but introduces considerably high latencies.

  9. Twister Architecture
  • The main program (TwisterDriver) communicates with the Twister daemons on the worker nodes through a broker network.
  • Map tasks either read static data from local disk (1) or receive variable (key, value) data via the brokers (2).
  • Map output goes directly to the reducers; reduce output goes to local disk or to the combiner.
  • Scripts handle file manipulations on the local disks.
  • The Twister daemon is a process, but map/reduce tasks are Java threads (a hybrid approach).

  10. Twister Programming Model
  • The driver runs in the user program's process space; cacheable map/reduce tasks run on the worker nodes, which have local disks.
  • Typical driver structure:
      configureMaps(..)
      configureReduce(..)
      while(condition){
        runMapReduce(..)   // Map(), Reduce(), Combine() operation; may send <Key,Value> pairs directly
        updateCondition()
      } // end while
      close()
  • Communications and data transfers go via the pub-sub broker network.
  • Two configuration options: using local disks (only for maps) or using the pub-sub bus.

  11. Twister API
  • configureMaps(PartitionFile partitionFile)
  • configureMaps(Value[] values)
  • configureReduce(Value[] values)
  • runMapReduce()
  • runMapReduce(KeyValue[] keyValues)
  • runMapReduceBCast(Value value)
  • map(MapOutputCollector collector, Key key, Value val)
  • reduce(ReduceOutputCollector collector, Key key, List<Value> values)
  • combine(Map<Key, Value> keyValues)
  A schematic sketch of how these calls fit together follows.
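The self-contained Java sketch below mocks up how the listed calls typically fit together in an iterative job. Everything beyond the method signatures quoted on the slide (the names TwisterLikeDriver, MapReduceTask, the Key/Value placeholder types, and the helper methods) is an assumption made for illustration, not the real Twister class hierarchy.

import java.util.List;
import java.util.Map;

interface Key {}
interface Value {}

interface MapOutputCollector { void collect(Key key, Value value); }
interface ReduceOutputCollector { void collect(Key key, Value value); }

// User-defined task callbacks, mirroring the signatures listed on the slide.
// In a real job the framework instantiates the user's task on each daemon and
// invokes these callbacks; this sketch only shows the driver side.
interface MapReduceTask {
    void map(MapOutputCollector collector, Key key, Value val);
    void reduce(ReduceOutputCollector collector, Key key, List<Value> values);
    void combine(Map<Key, Value> keyValues);
}

// A driver facade exposing the configuration and run calls from the slide.
interface TwisterLikeDriver {
    void configureMaps(Value[] values);          // cache static data in map tasks
    void configureReduce(Value[] values);        // cache static data in reduce tasks
    Map<Key, Value> runMapReduceBCast(Value v);  // broadcast variable data, run one iteration
    void close();
}

public class IterativeJobSketch {
    // Typical iterative driver loop: configure once, iterate until converged, close.
    static void run(TwisterLikeDriver driver, Value[] staticData, Value initial) {
        driver.configureMaps(staticData);        // static data is loaded/cached only once
        Value current = initial;
        boolean converged = false;
        while (!converged) {
            Map<Key, Value> combined = driver.runMapReduceBCast(current);
            current = update(combined);          // derive the next broadcast value
            converged = checkCondition(combined);
        }
        driver.close();
    }

    // Placeholder: pick some value from the combined output as the next broadcast.
    static Value update(Map<Key, Value> combined) { return combined.values().iterator().next(); }

    // Placeholder convergence test.
    static boolean checkCondition(Map<Key, Value> combined) { return true; }
}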

  12. PageRank: An Iterative MapReduce Algorithm
  • The well-known PageRank algorithm [1], run on the ClueWeb09 data set [2] (1 TB in size) from CMU.
  • Map tasks hold partial (compressed) adjacency-matrix blocks together with the current page ranks and emit partial updates; the reducers and combiner partially merge the updates for the next iteration.
  • Reuse of map tasks and faster communication pays off. A sketch of one such iteration follows.
  [1] PageRank algorithm, http://en.wikipedia.org/wiki/PageRank
  [2] ClueWeb09 data set, http://boston.lti.cs.cmu.edu/Data/clueweb09/
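The following is a minimal, framework-free Java sketch of one PageRank iteration organized this way: map tasks hold cached adjacency-list partitions and emit partial rank updates, and a reduce/combine step merges them and applies damping. The names and the damping constant of 0.85 are illustrative and are not taken from the ClueWeb09 runs.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PageRankIterationSketch {
    static final double DAMPING = 0.85;

    // "Map": for a cached partition of the adjacency lists, spread each page's
    // current rank evenly over its out-links and emit (target -> partial rank).
    static Map<Integer, Double> mapPartition(Map<Integer, List<Integer>> outLinks,
                                             double[] currentRanks) {
        Map<Integer, Double> partial = new HashMap<>();
        for (Map.Entry<Integer, List<Integer>> e : outLinks.entrySet()) {
            int page = e.getKey();
            List<Integer> targets = e.getValue();
            if (targets.isEmpty()) continue;          // dangling pages handled elsewhere
            double share = currentRanks[page] / targets.size();
            for (int t : targets) partial.merge(t, share, Double::sum);
        }
        return partial;
    }

    // "Reduce"/combine: sum the partial updates from all map tasks and apply
    // the damping factor to produce the ranks broadcast in the next iteration.
    static double[] reduce(List<Map<Integer, Double>> partials, int numPages) {
        double[] next = new double[numPages];
        for (Map<Integer, Double> p : partials)
            for (Map.Entry<Integer, Double> e : p.entrySet())
                next[e.getKey()] += e.getValue();
        for (int i = 0; i < numPages; i++)
            next[i] = (1 - DAMPING) / numPages + DAMPING * next[i];
        return next;
    }
}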

  13. Overhead: OpenMPI vs. Twister (negative overhead due to caching). http://futuregrid.org

  14. Dimension Reduction Algorithms
  • Multidimensional Scaling (MDS) [1]
    • Given the proximity information among points, solve an optimization problem that finds a mapping of the data in the target dimension while minimizing an objective function based on the pairwise proximities.
    • Objective functions: STRESS (1) or SSTRESS (2); the standard definitions are reconstructed below.
    • Only needs the pairwise distances δij between the original points (typically not Euclidean); dij(X) is the Euclidean distance between the mapped (3D) points.
  • Generative Topographic Mapping (GTM) [2]
    • Finds optimal K representations of the given data (in 3D), known as the K-cluster problem (NP-hard).
    • The original algorithm uses the EM method for optimization; a deterministic annealing algorithm can be used to find a global solution.
    • The objective function is to maximize the log-likelihood.
  [1] I. Borg and P. J. Groenen. Modern Multidimensional Scaling: Theory and Applications. Springer, New York, NY, U.S.A., 2005.
  [2] C. Bishop, M. Svensén, and C. Williams. GTM: The generative topographic mapping. Neural Computation, 10(1):215-234, 1998.
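The slide's equations (1) and (2) did not survive extraction; the standard (weighted) STRESS and SSTRESS definitions, written in the slide's notation with $\delta_{ij}$ the original dissimilarities and $d_{ij}(X)$ the Euclidean distances between mapped points, are:

\text{STRESS:}\qquad \sigma(X) = \sum_{i<j\le N} w_{ij}\,\bigl(d_{ij}(X)-\delta_{ij}\bigr)^{2} \qquad (1)

\text{SSTRESS:}\qquad \sigma^{2}(X) = \sum_{i<j\le N} w_{ij}\,\bigl(d_{ij}^{2}(X)-\delta_{ij}^{2}\bigr)^{2} \qquad (2)

Here $w_{ij}\ge 0$ are optional weights, with $w_{ij}=1$ in the unweighted case.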

  15. Multi-dimensional Scaling (EM-style iteration)
  • Maps high-dimensional data to lower dimensions (typically 2D or 3D).
  • SMACOF (Scaling by MAjorizing a COmplicated Function) [1].
  • Sequential form of the iteration:
      While(condition) {
        <X> = [A] [B] <C>
        C = CalcStress(<X>)
      }
  • The same iteration expressed as three chained MapReduce jobs:
      While(condition) {
        <T> = MapReduce1([B], <C>)
        <X> = MapReduce2([A], <T>)
        C   = MapReduce3(<X>)
      }
  [1] J. de Leeuw, "Applications of convex analysis to multidimensional scaling," Recent Developments in Statistics, pp. 133-145, 1977.
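For reference, the [A][B]<C> product above corresponds to the SMACOF majorization step (the Guttman transform). Reconstructed here in its standard unweighted form [1], rather than the slide's exact matrix notation, the update reads:

X^{(k)} = \frac{1}{N}\, B\!\left(X^{(k-1)}\right) X^{(k-1)}, \qquad
b_{ij} = -\frac{\delta_{ij}}{d_{ij}\!\left(X^{(k-1)}\right)} \;\;(i\ne j,\ d_{ij}\ne 0), \quad
b_{ij} = 0 \;\;(i\ne j,\ d_{ij}=0), \quad
b_{ii} = -\sum_{j\ne i} b_{ij},

followed by recomputing the STRESS value to test convergence. This matches the three chained jobs above: MapReduce1 forms [B]<C>, MapReduce2 applies [A], and MapReduce3 evaluates the stress.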

  16. Next-Generation Sequencing Pipeline on Cloud
  • Pipeline stages: FASTA file of N sequences -> BLAST -> pairwise distance calculation (block pairings) -> dissimilarity matrix (N(N-1)/2 values) -> pairwise clustering and MDS -> visualization with PlotViz.
  • Users submit their jobs to the pipeline and the results are shown in a visualization tool.
  • The chart illustrates a hybrid model: the distance calculation stages run as MapReduce, while clustering and MDS run as MPI. Twister will be a unified solution for the pipeline model.
  • The components are services, and so is the whole pipeline.
  • We could research which stages of the pipeline services are suitable for private or commercial clouds.

  17. Scale-up Sequence Clustering Model with Twister
  • Gene sequences (N = 1 million): select a reference sequence set (M = 100K), leaving an N - M sequence set (900K).
  • Reference set: O(N^2) pairwise alignment and distance calculation -> distance matrix -> multi-dimensional scaling (MDS) -> reference coordinates (x, y, z).
  • N - M set: interpolative MDS with pairwise distance calculation against the reference coordinates -> N - M coordinates (x, y, z).
  • All coordinates are combined into a 3D plot for visualization.

  18. Current Sequence Clustering Model with MPI
  • Gene sequences -> pairwise alignment and distance calculation (MPI.NET implementation*): Smith-Waterman / Needleman-Wunsch with Kimura2 / Jukes-Cantor / Percent-Identity -> distance matrix.
  • Distance matrix -> pairwise clustering (MPI.NET implementation) -> cluster indices.
  • Distance matrix -> multi-dimensional scaling, Chi-Square / deterministic annealing (MPI.NET implementation) -> coordinates.
  • Coordinates and cluster indices -> 3D plot, visualized in a C# desktop application based on VTK.
  * Note: the Smith-Waterman and Needleman-Wunsch implementations are from the Microsoft Biology Foundation library.

  19. Twister MDS Interpolation Performance Test

  20. 300+ students learning about Twister & Hadoop MapReduce technologies, supported by FutureGrid. NCSA Summer School Workshop, July 26-30, 2010. http://salsahpc.indiana.edu/tutorial
  Participating institutions: Indiana University, Johns Hopkins, Notre Dame, Iowa, Penn State, University of Florida, Michigan State, San Diego Supercomputer Center, Univ. of Illinois at Chicago, Washington University, University of Minnesota, University of Texas at El Paso, University of California at Los Angeles, IBM Almaden Research Center, University of Arkansas.

  21. http://salsahpc.indiana.edu/b534/

  22. Summary
  • MapReduce and MPI are SPMD programming models.
  • Twister extends MapReduce to iterative algorithms.
  • Data mining in the cloud (data analysis in the cloud).
  • Several iterative algorithms we have implemented: K-Means clustering, PageRank, matrix multiplication, breadth-first search, multi-dimensional scaling (MDS).
  • Integrating a distributed file system.
  • Integrating with a high-performance messaging system.
  • Programming with side effects yet supporting fault tolerance.

  23. MapReduceRoles4Azure: several iterative algorithms we have implemented; a prototype Twister4Azure is expected by May 2011.

  24. Twister for Azure: architecture components include a scheduling queue, a job bulletin board, worker roles hosting map workers (Map 1 ... Map n) and reduce workers (Red 1 ... Red n), an in-memory data cache, a map task table, and task/role monitoring.

  25. Sequence Assembly Performance

  26. Acknowledgements to the SALSAHPC Group, Indiana University, http://salsahpc.indiana.edu
