caller callee data representation

The Parallel Remote Method Invocation in Multi-threaded Environment
Keming Zhang, Kostadin Damevski, Steve Parker
SCI Institute, University of Utah

Introduction

One approach to building a parallel component architecture is through Parallel Remote Method Invocation (PRMI). A PRMI is an extension of Java Remote Method Invocation (RMI). Through PRMI, a group of processes (the parallel caller) can collectively invoke an interface on another group of processes (the parallel callee). All processes share the same interface, but the arguments can be distributed arbitrarily over the processes. A PRMI design needs to specify how M callers invoke N callees and how distributed data can be transmitted efficiently between the M callers and the N callees. In a multi-threaded environment, PRMI design becomes more challenging because synchronization becomes more complex and most MPI implementations are not thread-safe. This work compares RMI and PRMI, describes the issues and challenges of PRMI design, and presents possible solutions and a PRMI design.

MPI and Threads: identifying a set of threads

• Each thread maintains a CID(PID) pair, where CID identifies a set of caller (or callee) threads and PID identifies a PRMI.
• The application user signals the initial threads to create a new unique initial CID and set PID = CID + 1.
• When a PRMI is requested, set that PRMI’s CID = caller’s PID and PID = CID + 1.
• When a PRMI returns, the caller updates its PID to the callee’s PID.

[Figure: thread sets labeled in CID(PID) form, e.g. 0(1,6)]

PRMI Design

• One round trip PRMI:
  1. Callers calculate the arguments’ redistribution schedule and the invocation schedule. Callers send their argument representations, invocation schedule, and arguments to the callees.
  2. Upon receiving all data from all callers, each callee starts its method. After the method completes, each callee calculates the redistribution schedule for the output arguments (including the return value). It then sends the output back to the relevant callers.
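The CID/PID rules for identifying a set of threads can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `ThreadSet`, `create_initial`, `request_prmi`, `return_prmi`, and `trace` are hypothetical, the `history` list is added purely so the evolution of each PID can be inspected, and the way a "unique" initial CID is obtained is not specified on the poster, so it is simply passed in as a parameter.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadSet:
    """Hypothetical stand-in for one set of caller (or callee) threads."""
    cid: int
    pid: int
    history: list = field(default_factory=list)  # every PID this set has held

    def __post_init__(self):
        self.history.append(self.pid)

    def label(self):
        # Render as CID(PID, PID, ...), e.g. "2(3,4,5)".
        return f"{self.cid}({','.join(map(str, self.history))})"


def create_initial(cid):
    # Rule: the initial threads get a unique CID and set PID = CID + 1.
    # Assumption: the caller of this function guarantees uniqueness of `cid`.
    return ThreadSet(cid=cid, pid=cid + 1)


def request_prmi(caller):
    # Rule: the new PRMI's CID = caller's PID, and its PID = CID + 1.
    return ThreadSet(cid=caller.pid, pid=caller.pid + 1)


def return_prmi(caller, callee):
    # Rule: when the PRMI returns, the caller takes over the callee's PID.
    caller.pid = callee.pid
    caller.history.append(caller.pid)


def trace():
    """Run a small nested invocation pattern and return the resulting labels."""
    root = create_initial(0)
    a = request_prmi(root)      # root invokes a
    b = request_prmi(a)         # a invokes b
    c = request_prmi(b)         # b invokes c, which returns
    return_prmi(b, c)
    d = request_prmi(b)         # b invokes d, which returns
    return_prmi(b, d)
    return_prmi(a, b)           # b returns to a
    e = request_prmi(a)         # a invokes e, which returns
    return_prmi(a, e)
    return_prmi(root, a)        # a returns to root
    f = request_prmi(root)      # root starts a second invocation
    return [t.label() for t in (root, a, b, c, d, e, f)]
```

Under these assumptions, `trace()` produces the labels 0(1,6), 1(2,5,6), 2(3,4,5), 3(4), 4(5), 5(6), and 6(7), which match the CID(PID) labels in the poster's thread figure. Because a returning callee hands its PID back to the caller, every later invocation receives a CID that no earlier thread set has used.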
RMI & PRMI

• RMI is neat:
  • All arguments are packaged and sent once.
  • All return (including output) data are packaged and sent once.
  • A single flag can indicate whether an invocation succeeded.
• PRMI design issues:
  • How do M callers invoke N callees collectively?
  • How are arguments distributed and redistributed?
  • How can inconsistent invocation ordering be resolved when multiple PRMIs are allowed?
  • How can non-thread-safe MPI be supported?

MPI and Threads: communicators and locking

• A unique MPI communicator is created for each PRMI thread set.
• If the MPI implementation is not thread-safe, an MPI lock is used to separate collective MPI calls from different thread sets.

[Figure: thread-set labels in CID(PID) form: 1(2,5,6), 2(3,4,5), 3(4), 4(5), 5(6), 6(7)]

PRMI: data redistribution

Distributed data type: multi-dimensional array. Both callers and callees see the distributed array as a global array, and they use an array representation scheme to describe which part of the global array resides on which caller or callee.

[Figure: global array shared by the caller group and the callee group, with the caller array representation and the callee array representation]

Efficient Array Redistribution

When the distributed array is passed between the callers and the callees, the transmission schedule is calculated from their representations. The array parts are then sent directly from the callers to the corresponding callees (or the reverse) in parallel, avoiding any bottlenecks.

Parallel Proxy

A parallel proxy consists of a set of callee URLs; it also stores the arguments’ representations on the callee side.

Invocation ordering

When multiple parallel invocations arrive at the same callee simultaneously, the invocation ordering may become inconsistent.
• A centralized server (e.g. the first node) maintains the order.
• The ordering is not enforced unless it is necessary.

Conclusions

This work provides an approach to a parallel Common Component Architecture based on Parallel Remote Method Invocation. PRMI hides most of the parallelism and synchronization, and thus provides a conventional, convenient, and efficient way to build high-performance applications.

References

R. ARMSTRONG, D. GANNON, A. GEIST, K. KEAHEY, S. KOHN, L. MCINNES, S. PARKER, and B. SMOLINSKI. Toward a common component architecture for high-performance scientific computing. In Proceedings of the 8th IEEE International Symposium on High Performance Distributed Computing, 1999.

K. ZHANG, K. DAMEVSKI, V. VENKATACHALAPATHY, and S. PARKER. SCIRun2: a CCA framework for high performance computing. In Proceedings of the 9th International Workshop on High-Level Parallel Programming Models and Supportive Environments, 2004.

K. DAMEVSKI and S. PARKER. Parallel Remote Method Invocation and m-by-n data redistribution. In Proceedings of the 4th Los Alamos Computer Science Institute Symposium, 2003.

S. PARKER. The SCIRun problem solving environment and computational steering software system. PhD thesis, University of Utah, 1999.

F. BERTRAND, R. BRAMLEY, K. DAMEVSKI, D. BERNHOLDT, J. KOHL, J. LARSON, and A. SUSSMAN. Data Redistribution and Remote Method Invocation in Parallel Component Architectures. In Proceedings of the 19th International Parallel and Distributed Processing Symposium, 2005.

Acknowledgments

This work was supported by the DOE Center for Component Technology for Tera Scale Simulation Software (CCTTSS) and NSF (ACI 0113829) Data Parallel Component Software.

kzhang@cs.utah.edu
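As an illustration of how a transmission schedule can be derived from the caller and callee array representations described under data redistribution, here is a minimal Python sketch. It makes several simplifying assumptions that the poster does not state: the array is one-dimensional (the poster covers multi-dimensional arrays), each representation is a list of contiguous half-open index ranges, and the function names `block_representation` and `transmission_schedule` are hypothetical.

```python
def block_representation(global_size, parts):
    """Split the index range [0, global_size) into `parts` contiguous
    half-open ranges, one per process (an assumed block distribution)."""
    base, extra = divmod(global_size, parts)
    ranges, start = [], 0
    for rank in range(parts):
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def transmission_schedule(caller_repr, callee_repr):
    """Return (caller, callee, lo, hi) tuples: caller `m` sends the global
    indices [lo, hi) directly to callee `n`. Pairs with no overlap are
    omitted, so no data passes through an intermediate process."""
    schedule = []
    for m, (c_lo, c_hi) in enumerate(caller_repr):
        for n, (e_lo, e_hi) in enumerate(callee_repr):
            lo, hi = max(c_lo, e_lo), min(c_hi, e_hi)
            if lo < hi:  # non-empty intersection of the two ranges
                schedule.append((m, n, lo, hi))
    return schedule


# Example: a 12-element global array held by M = 3 callers, needed by N = 2 callees.
callers = block_representation(12, 3)   # [(0, 4), (4, 8), (8, 12)]
callees = block_representation(12, 2)   # [(0, 6), (6, 12)]
schedule = transmission_schedule(callers, callees)
# schedule == [(0, 0, 0, 4), (1, 0, 4, 6), (1, 1, 6, 8), (2, 1, 8, 12)]
```

Each tuple in the schedule is an independent point-to-point transfer, so in this sketch all four transfers could proceed in parallel. The reverse direction (callees returning output arrays to callers) would use the same intersection with the two arguments swapped.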
