Partha Dasgupta, Arizona State University: The MILAN Project
Presentation Transcript

  1. Parallel Processing with Windows NT Networks • Partha Dasgupta, Arizona State University • The MILAN Project • New York University and Arizona State University • Funding sources: DARPA/Rome Laboratory, NSF, Intel, and Microsoft • Collaborators: Zvi M. Kedem, Donald McLaughlin, Shantanu Sardesai, Rahul Thombre

  2. Joint Research of Arizona State University and New York University • ECLIPSE • Calypso Linux 1.0 • Malaxis • Chime

  3. The Platforms • Calypso • Language-independent parallel processing • Shared memory and fault tolerance • Chime • CC++-based parallel processing • Shared memory, fault tolerance • Malaxis • DSM package for Windows NT • Read/write locking, barriers • Milan • A metacomputing platform • Combines features of the above systems into a general-purpose computing platform

  4. Unix to Windows NT • Port a system program or middleware from Unix to Windows NT. How? • Just change the system calls? • That does not work. • Change programming and design styles to be NT-centric: • no signals in NT • use structured exception handling (no such thing in Unix) • use threads (useful) • integrate with Windows messages or MFC • remote execution support is weak • Learn NT-centrism and NT lingo

  5. NT Terminology • MSDN is not a network • Developer’s library contains books • Resource Kit is not about resources • Huh? • SDK, DDK, checked build • Service Pack • OSR2 • Remote access does not let you execute anything remotely • Use a Share? • You mean remote mount? No, I mean map network drive • Memory can be reserved or committed or both. • Synchronization primitives - never mind...

  6. What is Calypso? • Yet another parallel processing system, running on a distributed network of microcomputers: • Shared memory • Novel execution and memory management strategy • Fault tolerant: • Machines may stop and start dynamically without affecting the execution • Automatic load balancing: • Handles slow and fast machines • Provides near-optimal thread assignments (measured) • Execution strategy hidden from the programmer: • No message passing, process management, or data partitioning • Low-overhead mechanisms

  7. Key Techniques in Calypso • [Figure: a manager and its workers, annotated with Eager Scheduling and TIES] • Eager Scheduling • Manager-worker architecture • Provides fault-tolerant and load-shared executions with minimal overhead • Two-phase Idempotent Execution Strategy (TIES) • Distributed memory management strategy • Stops side effects due to failures • Ensures idempotence of results in spite of duplicate executions • These techniques were developed in previous joint theoretical research
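The slide does not spell out how TIES works. One way to picture the two phases is sketched below. It makes two simplifying assumptions: shared memory is a dictionary of named variables, and threads in one parallel step never write the same variable. Each execution reads a snapshot taken at the start of the step and writes only to a private copy. The manager keeps the writes from the first completion of each thread and applies them once at the end, so duplicate executions caused by eager scheduling cannot change the result. The names `run_thread`, `parallel_step` and `neighbor_sum` are hypothetical, not Calypso's API.

```python
from typing import Callable, Dict, Iterable

Memory = Dict[str, int]
Routine = Callable[[int, Memory], None]


def run_thread(routine: Routine, tid: int, snapshot: Memory) -> Memory:
    """Phase 1 (on a worker): execute against a private copy, return only the writes."""
    view = dict(snapshot)                  # the shared snapshot itself is never modified
    routine(tid, view)
    return {k: v for k, v in view.items() if snapshot.get(k) != v}


def parallel_step(memory: Memory, routine: Routine, completions: Iterable[int]) -> Memory:
    """Phase 2 (on the manager): accept each thread's writes once, then apply them.

    `completions` lists thread ids in the order results arrive; eager scheduling
    may deliver the same thread more than once.
    """
    snapshot = dict(memory)                # every execution in this step reads this version
    accepted: Dict[int, Memory] = {}
    for tid in completions:
        # A duplicate still executes (as it would on a second worker),
        # but only the first completion's writes are kept.
        writes = run_thread(routine, tid, snapshot)
        accepted.setdefault(tid, writes)
    new_memory = dict(memory)
    for writes in accepted.values():
        new_memory.update(writes)
    return new_memory


def neighbor_sum(tid: int, view: Memory) -> None:
    """Example routine: x[i] += x[(i + 1) % 3], reading a value another thread writes."""
    view[f"x{tid}"] += view[f"x{(tid + 1) % 3}"]
```

Starting from x0=1, x1=2, x2=3, the step yields x0=3, x1=5, x2=4 whether completions arrive as [0, 1, 2] or as a duplicated, reordered sequence. Applying the writes in place, one thread at a time, would instead let thread 2 see the new x0.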

  8. Eager Scheduling • [Figure: timeline of threads 1 to 12 on workers A, B and C, time axis 50 to 400; one worker is interrupted and one worker crashes] • Workers contact the manager for work after finishing their previous assignment, if any • When there is unfinished work, the manager may assign an unfinished thread to a “willing” worker, regardless of who is already working on that thread • An example of round-robin eager scheduling: • 3 machines: fast, slow and transient • 12 threads of equal length (50 s)
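A small simulation makes the round-robin example concrete. It assumes, hypothetically, that the fast and transient machines take 50 s per thread, that the slow one takes twice as long, and that the transient machine vanishes at t = 120 s. `Worker` and `eager_schedule` are illustrative names, not Calypso's interface. A worker asking for work always gets the next unfinished thread in round-robin order, even if that thread is already running elsewhere, and the manager keeps only the first completion of each thread.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Worker:
    name: str
    speed: float                       # work units per second (hypothetical model)
    crash_at: Optional[float] = None   # time at which the machine disappears, if any


def eager_schedule(num_threads: int, length: float,
                   workers: List[Worker]) -> Dict[int, Tuple[str, float]]:
    """Round-robin eager scheduling; returns {thread id: (worker name, finish time)}."""
    done: Dict[int, Tuple[str, float]] = {}
    running: Dict[int, int] = {}              # worker index -> thread it is executing
    cursor = 0                                # round-robin position over thread ids
    events = [(0.0, w) for w in range(len(workers))]  # (time worker asks for work, worker)
    heapq.heapify(events)

    while events and len(done) < num_threads:
        now, w = heapq.heappop(events)
        # The worker reports its previous assignment; only the first completion counts.
        if w in running:
            tid = running.pop(w)
            if tid not in done:
                done[tid] = (workers[w].name, now)
            if len(done) == num_threads:
                break
        # Hand out the next unfinished thread, even if another worker already has it.
        while cursor % num_threads in done:
            cursor += 1
        tid = cursor % num_threads
        cursor += 1
        finish = now + length / workers[w].speed
        crash = workers[w].crash_at
        if crash is not None and finish > crash:
            continue                          # the machine vanishes; its work is lost
        running[w] = tid
        heapq.heappush(events, (finish, w))
    return done
```

Under these assumptions all 12 threads complete by t = 350 s. The thread lost in the crash (id 7, counting from 0) is handed out again once the round-robin cursor wraps around. The fast worker finishes it first, and the slow worker's duplicate copy would be ignored.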

  9. Chime • Chime is a programming system and runtime environment for parallel processing • The first system to incorporate standard parallel language support on a network of workstations: • Nested parallelism, parallel statements • Language-defined scoping of variables • Synchronization support • Transparent shared memory • Chime supports the “shared memory” constructs of CC++ • Adds fault tolerance • Adds load balancing • ...with low overhead • A “distributed” cactus stack
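A cactus stack is a tree of activation frames. Each parallel child gets its own frame but shares its parent's frames for variable lookup, which is how language-defined scoping can coexist with nested parallelism. The toy `Frame` class below is a hypothetical single-machine sketch. Distributing the frames across machines, as the “distributed” cactus stack implies, is not modeled.

```python
class Frame:
    """One activation record in a cactus stack (hypothetical sketch)."""

    def __init__(self, parent=None):
        self.parent = parent     # enclosing frame, shared by all sibling children
        self.vars = {}

    def spawn(self):
        """Start a parallel child whose frame branches off this one."""
        return Frame(parent=self)

    def declare(self, name, value):
        self.vars[name] = value  # a local visible to this frame and its descendants

    def _owner(self, name):
        frame = self
        while frame is not None:
            if name in frame.vars:
                return frame
            frame = frame.parent
        raise NameError(name)

    def get(self, name):
        return self._owner(name).vars[name]

    def set(self, name, value):
        # Writes go to the nearest enclosing frame that declared the variable.
        self._owner(name).vars[name] = value
```

Siblings spawned from the same frame each see their own locals but the same enclosing variables. In a real parallel run, concurrent updates to a shared enclosing variable would need synchronization. The sketch performs them one after the other.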

  10. Chime Software Architecture

  11. Chime Execution Trace

  12. Malaxis • A DSM Package • Uses NT threads and memory mapping and protection features • Uses barrier synchronization, memory XOR-ing and intelligent monitoring of page/lock requests to prevent page shuttling • Programmer support: • Spawning processes on remote machines • Mapping shared segments • Barrier Synchronization • Read and Write locks (abstract, advisory)
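The slide does not say exactly how memory XOR-ing is used. One plausible reading is sketched below as an assumption. Each writer keeps an unmodified twin of a shared page. At a barrier, it XORs its copy against the twin, and the non-zero bytes mark exactly what it changed. XOR-ing those diffs back into the twin then merges writers that touched different parts of the same page, so the page need not shuttle between them. `xor_diff`, `apply_diff`, `merge_writers` and `changed_offsets` are hypothetical names.

```python
def xor_diff(twin: bytes, page: bytes) -> bytes:
    """Non-zero bytes in the result mark exactly the offsets where `page` differs from `twin`."""
    if len(twin) != len(page):
        raise ValueError("page and twin must be the same size")
    return bytes(a ^ b for a, b in zip(twin, page))


def apply_diff(page: bytes, diff: bytes) -> bytes:
    """XOR a diff into a page; XOR is its own inverse, so twin + diff gives the writer's copy."""
    return bytes(a ^ b for a, b in zip(page, diff))


def merge_writers(twin: bytes, *copies: bytes) -> bytes:
    """Fold every writer's diff into the twin (correct only if writers touched disjoint bytes)."""
    page = bytes(twin)
    for copy in copies:
        page = apply_diff(page, xor_diff(twin, copy))
    return page


def changed_offsets(diff: bytes) -> list:
    """Offsets a writer modified, read off the non-zero bytes of its diff."""
    return [i for i, b in enumerate(diff) if b]
```

For example, if one writer changes byte 0 and another changes byte 7 of the same 8-byte "page", merging their diffs into the twin keeps both changes.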

  13. Milan • A metacomputing platform • Creates a system image of a large computer on a set of workstations • Smart scheduling • bunching • job recall • pre-emption • Shared memory • Fault tolerant

  14. Using Windows NT • The needs of our implementations: • User-level page-fault handling • Getting and setting thread contexts • Getting and setting stack contents • Asynchronous notification and exception handling • Networking support • Process/thread control • Windows NT provides all of the above

  15. Memory Handling • Windows NT memory handling is elegant and powerful (once you understand the terminology) • States of memory: • committed • reserved • guarded • Protection and allocation are done by: • VirtualAlloc • VirtualProtect • Access violations generate exceptions • Calypso needed reprogramming, for the better
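A common way to build distributed shared memory on these facilities is to leave shared pages inaccessible and fill them in from the access-violation handler on first touch. On NT this roughly means three steps. First, `VirtualAlloc` reserves and commits the region. Second, `VirtualProtect` marks it `PAGE_NOACCESS`. Third, a handler fetches the page, calls `VirtualProtect` with `PAGE_READWRITE` and returns `EXCEPTION_CONTINUE_EXECUTION`. That code only runs on Windows, so the sketch below simulates the same control flow. `SharedRegion`, `AccessViolation`, the `fetch_page` callback and the 4096-byte page size are all assumptions.

```python
PAGE_SIZE = 4096  # assumed page size for the sketch


class AccessViolation(Exception):
    """Stands in for the access-violation exception raised on a protected page."""


class SharedRegion:
    """Toy DSM region: pages start inaccessible and are fetched on first touch."""

    def __init__(self, num_pages, fetch_page):
        self.fetch_page = fetch_page        # hypothetical callback: page number -> page bytes
        self.pages = [None] * num_pages     # None plays the role of a no-access page
        self.faults = 0

    def _load(self, addr):
        page, offset = divmod(addr, PAGE_SIZE)
        if self.pages[page] is None:
            raise AccessViolation(page)
        return self.pages[page][offset]

    def _on_fault(self, page):
        # Handler: fetch the page contents, then "unprotect" it by installing them.
        self.faults += 1
        self.pages[page] = bytearray(self.fetch_page(page))

    def read(self, addr):
        while True:
            try:
                return self._load(addr)
            except AccessViolation as fault:
                self._on_fault(fault.args[0])   # then retry the faulting access
```

Only the first access to each page takes the fault path. Later reads of the same page succeed directly, which is why the fault count stays at one per page touched.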

  16. Exception Handling • All exceptions are delivered to an exception handler defined in the current scope of execution. • Great for programmers: nice and structured • Not good for middleware solutions... • How can I execute another person's code with my exception handlers? • I cannot change the exception handler from within my exception handler. • In our case, we found reasonable workarounds, but we do not have general solutions to the above problems.

  17. Threads • Good, consistent, kernel threads. • Easy to use • works great • plethora of synchronization constructs (too many, in fact) • Threads are useful for: • Threads inside middleware - wow! • Handling distributed shared memory (callbacks, caching, memory service) • Process migration - a thread can set up the main process • Segregating functionality (assign a thread per job)

  18. Process and Stack Migration • Migration is used by our system for several purposes: • Cactus stacks • Checkpointing • Pre-emptive scheduling (produces better turnaround times in dynamic environments) • When a thread has to be migrated: • Another thread suspends it and gets its context • The context is a checkpoint • The context is sent to the target machine • A thread sets the context of a suspended thread to the new context and resumes it. The stack has to be reset too. • IT WORKS
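On NT, the migration steps above map onto `SuspendThread`, `GetThreadContext`, `SetThreadContext` and `ResumeThread`, plus copying the stack. Register-level contexts cannot be shown portably, so the sketch below uses a hypothetical `ToyThread` whose entire state is an explicit register set and stack. It shows why a captured context works as a checkpoint: serialize it, ship it, install it in a fresh thread on the target, and the computation resumes where it stopped.

```python
import json


class ToyThread:
    """Hypothetical stand-in for a thread: 'registers' plus a stack, nothing hidden."""

    def __init__(self, registers, stack):
        self.registers = registers  # like a context record: program counter, accumulator
        self.stack = stack          # stack contents must travel with the context

    @classmethod
    def start(cls, n):
        # Sum 1..n: the stack holds the loop bound, the registers hold pc and acc.
        return cls({"pc": 1, "acc": 0}, [n])

    def step(self):
        """Execute one 'instruction'; return False once the loop has finished."""
        n = self.stack[-1]
        if self.registers["pc"] > n:
            return False
        self.registers["acc"] += self.registers["pc"]
        self.registers["pc"] += 1
        return True

    def run(self, max_steps):
        for _ in range(max_steps):
            if not self.step():
                break
        return self.registers["acc"]


def checkpoint(thread):
    """Capture a (no longer running) thread's context and stack as bytes for the network."""
    return json.dumps({"registers": thread.registers, "stack": thread.stack}).encode()


def restore(blob):
    """On the target machine: build a thread and install the received context and stack."""
    state = json.loads(blob)
    return ToyThread(state["registers"], state["stack"])
```

Running four steps, checkpointing, and finishing the restored thread gives 55, the same result as an uninterrupted run summing 1 to 10.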

  19. Other Features • Networking • Winsock is like sockets: no surprises • Remote execution • our approach: use a daemon process • NT approach: use a starter service • Execution Monitor (GUI) • An external process that controls and displays the state of the distributed computation

  20. Performance • Program: Ray Trace, which generates a nice picture • Equipment: Pentium-90 running Windows NT (Calypso tests) • Pentium Pro 200 running Windows NT (Chime tests) • Tests conducted • Speedup • Speedup with mixed-speed machines • Speedup with crashing and recovering machines • Micro-tests (migration, stack creation) • Not all tests will be shown now.

  21. Calypso Performance • Performance is comparable to that on Unix systems

  22. Chime Performance • Chime has higher network overhead than Calypso

  23. In Retrospect • NT has some strong points, things that are better than Unix • Threads • Exception Handling • Memory Management • Program development tools • (very good, especially the debugger) • Documentation • A few shortcomings • no signals • no remote execution facility • terrible terminology

  24. Status • Operational prototype systems • Calypso on Windows NT / Windows 95 released • A prototype of Chime implementing most of the “parallel part” of Compositional C++ on an unreliable network of workstations • Ongoing research • Distributed scheduling and resource management (for MILAN) • Quality of service • Better integration with NT (MFC support, remote services, global scheduling…)

  25. Acknowledgements • Co-PI • Zvi M. Kedem • Calypso • Arash Baratloo, Mehmet Karaul • Calypso NT • Donald McLaughlin and Shantanu Sardesai • Chime • Shantanu Sardesai • Calypso Linux • Arash Baratloo

  26. Questions?

  27. done?

  28. Done? Review request for SP&E