Enhancing the Soundness and Precision of Parallel Program Analysis through Schedule Specialization
This presentation describes an approach to analyzing parallel programs that uses schedule specialization to improve both the soundness and the precision of dynamic and static analyses. The authors present a framework that analyzes a program over a small, finite set of schedules and then enforces those schedules at runtime, which keeps the analysis sound while keeping performance overhead modest. The talk covers the implementation, performance results, and examples demonstrating the effectiveness of the proposed method.
Presentation Transcript
Sound and Precise Analysis of Parallel Programs through Schedule Specialization — Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang (Columbia University)
Motivation • Analyzing parallel programs is difficult. [Figure: precision vs. soundness axes, where soundness = # of analyzed schedules / # of total schedules. Dynamic analysis covers only a few of the total schedules (precise but unsound); static analysis covers all schedules (sound but imprecise).]
Schedule Specialization • Precision: analyze the program over a small set of schedules. • Soundness: enforce these schedules at runtime. [Figure: on the precision/soundness axes, schedule specialization combines the precision of dynamic analysis with the soundness of static analysis, because the analyzed schedules are exactly the enforced schedules.]
Enforcing Schedules Using Peregrine • Deterministic multithreading • e.g. DMP (ASPLOS ’09), Kendo (ASPLOS ’09), CoreDet (ASPLOS ’10), Tern (OSDI ’10), Peregrine (SOSP ’11), DTHREADS (SOSP ’11) • Performance overhead • e.g. Kendo: 16%, Tern & Peregrine: 39.1% • Peregrine • Record schedules, and reuse them on a wide range of inputs. • Represent schedules explicitly.
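Peregrine's recorded schedules are, at bottom, total orders of synchronization operations. A minimal sketch of how such a total order can be enforced at runtime follows; the names (`sched_t`, `sched_turn`) and the thread-id encoding are illustrative assumptions, not Peregrine's actual API:

```c
#include <pthread.h>

/* A recorded schedule: the sequence of thread ids allowed to perform
   their next synchronization, in order. */
typedef struct {
    const int *order;       /* recorded total order of thread ids */
    int len;
    int pos;                /* next position in the schedule */
    pthread_mutex_t mu;
    pthread_cond_t cv;
} sched_t;

void sched_init(sched_t *s, const int *order, int len) {
    s->order = order;
    s->len = len;
    s->pos = 0;
    pthread_mutex_init(&s->mu, NULL);
    pthread_cond_init(&s->cv, NULL);
}

/* Block until it is thread `tid`'s turn in the recorded order,
   then consume that slot and wake the other threads. */
void sched_turn(sched_t *s, int tid) {
    pthread_mutex_lock(&s->mu);
    while (s->pos < s->len && s->order[s->pos] != tid)
        pthread_cond_wait(&s->cv, &s->mu);
    s->pos++;
    pthread_cond_broadcast(&s->cv);
    pthread_mutex_unlock(&s->mu);
}
```

Each thread would call `sched_turn` before every synchronization operation, so all synchronizations happen in the recorded total order regardless of OS scheduling.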
Schedule Specialization • Precision: analyze the program over a small set of schedules. • Soundness: enforce these schedules at runtime. [Figure, revisited: with Peregrine enforcing the schedules, the analyzed schedules and the enforced schedules coincide.]
Framework • Extract the control flow and data flow enforced by a set of schedules. [Diagram: Program (a C/C++ program with Pthreads) + Schedule (a total order of synchronizations) → Schedule Specialization → Specialized Program (with extra def-use chains).]
Outline • Example • Control-Flow Specialization • Data-Flow Specialization • Results • Conclusion
Running Example

    int results[p_max];
    int global_id = 0;

    int main(int argc, char *argv[]) {
      int i;
      int p = atoi(argv[1]);
      for (i = 0; i < p; ++i)
        pthread_create(&child[i], 0, worker, 0);
      for (i = 0; i < p; ++i)
        pthread_join(child[i], 0);
      return 0;
    }

    void *worker(void *arg) {
      pthread_mutex_lock(&global_id_lock);
      int my_id = global_id++;
      pthread_mutex_unlock(&global_id_lock);
      results[my_id] = compute(my_id);
      return 0;
    }

[Figure: a schedule with three threads. Thread 0 creates threads 1 and 2; thread 1 runs lock/unlock, then thread 2 runs lock/unlock; thread 0 joins both. Race-free?]
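For this example with p = 2, the schedule is just the total order of synchronizations shown in the figure. A sketch of how such a schedule might be represented explicitly (the event types and encoding here are illustrative assumptions, not the paper's on-disk format):

```c
#include <stddef.h>

/* Illustrative encoding of a schedule: a total order of
   synchronization events, each tagged with the performing thread. */
typedef enum { OP_CREATE, OP_JOIN, OP_LOCK, OP_UNLOCK } sync_op;
typedef struct { int tid; sync_op op; } sync_event;

/* The schedule from the running example (p = 2): thread 0 creates both
   workers, thread 1's critical section runs before thread 2's, then
   thread 0 joins both workers. */
static const sync_event schedule[] = {
    {0, OP_CREATE}, {0, OP_CREATE},
    {1, OP_LOCK},   {1, OP_UNLOCK},
    {2, OP_LOCK},   {2, OP_UNLOCK},
    {0, OP_JOIN},   {0, OP_JOIN},
};
static const size_t schedule_len = sizeof(schedule) / sizeof(schedule[0]);
```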
Control-Flow Specialization

    int main(int argc, char *argv[]) {
      int i;
      int p = atoi(argv[1]);
      for (i = 0; i < p; ++i)
        pthread_create(&child[i], 0, worker, 0);
      for (i = 0; i < p; ++i)
        pthread_join(child[i], 0);
      return 0;
    }

[Figures, four build-up slides: starting from main's control-flow graph (atoi, i = 0, i < p, ++i, create, join, return), the graph is unrolled step by step along the schedule create, create, join, join. Each create and join event in the schedule fixes the outcome of one i < p branch, so both loops are expanded into straight-line code.]
Control-Flow Specialized Program

    int main(int argc, char *argv[]) {
      int i;
      int p = atoi(argv[1]);
      i = 0;  // i < p == true
      pthread_create(&child[i], 0, worker.clone1, 0);
      ++i;    // i < p == true
      pthread_create(&child[i], 0, worker.clone2, 0);
      ++i;    // i < p == false
      i = 0;  // i < p == true
      pthread_join(child[i], 0);
      ++i;    // i < p == true
      pthread_join(child[i], 0);
      ++i;    // i < p == false
      return 0;
    }

[Figure: the corresponding fully unrolled, loop-free control-flow graph.]
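The specialized main above can be made runnable to check that, for p = 2, it computes the same results as the original looped version. In this sketch the worker body and `compute` are filled in purely for illustration (the paper gives each thread its own clone; here one body suffices for the check):

```c
#include <pthread.h>

static pthread_t child[2];
static int results[2];
static int global_id = 0;
static pthread_mutex_t global_id_lock = PTHREAD_MUTEX_INITIALIZER;

static int compute(int id) { return id * id; }  /* stand-in for real work */

static void *worker(void *arg) {
    (void)arg;
    pthread_mutex_lock(&global_id_lock);
    int my_id = global_id++;                    /* each thread gets 0 or 1 */
    pthread_mutex_unlock(&global_id_lock);
    results[my_id] = compute(my_id);
    return 0;
}

/* main() specialized for p = 2: both loops fully unrolled. */
int run_specialized(void) {
    pthread_create(&child[0], 0, worker, 0);    /* i = 0, i < p == true */
    pthread_create(&child[1], 0, worker, 0);    /* ++i,  i < p == true  */
    pthread_join(child[0], 0);                  /* join loop unrolled too */
    pthread_join(child[1], 0);
    return 0;
}
```

Whichever thread enters the critical section first gets id 0, so `results[0]` and `results[1]` end up the same in every interleaving.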
More Challenges on Control-Flow Specialization • Ambiguity [Figure: caller (S1, call) and callee (call, S2, ret) — the same synchronization in the schedule may correspond to different positions in the caller and the callee]. • A schedule may contain too many synchronizations.
Data-Flow Specialization

    int global_id = 0;

    void *worker.clone1(void *arg) {
      pthread_mutex_lock(&global_id_lock);
      int my_id = global_id++;
      pthread_mutex_unlock(&global_id_lock);
      results[my_id] = compute(my_id);
      return 0;
    }

    void *worker.clone2(void *arg) {
      pthread_mutex_lock(&global_id_lock);
      int my_id = global_id++;
      pthread_mutex_unlock(&global_id_lock);
      results[my_id] = compute(my_id);
      return 0;
    }

[Figures, four build-up slides: the enforced schedule orders thread 1's critical section before thread 2's, so the values of global_id can be propagated along the schedule: in thread 1, my_id = 0 and global_id becomes 1; in thread 2, my_id = 1 and global_id becomes 2.]
Data-Flow Specialized Program

    int global_id = 0;

    void *worker.clone1(void *arg) {
      pthread_mutex_lock(&global_id_lock);
      global_id = 1;
      pthread_mutex_unlock(&global_id_lock);
      results[0] = compute(0);
      return 0;
    }

    void *worker.clone2(void *arg) {
      pthread_mutex_lock(&global_id_lock);
      global_id = 2;
      pthread_mutex_unlock(&global_id_lock);
      results[1] = compute(1);
      return 0;
    }

[Figure: the schedule annotated with the propagated constants: my_id = 0, global_id = 1 in thread 1; my_id = 1, global_id = 2 in thread 2.]
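As a sanity check, the data-flow specialized clones can be run directly; with the enforced order (clone1's critical section before clone2's) they leave exactly the same state as the original workers. `compute` and the surrounding declarations are illustrative assumptions, and the clones are named with underscores to keep them valid C:

```c
#include <pthread.h>

static int results[2];
static int global_id = 0;
static pthread_mutex_t global_id_lock = PTHREAD_MUTEX_INITIALIZER;

static int compute(int id) { return id * id; }  /* stand-in for real work */

/* Clone for thread 1: my_id folded to the constant 0. */
static void *worker_clone1(void *arg) {
    (void)arg;
    pthread_mutex_lock(&global_id_lock);
    global_id = 1;
    pthread_mutex_unlock(&global_id_lock);
    results[0] = compute(0);
    return 0;
}

/* Clone for thread 2: my_id folded to the constant 1. */
static void *worker_clone2(void *arg) {
    (void)arg;
    pthread_mutex_lock(&global_id_lock);
    global_id = 2;
    pthread_mutex_unlock(&global_id_lock);
    results[1] = compute(1);
    return 0;
}
```

Note that after folding, the two clones write to disjoint array elements (`results[0]` vs. `results[1]`), which is what makes the race question decidable for this schedule.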
More Challenges on Data-Flow Specialization • Must/may alias analysis (e.g. for global_id) • Reasoning about integers (e.g. results[0] = compute(0) vs. results[1] = compute(1)) • Many def-use chains
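The "many def-use chains" point can be made concrete with a sketch of how one extra cross-thread def-use chain might be recorded; the `prog_point` encoding and the instruction indices are assumptions chosen only for illustration:

```c
#include <stddef.h>

/* A program point: which thread, and which instruction in its trace. */
typedef struct { int tid; int inst; } prog_point;

/* An extra def-use chain added by specialization: a store in one thread
   that the enforced schedule guarantees reaches a load in another. */
typedef struct { prog_point def; prog_point use; } def_use;

/* For global_id in the running example: the initializer reaches
   thread 1's read, and thread 1's increment reaches thread 2's read.
   Instruction indices here are hypothetical. */
static const def_use extra_chains[] = {
    { {0, 0}, {1, 1} },
    { {1, 2}, {2, 1} },
};
static const size_t extra_chains_len =
    sizeof(extra_chains) / sizeof(extra_chains[0]);
```

Even this two-thread example needs one chain per shared read; real programs with many threads and many shared accesses multiply these quickly, which is the scalability challenge the slide names.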
Evaluation • Applications • Static race detector • Alias analyzer • Path slicer • Programs • PBZip2 1.1.5 • aget 0.4.1 • 8 programs in SPLASH2 • 7 programs in PARSEC
Static Race Detector [Chart, shown in several build-up slides: number of false positives per benchmark.]
Static Race Detector: Harmful Races Detected • 4 in aget • 2 in radix • 1 in fft
Conclusion and Future Work • Designed and implemented schedule specialization framework • Analyzes the program over a small set of schedules • Enforces these schedules at runtime • Built and evaluated three applications • Easy to use • Precise • Future work • More applications • Similar specialization ideas on sequential programs
Related Work • Program analysis for parallel programs • Chord (PLDI ’06), RADAR (PLDI ’08), FastTrack (PLDI ’09) • Slicing • Horgon (PLDI ’90), Bouncer (SOSP ’07), Jhala (PLDI ’05), Weiser (PhD thesis), Zhang (PLDI ’04) • Deterministic multithreading • DMP (ASPLOS ’09), Kendo (ASPLOS ’09), CoreDet (ASPLOS ’10), Tern (OSDI ’10), Peregrine (SOSP ’11), DTHREADS (SOSP ’11) • Program specialization • Consel (POPL ’93), Gluck (ISPL ’95), Jørgensen (POPL ’92), Nirkhe (POPL ’92), Reps (PDSPE ’96)
Handling Races • We do not assume data-race freedom. • We could if our only goal is optimization.
Input Coverage • Use runtime verification for the inputs not covered • A small set of schedules can cover a wide range of inputs