Program Comprehension through Dynamic Analysis: Visualization, evaluation, and a survey
Program Comprehension through Dynamic Analysis: Visualization, evaluation, and a survey
Bas Cornelissen (et al.), Delft University of Technology
IPA Herfstdagen, Nunspeet, The Netherlands, November 26, 2008
Context • Software maintenance • e.g., feature requests, debugging • requires understanding of the program at hand • up to 70% of the maintenance effort is spent on the comprehension process • Goal: support program comprehension
Definitions • Program comprehension: “A person understands a program when he or she is able to explain the program, its structure, its behavior, its effects on its operational context, and its relationships to its application domain in terms that are qualitatively different from the tokens used to construct the source code of the program.”
Definitions (cont’d) • Dynamic analysis: the analysis of the properties of a running software system • Typical workflow: unknown system (e.g., open source) → instrumentation (e.g., using AspectJ) → scenario execution → (too) much data • Advantages: preciseness, goal-orientation • Limitations: incompleteness, scenario-dependence, scalability issues
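The instrumentation step could, for instance, be done with an annotation-style AspectJ aspect such as the minimal sketch below. The pointcut and the traced package name (here jpacman, one of the systems evaluated later) are illustrative assumptions, not the actual setup used in this work.

```java
import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.annotation.After;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Before;

// Illustrative tracing aspect: records an event for every method call
// in the (assumed) jpacman package and its subpackages.
@Aspect
public class TracingAspect {

    // Log method entries.
    @Before("execution(* jpacman..*.*(..))")
    public void entry(JoinPoint jp) {
        System.out.println("CALL   " + jp.getSignature().toLongString());
    }

    // Log method exits (including exits via exceptions).
    @After("execution(* jpacman..*.*(..))")
    public void exit(JoinPoint jp) {
        System.out.println("RETURN " + jp.getSignature().toLongString());
    }
}
```

Weaving such an aspect into the system and its test suite during execution produces the raw event stream, which is exactly where the “(too) much data” problem starts.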
Outline • Literature survey • Visualization I: UML sequence diagrams • Comparing reduction techniques • Visualization II: Extravis • Current work: Human factor • Concluding remarks
Why a literature survey? • Numerous papers and subfields • last decade: many papers annually • Need for a broad overview • keep track of current and past developments • identify future directions • Existing surveys (4) do not suffice • scopes restricted • approaches not systematic • collective outcomes difficult to structure
Characterizing the literature • Four facets • Activity: what is being performed/contributed? • e.g., architecture reconstruction • Target: to which languages/platforms is the approach applicable? • e.g., web applications • Method: which methods are used in conducting the activity? • e.g., formal concept analysis • Evaluation: how is the approach validated? • e.g., industrial study
Characterization • (table of surveyed articles characterized along the four facets)
Survey results • Least common activities • surveys, architecture reconstruction • Least common target systems • multithreaded, distributed, legacy, web • Least common evaluations • industrial studies, controlled experiments, comparisons
UML sequence diagrams • Goal • visualize test case executions as sequence diagrams • provides insight into functionalities • accurate, up-to-date documentation • Method • instrument system and test suite • execute test suite • abstract from “irrelevant” details • visualize as sequence diagrams
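A minimal sketch of the final step, rendering recorded call events as sequence diagram text. The slides do not name a rendering back end; PlantUML output and the CallEvent record with its field names are assumptions made here for illustration.

```java
import java.util.List;

// Hypothetical call-event record; field names are assumptions.
record CallEvent(String caller, String callee, String method) {}

public class SequenceDiagramWriter {

    // Emit a PlantUML sequence diagram for a (reduced) list of call events.
    static String toPlantUml(List<CallEvent> events) {
        StringBuilder sb = new StringBuilder("@startuml\n");
        for (CallEvent e : events) {
            sb.append(e.caller()).append(" -> ").append(e.callee())
              .append(" : ").append(e.method()).append("\n");
        }
        return sb.append("@enduml\n").toString();
    }

    public static void main(String[] args) {
        System.out.println(toPlantUml(List.of(
                new CallEvent("Engine", "Board", "move(player, dx, dy)"),
                new CallEvent("Board", "Cell", "isOccupied()"))));
    }
}
```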
Evaluation • JPacman • Small program for educational purposes • 3 KLOC • 25 classes • Task • Change requests • addition of “undo” functionality • addition of “multi-level” functionality
Evaluation (cont’d) • Checkstyle • code validation tool • 57 KLOC • 275 classes • Task • Addition of a new check • which types of checks exist? • what is the difference in terms of implementation?
Results • Sequence diagrams are easily readable • intuitive due to chronological ordering • Sequence diagrams aid in program comprehension • support maintenance tasks • Proper reductions/abstractions are difficult • reduce 10,000 events to 100 events, but at what cost?
Results (cont’d) • Reduction techniques: issues • which one is “best”? • which are most likely to lead to significant reductions? • which are the fastest? • which actually abstract from irrelevant details?
Trace reduction techniques • Input 1: large execution trace • up to millions of events • Input 2: maximum output size • e.g., 100 for visualization through UML sequence diagrams • Output: reduced trace • was the reduction successful? • how fast was the reduction performed? • has relevant data been preserved?
Example technique • Stack depth limitation [metrics-based filtering] • requires two passes • pass 1: determine the depth frequencies and the maximum depth that fits the threshold • pass 2: discard all events above that depth • example: a 200,000-event trace with a maximum output size (threshold) of 50,000 events; depth frequencies 28,450 / 13,902 / 58,444 / 29,933 / 10,004 / ... yield a cut-off at depth 1 and a reduced trace of 42,352 events
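A minimal sketch of those two passes, assuming an in-memory trace and a simple Event record carrying the recorded stack depth; real traces of millions of events would be processed in a streaming fashion, and the depth cap of 64 is an arbitrary simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative trace event; the depth field is the call-stack depth
// at which the event was recorded.
record Event(int depth, String signature) {}

public class StackDepthLimitation {

    // Pass 1: count events per stack depth, then pick the largest depth
    // whose cumulative event count still fits the output threshold.
    static int maximumDepth(List<Event> trace, int threshold) {
        int[] perDepth = new int[64];          // simplifying cap on depth
        for (Event e : trace) {
            if (e.depth() < perDepth.length) {
                perDepth[e.depth()]++;
            }
        }
        int cumulative = 0, maxDepth = -1;
        for (int d = 0; d < perDepth.length; d++) {
            if (cumulative + perDepth[d] > threshold) break;
            cumulative += perDepth[d];
            maxDepth = d;                      // -1 if even depth 0 exceeds the threshold
        }
        return maxDepth;
    }

    // Pass 2: discard all events deeper than the chosen maximum depth.
    static List<Event> reduce(List<Event> trace, int threshold) {
        int maxDepth = maximumDepth(trace, threshold);
        List<Event> reduced = new ArrayList<>();
        for (Event e : trace) {
            if (e.depth() <= maxDepth) reduced.add(e);
        }
        return reduced;
    }
}
```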
How can we compare the techniques? • Use: • common context • common evaluation criteria • common test set • This ensures a fair comparison
Approach • Assessment methodology • Context: need for high-level knowledge • Criteria: reduction success rate; performance; information preservation • Metrics: output size; time spent; preservation % per type • Test set: five open source systems, one industrial • Application: apply reductions using thresholds from 1,000 through 1,000,000 • Interpretation: compare side-by-side
Techniques under assessment • Subsequence summarization [summarization] • Stack depth limitation [metrics-based] • Language-based filtering [filtering] • Sampling [ad hoc]
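For contrast with the depth-based technique sketched earlier, here is a sketch of the simplest of the four, sampling [ad hoc]: keep every n-th event, with the stride n chosen so that the result fits the maximum output size. The stride computation is an assumption; the slides do not say how the sampling rate is picked.

```java
import java.util.ArrayList;
import java.util.List;

public class SamplingReduction {

    // Keep every stride-th event so that roughly maxOutputSize events remain.
    static <E> List<E> sample(List<E> trace, int maxOutputSize) {
        int stride = Math.max(1, (int) Math.ceil(trace.size() / (double) maxOutputSize));
        List<E> reduced = new ArrayList<>();
        for (int i = 0; i < trace.size(); i += stride) {
            reduced.add(trace.get(i));
        }
        return reduced;
    }
}
```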
Extravis • Execution Trace Visualizer • collaboration with TU/e • Goal • program comprehension through trace visualization • trace exploration, feature location, ... • address scalability issues • for millions of events, sequence diagrams are not adequate
Evaluation: Cromod • Industrial system • Regulates greenhouse conditions • 51 KLOC • 145 classes • Trace • 270,000 events • Task • Analysis of fan-in/fan-out characteristics
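As a rough illustration of the fan-in/fan-out task, the sketch below counts, for each class in a recorded call trace, the distinct classes it calls (fan-out) and the distinct classes that call it (fan-in). The Call record and the per-class counting are assumptions for illustration, not Extravis functionality.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative class-level call event; field names are assumptions.
record Call(String callerClass, String calleeClass) {}

public class FanInFanOut {

    static void report(List<Call> trace) {
        Map<String, Set<String>> fanOut = new HashMap<>();
        Map<String, Set<String>> fanIn = new HashMap<>();
        for (Call c : trace) {
            fanOut.computeIfAbsent(c.callerClass(), k -> new HashSet<>()).add(c.calleeClass());
            fanIn.computeIfAbsent(c.calleeClass(), k -> new HashSet<>()).add(c.callerClass());
        }
        // Report both metrics for every class seen as caller or callee.
        Set<String> classes = new HashSet<>(fanOut.keySet());
        classes.addAll(fanIn.keySet());
        for (String cls : classes) {
            System.out.printf("%s: fan-out=%d, fan-in=%d%n", cls,
                    fanOut.getOrDefault(cls, Set.of()).size(),
                    fanIn.getOrDefault(cls, Set.of()).size());
        }
    }
}
```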
Evaluation: JHotDraw • Medium-size open source application • Java framework for graphics editing • 73 KLOC • 344 classes • Trace • 180,000 events • Task • feature location • i.e., relate functionality to source code or trace fragment
Evaluation: Checkstyle • Medium-size open source system • code validation tool • 73 KLOC • 344 classes • Trace: 200,000 events • Task • formulate hypothesis • “typical scenario comprises four main phases” • initialization; AST construction; AST traversal; termination • validate hypothesis through trace analysis
Motivation • Need for controlled experiments in general • measure impact of (novel) visualizations • Need for empirical validation of Extravis in particular • only anecdotal evidence thus far • Measure usefulness of Extravis in • software maintenance • does runtime information from Extravis help?
Experimental design • Series of maintenance tasks • from high level to low level • e.g., overview, refactoring, detailed understanding • Experimental group • ±10 subjects • Eclipse IDE + Extravis • Control group • ±10 subjects • Eclipse IDE
Concluding remarks • Program comprehension: important subject • make software maintenance more efficient • Difficult to evaluate and compare • due to human factor • Many future directions • several of which have been addressed by this research
Want to participate in the controlled experiment..? • Prerequisites • at least two persons • knowledge of Java • (some) experience with Eclipse • no implementation knowledge of Checkstyle • two hours to spare between December 1 and 19 • Contact me: • during lunch, or • through email: s.g.m.cornelissen@tudelft.nl