Program Comprehension through Dynamic Analysis: Visualization, evaluation, and a survey
Program Comprehension through Dynamic Analysis: Visualization, evaluation, and a survey
Bas Cornelissen (et al.), Delft University of Technology
IPA Herfstdagen, Nunspeet, The Netherlands, November 26, 2008
Context • Software maintenance • e.g., feature requests, debugging • requires understanding of the program at hand • up to 70% of the maintenance effort is spent on the comprehension process • Goal: support program comprehension
Definitions • Program comprehension: “A person understands a program when he or she is able to explain the program, its structure, its behavior, its effects on its operational context, and its relationships to its application domain in terms that are qualitatively different from the tokens used to construct the source code of the program.”
Definitions (cont’d) • Dynamic analysis: the analysis of the properties of a running software system • Typical workflow: unknown system (e.g., open source) → instrumentation (e.g., using AspectJ) → scenario execution → (too) much data • Advantages: preciseness, goal-orientation • Limitations: incompleteness, scenario-dependence, scalability issues
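The instrumentation step could, for instance, be done with an annotation-style AspectJ aspect such as the minimal sketch below. The pointcut and the traced package name (here jpacman, one of the systems evaluated later) are illustrative assumptions, not the actual setup used in this work.

```java
import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.annotation.After;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Before;

// Illustrative tracing aspect: records an event for every method call
// in the (assumed) jpacman package and its subpackages.
@Aspect
public class TracingAspect {

    // Log method entries.
    @Before("execution(* jpacman..*.*(..))")
    public void entry(JoinPoint jp) {
        System.out.println("CALL   " + jp.getSignature().toLongString());
    }

    // Log method exits (including exits via exceptions).
    @After("execution(* jpacman..*.*(..))")
    public void exit(JoinPoint jp) {
        System.out.println("RETURN " + jp.getSignature().toLongString());
    }
}
```

Weaving such an aspect into the system and its test suite during execution produces the raw event stream, which is exactly where the “(too) much data” problem starts.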
Outline • Literature survey • Visualization I: UML sequence diagrams • Comparing reduction techniques • Visualization II: Extravis • Current work: Human factor • Concluding remarks
Why a literature survey? • Numerous papers and subfields • last decade: many papers annually • Need for a broad overview • keep track of current and past developments • identify future directions • Existing surveys (4) do not suffice • scopes restricted • approaches not systematic • collective outcomes difficult to structure
Characterizing the literature • Four facets • Activity: what is being performed/contributed? • e.g., architecture reconstruction • Target: to which languages/platforms is the approach applicable? • e.g., web applications • Method: which methods are used in conducting the activity? • e.g., formal concept analysis • Evaluation: how is the approach validated? • e.g., industrial study
Characterization • (table of surveyed articles characterized along the four facets)
Survey results • Least common activities • surveys, architecture reconstruction • Least common target systems • multithreaded, distributed, legacy, web • Least common evaluations • industrial studies, controlled experiments, comparisons
UML sequence diagrams • Goal • visualize test case executions as sequence diagrams • provides insight into functionalities • accurate, up-to-date documentation • Method • instrument system and test suite • execute test suite • abstract from “irrelevant” details • visualize as sequence diagrams
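A minimal sketch of the final step, rendering recorded call events as sequence diagram text. The slides do not name a rendering back end; PlantUML output and the CallEvent record with its field names are assumptions made here for illustration.

```java
import java.util.List;

// Hypothetical call-event record; field names are assumptions.
record CallEvent(String caller, String callee, String method) {}

public class SequenceDiagramWriter {

    // Emit a PlantUML sequence diagram for a (reduced) list of call events.
    static String toPlantUml(List<CallEvent> events) {
        StringBuilder sb = new StringBuilder("@startuml\n");
        for (CallEvent e : events) {
            sb.append(e.caller()).append(" -> ").append(e.callee())
              .append(" : ").append(e.method()).append("\n");
        }
        return sb.append("@enduml\n").toString();
    }

    public static void main(String[] args) {
        System.out.println(toPlantUml(List.of(
                new CallEvent("Engine", "Board", "move(player, dx, dy)"),
                new CallEvent("Board", "Cell", "isOccupied()"))));
    }
}
```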
Evaluation • JPacman • Small program for educational purposes • 3 KLOC • 25 classes • Task • Change requests • addition of “undo” functionality • addition of “multi-level” functionality
Evaluation (cont’d) • Checkstyle • code validation tool • 57 KLOC • 275 classes • Task • Addition of a new check • which types of checks exist? • what is the difference in terms of implementation?
Results • Sequence diagrams are easily readable • intuitive due to chronological ordering • Sequence diagrams aid in program comprehension • support maintenance tasks • Proper reductions/abstractions are difficult • reduce 10,000 events to 100 events, but at what cost?
Results (cont’d) • Reduction techniques: issues • which one is “best”? • which are most likely to lead to significant reductions? • which are the fastest? • which actually abstract from irrelevant details?
Trace reduction techniques • Input 1: large execution trace • up to millions of events • Input 2: maximum output size • e.g., 100 for visualization through UML sequence diagrams • Output: reduced trace • was the reduction successful? • how fast was the reduction performed? • has relevant data been preserved?
Example technique • Stack depth limitation [metrics-based filtering] • requires two passes • pass 1: determine the depth frequencies and the maximum depth that fits the threshold • pass 2: discard all events above that depth • example: a 200,000-event trace with a maximum output size (threshold) of 50,000 events; depth frequencies 28,450 / 13,902 / 58,444 / 29,933 / 10,004 / ... yield a cut-off at depth 1 and a reduced trace of 42,352 events
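A minimal sketch of those two passes, assuming an in-memory trace and a simple Event record carrying the recorded stack depth; real traces of millions of events would be processed in a streaming fashion, and the depth cap of 64 is an arbitrary simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative trace event; the depth field is the call-stack depth
// at which the event was recorded.
record Event(int depth, String signature) {}

public class StackDepthLimitation {

    // Pass 1: count events per stack depth, then pick the largest depth
    // whose cumulative event count still fits the output threshold.
    static int maximumDepth(List<Event> trace, int threshold) {
        int[] perDepth = new int[64];          // simplifying cap on depth
        for (Event e : trace) {
            if (e.depth() < perDepth.length) {
                perDepth[e.depth()]++;
            }
        }
        int cumulative = 0, maxDepth = -1;
        for (int d = 0; d < perDepth.length; d++) {
            if (cumulative + perDepth[d] > threshold) break;
            cumulative += perDepth[d];
            maxDepth = d;                      // -1 if even depth 0 exceeds the threshold
        }
        return maxDepth;
    }

    // Pass 2: discard all events deeper than the chosen maximum depth.
    static List<Event> reduce(List<Event> trace, int threshold) {
        int maxDepth = maximumDepth(trace, threshold);
        List<Event> reduced = new ArrayList<>();
        for (Event e : trace) {
            if (e.depth() <= maxDepth) reduced.add(e);
        }
        return reduced;
    }
}
```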
How can we compare the techniques? • Use: • common context • common evaluation criteria • common test set • This ensures a fair comparison
Approach • Assessment methodology • Context: need for high-level knowledge • Criteria: reduction success rate; performance; information preservation • Metrics: output size; time spent; preservation % per type • Test set: five open source systems, one industrial • Application: apply reductions using thresholds from 1,000 through 1,000,000 • Interpretation: compare side-by-side
Techniques under assessment • Subsequence summarization [summarization] • Stack depth limitation [metrics-based] • Language-based filtering [filtering] • Sampling [ad hoc]
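For contrast with the depth-based technique sketched earlier, here is a sketch of the simplest of the four, sampling [ad hoc]: keep every n-th event, with the stride n chosen so that the result fits the maximum output size. The stride computation is an assumption; the slides do not say how the sampling rate is picked.

```java
import java.util.ArrayList;
import java.util.List;

public class SamplingReduction {

    // Keep every stride-th event so that roughly maxOutputSize events remain.
    static <E> List<E> sample(List<E> trace, int maxOutputSize) {
        int stride = Math.max(1, (int) Math.ceil(trace.size() / (double) maxOutputSize));
        List<E> reduced = new ArrayList<>();
        for (int i = 0; i < trace.size(); i += stride) {
            reduced.add(trace.get(i));
        }
        return reduced;
    }
}
```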
Extravis • Execution Trace Visualizer • collaboration with TU/e • Goal • program comprehension through trace visualization • trace exploration, feature location, ... • address scalability issues • for millions of events, sequence diagrams are not adequate
Evaluation: Cromod • Industrial system • Regulates greenhouse conditions • 51 KLOC • 145 classes • Trace • 270,000 events • Task • Analysis of fan-in/fan-out characteristics
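As a rough illustration of the fan-in/fan-out task, the sketch below counts, for each class in a recorded call trace, the distinct classes it calls (fan-out) and the distinct classes that call it (fan-in). The Call record and the per-class counting are assumptions for illustration, not Extravis functionality.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative class-level call event; field names are assumptions.
record Call(String callerClass, String calleeClass) {}

public class FanInFanOut {

    static void report(List<Call> trace) {
        Map<String, Set<String>> fanOut = new HashMap<>();
        Map<String, Set<String>> fanIn = new HashMap<>();
        for (Call c : trace) {
            fanOut.computeIfAbsent(c.callerClass(), k -> new HashSet<>()).add(c.calleeClass());
            fanIn.computeIfAbsent(c.calleeClass(), k -> new HashSet<>()).add(c.callerClass());
        }
        // Report both metrics for every class seen as caller or callee.
        Set<String> classes = new HashSet<>(fanOut.keySet());
        classes.addAll(fanIn.keySet());
        for (String cls : classes) {
            System.out.printf("%s: fan-out=%d, fan-in=%d%n", cls,
                    fanOut.getOrDefault(cls, Set.of()).size(),
                    fanIn.getOrDefault(cls, Set.of()).size());
        }
    }
}
```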
Evaluation: JHotDraw • Medium-size open source application • Java framework for graphics editing • 73 KLOC • 344 classes • Trace • 180,000 events • Task • feature location • i.e., relate functionality to source code or trace fragment
Evaluation: Checkstyle • Medium-size open source system • code validation tool • 73 KLOC • 344 classes • Trace: 200,000 events • Task • formulate hypothesis • “typical scenario comprises four main phases” • initialization; AST construction; AST traversal; termination • validate hypothesis through trace analysis
Motivation • Need for controlled experiments in general • measure impact of (novel) visualizations • Need for empirical validation of Extravis in particular • only anecdotal evidence thus far • Measure usefulness of Extravis in • software maintenance • does runtime information from Extravis help?
Experimental design • Series of maintenance tasks • from high level to low level • e.g., overview, refactoring, detailed understanding • Experimental group • ±10 subjects • Eclipse IDE + Extravis • Control group • ±10 subjects • Eclipse IDE
Concluding remarks • Program comprehension: important subject • make software maintenance more efficient • Difficult to evaluate and compare • due to human factor • Many future directions • several of which have been addressed by this research
Want to participate in the controlled experiment..? • Prerequisites • at least two persons • knowledge of Java • (some) experience with Eclipse • no implementation knowledge of Checkstyle • two hours to spare between December 1 and 19 • Contact me: • during lunch, or • through email: s.g.m.cornelissen@tudelft.nl