An Empirical Assessment of the Crosscutting Concern Problem Marc Eaddy Department of Computer Science Columbia University
Motivation • Maintenance dominates software costs • Maintenance: 50–90% of total software cost, 3–4× development costs [Chart: maintenance vs. other development]
Motivation • >50% of maintenance time spent understanding the program • Where are the features, requirements, etc. in the code? • What is this code for? • Why is it hard to understand and change the program?
Main Contributions ConcernTagger Cerberus • Improved state of the art of concern location • Innovative metrics and experimental methodology • Evidence of the dangers of crosscutting concerns PDA
Improving concern location • Statement Annotations for Fine-Grained Advising • ECOOP Workshop on Reflection, AOP, and Meta-Data for Software Evolution (2006) • Eaddy and Aho • Demo: Wicca 2.0 - Dynamic Weaving using the .NET 2.0 Debugging APIs • Aspect-Oriented Software Development (2007) • Eaddy • Identifying, Assigning, and Quantifying Crosscutting Concerns • ICSE Workshop on Assessment of Contemporary Modularization Techniques (2007) • Eaddy, Aho, and Murphy • Cerberus: Tracing Requirements to Source Code Using Information Retrieval, Dynamic Analysis, and Program Analysis • IEEE International Conference on Program Comprehension (2008) • Eaddy, Aho, Antoniol, and Guéhéneuc
Innovative metrics & methodology • Towards Assessing the Impact of Crosscutting Concerns on Modularity • AOSD Workshop on Assessment of Aspect Techniques (2007) • Eaddy and Aho • Do Crosscutting Concerns Cause Defects? • IEEE Transactions on Software Engineering (2008) • Eaddy, Zimmermann, Sherwood, Garg, Murphy, Nagappan, and Aho
Dangers of crosscutting • Do Crosscutting Concerns Cause Defects? • IEEE Transactions on Software Engineering (2008) • Eaddy, Zimmermann, Sherwood, Garg, Murphy, Nagappan, and Aho
Roadmap ConcernTagger Cerberus • Improved state of the art of concern location • Innovative metrics and experimental methodology • Evidence of the dangers of crosscutting concerns PDA
What is a “concern?” Anything that affects the implementation of a program • Feature, requirement, design pattern, code idiom, etc. • Raison d'être for code • Every line of code exists to satisfy some concern • Existing definitions are poor • Concern domain must be “well-defined set”
Concern location problem • Concern–code relationship hard to obtain • Concern–code relationship undocumented • Reverse engineer the relationship Program Elements Concerns
Manual concern location • Concern–code relationship determined by a human • Existing techniques too subjective • Inaccurate, unreliable • Ideal • Code affected when concern is changed • My insight • Prune dependency rule [ACOM’07] • Code affected when concern is pruned (removed) • i.e., software pruning • Practical approximation
Prune dependency rule • Code is prune dependent on concern if • Concern pruned ⇒ code removed or altered • Distinguish between removing and altering code • Easily determine change impact of removing code • Code dependent on removed code must be altered (to prevent compile errors) • Easy for human to approximate
Manual concern location • Concern–code relationship determined by a human • Existing tools impractical for analyzing all concerns of a real system • Many concerns (>100) • Many concern–code links (>10K) • Hierarchical concerns • My solution: ConcernTagger [TSE’08]
Automated concern location • Concern–code relationship predicted by an “expert” • Experts look for clues in docs and code • Existing techniques only consult 1 or 2 experts • My solution: Cerberus [ICPC’08] • Information retrieval • Execution tracing • Prune dependency analysis
IR-based concern location • i.e., Google for code • Program entities are documents • Requirements are queries • Example: requirement “Array.join” vs. source code terms join, Id_join, js_join()
Vector space model [Salton] • Parse code and requirements doc to extract term vectors • NativeArray.js_join() method → “native,” “array,” “join” • “Array.join” requirement → “array,” “join” • My contributions • Expand abbreviations • numconns → number, connections, numberconnections • Index fields • Weight terms (tf · idf) • Term frequency (tf) • Inverse document frequency (idf) • Similarity = cosine of the angle between document and query vectors
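The vector space model above can be sketched in a few lines of Python. This is a minimal illustration, not the Cerberus implementation: the function names (`tokenize`, `rank`, and so on), the `+ 1.0` smoothing in the idf term, and the entity names other than `NativeArray.js_join` are assumptions made for the example. Unlike the slide, this tokenizer keeps the `js` prefix as a term.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split an identifier or phrase on punctuation and camelCase, lowercased."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", text)
    return [p.lower() for p in parts]


def inverse_document_frequency(docs):
    """docs: name -> list of terms. Rare terms get a higher weight."""
    n = len(docs)
    df = Counter(term for terms in docs.values() for term in set(terms))
    # The "+ 1.0" keeps terms that occur in every document from vanishing
    # (an assumption of this sketch, not taken from the slides).
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def weigh(terms, idf):
    """Build a tf * idf vector; terms outside the vocabulary are ignored."""
    return {t: c * idf[t] for t, c in Counter(terms).items() if t in idf}


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def rank(query, entities):
    """Rank program entities (documents) against a requirement (query)."""
    docs = {name: tokenize(name) for name in entities}
    idf = inverse_document_frequency(docs)
    q = weigh(tokenize(query), idf)
    scores = {name: cosine(q, weigh(terms, idf)) for name, terms in docs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical program entities; only js_join appears in the slides.
entities = ["NativeArray.js_join", "NativeArray.js_construct", "NativeString.js_concat"]
ranking = rank("Array.join", entities)
for name, score in ranking:
    print(f"{score:.3f}  {name}")
```

In a real setting each document would also include terms from comments and field names, which is where abbreviation expansion and field indexing would plug in.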
Tracing-based concern location • Observe elements activated when concern is exercised • Unit tests for each concern • e.g., find elements uniquely activated by a concern
Unit test for “Array.join”:
    var a = new Array(1, 2);
    if (a.join(',') == "1,2") {
        print("Test passed");
    } else {
        print("Test failed");
    }
Call graph: js_join, js_construct
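One way to read "elements uniquely activated by a concern" is as a set difference over per-concern traces. The sketch below assumes each unit test has already been run and its activated elements recorded; the function name `uniquely_activated` and the `Array.reverse` trace are hypothetical.

```python
def uniquely_activated(traces):
    """traces: concern -> set of elements executed by that concern's unit test.

    Returns, for each concern, the elements that no other concern activated.
    """
    unique = {}
    for concern, elements in traces.items():
        others = set().union(
            *(e for c, e in traces.items() if c != concern)
        )
        unique[concern] = elements - others
    return unique


traces = {
    "Array.join": {"js_construct", "js_join"},
    # Hypothetical second concern, added so that js_construct is shared.
    "Array.reverse": {"js_construct", "js_reverse"},
}
unique = uniquely_activated(traces)
print(unique["Array.join"])
```

Here `js_construct` is dropped for both concerns because both tests create an array first, which leaves `js_join` as the only element unique to "Array.join".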
Tracing-based concern location • Elements often activated by multiple concerns • What is “information content” of element activation? • Element Frequency–Inverse ConcernFrequency [ICPC’08]
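The slides do not give the formula for Element Frequency–Inverse Concern Frequency, so the sketch below simply transposes tf · idf onto traces: element frequency plays the role of tf and inverse concern frequency the role of idf. Treat the exact weighting as an assumption of this example rather than the definition from [ICPC’08]; the activation counts are made up.

```python
import math


def ef_icf(activations):
    """activations: concern -> {element: number of times the element ran}.

    Scores each (concern, element) pair by analogy with tf * idf:
      ef  = share of the concern's activations that hit this element
      icf = log(number of concerns / number of concerns activating the element)
    """
    n_concerns = len(activations)
    concern_freq = {}
    for counts in activations.values():
        for element in counts:
            concern_freq[element] = concern_freq.get(element, 0) + 1

    scores = {}
    for concern, counts in activations.items():
        total = sum(counts.values())
        scores[concern] = {
            element: (count / total) * math.log(n_concerns / concern_freq[element])
            for element, count in counts.items()
        }
    return scores


# Hypothetical activation counts for two concerns.
activations = {
    "Array.join": {"js_construct": 1, "js_join": 1},
    "Array.reverse": {"js_construct": 1, "js_reverse": 1},
}
scores = ef_icf(activations)
print(scores["Array.join"])
```

An element activated by every concern, such as `js_construct` here, carries no information and scores zero, while an element activated by a single concern scores highest.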
Prune dependency analysis • Infer relevant elements based on structural relationship to relevant element e (seed) • Assumes we already have some seeds • Prune dependency analysis[ICPC’08] • Automates prune dependency rule[ACOM’07] • Find references to e • Find superclasses and subclasses of e
PDA example
Source code:
    interface A {
        public void foo();
    }
    public class B implements A {
        public void foo() { ... }
        public void bar() { ... }
    }
    public class C {
        public static void main() {
            B b = new B();
            b.bar();
        }
    }
Program dependency graph: B inherits A • A contains foo • B contains foo, bar • C contains main • main refs B • main calls bar
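The two PDA steps (find references to the seed, find its superclasses and subclasses) can be sketched as a one-step walk over the dependency graph of the example. This is an illustration under assumptions: the edge-list encoding and the name `prune_dependents` are invented here, and the sketch does not distinguish removed from altered code as the prune dependency rule does.

```python
# Edges of the example graph as (source, kind, target) triples.
EDGES = [
    ("B", "inherits", "A"),
    ("A", "contains", "A.foo"),
    ("B", "contains", "B.foo"),
    ("B", "contains", "B.bar"),
    ("C", "contains", "C.main"),
    ("C.main", "refs", "B"),
    ("C.main", "calls", "B.bar"),
]


def prune_dependents(edges, seed):
    """Return elements structurally related to the seed in one step.

    Covers the two relationships named on the slide: references to the
    seed, and its direct superclasses and subclasses.
    """
    related = set()
    for source, kind, target in edges:
        if kind in ("refs", "calls") and target == seed:
            related.add(source)  # an element that references the seed
        elif kind == "inherits" and seed in (source, target):
            # the other end of an inheritance edge touching the seed
            related.add(target if source == seed else source)
    return related


print(prune_dependents(EDGES, "B.bar"))
print(prune_dependents(EDGES, "B"))
```

With `B.bar` as the seed, the only inferred element is `C.main`, which would stop compiling if `bar` were pruned. With `B` as the seed, both `C.main` and the interface `A` are returned.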
Cerberus effectiveness • Cerberus most effective • PDA improves IR by 155% • PDA improves tracing by 104%
Roadmap ConcernTagger Cerberus • Improved state of the art of concern location • Innovative metrics and experimental methodology • Evidence of the dangers of crosscutting concerns PDA
The crosscutting concern problem Some concerns difficult to modularize • Code related to the concern is… • Scattered across (crosscuts) multiple files • Often tangled with other concern code Program Elements Concerns
Example: Pathfinding in Goblin • Pathfinding is modularized
Example: Collision detection • Collision detection not modularized
How to measure scattering? • Existing metrics inadequate • My solution • Degree of scattering [ASAT’07] • Degree of tangling [ASAT’07]
Degree of scattering (DOS) • Measures concern modularity, i.e., distribution of concern code across multiple classes • Average DOS – Overall modularity of concerns • Summarizes amount of crosscutting present • More insightful than traditional metrics • “class A is highly coupled” vs. “feature A is hard to change” [Wong, et al.] [ACOM’07]
Insight behind DOS • More descriptive than class count • Consider two different concern implementations • DOS = 1.00, #Classes = 4 • DOS = 0.08, #Classes = 4
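The slides do not spell out the DOS formula. The sketch below uses the variance-based form as I understand it from the cited work: one minus the normalized squared deviation of the concern's per-class concentration from a perfectly even spread. Treat the formula as an assumption here. The line counts are hypothetical, chosen only so the two cases reproduce the values on the slide.

```python
def degree_of_scattering(lines_per_class):
    """lines_per_class: lines of one concern's code in each class of the program.

    Returns 0.0 when the concern lives in a single class and 1.0 when its
    code is spread evenly over all classes.
    """
    n = len(lines_per_class)
    total = sum(lines_per_class)
    if n < 2 or total == 0:
        return 0.0
    # Squared deviation of each class's share from the even share 1/n.
    deviation = sum((count / total - 1.0 / n) ** 2 for count in lines_per_class)
    return 1.0 - n * deviation / (n - 1)


# Both concerns touch 4 classes, but they are distributed very differently.
even = [25, 25, 25, 25]   # hypothetical line counts
skewed = [97, 1, 1, 1]    # hypothetical line counts
print(round(degree_of_scattering(even), 2))
print(round(degree_of_scattering(skewed), 2))
```

A plain class count reports 4 in both cases, which is the gap DOS is meant to close: the second concern is almost entirely modularized in one class.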
Degree of tangling (DOT) • Distribution of class code across multiple concerns • Average DOT – Overall separation of concerns [Wong, et al.] [ACOM’07]
Roadmap ConcernTagger Cerberus • Improved state of the art of concern location • Innovative metrics and experimental methodology • Evidence of the dangers of crosscutting concerns PDA
Do crosscutting concerns cause defects? [TSE’08] • Created mappings • Requirement–code map (via ConcernTagger) • Bug–code map (via BugTagger) • Bug–requirement map (inferred)
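The slides say only that the bug–requirement map is inferred. One plausible reading, assumed here, is that a bug is linked to every requirement whose code overlaps the code changed to fix that bug. All identifiers and data below are hypothetical.

```python
def infer_bug_requirement_map(requirement_code, bug_code):
    """Join the two maps on shared program elements.

    requirement_code: requirement -> set of elements implementing it
    bug_code:         bug -> set of elements changed to fix it
    Returns bug -> set of requirements whose code the fix touched.
    """
    return {
        bug: {
            req
            for req, elements in requirement_code.items()
            if elements & fixed  # non-empty intersection
        }
        for bug, fixed in bug_code.items()
    }


def bugs_per_requirement(bug_requirements):
    """Count how many bugs are linked to each requirement."""
    counts = {}
    for requirements in bug_requirements.values():
        for req in requirements:
            counts[req] = counts.get(req, 0) + 1
    return counts


# Hypothetical maps, standing in for ConcernTagger and BugTagger output.
requirement_code = {
    "Array.join": {"js_join"},
    "Array.reverse": {"js_reverse"},
}
bug_code = {
    "bug-1": {"js_join"},
    "bug-2": {"js_join", "js_reverse"},
    "bug-3": {"js_concat"},
}
bug_requirements = infer_bug_requirement_map(requirement_code, bug_code)
bug_counts = bugs_per_requirement(bug_requirements)
print(bug_counts)
```

The per-requirement bug counts are what a scattering-versus-defects comparison would then use.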
Do crosscutting concerns cause defects? [TSE’08] • Correlated scattering and bug count • Spearman correlation • Found moderate to strong correlation between scattering and defects • As scattering increases so do defects [Plot: bugs vs. scattering]
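Spearman correlation is the Pearson correlation of the ranks, so it measures whether bug count rises monotonically with scattering without assuming a linear relationship. The sketch below is a dependency-free illustration; the scattering and bug values are invented for the example and are not data from the study.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman rank correlation of two equally long sequences."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented per-concern values, for illustration only.
scattering = [0.0, 0.2, 0.5, 0.7, 0.9]
bugs = [0, 1, 1, 4, 6]
rho = spearman(scattering, bugs)
print(f"rho = {rho:.2f}")
```

A correlation near 1 says only that the two quantities rise together; it does not by itself establish that scattering causes the defects.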
How widespread is the problem? • 5 case studies of OO programs • Scattering • Concerns related to 6 classes on average • OO unsuitable for representing these problem domains? • Most (86%) concerns are crosscutting to some extent • Dispels “modular base” notion • General-purpose solution needed • Tangling • Classes related to 10 concerns on average • Poor separation of concerns • Classes doing too much • Crosscutting concerns severely limit modularity
Main Contributions ConcernTagger Cerberus • Improved state of the art of concern location • Innovative metrics and experimental methodology • Evidence of the dangers of crosscutting concerns PDA
Future work • Further explore new concern analysis field • Techniques to reduce crosscutting • Improve concern location • Improve PDA generality, precision, and heuristics • Use machine learning to combine judgments • Incorporate smart “grep” and PDA into IDE • Gather empirical evidence • Impact of reducing crosscutting • Impact of crosscutting on maintenance effort • Impact of code tangling on quality
Acknowledgements • Alfred Aho • ConcernTagger/Mapper • Vibhav Garg • Jason Scherer • John Gallagher • Martin Robillard • Frédéric Weigand-Warr • BugTagger • Thomas Zimmermann • Cerberus • Giuliano Antoniol • Yann-Gaël Guéhéneuc • Andrew Howard • Goblin • Erik Petterson • John Waugh • Hrvoje Benko • Wicca • Boriana Ditcheva • Rajesh Ramakrishnan • Adam Vartanian • Microsoft Phoenix Team
Questions? Marc Eaddy Columbia University eaddy@cs.columbia.edu