
Aiding Comprehension of Cloning Through Categorization Cory Kapser and Michael W. Godfrey Software Architecture Group School of Computer Science, University of Waterloo
Overview • Motivation • Background • Methods • Case Studies • Results • Discussion • Summary
Motivation • Code duplication (“cloning”) is common in large, long-lived industrial software systems. • Negatively affects successful system evolution! • Thus, clone management or removal is desirable.
Problems with clone detection technologies • Comprehension: result sets often provide little information beyond “it’s a clone” • Scalability: VERY large result sets are typical • Accuracy: especially false positives
Proposed solution • Classification of clones • Improve comprehension through informative grouping and statistical analysis • Improve scalability through easier navigation • Improve accuracy through region-specific filtering
Code cloning • A serious problem in industrial software. • Typically, 15% of a system is duplicated code. • As high as 50% in some cases [Ducasse]
Reasons for code cloning • Perceived cost • Time constraints • Insufficient understanding of the underlying problem • Architectural clarity
Problems with clones • Maintenance • Size • Comprehension • Bugs (copied and new) • Indication of poor design
Managing clones • Removal • Documentation
Our approach • Perform clone detection • Extract/define “regions” from source code • Map clone pairs to regions • Classify clones • Filter clones • Display results
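The "map clone pairs to regions" step can be sketched as an interval-containment lookup. This is a minimal illustration, not the authors' implementation: the `Region` and `Fragment` types, the `enclosing_region` helper, the file paths, and the line numbers are all hypothetical, and it assumes regions and clone fragments are both given as inclusive line ranges.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Region:
    path: str
    kind: str   # hypothetical labels, e.g. "function", "loop", "conditional"
    start: int  # first line, inclusive
    end: int    # last line, inclusive


@dataclass(frozen=True)
class Fragment:
    path: str
    start: int
    end: int


def enclosing_region(frag: Fragment, regions: List[Region]) -> Optional[Region]:
    """Return the smallest region that fully contains the fragment, if any."""
    candidates = [
        r for r in regions
        if r.path == frag.path and r.start <= frag.start and frag.end <= r.end
    ]
    # Prefer the innermost (shortest) region, e.g. a loop inside a function.
    return min(candidates, key=lambda r: r.end - r.start, default=None)


# Made-up regions and one made-up clone pair.
regions = [
    Region("fs/a.c", "function", 10, 60),
    Region("fs/a.c", "loop", 20, 30),
    Region("fs/b.c", "function", 5, 40),
]
pair = (Fragment("fs/a.c", 22, 28), Fragment("fs/b.c", 10, 16))
mapped = [enclosing_region(f, regions) for f in pair]
kinds = [r.kind if r else None for r in mapped]
print(kinds)
```

Choosing the innermost region is one possible design; a fragment that spans several regions would need a different rule.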
The taxonomy • Classifies clones according to attributes such as location and region type of a clone • Hierarchical
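A two-level hierarchy of this kind could be sketched as follows: the first level is the relative location of the two fragments, the second is the region type. The `classify` function and the "Same File" and "Different Directory" labels are assumptions made for illustration; only "Same Directory Clones", "Function Clones", and the notion of unclassified clones appear in these slides.

```python
import os
from typing import Tuple


def classify(path_a: str, region_a: str, path_b: str, region_b: str) -> Tuple[str, str]:
    """Return a (location, region type) path through a two-level taxonomy."""
    # First level: where the two fragments sit relative to each other.
    if path_a == path_b:
        location = "Same File"
    elif os.path.dirname(path_a) == os.path.dirname(path_b):
        location = "Same Directory"
    else:
        location = "Different Directory"
    # Second level: name the region type only when both fragments agree
    # (an assumption; mixed pairs are simply left unclassified here).
    if region_a == region_b:
        region_type = region_a.capitalize() + " Clones"
    else:
        region_type = "Unclassified"
    return (location, region_type)


# Made-up file names, for illustration only.
label = classify("fs/ext2/inode.c", "function", "fs/ext2/super.c", "function")
print(label)
```

Because each label is a path through the hierarchy, results can be grouped at either level, for example all clones in the same directory, or only the function clones among them.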
ADD A SLIDE HERE • To discuss what you hoped your taxonomy would help you with • Why did you pick that design? • Give an example of how using this taxonomy could be helpful in a (simple, made-up) example case
Case studies • PostgreSQL: 543,387 LOC, 1,097 source files • Linux kernel file-system subsystem: 280,177 LOC, 537 source files
Filtering and classification results • 85 – 87% of clones could be classified using the taxonomy • Fewer unclassified clones in the Same Directory Clones category • A large percentage of false positives was removed by filtering structural and prototype regions.
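Region-specific filtering might look like the sketch below. It is an assumption-laden illustration: the `filter_pairs` helper and the kind labels are hypothetical, and the rule used here (drop a pair only when both fragments lie in a filtered region kind) is one possible choice. A real filter could instead apply stricter matching thresholds inside those regions rather than dropping pairs outright.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # region kinds of the two fragments in a clone pair

# Assumed labels for the region kinds named on the slide.
FILTERED_KINDS: Set[str] = {"structural", "prototype"}


def filter_pairs(
    pairs: Iterable[Pair], filtered: Set[str] = FILTERED_KINDS
) -> Tuple[List[Pair], List[Pair]]:
    """Split clone pairs into (kept, dropped) according to region kind."""
    kept: List[Pair] = []
    dropped: List[Pair] = []
    for kind_a, kind_b in pairs:
        # Drop only when both fragments fall in a filtered region kind.
        if kind_a in filtered and kind_b in filtered:
            dropped.append((kind_a, kind_b))
        else:
            kept.append((kind_a, kind_b))
    return kept, dropped


# Made-up clone pairs, for illustration only.
pairs = [
    ("function", "function"),
    ("structural", "structural"),
    ("prototype", "structural"),
    ("structural", "function"),
]
kept, dropped = filter_pairs(pairs)
print(len(kept), len(dropped))
```

Keeping the dropped pairs in a separate list, rather than discarding them, lets a reviewer spot-check what the filter removed.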
Overall cloning in the systems • Function Clones dominate the Same Directory Clones. • Most cloning occurs within the same directory.
Frequency of clone types • Very few loop clones • Relatively many conditional clones • 38% of the clone pairs in the Linux fs and 53% of the clone pairs in PostgreSQL were function clones
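Once every clone pair carries a type label, per-type frequencies like those above reduce to a simple count. The sketch below uses a hypothetical `type_shares` helper and made-up labels; the numbers it produces are not the case-study results.

```python
from collections import Counter
from typing import Dict, Iterable


def type_shares(labels: Iterable[str]) -> Dict[str, float]:
    """Return each clone type's share of all clone pairs, in percent."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Made-up labels for ten clone pairs, for illustration only.
labels = ["function"] * 5 + ["conditional"] * 3 + ["loop"] + ["unclassified"]
shares = type_shares(labels)
print(shares)
```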
Is it possible to insert a table here with the results, even if it is partial (to show that the work is there and that there are numbers)? • Or maybe a graph? Nice to have this to imply: here’s all the hard work we did, boy did we sweat, and there are so many results that the observations are probably meaningful
Cloning comprehension • Classification of clones can improve comprehension • Users will have a working understanding of what a clone of a given type means • We believe navigation of the “clone space” will be greatly improved • We now know more about cloning as it occurs in a software system • Simple metrics are now available
Tool support • Clone Interpretation and Classification System (CICS) • Provides a GUI to navigate classified clones • Will provide benchmarking support for clone detection tools • Many features can be added to complement the sorting of clones in the taxonomy
Summary • Management of clones is important for the healthy evolution of a software system • We can make the process of managing clones more comprehensible, scalable, and accurate
Future work • Deeper classification • Benchmark suite • IDE plugins • Evolution of clones