Extraction of Evolution History from Software Source Code Using Linear Counting
Speaker: Liu Shuchang, Osaka University
Background
• In daily software development, developers copy and edit existing code to create product variants
• This copy-and-edit cycle drives software evolution
Evolution History Example
• Often only the source code of each variant is available, and no evolution history was recorded (❌)
Introduction
• Evolution History Recovery
  • of product variants
  • using only source code
• Evolution Tree
  • vertex: variant
  • edge: derived relation (most similar pair)
  • key: product similarity
• Previous Study
  • diff-based (file-to-file similarity)
  • time-consuming (worst case: 2 days)
• Linear Counting Algorithm
  • estimates similarity instead of calculating it exactly
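Linear Counting Algorithm
An example of the Linear Counting Algorithm: 11 distinct elements (cardinality 11) are hashed into a bitmap of size m = 8, leaving V = 2 zero bits. The estimated cardinality is
$$\hat{n} = -m \ln\frac{V}{m} = -8 \ln\frac{2}{8} \approx 11.09$$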
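The estimate above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes BLAKE2b (`hashlib.blake2b`) as the hash function, and the names `lc_estimate` and `LinearCounter` are hypothetical. Each element sets one bit, and the estimate depends only on how many bits remain zero.

```python
import hashlib
import math


def lc_estimate(m: int, zeros: int) -> float:
    """Linear Counting estimate n_hat = -m * ln(V / m), where V is the number of zero bits."""
    if zeros == 0:
        # A full bitmap would need ln(0): the bitmap is too small for this many elements.
        raise ValueError("bitmap is full; use a larger bitmap")
    return -m * math.log(zeros / m)


class LinearCounter:
    """Distinct-count estimator backed by an m-bit bitmap stored in a bytearray."""

    def __init__(self, m: int) -> None:
        self.m = m
        self.bits = bytearray((m + 7) // 8)

    def _index(self, item: str) -> int:
        # Assumption: BLAKE2b stands in for the hash function; any well-mixed hash works.
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.m

    def add(self, item: str) -> None:
        i = self._index(item)
        self.bits[i // 8] |= 1 << (i % 8)

    def zeros(self) -> int:
        # Padding bits in the last byte are never set, so m - (set bits) counts only real zeros.
        ones = sum(bin(byte).count("1") for byte in self.bits)
        return self.m - ones

    def estimate(self) -> float:
        return lc_estimate(self.m, self.zeros())
```

`lc_estimate(8, 2)` reproduces the slide's example (about 11.09). Because a repeated element hits the same bit, the counter estimates the number of distinct elements, and the bitmap has to be large enough that it never fills up, since V = 0 makes the logarithm undefined.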
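Estimate Product Similarity
• Initialization: a hash function maps every element of Multiset A and Multiset B into Bit Map A and Bit Map B
• Bitwise operators combine the two bitmaps: OR gives Bit Map A∪B, AND gives Bit Map A∩B
• The Linear Counting Algorithm estimates LC(A∪B) and LC(A∩B) from these bitmaps
• Similarity is the Jaccard index, estimated by dividing one count by the other:
$$J(A,B) = \frac{|A\cap B|}{|A\cup B|} \approx \frac{LC(A\cap B)}{LC(A\cup B)}$$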
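A sketch of this similarity estimation in Python, again assuming BLAKE2b as the hash function (the function names are hypothetical). Python integers serve as bitmaps so that `|` and `&` are the bitwise operators. The AND bitmap can slightly overcount the intersection when an element found only in A and one found only in B land on the same bit; this stays rare while the bitmap is sparse.

```python
import hashlib
import math


def bit_index(item: str, m: int) -> int:
    # Assumption: BLAKE2b stands in for the hash function.
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little") % m


def to_bitmap(multiset: list[str], m: int) -> int:
    """Initialization: hash every element into an m-bit bitmap stored in a Python int."""
    bitmap = 0
    for item in multiset:
        bitmap |= 1 << bit_index(item, m)
    return bitmap


def linear_count(bitmap: int, m: int) -> float:
    """Linear Counting estimate from the number of zero bits in the bitmap."""
    zeros = m - bin(bitmap).count("1")
    if zeros == 0:
        raise ValueError("bitmap is full; use a larger bitmap")
    return m * math.log(m / zeros)  # equal to -m * ln(V / m)


def estimate_jaccard(a: list[str], b: list[str], m: int) -> float:
    """Estimate |A∩B| / |A∪B| as LC(A∩B) / LC(A∪B)."""
    bm_a, bm_b = to_bitmap(a, m), to_bitmap(b, m)
    union = linear_count(bm_a | bm_b, m)         # bitwise OR -> Bit Map A∪B
    intersection = linear_count(bm_a & bm_b, m)  # bitwise AND -> Bit Map A∩B
    return intersection / union if union > 0 else 1.0
```

For example, two variants of 1,000 lines each that share 500 lines have a true Jaccard index of 500 / 1500 ≈ 0.33, and with a 200,000-bit bitmap the estimate lands close to that value.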
Process Flow
• Initialization: build an initial multiset from each variant's source code (Variant A → Initial Multiset A, Variant B → Initial Multiset B), taking elements by either
  1. n-gram modeling, or
  2. each line of the code
• Linear Counting Algorithm: estimate the Jaccard index |A∩B| / |A∪B| for every pair (A, B), (A, C), (A, D), …
• Prim's Algorithm: connect the most similar pairs into the Evolution Tree
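The flow can be sketched end to end as below. Several details are assumptions rather than the slides' method: "n-gram" is taken to mean n consecutive non-blank lines (n = 1 reduces to each line of the code), the exact Jaccard index on Python sets stands in for the Linear Counting estimate to keep the example short, and Prim's algorithm runs on similarities as a maximum spanning tree, so the result is an undirected tree. `elements`, `jaccard`, and `evolution_tree` are hypothetical names.

```python
import heapq


def elements(source: str, n: int = 1) -> set[str]:
    """Initialization: n-grams of consecutive non-blank lines (n = 1 is each line of the code)."""
    lines = [line.strip() for line in source.splitlines() if line.strip()]
    return {"\n".join(lines[i:i + n]) for i in range(len(lines) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    # Exact Jaccard index, used here as a stand-in for the Linear Counting estimate.
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def evolution_tree(variants: dict[str, str], n: int = 1) -> list[tuple[str, str, float]]:
    """Prim's algorithm on similarities: repeatedly attach the most similar outside variant."""
    elems = {name: elements(src, n) for name, src in variants.items()}
    names = list(elems)
    if not names:
        return []
    in_tree = {names[0]}
    heap: list[tuple[float, str, str]] = []

    def push_edges(u: str) -> None:
        for v in names:
            if v not in in_tree:
                # heapq is a min-heap, so similarities are negated.
                heapq.heappush(heap, (-jaccard(elems[u], elems[v]), u, v))

    push_edges(names[0])
    edges = []
    while heap and len(in_tree) < len(names):
        neg_sim, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # stale edge: v was already attached through a more similar pair
        in_tree.add(v)
        edges.append((u, v, -neg_sim))
        push_edges(v)
    return edges
```

With three variants where B adds one line to A and C adds one line to B, the tree links A to B (similarity 0.8) and B to C (about 0.83), skipping the less similar A to C pair (about 0.67).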
Research Data
A description of the datasets we dealt with
Final Result of dataset5
The Evolution Tree we extracted (the best configuration), compared with the existing actual evolution history
Analysis of Bitmap Size
Part of the experimental results for dataset5
Best Configuration
• Main Factors
  • N-gram Modeling: none (each line of code is one element)
  • Bitmap Size: 128,000,000 bits
  • Hashing Function: MurmurHash3
• Results
  • Proper Edges: 86.5% (on average)
  • Time: 10 s to 5 min
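In the best configuration each line of code is hashed with MurmurHash3 into a 128,000,000-bit bitmap, which is 16,000,000 bytes (16 MB) per variant. Below is a pure-Python transcription of MurmurHash3's 32-bit x86 variant for illustration; a compiled package such as `mmh3` would be the practical choice. Using the 32-bit variant and reducing the hash modulo the bitmap size are assumptions, since the slides do not say which MurmurHash3 variant or index mapping was used; a 32-bit hash (about 4.29 × 10^9 values) does cover 1.28 × 10^8 bit positions. `line_bit_index` and `BITMAP_BITS` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF
BITMAP_BITS = 128_000_000  # bitmap size from the best configuration


def _rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86_32 in pure Python (illustrative, not optimized)."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    nblocks = len(data) // 4
    # Body: mix each 4-byte little-endian block into the state.
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
        h = _rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    # Tail: mix the remaining 1 to 3 bytes.
    tail = data[4 * nblocks:]
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
    # Finalization: fold in the length, then apply the fmix32 avalanche.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def line_bit_index(line: str, m: int = BITMAP_BITS) -> int:
    """Map one line of code to a bit position in an m-bit bitmap (assumed mapping)."""
    return murmur3_32(line.encode("utf-8")) % m
```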
Contributions and Future Work
• Contributions:
  • extracted an ideal Evolution Tree efficiently
  • analyzed the influence of various factors
  • identified the best configuration
  • faster and more accurate than the previous diff-based study
• Future Work:
  • larger datasets
  • other programming languages
  • solving the remaining problems