
Extraction of Evolution History from Software Source Code Using Linear Counting

Speaker: Liu Shuchang, Osaka University.


Presentation Transcript


  1. Extraction of Evolution History from Software Source Code Using Linear Counting. Speaker: Liu Shuchang, Osaka University

  2. Background. Diagram labels: daily software development; existing code; copy; edit; product variant; software evolution

  3. Evolution History Example. Figure labels: only source code; ❌

  4. Introduction
      • Evolution History Recovery
          • product variants
          • using only source code
      • Evolution Tree
          • vertex: variant
          • edge: derived relation (most similar pair)
          • key: product similarity
      • Previous Study
          • diff based (file-to-file similarity)
          • time needed (worst case: 2 days)
      • Linear Counting Algorithm
          • estimating instead of calculating

  5. Linear Counting Algorithm. An example of the Linear Counting Algorithm: Cardinality: 11; Bitmap Size: 8; Zero: 2; estimate: -8 × ln(2/8) ≈ 11.0903
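The estimate on this slide can be reproduced with a short sketch. The function names `estimate_from_zeros` and `linear_count` are hypothetical, and SHA-1 from the standard library stands in for whatever hash function the speaker used; this is an illustration of the general technique, not the speaker's implementation.

```python
import hashlib
import math


def estimate_from_zeros(zeros, bitmap_size):
    """Linear Counting estimate: -m * ln(zero_bits / m)."""
    return -bitmap_size * math.log(zeros / bitmap_size)


def linear_count(items, bitmap_size):
    """Estimate the number of distinct strings in `items`.

    Assumption: SHA-1 is used only as a convenient, deterministic hash.
    """
    bitmap = [False] * bitmap_size
    for item in items:
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        # Map the item to one bit position and mark it as seen.
        bitmap[int.from_bytes(digest[:8], "big") % bitmap_size] = True
    zeros = bitmap.count(False)
    if zeros == 0:
        raise ValueError("bitmap is full; use a larger bitmap_size")
    return estimate_from_zeros(zeros, bitmap_size)


# The slide's example: 8-bit bitmap with 2 zero bits left.
slide_estimate = estimate_from_zeros(2, 8)  # about 11.09
```

The estimate degrades as the bitmap fills up, and it is undefined once no zero bits remain, which is why the sketch raises an error in that case.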

  6. Estimate Product Similarity. Similarity: Jaccard Index, |A∩B| / |A∪B|. Flow: Multiset A and Multiset B → Initialization (hash function) → Bit Map A and Bit Map B → bitwise operator → Bit Map A∪B and Bit Map A∩B → LC(A∩B) divided by LC(A∪B)
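The flow on this slide can be sketched as follows. This is a minimal illustration under assumptions: the names `to_bitmap`, `lc`, and `estimated_jaccard` are hypothetical, SHA-1 stands in for the hash function, and bitwise OR and AND are assumed to be the "bitwise operator" that produces the union and intersection bitmaps. Repeated elements set the same bit, so the sketch effectively counts distinct elements.

```python
import hashlib
import math


def to_bitmap(items, size):
    """Hash each string to one bit of a `size`-bit bitmap (stored as an int)."""
    bits = 0
    for item in items:
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        bits |= 1 << (int.from_bytes(digest[:8], "big") % size)
    return bits


def lc(bits, size):
    """Linear Counting estimate for a bitmap: -size * ln(zero_bits / size)."""
    zeros = size - bin(bits).count("1")
    if zeros == 0:
        raise ValueError("bitmap is full; use a larger size")
    return -size * math.log(zeros / size)


def estimated_jaccard(a, b, size=1 << 16):
    """Estimate |A∩B| / |A∪B| as LC(A∩B) / LC(A∪B)."""
    bm_a, bm_b = to_bitmap(a, size), to_bitmap(b, size)
    union = lc(bm_a | bm_b, size)      # assumed: OR gives Bit Map A∪B
    if union == 0:
        return 1.0                     # both inputs empty (a convention chosen here)
    return lc(bm_a & bm_b, size) / union  # assumed: AND gives Bit Map A∩B
```

For example, `estimated_jaccard(lines_a, lines_b)` on two lists of source lines returns a value near the true Jaccard index as long as the bitmap is large relative to the number of distinct lines; elements of A and B that happen to collide on the same bit inflate the intersection slightly.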

  7. Process Flow. Variant A (Source Code) and Variant B (Source Code) → Initialization (1. n-gram modeling, or 2. each line of the code) → Initial Multiset A and Initial Multiset B → Linear Counting Algorithm → Jaccard Index, |A∩B| / |A∪B| → all pairs (A, B), (A, C), (A, D), … → Prim’s Algorithm (the most similar pair) → Evolution Tree
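The last step of this flow, building a tree from pairwise similarities with Prim's algorithm, might look like the sketch below. It assumes the tree is grown by repeatedly attaching the most similar outside variant (a maximum spanning tree); the function name `evolution_tree`, the `frozenset`-keyed similarity table, and the choice of the first variant as the starting vertex are all illustrative assumptions, not details from the talk.

```python
def evolution_tree(names, similarity):
    """Prim's algorithm on similarities (maximum spanning tree).

    names: list of variant names.
    similarity: dict mapping frozenset({a, b}) to a similarity score.
    Returns a list of (variant_in_tree, newly_attached_variant) edges.
    """
    if not names:
        return []
    in_tree = [names[0]]          # assumption: start from the first variant
    remaining = set(names[1:])
    edges = []
    while remaining:
        # Pick the most similar pair that links the tree to a new variant.
        best = max(
            ((u, v) for u in in_tree for v in sorted(remaining)),
            key=lambda edge: similarity[frozenset(edge)],
        )
        edges.append(best)
        in_tree.append(best[1])
        remaining.remove(best[1])
    return edges


# Toy similarity table for four hypothetical variants.
toy_similarity = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.2,
    frozenset(("A", "D")): 0.1,
    frozenset(("B", "C")): 0.8,
    frozenset(("B", "D")): 0.3,
    frozenset(("C", "D")): 0.7,
}
toy_edges = evolution_tree(["A", "B", "C", "D"], toy_similarity)
# toy_edges == [("A", "B"), ("B", "C"), ("C", "D")]
```

The edges are undirected here: similarity alone does not say which variant in a pair was derived from the other.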

  8. Research Data. Caption: a description of the datasets we dealt with

  9. Final Result of dataset5. Figure panels: the Evolution Tree we extracted (the Best Configuration); the existing actual evolution history

  10. Analysis on Bitmap Size. Caption: part of the experiment results of dataset5

  11. Best Configuration
      • Main Factors
          • N-gram Modeling: no (each line of code)
          • Bitmap Size: 128,000,000 bits
          • Hashing Function: MurmurHash3
      • Results
          • Proper Edges: 86.5% (on average)
          • Time: 10 s to 5 min

  12. Contributions and Future Work
      • Contributions:
          • extract an ideal Evolution Tree efficiently
          • influence of various factors
          • best configuration
          • faster, with better accuracy
      • Future Work
          • larger datasets
          • other programming languages
          • solve the remaining problems
