
Finding File Clones in FreeBSD Ports Collection

Yusuke Sasaki, Tetsuo Yamamoto, Yasuhiro Hayase, Katsuro Inoue. File clones are two or more files with the same content, ignoring comments and code indentation. They can occur inside a project or between different projects. Research about file clones is scarce.


Presentation Transcript


  1. Finding File Clones in FreeBSD Ports Collection
     Yusuke Sasaki, Tetsuo Yamamoto, Yasuhiro Hayase, Katsuro Inoue

  2. File Clones
     • Two or more files with the same content
     • Comments and code indentation are ignored
     • Can occur inside a project or between different projects
     • Research about file clones is scarce
     • Goal: gain new knowledge about file clones
     [Figure: the same file, int main() { printf("Hello msr!"); return 0; }, appearing in both Project A and Project B]

  3. FCFinder
     • Input: .c and .h files
     • Output: file-clone sets
     • Faster than other tools
     • Detection
        • Tokenization
        • MD5 hash calculation
        • Exact matching
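The three detection steps listed above (tokenization, MD5 hash calculation, exact matching) can be sketched roughly as follows. This is a minimal illustration, not FCFinder's actual implementation: the token pattern, the function names (tokenize, fingerprint, find_file_clones), and the in-memory dict input are all assumptions made for the example.

```python
import hashlib
import re
from collections import defaultdict

# Assumed, simplified C tokenizer: comments are matched so they can be
# dropped, string/char literals are kept whole, whitespace is skipped.
_TOKEN = re.compile(
    r"""
    (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<word>[A-Za-z_]\w*|\d[\w.]*)
  | (?P<punct>\S)
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize(source):
    """Return the token sequence of a C source text, without comments."""
    return [m.group() for m in _TOKEN.finditer(source)
            if m.lastgroup != "comment"]


def fingerprint(source):
    """MD5 digest of the token sequence (layout and comments ignored)."""
    joined = "\0".join(tokenize(source))
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def find_file_clones(files):
    """Group file paths whose token sequences hash to the same digest.

    `files` maps a path to its source text (hypothetical input format).
    Only groups with at least two files are returned.
    """
    groups = defaultdict(list)
    for path, source in files.items():
        groups[fingerprint(source)].append(path)
    return sorted(sorted(g) for g in groups.values() if len(g) >= 2)
```

Hashing the token sequence rather than the raw bytes is what lets two files that differ only in comments or indentation land in the same set, and comparing digests keeps the matching step a plain dictionary lookup.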

  4. Experiment
     • Target
        • Only .c and .h files in the FreeBSD Ports Collection
        • ~1.4M files
        • ~12 GB
        • 17.16 hours
     • We measured:
        • File size
        • Number of files in each project
        • Size of each file-clone set
        • Number of file clones in a project
     These values follow a power law.
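One common way to check whether a measured quantity follows a power law is to fit a straight line to its frequency distribution on log-log axes and report the coefficient of determination. The slides do not say how the fit was actually performed, so the sketch below is only an assumed procedure, and fit_power_law is a hypothetical helper name.

```python
import math
from collections import Counter


def fit_power_law(values):
    """Least-squares line through (log10 value, log10 frequency).

    Returns (slope, intercept, r_squared). All values must be positive,
    and at least two distinct values with differing frequencies are needed.
    """
    freq = Counter(values)
    if len(freq) < 2:
        raise ValueError("need at least two distinct values")
    xs = [math.log10(v) for v in freq]
    ys = [math.log10(c) for c in freq.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared
```

A slope that is negative with an R² close to 1 is consistent with a power law, although a straight line on log-log axes is not by itself conclusive evidence.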

  5. File-clone Set Size
     [Figure: distribution of file-clone set sizes; axis label "file clone set size"; R² = 0.8508]
     Annotations on the plot:
     • D: Left: used in PHP5 (650 sets); Right: used in PHP4 (500 sets); used in both PHP4 and PHP5: 419 sets
     • E: 120 file clones (Left: 61 file clones; Right: 59 file clones)

  6. File-clones per Project
     [Figure: distribution of file-clone sets per project; axis label "number of file clone sets"; R² = 0.8263]
     Annotation on the plot:
     • G: Left: PHP5 modules; Center: projects related to bin-utils; Right: PHP4 modules

  7. File-clones Between Projects (1/3)
     • Nodes represent projects
     • Edges show the number of file clones shared between two projects
     • Example: gcc41 and gfortran share 7691 file clones
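Edge weights for a graph like this could be derived from the file-clone sets roughly as shown below. The slides do not define exactly how clones between two projects are counted; this sketch assumes one count per clone set that contains files from both projects, and it assumes (hypothetically) that the first path component names the project.

```python
from collections import Counter
from itertools import combinations


def project_of(path):
    """Assumed convention: the first path component is the project name."""
    return path.split("/", 1)[0]


def shared_clone_counts(clone_sets):
    """Count, for each project pair, the clone sets they both appear in.

    `clone_sets` is an iterable of lists of file paths. Keys of the
    returned Counter are (project_a, project_b) tuples in sorted order.
    """
    edges = Counter()
    for clone_set in clone_sets:
        projects = sorted({project_of(p) for p in clone_set})
        # A set confined to one project yields no pairs, so no edge.
        for a, b in combinations(projects, 2):
            edges[(a, b)] += 1
    return edges
```

Clone sets that live entirely inside one project contribute no edges, which matches the graph's focus on clones between projects.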

  8. File-clones Between Projects (2/3)
     • Nodes represent projects
     • Edges show the number of file clones shared between two projects

  9. File-clones Between Projects (3/3)
     • Nodes represent projects
     • Edges show the number of file clones shared between two projects

  10. Conclusions & Future Work
     Conclusions
     • Measured several features of the FreeBSD Ports Collection
     • Found that the measured features follow a power law
     Future Work
     • Investigate logical coupling between projects
