A preliminary study by Dr. Kraft and his students on detecting code clones across programming languages, using the Mono C# and VB parsers as subjects. The approach tokenizes CodeDOM graphs and compares code files by the Levenshtein distance between the resulting token sequences. The slides also list the limitations of the approach.
Clone Research
• Detecting clones across multiple programming languages is on the cutting edge of research.
• Dr. Kraft and his students did a preliminary version of this for C# and VB.
• They compared the Mono C# parser (written in C#) to the Mono VB parser (written in VB).
• Publication: Nicholas A. Kraft, Brandon W. Bonds, Randy K. Smith: Cross-language Clone Detection. SEKE 2008: 54-59
Dr. Kraft's Approach
• Token sequences of CodeDOM graphs, compared with Levenshtein distance
• The Levenshtein distance between two sequences is the minimum number of edits (insertions, deletions, or substitutions) needed to transform one sequence into the other
• Performs comparisons of code files
• The CodeDOM tree is tokenized
• Comparisons are based on distances
• Result is the percentage of matching tokens in a sequence
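The distance-based comparison described above can be sketched in a few lines. This is a minimal illustration, not Dr. Kraft's code: the token names are hypothetical stand-ins for whatever a CodeDOM tokenizer would emit, and the way `similarity` turns a distance into a percentage (one minus the distance divided by the longer sequence length) is an assumption, since the slides do not give the exact formula.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of single-token insertions, deletions, or
    substitutions needed to turn sequence a into sequence b."""
    # prev[j] holds the distance between the first i-1 tokens of a
    # and the first j tokens of b.
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]  # turning i tokens into an empty sequence takes i deletions
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(
                prev[j] + 1,         # delete tok_a
                curr[j - 1] + 1,     # insert tok_b
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed normalization: 1.0 means identical token sequences."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical token streams for one C# file and one VB file.
csharp_tokens = ["method", "param", "assign", "binary_op", "return"]
vb_tokens = ["method", "param", "assign", "return"]

print(levenshtein(csharp_tokens, vb_tokens))  # one token must be deleted
print(f"{similarity(csharp_tokens, vb_tokens):.0%}")
```

Because both languages are reduced to the same token vocabulary before comparison, the distance function itself never needs to know which language a file came from.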
Porting Clone Analysis
• Porting of the analysis code is about 50% complete
• Dr. Kraft's code
• AST, CodeDOM, and tokenization are woven tightly together
Limitations
• Only does file-to-file comparisons
• Does not detect clones within the same source file
• Can only detect Type 1 and some Type 2 clones
• Not very efficient (brute force)
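To make the brute-force and file-to-file limitations concrete, here is a rough sketch of what such a comparison loop looks like. Everything in it is hypothetical: the file names, the token lists, the 0.75 threshold, and `token_overlap`, which is a simple positional stand-in for the Levenshtein-based measure so that the example stays self-contained.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Tokens = Sequence[str]


def token_overlap(a: Tokens, b: Tokens) -> float:
    """Stand-in similarity: fraction of positions holding the same token.
    This is NOT the Levenshtein-based measure, only a placeholder."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / longest


def brute_force_pairs(
    left: Dict[str, Tokens],
    right: Dict[str, Tokens],
    compare: Callable[[Tokens, Tokens], float] = token_overlap,
    threshold: float = 0.75,
) -> List[Tuple[str, str, float]]:
    """Compare every file in `left` with every file in `right`.
    Cost is len(left) * len(right) whole-file comparisons, and a file
    is never compared against itself or against parts of itself."""
    hits = []
    for (lname, ltoks), (rname, rtoks) in product(left.items(), right.items()):
        score = compare(ltoks, rtoks)
        if score >= threshold:
            hits.append((lname, rname, score))
    return hits


# Hypothetical token streams keyed by file name.
cs_files = {
    "Stack.cs": ["class", "method", "assign", "return"],
    "Queue.cs": ["class", "method", "loop", "assign", "return"],
}
vb_files = {
    "Stack.vb": ["class", "method", "assign", "return"],
    "Util.vb": ["module", "method", "call"],
}

for cs_name, vb_name, score in brute_force_pairs(cs_files, vb_files):
    print(cs_name, vb_name, score)
```

The number of comparisons grows with the product of the two file counts, and each comparison scores whole files, so a clone confined to one method, or one repeated inside a single file, cannot be reported by a loop of this shape.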