This presentation outlines the agenda and current tasks of the Clone Analysis Team, which aims to advance our source code parsing capabilities using the GOLD Parsing System. Key topics include a demo of the GOLD parser, the project layout, and team collaboration strategies. The team will focus on loading C#, JAVA, and C++ source code and translating it into CodeDOM, enhancing the grammars for these languages, and addressing the challenge of LALR-compliant parsing for C++. The ultimate goal is an efficient framework for code translation, clone detection, and visualization of detected clones.
Presentation 4: Cross Language Clone Analysis, Team 2, October 13, 2010
Agenda • Current Tasks • Spike – GOLD Parser • Demo • Project Layout • Team Collaboration • Path Forward
Our Team • Allen Tucker • Patricia Bradford • Greg Rodgers • Brian Bentley • Ashley Chafin
Current Tasks What we are tackling…
Current Tasks (Review) • Current tasks created for the first user story “Source Code Load & Translate”: • Load & parse C# source code. • Load & parse JAVA source code. • Load & parse C++ source code. • Translate the parsed C# source code to CodeDOM. • Translate the parsed JAVA source code to CodeDOM. • Translate the parsed C++ source code to CodeDOM. • Associate the CodeDOM to the original source code.
GOLD Parsing System Spike
Topics To Discuss • What is it? • How does it work? • What can we use it for? • How can we extend it?
What Is GOLD? • GOLD is a free parsing system that you can use to develop your own programming languages, scripting languages and interpreters. It strives to be a development tool that can be used with numerous programming languages and on multiple platforms. – www.devincook.com/goldparser
How It Works (Block Structure) • Grammar → Builder → Compiled Grammar Table (*.cgt) • Source Code + Compiled Grammar Table → Engine → Parsed Data
How It Works (Components) • Three Major Components: • Builder – Reads a source grammar and constructs a Compiled Grammar Table • Compiled Grammar Table – Stores the LALR and DFA parse tables • Engine – Performs the actual parsing
How It Works (Process) • Step 1 • Write the grammar for the language being implemented, using the GOLD Meta-Language: • Rules: Backus-Naur Form • Terminals: Regular Expressions • Character sets: Set Notation
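To illustrate how terminals defined as regular expressions turn source text into tokens, here is a minimal Python sketch. The terminal names and patterns are hypothetical, and Python's `re` module (which tries alternatives left to right) stands in for the DFA that the GOLD Builder would generate (which picks the longest match); the patterns are chosen so that the difference does not matter here.

```python
import re
from typing import List, NamedTuple

# Hypothetical terminal definitions for a toy expression language,
# written as regular expressions (as GOLD terminals are).
TERMINALS = [
    ("Whitespace", r"[ \t\r\n]+"),
    ("Number", r"[0-9]+"),
    ("Identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("Plus", r"\+"),
    ("Times", r"\*"),
    ("LParen", r"\("),
    ("RParen", r"\)"),
]

# One combined pattern with a named group per terminal. No two patterns
# start with the same character, so left-to-right alternation yields the
# same tokens a longest-match DFA would.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TERMINALS))


class Token(NamedTuple):
    kind: str
    text: str


def tokenize(source: str) -> List[Token]:
    """Split source text into tokens, skipping whitespace."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "Whitespace":
            tokens.append(Token(match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print([t.kind for t in tokenize("total + 42 * (x)")])
# ['Identifier', 'Plus', 'Number', 'Times', 'LParen', 'Identifier', 'RParen']
```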
How It Works (Process) • Step 2 • Analyze the grammar with the Builder • Construct the LALR and DFA parse tables, which are saved in a Compiled Grammar Table file
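Full LALR table construction is too involved for a slide, but one of its ingredients can be sketched briefly: computing FIRST sets, which LR(1)-style constructions use when working out lookahead tokens. This is a minimal Python sketch over a hypothetical toy grammar, not the GOLD Builder's own implementation; all symbol names are illustrative.

```python
from typing import Dict, List, Set

# Hypothetical toy grammar: each nonterminal maps to its alternatives;
# any symbol that is not a key is a terminal, and [] is an empty production.
GRAMMAR: Dict[str, List[List[str]]] = {
    "Expr": [["Expr", "Plus", "Term"], ["Term"]],
    "Term": [["Number"], ["LParen", "Expr", "RParen"]],
    "OptSign": [["Minus"], []],
    "Signed": [["OptSign", "Number"]],
}
EMPTY = "<empty>"


def first_sets(grammar: Dict[str, List[List[str]]]) -> Dict[str, Set[str]]:
    """Compute FIRST sets by iterating until nothing changes (a fixed point)."""
    first: Dict[str, Set[str]] = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for symbols in alternatives:
                size_before = len(first[nt])
                all_nullable = True
                for symbol in symbols:
                    if symbol not in grammar:  # a terminal starts the string
                        first[nt].add(symbol)
                        all_nullable = False
                        break
                    first[nt] |= first[symbol] - {EMPTY}
                    if EMPTY not in first[symbol]:  # symbol cannot vanish
                        all_nullable = False
                        break
                if all_nullable:  # every symbol can vanish, or there are none
                    first[nt].add(EMPTY)
                if len(first[nt]) != size_before:
                    changed = True
    return first


for nonterminal, tokens in first_sets(GRAMMAR).items():
    print(nonterminal, sorted(tokens))
```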
How It Works (Process) • Step 3 • Analyze the source text with the parser engine and construct a parse tree • The engine can be implemented in any number of programming languages
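Step 3 can be illustrated with a minimal table-driven shift-reduce loop in Python. This is a sketch under stated assumptions: the two-rule grammar and its action/goto tables are hypothetical and written by hand, whereas a GOLD engine would load its LALR tables from the .cgt file and obtain tokens from its DFA.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class Node:
    """Parse-tree node: the rule's left-hand side plus its children."""
    rule: str
    children: List[Union["Node", str]] = field(default_factory=list)


# Hypothetical two-rule grammar (numbered productions):
#   0: Expr ::= Expr Plus Number
#   1: Expr ::= Number
PRODUCTIONS: List[Tuple[str, int]] = [("Expr", 3), ("Expr", 1)]  # (lhs, rhs length)

# Hand-built LALR(1) tables for that grammar; "$" marks end of input.
ACTION: Dict[Tuple[int, str], Tuple[str, Optional[int]]] = {
    (0, "Number"): ("shift", 2),
    (1, "Plus"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "Plus"): ("reduce", 1),
    (2, "$"): ("reduce", 1),
    (3, "Number"): ("shift", 4),
    (4, "Plus"): ("reduce", 0),
    (4, "$"): ("reduce", 0),
}
GOTO: Dict[Tuple[int, str], int] = {(0, "Expr"): 1}


def parse(tokens: List[Tuple[str, str]]) -> Node:
    """Shift-reduce parse of (kind, text) tokens into a parse tree."""
    stream = tokens + [("$", "")]
    states: List[int] = [0]
    nodes: List[Union[Node, str]] = []
    pos = 0
    while True:
        kind, text = stream[pos]
        entry = ACTION.get((states[-1], kind))
        if entry is None:
            raise SyntaxError(f"unexpected {kind} {text!r} at token {pos}")
        action, target = entry
        if action == "shift":  # push the token and move to a new state
            states.append(target)
            nodes.append(text)
            pos += 1
        elif action == "reduce":  # pop the rule's right-hand side, push its node
            lhs, length = PRODUCTIONS[target]
            split = len(nodes) - length
            children = nodes[split:]
            del nodes[split:]
            del states[len(states) - length:]
            nodes.append(Node(lhs, children))
            states.append(GOTO[(states[-1], lhs)])
        else:  # accept: the single remaining node is the root
            return nodes[-1]


print(parse([("Number", "1"), ("Plus", "+"), ("Number", "2")]))
# Node(rule='Expr', children=[Node(rule='Expr', children=['1']), '+', '2'])
```

The loop itself knows nothing about the language; all grammar knowledge lives in the tables, which is why the engine can be reimplemented in many languages while reusing one compiled grammar.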
Usage within CloneDigger • Source Code → Engine (using the Compiled Grammar Table) → Parsed Data → CodeDOM Conversion → AST • CodeDOM Conversion • We need to write a routine that moves data from the parse tree into CodeDOM • The parse trees produced by the engine share a consistent data structure, but their shape depends on the rules defined in each grammar
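.NET's System.CodeDom namespace provides types such as `CodeTypeDeclaration` and `CodeMemberMethod`; the Python sketch below uses small hypothetical stand-ins (`ClassModel`, `MethodModel`) to show only the shape of the conversion routine. Because the tree shape depends on each grammar's rules, the converter dispatches on rule names through a per-language handler table. The rule names `ClassDecl` and `MethodDecl` and their child layouts are assumptions, not taken from the actual GOLD Java grammar.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class ParseNode:
    """Assumed shape of an engine parse-tree node: rule name plus children."""
    rule: str
    children: List[Union["ParseNode", str]] = field(default_factory=list)


# Hypothetical stand-ins for CodeDOM types such as CodeMemberMethod
# and CodeTypeDeclaration.
@dataclass
class MethodModel:
    name: str
    statement_count: int


@dataclass
class ClassModel:
    name: str
    methods: List[MethodModel] = field(default_factory=list)


def _class_decl(node: ParseNode) -> ClassModel:
    # Assumed rule shape: ClassDecl ::= 'class' Identifier ClassBody
    _keyword, name, body = node.children
    return ClassModel(name=name, methods=[convert(child) for child in body.children])


def _method_decl(node: ParseNode) -> MethodModel:
    # Assumed rule shape: MethodDecl ::= Identifier Block
    name, block = node.children
    return MethodModel(name=name, statement_count=len(block.children))


# Rule names differ between grammars, so each language gets its own table.
JAVA_HANDLERS: Dict[str, Callable[[ParseNode], object]] = {
    "ClassDecl": _class_decl,
    "MethodDecl": _method_decl,
}


def convert(node: ParseNode) -> object:
    """Dispatch on the node's rule name to build the common-model object."""
    handler = JAVA_HANDLERS.get(node.rule)
    if handler is None:
        raise NotImplementedError(f"no conversion defined for rule {node.rule!r}")
    return handler(node)


tree = ParseNode("ClassDecl", ["class", "Foo", ParseNode("ClassBody", [
    ParseNode("MethodDecl", ["bar", ParseNode("Block", ["x = 1;", "return x;"])]),
])])
print(convert(tree))
# ClassModel(name='Foo', methods=[MethodModel(name='bar', statement_count=2)])
```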
Task Understanding • Three Step Process • Step 1: Code Translation (Source Files → Translator → Common Model) • Step 2: Clone Detection (Common Model → Inspector → Detected Clones) • Step 3: Visualization (Detected Clones → Clone Visualization UI)
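The three steps can be wired together in a minimal Python sketch. Everything here is a hypothetical stand-in: the "common model" is just a token list per code fragment, the inspector groups fragments whose token sequences become identical once identifiers and numeric literals are replaced by placeholders (a simple technique for copies that differ only in names or literals), and the visualization is a plain-text report.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical keyword list kept as-is during normalization.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "void"}


def normalize(tokens: List[str]) -> Tuple[str, ...]:
    """Replace identifiers and numeric literals with placeholders."""
    normalized = []
    for tok in tokens:
        if tok in KEYWORDS:
            normalized.append(tok)
        elif tok.isidentifier():
            normalized.append("ID")
        elif tok.isdigit():
            normalized.append("NUM")
        else:
            normalized.append(tok)
    return tuple(normalized)


def detect_clones(model: Dict[str, List[str]]) -> List[List[str]]:
    """Step 2 (Inspector): group fragments with identical normalized tokens."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for fragment, tokens in model.items():
        groups[normalize(tokens)].append(fragment)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def report(clone_groups: List[List[str]]) -> str:
    """Step 3 (Visualization): a plain-text stand-in for the UI."""
    return "\n".join(
        f"Clone group {i}: {', '.join(names)}" for i, names in enumerate(clone_groups, 1)
    )


# Step 1 (Translation) is assumed done: fragments from different languages
# have already been tokenized into the common model.
common_model = {
    "Java:Calc.sum": ["return", "a", "+", "b", ";"],
    "CSharp:Calc.Add": ["return", "x", "+", "y", ";"],
    "Cpp:Calc::neg": ["return", "-", "x", ";"],
}
print(report(detect_clones(common_model)))
# Clone group 1: CSharp:Calc.Add, Java:Calc.sum
```

Keeping detection on the common model rather than on each language's parse tree is what lets one inspector find clones across C#, Java, and C++.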
Extension and Enhancements • Enhance the grammars: • Update Java • Update C# • Define C++ • Share the grammars with classmates who have similar interests • Share them with the greater community
Grammars • What is a grammar? • A set of rules for forming strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax. A grammar does not describe the meaning of the strings or what can be done with them in any context, only their form.
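As a small illustration that a grammar constrains only form, the Python sketch below recognizes the hypothetical grammar S ::= '(' S ')' S | (empty). It accepts any balanced string of parentheses and says nothing about what such a string means. Recursive descent is used only because it is short; GOLD itself parses with LALR tables.

```python
def parse_s(text: str, pos: int = 0) -> int:
    """Recognize S ::= '(' S ')' S | (empty), starting at pos.

    Returns the position just after the S that was matched.
    """
    if pos < len(text) and text[pos] == "(":
        pos = parse_s(text, pos + 1)  # inner S
        if pos >= len(text) or text[pos] != ")":
            raise SyntaxError(f"expected ')' at offset {pos}")
        return parse_s(text, pos + 1)  # trailing S
    return pos  # empty alternative


def is_valid(text: str) -> bool:
    """True if the whole string can be derived from S."""
    try:
        return parse_s(text) == len(text)
    except SyntaxError:
        return False


for sample in ["(()())", "", "(()", "())"]:
    print(repr(sample), is_valid(sample))
```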
GOLD Parser Grammars • GOLD Parser uses context-free grammars suitable for Look-Ahead LR (LALR) parsing. • LALR-compliant grammars that we already have: • C# • Java • Visual Basic .NET
C++ Grammar Issue • No LALR-compliant C++ grammar is currently available, largely because of the language's complexity: parts of C++ syntax are ambiguous unless the parser knows, for example, whether a name refers to a type. • Other C++ parsers exist, but they produce output in a different format from that of the languages we already parse with GOLD Parser grammars. • We are still searching for a C++ parsing solution.
GOLD Parser Conclusion • We plan to use the GOLD Parsing System. • Tasks we have to complete: • Update the JAVA grammar • Update the C# grammar • Research defining a C++ grammar • Create a CodeDOM conversion routine that moves data from the parse tree into CodeDOM
Demonstrations GOLD Parsing System
Project Layout Key Points, Architecture, & Unit Test
Key Architecture Points • Multilanguage support • Configurable for different platforms: • Stand-alone application • Plug-in • Backend service • Extensible
Architecture • User Interface • Communication Layer • Core: Clone Detection Algorithms, Code Model, API, Language Service Interface • Language Services: C# Service, Java Service, C++ Service
Core Unit • Code Model • Stores the code in a common format • Application Programming Interface • Used to embed clone detection in applications • Language Service Interface • Communication layer between the core and the specific language services
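One way to express the Language Service Interface is an abstract base class that each language plug-in implements, with the core choosing a service by file extension. This Python sketch is an assumption about the design, not the team's actual API: `LanguageService`, `to_code_model`, `CloneCore`, and `ToyJavaService` are hypothetical, and the code model is simplified to a token list.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class LanguageService(ABC):
    """Hypothetical contract between the core and one language service."""

    extensions: Tuple[str, ...] = ()

    @abstractmethod
    def to_code_model(self, source: str) -> List[str]:
        """Translate source text into the common code model (simplified here)."""


class CloneCore:
    """Core side of the interface: routes files to registered services."""

    def __init__(self) -> None:
        self._services: Dict[str, LanguageService] = {}

    def register(self, service: LanguageService) -> None:
        for ext in service.extensions:
            self._services[ext.lower()] = service

    def load(self, filename: str, source: str) -> List[str]:
        ext = filename.rsplit(".", 1)[-1].lower()
        service = self._services.get(ext)
        if service is None:
            raise ValueError(f"no language service registered for '.{ext}' files")
        return service.to_code_model(source)


class ToyJavaService(LanguageService):
    """Placeholder: a real service would run the parser and CodeDOM conversion."""

    extensions = ("java",)

    def to_code_model(self, source: str) -> List[str]:
        return source.split()


core = CloneCore()
core.register(ToyJavaService())
print(core.load("Calc.java", "class Calc { }"))  # ['class', 'Calc', '{', '}']
```

Under this design, adding a C++ service means registering one more subclass; the core and its clone detection algorithms stay unchanged, which matches the "Extensible" goal.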
Team Collaboration Team 2 & Team 4
Team Collaboration • Due to Team 4’s team size, we have taken responsibility for gathering and sharing grammars. • Both teams will… • Use the same grammars & engines • We will both share the same limitations as a result. • Ex: the JAVA grammar is based on JAVA 1.4, so we are limited to JAVA 1.4 • Test the same grammars & engines • We will have two test beds.
Team Collaboration • Method of collaboration: • Google Code project site: • http://code.google.com/p/uah-studio-2010-2011/ • Team 4 members have access to this site. • Meetings • Email • What does our Google Code project contain? • Source control for grammars & engines • Bugs/Issues • Team 4 will be able to log new bugs. • Documents/Artifacts
Path Forward Next Iteration & Schedule
Path Forward • Finalize Iteration 1 • Iteration 2 Planning/Elaboration