Architectural Reverse Engineering


Presentation Transcript


  1. Architectural Reverse Engineering Kristof De Vos

  2. Overview • Why? • Current state • Approaches • Tools • RevEngE • Dali • Bookshelves • Dowsing • Next Generation

  4. Why? • the number of legacy systems is growing • old and large • poorly structured and poorly understood (new staff) • a system's software architecture influences its support for quality attributes: modifiability, performance, security, … • documentation unavailable or outdated • architecture embodies the earliest, farthest-reaching design decisions

  5. Overview • Why? • Current state • Approaches • Tools • RevEngE • Dali • Bookshelves • Dowsing • Next Generation

  6. Current State of Reverse Engineering • Research has little effect on actual software reengineering practice • reasons: • costs and benefits within the software life cycle are unknown • no effective decision procedure has been developed • no validated reengineering process • diversity of languages, compilers, hardware, OS => no tools on the market

  7. Related Research Areas • Taxonomy: • IEEE Reengineering Taxonomy Project • Intermediate Representation: • common, portable, interoperable, … • a lot of research, but nothing commercial • Educational and Training Material • WWW

  8. Research Areas • Reengineering Task Descriptions • activities, inputs, outputs, costs, benefits • Standard Datasets • specific source code and test data => allows comparison • Case Studies • systematic collection • uniformly documented • Public Domain Tools

  9. Overview • Why? • Current state • Approaches • Tools • RevEngE • Dali • Bookshelves • Dowsing • Next Generation

  10. Approaches • Bottom-up • source code => system model • start with program knowledge • create higher abstractions • Top-down • map abstract domain concepts onto concrete system artifacts

  11. 2 distinct steps • Identify current components, capture dependencies • reconstruction of design and requirements specification (= domain model) • highly interactive • find correlation between design and code

  12. Difficulties • Trade-offs have an impact on: • understandability • difficulty • genericity / flexibility • as-designed vs. as-implemented • not "an architecture", but many • hidden in abstractions, compositions, …

  13. Architectural Recovery Framework

  14. Error-prone tools • Static extraction tools have been shown to be error prone • false negatives • fail to identify elements that are present in the source code • false positives • identify elements that do not actually appear • solution: combine results

  15. Overview • Why? • Current state • Approaches • Tools • RevEngE • Dali • Bookshelves • Dowsing • Next Generation

  16. RevEngE • Integrated toolkit for program understanding • Code localization • data flow analysis • pattern matching • system clustering • visualization

  17. Ariadne, ART and Rigi • A single tool is insufficient • Common repository: Telos • Global repository & data integration mechanism • adapted: changed way of retrieving: • complete objects • objects of a class • partial objects • related information

  18. ART • prototype textual redundancy analysis engine • source files => substrings (snips) • matching snips • pieces of code that are connected • rough => starting point for deeper analysis

  19. Ariadne • Set of pattern-matching, design recovery and program analysis engines • code localization: • pattern matching algorithm to discover known patterns • Clustering: • analyzing global data-flow and data artifacts which are shared • “common references analysis”

  20. Rigi • Prototype realization of PHSE = architecture for meta reverse engineering • primarily used as a visualization tool, but • rich scripting facilities • extract abstractions from code

  21. Rigi • Parsing subsystem • distributed, multi-user repository • interactive graph editor • reverse engineering methodology • measures for evaluating quality of structural abstraction • facilities to understand document structures • extension mechanism through a scripting language

  22. Fundamental issues • Code representation • structural representation • data flow and control flow • quality and complexity metrics • localization of algorithms • identification of ADTs and generic operations • multiple system analysis views

  23. More Specific (1) • Data-binding Analysis • Tuple <p,q,x> where variable x is defined by function p and used by function q • Common Reference Analysis • Tuple <p,q,x> where variable x is defined or used by functions p and q • Similarity Analysis • 5 software quality and complexity metrics are applied to identify similar code fragments
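
The two tuple definitions on this slide can be sketched as set operations over per-function define/use facts. This is a minimal illustration only, not the RevEngE or Ariadne implementation: the function names, the variable names, and the choices to require p != q and to report each unordered pair once in common reference analysis are all assumptions made for the example.

```python
from itertools import combinations, permutations

# Hypothetical define/use facts: function name -> set of global variables.
defines = {"init": {"count", "table"}, "insert": {"count"}, "report": set()}
uses = {"init": set(), "insert": {"table"}, "report": {"count", "table"}}


def data_bindings(defines, uses):
    """Return tuples (p, q, x): x is defined by p and used by q (p != q)."""
    functions = sorted(set(defines) | set(uses))
    return {
        (p, q, x)
        for p, q in permutations(functions, 2)
        for x in defines.get(p, set()) & uses.get(q, set())
    }


def common_references(defines, uses):
    """Return tuples (p, q, x): x is defined or used by both p and q.

    Each unordered pair of functions is reported once.
    """
    functions = sorted(set(defines) | set(uses))
    touched = {f: defines.get(f, set()) | uses.get(f, set()) for f in functions}
    return {
        (p, q, x)
        for p, q in combinations(functions, 2)
        for x in touched[p] & touched[q]
    }


bindings = data_bindings(defines, uses)
references = common_references(defines, uses)
```

Data bindings are directional (definer to user), while common references only record that two functions touch the same variable, which is why the second relation is usually the larger one.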

  24. More Specific (2) • Subsystem Analysis • input: Common reference & data-binding analysis • e.g. functions that define 2-10 variables and have 3-15 common references statistically show error-prone modules • Redundancy Analysis • check text that occurs multiple times • cut-and-paste analysis
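
The threshold example on this slide amounts to a filter over two per-function counts. The sketch below assumes the counts have already been produced by the data-binding and common reference analyses; the input data, the function name `flag_candidates`, and treating both ranges as inclusive are assumptions for illustration.

```python
def flag_candidates(defined_vars, common_refs, var_range=(2, 10), ref_range=(3, 15)):
    """Return functions whose counts fall inside both inclusive ranges.

    defined_vars: function name -> set of variables the function defines.
    common_refs:  function name -> number of common references it takes part in.
    """
    flagged = []
    for func in sorted(defined_vars):
        n_vars = len(defined_vars[func])
        n_refs = common_refs.get(func, 0)
        in_var_range = var_range[0] <= n_vars <= var_range[1]
        in_ref_range = ref_range[0] <= n_refs <= ref_range[1]
        if in_var_range and in_ref_range:
            flagged.append(func)
    return flagged


# Hypothetical counts for three functions.
defined_vars = {"init": {"a", "b", "c"}, "log": {"a"}, "update": {"a", "b"}}
common_refs = {"init": 5, "log": 4, "update": 1}
candidates = flag_candidates(defined_vars, common_refs)
```

The flagged functions are only candidates for closer inspection; the slide presents the ranges as a statistical observation, not as a rule.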

  25. Concrete Example: CLIPS • = C Language Integrated Production System • expert system developed at NASA • 60 files • 700 functions • 30,000 lines of code

  26. CLIPS: Data-Binding Analysis • Remember <p,q,x> • simplifications • removal of unconnected objects • decomposition into distinct connected objects • filtering singly connected objects • identification of key functions that have high coupling with the rest of the system • starting point for further analysis
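
The simplification steps listed on this slide can be read as operations on an undirected coupling graph built from the <p,q,x> tuples. The sketch below is one possible reading, not the procedure used in the CLIPS study: the function names and sample tuples are hypothetical, "singly connected" is interpreted as "exactly one neighbour", and "key function" as "highest number of neighbours".

```python
def build_graph(bindings, functions):
    """Undirected coupling graph: one edge between p and q per tuple <p,q,x>."""
    graph = {f: set() for f in functions}
    for p, q, _x in bindings:
        graph.setdefault(p, set()).add(q)
        graph.setdefault(q, set()).add(p)
    return graph


def drop_unconnected(graph):
    """Remove functions that have no neighbours at all."""
    return {f: nbrs for f, nbrs in graph.items() if nbrs}


def components(graph):
    """Decompose the graph into its connected components."""
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(graph[node] - comp)
        seen |= comp
        result.append(comp)
    return result


def drop_singly_connected(graph):
    """One filtering pass: remove functions with exactly one neighbour."""
    keep = {f for f, nbrs in graph.items() if len(nbrs) > 1}
    return {f: graph[f] & keep for f in keep}


def key_functions(graph, top=1):
    """Functions with the most neighbours, ties broken by name."""
    return sorted(graph, key=lambda f: (-len(graph[f]), f))[:top]


# Hypothetical data-binding tuples for seven functions.
functions = ["a", "b", "c", "d", "e", "f", "g"]
bindings = [("a", "b", "x"), ("a", "c", "y"), ("b", "c", "z"),
            ("a", "d", "w"), ("e", "f", "v")]
connected = drop_unconnected(build_graph(bindings, functions))
parts = components(connected)
core = drop_singly_connected(connected)
keys = key_functions(connected)
```

In this toy input, `g` is dropped as unconnected, the graph splits into two components, and `a` comes out as the most highly coupled function.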

  27. CLIPS: Common Reference Analysis • Remember <p,q,x> • define clusters of functions that have an interface to another cluster • clusters usually group functions that operate on similar concepts • again a starting point for further analysis

  28. CLIPS: Similarity Analysis • Detect instances of code cloning • functions with the same structure • functions with the same data flow • usually define similar algorithms • e.g. implementations of patterns

  29. CLIPS: Subsystem Analysis • Further investigation of clusters of ALL previous analyses • common interface: functions • define entry points of clusters • define central (important) components

  30. CLIPS: Redundancy Analysis • ART: find exact matches of 5 or more lines • again clusters as result • inspection with a text editor • can identify design patterns
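
Exact-match detection of this kind can be approximated by indexing every run of five consecutive lines and reporting the runs that occur more than once. This is a simplified stand-in, not ART's snip-matching algorithm; the function name, the whitespace normalisation, and the sample files are assumptions for the example.

```python
from collections import defaultdict


def repeated_windows(files, min_lines=5):
    """Map each run of `min_lines` consecutive lines to where it occurs.

    `files` maps a file name to its source text. Lines are stripped of
    surrounding whitespace and blank lines are skipped before comparison.
    Only runs that occur at least twice are returned; each location is a
    (file name, starting line number) pair.
    """
    index = defaultdict(list)
    for name, text in files.items():
        numbered = [(no, ln.strip()) for no, ln in enumerate(text.splitlines(), 1)]
        numbered = [(no, ln) for no, ln in numbered if ln]
        for i in range(len(numbered) - min_lines + 1):
            window = tuple(ln for _no, ln in numbered[i:i + min_lines])
            index[window].append((name, numbered[i][0]))
    return {w: locs for w, locs in index.items() if len(locs) > 1}


# Two hypothetical source files sharing a five-line fragment.
files = {
    "a.c": "x = 1;\ny = 2;\nz = 3;\nw = 4;\nv = 5;\nend_a();\n",
    "b.c": "start_b();\nx = 1;\ny = 2;\nz = 3;\nw = 4;\nv = 5;\n",
}
clones = repeated_windows(files)
```

Longer duplicated regions show up as several overlapping windows, so a practical tool would merge adjacent hits before presenting clusters for inspection.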

  31. Other Techniques • Values of Variables • = inter-procedural data flow • most if-statements switch on values • potential values are defined • Special Patterns • e.g. using files • pattern matching in almost every tool

  32. Other Techniques • Clustering • grouping features of a program • objects, functions, variables, ...

  33. Overview • Why? • Current state • Approaches • Tools • RevEngE • Dali • Bookshelves • Dowsing • Next Generation

  34. Dali • single source of information is not enough • Late Binding • architectural relations that are not bound at compile-time • polymorphism, function pointers, parameters, … • System Topology • allocation of software to processes (or processors) • typically not stated in a compiled source

  35. Dali: advantages • Different views contain complementary information • user can navigate between views • source extraction can be error prone => cross-checking • different tools can be plugged in (manipulation, analysis, presentation) • views stored in central repository

  36. Dali: How it works • View extraction • static • inheritance hierarchies, build dependencies, call graphs, variable accesses, … • extracted directly from source code • dynamic • process spawning, inter-process communication, run-time procedure invocation • there will be overlap • result: "Extracted Views"

  37. Dali: How it works • View Fusion • views are combined, defined and manipulated • pruning: unneeded parts are cut away • to define views that are more meaningful • eliminate errors by cross-checking • External Manipulation • openness • other tools can be integrated and used
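
View fusion with pruning and cross-checking can be illustrated with plain set operations on two extracted call relations. This is only a sketch of the idea and does not reflect how Dali stores or queries views; the function name `fuse_views`, the edge data, and the `ignore` set used for pruning are assumptions.

```python
def fuse_views(static_calls, dynamic_calls, ignore=frozenset()):
    """Combine a static and a dynamic call view.

    Each view is a set of (caller, callee) edges. Edges touching a name in
    `ignore` are pruned first. The result separates edges seen in both views
    from edges seen in only one, which are the ones worth cross-checking.
    """
    def prune(edges):
        return {(a, b) for a, b in edges if a not in ignore and b not in ignore}

    static, dynamic = prune(static_calls), prune(dynamic_calls)
    return {
        "fused": static | dynamic,
        "confirmed": static & dynamic,
        "static_only": static - dynamic,
        "dynamic_only": dynamic - static,
    }


# Hypothetical extracted views.
static_view = {("main", "parse"), ("main", "printf"), ("parse", "lookup")}
dynamic_view = {("main", "parse"), ("parse", "handler")}
result = fuse_views(static_view, dynamic_view, ignore={"printf"})
```

Edges that appear only in the dynamic view are typical of late binding (for example calls through function pointers), while edges that appear only in the static view may be extraction errors or simply code paths that were not exercised.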

  38. Dali: How it works • Analysis • conformance testing, pattern based complexity measurement, ...

  39. Dali

  40. Overview • Why? • Current state • Approaches • Tools • RevEngE • Dali • Bookshelves • Dowsing • Next Generation

  41. SWAG Software Bookshelves • swag.uwaterloo.ca • hierarchical decomposition into subsystems • web-based system • source code • documentation • tests, analyses • history, future plans • for new users, experienced users, managers

  42. BS: What? • Separate web page per subsystem • hierarchical structure: web pages/subsystems • each page contains: • landscape diagram • all contained items (files, subsystems) • supersystem • all links (inside and outside) • associated documentation

  43. BS: Building it • Extractors: • parse source files for facts • Rigi Standard Form (RSF) • Fact Manipulator • software tool GROK • uses database interface to manipulate facts • subdivide, compose, ...
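
Facts in RSF are commonly written as one whitespace-separated triple per line (relation, source, target), and a typical manipulation is to lift file-level relations to the subsystem level. The sketch below does this in plain Python rather than in GROK's own notation; the relation names `contain` and `call`, the file names, and the helper functions are assumptions for illustration.

```python
def parse_rsf(text):
    """Parse 'relation source target' triples into relation -> set of pairs.

    Lines that do not have exactly three fields are skipped.
    """
    facts = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        relation, source, target = parts
        facts.setdefault(relation, set()).add((source, target))
    return facts


def lift(relation, contain):
    """Lift pairs between items to pairs between their containers.

    `contain` holds (container, item) pairs. Pairs inside one container, or
    involving an item with no known container, are dropped.
    """
    parent = {item: container for container, item in contain}
    lifted = set()
    for a, b in relation:
        pa, pb = parent.get(a), parent.get(b)
        if pa is not None and pb is not None and pa != pb:
            lifted.add((pa, pb))
    return lifted


# Hypothetical extracted facts.
rsf_text = """\
contain parser lex.c
contain parser parse.c
contain eval eval.c
call parse.c lex.c
call eval.c parse.c
"""
facts = parse_rsf(rsf_text)
subsystem_calls = lift(facts["call"], facts["contain"])
```

The lifted relation is what a subsystem-level landscape diagram would draw: here a single edge from `eval` to `parser`.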

  44. BS: Building Tools • Diagram Layouter • the layouter reads facts • attaches position, size, color, … • Landscape Viewer • LS Viewer = Java program • displays the complete system/subsystem • layout of facts can be changed in LS Editor

  45. BS: Extraction Process