450 likes | 655 Vues
Bard. ASLs. An Application-Specific Language (ASL) is a programming language that supports: a particular class of applications in a highly idiosyncratic domain In order to understand a particular ASL, it is usually necessary to understand the domain
E N D
ASLs • An Application-Specific Language (ASL) is a programming language that supports: • a particular class of applications • in a highly idiosyncratic domain • In order to understand a particular ASL, it is usually necessary to understand the domain • An ASL can be extremely effective when used for its intended purpose
Bard • Bard is designed to perform pattern matching and manipulation of Abstract Syntax Trees (ASTs) in the reengineering domain. • In order to understand Bard, it is necessary to understand a little about ASTs and reengineering
Outline of the talk • This talk will cover: • What reengineering is all about • What an Abstract Syntax Tree (AST) is • What an "idiom" is • What the Idiom Tool does • How Bard and the Idiom Tool work together • The design decisions behind Bard
Legacy code • The legacy code problem: • About 80% of programmer time goes to maintenance • New software methodologies are the preferred solution (that is, do it over from scratch!) • BUT • 40 years of working systems can't be abandoned.
Software reengineering • Software reengineering is any activity that • improves one's understanding of software • improves the software documentation • improves the software itself • brings legacy code up to more current standards • aids in a new implementation of the software
Why reengineer? • We reengineer code for many reasons: • add functionality • ease maintenance • move to new platforms and/or new languages • facilitate reuse • improve performance • decrease cost of keeping obsolete systems • combat software rot
The reengineering process • Reverse engineering moves from "software assets" (code, documentation, etc.) to a higher level of abstraction • Transformations modify the system at some level of abstraction • Forward engineering moves from a higher level of abstraction back to code
Reengineering levels • Reengineering can be carried out at many levels:
Idiom tool • Idiom Tool is the part of the reengineering suite that performs transformations on the AST
Creating the Abstract Syntax Tree • Parsers convert programs written in various languages into their AST representations • This AST can be manipulated by Idiom Tool, or used as input for various other tools (to create flowcharts, data flow diagrams, concordances, etc.) • Idiom Tool is a programmable subsystem • The programming language used is Bard
Example of an AST (simplified) • In C: if (n < 0) return -n;
Gray • The ASTs can expressed in a language called Gray • Gray is itself an ASL (Application-Specific Language) • Since ASTs are trees, Gray borrows the syntax (but not the operators) of LISP
Example Gray code • (If_Op (Less Identifier:n Number_Literal:0) (Return_Value (Negate Identifier:n)) Epsilon) )
Idioms • An idiom is a conventional way of expressing an idea • Programming languages, like natural languages, have idioms • Examples: • for (i = 0; i < n; i++) A[i] = 0; • temp = x; x = y; y = temp;
Idioms change • Idioms vary over time and across languages • Arrays may be zero-based or one-based • FORTRAN used parallel arrays to simulate structs • Languages may use array indices to simulate pointers • C lacks multiply-dimensioned arrays • No early languages were object-oriented • COBOL typically represented years as two digits
Recap • To summarize, • ReEngineer is a tool that uses • parsers to translate programs in various languages into • Abstract Syntax Trees, which can be represented in a linear fashion by • Gray code, and which can be manipulated by • Idiom Tool, which uses programs written in • the Bard language.
Regular expressions • Recognizing idioms in a program is a matter of recognizing certain kinds of patterns • Regular expressions (regexps) are a standard, well-developed mechanism for doing pattern recognition on strings • Regular expressions are used by grep, Perl, Tcl, awk, Python, vi, emacs, and a number of other languages and tools
Regular expressions in Bard • Regular expressions in Bard are modelled after regular expressions in grep • Because we are doing pattern matching on ASTs rather than strings, • Literals are represented by Gray expressions, • Regular expression syntax must be modified to accomodate arbitrary Gray expressions, and • Regular expressions may occur within Gray expressions, and vice versa
Inner and outer languages • Regular expressions are not enough -- we need some way to control pattern matching • We also need to manipulate the parts of the AST found by pattern matching • Solution: wrap a more-or-less conventional "outer language" around the "inner language" of regular expressions • This is also the approach taken in Perl
Conventional statements in Bard • Conventional statements in Bard include: • assignment statements • if statements • while loops • print statements • function and procedure calls • calls to native (Ada) code • tracing and debugging facilities
Special-purpose statements in Bard • Examining and manipulating the AST: • match pattern [at position] • insert subtree as relative-position • for example, insert identifier:x as first child; • delete position • replace position with subtree • find pattern [up to position] • go to position
Efficiency • ReEngineer has been used to process hundreds of thousands of lines of code • Pattern-matching languages are not efficient enough for this purpose • Bard procedures are inefficient • Bard procedures are called only when there is a high likelihood that pattern matching will succeed
Fast pre-screening of nodes • Each Bard routine specifies the types of nodes at which it might be applicable • Idiom Tool walks the AST and calls Bard when it finds such a node • Since Idiom Tool is written in Ada, it can screen nodes very, very fast
The structure of Bard procedures • A "procedure" in Bard, like a "rule" in an expert system, consists of two parts: • The test part determines whether this particular procedure is applicable at this point in the tree • Tests must not have have side effects • The action part consists of arbitrary Bard code • Actions may modify the AST and/or may collect and store information about the AST
Skeleton of a Bard procedure • procedure name (pass, node_types) tests; -- may not have side effects commit; -- separates tests from actions actions; -- manipulate the ASTend procedure;
Bard example idiom remove_null_statements; -- Remove null statements, except empty loop bodies. procedure remove_null_statements (1, Null_Statement); go to first child of parent; not match While_Op; not match Do_While; not match C_For_Loop; commit; go to @trigger; delete; end procedure; end idiom;
Diffuse patterns • A pattern is compact if it is a single recognizable unit • A pattern is diffuse if it consists of multiple pieces scattered throughout the AST • Diffuse patterns may overlap in complex ways • The problem is to collect the various pieces • This is not easily solved with only simple data structures such as arrays
Dealing with diffuse patterns • Idiom Tool makes multiple passes over the data base • Each Bard procedure indicates the pass for which it is active • This allows some procedures to collect information and other procedures to process the information. • Idiom tool uses a fact base to store information between passes
Modifying the fact base • Bard's fact base is modeled after that of Prolog • Bard allows negative assertions as well as positive assertions • assert fact; • deny fact; • retract fact;
Interrogating the fact base • Because Bard does not used the Closed World Assumption, facts may be "true", "false", or (unlike Prolog) "unknown" • true fact; • false fact; • known fact; • unknown fact;
Multiple occurrences • It is sometimes necessary to recognize multiple occurrances of the same subtree. • For example, to recognize Pascal statements of the form x:=x+1, you have to realize that the identifier x is repeated • (Assignment Identifier:?id (Integer_Addition Identifier:?id Numeric_Literal:1))
Unification in Bard • When pattern matching variable ?id against value v, • If ?id has no prior value, it is unified with v • If id? has a prior value, then the match succeeds if and only if id? == v • If a test succeeds, all unifications performed during the test are retained • If a test fails, all unifications performed during the test are discarded
Summary I • Idiom tool does multiple preorder traversals of the AST; at each node it may trigger one or more Bard procedures • A Bard procedure consists of • a header that specifies appropriate node types • a test part that performs any additional tests • the keyword commit, and • an action part
Summary II • Special features of Bard include • an "inner language" for pattern matching • procedures modeled after rules in expert systems • powerful tests that can comprise arbitrary actions • unification and unification unwinding • an integrated fact base • These features make Bard far better for its purpose than a general language such as Ada
Summary III • Bard has been used • in a Pascal to C translation system • to assist in converting CMS-2 to C • in a prototype Y2K system
Current status of Bard • Currently owned by either Lockheed or Unisys • No longer under active development • Lost and presumed dead as a result of corporate mergers and reorganizations