
The Pivot: Static Analysis of C++ Applications

The Pivot: Static Analysis of C++ Applications. Bjarne Stroustrup Texas A&M University http://www.research.att.com/~bs. Overview. Static analysis of C++ What would be useful Why it is hard C++0x The Pivot Context Aims Organization Basic representations





Presentation Transcript


  1. The Pivot: Static Analysis of C++ Applications • Bjarne Stroustrup • Texas A&M University • http://www.research.att.com/~bs

  2. Overview • Static analysis of C++ • What would be useful • Why it is hard • C++0x • The Pivot • Context • Aims • Organization • Basic representations • High-level program representation for HPC • Concept-based checking and transformation

  3. What would be useful? • Direct representation of high-level ideas in code • E.g. no side effects, idempotent operation, always gives the same answer for the same element, no security violation, no memory leak, no race condition, no deadlock, being sorted, being band-diagonal, parallel application … • Use of such direct representation • For providing guarantees • For information • For optimization • For program transformation

  4. It’s hard • C++ is • Large • Extremely flexible and general • Quite irregular • Burdened with its type-unsafe C subset • High-level ideas tend to be represented as templated classes and functions • Generic programming, template meta-programming, generative programming • We have little experience with tools representing and manipulating templates • Such templates tend to be provided as part of domain-specific libraries

  5. Bell Labs proverbs • Library design is language design • Language design is library design But the devil is in the details

  6. C++0x • 1998: ISO C++ standard • 2009 (estimated): next ISO C++ standard (C++0x) • Better libraries and better support for library building • Hash maps, regular expressions, file system, … • Threads and memory model • Concepts • A type system for types, integers, and operations • auto, template aliases, general initializer lists, …

  7. Concept: trivial example

      // Caveat: likely C++0x
      template<ForwardIterator Iter, ValueType Val>
          where Assignable<Iter::value_type,Val>
      Iter find(Iter first, Iter last, Val v);

      // Container overload: takes the container itself, not an iterator pair
      template<Container Cont, ValueType Val>
          where Assignable<Cont::value_type,Val>
      Cont::iterator find(Cont& c, Val v);

      vector<int> v = { 2, 3, 5, 8, 13, 21, 34 };
      auto p1 = find(v.begin(), v.end(), 42);
      auto p2 = find(v,42.3);
      auto p3 = find(7,42); // error: 7 is not a Container

  8. Concepts • Can express many high-level abstractions • A type system for sets of types, integers, and operations • We have experimental implementations of concepts • A concept is a handle to which we can attach • some “standard semantics” within the language • essentially arbitrary semantics outside the language using tools • Until we get concepts, we can “fake them” with static analysis and transformation tools
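As a rough illustration of "faking" a concept in pre-concept C++, the sketch below uses the detection idiom to restrict a container overload of a find-like function. It is not the mechanism the Pivot uses (that relies on external analysis tools); the names `is_container` and `find_value` are hypothetical, and the code assumes a C++17 compiler.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical trait standing in for a "Container" concept:
// true when T has begin() and end() member functions.
template<class T, class = void>
struct is_container : std::false_type {};

template<class T>
struct is_container<T, std::void_t<decltype(std::declval<T&>().begin()),
                                   decltype(std::declval<T&>().end())>>
    : std::true_type {};

// Iterator-pair version: linear search over [first, last).
template<class Iter, class Val>
Iter find_value(Iter first, Iter last, const Val& v) {
    while (first != last && !(*first == v)) ++first;
    return first;
}

// Container version: removed from overload resolution unless
// is_container<Cont> holds, which approximates a concept check.
template<class Cont, class Val,
         class = std::enable_if_t<is_container<Cont>::value>>
auto find_value(Cont& c, const Val& v) -> decltype(c.begin()) {
    return find_value(c.begin(), c.end(), v);
}
```

With a `std::vector<int> v`, both `find_value(v.begin(), v.end(), 42)` and `find_value(v, 8)` compile, while `find_value(7, 42)` is rejected at compile time because `int` is not a container. The diagnostic is much less direct than a real concept error, which is part of the motivation for language support.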

  9. Context for the Pivot • Semantically Enhanced Library (Language) (SELL) • Enhanced notation through libraries • Restrict semantics through tools • And take advantage of that semantics [Diagram labels: Domain Specific Library, C++, Semantic Restrictions]

  10. Context for the Pivot • Provide the advantages of specialized languages • Without introducing new “special purpose” languages • Without supporting special-purpose language tool chains • Avoiding the 99.?% language death rate • Provide general support for the SELL idea • Not just a specialized tool per application/library • The Pivot fits here [Diagram labels: Domain Specific Library, C++, Semantic Restrictions]

  11. Example SELL: Safe C++ • Add • Range-checked std::vector • Iterators • Resource handles • Any (if needed) (a type-safe union type) • Subtract • Arrays • Pointers • New/delete • Unions • Excessively complex/obscure code • Uses of undefined constructs not caught by compilers (e.g. a[++i] = i) • Transforms • Pointers into iterators and resource handles (if porting) • New/delete into resource handle uses
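The "add" half of such a SELL can be sketched as an ordinary library. Below is a minimal range-checked vector, assuming a thin wrapper over `std::vector` is acceptable; the name `checked_vector` is hypothetical and is not taken from the talk. The "subtract" half (banning raw arrays, pointers, and new/delete) is what would need a tool such as the Pivot.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <stdexcept>
#include <vector>

// Hypothetical range-checked vector: operator[] checks its index
// and throws std::out_of_range instead of invoking undefined behaviour.
template<class T>
class checked_vector {
public:
    checked_vector() = default;
    checked_vector(std::initializer_list<T> init) : data_(init) {}

    // std::vector::at performs the bounds check for us.
    T& operator[](std::size_t i) { return data_.at(i); }
    const T& operator[](std::size_t i) const { return data_.at(i); }

    std::size_t size() const { return data_.size(); }
    void push_back(const T& x) { data_.push_back(x); }

private:
    std::vector<T> data_;  // the resource is owned by the handle, no new/delete in user code
};
```

Given `checked_vector<int> v{1, 2, 3}`, `v[1]` yields 2 and `v[3]` throws `std::out_of_range`.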

  12. Example SELL: STAPL • Wait for Lawrence’s talk

  13. Aims • To allow fully general analysis of C++ source code • “What a human can do” • Foci • Templates (e.g. specialization) • C++0x features (e.g. concepts, generalized initializers) • Distributed programming • Embedded systems • Limitation: we work after macro expansion • To allow transformation of C++ code • i.e. production of new code from old source • Non-aim: handling other languages • e.g. Fortran, Java • but C and C++ dialects are relatively easy

  14. Related work • Lots • 20+ tools for analyzing C++ • But • Most are specialized • E.g. alias analysis, flow analysis, numeric optimizations • Most are attached to a single compiler/parser • None handles all of C++ • E.g. C + classes, C++ but not standard libraries • (that requires full handling of templates) • Hardly two tools handle the same subset • None handles the key C++0x features (e.g. concepts) • Some are proprietary • No serious interoperability

  15. The Pivot [Diagram; labels: C++ source, Compiler (×3), IPR, XPR, IDL, Object code, Tool 1, Tool 2, Tool 3, Tool 4, Specialized representation (e.g. flow graph), “information”]

  16. Why? The Original Project • Communication with remote mobile device • Calling interface • CORBA, DCOM, Java RMI, …, homebrew interface • Transport • TCP/IP, XML, …, homebrew protocol • Big, Ugly, Slow, Proprietary, … • Why can’t I just write ISO Standard C++?

  17. The Original Project • Distributed programs in ISO C++ • “as similar as possible to non-distributed programming, but no more similar”

      // use local object:
      X x;
      A a;
      std::string s("abc");
      // …
      x.f(a, s);              // a function call

      // use remote object:
      proxy<X> x;             // remote at “my host”
      x.connect("my_host");
      A a;
      std::string s("abc");
      // …
      x.f(a, s);              // a message send
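To make the local/remote contrast concrete, here is a sketch of the kind of `proxy<X>` specialization a tool might generate from the declaration of `X`. Everything in it is an assumption made for illustration: the members of `X` and `A`, the message format, and the `last_message()` accessor, which stands in for a real transport layer.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical user types; the talk does not define X or A.
struct A { int id; };

struct X {
    int calls = 0;
    void f(const A&, const std::string&) { ++calls; }  // ordinary local call
};

// Primary template is only declared: a tool would generate one
// specialization per class used remotely.
template<class T> class proxy;

template<>
class proxy<X> {
public:
    void connect(const std::string& host) { host_ = host; }

    // Same call syntax as X::f, but the call is turned into a message.
    void f(const A& a, const std::string& s) {
        std::ostringstream msg;
        msg << "host=" << host_ << ";call=X::f;a.id=" << a.id << ";s=" << s;
        last_message_ = msg.str();  // stand-in for sending over a transport
    }

    const std::string& last_message() const { return last_message_; }

private:
    std::string host_;
    std::string last_message_;
};
```

The point of the sketch is that user code keeps the form `x.f(a, s)` in both cases; only the declaration of `x` and the `connect` call differ. Generating the specialization requires knowing the full signature of every member of `X`, which is the kind of information a complete program representation provides.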

  18. IPR high-level principles • Complete: Direct representation of C++ • Built-in types, classes, templates, expressions, statements, translation units … • Can represent erroneous and incomplete C++ programs • Regular • The structure contains all of C++ but doesn’t mimic irregularities • Programming effort proportional to complexity of task • IPR is not just a data structure • Extensible • Node types • Information associated with a node • Operations • No integration with compilers

  19. IPR design choices • Type safe • IPR (not its users) handles memory management • Minimal (run-time and space) • Minimal number of nodes (unification) • Minimal number of checked indirections (usually, virtual function calls) • Expression-based regular superset of C++ • E.g. statements, declarations are expressions too • C++0x features (most important: concepts – types have types) • Interfaces: • Purely functional, abstract classes, for most users • No mutation operation on abstract classes • Users don't get pointers directly • Mutating (operates on concrete classes) • Users get to use pointers for in-place transformation • Traversals (and queries) • Several, most not in “the Pivot core”
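The split between a purely functional abstract interface and concrete implementation classes can be sketched as follows. This is a toy with two node kinds, not the IPR class hierarchy; the names `Node`, `Literal`, `Add`, `Visitor`, `impl`, and `Printer` are all hypothetical.

```cpp
#include <cassert>
#include <string>

struct Literal;
struct Add;

// One virtual call per node visited: the "checked indirection".
struct Visitor {
    virtual void visit(const Literal&) = 0;
    virtual void visit(const Add&) = 0;
    virtual ~Visitor() = default;
};

// Abstract, read-only interface: const member functions only,
// and children are handed out as references, not pointers.
struct Node {
    virtual void accept(Visitor&) const = 0;
    virtual ~Node() = default;
};

struct Literal : Node {
    virtual int value() const = 0;
    void accept(Visitor& v) const override { v.visit(*this); }
};

struct Add : Node {
    virtual const Node& left() const = 0;
    virtual const Node& right() const = 0;
    void accept(Visitor& v) const override { v.visit(*this); }
};

// Concrete classes: code that builds or transforms a program works here.
namespace impl {
struct Literal : ::Literal {
    int v;
    explicit Literal(int x) : v(x) {}
    int value() const override { return v; }
};
struct Add : ::Add {
    const Node& l;
    const Node& r;
    Add(const Node& a, const Node& b) : l(a), r(b) {}
    const Node& left() const override { return l; }
    const Node& right() const override { return r; }
};
}  // namespace impl

// A traversal written purely against the abstract interface.
struct Printer : Visitor {
    std::string out;
    void visit(const Literal& n) override { out += std::to_string(n.value()); }
    void visit(const Add& n) override {
        out += "(";
        n.left().accept(*this);
        out += " + ";
        n.right().accept(*this);
        out += ")";
    }
};
```

Building `impl::Add sum(one, two)` from two `impl::Literal` nodes and running a `Printer` over it produces `(1 + 2)`. The printer never sees the concrete classes, so it cannot mutate the tree.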

  20. IPR is minimal • Necessary for dealing with real-world code • Multi-million line programs are not uncommon • Given the constraint of completeness • C++ is complex • especially when we use the advanced template features essential for high-performance work • Unified representation • E.g., there is only one int and only one 1 • Type comparison becomes pointer comparison • Indirections are minimized • An indirection (only) when there is a choice of different types of information
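Unification can be illustrated with a small interning table: each distinct type is created once, so comparing two types reduces to comparing two pointers. This is a sketch of the general technique (often called hash-consing), not IPR's implementation; `Type` and `TypeTable` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical miniature type node.
struct Type {
    std::string name;     // e.g. "int"; empty for pointer types
    const Type* pointee;  // non-null only for pointer types
};

class TypeTable {
    using Key = std::pair<std::string, const Type*>;

    // Orders keys by name, then by pointee address (std::less gives
    // a total order over pointers).
    struct KeyLess {
        bool operator()(const Key& a, const Key& b) const {
            if (a.first != b.first) return a.first < b.first;
            return std::less<const Type*>()(a.second, b.second);
        }
    };

public:
    const Type* builtin(const std::string& name) {
        return intern(Key(name, nullptr));
    }
    const Type* pointer_to(const Type* t) {
        return intern(Key(std::string(), t));
    }

private:
    // Returns the unique node for this key, creating it on first use.
    const Type* intern(const Key& key) {
        std::unique_ptr<Type>& slot = nodes_[key];
        if (!slot) slot.reset(new Type{key.first, key.second});
        return slot.get();
    }

    std::map<Key, std::unique_ptr<Type>, KeyLess> nodes_;
};
```

Asking the table for `int` twice returns the same pointer, and so does asking for pointer-to-`int` twice. Structural equality of types never has to be recomputed, which matters when a program contains millions of type uses.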

  21. Original idea (XTI) • Too large, too slow

  22. Current hierarchy (IPR) • Compact • minimal call overhead

  23. IPR – Example 1 void foo(float b = 2.4)

  24. IPR – Example 2

  25. XPR (eXternal Program Representation) • Can be thought of as a specialized portable object database • Easy/fast to parse • Easy/fast to write • Compact • About as compact as C++ source code • Robust • Read/write without using a symbol table • LR(1), strictly prefix declaration syntax • Human readable • Human writeable • Can represent almost all of C++ directly • No preprocessor directives • No multiple declarators in a declaration • No <, >, >>, or << in template arguments, except in parentheses

  26. XPR

      i : int                        // int i;
      C : class {                    // class C {
          m : const int              //     const int m;
          mm : *const int            //     const int* mm;
          f : (:int,:*char) double   //     double f(int,char*);
          f : (z:complex) C          //     C f(complex z);
      }                              // };
      vector : <T> class {           // template<class T> class vector {
          p : *T                     //     T* p;
          sz : int                   //     int sz;
      }                              // };
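The strictly prefix declaration syntax makes XPR easy to generate by a simple recursive walk. The sketch below prints XPR-style declarations for a toy type tree covering only named types, pointers, and `const`; the `Ty` structure and both functions are hypothetical, and the output format is inferred from the examples on the slide rather than from an XPR specification.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature type tree; real IPR/XPR has many more node kinds.
struct Ty {
    enum Kind { Named, Pointer, Const } kind;
    std::string name;  // used when kind == Named
    const Ty* inner;   // used when kind == Pointer or Const
};

// Prefix notation: each type constructor is written before its operand,
// so no backtracking or symbol table is needed to read the result.
std::string to_xpr(const Ty& t) {
    switch (t.kind) {
    case Ty::Named:   return t.name;
    case Ty::Pointer: return "*" + to_xpr(*t.inner);
    case Ty::Const:   return "const " + to_xpr(*t.inner);
    }
    return "";
}

// name : type, as in the XPR examples above.
std::string xpr_declaration(const std::string& name, const Ty& t) {
    return name + " : " + to_xpr(t);
}
```

For a pointer to `const int` named `mm`, `xpr_declaration` returns `mm : *const int`, matching the slide's rendering of `const int* mm;`.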

  27. Extremely simple SELL example

      template <Parallelizable T>
      void f(const T& v)
      {
          double d = v[2];     // OK
          double* p = &v[2];   // not OK
      }

  28. Current and future work • Complete infrastructure • Complete EDG and GCC interfaces • Represent headers (modularity) directly • Complete type representation in XPR • Initial applications • Style analysis • including type safety and security • Analysis and transformation of STAPL programs • Build alliances

  29. References
      [GJS+06] Gregor, Douglas; Järvi, Jaakko; Siek, Jeremy; Lumsdaine, Andrew; Dos Reis, Gabriel; Stroustrup, Bjarne: Concepts: Linguistic Support for Generic Programming in C++. To appear, OOPSLA '06.
      [DRS05] Stroustrup, Bjarne; Dos Reis, Gabriel: A concept design. C++ Committee, paper N1782. April 2005.
      [SDR05] Stroustrup, Bjarne; Dos Reis, Gabriel: Supporting SELL for High Performance Computing. LCPC '05.
      [Str05] Stroustrup, Bjarne: A rationale for semantically enhanced libraries. LCSD '05.
