Inferring the type systems under data processing programs

hosea
Presentation Transcript


  1. Inferring the type systems under data processing programs

  2. Motivation • Data processing programs • Retrieving runtime system status, recorded information, … • On specific APIs • Type systems (structure?) of data sources are necessary for inspecting and developing programs • What kinds of data, relations, how to invoke the API • Not easy to establish the type systems • Generic APIs do not reflect the data types • Sufficient and accurate documents are not always available • Reading source code is not always practical

  3. This work • “Systematically inferring the type systems of data sources, through static analysis of data processing programs” • For inspection: detecting problems related to data usage • For programming: sample code snippets for retrieving a specific type of data • Basic idea • Recover the entire data flow of the program • Clarify the different ways in which the API is invoked to retrieve data • Challenges • Large scale and complex structure of source code • Complex data retrieval logic

  4. Data processing programs • A simple example • Retrieving memory information of JEE server • Through JMX API
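The kind of program slide 4 refers to can be illustrated with a minimal sketch. It assumes the local JVM's platform MBean server rather than a remote JEE server, and the class and method names (`MemoryInfoSketch`, `usedHeapBytes`, `isVerbose`) are hypothetical, not taken from the presentation. The point is that `getAttribute` returns a plain `Object`, so the generic API itself says nothing about the types of the data behind it.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

// Hypothetical example class; assumes the standard java.lang:type=Memory MBean.
class MemoryInfoSketch {
    // Reads the used heap size in bytes through the generic JMX API.
    static long usedHeapBytes(MBeanServerConnection conn) throws Exception {
        ObjectName memory = new ObjectName("java.lang:type=Memory");
        // The declared return type is Object; the caller must already know
        // that this attribute is a CompositeData with a "used" field.
        Object usage = conn.getAttribute(memory, "HeapMemoryUsage");
        CompositeData data = (CompositeData) usage;
        return (Long) data.get("used");
    }

    // Reads the Boolean "Verbose" attribute of the same MBean.
    static boolean isVerbose(MBeanServerConnection conn) throws Exception {
        ObjectName memory = new ObjectName("java.lang:type=Memory");
        return (Boolean) conn.getAttribute(memory, "Verbose");
    }

    public static void main(String[] args) throws Exception {
        // Assumption: use the local platform MBean server instead of a remote connection.
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        System.out.println("Verbose: " + isVerbose(conn));
        System.out.println("Used heap (bytes): " + usedHeapBytes(conn));
    }
}
```

Both methods call the same instruction, `getAttribute`, yet retrieve different kinds of data depending on the constant attribute name, which is the situation the inference has to untangle.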

  5. Type system under this program

  6. Inferring the type system • See what data the program gets, and what other data it uses when getting them • [Figure labels: getAttribute_Verbose, Memory, Verbose]

  7. Challenging for practical programs • Complex data flow • One instruction to retrieve different kinds of data

  8. Approach Overview

  9. Source Code Analysis • Purpose: recover the data flow from source code • Source code abstraction • Object- and call-site-sensitive points-to analysis • About points-to analysis • A heap H storing all objects allocated in the source code • A points-to mapping Pt showing what objects a variable may point to • Extensions to typical points-to analysis • Tracing the API invocation results: newly obtained objects • Dependence on constant values: pre-calculation on constants
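The roles of the heap H and the mapping Pt can be sketched with a toy analysis. This is a deliberately simplified, flow- and context-insensitive (Andersen-style) fixpoint over two statement forms, not the object- and call-site-sensitive analysis the slide describes; the class `PointsToSketch` and its statement encoding are hypothetical. One assumption worth naming: the result of an API invocation could be modelled as an allocation whose site label identifies the call, which is how the sketch would represent "newly obtained objects".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy analysis: not WALA, not the presentation's algorithm.
class PointsToSketch {
    // A statement is either "target = new <allocSite>" or "target = source".
    static final class Stmt {
        final String target;
        final String allocSite; // non-null for allocations
        final String source;    // non-null for copies

        private Stmt(String target, String allocSite, String source) {
            this.target = target;
            this.allocSite = allocSite;
            this.source = source;
        }

        static Stmt alloc(String target, String site) { return new Stmt(target, site, null); }
        static Stmt copy(String target, String source) { return new Stmt(target, null, source); }
    }

    final Set<String> heap = new TreeSet<>();             // H: all abstract objects
    final Map<String, Set<String>> pts = new HashMap<>(); // Pt: variable -> objects

    // Re-applies every statement until no points-to set grows (a fixpoint).
    void analyze(List<Stmt> program) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Stmt s : program) {
                Set<String> target = pts.computeIfAbsent(s.target, k -> new TreeSet<>());
                if (s.allocSite != null) {
                    heap.add(s.allocSite);
                    changed |= target.add(s.allocSite);
                } else {
                    // Copy first so that "x = x" never iterates a set while growing it.
                    List<String> src = new ArrayList<>(pts.getOrDefault(s.source, Set.of()));
                    changed |= target.addAll(src);
                }
            }
        }
    }

    public static void main(String[] args) {
        PointsToSketch analysis = new PointsToSketch();
        analysis.analyze(List.of(
                Stmt.alloc("name", "ObjectName@1"),
                Stmt.alloc("usage", "getAttribute@2"), // API result modelled as a new object
                Stmt.copy("data", "usage")));
        System.out.println("H  = " + analysis.heap);
        System.out.println("Pt = " + analysis.pts);
    }
}
```

Because the loop ignores statement order, the result is a sound over-approximation for this toy language: a variable's set contains every object that any execution order could assign to it.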

  10. Data type inference • Raw inference • A new calculus to clarify API invocations • Construct classes and associations accordingly • Code snippet slicing • Backward slicing along data flow • Meta-model refinement • Remove duplicated elements • Meta-model decoration • Names, multiplicity

  11. Raw meta-modeling • Points-to analysis result • for get, cds, s, and the anonymous return value • Calculate the source of • ) • Two classes for from the two clauses, to associations from classes of to the two classes

  12. Code snippet slicing • For an association from • Source and auxiliary variables from the clause • Backtracking the invocations
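The backtracking idea behind code snippet slicing can be sketched as a backward slice over def-use information. This is a minimal illustration for straight-line code only, assuming each statement defines one variable; the class `SliceSketch` and the statement encoding are hypothetical and do not reproduce the presentation's algorithm.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of backward slicing over straight-line code.
class SliceSketch {
    static final class Stmt {
        final String def;        // variable defined by the statement
        final List<String> uses; // variables read by the statement
        final String text;       // source text kept for the resulting snippet

        Stmt(String def, List<String> uses, String text) {
            this.def = def;
            this.uses = uses;
            this.text = text;
        }
    }

    // Walks the program backwards, keeping only statements that define a
    // variable still needed to compute the slicing criterion.
    static List<String> backwardSlice(List<Stmt> program, String criterion) {
        Set<String> needed = new HashSet<>();
        needed.add(criterion);
        Deque<String> slice = new ArrayDeque<>();
        for (int i = program.size() - 1; i >= 0; i--) {
            Stmt s = program.get(i);
            if (needed.remove(s.def)) {
                needed.addAll(s.uses);  // its inputs become needed in turn
                slice.addFirst(s.text); // preserve original statement order
            }
        }
        return new ArrayList<>(slice);
    }

    public static void main(String[] args) {
        List<Stmt> program = List.of(
                new Stmt("name", List.of(), "name = new ObjectName(\"java.lang:type=Memory\")"),
                new Stmt("banner", List.of(), "banner = \"memory report\""),
                new Stmt("usage", List.of("conn", "name"), "usage = conn.getAttribute(name, \"HeapMemoryUsage\")"),
                new Stmt("used", List.of("usage"), "used = usage.get(\"used\")"));
        // The slice for "used" omits the unrelated "banner" statement.
        backwardSlice(program, "used").forEach(System.out::println);
    }
}
```

Variables that are never defined inside the program, such as `conn` here, simply remain in the needed set; in a generated snippet they would correspond to inputs the developer has to supply.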

  13. Refinement and decoration • Refinement • Rewriting rules • Decoration • Empirical naming principles

  14. Implementation • Points-to analysis: extend WALA • Inference: implement the algorithms on • WALA • EMF

  15. Experiments • To evaluate the following three aspects • Applicable to practical data sources and programs • Useful for inspecting existing programs • Useful for writing new programs • Three experiments • Inference test on typical data sources and open source programs • Result investigation, finding problems in the programs • User study, comparing programming efficiency with and without the inferred type system

  16. Inference test

  17. Inspection with type systems • Informal but interesting findings for the selected programs • Version incompatibility • Two programs on JOnAS, CarteBlanche and jonasAdmin (4.7) • A “DeploymentPlan” type in CarteBlanche but not in jonasAdmin • Conjecture: DeploymentPlan is a feature of a later version, and CarteBlanche is not compliant with JOnAS 4.7 • Confirmed by their documentation • Incomplete support • JabRef sub-function to import from the MS Bib reference source • 76 out of 77 XML elements supported, all except “RefOrder” • Indicating potential improvement • Conclusion: assists developers in detecting wrong or insufficient use of a data source

  18. User study • Four data sources (Exists, JOnAS, Flickr, GeoRss) • 12 problems about retrieving data • Q1: get the ID of a query under processing • Q2: get the ID of a running job • 6 volunteers: 3 graduate students, 1 undergraduate, 2 engineers • Experiment result • Programming efficiency (time spent) • Programming processes

  19. User study result

  20. Findings • Process • Without type systems • Most chose to search the sample clients • Hard to find the proper keyword • Some chose to use the XML schema, but were stuck for a while when writing code • Sometimes missed the relation between problems • With type systems • Read the meta-model intuitively, choose the element, and proceed • Result • Programming efficiency improves • Significant for related problems • Significant for non-expert developers

  21. Related work • API programming assistants • Restraint: summarize and detect “bad smells” • Guidance: not formal or precise, but shows potential ways • This work: a guidance approach, but for the data rather than the API itself • Data type inference • Inferring data types from text and XML • This work: infers not from the data themselves but from the programs using them, so no large sample set is needed • Points-to analysis: a new usage and a corresponding extension • Def-use analysis: not just “uses”, but the composition of “uses” to form sufficient and independent invocations

  22. Conclusion • A novel approach to inferring type systems of data sources under data processing programs • Uses and extends points-to analysis • A new calculus to clarify different API usages • Experiments show that this approach • Applies to practical data sources and programs • Assists program inspection • Assists writing new data processing programs • Future work • Accuracy improvement • More experiments on different APIs
