Lase : Locating and Applying Systematic Edits by Learning from Examples

Lase: Locating and Applying Systematic Edits by Learning from Examples Na Meng* Miryung Kim* Kathryn S. McKinley*+ The University of Texas at Austin* Microsoft Research+

Motivating Scenario Pat needs to update database transaction code to prevent SQL injection attacks Cold Aold Bold Anew Cnew Bnew

Systematic Editing • Similar but not identical changes to multiple contexts • Manual, tedious, and error-prone • Source transformation tools require describing edits in a formal language[CHP91, ER02, LR95] • Bug fixing tools handle simple stylized code changes[WNGF09, JZDLL12, SMS13] • Sydit does not find edit locations automatically when applying systematic edits[MKM11]

Lase Workflow User selects examples LASE selects methods & suggests edits Bold Aold Dold Iold Xold … … Bnew Anew Dsuggested Isuggested Xsuggested

Approach Overview Bold Aold Syntactic program differencing Extract context Identify common edit Generalize identifier Find edit location Apply edit script Cold Dold III. II. I. Create edit script Phase: Anew Object next = e.next(); ✔ match ✗ no match Object next = v$0.next(); Object next = iter.next(); Bnew Dnew

Step 1. Syntactic Program Differencing Input: mold, mnew Output: Edit operations

Step 2: Identify Common Edit • Longest Common Edit Operation Subsequence Edit script B Edit script A insert(Object next = iter.next()…) update(print(next.getString())) to … insert(if(nextinstanceofMVAction) insert(((MVAction)next).update()) delete(System.out.println(…)) insert(Object next = e.next()…) insert(if(nextinstanceofMVAction) insert(((MVAction)next).update()) update(print(next.toString())) to … insert(Object next = iter.next()…) insert(if(nextinstanceofMVAction)) insert(((MVAction)next).update()) insert(Object next = e.next()…) insert(if(nextinstanceofMVAction)) insert(((MVAction)next).update())

Step 3: Generalize Identifier • Keep the original type, method, and variable names if examples agree • Abstract identifiers if examples disagree Object next = e.next(); Object next = iter.next(); Object next = v$0.next(); 8

Step 4: Extract Context • Iteratoriter=getActions().values().iterator(); • … … • while(iter.hasNext()) • Iteratore = fActions.values().iterator(); • … … • while(e.hasNext()) Object next = e.next(); Object next = iter.next(); Iterator v$0 = u$0:FieldAccessOrMethodInvocation.values().iterator(); … … while(v$0.hasNext())

Phase II. Find Edit Locations Aold Bold Dold Anew • Iteratore = fActions.values().iterator(); Iteratorv$0 = u$0:FieldAccessOrMethodInvocation.values().iterator(); Bnew

Phase III. Applying Edit Script • Customize general edit scripts • Identifier concretization • Edit position concretization • Apply the customized edit scripts

Example 2: • Comment[] getTrailingComments(ASTNode node) { • if (this.trailingComments!= null) { • + if (this.trailingPts>= 0) { • int[] range = (int[]) this.trailingComments.get(node); • + int[] range = null; • + for (inti = 0; range == null && i <= this.trailingPtr; i++) { • + if (this.trailingNodes[i] == node) • + range = this.trailingIndexes[i]; • + } • if (range != null) { • … … • returntrailingComments; • }} • return null; • } • Example 1: • Comment[] getLeadingComments( • ASTNode node) { • if (this.leadingComments!= null) { • + if (this.leadingPts>= 0) { • int[] range = (int[]) this.leadingComments.get(node); • + int[] range = null; • + for (inti = 0; range == null && i <= this.leadingPtr; i++) { • + if (this.leadingNodes[i] == node) • + range = this.leadingIndexes[i]; • + } • if (range != null) { • … … • returnleadingComments; • }} • return null; • } update (if (this.v$0 != null) ) to (if (this.v$1 >= 0) ) insert (int[] range = null; …) insert (for (inti = 0; range == null && i <= this.v$1; i++) …) insert (if (this.v$2[i] == node) …) insert (range = this.v$3[i]; …) delete (int[] range = (int[]) this.v$0.get(node); )

Suggested version: public intgetExtendedEnd (ASTNode node) { int end = node.getStartPosition() + node.getLength(); if (this.v$1 >= 0) { int[] range = null; for (inti = 0; range == null && i <= this.v$1; i++) { if (this.v$2[i] == node) range = this.v$3[i]; } if (range != null) { … … } } else { … … } return end - 1; } Found location: public intgetExtendedEnd (ASTNode node) { int end = node.getStartPosition() + node.getLength(); if (this.trailingComments!= null) { int[] range = (int[]) this.trailingComments.get(node); if (range != null) { … … } } else { … … } return end - 1; }

Outline • Phase I: Creating Abstract Edit Scripts • Syntactic Program Diff • Identify Common Edit • Generalize Identifier • Extract Context • Phase II: Find Edit Locations • Phase III: Apply Edit Script • Evaluation

Test Suite • 24 repetitive bug fixes that require multiple check-ins [Park et al., MSR 2012] • 2 from Eclipse JDT and 22 from Eclipse SWT • Each bug is fixed in multiple commits • Clones of at least two lines between patches checked in at different times • We use the first two changed methods as input examples • 37 systematic edits that require similar changes to different methods

RQ1: Precision, Recall, and Accuracy Precision (P): What percentage of all found locations are correctly identified? Recall (R): What percentage of all expected locations are correctly identified? Accuracy (A): How similar is Lase-generated version to developer-generated version?

On average, Lase finds edit locations with 99% precision, 89% recall, and 91% accuracy. For three bugs, Lase suggests in total 9 edits that developers missed and later confirmed.

RQ2: Sensitivity to number of exemplar edits • 7 cases in the oracle data set • Enumerate subsets of exemplar edits

As the number of exemplar edits increases, • P does not change because exemplar edits are similar, except for case 12 • R is more sensitive to the number of exemplar edits • R increases as a function of exemplar edits • A decreaseswhen exemplar edits are different • A remains the same or increases when the exemplar edits are very similar

Conclusion • Lase automates edit location search and program transformation application • Lase achieves 99% precision, 89% recall, and 91% accuracy • Future Work • Integrate with automated compilation and testing • Automatically detect repetitive change examples to infer program transformations Tool Demo: @Marina, 13:30 onFriday

Acknowledgement • This work was supported in part by the National Science Foundation under grantsCAREER-1149391, CCF-1117902, CCF-1043810, SHF-0910818, CCF-1018271, CCF-0811524, and a Microsoft SEIF award Thank you!

References I • [Meng et al. 2011] Na Meng, Miryung Kim and Kathryn S. McKinley. Systematic editing: Generating program transformations from an example. In PLDI ‘11. • [Kamiya et al. 2002] Toshihiro Kamiya and Shinji Kusumoto and Katsuro Inoue. CCFinder: A multilinguistic token-based code clone detection system for large scale source code. In TSE ’02. • [Lozano et al. 2004] Antoni Lozano and Gabriel Valiente. On the maximum common embedded subtree problem for ordered trees. In C. Iliopoulos and T Lecroq, editors, String Algorithmics, 2004. • [Park et al. MSR 2012] J. Park, M. Kim, B. Ray, and D.-H. Bae. An empirical study of supplementary bug fixes. In MSR ’12.

References II • [JZDLL12] G. Jin,W. Zhang, D. Deng, B. Liblit, and S. Lu. Automated concurrency bug fixing. In PLDI ’12. • [CHP91] J. R. Cordy, C. D. Halpern, and E. Promislow. Txl: A rapid prototyping system for programming language dialects. Computer Languages, 1991. • [G10] S. Gulwani. Dimensions in program synthesis. In PPDP ’10. • [WNGF09] W. Weimer, T. Nguyen, C. Le Goues, and S. Forrest. Automatically finding patches using genetic programming. In ICSE ’09.

References III • [ER02] M. Erwig and D. Ren. A rule-based language for programming software updates. In RULE ’02. • [LR95] D. A. Ladd and J. C. Ramming.A*: A language for imple- menting language processors. In TSE’95. • [SMS13] S. Son, K. S. McKinley, and V. Shmatikov. Fix Me Up: Repairing access-control bugs in web applications. In NDSS’13.

Step 4: Common Edit Context Extraction • Extract all potential common context • Refine the common context • Consistent identifier mapping • Embedded subtree isomorphism • Program dependence equivalence

Step 4: Common Edit Context Extraction (1/4) • Finding common text with clone detection (CCFinder [Kamiya et al. 2002]) 1 1 2 2 3 3

Step 4: Common Edit Context Extraction (2/4) • Identifier generalization • Iteratore = fActions.values().iterator(); • while (e.hasNext()) { • Iteratoriter = getActions().values().iterator(); • while (iter.hasNext()) { Iteratorv$0 = u$0:FieldAccessOrMethodInvocation.values().iterator(); while (v$0.hasNext()) {

Step 4: Common Edit Context Extraction (3/4) • Maximum Common Embedded Subtree Extraction (MCESE) [Lozano et al. 2004] 1 1 3 2 2 3 1 1 1,2,3,-3,-2,-1 1,2,-2,3,-3,-1 2 2 1,2,-2,-1

Step 4: Common Edit Context Extraction (4/4) • Program dependence analysis Iteratore = fActions.values().iterator(); • while (e.hasNext()) { Object next = e.next();

?When more than two examples? Aold Cold Dold Bold EAB EABC EABCD EAC EACD EAD Anew Cnew Dnew Bnew

public void setBackgroundPattern (Pattern pattern){ if (handle == 0) SWT.error(SWT.ERROR_GRAPHIC_DISPOSED); if (pattern == null) SWT.error(SWT.ERROR_NULL_ARGUMENT); if (pattern.isDisposed()) SWT.error(SWT.ERROR_INVALID_ARGUMENT); initGdip(false, false); if (data.gdipBrush != 0) destroyGdipBrush(data.gdipBrush); data.gdipBrush = Gdip.Brush_Clone(pattern.handle); data.backgroundPattern = pattern; } public void setBackgroundPattern (Pattern pattern){ if (handle == 0) SWT.error(SWT.ERROR_GRAPHIC_DISPOSED); if (pattern != null && pattern.isDisposed()) SWT.error(SWT.ERROR_INVALID_ARGUMENT); initGdip(false, false); if (data.gdipBrush != 0) destroyGdipBrush(data.gdipBrush); if (pattern != null) { data.gdipBrush = Gdip.Brush_Clone(pattern.handle); } else { data.gdipBrush = 0; } data.backgroundPattern = pattern; }

Lase : Locating and Applying Systematic Edits by Learning from Examples