1 / 50

Autograder

Autograder. Rishabh Singh, Sumit gulwani , Armando solar- lezama. A. G. Test-cases based feedback Hard to relate failing inputs to errors Manual feedback by TAs Time consuming and error prone. Feedback on Programming Assignments.

Télécharger la présentation

Autograder

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Autograder Rishabh Singh, Sumitgulwani, Armando solar-lezama A G

  2. Test-cases based feedback • Hard to relate failing inputs to errors • Manual feedback by TAs • Time consuming and error prone Feedback on Programming Assignments

  3. "Not only did it take 1-2 weeks to grade problem, but the comments were entirely unhelpful in actually helping us fix our errors. …. Apparently they don't read the code -- they just ran their tests and docked points mercilessly. What if I just had a simple typo, but my algorithm was fine? ...." 6.00 Student Feedback (2013)

  4. Scalability Challenges (>100k students) Bigger Challenge in MOOCs

  5. Today’s Grading Workflow • defcomputeDeriv(poly): • deriv = [] •     zero = 0 • if(len(poly) == 1): • returnderiv • for e inrange(0, len(poly)): • if(poly[e] == 0): •             zero += 1 • else: • deriv.append(poly[e]*e) • returnderiv • defcomputeDeriv(poly): • deriv = [] •     zero = 0 • if(len(poly) == 1): • returnderiv • for e inrange(0, len(poly)): • if(poly[e] == 0): •             zero += 1 • else: • deriv.append(poly[e]*e) • returnderiv replace derive by [0] Grading Rubric Teacher’s Solution

  6. Autograder Workflow • defcomputeDeriv(poly): • deriv = [] •     zero = 0 • if(len(poly) == 1): • returnderiv • for e inrange(0, len(poly)): • if(poly[e] == 0): •             zero += 1 • else: • deriv.append(poly[e]*e) • returnderiv • defcomputeDeriv(poly): • deriv = [] •     zero = 0 • if(len(poly) == 1): • returnderiv • for e inrange(0, len(poly)): • if(poly[e] == 0): •             zero += 1 • else: • deriv.append(poly[e]*e) • returnderiv replace derive by [0] Error Model Teacher’s Solution A G

  7. Technical Challenges Large space of possible corrections Minimal corrections Dynamically-typed language Constraint-based Synthesis to the rescue

  8. Running Example

  9. computeDeriv • Compute the derivative of a polynomial • poly = [10, 8, 2] #f(x) = 10 + 8x +2x2 • => [8, 4] #f’(x) = 8 + 4x

  10. Teacher’s solution defcomputeDeriv(poly):    result = [] iflen(poly) == 1: return[0] foriinrange(1, len(poly)):         result += [i* poly[i]] return result

  11. Demo

  12. Simplified Error Model • return a  return {[0],?a} • range(a1, a2)  range(a1+1,a2) • a0 == a1 False

  13. Autograder Algorithm

  14. Algorithm ---- ---- .py .sk .out .

  15. Algorithm: Rewriter .py .

  16. Rewriting using Error Model • range(0, len(poly)) • range({0 ,1}, len(poly)) default choice a  a+1

  17. Rewriting using Error Model • range(0, len(poly)) • range({0 ,1}, len(poly)) a  a+1

  18. Rewriting using Error Model • range(0, len(poly)) • range({0 ,1}, len({poly, poly+1})) a  a+1

  19. Rewriting using Error Model • range(0, len(poly)) • range({0 ,1}, {len({poly, poly+1}), len({poly, poly+1})+1}) a  a+1

  20. Rewriting using Error Model () • defcomputeDeriv(poly): • deriv = [] •     zero = 0 • if({len(poly) == 1, False}): • return {deriv,[0]} • for e inrange({0,1}, len(poly)): • if(poly[e] == 0): •             zero += 1 • else: • deriv.append(poly[e]*e) • return {deriv,[0]} Problem: Find a program that minimizes cost metricand is functionally equivalent with teacher’s solution

  21. Algorithm: Translator .sk .

  22. Sketch [Solar-Lezama et al. ASPLOS06] • void main(int x){ • int k =2; • assert x + x == k * x; • } • void main(int x){ • int k =??; • assert x + x == k * x; • } Statically typed C-like language with holes

  23. Translation to Sketch (1) Handling python’s dynamic types (2) Translation of expression choices

  24. (1) Handling Dynamic Types • structMultiType{ • inttype; • int ival;bitbval; • MTStringstr;MTListlst; • MTDictdict;MTTupletup; • } bool int list

  25. Python Constants using MultiType 2 1  int int bool bool bool int list list list [1,2] 

  26. Python Exprs using MultiType x + y  int int int bool bool bool list list list

  27. Python Exprs using MultiType x + y  bool bool bool int int int list list list

  28. Python Expressions using MultiType x + y  Typing rules are encoded as constraints bool int int bool list list

  29. (2) Translation of Expression Choices { , } MultiType modifyMTi( , ){ if(??) return else choicei = True return } // default choice // non-default choice

  30. Translation to Sketch (Final Step) harness main(int N, int[N] poly){ MultiTypepolyMT = MTFromList(poly); MultiTyperesult1 = computeDeriv_teacher(polyMT); MultiTyperesult2= computeDeriv_student(polyMT); assert MTEquals(result1,result2); }

  31. Translation to Sketch (Final Step) harness main(int N, int[N] poly){ totalCost = 0; MultiTypepolyMT = MTFromList(poly); MultiTyperesult1=computeDeriv_teacher(polyMT); MultiTyperesult2=computeDeriv_student(polyMT); ……………… if(choicek) totalCost++; ……………… assert MTEquals(result1,result2); minimize(totalCost); } Minimum Changes

  32. Algorithm: Solver .sk .out

  33. Solving for minimize(x) Binary search for x – no reuse of learnt clauses Sketch Uses CEGIS – multiple SAT calls MAX-SAT – too much overhead Incremental linear search – reuse learnt clauses

  34. Incremental Solving for minimize(x) (P,x) (P1,x=7) Sketch {x<7} Sketch {x<4} UNSAT (P2,x=4) Sketch

  35. Algorithm: Feedback ---- ---- .out

  36. Feedback Generation Correction rules associated with Feedback Template Extract synthesizer choices to fill templates

  37. Evaluation

  38. Autograder Tool for Python Currently supports: - Integers, Bool, Strings, Lists, Dictionary, Tuples - Closures, limited higher-order fn, list comprehensions

  39. Benchmarks Exercises from first five weeks of 6.00x and 6.00 int: prodBySum, compBal, iterPower, recurPower, iterGCD tuple:oddTuple list: compDeriv, evalPoly string: hangman1, hangman2 arrays(C#): APCS dynamic programming (Pex4Fun)

  40. Average Running Time (in s)

  41. Feedback Generated (Percentage)

  42. Feedback Generated (Percentage)

  43. Feedback Generated (Percentage)

  44. Feedback Generated (Percentage)

  45. Average Performance

  46. Why low % in some cases? • Completely Incorrect Solutions • Unimplemented Python Features • Timeout • comp-bal-6.00 • Big Conceptual errors

  47. Big Error: Misunderstanding APIs • eval-poly-6.00x • defevaluatePoly(poly, x): •     result = 0 • foriinlist(poly): •         result += i* x **poly.index(i) • return result

  48. Big Error: Misunderstanding Spec • hangman2-6.00x • def getGuessedWord(secretWord, lettersGuessed): • for letter in lettersGuessed: •     secretWord = secretWord.replace(letter,’_’) • return secretWord

  49. A technique for automated feedback generation Error Models, Constraint-based synthesis • Provide a basis for automated feedback for MOOCs • Towards building a Python Tutoring System rishabh@csail.mit.edu Thanks! Conclusion

More Related