770 likes | 898 Vues
A Software Engineering Tool for Distributed Development. Jason Carter Prasun Dewan University of North Carolina at Chapel Hill. Motivation. Grrr …. Hmm… is Bob stuck?. Programmer Bob. Programmer’s Mentor/Teammate. Applications. Need Help?. Offer help to student programmers
E N D
A Software Engineering Tool for Distributed Development Jason Carter Prasun Dewan University of North Carolina at Chapel Hill
Motivation Grrr… Hmm… is Bob stuck? Programmer Bob Programmer’s Mentor/Teammate
Applications Need Help? Offer help to student programmers who are too shy to ask for it Significantly improve programmer productivity Benefits of this idea may occur in industry Manager Student
Co-located vs. Distributed Distributed Team • [2]Herbsleb, J.D., et al. Distance, dependencies, and delay in a global collaboration. in Proc. CSCW 2000. Less productivity More productivity Co-Located Team
Productivity Higher in War-rooms Than In Cubicles • Teasley, S., et al. How does radical collocation help a team succeed? in Proc. CSCW 2000. War-room Cubical Combined these studies show
Distance Impedes Deduction Hmm… is Alice stuck? Bob Grrr… Distance Developers often do not explicitly ask for help How do we reduce this gap? Alice
CollabVS Developers are aware of methods their distributed teammates are working on • Hedge R. and Dewan P. Connecting Programming Environments to Support Ad-Hoc Collaboration in 23rd IEEE/ACM International Conference on ASE. 2008. Use this information with project information to manually determine if teammate is stuck Distributed users the feeling of “being there” in a single location
Can We Do Better Than Being There? Face-to-Face Interaction - “Being There” • Hollan, J. and Scott S. Beyond being there. CHI ’92. Bob Bob “Beyond Being There” Alice How do we plan to go “beyond being there”? Alice
“Beyond Being There” You are having difficulty. Bob is having difficulty. Programmer Bob There are several ways to infer this information Programmer’s Mentor/Teammate
Automatic Prediction of Frustration • Kapoor, A., Burleson, W., and Picard, R.W. "Automatic Prediction of Frustration," International Journal of Human-Computer Studies, Vol. 65, Issue 8, 2007. PROBLEM: Overhead of using this non-standard equipment Posture Seating Chairs Wireless Bluetooth skin conductance tests Video Camera Pressure Mice Determine this information by logging interaction with some component of the system Alternative approach
Determine IF Programmers Are Interruptible • Fogarty, J., Ko, A.J., Aung, H.H., Golden, E., Tang, K.P. and Hudson, S.E. Examining task engagement in sensor-based statistical models of human interruptibility. In Proc. CHI 2005, ACM Press (2005), 331-340. Developed a tool that uses developers’ actions to determine if they are interruptible Randomly interrupted developers Interruptibility 0 100 Can we use the this approach?
Information about Events No random interruption would find a developer is having difficulty Interruptibility 0 100 A better alternative is to allow developers to report their status
USE Buttons To Collect Information Buttons used to indicate status “Eureka Button” To capture situations in which developers did not realize they had been having a problem until they had solved it “Notifications Enabled” These buttons are useful only for the training phase Allowed developers to determine if they received status change notifications Useful to run an initial naïve algorithm
Basic Intuition You are having difficulty. Monitor progress of developers progress < than some threshold Threshold indicate that they are having difficulty Progress 0 100 related but fundamentally different
Relationship Between Productivity and Progress Usually measured after developers have written code Measured while developers write code Little work has been done on measuring progress The only work we could find was done by Kersten and Murphy
Mylar: Tool to Reduce Navigation • Kersten, M., Murphy, G. C., Mylar: A degree-of-interest model for IDEs. In Proc. Conference on Aspect-Oriented Software Development, 2005, 159-168. # of Edit Commands Edit Ratio # of Navigation Commands
Metrics To Measure Progress > Low Threshold & Edit Ratio # of Debug Commands Stuck Having Difficulty Naïve algorithm did not predict the progress status well Explore the logs and corrections to derive a better algorithm Y. Sharon. Eclipseye—spying on eclipse. Bachelor’s thesis, University of Lugano, June 2007.
Deriving Mining Algorithm Analyze Logs To find patterns when developers indicated they were having difficulty Features Values that change when programmers are making progress and having difficulty A manual inspection of the logs showed that the frequency of certain edit commands decreased when developers were having difficulty
Commands Grouped Into Five Categories Depending on the developer, the frequency of execution of other commands increased We used this categories to create our features
Identifying Features For different segments of the log we calculated: the occurrences of each category of commands in that segment 100 * total # of commands in the segment Used these percentages as features to identify patterns The size of these segments is an important issue
Determining Segment Size <action> <eventType>SOLUTION_OPENED</eventType> <solutionEvent> <timestamp>9/20/2009 12:44:02 PM</timestamp> </solutionEvent> </action> <action> <commandEvent> <command>Debug.Start</command> <timestamp>9/20/2009 12:45:33 PM </timestamp> </commandEvent> </action> <action> <eventType>WINDOW_LOST_FOCUS</eventType> <windowEvent> < timestamp> 12:46:01 PM</ timestamp > </windowEvent> </action> <action> <eventType>WINDOW_GAINED_FOCUS</eventType> <windowEvent> < timestamp> 12:48:01 PM</ timestamp > </windowEvent> </action> Segment Sizes: Whole Log 200 100 25 50 Graphed the programming behavior of all participants to determine usefulness of features
Graphs to Validate Features (Cont.) The two graphs validate our feature choice show that a general model must account for differences in what percentages change when developers are having difficulty There are several standard ways to build a general model
Number of Stuck Events Significantly Less than Total Number of Events Total Events: 2288 This leads to the imbalance class distribution problem 29
Imbalanced Class Distribution Disproportionate number of having difficulty segments to making progress segments Needle in a haystack “Standard” algorithms to predict making progress ~97% of the time Accuracy of this model: 83% Problem: Model can’t identify when a developer is having difficulty
SMOTE Algorithm Replicates rare data, having difficulty, until there is more of a balance between having difficulty statuses and making progress statuses The replicated data of all developers were combined and used as input to several standard algorithms to build a model Making Progress Having Difficulty 2212 76 1216
Build A Model Applied mining algorithms Participant1-2 Participant3-4 Participant5-6 Logs – Replicated Data 10 trials of model construction executed Standard 10 fold cross-validation Each trial used 90% of data for training The remaining 10% used as test data to evaluate the model in that trial Model
Accuracy of Model Using Decision Tree Algorithm • Witten, I.H. and Frank, E. (1999) Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann. Developers were having difficulty 1216 times Developers were making progress 2288 times
Classification via Clustering Designed to identify rare events without replicating records • Witten, I.H. and Frank, E. (1999) Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann. Developers were having difficulty 76 times Developers were making progress 2288 times
Our Approach Approach is promising Left several unanswered questions
Does Approach Work In Practice? Decision Tree Model Features Edit Percentage Debug Percentage Focus Percentage Navigation Percentage Classification Via Clustering Model Remove Percentage Research group and one industrial developer used software Learned several important lessons
Having Difficulty Lesson Stuck button and the having difficulty status hurt my advisor’s ego
Frequent False Positives Lesson Workflow system Industry Developer Building a new product Started a new session The navigations performed to build the working set of files Sometimes needed more time to determine if the predicted change of status was correct
label aggregation technique 50 50 50 50 # of events Compute Features Compute Features Compute Features Compute Features 250 Two techniques account for the fact that developers’ status does not change instantaneously To give a more detailed explanation of how this works
Implementation of Developer Notifications There are 5 status predictions when reporting a dominant status every 250 events Indeterminate Status Prediction Slow Progress Slow Progress Slow Progress Status Prediction Slow Progress Status Prediction Slow Progress Slow Progress Status Prediction Slow Progress Making Progress Making Progress Status Prediction Making Progress Allowed the developer to correct a predicted status to indeterminate
Results of Pilot Study There was a total of 88 predictions made Every hour we switched models without interrupting the user Accuracy of this study is good Large number of false negatives How do we improve our accuracy?
Cost of Processing Incremental Input Events Lesson Advisor noticeable intolerable 3-year old laptop
Changes In The Tool Increased Programming Time and Effort Decision Tree Model Algorithm Classification Via Clustering Model Do not share code
Solution To Creating New Code For Each Programming Environment Build an architecture that is independent of the programming environment Decision Tree Algorithm Architecture Developers’ actions Classification Via Clustering Algorithm Supports interoperability Also put process on server
Architecture Made up of several modules
event-interception module Developers’ actions Serialized object Developers’ actions WOX XML This module does several things: Captures events from both Eclipse and Visual Studio Passes these events to the prediction modules Prediction modules are written in C# so events from Visual Studio could be passed directly Java events were converted to C# using standard libraries
Predication Modules Mediator Status Aggregator WOX IKVM Event Aggregator Feature Extractor Prediction Manager Previous Model Mediator allows modules to be loosely coupled We can use several algorithms for event aggregation
Discrete Chunks/Sliding Window Aggregation Algorithm Discrete chunk of 3 events Window, Window Size = 3 Can this tool work with professional programmers?
Controlled User Study 14developers 9student programmers Having difficulty is rare 5 industry programmers Make sure developers face difficulty during the study Tasks are no impossible to solve We use ACM programming problems
ACM Programming Problems Mid-Atlantic ACM Programming Contest http://midatl.radford.edu/ • http://midatl.radford.edu/ Is self reporting reliable?