Improving Developer Productivity with Visual Studio IntelliCode

Allison Buchholtz-Au, Program Manager II (@allison_au)
Shengyu Fu, Principal Data Science Lead

What is Visual Studio IntelliCode?
• Range of capabilities that offers new productivity enhancements through artificial intelligence (AI)
• AI-assisted IntelliSense:
  • Uses current code context and patterns based on thousands of highly rated, open-source projects on GitHub
  • Predicts the most likely and most relevant suggestions
• Argument completion

Demo

Why ML? Why this problem?

Solution Principles

Data Science Journey
• Understand the data first: draw intuition and define metrics before building a machine learning model.
• Keep engineering constraints in mind; be practical about model productization.
• Rely heavily on offline evaluation for model improvement.

Data Source – Open source code
• Number of C# repos on GitHub with good quality: >2K
• Number of solutions we were able to build and parse to form our training dataset: >5K
• Number of .cs documents in the dataset: >200K

Extract Training Data from Source Code
Features we can extract about an invocation:
• Span start: 139
• Is in conditional bracket: false
• Is in loop: false
• Class invoked: System.Console
• Method/property invoked: WriteLine
• Containing class: Program
• Containing function: Main
• Method is override, static, virtual, definition, abstract, sealed: all false
• Invoking kind: named type
Used Roslyn APIs to compile each solution to get, for each document, the
• syntax tree
• semantic model
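The invocation features listed on this slide can be pictured as one flat record per call site. The sketch below is only an illustration in Python: the class name `InvocationRecord` and all field names are hypothetical, the values mirror the slide's `System.Console.WriteLine` example, and the actual extraction was done with Roslyn in C#, which is not shown here.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class InvocationRecord:
    """One invocation extracted from a parsed document (hypothetical schema)."""
    span_start: int
    in_conditional: bool
    in_loop: bool
    class_invoked: str
    member_invoked: str
    containing_class: str
    containing_function: str
    # The slide reports these method modifiers as "all false" for its example.
    method_is_override: bool = False
    method_is_static: bool = False
    method_is_virtual: bool = False
    method_is_definition: bool = False
    method_is_abstract: bool = False
    method_is_sealed: bool = False
    invoking_kind: str = "named type"


# Values copied from the slide's example invocation.
example = InvocationRecord(
    span_start=139,
    in_conditional=False,
    in_loop=False,
    class_invoked="System.Console",
    member_invoked="WriteLine",
    containing_class="Program",
    containing_function="Main",
)

# A flat dict like this is a convenient row format for a training table.
row = asdict(example)
print(row)
```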

What questions can we ask of this dataset?
How is C# used?
• Which are the most frequently used classes?
• Are there patterns in how methods of one class are used?
Which features are useful?
• Which pieces of information extracted by the parser would be helpful?
• What is the reasonable code context to look at – the entire document/function or the most recent calls?
How to make recommendations?
• Will the same model and parameters work for all classes?
• Do we have enough data?

How often is each class used?

How often do we face the cold start problem?

Do we have enough training data?
Learning curves: model precision on training and testing data across varying training data sizes
• Prediction on test data improves as training data size increases
• Testing and training results converge at similar values
• Learning curves allow us to verify when a model has learned as much as it can from the data
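A learning curve of the kind described on this slide can be sketched as follows. Everything here is an assumption made for illustration: the data is synthetic, the model is a trivial per-class frequency counter rather than any model from the talk, and "precision" is taken to mean top-1 accuracy.

```python
import random
from collections import Counter, defaultdict


def make_synthetic(n, seed):
    """Generate (class, method) pairs with a skewed method distribution (synthetic)."""
    rng = random.Random(seed)
    methods = {
        "Console": ["WriteLine", "ReadLine", "Write"],
        "String": ["Format", "Join", "IsNullOrEmpty"],
    }
    weights = [0.6, 0.3, 0.1]
    samples = []
    for _ in range(n):
        cls = rng.choice(list(methods))
        samples.append((cls, rng.choices(methods[cls], weights)[0]))
    return samples


def train(samples):
    """Count how often each method is invoked on each class."""
    counts = defaultdict(Counter)
    for cls, method in samples:
        counts[cls][method] += 1
    return counts


def precision_at_1(model, samples):
    """Fraction of samples whose true method is the model's top suggestion."""
    hits = 0
    for cls, method in samples:
        seen = model.get(cls)
        if seen and seen.most_common(1)[0][0] == method:
            hits += 1
    return hits / len(samples)


def learning_curve(train_pool, test_set, sizes):
    """Train on growing prefixes of the pool; score on that prefix and on test data."""
    rows = []
    for n in sizes:
        subset = train_pool[:n]
        model = train(subset)
        rows.append((n, precision_at_1(model, subset), precision_at_1(model, test_set)))
    return rows


train_pool = make_synthetic(2000, seed=0)
test_set = make_synthetic(500, seed=1)
curve = learning_curve(train_pool, test_set, [10, 50, 200, 1000, 2000])
for n, p_train, p_test in curve:
    print(f"{n:>5}  train={p_train:.2f}  test={p_test:.2f}")
```

Once the train and test columns stop moving and sit close together, adding more data of the same kind is unlikely to help this particular model.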

Metrics • Precision • Coverage • Average Reciprocal Rank
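The slide names the three metrics without defining them, so the sketch below fixes one plausible set of definitions as assumptions: coverage is the share of invocations for which the model returns any recommendation, precision is the share of covered invocations whose true method appears in the top `k`, and average reciprocal rank is the mean of 1/rank over covered invocations (0 when the true method is not in the top `k`). The function name `evaluate` and the sample data are hypothetical.

```python
def evaluate(predictions, labels, k=5):
    """Compute coverage, top-k precision, and average reciprocal rank.

    predictions: one ranked list of method names per invocation (may be empty).
    labels: the method the developer actually used for each invocation.
    """
    covered = 0
    hits = 0
    reciprocal_rank_sum = 0.0
    for ranked, truth in zip(predictions, labels):
        if not ranked:
            continue  # the model made no recommendation for this invocation
        covered += 1
        top_k = ranked[:k]
        if truth in top_k:
            hits += 1
            reciprocal_rank_sum += 1.0 / (top_k.index(truth) + 1)
    total = len(labels)
    return {
        "coverage": covered / total if total else 0.0,
        "precision": hits / covered if covered else 0.0,
        "average_reciprocal_rank": reciprocal_rank_sum / covered if covered else 0.0,
    }


# Hypothetical evaluation set of four invocations.
predictions = [["WriteLine", "Write"], ["Join", "Format"], [], ["Add"]]
labels = ["WriteLine", "Format", "Trim", "Remove"]
result = evaluate(predictions, labels)
print(result)
```

In this toy set, three of four invocations receive a recommendation, two of those three contain the right method, and the correct answers sit at ranks 1 and 2.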

Modeling
Frequency Model
• Simple
• Low precision
Clustering Model
• Difficult to tune the cluster size for each class
• Reasonable precision with bigger model size
Statistical Language Model
• Much better precision with smaller model size
• In Production!
Deep Learning Model
• Best precision and coverage
• Largest model size
• Slowest execution time
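To make the contrast between a frequency model and a statistical language model concrete, here is a minimal sketch. It is not the production model from the talk: the bigram formulation, the class names `FrequencyModel` and `BigramModel`, and the `StringBuilder`-style call sequences are all assumptions chosen to show why conditioning on the previous call can beat raw popularity.

```python
from collections import Counter, defaultdict


class FrequencyModel:
    """Recommend the globally most frequent methods, ignoring context."""

    def __init__(self):
        self.counts = Counter()

    def fit(self, sequences):
        for seq in sequences:
            self.counts.update(seq)
        return self

    def recommend(self, context, k=3):
        return [method for method, _ in self.counts.most_common(k)]


class BigramModel:
    """Recommend methods that most often follow the previous call.

    Falls back to plain frequency when the previous call was never seen.
    """

    def __init__(self):
        self.next_counts = defaultdict(Counter)
        self.fallback = FrequencyModel()

    def fit(self, sequences):
        self.fallback.fit(sequences)
        for seq in sequences:
            for prev, nxt in zip(seq, seq[1:]):
                self.next_counts[prev][nxt] += 1
        return self

    def recommend(self, context, k=3):
        prev = context[-1] if context else None
        if prev in self.next_counts:
            return [m for m, _ in self.next_counts[prev].most_common(k)]
        return self.fallback.recommend(context, k)


# Hypothetical call sequences on one class, in source order.
sequences = [
    ["Append", "Append", "ToString"],
    ["Append", "AppendLine", "ToString"],
    ["AppendLine", "ToString"],
    ["Append", "Append", "Append", "ToString"],
]

frequency = FrequencyModel().fit(sequences)
bigram = BigramModel().fit(sequences)

context = ["Append", "AppendLine"]
print(frequency.recommend(context, k=1))  # most popular method overall
print(bigram.recommend(context, k=1))     # most likely method after AppendLine
```

The frequency model always suggests the most popular method, while the bigram model uses the most recent call to rank a different method first.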

Architecture

Offline Model Evaluation
[Diagram: an example of a data point constructed from usage data. Training data is used to build the model; the features of each test data point go to the model, and the prediction provided by the model is compared with the label to produce the test results.]

Offline Evaluation Results

Online Evaluation
New invocation → Features → Model makes recommendation → Log a recommendation event → If the user chooses a method from the recommended list (Yes) → Log a commit event
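The online evaluation flow on this slide amounts to logging two kinds of events and comparing their counts. The sketch below is a hypothetical illustration only: the `TelemetryLog` class, the event names, and the `commit_rate` metric are assumptions, not the product's telemetry API.

```python
from dataclasses import dataclass, field


@dataclass
class TelemetryLog:
    """In-memory stand-in for a telemetry sink (hypothetical)."""
    events: list = field(default_factory=list)

    def log(self, kind, invocation_id):
        self.events.append((kind, invocation_id))


def handle_invocation(log, invocation_id, recommended, chosen):
    """Log a recommendation event, then a commit event if the user picked from the list."""
    log.log("recommendation", invocation_id)
    if chosen in recommended:
        log.log("commit", invocation_id)


def commit_rate(log):
    """Share of recommendation events that were followed by a commit event."""
    shown = sum(1 for kind, _ in log.events if kind == "recommendation")
    committed = sum(1 for kind, _ in log.events if kind == "commit")
    return committed / shown if shown else 0.0


log = TelemetryLog()
handle_invocation(log, 1, ["WriteLine", "Write"], chosen="WriteLine")
handle_invocation(log, 2, ["Join", "Format"], chosen="Format")
handle_invocation(log, 3, ["Add", "Clear"], chosen="Remove")
print(f"commit rate: {commit_rate(log):.2f}")
```

Only the event kind and an identifier are recorded in this sketch; no code content is logged.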

Current Status and Future Work
• IntelliCode for member completion has been released for C#/C++/XAML in VS and Python/TypeScript/JavaScript/Java in VS Code.
• Method argument recommendation is in preview for C# in VS.
• Custom model training on the user's own codebase is in preview for C# in VS.
• Enable live A/B testing for different models.
• How do we further improve the model in production?
  • Tune the current statistical language model.
  • Improve model precision by incorporating feedback telemetry while remaining GDPR compliant.
  • Deep learning: how to reduce the model size and improve runtime?
• Expand to line/snippet-level completion.

How do DS/Engineering/PM work together? What lessons did we learn?

Specialized for rapid iteration

Metrics

Online Survey • 73% of users across languages see an increase in productivity. • Survey users every quarter.

Finding a common mindset is key

Drive alignment and ownership

Challenges

Questions?
