
Improving Developer Productivity with Visual Studio IntelliCode

Allison Buchholtz-Au, Program Manager II (@allison_au). Shengyu Fu, Principal Data Science Lead.


Presentation Transcript


  1. Improving Developer Productivity with Visual Studio IntelliCode. Allison Buchholtz-Au, Program Manager II (@allison_au); Shengyu Fu, Principal Data Science Lead

  2. What is Visual Studio IntelliCode?
     • A range of capabilities that offer new productivity enhancements through artificial intelligence (AI)
     • AI-assisted IntelliSense:
       • Uses the current code context and patterns learned from thousands of highly rated open-source projects on GitHub
       • Predicts the most likely and most relevant suggestions
     • Argument completion

  3. Demo

  4. Why ML? Why this problem?

  5. Solution Principles

  6. Data Science Journey
     • Understand the data first: draw intuition and define metrics before building a machine learning model.
     • Keep engineering constraints in mind; be practical about model productization.
     • Rely heavily on offline evaluation for model improvement.

  7. Data Source – Open source code
     • Number of C# repos on GitHub with good quality: >2K
     • Number of solutions we were able to build and parse to form our training dataset: >5K
     • Number of .cs documents in the dataset: >200K

  8. Extract Training Data from Source Code
     Features we can extract about an invocation:
     • Span start: 139
     • Is in conditional bracket: false
     • Is in loop: false
     • Class invoked: System.Console
     • Method/property invoked: WriteLine
     • Containing class: Program
     • Containing function: Main
     • Method is override, static, virtual, definition, abstract, sealed: all false
     • Invoking kind: named type
     Used Roslyn APIs to compile each solution to get, for each document:
     • the syntax tree
     • the semantic model
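
The feature list on this slide can be pictured as one record per invocation. The following is a minimal sketch only: the class name `InvocationFeatures`, its field names, and the `MODIFIERS` tuple are hypothetical labels chosen for illustration, not the schema the IntelliCode team used. The values are the ones shown on the slide.

```python
from dataclasses import dataclass, field, asdict

# Modifier flags listed on the slide; the tuple name is an assumption.
MODIFIERS = ("override", "static", "virtual", "definition", "abstract", "sealed")


@dataclass
class InvocationFeatures:
    """One training record per member invocation (illustrative field names)."""
    span_start: int
    in_conditional: bool
    in_loop: bool
    class_invoked: str
    member_invoked: str
    containing_class: str
    containing_function: str
    invoking_kind: str
    # Every modifier flag defaults to False, as in the slide's example.
    modifiers: dict = field(default_factory=lambda: {m: False for m in MODIFIERS})


# The example invocation from the slide: a Console.WriteLine call in Program.Main.
example = InvocationFeatures(
    span_start=139,
    in_conditional=False,
    in_loop=False,
    class_invoked="System.Console",
    member_invoked="WriteLine",
    containing_class="Program",
    containing_function="Main",
    invoking_kind="named type",
)

# asdict() flattens the record into a plain dict, e.g. for writing to a dataset file.
record = asdict(example)
```

In the talk the records come from the Roslyn syntax tree and semantic model; this sketch only shows the shape of the resulting data, not the extraction itself.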

  9. What questions can we ask of this dataset?
     How to make recommendations? Which features are useful? How is C# used?
     • Which are the most frequently used classes?
     • Are there patterns in how methods of one class are used?
     • Which pieces of information extracted by the parser would be helpful?
     • What is the reasonable code context to look at – the entire document/function or the most recent calls?
     • Will the same model and parameters work for all classes?
     • Do we have enough data?

  10. How often is each class used?

  11. How often do we face the cold start problem?

  12. Do we have enough training data?
      Learning curves: model precision on training and testing data across a range of training set sizes
      • Prediction on test data improves as training data size increases
      • Testing and training results converge at similar values
      • Allows us to verify when a model has learned as much as it can from the data
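
A learning curve of this kind can be sketched in a few lines. The model below is a toy stand-in (it remembers the most frequent member per class), and the data is made up purely for illustration; neither comes from the talk. The helper names `fit_most_frequent`, `top1_precision`, and `learning_curve` are hypothetical.

```python
from collections import Counter, defaultdict


def fit_most_frequent(pairs):
    """Toy stand-in model: remember the most frequent member for each class."""
    counts = defaultdict(Counter)
    for class_name, member in pairs:
        counts[class_name][member] += 1
    return {c: ctr.most_common(1)[0][0] for c, ctr in counts.items()}


def top1_precision(model, pairs):
    """Fraction of (class, member) pairs whose member equals the model's pick."""
    if not pairs:
        return 0.0
    return sum(model.get(c) == m for c, m in pairs) / len(pairs)


def learning_curve(train, test, sizes):
    """Return (size, train_precision, test_precision) for each training size."""
    rows = []
    for n in sizes:
        subset = train[:n]  # train on the first n examples only
        model = fit_most_frequent(subset)
        rows.append((n, top1_precision(model, subset), top1_precision(model, test)))
    return rows


# Made-up (class, member) pairs, purely illustrative.
train = [("A", "x"), ("A", "y"), ("A", "y"), ("B", "p"),
         ("A", "y"), ("B", "p"), ("B", "q"), ("B", "p")]
test = [("A", "y"), ("B", "p"), ("A", "x"), ("B", "p")]

curve = learning_curve(train, test, sizes=[1, 4, 8])
for size, train_p, test_p in curve:
    print(size, train_p, test_p)
```

On this toy data the training and test precision start far apart (1.0 vs 0.25 with one example) and meet at 0.75 once four or more examples are used, which is the flattening-and-converging shape the slide describes as the signal that more data will not help.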

  13. Metrics • Precision • Coverage • Average Reciprocal Rank
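
The slide names the three metrics without defining them. The sketch below uses common definitions as an assumption: coverage is the share of invocations for which the model returned any recommendation, precision is the share of answered invocations whose chosen member appears in the top k, and average reciprocal rank is the mean of 1/rank over answered invocations (0 when the member is absent). The function name `evaluate` and the cutoff `k` are hypothetical.

```python
from typing import Dict, Sequence


def evaluate(predictions: Sequence[Sequence[str]],
             labels: Sequence[str],
             k: int = 5) -> Dict[str, float]:
    """Score ranked recommendation lists against the member actually chosen.

    predictions[i] is the ranked list for invocation i (empty when the model
    made no recommendation); labels[i] is the member the developer used.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")

    answered = 0   # invocations with at least one recommendation
    hits = 0       # answered invocations whose label is in the top k
    rr_sum = 0.0   # sum of reciprocal ranks over answered invocations

    for ranked, label in zip(predictions, labels):
        top = list(ranked[:k])
        if not top:
            continue
        answered += 1
        if label in top:
            hits += 1
            rr_sum += 1.0 / (top.index(label) + 1)

    total = len(labels)
    return {
        "coverage": answered / total if total else 0.0,
        "precision": hits / answered if answered else 0.0,
        "average_reciprocal_rank": rr_sum / answered if answered else 0.0,
    }


# Made-up example: four invocations, one with no recommendation at all.
predictions = [["WriteLine", "Write", "ReadLine"], ["Add", "Remove"], [], ["Close", "Dispose"]]
labels = ["WriteLine", "Remove", "Append", "Flush"]
scores = evaluate(predictions, labels)
```

Here three of four invocations are answered (coverage 0.75), two of those three contain the chosen member (precision 2/3), and the reciprocal ranks 1, 1/2, and 0 average to 0.5.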

  14. Modeling
      Frequency Model
      • Simple
      • Low precision
      Clustering Model
      • Difficult to tune the cluster size for each class
      • Reasonable precision with bigger model size
      Statistical Language Model
      • Much better precision with smaller model size
      • In Production!
      Deep Learning Model
      • Best precision and coverage
      • Largest model size
      • Slowest execution time
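
Of the four approaches, only the frequency model is simple enough to sketch without guessing at details the slides do not give. The following is an illustrative baseline under that assumption: it counts how often each member is invoked per class and recommends the most frequent ones. The class `FrequencyModel` and its method names are hypothetical, and the training pairs are made up.

```python
from collections import Counter, defaultdict


class FrequencyModel:
    """Baseline recommender: rank a class's members by how often they were invoked."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    def fit(self, invocations):
        """invocations: iterable of (class_name, member_name) pairs."""
        for class_name, member in invocations:
            self._counts[class_name][member] += 1
        return self

    def recommend(self, class_name, k=5):
        """Return up to k members for the class, most frequent first."""
        counts = self._counts.get(class_name)
        if not counts:
            return []  # class never seen in training: the cold start case
        return [member for member, _ in counts.most_common(k)]


# Made-up training pairs, purely illustrative.
invocations = (
    [("System.Console", "WriteLine")] * 3
    + [("System.Console", "ReadLine")]
    + [("System.Text.StringBuilder", "Append")] * 2
    + [("System.Text.StringBuilder", "ToString")]
)
model = FrequencyModel().fit(invocations)
top_console = model.recommend("System.Console", k=2)
```

Because the ranking ignores the surrounding code entirely, every call site on the same class gets the same list, which is consistent with the "simple, low precision" verdict on the slide.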

  15. Architecture

  16. Offline Model Evaluation
      [Diagram: an example of a data point constructed from usage data. Labels: Training Data, Test Data, Features, Label, Model, Prediction (provided by model), Model Evaluation, Test Results.]

  17. Offline Evaluation Results

  18. Online Evaluation
      New invocation → Features → Model makes recommendation → Log a recommendation event → If the user chooses a method from the recommended list (Yes) → Log a commit event
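
The logging flow on this slide can be sketched as a small in-memory counter. This is not IntelliCode's telemetry API: the class `TelemetryLog`, its methods, and the derived acceptance rate (commit events divided by recommendation events) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TelemetryLog:
    """In-memory stand-in for a telemetry sink (hypothetical)."""
    recommendation_events: int = 0
    commit_events: int = 0

    def record_invocation(self, recommended: Sequence[str], chosen: str) -> None:
        """Log one invocation: what the model recommended and what the user picked."""
        if not recommended:
            return  # the model made no recommendation, so nothing is logged
        self.recommendation_events += 1
        if chosen in recommended:
            # The user chose a method from the recommended list.
            self.commit_events += 1

    def acceptance_rate(self) -> float:
        """Share of recommendation events that ended in a commit event."""
        if self.recommendation_events == 0:
            return 0.0
        return self.commit_events / self.recommendation_events


log = TelemetryLog()
log.record_invocation(["WriteLine", "Write"], chosen="WriteLine")  # commit
log.record_invocation(["Add"], chosen="Remove")                    # no commit
log.record_invocation([], chosen="Flush")                          # nothing logged
rate = log.acceptance_rate()
```

With two recommendation events and one commit event, the rate in this made-up session is 0.5.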

  19. Current Status and Future Work
      • IntelliCode for member completion was released for C#/C++/XAML in Visual Studio and for Python/TypeScript/JavaScript/Java in VS Code.
      • Method argument recommendation has been in preview for C# in Visual Studio.
      • Custom model training on the user's own codebase is in preview for C# in Visual Studio.
      • Enable live A/B testing for different models.
      • How do we further improve the model in production?
        • Tune the current statistical language model.
        • Improve model precision by incorporating feedback telemetry in compliance with GDPR.
        • Deep learning: how to reduce the model size and improve runtime?
      • Expand to line/snippet-level completion.

  20. How do DS/Engineering/PM work together? What lessons did we learn?

  21. Specialized for rapid iteration

  22. Metrics

  23. Online Survey • 73% of users across languages see an increase in productivity. • Survey users every quarter.

  24. Finding a common mindset is key

  25. Drive alignment and ownership

  26. Challenges

  27. Questions?
