
/* iComment : Bugs or Bad Comments? */




Presentation Transcript


  1. /* iComment: Bugs or Bad Comments? */ Lin Tan, Ding Yuan, Gopal Krishna, Yuanyuan Zhou. Published in SOSP 2007. Presented by Kevin Boos

  2. In a Nutshell • iComment: static analysis + NLP • Detects code-comment mismatches • Uses both source code and comments

  3. Roadmap • iComment Paper • Motivation • Challenges • Contributions • Approach & Methodology • Results • Related Work • Complexity • Authors’ other works

  4. Motivation • Software bugs affect reliability. • Mismatches between code and developer assumptions // Caller must acquire lock. static int reset_hardware(...) { // access shared data. } static int in2000_bus_reset(...) { reset_hardware(...); }

  5. Prevalence of Comments • Comments = developer assumptions • Must hold locks, interrupts must be disabled, etc. • Other tools do not utilize comments! • Ignore valuable information (dev. intentions)

  6. Code vs. Comments • Developer assumptions can’t always be inferred from source code • Comments and code are redundant • or should be…

  7. Inconsistencies • What’s wrong: comments or code? • Developer mistake • Out of date • Copy-and-paste error (clone detection) • Bad code may be a bug • Bad comments cause future bugs

  8. Challenges • Parsing and understanding comments • Natural language is ambiguous and varying /* We need to acquire the IRQ lock before calling … */ /* Lock must be acquired on entry to this function. */ /* Caller must hold instance lock! */ • NLP only captures sentence structure • No concept of understanding • Decent accuracy • Comments may be grammar disasters…

  9. Contributions • First step towards automatically analyzing comments • Combines NLP, machine learning, static analysis • Identifies inconsistent code & comments • Real-world applicability • Discovered 60 new bugs or bad comments • Only two topics: locks & calls

  10. Approach • Two types of comments • Explanatory: /* set the access flags */ • Assumptions/Rules: /* don’t call with lock held */ • Check comment rules topic-by-topic • General framework • Users choose the hot topics

  11. Rule Templates • <Lock L> must be held before entering <Function F>. • <Lock L> must NOT be held before entering <Function F>. • <Lock L> must be held in <Function F>. • <Lock L> must NOT be held in <Function F>. • <Function A> must be called from <Function B> • <Function A> must NOT be called from <Function B> • Other templates exist (see paper) • User can add more templates
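The templates above can be viewed as rules with holes for a lock and a function. A minimal sketch of that idea in Python — the `RuleTemplate` class and `fill` method are illustrative names, not iComment's actual implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RuleTemplate:
    """A rule with <L>/<F> placeholders, as in the paper's templates."""
    text: str

    def fill(self, **params):
        # Substitute concrete program entities for the placeholders.
        rule = self.text
        for name, value in params.items():
            rule = rule.replace(f"<{name}>", value)
        return rule

LOCK_BEFORE = RuleTemplate("<L> must be held before entering <F>.")
NOT_LOCK_BEFORE = RuleTemplate("<L> must NOT be held before entering <F>.")

# Instantiate a template for the motivating example:
rule = LOCK_BEFORE.fill(L="instance lock", F="reset_hardware")
# rule == "instance lock must be held before entering reset_hardware."
```

Keeping templates as data rather than code is what lets users add their own, as the slide notes.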

  12. Handling Comments • Extract comments • NLP, keyword filters, correlated word filters • Classify comments (rule generation) • Manually label small subset • Create decision tree with machine learning • Decision tree matches comments to templates • Fill template parameters with actual variables • Training is optional for users
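The pipeline on this slide — keyword-filter comments for a hot topic, then classify them against templates — can be sketched as follows. This is a toy stand-in: the hand-written `classify` rule substitutes for the machine-learned decision tree, and the keyword set is an assumption, not the paper's actual filter:

```python
import re

# Keyword filter for one hot topic (locks); illustrative word list.
TOPIC_KEYWORDS = {"lock", "acquire", "hold", "held"}

def is_topic_comment(comment):
    """Crude correlated-word filter: does the comment mention the topic?"""
    words = set(re.findall(r"[a-z']+", comment.lower()))
    return bool(words & TOPIC_KEYWORDS)

def classify(comment):
    """Stand-in for the trained decision tree: map a topic comment
    to a rule template (here, just NOT-rules vs. must-rules)."""
    if not is_topic_comment(comment):
        return None  # explanatory comment, not a checkable rule
    text = comment.lower()
    if "don't" in text or " not " in text or "without" in text:
        return "<L> must NOT be held before entering <F>."
    return "<L> must be held before entering <F>."
```

A real decision tree is trained from the manually labeled subset; this sketch only shows the shape of the comment-to-template mapping.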

  13. Rule Checker • Static analysis • Flow sensitive and context sensitive • Scope of comments • Display the inconsistencies • Sorted by ranking (support probability)
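A flow-sensitive check of a filled-in rule can be sketched like this. The event-list representation of a function body is a toy stand-in for a real static-analysis IR, and the rule table is assumed for the motivating example:

```python
# Rules produced by the comment analyzer: function -> required lock state.
RULES = {"reset_hardware": "held"}

def check(events):
    """Propagate lock state through a straight-line event sequence.
    events: list of ('acquire',), ('release',), or ('call', fn)."""
    held = False
    violations = []
    for ev in events:
        if ev[0] == "acquire":
            held = True
        elif ev[0] == "release":
            held = False
        elif ev[0] == "call":
            fn = ev[1]
            if RULES.get(fn) == "held" and not held:
                violations.append(fn)  # code contradicts the comment
    return violations

# in2000_bus_reset calls reset_hardware without taking the lock:
assert check([("call", "reset_hardware")]) == ["reset_hardware"]
# With the lock acquired first, the same call is clean:
assert check([("acquire",), ("call", "reset_hardware")]) == []
```

The real checker is also context sensitive and ranks reports by support probability; this sketch only shows the flow-sensitive core.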

  14. Evaluation • Four large software projects • Two topics: locks and function calls • Average training data: 18%

  15. Results • Automatically detected 60 new bugs and bad comments • 19 new bugs and bad comments already confirmed by developers • False positives exist (38%) • Incorrectly generated rules • Inaccurate rule checking

  16. Training Accuracy • Accuracy: % of correct mismatches • (Chart: software-specific training vs. cross-software training)

  17. Related Work • Extracting rules from source code • iComment employs static analysis but not dynamic traces • Annotations • Poor adoption rates • Requires manual effort per comment • Documentation generation • No usage of NLP • iComment also analyzes unstructured comments

  18. Complexity • Detecting inconsistencies • NLP • Abstracted away by tools • Machine learning • Simple manual training rules • Code maintenance • Developers may forget to be thorough • Automatic bug detection • Locking errors are extremely complex

  19. Author Bio • Primary author: Lin Tan • Improving software reliability • Comments • Source code • Execution traces • Manual input • HotComments – prior ideas paper

  20. Author Bio • Secondary author: Ding Yuan • Reliability of large software systems • Better logging • Enhanced output

  21. Author Bio • Professor: Yuanyuan Zhou • Better debuggers, software reliability • Founded PatternInsight

  22. PatternInsight Startup • http://patterninsight.com/

  23. Conclusion • Comment-code inconsistencies are bad • Poorer software quality and reliability • First work to automatically analyze comments • Uses NLP and static code analysis • Detected real bugs in Linux/Mozilla • Manages complexity of code consistency and maintenance
