Faculty Candidate Seminar
Leveraging Code Comments to Improve Software Reliability
Software reliability is critically important. This work addresses two fundamental challenges of software reliability: obtaining accurate program specifications and discovering the limitations of existing tools and languages. In this talk, I will show that comments are a rich data source for both, yielding specifications as well as evidence of problems in current tools and languages. First, I will present a novel approach, iComment, which is the first work to automatically extract specifications from comments written in natural language and use these specifications to detect comment-code inconsistencies, i.e., software bugs and bad comments. Our evaluation on large real-world software such as the Linux kernel, Mozilla, Apache and Wine, covering two types of comments, shows that iComment effectively extracted 1832 specifications and detected 60 new bugs and bad comments. iComment combines techniques from different areas, including natural language processing (NLP), machine learning, information retrieval, program analysis and statistics. To help explain the pros and cons of extracting specifications from comments compared to extracting them from code, I will briefly discuss AutoISES, which infers security specifications by statically analyzing source code and then uses these specifications directly to detect security bugs and violations automatically. I will also briefly present cComment, which studies comment semantics and characteristics to further understand what other comments can be utilized, how we can utilize them, and what important problems and limitations they reveal. We discovered many interesting findings that can guide the design of new languages and tools for improving reliability, programmer productivity, software evolution, etc.
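As a toy illustration of what a comment-code inconsistency looks like, the sketch below pulls a locking rule out of a comment and checks it against a simplified record of what each function does. It is not iComment's method: the abstract describes a combination of NLP, machine learning and program analysis, whereas this sketch assumes a single fixed comment phrasing matched by a regular expression and hand-written event traces in place of real code analysis. All names (`dev_lock`, `reset_hw`, `probe`, `resume`) are hypothetical.

```python
import re

# Assumed comment phrasing; real comments are far more varied than this.
RULE = re.compile(r"caller must hold (\w+) before calling (\w+)", re.IGNORECASE)


def extract_rules(comments):
    """Return (lock, function) pairs found in the given comment strings."""
    rules = []
    for text in comments:
        match = RULE.search(text)
        if match:
            rules.append((match.group(1), match.group(2)))
    return rules


def find_violations(rules, traces):
    """Report callers that invoke a function without holding its lock.

    `traces` maps a caller name to an ordered list of events such as
    "lock:dev_lock", "call:reset_hw", "unlock:dev_lock". This stands in
    for what a real program analysis would compute from source code.
    """
    violations = []
    for lock, func in rules:
        for caller, events in traces.items():
            held = False
            for event in events:
                if event == f"lock:{lock}":
                    held = True
                elif event == f"unlock:{lock}":
                    held = False
                elif event == f"call:{func}" and not held:
                    violations.append((caller, func, lock))
    return violations


# Hypothetical comment and call traces, for illustration only.
comments = ["/* Caller must hold dev_lock before calling reset_hw */"]
traces = {
    "probe": ["lock:dev_lock", "call:reset_hw", "unlock:dev_lock"],
    "resume": ["call:reset_hw"],
}

rules = extract_rules(comments)
violations = find_violations(rules, traces)
print(violations)
```

A reported mismatch does not say which side is wrong: either `resume` is missing the lock (a bug) or the comment is outdated (a bad comment), which is why the abstract groups both under comment-code inconsistencies.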
Lin Tan is a Ph.D. candidate in the Department of Computer Science at the University of Illinois at Urbana-Champaign. Her research areas include software systems, software reliability and security, with a focus on using interdisciplinary techniques such as machine learning, data mining, computer architecture and program analysis to address systems reliability problems. She currently holds an IBM Ph.D. Fellowship. Her recent work on architectural support for intrusion detection has been successfully transferred and licensed since 2006 and was selected for IEEE Micro's Top Picks 2006.