Understanding and Improving Object-Oriented Software Through Static Software Analysis
Thesis Discipline: Computer Science
Degree Grantor: University of Canterbury
Degree Name: Doctor of Philosophy
Software engineers need to understand the structure of the programs they construct. This task is made difficult by the intangible nature of software, and its complexity, size and changeability. Static analysis tools can help by extracting information from source code and conveying it to software engineers. However, the information provided by typical tools is limited, and some potentially rich veins of information, particularly metrics and visualisations, are under-utilised because developers cannot easily acquire or make use of the data.
This thesis documents new tools and techniques for static analysis of software. It addresses the problem of generating parsers directly from standard grammars, thus avoiding the common practice of customising grammars to comply with the limitations of a given parsing algorithm, typically LALR(1). This is achieved by a new parser generator that applies a range of bottom-up parsing algorithms to produce a hybrid parsing automaton. Consequently, we can generate more powerful deterministic parsers, up to and including LR(k), without incurring the combinatorial explosion that makes canonical LR(k) parsers impractical. The range of practical parsers is further extended to include GLR, which was originally developed for natural language parsing but is shown here to also have advantages for static analysis of programming languages. This emphasis on conformance to standard grammars improves the rigour of static analysis tools and allows clearer definition and communication of derived information, such as metrics.
Beneath the syntactic structure of software (exposed by parsing) lies the deeper semantic structure of declarations, scopes, classes, methods, inheritance, invocations, and so on. In this work, we present a new tool that performs semantic analysis on parse trees to produce a comprehensive semantic model suitable for processing by other static analysis tools.
An XML pipeline approach is used to expose the syntactic and semantic models of the software and to derive metrics and visualisations. The approach is demonstrated by producing several types of metrics and visualisations for real software, and the value of static analysis for informing software engineering decisions is shown.
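As a rough illustration of the XML pipeline idea, the sketch below reads a small semantic model expressed as XML, derives a simple metric (methods per class), and re-emits the result as XML for a later stage. The element and attribute names (`model`, `class`, `method`, `metrics`, `metric`) and both function names are assumptions made for this example only; they are not the schema or tools described in the thesis.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantic-model fragment. The element and attribute names are
# invented for illustration and are not the schema used in the thesis.
MODEL = """
<model>
  <class name="Shape">
    <method name="area"/>
    <method name="draw"/>
  </class>
  <class name="Circle" extends="Shape">
    <method name="area"/>
  </class>
</model>
"""


def methods_per_class(xml_text):
    """Pipeline stage: map each class name in the model to its method count."""
    root = ET.fromstring(xml_text.strip())
    return {c.get("name"): len(c.findall("method")) for c in root.iter("class")}


def to_metrics_xml(counts):
    """Pipeline stage: re-emit the derived metric as XML for a later stage."""
    root = ET.Element("metrics")
    for name, count in counts.items():
        ET.SubElement(
            root, "metric",
            {"class": name, "name": "methods", "value": str(count)},
        )
    return ET.tostring(root, encoding="unicode")


counts = methods_per_class(MODEL)
print(to_metrics_xml(counts))
```

Because each stage consumes and produces XML, a further stage (for example, one that turns the metrics document into a visualisation) could be appended without changing the earlier ones.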