Static analysis tools
for Go code comprehension and refactoring
golang nyc meetup
13 Nov 2014
Alan Donovan
Google, New York
golang nyc meetup
13 Nov 2014
Alan Donovan
Google, New York
This talk was presented at the GothamGo Kickoff Meetup in New York City, November 2014.
2Programmers say "writing code", but most of that time is spent reading
Actually: reading code, and making logical deductions
Machines are good at reading and logic
Machines should be helping us read code.
And write it! Refactoring can be tedious and error-prone.
3oracle
, godoc
gorename
, eg
Supports interactive, editor-integrated queries:
- where is this thing defined?
- what are the methods of this type?
- what are the free variables of this selection?
- what might this expression point to?
- what dynamic types might this interface or reflect.Value
contain?
Package view
- method set and implements relation for every type
- internal call graph of every package
Source code view
- kind, type, definition of every identifier
- method set and implements relation for every type
- peers of every channel send/receive
- callers of every function
- callees of every call site (even dynamic)
Mostly in golang.org/x/tools
repo
Computes mappings from:
- identifiers to definitions
- constant expressions to values
- expressions to types
Many subtle corners:
- method set computation
- recursive interfaces
- shifts
Making it correct, fast, and clean was a substantial project
Author: Robert Griesemer
9Typed abstract syntax trees (ASTs) are too varied and subtle to analyze directly
Programs are lowered into Static Single-Assignment form (SSA):
- simplifies dataflow analyses since reaching definitions are implicit
- invented 1991, now mainstream (gcc, llvm)
All Go programs can be expressed using only ~30 basic instructions
Simple, explicit, high-level, high source fidelity
The llgo project is using go/ssa as a front-end for LLVM
10
Pointers complicate reasoning about program behaviour
- functions may be called dynamically
- a variable may be updated and accessed via different names ("aliases")
- runtime types are values too: interface
, reflect.Type
We use pointer analysis to answer the question:
which variables might this pointer point to?
Let pts(p)
be the points-to set of pointer p
Its elements are abstract variables: globals, locals, unnamed (new(T)
)
Overview:
Statement Constraint y = &x pts(y) ⊇ {x} y = x pts(y) ⊇ pts(x) "inclusion-based" *y = x ∀o ∈ pts(y). pts(o) ⊇ pts(x) y = *x ∀o ∈ pts(x). pts(y) ⊇ pts(o) y = &x->f \ y = x.(T) } not shown reflection /
All Go operations can be expressed in these constraints
Function, map, slice and channel ops all simplify to struct ops
Treatment of unsafe.Pointer
conversion is unsound
But we don't care! This isn't a compiler
The constraint system is:
- field-sensitive: struct fields x.f and x.g have distinct solutions
- flow-insensitive: the order of statements is ignored
- context-insensitive: each function is analyzed only once
- whole-program: Go source is required for all dependencies
- inclusion-based: i.e. Andersen's analysis
Optimization: don't make constraints for "uninteresting" types
such as types not related to a one-shot Oracle query
Constraint system for 124KLoC program (oracle) has:
254,000 variables
184,000 equations
Solving phase is in O(|v|³), so pre-solver optimisation is crucial
We can bring this down to:
88,000 variables
65,000 equations
p = &x a = &x if ... { q = p b = a p, q = &x, &y r = q c = b } else { s = r if ... { a = c } p, q = &y, &x }
Hash-Value Numbering
16It's transitive closure of a graph, essentially
Lots of optimizations, for example:
sparse bit vectors, a very compact representation for points-to sets
golang.org/x/tools/container/ints
Solver log is >1GB. Debugging is fun.
17The call graph can be read directly from the solution
The golang.org/x/tools/cmd/callgraph
tool prints raw call graphs
Demo (time permitting)
18A refactoring tool, usable from...
% gorename -from bytes.Buffer.Len -to Size Renamed 105 occurrences in 42 files in 33 packages.
All renamings are reversible
Sound: either a conflict is reported, or the refactoring preserves behaviour*
*except reflection
golang.org/x/tools/cmd/gorename
20A Go port of the Java Refaster tool
A template specifies the pattern and replacement as Go expressions:
package P import ( "errors"; "fmt" ) func before(s string) error { return fmt.Errorf("%s", s) } func after(s string) error { return errors.New(s) }
Parameters are wildcards
% eg -t template.go <package> ...
From the outset, Go has been renowned for its tools: go
, gofmt
We have many building blocks for sophisticated source analysis tools
What should we build next?
24golang nyc meetup
13 Nov 2014