Go: Easy to Read, Hard to Compile

Corner cases when compiling Go

Ian Lance Taylor

Google

Introduction

To really learn a language, write a compiler for it
Compiler bugs imply language complexity
Or, compiler bugs imply differences from C/C++
Sometimes simpler for users is harder for compilers
Fortunately Go is much simpler to compile than C++ or even C

This talk is based on Go compiler bugs encountered over the years.

Recursive Types

Names in Go packages are defined in the entire package, so Go types
can refer to themselves recursively.

type P *P
type S []S
type C chan C
type M map[int]M

This is not permitted in C/C++, except for the special case of a
struct/union/class field which is a pointer/reference.

All Go compiler code that walks over types has to be careful to avoid
endless loops.

Recursive Types

What good is a recursive pointer type? It can only be nil or a
pointer to itself. That's enough for Peano arithmetic.

func Val(p *P) int {
    if p == nil {
        return 0
    } else {
        return 1 + Val(*p)
    }
}

func Add(a, b *P) *P {
    if b == nil {
        return a
    } else {
        a1 := new(P)
        *a1 = a // a1 == a + 1
        return Add(a1, *b) // a + b == Add(a+1, b-1)
    }
}

Recursive Types

Actually, a recursive pointer can have a bit more information: it can
have a finalizer.

// +build ignore

package main

import (
	"fmt"
	"runtime"
)

// 1 START OMIT
type P *P
type S []S
type C chan C
type M map[int]M

// 1 END OMIT

// 2 START OMIT
func Val(p *P) int {
	if p == nil {
		return 0
	} else {
		return 1 + Val(*p)
	}
}

func Add(a, b *P) *P {
	if b == nil {
		return a
	} else {
		a1 := new(P)
		*a1 = a // a1 == a + 1
		return Add(a1, *b) // a + b == Add(a+1, b-1)
	}
}

// 2 END OMIT

func Print(p *P) {
    fmt.Println(Val(p))
}

func Allocate() {
    p := new(P); *p = new(P); **p = new(P)
    runtime.SetFinalizer(p, Print)
    runtime.SetFinalizer(*p, Print)
    runtime.SetFinalizer(**p, Print)
}

func main() {
    Allocate()
    for i := 0; i < 5; i++ {
        runtime.GC()
        runtime.Gosched()
    }
}

Recursive Types

Recursive function types are actually useful: they can implement a
state machine.

type F func(*State) F

type State int

func Begin(s *State) F {
    *s = 1
    return Middle
}

func Middle(s *State) F {
    *s++
    if *s >= 10 {
        return End
    }
    return Middle
}

Recursive types

// +build ignore

package main

import "fmt"

// 1 START OMIT
type F func(*State) F

type State int

func Begin(s *State) F {
	*s = 1
	return Middle
}

func Middle(s *State) F {
	*s++
	if *s >= 10 {
		return End
	}
	return Middle
}
// 1 END OMIT

func End(s *State) F {
    fmt.Println(*s)
    return nil
}

func main() {
    var f F = Begin
    var s State
    for f != nil {
        f = f(&s)
    }
}

Recursive Types

Simple rule: all names at package scope are visible in the entire
package.

Complex consequence: compiler must handle recursive types (also
recursive initializers).

Constants

Go has both typed and untyped constants. They follow the same rules,
except that a typed constant must be representable in its type.

This is reasonably clear for integers, less so for floats.

// +build ignore

package main

import "fmt"

const C1 = 1e-323

const C2 = C1 / 100
const C3 = C2 * 100

const C4 float64 = C1 / 100
const C5 = C4 * 100

func main() {
    fmt.Println(C3, C5)
}

Constants

Go's floating point variables follow IEEE-754 rules.

Constants do not.

// +build ignore

package main

import "fmt"

const C1 = 1e+308
const C2 = C1 * 10
const C3 = C2 / 10

var V1 = C1
var V2 = V1 * 10
var V3 = V2 / 10

func main() {
    fmt.Println(C3, V3)
}

Constants

The special unsafe.Sizeof function returns a constant.

// +build ignore

package main

import (
	"fmt"
	"unsafe"
)

var V1 = 0x01020304
var V2 [unsafe.Sizeof(V1)]byte

func main() {
    *(*int)(unsafe.Pointer(&V2)) = V1
    fmt.Println(V2)
}

Constants

Simple rule: constants are untyped; they are mathematically exact and
do not require type conversions.

Complex consequence: exact floating point behavior depends on the
type.

Name Lookup

Name lookup in a Go compiler is simple compared to many languages.
For every name the scope in which to look it up is obvious. This
makes parsing Go quite simple.

With one exception. What is the scope for i?

func main() {
    i := 1
    f := func() T {
        return T{
            i: 1,
        }
    }
    fmt.Println(i, f())
}

Name Lookup

One possibility.

// +build ignore

package main

import "fmt"

func main() {
    i := 1
    f := func() T {
        return T{
            i: 1,
        }
    }
    fmt.Println(i, f())
}


type T map[int]int

Name Lookup

Another possibility.

// +build ignore

package main

import "fmt"

func main() {
    i := 1
    f := func() T {
        return T{
            i: 1,
        }
    }
    fmt.Println(i, f())
}


type T struct{ i int }

Name Lookup

Simple rule: in a struct composite literal you can use field names as
keys.

Complex consequence: if you don't know the type of the composite
literal, the lookup scope of names used as keys is unclear when
parsing.

Methods

Any named type can have methods. Any struct type can inherit methods
from an embedded field. It follows that you can sometimes call
methods on a variable even if it has an unnamed type.

// +build ignore

package main

import (
	"fmt"
	"os"
)

var V = struct {
    name string
    os.FileMode
}{
    name: "hello.go",
}

func main() {
    fmt.Println(V)
}

Methods

Simple rules: named types can have methods; structs can have embedded
fields.

Complex consequence: unnamed types can have methods.

Conclusion

Go is simpler to compile than most languages
There are still complexities for the compiler
Most complexities stem from making Go easier to write

Thank you

Ian Lance Taylor

Google

iant@golang.org