GOing down the compilation rabbit hole

The Go programming language appears simple because the compiler handles the complexities of running them. This talk will examine the steps of compilation, divided into the front-end (Parsing, Tokenizing, AST Generation) and the back-end (AST to SSA conversion, Optimizations, Assembly).

Abstract

Go's simplicity and speed is achieved through the heavy-lifting by the compiler. The compiler transforms human-readable Go code into highly-optimized machine code. The compiler helps obviate programmers' concern for performance, portability, and code readability.

This talk will be a quick tour of the inner workings of the Go compiler. The audience will develop a general understanding for the compiler, which would be enough to dig further into detailed documentations and even contribute to the Go project.

The talk will focus on Go project's official compiler implementation, gc (go compiler), in lieu of alternate implementations using exiting compiler backends like gcc (gccgo) and llvm (llgo).

The compiler is typically described in the "front-end" and the "back-end". The front-end is concerned with preparing the original code into a compiler-friendly structure called AST (Abstract Syntax Tree). The back-end then takes over to transform the AST into machine code.

Below is the outline of the compilation process.

Compiler Overview > based on go1.19beta1

Front-end (Source Code to AST)
- Lexical Analysis
- Parsing
- AST Generation

Back-end (AST to Machine Code)

- Noding
- Middle End Optimization
- Walk
- SSA Conversion
- SSA Optimization Passes
- Machine Code Generation


We will cover the following points for each compilation phase.

Tokenization (Lexical Analysis) Takes place in:
``` 
src/cmd/compile/internal/syntax/tokens.go (List of tokens) ../tokenizer.go (Turns code in tokens) 
``` 

- Groups characters into tokens
- Tokens: unit of word
- defined in `tokens.go`
- tokens are listed below

> _EOF, _Name, _Literal, _Operator, _AssignOp, _IncOp, _Assign, _Define, _Arrow, _Star, _Lparen, _Lbrack, _Lbrace, _Rparen, _Rbrack, _Rbrace, _Comma, _Semi, _Colon, _Dot, _DotDotDot, _Break, _Case, _Chan, _Const, _Continue, _Default, _Defer, _Else, _Fallthrough, _For, _Func, _Go, _Goto, _If, _Import, _Interface, _Map, _Package, _Range, _Return, _Select, _Struct, _Switch, _Type, _Var, tokenCount, Break, Continue, Fallthrough, Goto, Go, Defer, Def, Not, Recv, Tilde, OrOr, AndAnd, Eql, Neq, Lss, Leq, Gtr, Geq, Add, Sub, Or, Xor, Mul, Div, Rem, And, AndNot, Shl, Shr

Parsing (AST Generation) Takes place in:
``` 
src/cmd/compile/internal/syntax/parser.go (Parsing) ../nodes.go (Noder) 
``` 

- Takes tokens and builds AST (Abstract Syntax Tree)
- AST is representation of source file in tree form
- Nodes correnspond to elements
- expressions
- declarations
- statements

<!-- Go doesn't use an CST, which is different from Python. This is because Go's grammar is simpler than Python's. This may be a interesting point to cover. -->

Noding (Annotated AST Generation) Takes place in:
``` 
src/cmd/compile/internal/types (Compiler types) 
../ir (Compiler AST) 
../typecheck (AST transformation) 
../noder (Create compiler AST) 
``` 

- Go compiler consumes IR (Intermediate Representation), with its own AST definition
- Noding rewrites AST into IR for compiler
- Implementations
- Irgen - aka "noder2" or `-G=3`
- used from Go 1.18
- Unified IR
- in-development
- enable with `GOEXPERIMENT=unified`
- implements import/export and inlining
- "noder" or `-G=0`
- used until Go 1.18
- directly converted pre-type-checked syntax representation in IR

- directly converted pre-type-checked syntax representation in IR # Middle End Optimization Takes place in:

``` src/cmd/compile/internal/deadcode (Dead code elimination) 
../inline (Function call inlining) 
../devirtualize (Devirtualize known inferface method call) 
../escape (Escape analysis) 
``` 

- Perform optimization passes listed above

Highlighting Some Optimizations
1. Inlining

- Embed called functions into caller functions
- Saves overhead of starting new stack with new function call
- Budget of 80 nodes
- If called function exceeds 80 nodes, not inlined

Escape Analysis

- "Escape" variables from stack territory to heap territory
- Allows variables to be accessible even after function disappears

Walk

Takes place in: `cmd/compile/internal/walk`
- Decompose complex statements
- Desugar high-level Go constructs
- ex. `switch` statement: turned into binary search or jump tables
- ex. `map` and `channel`: replaced with runtime calls

SSA Conversion

Takes place in:

``` src/cmd/compile/internal/ssa (SSA passes and rules) 
../ssagen (Convert IR to SSA) 
``` 

- SSA (Single Static Assignment)
- Lower-level IR
- All Variables changed into Immutable Variables (hence "Single Static")
- Easier to optimize code in SSA form
- Introduced in Go 1.7 ([goreleasenotes-1.7])
- Comparing SSA to previous code generation <!--More detail-->
- `GOSSAFUNC`
- Build flag to output SSA compilation in HTML
- Shows code through each step of compilation
- invoke by `GOSSAFUNC=main go build main.go`

goreleasenotes-1.7: #compiler "Go 1.7 Release Notes - Compiler Toolchain"

SSA Optimization

Takes place in: `src/cmd/compile/internal/ssa/compile.go`
- Optimization techniques are listed in `compile.go`

Examples of optimizations:
> PhiElim, CopyElim, ShortCircuit, Decompose, CSE, PhilOpt, NilCheckElim, Prove, BCE, Loop, Fuse, DSE, WriteBarrier, insertLoopReschedChecks, Critical, LikelyAdjust, Layout, FlagAlloc, RegAlloc, LoopRotate

Machine Code Generation

Takes place in:

``` 
src/cmd/compile/internal/ssa (SSA lowering and architecture-specific passes) 
../internal/obj (Machine code generation) 
``` 

- "lower" pass
- "lowers" the abstraction of SSA code from machine-independent into machine-specific
- ex. ARM64: memory operands are possible, so maybe use load-store ops
- Final code pass examples
- dead code elimination
- move value closer to point of usage
- remove local variables that are never read
- register allocation
- stack frame layout: assign stack offsets to local variables
- pointer liveness analysis: compute which on-stack pointers are alive at each safepoint in Garbage Collector
- transform SSA into 'obj.Prog' instructions
- Write out final object file
- contains reflect data, export data, and debugging info.

Slides

Video

GoLab is a conference made by Develer.
Develer is a company based in Campi Bisenzio, near Florence. Our motto is : "Technology to give life to your products". We produce hardware and software to create exceptional products and to improve industrial processes and people's well being.
In Develer we have passion for the new technologies and we offer our clients effective solutions that are also efficient, simple and safe for the end users. We also believe in a friendly and welcoming environment where anybody can give their contribution. This passion and this vision are what we've been driven to organize our conference "made by developers for developers".


Subscribe to our newsletter

We hate spam just as much as you do, which is why we promise to only send you relevant communications. We respect your privacy and will never share your information with third parties.
©2024 GoLab | The international conference on Go in Florence-Design & devCantiere Creativo-Made withDatoCMS