Mikael
Holmén, mikael.holmen@softlab.ericsson.se
Ericsson AB
– UAB
Core
Network Development - Compiler Technology
This paper
is done as part of the examination of the course Advanced Compiler
Construction. It contains a summary of the paper “Linear Scan Register
Allocation” written by Massimiliano Poletto and Vivek Sarkar [1]. The main
topic is a new algorithm for fast register allocation, usable when compile time,
as well as run time performance of generated code, is of importance.
Register
allocation is an important part of a compiler, heavily affecting the
performance of the generated code. Unfortunately, most aggressive, optimizing
register allocators are computationally expensive due to their use of a graph
coloring framework [2]. Therefore Poletto and Sarkar presents an alternative
algorithm, called linear scan, which is not based on graph coloring. Rather,
given the live ranges of the input program, it allocates registers in a single
pass of the live ranges. The results presented in [1] show that the compile
time when using this algorithm is considerably smaller than when using a graph
coloring allocator, while the quality of the generated code is still fairly
good.
A number of
improvements of the algorithm is also discussed and some are implemented and
tested.
This
section first describes two concepts used in the linear scan algorithm. Then
the algorithm itself is presented.
The
algorithm assumes the the intermediate representation (IR) is numbered in some
way. Which numbering that is used might affect the quality of the produced
code, but it does not affect the correctness of the algorithm. Throughout the implementation
a depth first ordering is used.
Also,
crucial to the algorithm is the notion of a “live interval”. Given a numbering
of the IR, [i,j] is a live interval of a variable v iff there is no instruction
with number i’ such that i’<i and v is live at i’, and there is no
instruction with number j’ such that j<j’ and v is live at j’.
A live
interval is a conservative approximization of a variables live ranges; there
might be “holes” in a live interval where the variable is dead. Interference
between two variables is captured by whether the associated live intervals
overlap or not. An important observation is that the number of overlapping
intervals change only at the start and end points of live intervals. Other
program points are not interesting.
Given
liveness information and a numbering of the IR, live intervals can be computed
easily in a single pass over the IR.
The linear
scan algorithm takes as input a list of live intervals, sorted by increasing
start point. Thus the algorithm can quickly scan forward through the live
intervals by iterating through this list.
At each
step, the algorithm maintains a list, active, which contains the live
intervals that overlap the current program point and are stored in a
register. The active list is sorted by increasing end point.
For each
new interval from the input list the algorithm scans active to see if
any recently active interval expired at the new interval’s start point, that
is, if a member of active has an end point that is smaller (according to
the numbering) than the new interval’s start point, then the interval is
removed from active and the allocated register is freed. Since active
is sorted by increasing end point the algorithm can stop searching as soon as
an interval that is still active is encountered.
If the
number of intervals in active equals the number of registers (R), then
an interval needs to be spilled, and in the implemented version this is done
using a simple heuristic, simply spilling the interval with the largest end point.
This is either the last interval in active or the new interval.
The
algorithm ends when all intervals in the input list have been examined. See [1]
for the complete algorithm.
In the
original paper they do a number of tests where they compare an allocator based
on linear scan with other allocators based on usage count, graph coloring and
bin-packing. Some of the tests seems to be rather useless since they get
virtually the same results no matter which algorithm used, so a more extensive
test suite (with relevant benchmarks) should really be run, but nevertheless it
seems like linear scan is a good algorithm when both compile time and run time
performance of generated code is if importance.
Linear scan
is considerably faster than graph coloring while the produced code is often
only 10-20% slower. Comparing linear scan to binpacking is interesting, but
unfortunately not all tests are run for binpacking so it’s hard to draw any
conclusions.
A usage
count allocator is alot faster than linear scan, but the produced code is
significantly slower.
In the
paper some suggestions for improvements to the algorithm, both aiming to speed
up compilation and to increse quality of the generated code, are presented, but
when evaluationg these improvements it shows that they are really not wortwhile
doing. When speeding up compilation the quality of the generated code decreased
too much, and when trying to make further optimizations to increase quality to
compilation time grew so much a binpacking or graph coloring algorithm would be
better.
Linear scan
register allocation is a simple algorithm, still producing reasonably good
code. It seems to be a good alternative when both compile time and run time
performance of generated code is of importance. However if run time performance
is of uttermost importance you should probably use a graph coloring allocator
instead of trying to improve a linear scan allocator.
[1] Poletto, Sarkar, Linear
Scan Register Allocation, TOPLAS 1999
[2] Chaitin, Auslander,
Chandra, Cocke, Hopkins, Markstein, Register Allocation via coloring,
Computer Languages 6, 47-57.