RISE-OPTIMIST: Integrated Code Generation
The ultimate goal of software development is to produce an application
that runs on a platform. Code generation is the step that produce
machine code. To generate code, it is necessary to solve the tasks of
instruction selection (choosing semantically equivalent target
instructions), register allocation (deciding which values should
reside in registers) and instruction scheduling (ordering of target
instructions to minimize an objective function). Code generation tasks
are difficult to solve, and from a software engineering viewpoint
it is easier to solve each of them separately. However, there exist strong
interdependencies between them, and solving a single task efficiently
might compromise the decisions taken in a subsequent task, leading to
a final solution of lower quality than expected. This
interdependencies are even more difficult to handle for irregular
architectures that are often chosen for embedded system applications.
Within this project we investigate on a fully integrated code
generation that considers all three subproblems as a single
optimization pass. We use dynamic programming and other techniques
to compress the solution space for producing an optimal code
sequence. Our framework is also retargetable, that is, it allows
to produce code for different processor platforms by providing
a specification of the target architectue in a XML-based
architecture specification language, ADML (Architecture Description
Markup Language). In this way, the target-architecture specific aspects
of code generation are separated from the remaining compiler system
and woven together with it at compiler compilation time to obtain
a platform-specific code generator.
However, further separable aspects could be identified in the remaining
compiler components.
Subproject members
Christoph Kessler, subproject leader
Andrzej Bednarski, PhD student
The general goals of our research
- Given sufficient resources (time and space), generate optimal code
for a given optimization criterion (time, space, energy, or mixed).
- Assess the quality of heuristics that can be compared to the best
achievable performance.
- Provide feed-back to hardware designers for designing/modifying a
target platform.
Plan
In the context of the RISE project, which we joined in July 2004,
we focus on the following issues:
- We study the interdependences between different compiler phases
and do a case study of the software architecture of an
integrated, retargetable code generator.
- We investigate strategies how different aspects of code generation
(instruction selection, scheduling, register allocation,
intermediate representation,
data and control dependence, target architecture specifics,
assembler format, memory management, solution space organization, etc.)
can be (partially) separated and specified more or less independently
in separate modules without
dictating a phase-decoupled software design for the code generator.
Status
- Theory: An algorithm for integrated code generation based on dynamic programming
has been developed in earlier work. A thorough formal description has been
worked out during 2004/05 and submitted for publication.
For practical considerations, we have developed heuristic
relaxations that allow to
also solve larger problem instances (i.e., programs with larger basic blocks)
in an integrated way within reasonable compilation time.
A first application of such heuristics in the context of energy optimization
has been presented at CPC'04.
- We have implemented the integrated code generator and evaluated it.
- We made the system retargetable by factoring out the architecture
specific parts of the system in a separate module (the
ADML specification). As case studies, we have specifications
for the embedded processors ARM9E and ARM9E Thumb, and for the
DSP processor TI-C62x.
- Future work may investigate e.g. the separation of the IR format
from the code generator. Then, the code generator could be
easily reused in other compiler frameworks. However,
ADML specifications are given in terms of a fixed IR.
A potential solution of this dependence
could consist in metamodeling the IR to allow for
a simultaneous configuration of both the code generator code (C++)
and the ADML specification (XML) for a concrete IR format
by invasive software composition.
Publications
See the
OPTIMIST project publication list. OPTIMIST is within RISE since July 2004.
Acknowledgments
This project is partly financed by Ceniit (No. 01.06)
and since July 2004 partially by the SSF RISE project at PELAB.