Lab Assignment 2: VLIW Processors
Table of Contents
Objective
The purpose of this assignment is to get partial insight on which are the typical problems when developing a VLIW compiler and which are the factors to consider when deciding on a VLIW architecture.
Preparations
For this lab assignment you will need to use the SimpleScalar toolset. Instructions for setting up the toolset are given in the following link.
Assignments
A typical compiler takes a program written in some high-level language,
and translates it to a machine language.
As the development of such a compiler is a task beyond the scope of this course,
you will perform some tasks of a simpler compiler, that translates a program written
in a machine language of a non-VLIW processor, denoted with P
, to a
program written in the machine language of a VLIW processor, denoted with V
.
The data path of V
is characterised by the following units.
alu_no
arithmetic-logic units (ALUs)mul_no
multipliers-dividers (MULs)fpu_no
floating point units (FPUs)bau_no
Bus Access Units (BAUs)
Consequently, one very long instruction word will consist of the following instructions of P
(the non-VLIW processor).
alu_no
arithmetic/logic instructions ofP
mul_no
multiply/divide instructions ofP
fpu_no
floating-point instructions ofP
bau_no
load/store instructions ofP
One of the tasks of our imagined VLIW compiler is to pack
the instructions of a program written for P
into VLIW instructions to be run on V
.
The packing has to be done considering the data dependencies between the instructions and the
resource constrains
(e.g. we are not allowed to pack more than
alu_no
arithmetic/logic instructions into one VLIW instruction).
Here are the assignments you should work on:
-
Select 1 arbitrary benchmark program, denoted with
p
, from the˜TDTS10/spec95-big
subdirectory. -
Select 1 basic block, denoted with
bb
, fromp
, such thatbb
contains at least 15 instructions.
A help on the delimitation of basic blocks can be found on this web page. -
Define the architecture of the VLIW processor
V
by choosing the following numbers of units.- the number of ALUs, denoted with
alu_no
- the number of MULs, denoted with
mul_no
- the number of FPUs, denoted with
fpu_no
- the number of BAUs, denoted with
bau_no
- the number of ALUs, denoted with
-
Pack the instructions of
bb
into VLIW instructions, such that the data dependencies and resource constraints are satisfied. - Explore the trade-off between cost (the more units you add to the VLIW, the higher the cost) and performance (the more units you add, the higher the chance that the basic block will run faster).
˜TDTS10/bin/vliwc
program for the assignments.
Details
In order to select a basic block of a benchmark program, issue the following command line.
vliwc ˜TDTS10/spec95-big/<chosen_benchmark>
<chosen_benchmark>
is the name of the chosen benchmark program.
A list will be produced, It will produce a list of information of basic blocks, including
- the index number of a basic block
- the start address of the basic block
- the end address of the basic block
- the number of instructions contained in the basic block
You can sort the list, for example, in decreasing order of the instruction indices, using the following command line.
vliwc ˜TDTS10/spec95-big/<chosen_benchmark> | sort -nr +3 -4 | less
<chosen_benchmark>
is the name of the chosen benchmark program.
Then you can choose an arbitrary basic block from the list. Remember to write down its index, start address and end address.
For example, you can choose the ˜TDTS10/spec95-big/ijpeg.ss
benchmark program.
Then you should issue the following command line to get the list of basic blocks of the benchmark program.
vliwc ˜TDTS10/spec95-big/ijpeg.ss
9512
contains 10
instructions,
The basic block starts from address 0x00460480
and ends at address 0x004604d0
.
In order to see the contents of the basic block, you will have to disassemble the benchmark program. This can be done by using the following command line.
ssbig-na-sstrix-objdump -d ˜TDTS10/spec95-big/<chosen_benchmark> | less
<chosen_benchmark>
is the name of the chosen benchmark program.
In less
, if you type /
(slash),
you will be prompted for a pattern to be searched in the listing of less
.
While being prompted, you should type in the start address of the chosen basic block.
If you do not reach that address from the first match,
you can type n
for the next match, until you reach the start address of the chosen basic block.
For example, you can issue the following command line
to obtain the basic block starting from address 0x00460480
.
ssbig-na-sstrix-objdump -d ˜TDTS10/spec95-big/ijpeg.ss
00460680 <__udivmoddi4+570> mflo $a2[6] 00460688 <__udivmoddi4+578> mfhi $v1[3] 00460690 <__udivmoddi4+580> andi $a3[7],$t1[9],65535 00460698 <__udivmoddi4+588> mult $a2[6],$a3[7] 004606a0 <__udivmoddi4+590> sll $v0[2],$v1[3],0x10 004606a8 <__udivmoddi4+598> mflo $a1[5] 004606b0 <__udivmoddi4+5a0> srl $v1[3],$t3[11],0x10 004606b8 <__udivmoddi4+5a8> or $v1[3],$v0[2],$v1[3] 004606c0 <__udivmoddi4+5b0> sltu $v0[2],$v1[3],$a1[5] 004606c8 <__udivmoddi4+5b8> beq $v0[2],$zero[0],00460710 <__udivmoddi4+600>
In order to pack the instructions, you need to determine the data dependencies. We neglect artificial dependencies, namely write-after-read (WAR) and write-after-write (WAW) dependencies, as we assume that a preprocessing can eliminate them. Therefore, you should focus ONLY on the read-after-write (RAW) dependencies. You can find descriptions on data dependencies from this web page.
In the presented example, instruction
00460698 <__udivmoddi4+588> mult $a2[6],$a3[7]
depends on the instructions 00460680 <__udivmoddi4+570> mflo $a2[6]
and 00460690 <__udivmoddi4+580> andi $a3[7],$t1[9],65535
.
This is because
the mult
instruction reads from $a2[6]
and from $a3[7]
,
while the mflo
instruction writes to $a2[6]
and the andi
instruction writes to $a3[7]
.
You can find the instructions and their semantics in the following resources.
- Appendix A of The SimpleScalar Tool Set.
Note that some of those "instructions" listed in the appendix are just
macros which expand to a couple of instructions.
For example, div
is such a macro,
which first checks if divided-by-zero occurs, and only if not, performs the division thereafter.
ATTENTION: If you find an instruction in the output of ssbig-na-sstrix-objdump
but it is not listed in the appendix,
please ignore that instruction and consider it as if it did not depend on any other instructions.
It is relatively easy to detect RAW dependencies
between instructions which do NOT access the memory.
It is slightly trickier to detect RAW dependencies
between load/store instructions which access the memory,
because, at compile time, we do not know the target address,
which is very often the value of a register.
Therefore, in order to ensure the program correctness,
you have to capture the following two types of RAW dependencies
between load/store instructions.
- A load instruction should be executed only after all the store instructions which precede it in the program order have been executed.
- A store instruction has to be executed only after all the load and store instructions which precede it in the program order have been executed.
For the presented example, you can obtain the following dependency graph.
You can correct your assignment yourself, by using the vliwc
program.
For that, you'll have to write two files,
the <graph_file>
and the <vliw_file>
,
described as follows.
- The
<graph_file>
specifies the graph of data dependencies. - The
<vliw_file>
specifies the way the instructions are packed into VLIW instructions.
Each line in the <graph_file>
contains one or two instruction addresses
separated by blanks (space or tab).
If a line contains two addresses, this means that the instruction at the
second address depends on the instruction at the first address.
If an instruction does NOT depend on any other instructions,
then its address appears alone on a line.
The dependency graph depicted in the above figure can be described as lines of the <graph_file>
as follows.
0x00460690 0x00460698 0x00460680 0x00460698 0x00460698 0x004606a8 0x004606a8 0x004606c0 0x004606c0 0x004606c8 0x00460688 0x004606a0 0x004606a0 0x004606b8 0x004606b0 0x004606b8 0x004606b8 0x004606c0
In order to pack the instructions such that they fulfill the resource constraints,
you should label each graph node with the type of resources it uses,
such as ALU
, MUL
, FPU
, BAU
, or none
.
You can deduce the used types of resources from the following resources.
- Appendix A of The SimpleScalar Tool Set.
For the presented example, you can obtain the following labels:
mflo $a2[6] none mfhi $v1[3] none andi $a3[7],$t1[9],65535 ALU mult $a2[6],$a3[7] MUL sll $v0[2],$v1[3],0x10 ALU mflo $a1[5] none srl $v1[3],$t3[11],0x10 ALU or $v1[3],$v0[2],$v1[3] ALU sltu $v0[2],$v1[3],$a1[5] ALU beq $v0[2],$zero[0],00460710 ALU
You can specify your VLIW architecture and instruction packing
by means of the <vliw_file>
.
The first line in the <vliw_file> consists of 4 integer numbers
separated by blanks (space or tab).
The 4 numbers represent the alu_no
, mul_no
,
fpu_no
, and bau_no
in this order.
Each of the subsequent lines consists of alu_no + mul_no + fpu_no +
bau_no
atoms.
Each atom can be the address of an instruction or a nop
.
The first alu_no
atoms can be occupied
only by the addresses of the instructions labelled with ALU
or none
.
The next mul_no
atoms can be occupied
only by the addresses of the instructions labelled with MUL
or none
,
and so on.
In the presented example, you may chose
2 ALUs
, 1 MUL
, 1 FPU
, 1 BAU
.
The <vliw_file>
looks as follows.
2 1 1 1 0x00460690 0x004606b0 0x00460680 0x00460688 nop 0x004606a0 nop 0x00460698 nop nop 0x004606b8 0x004606a8 nop nop nop 0x004606c0 nop nop nop nop 0x004606c8 nop nop nop nop
Once you have created the <graph_file>
and the
<vliw_file>
,
you can check your solution by running the vliwc
program using the following command line.
vliwc -b <chosen_basic_block_number>
-d <graph_file> -v <vliw_file>
˜TDTS10/spec95-big/<chosen_benchmark>
<chosen_basic_block_index>
is the index number of the chosen basic block,
<graph_file>
is the name of the graph file,
<vliw_file>
is the name of the VLIW file,
and <chosen_benchmark>
is the name of the chosen benchmark program.
The vliwc
program will inform you
whether the dependency graph as well as the packing of instructions are correct.
It will also give the cost of your solution and print out the dependency graph in the xvcg
format.
You can view your dependency graph by piping the output of vliwc
to the input of the xvcg
command.
For the presented example, you can issue the following command line.
vliwc -b 9512 -d example_ijpeg.graph -v example_ijpeg.vliw ˜TDTS10/spec95-big/ijpeg.ss | xvcg -
You need to explore different VLIW architectures
(choose different values for alu_no
, mul_no
,
fpu_no
, and bau_no
)
in order to optimise (minimise) the cost-performance indicator.
What to report
- Indicate the name of the benchmark program and the index number of the basic block you chose. The basic block has to contain AT LEAST 15 instructions.
-
The
<graph_file>
and the<vliw_file>
. - Some comments on the chosen architecture: which other architectures you have tried and what cost-performance ratios you have obtained.
-
If
vliwc
says that everything is fine, then believe it. If it says that something is wrong, but you still think that your solution is correct, then please add an explanation of your solution. -
Report any bugs in the
vliwc
program
Resources
vliwc
,ssbig-na-sstrix-objdump
, andxvcg
located in the directory˜TDTS10/bin
.- D. Burger, and T. M. Austin, The SimpleScalar Tool Set, Version 2.0, 1997.
- T. M. Austin, A User's and Hacker's Guide to SimpleScalar Architectural Tool Set, 1997.
- William Stallings, Computer Organization and Architecture, 7th edition, Prentice Hall International, 2006.
Page responsible: Zebo Peng
Last updated: 2012-11-20