P
to a
program written in the machine language of a VLIW processor
V
.
The data path of V
is characterised by
alu_no
arithmetic-logic units (ALUs), mul_no
multipliers-dividers (MULs), fpu_no
floating point units
(FPUs), and bau_no
Bus Access Units (BAUs).
Consequently, one very long instruction word will consist of
alu_no
arithmetic/logic instructions of P
(the non-VLIW processor), mul_no
multiply/divide
instructions of P
, fpu_no
floating point
instructions of P
, and bau_no
load/store
instructions of P
.
One of the tasks of our imagined VLIW compiler is to pack the
instructions of a program written for P
in VLIW
instructions to be run on V
. The packing has to be done
considering the data dependencies between the instructions and the
resource constrains (for example, we are not allowed to pack more than
alu_no
arithmetic/logic instructions into one VLIW
instruction).
You will
p
from
˜TDTS55/spec95-big
subdirectory.
bb
from
p
such that bb
contains at least 15
instructions.
V
by
choosing the number of ALUs, alu_no
, the number of MULs,
mul_no
, the number of FPUs, fpu_no
, and the
number of BAUs, bau_no
.
bb
in VLIW instructions such
that the data dependencies and resource constraints are satisfied.
˜TDTS55/bin/vliwc
program.
Details:
In order to select the basic block bb
of p
you run vliwc
˜TDTS55/spec95-big/chosen_benchmark
. It will produce a
listing, where the basic block index, its start address, its end
address and its number of instructions is listed. You can sort the
list, for example, in decreasing order of instructions, by running
vliwc ˜TDTS55/spec95-big/chosen_benchmark | sort -nr +3 -4
| less
. Then you choose a basic block. Write down its index,
start address and end address.
For example, I have chosen
˜TDTS55/spec95-big/ijpeg.ss
. I run vliwc
˜TDTS55/spec95-big/ijpeg.ss
. Then I see that basic block
9512 contains 10 instructions and I choose it. I write down its start
address 0x00460480 and its end address 0x004604d0.
In order to see the contents of the basic block, you will have to
disassemble the program. You do this by running
ssbig-na-sstrix-objdump -d
˜TDTS55/spec95-big/chosen_benchmark | less
. In
less
, if you type /
(slash) you will be
prompted for a pattern to be searched in the listing of
less
. Introduce the start address of the chosen basic
block. If you do not reach that address from the first match, you type
n
for the next match, until you reach the start address
of the chosen basic block.
For my example, I run ssbig-na-sstrix-objdump -d
˜TDTS55/spec95-big/ijpeg.ss
and part of what I get is
pasted below:
00460680 <__udivmoddi4+570> mflo $a2[6] 00460688 <__udivmoddi4+578> mfhi $v1[3] 00460690 <__udivmoddi4+580> andi $a3[7],$t1[9],65535 00460698 <__udivmoddi4+588> mult $a2[6],$a3[7] 004606a0 <__udivmoddi4+590> sll $v0[2],$v1[3],0x10 004606a8 <__udivmoddi4+598> mflo $a1[5] 004606b0 <__udivmoddi4+5a0> srl $v1[3],$t3[11],0x10 004606b8 <__udivmoddi4+5a8> or $v1[3],$v0[2],$v1[3] 004606c0 <__udivmoddi4+5b0> sltu $v0[2],$v1[3],$a1[5] 004606c8 <__udivmoddi4+5b8> beq $v0[2],$zero[0],00460710 <__udivmoddi4+600>
In order to pack the instructions, you need to determine the data dependencies. We neglect artificial dependencies (write-after-read (WAR) and write-after-write (WAW)) and pretend that a preprocessing phase eliminated them. You should focus only on the read-after-write (RAW) dependencies.
In my example, instruction 00460698 <__udivmoddi4+588>
mult $a2[6],$a3[7]
depends on the instructions 00460680
<__udivmoddi4+570> mflo $a2[6]
and 00460690
<__udivmoddi4+580> andi $a3[7],$t1[9],65535
because the
mult
instruction reads from $a2[6]
and from
$a3[7]
while the mflo
instruction writes to
$a2[6]
and the andi
instruction writes to
$a3[7]
.
The instructions and their semantics are listed in Appendix A of The SimpleScalar Tool Set, Version 2.0, D. Burger, T. M. Austin.
Note that some of those "instructions" listed in the appendix are just
macros which expand to a couple of instructions. For example,
div
is such a macro. It expands to first checking whether
we divide by 0 and only after that it performs the division. If you
find an instruction in the output of
ssbig-na-sstrix-objdump
which is not listed in Appendix
A, then please ignore that instruction. Consider it as if it did not
depend on any other instruction.
It is relatively easy to detect RAW dependencies between instructions which do not access the memory. It is slightly trickier for load/store instructions because we do not know the target address at compile time as the target address very often is the value of some register. Therefore, in order to ensure the program correctness, a load instruction should be executed only after all the store instructions which precede it in program order have executed. Additionally, a store instruction has to be executed only after all the load and all the store instruction which precede it in program order have executed. Make sure that you capture these aspects in your dependency graph.
For my example, I obtain the following dependency graph:
You can correct your assignment yourself, by using the
vliwc
program. For that, you'll have to write two files:
the graph_file
and the vliw_file
. The
graph_file
specifies the graph of data dependencies. The
vliw_file
specifies the way the instructions are packed
in VLIW instructions.
Each line in the graph_file
contains one or two
instruction addresses separated by blanks (space or tab). If it
contains two addresses, then it means that the instruction at the
second address depends on the instruction at the first address. If an
instruction does not depend on any other instruction, then its address
appears alone on a line.
For the dependency graph I obtained and depicted in the figure above, I can write the following dependency graph:
0x00460690 0x00460698 0x00460680 0x00460698 0x00460698 0x004606a8 0x004606a8 0x004606c0 0x004606c0 0x004606c8 0x00460688 0x004606a0 0x004606a0 0x004606b8 0x004606b0 0x004606b8 0x004606b8 0x004606c0
In order to pack the instructions such that they fulfill the resource constraints, label each graph node with the resource it uses (ALU, MUL, FPU, BAU, or none). You can deduce the used resources from the same Appendix The SimpleScalar Tool Set, Version 2.0
For my example, I obtain the following labels:
mflo $a2[6] none mfhi $v1[3] none andi $a3[7],$t1[9],65535 ALU mult $a2[6],$a3[7] MUL sll $v0[2],$v1[3],0x10 ALU mflo $a1[5] none srl $v1[3],$t3[11],0x10 ALU or $v1[3],$v0[2],$v1[3] ALU sltu $v0[2],$v1[3],$a1[5] ALU beq $v0[2],$zero[0],00460710 ALU
You specify your VLIW architecture and instruction packing by means of
the vliw_file
. The first line in the file consists of 4
integer numbers separated by blanks (space or tab). The four numbers
represent alu_no
, mul_no
,
fpu_no
, and bau_no
in this order. Each of
the subsequent lines consists of alu_no + mul_no + fpu_no +
bau_no
atoms, where an atom can be an instruction address or
nop
. The first alu_no
atoms can be occupied
only by instructions labelled with ALU
or
none
. The next mul_no
atoms can be occupied
only by instructions labelled with MUL
or
none
and so on.
In my example, I chose 2 ALUs, 1 MUL, 1 FPU, 1 and BAU. My
vliw_file
looks like below:
2 1 1 1 0x00460690 0x004606b0 0x00460680 0x00460688 nop 0x004606a0 nop 0x00460698 nop nop 0x004606b8 0x004606a8 nop nop nop 0x004606c0 nop nop nop nop 0x004606c8 nop nop nop nop
Once you created the graph_file
and the
vliw_file
you can check your solution by running
vliwc -b chosen_basic_block_number -d graph_file -v vliw_file
˜TDTS55/spec95-big/chosen_benchmark
. The program will
inform you if the dependency graph and the packing are correct, the
cost of your solution and it will print out the dependency graph in
the xvcg
format. You can view your dependency graph by
piping the output of vliwc
to the input of
xvcg
.
For my example, I run vliwc -b 9512 -d sorma.graph -v sorma.vliw
˜TDTS55/spec95-big/ijpeg.ss | xvcg -
You will have to try different VLIW architectures (choose different
values for alu_no
, mul_no
,
fpu_no
, and bau_no
) in order to optimise
(minimise) the cost-performance indicator.
graph_file
and the vliw_file
. Please
name them in the form login1_login2.graph
and
login1_login2.vliw
. For example, if your group is formed
by Erik Johansson the 123rd (erijo123@student.liu.se) and Johan
Eriksson the 321st (joher321@student.liu.se) then your files are
called erijo123_joher321.graph
and
erijo123_joher321.vliw
.
vliwc
says that everything is fine, then believe
it. If it says that something is wrong, but you still think that your
solution is correct, then please add an explanation of your solution.
vliwc
vliwc, ssbig-na-sstrix-objdump, xvcg
are in
˜TDTS55/bin
. It's a good idea to add
˜TDTS55/bin
to your PATH
:
setenv PATH /home/TDTS55/bin:"$PATH" rehash