Review form for CSEM-reviews 2003
=================================
Paper 3, "Region-based Hierarchical Operation Partitioning for
Multicluster Processors"

Reviewer: Peter Aronsson

Summary
=======
The paper introduces a clustering algorithm for scheduling of
operations on a clustered VLIW architecture. For such architectures it
is necessary to determine which cluster an operation should be
executed on, and which data need to be sent in between clusters.
The paper also tries to classify different clustering algorithms in
the context of code generation for VLIW architectures.

The proposed clustering algorithm performs this assignment by
incrementally merging clusters until they match the given number of
clusters in the target architecture. To avoid local maximum points,
the algorithm have a refinement stage where operations can be moved
from one cluster to another.

The clustering algorithm is guided by a cost model that assigns
weights to each node (operation) and each edge (data
communication/register). The slack, e.g. how much an edge can be "moved
in time" without affecting the critical path, is also used for
determine which clusters to merge.

The refinement phase then makes improvements of the initial clustering
by moving operations between clusters. For this it uses several
metrics such as cluster weight, system load and gain. The main point
is that no scheduling of the clusters is performed but instead an
estimation of the execution time of each cluster is performed. In
fact, the edges of the operations within a cluster is note considered
at all.

The paper presents measurements and reports improvements over a known
clustering algorithm (the BUG algorithm) up to 22.6% on a 4 cluster
VLIW architecture. However, for smaller number of clusters the results
are not so well. For a two-cluster machine the average improvement is
-0.92%.


Main contributions
=================
The main contribution is a new clustering algorithm for code
generation for VLIW architectures. It is compared to an earlier known
algorithm an reports improvements for architectures with a large
number of clusters. The main difference between the proposed algorithm
and the BUG algorithm is that the BUG algorithm has a more scheduling
centric approach, i.e. it will schedule each cluster to gather
information on how to merge clusters or move operations.

A classification of clustering algorithms is also given. However no
explanation whether this classification tries to be complete or not is
given. Also, no citation of other classification or taxonomy papers
are given.

Merits and Weaknesses
=====================
Clearly the algorithm is improving both the scheduling quality, in the
form of decreased execution time, as well as the execution time of the
scheduling algorithm. However, for 2-cluster machines the improvements
are negative. This mostly depends on the inaccuracies in their cost
models. However, since the execution time of their approach is much
lower compared to earlier work, they can actually afford to improve
this model, which will probably lengthen the execution times. But a
slightly longer execution time of compiling for VLIW architectures are
often considered worthwhile if it will increase code quality. Hence,
the somewhat simplified cost model can be a drawback for 2-cluster
machines where good quality code is needed.


Numerical ratings
=================
Significance 8
Originality 8
Interest to Journal on programming lanugage and compiler technology 7
Quality of experimental evaluation 7
Overall organization 8
Presentation 8
Length 9
References appropriate 9

Overall rating 8

Recommendation : Accept 

Reviewers confidence: 6


Suggestions for improvement:

Improving the cost model such that 2-cluster architectures will also
show improvements over earlier algorithms. Also, give a
computational complexity for the algorithm since it is not obvious due
to the refinement phase.

Improve the classification part to make it more complete, or write a
second paper to classify the clustering algorithms in this field.