Review form for CSEM-reviews 2003 ================================= Paper 3, "Region-based Hierarchical Operation Partitioning for Multicluster Processors" Reviewer: Peter Aronsson Summary ======= The paper introduces a clustering algorithm for scheduling of operations on a clustered VLIW architecture. For such architectures it is necessary to determine which cluster an operation should be executed on, and which data need to be sent in between clusters. The paper also tries to classify different clustering algorithms in the context of code generation for VLIW architectures. The proposed clustering algorithm performs this assignment by incrementally merging clusters until they match the given number of clusters in the target architecture. To avoid local maximum points, the algorithm have a refinement stage where operations can be moved from one cluster to another. The clustering algorithm is guided by a cost model that assigns weights to each node (operation) and each edge (data communication/register). The slack, e.g. how much an edge can be "moved in time" without affecting the critical path, is also used for determine which clusters to merge. The refinement phase then makes improvements of the initial clustering by moving operations between clusters. For this it uses several metrics such as cluster weight, system load and gain. The main point is that no scheduling of the clusters is performed but instead an estimation of the execution time of each cluster is performed. In fact, the edges of the operations within a cluster is note considered at all. The paper presents measurements and reports improvements over a known clustering algorithm (the BUG algorithm) up to 22.6% on a 4 cluster VLIW architecture. However, for smaller number of clusters the results are not so well. For a two-cluster machine the average improvement is -0.92%. Main contributions ================= The main contribution is a new clustering algorithm for code generation for VLIW architectures. It is compared to an earlier known algorithm an reports improvements for architectures with a large number of clusters. The main difference between the proposed algorithm and the BUG algorithm is that the BUG algorithm has a more scheduling centric approach, i.e. it will schedule each cluster to gather information on how to merge clusters or move operations. A classification of clustering algorithms is also given. However no explanation whether this classification tries to be complete or not is given. Also, no citation of other classification or taxonomy papers are given. Merits and Weaknesses ===================== Clearly the algorithm is improving both the scheduling quality, in the form of decreased execution time, as well as the execution time of the scheduling algorithm. However, for 2-cluster machines the improvements are negative. This mostly depends on the inaccuracies in their cost models. However, since the execution time of their approach is much lower compared to earlier work, they can actually afford to improve this model, which will probably lengthen the execution times. But a slightly longer execution time of compiling for VLIW architectures are often considered worthwhile if it will increase code quality. Hence, the somewhat simplified cost model can be a drawback for 2-cluster machines where good quality code is needed. Numerical ratings ================= Significance 8 Originality 8 Interest to Journal on programming lanugage and compiler technology 7 Quality of experimental evaluation 7 Overall organization 8 Presentation 8 Length 9 References appropriate 9 Overall rating 8 Recommendation : Accept Reviewers confidence: 6 Suggestions for improvement: Improving the cost model such that 2-cluster architectures will also show improvements over earlier algorithms. Also, give a computational complexity for the algorithm since it is not obvious due to the refinement phase. Improve the classification part to make it more complete, or write a second paper to classify the clustering algorithms in this field.