Review form for CSEM-reviews 2003
=================================

Paper #9, "Predicate-Aware Scheduling: A Technique for Reducing Resource Constraints" by Mihail Smelyanskiy, Scott A. Mahlke, Edward S. Davidson and Hsien-Hsin S. Lee.

Reviewer: Andrzej Bednarski

Short summary
-------------

The paper presents a technique for reducing resource usage on architectures with predicated instructions. It improves on current scheduling techniques, which are conservative: the compiler assumes a value for each predicate (generally TRUE) and allocates resources on that basis. The algorithm in the paper performs predicate analysis for disjointness and exploits that property to share resources among disjoint predicated instructions (i.e. instructions whose predicates cannot evaluate to TRUE simultaneously). Predicate-aware scheduling (PAS) is then evaluated on the MediaBench benchmarks and the results are analyzed. The results show an improvement on average: the technique performs much better on cyclic parts of the code than on straight-line parts.

The idea seems very promising at first, but the results show only a minor performance improvement. The preliminary study that the authors present in the first part, ahead of the evaluation, leads the reader to expect better results.

The main contributions
----------------------

This paper deals with the issue of resource allocation in EPIC architectures, which is generally ignored. The main contributions of the paper are the reduction of "wrongly" allocated resources (due to the conservative approach of other techniques) and a speedup (especially for cyclic portions of code). Resource requirements are an important issue, particularly at the processor design stage, where simulation can determine the number and type of resources required for a dedicated application.

Merits and weaknesses
---------------------

Merits:
+ Improves on the conservative technique of schedulers/compilers for predicated processors.
+ Uses a traditional resource table, adapted to allow multiple instructions in a single slot.
+ The authors provide an in-depth analysis of their results.

Weaknesses:
- Only a minor improvement on average.
- The improvements are obtained after modifying the pipeline of the architecture (this is not possible with existing processors).
- The initial issue concerned resource allocation. However, in the evaluation part of the paper the authors concentrate on speedup only.

Numerical rating
----------------

* Significance: 7
* Originality: 7
* Interest to a journal on programming languages and compiler technology: 10
* Quality of experimental evaluation: 8
* Overall organization: 8
* Presentation (language and style): 10
* Length appropriate: 9
* References appropriate: 7
* Overall evaluation: 8
* Recommendation: Accept
* Your confidence in your review: 6

Comments to the authors
-----------------------

The paper is well organized and pleasant to read. The organization of the paper reflects the research methodology employed by the authors. The results show an improvement on average.

One application is missing that I think is quite significant because of its impact on the average figure: the benchmark "gsmencode" is not given in Table 3. This is the one application that shows an improvement of nearly 40% with "Predicate Aware Scheduling" over the whole application. Why does this application benefit so significantly from PAS?

The description of may/must-use resources is clear. However, the authors do not show how they are determined in the example in Section 3.4. Must/may-use is a dynamic property of a resource and is decided during scheduling (where predicate evaluation occurs: cmpp).

Minor detail: predicates p1, p2 are incorrectly assigned to op4 and op5 in Figure 1(b).

Suggestions for improvement
---------------------------

- Extension to speculative EPIC architectures such as Itanium seems to be difficult.
- Profiling may help to decide which combination of disjoint instructions is most favorable.
- How does the algorithm improve resource allocation? Do some benchmarks gain parallelism compared to the baseline algorithm? This would be expected, since some resources may be available with PAS that are (conservatively) allocated with standard techniques.
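
To make the last question concrete, the kind of comparison I have in mind can be sketched with a toy model. This is only an illustration under strong assumptions, not the authors' algorithm: the operations are independent, all have unit latency, there is a single type of functional unit, and the may/must-use distinction is ignored. The function name `schedule`, the operation names and the predicates p1/p2 are hypothetical and are not taken from the paper's figures.

```python
def schedule(ops, units, disjoint=frozenset()):
    """Greedy cycle assignment of independent ops on `units` identical units.

    ops: list of (name, predicate); predicate None means "always executed".
    disjoint: set of frozensets {p, q} of predicates that are never both TRUE
              (assumed to come from a separate predicate analysis).
    Returns a list of cycles; each cycle is a list of slots (lists of op names).
    """
    def can_share(slot, pred):
        # An op may join an occupied slot only if its predicate is
        # disjoint from the predicate of every op already in that slot.
        return pred is not None and all(
            q is not None and frozenset((pred, q)) in disjoint for _, q in slot
        )

    cycles = []
    for name, pred in ops:
        placed = False
        for cycle in cycles:
            # Prefer sharing an occupied slot; otherwise take a free unit.
            slot = next((s for s in cycle if can_share(s, pred)), None)
            if slot is not None:
                slot.append((name, pred))
                placed = True
            elif len(cycle) < units:
                cycle.append([(name, pred)])
                placed = True
            if placed:
                break
        if not placed:
            cycles.append([[(name, pred)]])
    return [[[n for n, _ in slot] for slot in cycle] for cycle in cycles]


# Hypothetical input: one unpredicated op and two pairs guarded by p1 / p2.
ops = [("op1", None), ("op2", "p1"), ("op3", "p2"), ("op4", "p1"), ("op5", "p2")]

conservative = schedule(ops, units=2)  # no disjointness information
aware = schedule(ops, units=2, disjoint={frozenset(("p1", "p2"))})

print(len(conservative), len(aware))  # prints "3 2"
```

With two units, the conservative table needs three cycles, while the predicate-aware table fits the same operations into two because op2/op3 and op4/op5 share slots. Reporting this kind of slot occupancy per benchmark, in addition to speedup, would answer the question above.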