Review form for CSEM-reviews 2003
====================================

Paper #5, "Local Scheduling Technique for Memory Coherence in a Clustered VLIW Processor with a Distributed Data Cache" by Enric Gibert, Jesus Sanchez, Antonio Gonzalez

Reviewer: Andrzej Bednarski

Short summary
-------------

The paper presents a software solution to the problem of data inconsistency between memory instructions on homogeneous clustered VLIW architectures with a hierarchical memory model. The authors present two static scheduling algorithms (MDC and DDGT) that ensure data consistency in the resulting code. Both algorithms are evaluated and analyzed using the Mediabench benchmark suite. The authors also compare the benefits of the two approaches. The goal is to increase the local hit ratio while keeping execution time low.

To decrease the number of remote hits (i.e. improve the local hit ratio), one solution is to add attraction buffers (a hardware solution) to the architecture. This requires modifying the initial algorithms (the modifications are provided by the authors) to cope with the additional copies of data, which introduce new possible inconsistencies.

The main contributions
----------------------

The main contribution is a new scheduling algorithm for clustered architectures with a distributed data cache that does not require hardware support. The evaluation of two different strategies shows a decrease in the number of memory stalls. Further, the authors analyze the performance bottlenecks and try to characterize the situations in which each proposed algorithm is more suitable; this is left for future work.

Merits and weaknesses
---------------------

Merits:
+ Addresses loops, which are central to classical DSP programs.
+ Optimization technique: improves execution time and, similarly, power, by increasing the local hit ratio.
+ A software solution.

Weaknesses:
- The paper is not self-contained. The reader needs to browse the authors' previous publication for a deeper understanding.
  This concerns, for instance, the profiling information, which is crucial for the MDC strategy when placing instructions on specific clusters.
- In the results section, the authors compare the two methods only relative to each other. There is no baseline comparison that would show the actual improvement or degradation of the resulting code.
- Attraction buffers appear to be a sort of patch to improve the numerical evaluation. Such improvements are not realizable with existing architectures.

Numerical rating
----------------

* Significance: 7
* Originality: 8
* Interest to a journal on programming languages and compiler technology: 8
* Quality of experimental evaluation: 8
* Overall organization: 8
* Presentation (language and style): 7
* Length appropriate: 8
* References appropriate: 8
* Overall evaluation: 8
* Recommendation: Accept
* Your confidence in your review: 6

Comments to the authors
-----------------------

The paper is well organized and entertaining. I especially liked the pseudo code, which helps significantly in understanding the authors' approach.

Both algorithms operate on the intermediate representation by adding edges to the graph. The authors should mention which scheduling strategy they use. Adding edges to the DDG decreases the freedom of the scheduler, which limits the number of valid schedules that respect data dependencies and memory coherence.

Suggestions for improvement
---------------------------

The last part feels like a "patch" for improving performance. The authors opt for a hardware solution to decrease the number of remote hits. It improves the results, but unfortunately does not add merit to the authors' work.

The authors propose an adapted version of one algorithm (DDGT) for architectures with attraction buffers. Pseudo code for MDC would be appreciated as well.

The authors do not explain how they perform application profiling (the reader needs to browse the previous work). The same holds for the heuristics, which are only named in this paper.

Minor remark: for readability, Figure 5 should include the delays, as in Figure 3.
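
As a toy illustration of the comment above that extra DDG edges reduce scheduler freedom, the sketch below counts the linear orders of a small dependence graph before and after one edge is added. The instruction names and the extra edge are hypothetical and are not taken from the paper. The sketch also ignores clusters, latencies and VLIW parallelism, so it only shows the general effect and does not implement MDC or DDGT.

```python
from itertools import permutations


def count_valid_orders(nodes, edges):
    """Count the linear orders of `nodes` that respect every (src, dst) edge."""
    count = 0
    for order in permutations(nodes):
        position = {node: i for i, node in enumerate(order)}
        if all(position[src] < position[dst] for src, dst in edges):
            count += 1
    return count


# Hypothetical four-instruction loop body (names invented for illustration).
nodes = ["load_a", "store_b", "add", "mul"]

# Ordinary data dependences: load_a -> add -> mul.
data_edges = [("load_a", "add"), ("add", "mul")]

# Assumed extra edge that serializes the two memory operations,
# standing in for an edge added to preserve memory coherence.
coherence_edges = data_edges + [("load_a", "store_b")]

before = count_valid_orders(nodes, data_edges)
after = count_valid_orders(nodes, coherence_edges)
print(before, after)
```

In this toy graph the added edge removes every order in which store_b precedes load_a, so fewer valid orders remain for the scheduler to choose from.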