Overview | Publications | Code Examples | Download | License | Ongoing Work | Contact | Acknowledgments |
Exploiting effectively massively parallel architectures is a major challenge that stream programming can help to face. We investigate the problem of generating energy-optimal code for a collection of streaming tasks that include parallelizable or moldable tasks on a generic manycore processor with dynamic discrete frequency scaling. Streaming task collections differ from classical task sets in that all tasks are running concurrently, so that cores typically run several tasks that are scheduled round-robin at user level in a data-driven way. A stream of data flows through the tasks and intermediate results may be forwarded to other tasks like in a pipelined task graph. In this paper we consider crown scheduling, a novel technique for the combined optimization of resource allocation, mapping and discrete voltage/frequency scaling for moldable streaming task collections in order to optimize energy efficiency given a throughput constraint. We first present optimal off-line algorithms for separate and integrated crown scheduling based on integer linear programming (ILP). We make no restricting assumption about speedup behavior. We introduce the fast heuristic Longest Task, Lowest Group (LTLG) as a generalization of the Longest Processing Time (LPT) algorithm to achieve a load-balanced mapping of parallel tasks, and the Height heuristic for crown frequency scaling. We use them in feedback loop heuristics based on binary search and simulated annealing to optimize crown allocation.
Section Overview gives a brief overview of Crown schedulers features.
In Section Publications is a list of publications related to Crown Scheduling
You can download Drake thank to the link available in Section Download and license information is available in Section Licence.
Section Installation gives the procedure to install our crown schedulers.
Section Design a Drake streaming application gives code examples to design a Drake streaming application.
Section Ongoing work gives our plans for future work with Crown schedulers. You can find our contact information in Section Contact. Finally, we give credits to our funders in Section Acknowledgment
Main features:
Nicolas Melot, Christoph Kessler, Joerg Keller, Patrick Eitschberger: Fast Crown Scheduling Heuristics for Energy-Efficient Mapping and Scaling of Moldable Streaming Tasks on Many-Core Systems. ACM Transactions on Architecture and Code Optimization (TACO), http://dx.doi.org/10.1145/2687653
See also the html documentation generated by doxygen.
Our crown scheduler package includes implementation based on AMPL and C++ implementation of heuristics. All our crown scheduler implementations require the pelib to work. Consult its documetnation page for more information to install it. Implementations basedon AMPL additionally require a working copy of a linear integer solver with a AMPL frontend (we use AMPL version 20070505 shipped with ILOG 10.100, and Gurobi 6). Define environment variables CC=gcc and CXX=g++ (or replace these values with your favorite C and C++ compilers, respectively, as long as they use the same command line switches) One all dependencies are installed, download the crown scheduler package, extract it and open a terminal in the extraction directory. Then type:
make installto install crown schedulers. It creates directories $prefix/bin, $prefix/lib, $prefix/lib/pkgconfig and $prefix/include and install binaries, static and dynamic libraries, pkgconfig library information and include files in these directories, respectively. You can type
make install prefix=/some/other/directoryto install it elsewhere. Binaries are build with the library search path including $prefix/lib, so you may need to give a custom prefix location when linking binaries as well.
This section gives examples to illustrate the functionning of crown schedulers and its heuristics. Crown schedulers simplify the general scheduling problem by reducing possible subset of cores parallel tasks of a streaming application can be mapped to, from O(p^2) to O(p) possible mapping targets. We recursively divide the set of cores of a target execution platform into smaller, non-overlapping sets until sets include only one core; we call "group" any set of cores obtained through this recursive decomposition process.
Fig. 1 illustrate this recursive decomposition and gives crown scheduling its name. Every core of a child group also belong to the same set of ancestor larger groups from which the child group is issued.
The source code available above is released under GPLv3 licence. Please contact Nicolas Melot (nicolas.melot@liu.se) if you want to obtain another licence.
This is a work in progress. Future work include the implementation of generalized crown schedulers, that is platforms are not restricted with a power of 2 number of cores, each levels of the crown may be divided in unequal groups sizes and each level may be divided in any number of smaller groups. This can open more optimization opportunities. Future work also include the minimization of cost of task-to-task communications as well as taking platform constraints into account, such as voltage and frequency islands.
If you would like to contribute, please let us know.
For reporting bugs, please email to "<firstname> DOT <lastname> AT liu DOT se".
This work was partly funded by the EU FP7 project EXCESS and by SeRC project OpCoReS.
![]() |
![]() |
![]() |
![]() |