PEPPHER Composition Tool Prototype - A Manual

PEPPHER (2010-2012) was an EU FP7 project that addressed programmability and portability for modern heterogeneous systems. The PEPPHER composition tool provides a high-level abstraction over the PEPPHER runtime system, including support for static composition.

This manual covers the running prototype of the PEPPHER composition tool. It describes the features of the current prototype and how to use it, including dos and don'ts as well as known limitations.

Introduction

The composition tool uses the meta-data and associated source code files to do the composition processing. The meta-data is specified in separate XML files. Note: the composition tool does not process annotations in the source files. Normally (excluding utility mode), the composition tool is invoked with a command such as compose <main.xml>, where <main.xml> is the path to the XML file corresponding to the main function. This XML file may in turn link to other components/interfaces that are invoked by the main function. In this way, the composition tool can recursively explore all the interfaces. Currently, the recursion depth is limited to 1 (i.e. the main function contains the component calls), for the following reason:

Currently, the prototype mainly targets generating code that is executable on the PEPPHER runtime system, which does not support recursion (i.e. a task cannot be created/executed inside the body of another task). So, nesting a component call inside another component body, where each component is a runtime task, is not possible without code modifications. However, you may call a concrete implementation inside a component implementation (e.g. directly calling the CUDA implementation of the "partition" component inside the CUDA implementation of the "qsort" component).
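The workaround mentioned above can be sketched in plain C++. The function names partition_cpu and qsort_cpu are hypothetical stand-ins for two concrete CPU implementations. The point is only that qsort_cpu calls the concrete partition_cpu function directly, as an ordinary function call, instead of invoking the "partition" component, so no task is created inside another task.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical concrete CPU implementation of a "partition" component.
// Lomuto partitioning of a[lo..hi] (inclusive); returns the pivot's final index.
std::size_t partition_cpu(std::vector<int>& a, std::size_t lo, std::size_t hi)
{
    assert(lo <= hi && hi < a.size());
    int pivot = a[hi];
    std::size_t i = lo;
    for (std::size_t j = lo; j < hi; ++j) {
        if (a[j] < pivot) {
            std::swap(a[i], a[j]);
            ++i;
        }
    }
    std::swap(a[i], a[hi]);
    return i;
}

// Hypothetical concrete CPU implementation of a "qsort" component.
// It calls partition_cpu directly rather than going through the
// "partition" component, so no nested runtime task is needed.
void qsort_cpu(std::vector<int>& a, std::size_t lo, std::size_t hi)
{
    if (lo >= hi) return;
    std::size_t p = partition_cpu(a, lo, hi);
    if (p > lo) qsort_cpu(a, lo, p - 1);  // guard against unsigned underflow
    qsort_cpu(a, p + 1, hi);
}
```

For a non-empty vector v, the call would be qsort_cpu(v, 0, v.size() - 1).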

Installation

The current composition tool prototype is ported to GNU AutoTools. This means that it can be configured/built as follows:

Architecture

The PEPPHER composition tool is written in C++. It uses the Xerces C++ XML parser to parse and validate XML descriptors with respect to the PEPPHER XML schema. Internally, it stores the information in an AST. Any static composition and other processing is carried out on this AST, and the (modified) AST is later used to generate the code for dynamic composition (runtime system). The following figure explains the internal architecture of the composition tool, which looks very similar to a traditional compiler design.

When PDL support is enabled, it uses CodeSynthesis XSD for data binding PDL schema to C++ classes/functions.

Prototype Features

The current prototype supports the following features:

PEPPHER Containers

For details about what the PEPPHER containers are and how they are used with PEPPHER components, see here.

Smart containers are containers that keep track of data residing on different memory units. However, when using the PEPPHER runtime system, this is not the case, as memory management is done by the runtime system. In this case, the containers act as smart wrappers that abstract the interaction with the memory management API of the runtime system. They transparently manage the interaction with the PEPPHER runtime system and support asynchronous execution and task partitioning when used to pass operand data to PEPPHER components.

Currently, there are three containers implemented: Vector, Matrix and Scalar. All containers are generic in the element type, using C++ templates. Furthermore, all three containers implement a standard interface (i.e. IContainer) that models a standard API for interaction with the PEPPHER runtime system.

IContainer

The IContainer is designed as a C++ abstract class with several virtual methods that can be overridden by the containers. Following are the methods declared/defined in IContainer: The main idea of IContainer is to support new PEPPHER container types in the future without major changes in the PEPPHER prototype. This can be achieved by implementing the IContainer interface for each new container. You can find the signature of IContainer here.
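The design idea (an abstract class that every container type implements) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the class IContainerSketch, its methods rawData and sizeInBytes, the Cube container and the bytesToRegister helper are hypothetical and do not reproduce the real IContainer signatures.

```cpp
#include <cstddef>
#include <vector>

namespace sketch {

// Hypothetical stand-in for IContainer. The method names below are
// illustrative only and are NOT the real PEPPHER signatures.
class IContainerSketch {
public:
    virtual ~IContainerSketch() {}
    virtual void* rawData() = 0;                  // start of the underlying storage
    virtual std::size_t sizeInBytes() const = 0;  // total size of the storage
};

// A new (hypothetical) 3D container type plugs in by implementing the interface.
template <typename T>
class Cube : public IContainerSketch {
public:
    Cube(std::size_t x, std::size_t y, std::size_t z) : data_(x * y * z) {}
    void* rawData() override { return data_.data(); }
    std::size_t sizeInBytes() const override { return data_.size() * sizeof(T); }
private:
    std::vector<T> data_;
};

// Code written against the interface works unchanged for any container type.
inline std::size_t bytesToRegister(const IContainerSketch& c)
{
    return c.sizeInBytes();
}

} // namespace sketch
```

Because bytesToRegister only depends on the abstract class, adding another container type would not require changing it, which is the property the IContainer design aims for.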

Vector

The Vector container is a generic 1D container that implements the IContainer interface and supports 1D data partitioning. Besides the methods listed in the IContainer interface, the Vector container has several methods, including constructors, operator[index] and a destructor.
peppher::Vector<float> v0(25, 3.5f); // create a float vector "v0" with 25 elements, each initialized to value 3.5
peppher::Vector<int> v1(10); // create an integer vector "v1" with 10 elements
v1.randomize(10,50); // initialize v1 elements with random values between 10 and 50.
v1[5] = 55; // set 6th element of v1 to value 55
std::cout << "v1: " << v1; // print contents of vector v1
The Vector implementation is based on C++, so it can be used only in CPU-side code. For CUDA and OpenCL code, you can pass the underlying raw pointer using the getRawType() method. Click here for method signatures.

Matrix

The Matrix container is a generic 2D container that implements the IContainer interface and supports both 1D (horizontal, vertical) and 2D partitioning. The Matrix container has several methods, including constructors, operator(row,col), operator[index] and a destructor.
peppher::Matrix<float> m0(5, 10, 3.5f); // create a float matrix "m0" of size 5 X 10, each element initialized to value 3.5
peppher::Matrix<int> m1(10, 10); // create an integer matrix "m1" of size 10 X 10
m1.randomize(10,50); // initialize m1 elements with random values between 10 and 50.
m1(5,2) = 55; // set 3rd element in 6th row of m1 to value 55
std::cout << "m1: " << m1; // print contents of matrix m1
The Matrix implementation is based on C++, so it can be used only in CPU-side code. For CUDA and OpenCL code, you can pass the underlying raw pointer using the getRawType() method. Click here for method signatures.

Scalar

The Scalar container implements the IContainer interface and is designed to model scalar values/objects. It does not support data partitioning, as it models a single scalar value. As containers are used to provide asynchronous execution across different component invocations, the Scalar can be used to provide such support for components that take scalar parameters (e.g. int, float, class object etc.) in addition to Vector and Matrix parameters. Internally, it stores a pointer to the scalar data, similar to auto_ptr and other smart-pointer classes in C++. Here is a usage example:
int idNo=30;
peppher::Scalar<int> pIdNo(&idNo, true); // don't deallocate memory when destructor called as the memory pointed-to is on stack.

peppher::Scalar<int> pTemp(new int(10)); // will deallocate memory when destructor called as the memory pointed-to is on heap.
std::cout << "value: " << *pTemp; // value: 10
The Scalar implementation is based on C++, so it can be used only in CPU-side code. For CUDA and OpenCL code, you can pass the underlying raw pointer using the getRawType() method. Click here for method signatures.
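The ownership behaviour shown in the usage example (a second constructor argument that tells the container not to deallocate the pointed-to memory) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual peppher::Scalar code: the class name ScalarSketch, the parameter name keepMemory and the return type of getRawType() are assumptions.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical sketch of a scalar wrapper with an ownership flag.
template <typename T>
class ScalarSketch {
public:
    // keepMemory == true:  the pointed-to object is not owned (e.g. it lives
    //                      on the stack) and is never deleted by the wrapper.
    // keepMemory == false: the wrapper owns the object and deletes it.
    explicit ScalarSketch(T* ptr, bool keepMemory = false)
        : ptr_(ptr), keep_(keepMemory)
    {
        assert(ptr != nullptr);
    }

    ~ScalarSketch()
    {
        if (!keep_) delete ptr_;
    }

    // Copying is disabled so that an owned object is never deleted twice.
    ScalarSketch(const ScalarSketch&) = delete;
    ScalarSketch& operator=(const ScalarSketch&) = delete;

    T& operator*() { return *ptr_; }
    T* getRawType() { return ptr_; }  // assumed return type: raw pointer

private:
    T* ptr_;
    bool keep_;
};

} // namespace sketch
```

With this sketch, wrapping a stack variable with the flag set to true leaves the variable valid after the wrapper is destroyed, while wrapping the result of new int(10) without the flag releases the memory in the destructor.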

Accessing data in CUDA and OpenCL CPU-side wrapper functions

Each component in the PEPPHER framework can have multiple implementations, possibly in different languages such as C/C++ (for CPU), CUDA and/or OpenCL (for GPUs). Even for CUDA and OpenCL GPU component implementations, we need a CPU-side wrapper function that is internally responsible for calling the CUDA and/or OpenCL code and returning the result. These CPU-side wrapper functions for CUDA and OpenCL implementations have the same interface as the CPU component implementation in C/C++. Inside these CPU-side wrapper functions, the operand data cannot be accessed directly, because the data passed to them actually points to data in CUDA or OpenCL device memory. This is because data management is done by the runtime system. For more information about this issue, please see.

Support for asynchronous component executions

The PEPPHER containers can be used with components to allow asynchronous executions across different component invocations.

Asynchronous execution across component invocations

To allow asynchronous execution for a component call, every operand of that component must be either passed using a PEPPHER container (Vector, Matrix, Scalar) or be a read-only scalar value (e.g. int, float).

Reason: The composition tool does not employ any kind of static source-code analysis to find data dependencies and data usage across different instructions. Rather, it relies on the information specified in the XML files to generate the code. This makes it difficult for the composition tool to decide optimally, for arbitrary data, when to register or unregister it. Hence, for parameters that are neither modeled using PEPPHER containers nor read-only scalar values (e.g. int, float), it conservatively registers and unregisters them for each component invocation, which makes the component invocation synchronous (blocking) and may yield significant overhead for non-trivial applications.

Usage example

A PEPPHER component has one interface containing one method and multiple implementations of that interface, possibly for different backends (CPU, CUDA, OpenCL). For each component interface, there is one XML file that specifies meta-information for that interface, including name, signature, directory containing implementation files etc. (see XML files in example folder in prototype directory).

For a component that uses PEPPHER containers to receive operand data, the XML file needs to specify this information. Here is an example file:

For an interface with the method signature:
void vector_scale( float *arr, unsigned size, float factor);
The XML file to specify that interface is:
<peppher:component ...>
  <peppher:interface name="vector_scale">
     <peppher:parameters>
        <peppher:parameter name="arr" type="float *" accessMode="readwrite"  numElements="size" />
        <peppher:parameter name="size" type="unsigned" accessMode="read" />
        <peppher:parameter name="factor" type="float" accessMode="read" />
     </peppher:parameters>
  </peppher:interface>
</peppher:component>
The above component definition can be executed only synchronously. To allow asynchronous execution, we need to wrap "float *" in a PEPPHER Vector container:
void vector_scale(peppher::Vector<float> &arr, float factor);
The XML file describing the above interface is as follows:
<peppher:component ...>
  <peppher:interface name="vector_scale">
     <peppher:parameters>
        <peppher:parameter name="arr" type="peppher::Vector" elemType="float" accessMode="readwrite" />
        <peppher:parameter name="factor" type="float" accessMode="read" />
     </peppher:parameters>
  </peppher:interface>
</peppher:component>

As you can see above, by wrapping "float *" in a Vector, we no longer need to pass the size of the vector, as it can be obtained from the vector object using its size() method.

The only addition for PEPPHER containers in the interface XML descriptor file is the elemType attribute, which specifies the element type, as containers are generic in the element type. The elemType can be any non-pointer type that can be instantiated with a zero-argument constructor. Furthermore, parameters using PEPPHER containers (Vector, Matrix, Scalar) are always passed by reference, although the type attribute in the interface file does not specify that. In effect, the actual type of a parameter passed using a PEPPHER container is type<elemType> &, where type and elemType are the attributes specified in the interface XML descriptor file.

Support for task partitioning

In normal executions, a component invocation is translated into a single StarPU task (i.e. 1:1 mapping). However, to increase concurrency, a component invocation can be translated into "m" StarPU tasks, where each task is independent and the tasks can be executed in any order or in parallel (i.e. 1:m mapping). This comes from partitioning the operand data into chunks that can be processed by different tasks in parallel. One example is matrix multiplication, which could either be executed as a single task (1:1 mapping) or be calculated by dividing the work between m different tasks (1:m mapping), where each task calculates a subset of the output matrix.

Some facts about task partitioning: Partitioning support, where applicable, is added simply by adding the partition attribute, in the interface XML descriptor file, to the parameter that should be partitioned. For the above vector scale example, partitioning can be achieved by dividing the "arr" Vector into blocks which can be processed independently. Note: the implementation (source code) does not change; the only change is the addition of the partition attribute in the interface XML descriptor file, as shown below:
<peppher:component ...>
  <peppher:interface name="vector_scale">
     <peppher:parameters>
        <peppher:parameter name="arr" type="peppher::Vector" elemType="float" accessMode="readwrite" partition="arr.size()/10" />
        <peppher:parameter name="factor" type="float" accessMode="read" />
     </peppher:parameters>
  </peppher:interface>
</peppher:component>
In essence, partition specifies the size of each chunk. As you can see in the above example, we can use an expression instead of a constant value, which allows us to specify the partition size in terms of the actual vector size. The partition="arr.size()/10" means that the vector will be divided into 10 partitions of equal size. Each partition corresponds to one task in the runtime system, producing 10 tasks in this example that can be processed concurrently. Please see the examples folder to learn more about partitioning for 2D matrix objects.
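The arithmetic behind the partition attribute can be made concrete with a small sketch that turns a vector length and a chunk size into index ranges, one range per task. The function name makeChunks is hypothetical. How the real tool treats a vector length that is not a multiple of the chunk size is not stated in this manual; the sketch assumes that the last chunk is simply shorter.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Splits numElements into half-open ranges [begin, end) of chunkSize elements.
// Assumption: if numElements is not a multiple of chunkSize, the last range
// is shorter than the others.
std::vector<std::pair<std::size_t, std::size_t> >
makeChunks(std::size_t numElements, std::size_t chunkSize)
{
    assert(chunkSize > 0);
    std::vector<std::pair<std::size_t, std::size_t> > chunks;
    for (std::size_t begin = 0; begin < numElements; begin += chunkSize) {
        std::size_t end = begin + chunkSize;
        if (end > numElements) end = numElements;
        chunks.push_back(std::make_pair(begin, end));
    }
    return chunks;
}
```

For a vector of 100 elements, makeChunks(100, 100 / 10) yields 10 ranges of 10 elements each, matching the 10 tasks described above.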

Support for OpenMP CPU component

For the CPU backend, the composition tool supports both sequential CPU components and parallel components written using OpenMP.

Support for multiple implementations per backend

The composition tool supports multiple implementations for each backend (CPU, CUDA, OpenCL), e.g. multiple sorting implementations for CUDA. The selection between these implementations is made at runtime by the scheduler.

Support for conditional implementation selection

Conditional implementation selection enables the specification of constraints on the selection of an implementation. As these constraints are resolved at runtime, they can use the actual operand values passed to the component call as well as references to the PDL (Platform Description Language). The constraints are specified in the implementation descriptor using the validIf attribute.

Support for PDL (Platform Description language)

A Platform Description Language (PDL) has been developed as part of the PEPPHER project. The PDL can help in modeling both hardware (e.g. number of CPU cores) and software (e.g. whether a BLAS library is available) properties of a system. This information can be queried when making composition decisions. By using PDL, the composition tool allows decisions to be made at runtime (e.g. based upon the actual problem size). More information about PDL can be found in this article. The PDL can be used for many purposes. One way to use PDL is for conditional implementation selection. For example, the following line specifies that a CUDA implementation requires at least 16 streaming multiprocessors for execution:
<peppher:implementation name="..." validIf='pdl::getIntProperty("numCudaSM") .GE. 16'>
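The effect of such a validIf constraint can be sketched as a runtime filter over the available implementations. The sketch below is an illustration under assumptions, not the code generated by the tool: the types Platform and Implementation, the function validImplementations, the implementation names and the rule that an unknown property reads as 0 are all hypothetical. The .GE. operator is read here as "greater than or equal".

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical in-memory view of integer platform properties.
struct Platform {
    std::map<std::string, int> intProperties;

    // Assumption: a property that is not described reads as 0.
    int getIntProperty(const std::string& name) const
    {
        std::map<std::string, int>::const_iterator it = intProperties.find(name);
        return it == intProperties.end() ? 0 : it->second;
    }
};

// An implementation with an optional constraint (empty means "always valid").
struct Implementation {
    std::string name;
    std::function<bool(const Platform&)> validIf;
};

// Returns the names of the implementations whose constraint holds.
inline std::vector<std::string>
validImplementations(const std::vector<Implementation>& impls, const Platform& p)
{
    std::vector<std::string> names;
    for (const Implementation& impl : impls) {
        if (!impl.validIf || impl.validIf(p)) names.push_back(impl.name);
    }
    return names;
}

} // namespace sketch
```

In this sketch, the constraint from the line above would be written as a lambda returning p.getIntProperty("numCudaSM") >= 16, so the CUDA implementation is only offered to the scheduler on platforms that satisfy it.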

Support for generic components (using C++ templates)

In C++, a function can be made generic in its operand data using the "C++ template" feature, which provides static type-checking and the ability to use that function for operands of different data types. For example, making a matrix multiplication implementation generic allows us to use it to calculate matrix multiplication for int, float, double or any other type. Support for such generic components has been implemented in the composition tool. Note: it complements the other features discussed earlier (such as containers, asynchronous execution and task partitioning) and can be combined with them.

For the matrix multiplication example, for an interface with the method signature:
template <typename T>
void matrixmul(peppher::Matrix<T> &A, peppher::Matrix<T> &B, peppher::Matrix<T> &C);
The XML file to specify that interface is:
<peppher:component ...>
  <peppher:interface name="matrixmul" impPath="./matrixmul_/" templateTypes="T">
     <peppher:parameters>
        <peppher:parameter name="A" type="peppher::Matrix" elemType="T" accessMode="read" />
        <peppher:parameter name="B" type="peppher::Matrix" elemType="T" accessMode="read" />
        <peppher:parameter name="C" type="peppher::Matrix" elemType="T" accessMode="readwrite" />
     </peppher:parameters>
  </peppher:interface>
</peppher:component>

The templateTypes attribute is used to specify the template/generic types used in an interface declaration. In the above example, there is only one generic type, named "T", so the templateTypes attribute contains "T". If there is more than one template type, they are specified in a comma-separated manner, e.g. templateTypes="T,U" in case there are two generic template types "T" and "U".

Limitation: When using generic components, there are certain limitations on using CUDA and OpenCL at the same time. For example, with a generic CUDA implementation, the main source file (i.e. the source file containing the main function, e.g. main.cpp) must be renamed to have the extension ".cu" (e.g. main.cu). This is because template code needs to be included rather than compiled separately, which ultimately means that the CUDA implementation is included in the main source file. Any source file containing CUDA code needs to be compiled with the NVIDIA compiler (nvcc), which requires the ".cu" file extension. As this file is compiled by nvcc, it cannot contain any OpenCL code, as OpenCL is compiled with a regular C compiler such as gcc.
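To illustrate what a generic implementation body looks like, here is a minimal sketch of a templated matrix multiplication. It deliberately uses std::vector in row-major layout with explicit dimensions instead of peppher::Matrix, so that it is self-contained; the function name matrixmul_sketch and its parameter list are assumptions and differ from the interface signature above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes C = A * B for row-major matrices, generic in the element type T.
// A is n x k, B is k x m and C is resized to n x m.
template <typename T>
void matrixmul_sketch(const std::vector<T>& A, const std::vector<T>& B,
                      std::vector<T>& C,
                      std::size_t n, std::size_t k, std::size_t m)
{
    assert(A.size() == n * k && B.size() == k * m);
    C.assign(n * m, T());
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < m; ++j) {
            T sum = T();
            for (std::size_t x = 0; x < k; ++x) {
                sum += A[i * k + x] * B[x * m + j];
            }
            C[i * m + j] = sum;
        }
    }
}
```

The same function body serves int, float and double operands; the compiler instantiates it once per element type that is actually used.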

Support for performance-aware component selection

In the current prototype, the actual implementation variant selection is done using the dynamic scheduling capabilities of the PEPPHER runtime system. Internally, the StarPU runtime system can use performance-aware scheduling policies to do the scheduling. However, using such performance-aware scheduling policies requires certain modifications in the code, e.g., the definition of struct starpu_perf_model_t.

To enable support for this scheduling policy, the user may specify a flag (useHistoryModels). The flag can be specified for an individual component in the interface XML descriptor file, which enables performance-aware scheduling for only that component, e.g.

<peppher:interface name="vector_scale" useHistoryModels="true">
or it could be specified as a command-line argument to the composition tool which will apply it for all components in that application, e.g.
compose main.xml -useHistoryModels
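The general idea of history-based selection can be sketched as follows. This is a deliberately simplified illustration and not StarPU's actual performance model: the class HistoryModel, its methods and the policy of trying unmeasured implementations first are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Simplified history model: remembers measured execution times per
// (implementation, problem size) and prefers the lowest mean time.
class HistoryModel {
public:
    void record(const std::string& impl, std::size_t size, double seconds)
    {
        Entry& e = history_[std::make_pair(impl, size)];
        e.total += seconds;
        e.count += 1;
    }

    // Assumption: an implementation without measurements for this size is
    // chosen first, so that it gets measured at least once.
    std::string choose(const std::vector<std::string>& impls, std::size_t size) const
    {
        assert(!impls.empty());
        std::string best;
        double bestMean = 0.0;
        bool haveBest = false;
        for (const std::string& impl : impls) {
            std::map<Key, Entry>::const_iterator it =
                history_.find(std::make_pair(impl, size));
            if (it == history_.end()) return impl;
            double mean = it->second.total / it->second.count;
            if (!haveBest || mean < bestMean) {
                best = impl;
                bestMean = mean;
                haveBest = true;
            }
        }
        return best;
    }

private:
    typedef std::pair<std::string, std::size_t> Key;
    struct Entry {
        double total;
        unsigned count;
        Entry() : total(0.0), count(0) {}
    };
    std::map<Key, Entry> history_;
};

} // namespace sketch
```

After a few recorded executions for a given problem size, choose returns the implementation with the lowest mean time for that size, which is the behaviour a performance-aware policy aims for.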

Basic static composition

The current prototype supports basic static composition, such as disabling implementations. To disable a specific implementation, you can specify the disable attribute in its XML descriptor, e.g.
<peppher:implementation name="scale_cpu_func" disable="true">
To enable/disable a certain type of implementations (CPU, CUDA, OpenCL), you can either specify it for a single component by specifying it in the interface XML descriptor file, e.g.
<peppher:interface name="vector_scale" disableCPU="true" ...>
or it could be specified as a command-line argument to the composition tool which will apply it for all components in that application, e.g.
compose main.xml -disableCPU
If you want even more control over static implementation selection, you can use the -disableImpls and -disableXMLFiles command-line arguments of the composition tool.
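The combined effect of these switches can be sketched as a filter applied to the list of implementations before code generation. The sketch is an illustration under assumptions and does not show the tool's internal data structures: the types Backend, Implementation and Options, the function selectImplementations and the implementation names other than scale_cpu_func are hypothetical.

```cpp
#include <set>
#include <string>
#include <vector>

namespace sketch {

enum class Backend { CPU, CUDA, OpenCL };

struct Implementation {
    std::string name;
    Backend backend;
    bool disabled;  // models disable="true" in the implementation descriptor
};

struct Options {
    std::set<Backend> disabledBackends;    // models -disableCPU, -disableCUDA, -disableOpenCL
    std::set<std::string> disabledImpls;   // models -disableImpls="a,b"
};

// Returns the names of the implementations that remain selectable.
inline std::vector<std::string>
selectImplementations(const std::vector<Implementation>& impls, const Options& opts)
{
    std::vector<std::string> selected;
    for (const Implementation& impl : impls) {
        if (impl.disabled) continue;
        if (opts.disabledBackends.count(impl.backend) > 0) continue;
        if (opts.disabledImpls.count(impl.name) > 0) continue;
        selected.push_back(impl.name);
    }
    return selected;
}

} // namespace sketch
```

An implementation is kept only if none of the three mechanisms excludes it, so the switches compose without needing any priority order.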

Utility mode - generation of component skeletons from a C header file

Wrapping existing legacy code into PEPPHER components requires the addition of XML files and certain modifications to the code. To facilitate this process, the current prototype supports the generation of basic component skeletons from a C/C++ header file containing a method declaration. Consider a basic method declaration defined in a file, e.g. vector_scale.h, in the following way:
#ifndef VECTOR_SCALE_H
#define VECTOR_SCALE_H
void vector_scale (float *arr, int size, float factor);
...
#endif
By running the composition tool with the -generateCompFiles option:
compose --generateCompFiles="vector_scale.h"
the composition tool will generate an XML file and a C source file for each backend (CPU, CUDA, OpenCL), six files in total, containing simple skeletons which can be filled in with more information. This utility mode can help component writers create PEPPHER components from legacy code in a time-efficient manner. Please note that when the --generateCompFiles option is specified, no other command-line arguments need to be passed to the composition tool. This is because the composition tool does not do code generation for the StarPU backend in this mode.

The utility is still far from perfect, but it can already work with simple C/C++ method declarations ending with a semicolon (;). As a PEPPHER component can have only one method, this utility just looks for the first method declaration and ignores the remaining text in the file.
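The "first declaration ending with a semicolon" rule can be illustrated with a small sketch that extracts the method name from header text. It is not the tool's parser: the function name firstDeclaredFunction is hypothetical, and the sketch assumes a header without comments, skipping only blank lines and preprocessor lines.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// Returns the name of the first function declared in headerText,
// or an empty string if no declaration ending with ';' is found.
// Assumption: the header contains no comments.
std::string firstDeclaredFunction(const std::string& headerText)
{
    std::istringstream in(headerText);
    std::string line;
    std::string decl;
    while (std::getline(in, line)) {
        std::size_t first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos || line[first] == '#') continue;  // blank or preprocessor line
        decl += line + " ";
        if (line.find(';') != std::string::npos) break;  // the declaration ends here
    }

    std::size_t paren = decl.find('(');
    if (paren == std::string::npos || decl.find(';') == std::string::npos) return "";

    // Walk back from '(' over whitespace, then over the identifier itself.
    std::size_t end = paren;
    while (end > 0 && std::isspace(static_cast<unsigned char>(decl[end - 1]))) --end;
    std::size_t begin = end;
    while (begin > 0 &&
           (std::isalnum(static_cast<unsigned char>(decl[begin - 1])) || decl[begin - 1] == '_')) {
        --begin;
    }
    return decl.substr(begin, end - begin);
}
```

Applied to the vector_scale.h header shown above, the sketch returns the name vector_scale and ignores everything after the first semicolon.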

Examples

To assist with writing components for the current prototype and to demonstrate its various features, we have written several variants of the vector scale and matrix multiplication toy applications:

For matrix multiplication:

For vector scale:

Command line arguments**

-v=xxx Verbose mode [1 | 2 | 3 | 0*]. With the default (0), no information is displayed; higher values display more information.
-wrapperFilesExt="xxx" Specify the extension of the generated wrapper files (Default: ".h").
-useHistoryModels Enable usage of StarPU history performance models for all components. See the section on performance-aware component selection.
-usePdl="xxx" A PDL XML file to be used during composition decisions.
-enableLibraryMode Enable library mode (no link statement generated).
-disableXMLFiles="xxx" List of implementation XML file names (comma-separated if multiple) that should not be processed. The file names should have the .xml extension.
-disableImpls="xxx" Names of implementations (comma-separated if multiple) that should not be used for generating code. This differs from the -disableXMLFiles option in that the composition tool still processes the XML files but does not select these implementations when generating the code.
-disableCPU To disable CPU implementations for all components.
-disableCUDA To disable CUDA implementations for all components.
-disableOpenCL To disable OpenCL implementations for all components.
-enableMultiImpl To enable usage of multiple implementations for each backend.
-disableXMLValidation To disable XML validation done by the Xerces XML parser.
* = Default if not provided explicitly.
** = Option names are case-insensitive. However, the actual values are case-sensitive, e.g. for -disableXMLFiles="abc.xml", the XML file name "abc.xml" is case-sensitive.
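The case rule in the last footnote can be sketched as follows: the option name is normalized to lower case before comparison, while the value is left untouched. The function name parseOption is hypothetical and the sketch does not reflect how the tool actually parses its arguments; quote handling is left to the shell.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Splits an argument of the form "-Name=value" into (name, value).
// The name is lower-cased so that it can be compared case-insensitively;
// the value keeps its original case. Arguments without '=' get an empty value.
std::pair<std::string, std::string> parseOption(const std::string& arg)
{
    std::size_t eq = arg.find('=');
    std::string name = arg.substr(0, eq);
    std::string value = (eq == std::string::npos) ? std::string() : arg.substr(eq + 1);
    for (std::size_t i = 0; i < name.size(); ++i) {
        name[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(name[i])));
    }
    return std::make_pair(name, value);
}
```

With this sketch, -DisableXMLFiles=Abc.xml and -disablexmlfiles=Abc.xml map to the same option name, while the file name Abc.xml is preserved exactly as typed.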

Porting legacy code

Porting legacy code using the Composition tool is described in the following Figure.


Contact: Usman Dastgeer, Lu Li, Prof. Christoph Kessler. For contact, please e-mail to "<firstname> DOT <lastname> AT liu DOT se".
This work was funded by the EU FP7 project PEPPHER during the 2010-2012 period. Its current development is partly funded by SeRC-OpCoReS and Vetenskapsrådet.