SkePU (integrated with StarPU)
0.8.1
A class representing the Scan skeleton.
#include <scan.h>
Public Member Functions | |
Scan (ScanFunc *scanFunc) | |
Scan (ScanFunc *scanFunc, Vector< T > *in, Vector< T > *out, ScanType type, T init=T()) | |
~Scan () | |
void | run_async () |
void | operator() (Vector< T > &input, ScanType type, T init=T()) |
void | operator() (Vector< T > &input, Vector< T > &output, ScanType type, T init=T()) |
T | scanLargeVectorRecursively_CU (T *input, T *output, std::vector< DeviceMemPointer_CU< T > * > &blockSums, unsigned int numElements, int level, ScanType type, T init, int deviceID) |
T | scanLargeVectorRecursively_CL (T *input, T *output, std::vector< DeviceMemPointer_CL< T > * > &blockSums, unsigned int numElements, int level, ScanType type, T init, Device_CL *deviceCL) |
void | replaceText (std::string &text, std::string find, std::string replace) |
void | createOpenCLProgram () |
Static Public Member Functions | |
static void | cpu_func (void *buffers[], void *arg) |
static void | omp_func (void *buffers[], void *arg) |
static void | cuda_func (void *buffers[], void *arg) |
static void | opencl_func (void *buffers[], void *arg) |
This class defines the Scan skeleton, also known as prefix sum. It is related to the Reduce operation, but instead of producing a single scalar result it produces an output vector of the same length as the input, where each element is the reduction of the corresponding input element and all elements preceding it. For example, the input vector [4 3 7 6 9] produces the result vector [4 7 14 20 29]. A scan can be either inclusive or exclusive, that is, it can include or exclude the current element. The previous example is an inclusive scan; the exclusive result would be [0 4 7 14 20]. Exclusive scan is sometimes called prescan. This Scan skeleton supports both variants through a parameter in the function calls; the default is inclusive.
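To make the two variants concrete, here is a minimal sequential sketch in plain C++. It does not use SkePU; `referenceScan` and `ScanKind` are hypothetical names introduced only for this illustration, and the code assumes a binary operator applied left to right.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical enum for this sketch, mirroring the inclusive/exclusive choice.
enum class ScanKind { Inclusive, Exclusive };

// Sequential reference scan with a binary operator `op`.
// For an exclusive scan, `init` becomes the first output element.
template <typename T, typename BinaryOp>
std::vector<T> referenceScan(const std::vector<T>& in, BinaryOp op,
                             ScanKind kind, T init = T()) {
    std::vector<T> out(in.size());
    if (in.empty()) return out;
    if (kind == ScanKind::Inclusive) {
        // out[i] combines in[0..i].
        out[0] = in[0];
        for (std::size_t i = 1; i < in.size(); ++i)
            out[i] = op(out[i - 1], in[i]);
    } else {
        // out[i] combines init and in[0..i-1]; in[i] itself is excluded.
        out[0] = init;
        for (std::size_t i = 1; i < in.size(); ++i)
            out[i] = op(out[i - 1], in[i - 1]);
    }
    return out;
}
```

With addition as the operator and the input [4 3 7 6 9], the inclusive variant yields [4 7 14 20 29] and the exclusive variant with an initial value of 0 yields [0 4 7 14 20], matching the example above.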
Once instantiated, it is meant to be used as a function and therefore overloads operator(). There are a few overloaded variants of this operator, depending on whether a separate output vector is provided.
It uses StarPU as its backend, and the choice among execution backends is controlled with preprocessor defines. SKEPU_OPENMP enables OpenMP, whose support in StarPU is currently limited and which is therefore not recommended. SKEPU_CUDA registers the CUDA backend alongside any other defined backends (at least the CPU backend if nothing else is defined). SKEPU_OPENCL registers the OpenCL backend in the same way. CUDA_ONLY eliminates all other backends (even CPU) and forces StarPU to use CUDA only. If none of these macros is defined, the sequential CPU backend is used.
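The macro rules above can be sketched as a small compile-time selection. This is only an illustration of the described behaviour, not SkePU code: `enabledBackends` is a hypothetical helper, and the sketch assumes that the CPU backend stays registered whenever CUDA_ONLY is not defined.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: lists the backends that would be registered for the
// macros defined at compile time, following the rules described in the text.
inline std::vector<std::string> enabledBackends() {
    std::vector<std::string> backends;
#if defined(CUDA_ONLY)
    backends.push_back("CUDA");      // every other backend, even CPU, is dropped
#else
    backends.push_back("CPU");       // assumption: CPU is always kept here
#if defined(SKEPU_OPENMP)
    backends.push_back("OpenMP");    // limited StarPU support, not recommended
#endif
#if defined(SKEPU_CUDA)
    backends.push_back("CUDA");
#endif
#if defined(SKEPU_OPENCL)
    backends.push_back("OpenCL");
#endif
#endif
    return backends;
}
```

Compiling with, for example, `-DSKEPU_CUDA` would make this helper report CPU and CUDA, leaving the runtime choice between them to the scheduler.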
skepu::Scan< ScanFunc, T >::Scan | ( | ScanFunc * | scanFunc | ) |
When creating an instance of the Scan skeleton, a pointer to a binary user function must be provided. The constructor also sets up the environment and, if SKEPU_OPENCL is defined, creates the appropriate OpenCL program and kernel. It also creates a default execution plan, which the skeleton uses if no other plan is specified.
scanFunc | A pointer to a valid binary user function. Will be deleted in the destructor. |
References skepu::Environment< T >::getInstance().
skepu::Scan< ScanFunc, T >::Scan | ( | ScanFunc * | scanFunc, |
Vector< T > * | in, | ||
Vector< T > * | out, | ||
ScanType | _type, | ||
T | _init = T() |
||
) |
When creating an instance of the Scan skeleton, a pointer to a binary user function must be provided. The constructor also sets up the environment and, if SKEPU_OPENCL is defined, creates the appropriate OpenCL program and kernel. It also creates a default execution plan, which the skeleton uses if no other plan is specified. This constructor supports calling the Scan skeleton from a task-parallel skeleton (e.g. farm) through the Task class method "run_async()". The parameters for the scan call must be specified beforehand, e.g. using this constructor.
scanFunc | A pointer to a valid binary user function. Will be deleted in the destructor. |
in | A pointer to an input Vector object. |
out | A pointer to an output Vector object. |
_type | A ScanType object specifying scan type (INCLUSIVE, EXCLUSIVE). |
_init | A value specifying the initial value in case the EXCLUSIVE ScanType is used. |
References skepu::Environment< T >::getInstance().
skepu::Scan< ScanFunc, T >::~Scan | ( | ) |
When the Scan skeleton is destroyed, it deletes the user function it was created with. It also destroys the OpenCL handler and performance model objects (if created) and the StarPU codelet.
static void skepu::Scan< ScanFunc, T >::cpu_func | ( | void * | buffers[], |
void * | arg | ||
) |
A static function used with the StarPU codelet for applying Scan. StarPU calls it if the CPU backend is selected for a skeleton invocation.
buffers | Contains all StarPU-managed data, which in this case consists of at most two buffers. |
arg | A read-only argument used to pass a handle to the object, since this is a static function. |
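The reason for the `arg` parameter is that a static function has no `this` pointer, so the object has to be handed over explicitly. The following is a minimal sketch of that pattern in plain C++ without StarPU. `MiniScan`, `CallArgs` and `cpu_func_sketch` are hypothetical names, and the sketch assumes raw arrays in `buffers`, whereas StarPU passes data-interface descriptors there.

```cpp
#include <cstddef>

// Hypothetical stand-in for a skeleton object; this is not the SkePU class.
struct MiniScan {
    int init;  // example state that the callback needs

    // Instance method doing the real work: inclusive add-scan seeded with init.
    void apply(const int* in, int* out, std::size_t n) const {
        int acc = init;
        for (std::size_t i = 0; i < n; ++i) {
            acc += in[i];
            out[i] = acc;
        }
    }
};

// Hypothetical argument bundle passed through the void* parameter.
struct CallArgs {
    const MiniScan* self;  // handle to the object
    std::size_t n;         // element count (assumed to travel with the args)
};

// Static-style callback: recovers the object from `arg`, the data from
// `buffers`, and forwards to the instance method.
static void cpu_func_sketch(void* buffers[], void* arg) {
    const CallArgs* args = static_cast<const CallArgs*>(arg);
    const int* in = static_cast<const int*>(buffers[0]);
    int* out = static_cast<int*>(buffers[1]);
    args->self->apply(in, out, args->n);
}
```

The same shape applies to the other backend callbacks; only the body that performs the scan differs.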
void skepu::Scan< ScanFunc, T >::createOpenCLProgram | ( | ) |
A function called by the constructor. It creates the OpenCL program for the skeleton and saves the kernel name. The program is built from a string containing the user function (specified when constructing the skeleton) and a generic Scan kernel. The type and function names in the generic kernel are replaced by user-function-specific code before it is compiled by the OpenCL JIT compiler. The Scan kernel actually consists of two kernels, both of whose handles are saved: the actual scan kernel and a uniform add kernel that adds the block sums produced by scanning.
It also handles the use of doubles automatically by including "#pragma OPENCL EXTENSION cl_khr_fp64: enable" if doubles are used.
References skepu::ScanAdd_CL(), skepu::ScanKernel_CL(), and skepu::ScanUpdate_CL().
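As a rough illustration of how such a program string might be assembled, the sketch below prepends the double-precision pragma when the element type is `double` and then concatenates the user function and the generic kernel text. `buildProgramSource` is a hypothetical helper, the strings passed in are placeholders, and the name substitution step is left out.

```cpp
#include <string>
#include <type_traits>

// Hypothetical helper: assembles OpenCL source text for element type T.
// Order: optional fp64 pragma, user function source, generic kernel source.
template <typename T>
std::string buildProgramSource(const std::string& userFunc,
                               const std::string& genericKernel) {
    std::string source;
    if (std::is_same<T, double>::value) {
        // Doubles need the extension enabled before any kernel code.
        source += "#pragma OPENCL EXTENSION cl_khr_fp64: enable\n";
    }
    source += userFunc;
    source += "\n";
    source += genericKernel;
    return source;
}
```

The resulting string is what would then be handed to the OpenCL runtime compiler.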
static void skepu::Scan< ScanFunc, T >::cuda_func | ( | void * | buffers[], |
void * | arg | ||
) |
A static function used with the StarPU codelet for applying Scan. StarPU calls it if the CUDA backend is selected for a skeleton invocation.
buffers | Contains all StarPU-managed data, which in this case consists of at most two buffers. |
arg | A read-only argument used to pass a handle to the object, since this is a static function. |
References skepu::Scan< ScanFunc, T >::scanLargeVectorRecursively_CU().
static void skepu::Scan< ScanFunc, T >::omp_func | ( | void * | buffers[], |
void * | arg | ||
) |
A static function used with the StarPU codelet for applying Scan. StarPU calls it if the OpenMP backend is selected for a skeleton invocation.
buffers | Contains all StarPU-managed data, which in this case consists of at most two buffers. |
arg | A read-only argument used to pass a handle to the object, since this is a static function. |
static void skepu::Scan< ScanFunc, T >::opencl_func | ( | void * | buffers[], |
void * | arg | ||
) |
A static function used with the StarPU codelet for applying Scan. StarPU calls it if the OpenCL backend is selected for a skeleton invocation.
buffers | Contains all StarPU-managed data, which in this case consists of at most two buffers. |
arg | A read-only argument used to pass a handle to the object, since this is a static function. |
References skepu::Scan< ScanFunc, T >::scanLargeVectorRecursively_CL().
void skepu::Scan< ScanFunc, T >::operator() | ( | Vector< T > & | input, |
ScanType | type, | ||
T | init = T() |
||
) |
Performs the Scan on a whole Vector, using the input Vector itself as output.
Depending on which backends were enabled at compile time, the appropriate backends are registered. If several backends are available (e.g. CPU and CUDA), StarPU decides at runtime which one to use.
input | A vector which will be scanned. It will be overwritten with the result. |
type | The scan type, either INCLUSIVE or EXCLUSIVE. |
init | The initialization value for exclusive scans. |
void skepu::Scan< ScanFunc, T >::operator() | ( | Vector< T > & | input, |
Vector< T > & | output, | ||
ScanType | type, | ||
T | init = T() |
||
) |
Performs the Scan on a whole Vector, with a separate Vector as output.
Depending on which backends were enabled at compile time, the appropriate backends are registered. If several backends are available (e.g. CPU and CUDA), StarPU decides at runtime which one to use.
input | A vector which will be scanned. |
output | The result vector, will be overwritten with the result and resized if needed. |
type | The scan type, either INCLUSIVE or EXCLUSIVE. |
init | The initialization value for exclusive scans. |
References skepu::Vector< T >::clear(), skepu::Vector< T >::resize(), and skepu::Vector< T >::size().
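The difference between the two operator() overloads can be sketched with a small stand-in functor. This is not the SkePU class: `AddScan` is hypothetical, it uses `std::vector<int>` and addition only, and it assumes that the initial value is ignored for inclusive scans.

```cpp
#include <cstddef>
#include <vector>

// Enumerator names taken from the parameter descriptions above.
enum ScanType { INCLUSIVE, EXCLUSIVE };

// Hypothetical stand-in functor; addition on ints only.
struct AddScan {
    // Separate output: resized to the input length if needed, then overwritten.
    void operator()(const std::vector<int>& input, std::vector<int>& output,
                    ScanType type, int init = 0) const {
        if (output.size() != input.size()) output.resize(input.size());
        int acc = (type == EXCLUSIVE) ? init : 0;
        for (std::size_t i = 0; i < input.size(); ++i) {
            const int cur = input[i];  // read first: input and output may alias
            if (type == INCLUSIVE) {
                acc += cur;
                output[i] = acc;
            } else {
                output[i] = acc;
                acc += cur;
            }
        }
    }

    // In place: the input vector is overwritten with the result.
    void operator()(std::vector<int>& input, ScanType type, int init = 0) const {
        (*this)(input, input, type, init);
    }
};
```

Calling the functor with one vector overwrites that vector, while calling it with two vectors leaves the input untouched and fills the second one.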
void skepu::Scan< ScanFunc, T >::replaceText | ( | std::string & | text, |
std::string | find, | ||
std::string | replace | ||
) |
A helper function used by createOpenCLProgram(). It finds all instances of a string in another string and replaces it with a third string.
text | A std::string which is searched. |
find | The std::string which is searched for and replaced. |
replace | The replacement std::string . |
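A find-and-replace helper of this kind can be written with `std::string::find` and `std::string::replace`. The following is a minimal sketch rather than the SkePU implementation; `replaceAll` is a hypothetical name, and the placeholder `TYPE` in the usage note is likewise only an assumption.

```cpp
#include <string>

// Replaces every occurrence of `find` in `text` with `replace`.
void replaceAll(std::string& text, const std::string& find,
                const std::string& replace) {
    if (find.empty()) return;  // an empty pattern would never advance
    std::string::size_type pos = 0;
    while ((pos = text.find(find, pos)) != std::string::npos) {
        text.replace(pos, find.length(), replace);
        pos += replace.length();  // continue after the inserted text
    }
}
```

Applied to the string "TYPE f(TYPE a, TYPE b)" with `TYPE` replaced by `float`, this gives "float f(float a, float b)". Advancing past the inserted text keeps the loop from matching inside its own replacement.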
virtual void skepu::Scan< ScanFunc, T >::run_async | ( | ) |
This is an abstract method defined in the Task class, which every data-parallel skeleton implements so that it can be used within task-parallel skeletons (e.g. farm). It relies on the parameters for the function call having been provided beforehand, e.g. via the constructor or the setter methods available in the public interface of the class.
Implements skepu::Task.
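The idea of binding the call parameters first and invoking a parameterless method later can be sketched as follows. `Task` here is a hypothetical stand-in for the abstract interface, `BoundScan` is not the SkePU class, and the sketch runs synchronously even though it keeps the name `run_async`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the abstract Task interface.
struct Task {
    virtual ~Task() {}
    virtual void run_async() = 0;
};

// Skeleton-like object: arguments are bound at construction, so
// run_async() can later be called without parameters.
class BoundScan : public Task {
public:
    BoundScan(const std::vector<int>* in, std::vector<int>* out)
        : in_(in), out_(out) {}

    // Inclusive add-scan over the vectors bound in the constructor.
    void run_async() override {
        out_->resize(in_->size());
        int acc = 0;
        for (std::size_t i = 0; i < in_->size(); ++i) {
            acc += (*in_)[i];
            (*out_)[i] = acc;
        }
    }

private:
    const std::vector<int>* in_;
    std::vector<int>* out_;
};
```

A task-parallel driver only needs a `Task*` and can call `run_async()` on it without knowing which skeleton or which arguments are behind it.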
T skepu::Scan< ScanFunc, T >::scanLargeVectorRecursively_CL | ( | T * | input, |
T * | output, | ||
std::vector< DeviceMemPointer_CL< T > * > & | blockSums, | ||
unsigned int | numElements, | ||
int | level, | ||
ScanType | type, | ||
T | init, | ||
Device_CL * | deviceCl | ||
) |
Scans a Vector using the same recursive algorithm as the NVIDIA SDK. First the vector is scanned, producing partial results for each block. Then the function is called recursively to scan these partial results, which in turn can produce partial results, and so on. This continues until only one block of partial results is left.
input | Pointer to the device memory where the input vector resides. |
output | Pointer to the device memory where the output vector resides. |
blockSums | A vector of device memory pointers where the partial results for each level are stored. |
numElements | The number of elements to scan. |
level | The current recursion level. |
type | The scan type, either INCLUSIVE or EXCLUSIVE. |
init | The initialization value for exclusive scans. |
deviceCl | Pointer to the OpenCL device to utilize. |
References skepu::DeviceMemPointer_CL< T >::getDeviceDataPointer().
T skepu::Scan< ScanFunc, T >::scanLargeVectorRecursively_CU | ( | T * | input, |
T * | output, | ||
std::vector< DeviceMemPointer_CU< T > * > & | blockSums, | ||
unsigned int | numElements, | ||
int | level, | ||
ScanType | type, | ||
T | init, | ||
int | deviceID | ||
) |
Scans a Vector using the same recursive algorithm as the NVIDIA SDK. First the vector is scanned, producing partial results for each block. Then the function is called recursively to scan these partial results, which in turn can produce partial results, and so on. This continues until only one block of partial results is left.
input | Pointer to the device memory where the input vector resides. |
output | Pointer to the device memory where the output vector resides. |
blockSums | A vector of device memory pointers where the partial results for each level are stored. |
numElements | The number of elements to scan. |
level | The current recursion level. |
type | The scan type, either INCLUSIVE or EXCLUSIVE. |
init | The initialization value for exclusive scans. |
deviceID | Integer deciding which device to utilize. |
References skepu::DeviceMemPointer_CU< T >::getDeviceDataPointer().
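The recursive block scheme described for the CUDA and OpenCL variants can be illustrated with a sequential CPU sketch. It is not the device code: `blockedScan` is a hypothetical name, `blockSize` is an assumed parameter standing in for the device block size, and only an inclusive add-scan on `int` is shown.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Inclusive add-scan performed block by block, in place.
void blockedScan(std::vector<int>& data, std::size_t blockSize) {
    const std::size_t n = data.size();
    if (n == 0) return;
    // A block must hold at least two elements, or the recursion never shrinks.
    blockSize = std::max<std::size_t>(blockSize, 2);
    const std::size_t numBlocks = (n + blockSize - 1) / blockSize;

    // Step 1: scan each block independently and record its total.
    std::vector<int> blockSums(numBlocks);
    for (std::size_t b = 0; b < numBlocks; ++b) {
        const std::size_t begin = b * blockSize;
        const std::size_t end = std::min(begin + blockSize, n);
        for (std::size_t i = begin + 1; i < end; ++i)
            data[i] += data[i - 1];
        blockSums[b] = data[end - 1];
    }
    if (numBlocks == 1) return;  // a single block is left: recursion ends

    // Step 2: scan the block totals with the same procedure.
    blockedScan(blockSums, blockSize);

    // Step 3: add the scanned total of all preceding blocks to each block.
    for (std::size_t b = 1; b < numBlocks; ++b) {
        const std::size_t begin = b * blockSize;
        const std::size_t end = std::min(begin + blockSize, n);
        for (std::size_t i = begin; i < end; ++i)
            data[i] += blockSums[b - 1];
    }
}
```

On a device, steps 1 and 3 would each run as a parallel kernel over all blocks, which is why the block totals need their own scan in between.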