SkePU 0.7
skepu::Scan< ScanFunc, T > Class Template Reference

A class representing the Scan skeleton.

#include <scan.h>

Public Member Functions

 Scan (ScanFunc *scanFunc)
 Scan (ScanFunc *scanFunc, Vector< T > *in, Vector< T > *out, ScanType type, T init=T())
 ~Scan ()
void run_async ()
void operator() (Vector< T > &input, ScanType type, T init=T())
void operator() (Vector< T > &input, Vector< T > &output, ScanType type, T init=T())
T scanLargeVectorRecursively_CU (T *input, T *output, std::vector< DeviceMemPointer_CU< T > * > &blockSums, unsigned int numElements, int level, ScanType type, T init, int deviceID)
T scanLargeVectorRecursively_CL (T *input, T *output, std::vector< DeviceMemPointer_CL< T > * > &blockSums, unsigned int numElements, int level, ScanType type, T init, Device_CL *deviceCL)
void replaceText (std::string &text, std::string find, std::string replace)
void createOpenCLProgram ()

Static Public Member Functions

static void cpu_func (void *buffers[], void *arg)
static void omp_func (void *buffers[], void *arg)
static void cuda_func (void *buffers[], void *arg)
static void opencl_func (void *buffers[], void *arg)

Detailed Description

template<typename ScanFunc, typename T>
class skepu::Scan< ScanFunc, T >

A class representing the Scan skeleton.

Author:
Johan Enmyren, Usman Dastgeer
Version:
0.7

This class defines the Scan skeleton, also known as prefix sum. It is related to the Reduce operation, but instead of producing a single scalar result it produces an output vector of the same length as the input, in which each element is the reduction of the corresponding input element and all elements preceding it. For example, with addition as the binary function, the input vector [4 3 7 6 9] produces the result vector [4 7 14 20 29]. The Scan operation can be either inclusive or exclusive, that is, it can include or exclude the current element. The previous example is an inclusive scan; the exclusive result would be [0 4 7 14 20]. Exclusive scan is sometimes called prescan. This Scan skeleton supports both variants through the ScanType parameter of the function calls, which the signatures listed here take explicitly.
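The difference between the two variants can be checked with a small sequential reference, independent of SkePU. This is only a sketch: `ScanKind` and `referenceScan` are hypothetical names introduced here, and the code assumes the initial value is used only in the exclusive case.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for the ScanType values named in this reference.
enum class ScanKind { Inclusive, Exclusive };

// Sequential reference scan. `op` is the binary user function;
// `init` seeds the result only in the exclusive case (an assumption).
template <typename T, typename BinaryOp>
std::vector<T> referenceScan(const std::vector<T>& in, BinaryOp op,
                             ScanKind kind, T init = T()) {
    std::vector<T> out(in.size());
    if (in.empty()) return out;
    if (kind == ScanKind::Inclusive) {
        // out[i] combines in[0..i].
        out[0] = in[0];
        for (std::size_t i = 1; i < in.size(); ++i)
            out[i] = op(out[i - 1], in[i]);
    } else {
        // out[i] combines init and in[0..i-1]; the current element is excluded.
        out[0] = init;
        for (std::size_t i = 1; i < in.size(); ++i)
            out[i] = op(out[i - 1], in[i - 1]);
    }
    return out;
}
```

Called with `std::plus<int>()` on [4 3 7 6 9], the inclusive variant yields [4 7 14 20 29] and the exclusive variant with the default initial value yields [0 4 7 14 20], matching the example above.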

Once instantiated, the skeleton is meant to be used as a function, and it therefore overloads operator(). There are a few overloaded variants of this operator, depending on whether a separate output vector is provided.

It uses StarPU as a backend, and the choice between different backends can be controlled with defines. SKEPU_OPENMP enables OpenMP, whose support in StarPU is currently limited and which is therefore not recommended. SKEPU_CUDA registers the CUDA backend alongside the other defined backends (at least CPU if nothing else is defined). SKEPU_OPENCL registers the OpenCL backend alongside the other defined backends (at least CPU if nothing else is defined). CUDA_ONLY eliminates all other backends (even CPU) and forces StarPU to use CUDA only. If none of the above macros is defined, the sequential CPU backend is used.
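As a rough illustration of how these defines combine, the following sketch lists the backends that would be available for a given set of macros. The macro names come from the text above; the function `enabledBackends` and its string labels are hypothetical, and the sketch assumes that CPU is always registered unless CUDA_ONLY is defined.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative only: not part of SkePU. Returns labels for the backends
// that the defines described above would make available.
inline std::vector<std::string> enabledBackends() {
    std::vector<std::string> backends;
#if defined(CUDA_ONLY)
    backends.push_back("CUDA");    // every other backend, even CPU, is dropped
#else
    backends.push_back("CPU");     // assumed to be present in all other cases
#if defined(SKEPU_OPENMP)
    backends.push_back("OpenMP");
#endif
#if defined(SKEPU_CUDA)
    backends.push_back("CUDA");
#endif
#if defined(SKEPU_OPENCL)
    backends.push_back("OpenCL");
#endif
#endif
    return backends;
}
```

Compiled without any of the macros, the function returns only the CPU entry; adding for example `-DSKEPU_CUDA` on the compiler command line would add the CUDA entry.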


Constructor & Destructor Documentation

template<typename ScanFunc , typename T >
skepu::Scan< ScanFunc, T >::Scan ( ScanFunc *  scanFunc)

When creating an instance of the Scan skeleton, a pointer to a binary user function must be provided. The constructor also sets up the environment and, if SKEPU_OPENCL is defined, creates the appropriate OpenCL program and kernel. It also creates a default execution plan, which the skeleton uses if no other is specified.

Parameters:
scanFunc  A pointer to a valid binary user function. Will be deleted in the destructor.

References skepu::Environment< T >::getInstance().

template<typename ScanFunc , typename T >
skepu::Scan< ScanFunc, T >::Scan ( ScanFunc *  scanFunc,
Vector< T > *  in,
Vector< T > *  out,
ScanType  _type,
T  _init = T() 
)

When creating an instance of the Scan skeleton, a pointer to a binary user function must be provided. The constructor also sets up the environment and, if SKEPU_OPENCL is defined, creates the appropriate OpenCL program and kernel. It also creates a default execution plan, which the skeleton uses if no other is specified. This constructor is used to assist in calling the Scan skeleton from a task-parallel skeleton (e.g. farm) by implementing the Task class's run_async() method. The parameters for the Scan call must be specified beforehand, e.g. using this constructor.

Parameters:
scanFunc  A pointer to a valid binary user function. Will be deleted in the destructor.
in        A pointer to an input Vector object.
out       A pointer to an output Vector object.
_type     A ScanType value specifying the scan type (INCLUSIVE or EXCLUSIVE).
_init     The initial value, used when the EXCLUSIVE ScanType is selected.

References skepu::Environment< T >::getInstance().

template<typename ScanFunc , typename T >
skepu::Scan< ScanFunc, T >::~Scan ( )

When the Scan skeleton is destroyed, it deletes the user function it was created with. It also destroys the OpenCL handler and performance model objects (if created) and the StarPU codelet.
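The ownership rule (the skeleton deletes the user function it was given) means the caller must allocate the function object with `new` and must not delete it afterwards. The sketch below shows that contract in isolation; `PlusFunc` and `OwningSkeleton` are hypothetical types, not SkePU classes.

```cpp
#include <cassert>

// Hypothetical user-function type that counts live instances so the
// ownership transfer can be observed.
struct PlusFunc {
    static int& alive() { static int n = 0; return n; }
    PlusFunc() { ++alive(); }
    ~PlusFunc() { --alive(); }
    int operator()(int a, int b) const { return a + b; }
};

// Minimal owner mirroring the documented contract: it takes a raw pointer
// in the constructor and deletes it in the destructor.
template <typename Func>
class OwningSkeleton {
public:
    explicit OwningSkeleton(Func* f) : func_(f) {}
    ~OwningSkeleton() { delete func_; }

    // Copying would lead to a double delete, so it is disabled here.
    OwningSkeleton(const OwningSkeleton&) = delete;
    OwningSkeleton& operator=(const OwningSkeleton&) = delete;

    int apply(int a, int b) const { return (*func_)(a, b); }

private:
    Func* func_;
};
```

A caller would write `OwningSkeleton<PlusFunc> s(new PlusFunc);` and let the skeleton go out of scope; passing the address of a stack object instead would be an error under this contract.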


Member Function Documentation

template<typename ScanFunc , typename T >
void skepu::Scan< ScanFunc, T >::cpu_func ( void *  buffers[],
void *  arg 
) [static]

A static function used with the StarPU codelet for applying Scan. StarPU calls it if the CPU backend is selected for a skeleton invocation.

Parameters:
buffers  Contains all StarPU-managed data, which in this case consists of at most two buffers.
arg      A read-only argument used to pass a handle to the object, as this is a static function.
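The reason for the `arg` parameter can be shown with a plain C++ sketch: a static callback has no `this`, so the object is passed through a `void*` and cast back. Everything here is hypothetical (`MiniScan`, the raw `int` buffers); in real StarPU code the entries of `buffers` are data interfaces that must be read through StarPU's own accessors rather than cast directly.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical skeleton with a C-style static entry point.
class MiniScan {
public:
    explicit MiniScan(std::size_t n) : n_(n) {}

    // Static callback: the object arrives through `arg`.
    // Assumption for this sketch: buffers[0] is the input array and
    // buffers[1] the output array, both holding n_ ints.
    static void cpu_func(void* buffers[], void* arg) {
        const MiniScan* self = static_cast<const MiniScan*>(arg);
        const int* in = static_cast<const int*>(buffers[0]);
        int* out = static_cast<int*>(buffers[1]);
        self->inclusiveSum(in, out);
    }

private:
    // Inclusive prefix sum; safe when in and out point to the same array.
    void inclusiveSum(const int* in, int* out) const {
        int acc = 0;
        for (std::size_t i = 0; i < n_; ++i) {
            acc += in[i];
            out[i] = acc;
        }
    }

    std::size_t n_;
};
```

A runtime would invoke it as `MiniScan::cpu_func(buffers, &scan)`, where `buffers` holds the two array pointers and `scan` is the object that issued the task.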
template<typename ScanFunc , typename T >
void skepu::Scan< ScanFunc, T >::createOpenCLProgram ( )

A function called by the constructor. It creates the OpenCL program for the skeleton and saves the kernel name. The program is built from a string containing the user function (specified when constructing the skeleton) and a generic Scan kernel. The type and function names in the generic kernel are replaced by user-function-specific code before it is compiled by the OpenCL JIT compiler. The Scan kernel actually consists of two kernels, both of which have their handles saved: the actual scan kernel and a uniform add kernel that adds the block sums produced by scanning.

Also handles the use of doubles automatically by including "#pragma OPENCL EXTENSION cl_khr_fp64: enable" if doubles are used.

References skepu::ScanAdd_CL(), skepu::ScanKernel_CL(), and skepu::ScanUpdate_CL().
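The source-assembly step described for createOpenCLProgram() can be sketched without an OpenCL runtime. The function below only builds the source string; `buildProgramSource` and the kernel text passed to it are assumptions made for illustration, not SkePU's actual code or kernels.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Assemble OpenCL source text for element type T from the user function
// source and a generic kernel source (both supplied by the caller).
template <typename T>
std::string buildProgramSource(const std::string& userFuncSource,
                               const std::string& genericKernelSource) {
    std::string source;
    if (std::is_same<T, double>::value) {
        // Double precision must be enabled explicitly, as described above.
        source += "#pragma OPENCL EXTENSION cl_khr_fp64: enable\n";
    }
    source += userFuncSource;
    source += "\n";
    source += genericKernelSource;
    return source;
}
```

With `T = double` the returned string starts with the pragma line; with `T = float` it starts directly with the user function. The resulting string is what would then be handed to the OpenCL compiler.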

template<typename ScanFunc , typename T >
void skepu::Scan< ScanFunc, T >::cuda_func ( void *  buffers[],
void *  arg 
) [static]

A static function used with the StarPU codelet for applying Scan. StarPU calls it if the CUDA backend is selected for a skeleton invocation.

Parameters:
buffers  Contains all StarPU-managed data, which in this case consists of at most two buffers.
arg      A read-only argument used to pass a handle to the object, as this is a static function.

References skepu::Scan< ScanFunc, T >::scanLargeVectorRecursively_CU().

template<typename ScanFunc , typename T >
void skepu::Scan< ScanFunc, T >::omp_func ( void *  buffers[],
void *  arg 
) [static]

A static function used with the StarPU codelet for applying Scan. StarPU calls it if the OpenMP backend is selected for a skeleton invocation.

Parameters:
buffers  Contains all StarPU-managed data, which in this case consists of at most two buffers.
arg      A read-only argument used to pass a handle to the object, as this is a static function.

template<typename ScanFunc , typename T >
void skepu::Scan< ScanFunc, T >::opencl_func ( void *  buffers[],
void *  arg 
) [static]

A static function used with the StarPU codelet for applying Scan. StarPU calls it if the OpenCL backend is selected for a skeleton invocation.

Parameters:
buffers  Contains all StarPU-managed data, which in this case consists of at most two buffers.
arg      A read-only argument used to pass a handle to the object, as this is a static function.

References skepu::Scan< ScanFunc, T >::scanLargeVectorRecursively_CL().

template<typename ScanFunc , typename T >
void skepu::Scan< ScanFunc, T >::operator() ( Vector< T > &  input,
ScanType  type,
T  init = T() 
)

Performs the Scan on a whole Vector, with the input Vector itself as output.

Depending on which backends were enabled, the appropriate backends are registered. In case of multiple backends (e.g. CPU and CUDA), StarPU decides at runtime which one to use.

Parameters:
input  A vector which will be scanned. It will be overwritten with the result.
type   The scan type, either INCLUSIVE or EXCLUSIVE.
init   The initialization value for exclusive scans.

template<typename ScanFunc , typename T >
void skepu::Scan< ScanFunc, T >::operator() ( Vector< T > &  input,
Vector< T > &  output,
ScanType  type,
T  init = T() 
)

Performs the Scan on a whole Vector, with a separate Vector as output.

Depending on which backends were enabled, the appropriate backends are registered. In case of multiple backends (e.g. CPU and CUDA), StarPU decides at runtime which one to use.

Parameters:
input   A vector which will be scanned.
output  The result vector. It will be overwritten with the result and resized if needed.
type    The scan type, either INCLUSIVE or EXCLUSIVE.
init    The initialization value for exclusive scans.

References skepu::Vector< T >::clear(), skepu::Vector< T >::resize(), and skepu::Vector< T >::size().
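The two call shapes of operator() can be mimicked with a small standalone functor. This is a sketch under assumptions: `ScanFunctor` and `ScanMode` are hypothetical names, `std::vector` stands in for skepu::Vector, and the work is done sequentially rather than through StarPU.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for ScanType.
enum ScanMode { INCLUSIVE_MODE, EXCLUSIVE_MODE };

template <typename Func, typename T>
class ScanFunctor {
public:
    explicit ScanFunctor(Func f) : func_(f) {}

    // Separate output: resized to match the input if needed.
    void operator()(const std::vector<T>& input, std::vector<T>& output,
                    ScanMode mode, T init = T()) const {
        if (output.size() != input.size()) output.resize(input.size());
        T acc = init;
        for (std::size_t i = 0; i < input.size(); ++i) {
            const T cur = input[i];  // read first: input and output may alias
            if (mode == EXCLUSIVE_MODE) {
                output[i] = acc;
                acc = func_(acc, cur);
            } else {
                acc = (i == 0) ? cur : func_(acc, cur);
                output[i] = acc;
            }
        }
    }

    // In place: the input doubles as the output.
    void operator()(std::vector<T>& input, ScanMode mode, T init = T()) const {
        (*this)(input, input, mode, init);
    }

private:
    Func func_;
};
```

With `ScanFunctor<std::plus<int>, int> scan{std::plus<int>()}`, the call `scan(v, out, INCLUSIVE_MODE)` leaves `v` untouched and fills `out`, while `scan(v, EXCLUSIVE_MODE)` overwrites `v` itself.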

template<typename ScanFunc , typename T >
void skepu::Scan< ScanFunc, T >::replaceText ( std::string &  text,
std::string  find,
std::string  replace 
)

A helper function used by createOpenCLProgram(). It finds all instances of a string in another string and replaces them with a third string.

Parameters:
text     The std::string which is searched.
find     The std::string which is searched for and replaced.
replace  The replacement std::string.
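A replace-all helper with this behaviour can be written in a few lines of standard C++. The version below is a sketch, not the SkePU implementation; it is named `replaceAll` to keep it distinct, takes the same three parameters in the same order, and the placeholder names `TYPE` and `FUNC` in the usage note are hypothetical.

```cpp
#include <cassert>
#include <string>

// Replace every occurrence of `find` in `text` with `replace`, in place.
inline void replaceAll(std::string& text, const std::string& find,
                       const std::string& replace) {
    if (find.empty()) return;  // an empty pattern would never advance
    std::string::size_type pos = 0;
    while ((pos = text.find(find, pos)) != std::string::npos) {
        text.replace(pos, find.length(), replace);
        pos += replace.length();  // skip the inserted text to avoid re-matching
    }
}
```

Applied to a template such as `"TYPE FUNC(TYPE a, TYPE b);"`, replacing `TYPE` with `float` and `FUNC` with `plus_f` gives `"float plus_f(float a, float b);"`, which is the kind of substitution the generic kernel needs before compilation.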
template<typename ScanFunc , typename T >
void skepu::Scan< ScanFunc, T >::run_async ( ) [virtual]

This is an abstract method defined in the Task class, which every data-parallel skeleton implements so that it can be used within task-parallel skeletons (e.g. farm). It relies on the parameters for the function call having already been provided, e.g. via the constructor or via setter methods available in the public interface of the class.

Implements skepu::Task.
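The pattern behind run_async() is that the arguments are bound when the object is constructed, so a task-parallel driver can run it without knowing them. The sketch below illustrates this with hypothetical types (`Task` here is a minimal stand-in, and `BoundPrefixSum` and `runAll` do not exist in SkePU); the driver runs the tasks one after another, whereas a real farm would dispatch them concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal Task interface with the single method named above.
struct Task {
    virtual ~Task() {}
    virtual void run_async() = 0;
};

// A data-parallel operation whose arguments are bound at construction,
// so run_async() needs no parameters.
class BoundPrefixSum : public Task {
public:
    BoundPrefixSum(const std::vector<int>* in, std::vector<int>* out)
        : in_(in), out_(out) {}

    void run_async() override {
        out_->resize(in_->size());
        int acc = 0;
        for (std::size_t i = 0; i < in_->size(); ++i) {
            acc += (*in_)[i];
            (*out_)[i] = acc;
        }
    }

private:
    const std::vector<int>* in_;
    std::vector<int>* out_;
};

// Stand-in for a task-parallel skeleton: runs each task in turn.
inline void runAll(const std::vector<Task*>& tasks) {
    for (Task* t : tasks) t->run_async();
}
```

Each task only needs pointers that stay valid until it has run, which is why the four-argument constructor documented above takes `Vector< T > *` rather than references to temporaries.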

template<typename ScanFunc , typename T >
T skepu::Scan< ScanFunc, T >::scanLargeVectorRecursively_CL ( T *  input,
T *  output,
std::vector< DeviceMemPointer_CL< T > * > &  blockSums,
unsigned int  numElements,
int  level,
ScanType  type,
T  init,
Device_CL *  deviceCL 
)

Scans a Vector using the same recursive algorithm as the NVIDIA SDK. First the vector is scanned, producing partial results for each block. Then the function is called recursively to scan these partial results, which in turn can produce partial results, and so on. This continues until only one block with partial results is left.

Parameters:
input        Pointer to the device memory where the input vector resides.
output       Pointer to the device memory where the output vector resides.
blockSums    A vector of device memory pointers where the partial results for each level are stored.
numElements  The number of elements to scan.
level        The current recursion level.
type         The scan type, either INCLUSIVE or EXCLUSIVE.
init         The initialization value for exclusive scans.
deviceCL     Pointer to the OpenCL device to utilize.

References skepu::DeviceMemPointer_CL< T >::getDeviceDataPointer().

template<typename ScanFunc , typename T >
T skepu::Scan< ScanFunc, T >::scanLargeVectorRecursively_CU ( T *  input,
T *  output,
std::vector< DeviceMemPointer_CU< T > * > &  blockSums,
unsigned int  numElements,
int  level,
ScanType  type,
T  init,
int  deviceID 
)

Scans a Vector using the same recursive algorithm as the NVIDIA SDK. First the vector is scanned, producing partial results for each block. Then the function is called recursively to scan these partial results, which in turn can produce partial results, and so on. This continues until only one block with partial results is left.

Parameters:
input        Pointer to the device memory where the input vector resides.
output       Pointer to the device memory where the output vector resides.
blockSums    A vector of device memory pointers where the partial results for each level are stored.
numElements  The number of elements to scan.
level        The current recursion level.
type         The scan type, either INCLUSIVE or EXCLUSIVE.
init         The initialization value for exclusive scans.
deviceID     Integer deciding which device to utilize.

References skepu::DeviceMemPointer_CU< T >::getDeviceDataPointer().
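The recursive structure (scan each block, scan the block sums recursively, then add them back) can be followed on the CPU with a short simulation. This is a sketch of the general technique only: `blockedInclusiveScan` is a hypothetical name, the operator is fixed to integer addition, the scan is inclusive, and the blocks are processed sequentially, whereas on a device each block would be handled by its own thread group.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Inclusive prefix sum over `data`, organised like a block-wise device scan.
inline void blockedInclusiveScan(std::vector<int>& data, std::size_t blockSize) {
    assert(blockSize >= 2);  // with blockSize 1 the recursion would never shrink
    const std::size_t n = data.size();
    if (n == 0) return;
    const std::size_t numBlocks = (n + blockSize - 1) / blockSize;

    // Step 1: scan each block on its own and record its total.
    std::vector<int> blockSums(numBlocks);
    for (std::size_t b = 0; b < numBlocks; ++b) {
        const std::size_t begin = b * blockSize;
        const std::size_t end = std::min(begin + blockSize, n);
        for (std::size_t i = begin + 1; i < end; ++i) data[i] += data[i - 1];
        blockSums[b] = data[end - 1];
    }
    if (numBlocks == 1) return;  // a single block needs no fix-up

    // Step 2: scan the block totals, one recursion level deeper.
    blockedInclusiveScan(blockSums, blockSize);

    // Step 3 (uniform add): offset block b by the total of blocks 0..b-1.
    for (std::size_t b = 1; b < numBlocks; ++b) {
        const std::size_t begin = b * blockSize;
        const std::size_t end = std::min(begin + blockSize, n);
        for (std::size_t i = begin; i < end; ++i) data[i] += blockSums[b - 1];
    }
}
```

For [4 3 7 6 9] with a block size of 2, the per-block scans give [4 7], [7 13], [9] with block sums [7 13 9]; scanning those recursively gives [7 20 29], and the uniform add produces [4 7 14 20 29].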

