A vector container class, implemented as a wrapper for std::vector. More...
#include <vector.h>
Classes | |
class | iterator |
A vector iterator class. More... | |
class | proxy_elem |
A proxy class representing one element of Vector. More... | |
Public Member Functions | |
void | randomize (int min=0, int max=RAND_MAX) |
Randomizes the vector. More... | |
void | save (const std::string &filename) |
Saves content of vector to a file. More... | |
void | load (const std::string &filename, size_type numElements=0) |
Loads the vector from a file. More... | |
Vector () | |
Vector (const Vector &vec) | |
Vector (size_type num, const T &val=T()) | |
Vector (T *const ptr, size_type size, bool deallocEnabled=true) | |
~Vector () | |
proxy_elem | operator[] (const size_type index) |
const T & | operator[] (const size_type index) const |
Vector< T > & | operator= (const Vector< T > &other) |
bool | operator== (const Vector< T > &c1) |
bool | operator!= (const Vector< T > &c1) |
bool | operator< (const Vector< T > &c1) |
bool | operator> (const Vector< T > &c1) |
bool | operator<= (const Vector< T > &c1) |
bool | operator>= (const Vector< T > &c1) |
iterator | begin () |
iterator | end () |
size_type | capacity () const |
size_type | size () const |
size_type | max_size () const |
void | resize (size_type num, T val=T()) |
bool | empty () const |
void | reserve (size_type size) |
proxy_elem | at (size_type loc) |
const T & | at (size_type loc) const |
proxy_elem | back () |
const T & | back () const |
proxy_elem | front () |
const T & | front () const |
void | assign (size_type num, const T &val) |
template<typename input_iterator > | |
void | assign (input_iterator start, input_iterator end) |
void | clear () |
void | pop_back () |
void | push_back (const T &val) |
void | swap (Vector< T > &from) |
device_pointer_type_cl | updateDevice_CL (T *start, size_type numElements, Device_CL *device, bool copy) |
Update device with vector content. More... | |
void | flush_CL () |
Flushes the vector. More... | |
void | copyDataToAnInvalidDeviceCopy (DeviceMemPointer_CU< T > *copy, unsigned int deviceID, unsigned int streamID=0) |
Used by updateDevice_CU to copy data to a device copy. The device copy can be a new one (just created) or an existing one with stale data (marked invalid). Data is copied first from copies already in the same device's memory, then from host memory, and finally from other devices' memories; the result may be assembled partially from several of these sources. More... | |
device_pointer_type_cu | updateDevice_CU (T *start, size_type numElements, unsigned int deviceID, bool copy, bool writeAccess, bool markOnlyLocalCopiesInvalid=false, unsigned int streamID=0) |
Update device with vector content. More... | |
void | flush_CU () |
Flushes the vector. More... | |
bool | isVectorOnDevice_CU (unsigned int deviceID) |
bool | isModified_CU (unsigned int deviceID) |
void | flush () |
T & | operator() (const size_type index) |
void | updateHost () const |
void | invalidateDeviceData () |
void | updateHostAndInvalidateDevice () |
void | releaseDeviceAllocations () |
void | updateHostAndReleaseDeviceAllocations () |
Friends | |
std::ostream & | operator<< (std::ostream &output, Vector< T > &vec) |
Overloaded stream operator, for testing purposes. More... | |
A vector container class, implemented as a wrapper for std::vector.
A skepu::Vector is a container of vector/array type and is implemented as a wrapper for std::vector. Its interface and behaviour are largely compatible with std::vector, but with some additions and variations. Instead of a regular element, it sometimes returns a proxy element so that it can distinguish between reads and writes. It also keeps track of which parts of it are currently allocated and uploaded to the GPU. If a computation changes the vector in GPU memory, the result is not immediately transferred back to host memory. Instead, the vector waits until an element is accessed before any copying is done.
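The proxy-element idea and the deferred copy-back can be illustrated with a small toy container. This is a sketch only: ToyVector, ProxyElem, doubleOnDevice and hostTransfers are hypothetical names invented for the illustration, and "device memory" is simulated with a second std::vector. It does not reproduce how skepu::Vector is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy container: "device memory" is simulated by a second
// std::vector. This is an illustration only, not SkePU's implementation.
template <typename T>
class ToyVector {
public:
    // Proxy returned by operator[]; it can tell reads from writes.
    class ProxyElem {
    public:
        ProxyElem(ToyVector& owner, std::size_t index)
            : owner_(owner), index_(index) {}

        // Read: make sure the host copy is current, then return the value.
        operator T() const {
            owner_.syncHost();
            return owner_.host_[index_];
        }

        // Write: sync first, then modify the host copy and mark the
        // simulated device copy as stale.
        ProxyElem& operator=(const T& value) {
            owner_.syncHost();
            owner_.host_[index_] = value;
            owner_.deviceValid_ = false;
            return *this;
        }

    private:
        ToyVector& owner_;
        std::size_t index_;
    };

    explicit ToyVector(std::size_t n, const T& value = T())
        : host_(n, value), device_(n, value) {}

    ProxyElem operator[](std::size_t index) { return ProxyElem(*this, index); }

    // Simulated device computation: doubles every element on the "device"
    // and leaves the host copy stale. Nothing is copied back here.
    void doubleOnDevice() {
        if (!deviceValid_) {
            device_ = host_;
            deviceValid_ = true;
        }
        for (T& x : device_) x = x + x;
        hostValid_ = false;
    }

    std::size_t hostTransfers() const { return hostTransfers_; }

private:
    // Copies the device data back only if the host copy is stale.
    void syncHost() {
        if (!hostValid_) {
            host_ = device_;
            hostValid_ = true;
            ++hostTransfers_;
        }
    }

    std::vector<T> host_;
    std::vector<T> device_;
    bool hostValid_ = true;
    bool deviceValid_ = true;
    std::size_t hostTransfers_ = 0;
};

void proxyExample() {
    ToyVector<int> v(4, 1);
    v.doubleOnDevice();
    v.doubleOnDevice();               // still no transfer to the host
    assert(v.hostTransfers() == 0);
    int x = v[2];                     // first read triggers one copy-back
    assert(x == 4);
    assert(v.hostTransfers() == 1);
    v[2] = 10;                        // write marks the device copy stale
    v.doubleOnDevice();               // device is refreshed, then doubled
    assert(static_cast<int>(v[2]) == 20);
}
```

Because the proxy holds a reference to its owner, it is not copy-assignable, so an expression such as `v[0] = v[1]` does not compile in this toy. That is one example of how a proxy can differ from an ordinary element.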
It also supports allocating and de-allocating page-locked memory using cudaMallocHost and cudaFreeHost. This can help when running asynchronous operations, especially when using multiple CUDA devices. It can be enabled by defining the USE_PINNED_MEMORY flag in the skeleton program.
Please refer to the C++ STL vector documentation for more information about the CPU-side implementation.
skepu::Vector< T >::Vector | ( | ) | inline |
Please refer to the documentation of std::vector.
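skepu::Vector< T >::Vector | ( | const Vector< T > & | vec | ) | inline |
Please refer to the documentation of std::vector. The copy is made element by element, since the copy constructor creates new storage.
Updates the source vector from its device allocations before copying.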
References skepu::Vector< T >::updateHost().
skepu::Vector< T >::Vector | ( | size_type | num, |
const T & | val = T() |
) | inline explicit |
Please refer to the documentation of std::vector.
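skepu::Vector< T >::Vector | ( | T *const | ptr, |
size_type | size, |
bool | deallocEnabled = true |
) | inline |
Constructs the vector on top of a raw data pointer, which is used as the vector's payload data. Useful when creating a vector object from an existing raw data pointer.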
skepu::Vector< T >::~Vector | ( | ) |
Releases all allocations made on device.
void skepu::Vector< T >::assign | ( | size_type | num, |
const T & | val | ||
) |
Please refer to the documentation of std::vector.
void skepu::Vector< T >::assign | ( | input_iterator | start, |
input_iterator | end | ||
) |
Please refer to the documentation of std::vector.
Vector< T >::proxy_elem skepu::Vector< T >::at | ( | size_type | loc | ) |
Please refer to the documentation of std::vector.
Returns a proxy_elem instead of an ordinary element. The proxy_elem usually behaves like an ordinary element, but there might be exceptions.
Referenced by skepu::Vector< T >::save().
const T & skepu::Vector< T >::at | ( | size_type | loc | ) | const |
Please refer to the documentation of std::vector.
Vector< T >::proxy_elem skepu::Vector< T >::back | ( | ) |
Please refer to the documentation of std::vector.
Returns a proxy_elem instead of an ordinary element. The proxy_elem usually behaves like an ordinary element, but there might be exceptions.
const T & skepu::Vector< T >::back | ( | ) | const |
Please refer to the documentation of std::vector.
Vector< T >::iterator skepu::Vector< T >::begin | ( | ) |
Please refer to the documentation of std::vector.
Referenced by skepu::Generate< GenerateFunc >::CL(), skepu::Scan< ScanFunc >::CL(), skepu::MapArray< MapArrayFunc >::CL(), skepu::MapReduce< MapFunc, ReduceFunc >::CL(), skepu::MapOverlap< MapOverlapFunc >::CL(), skepu::Reduce< ReduceFunc, ReduceFunc >::CL(), skepu::Map< MapFunc >::CL(), skepu::Generate< GenerateFunc >::CPU(), skepu::Scan< ScanFunc >::CPU(), skepu::MapReduce< MapFunc, ReduceFunc >::CPU(), skepu::MapOverlap< MapOverlapFunc >::CPU(), skepu::Map< MapFunc >::CPU(), skepu::MapArray< MapArrayFunc >::CPU(), skepu::Reduce< ReduceFunc, ReduceFunc >::CPU(), skepu::Generate< GenerateFunc >::CU(), skepu::Scan< ScanFunc >::CU(), skepu::MapReduce< MapFunc, ReduceFunc >::CU(), skepu::MapOverlap< MapOverlapFunc >::CU(), skepu::MapArray< MapArrayFunc >::CU(), skepu::Map< MapFunc >::CU(), skepu::Reduce< ReduceFunc, ReduceFunc >::CU(), skepu::Generate< GenerateFunc >::OMP(), skepu::Scan< ScanFunc >::OMP(), skepu::MapReduce< MapFunc, ReduceFunc >::OMP(), skepu::MapOverlap< MapOverlapFunc >::OMP(), skepu::MapArray< MapArrayFunc >::OMP(), skepu::Map< MapFunc >::OMP(), and skepu::Reduce< ReduceFunc, ReduceFunc >::OMP().
Vector< T >::size_type skepu::Vector< T >::capacity | ( | ) | const |
Please refer to the documentation of std::vector.
void skepu::Vector< T >::clear | ( | ) |
Please refer to the documentation of std::vector.
Referenced by skepu::Generate< GenerateFunc >::CL(), skepu::Scan< ScanFunc >::CL(), skepu::MapArray< MapArrayFunc >::CL(), skepu::MapOverlap< MapOverlapFunc >::CL(), skepu::Generate< GenerateFunc >::CPU(), skepu::Scan< ScanFunc >::CPU(), skepu::MapOverlap< MapOverlapFunc >::CPU(), skepu::MapArray< MapArrayFunc >::CPU(), skepu::Generate< GenerateFunc >::CU(), skepu::Scan< ScanFunc >::CU(), skepu::MapOverlap< MapOverlapFunc >::CU(), skepu::MapArray< MapArrayFunc >::CU(), skepu::Vector< T >::load(), skepu::Generate< GenerateFunc >::OMP(), skepu::Scan< ScanFunc >::OMP(), skepu::MapOverlap< MapOverlapFunc >::OMP(), skepu::MapArray< MapArrayFunc >::OMP(), and skepu::Map< MapFunc >::operator()().
void skepu::Vector< T >::copyDataToAnInvalidDeviceCopy | ( | DeviceMemPointer_CU< T > * | copy, |
unsigned int | deviceID, | ||
unsigned int | streamID = 0 |
||
) |
Used by updateDevice_CU to copy data to a device copy. The device copy can be a new one (just created) or an existing one with stale data (marked invalid). Data is copied first from copies already in the same device's memory, then from host memory, and finally from other devices' memories; the result may be assembled partially from several of these sources.
copy | The device copy that the data is written to. |
deviceID | ID of the device to which this copy belongs. |
streamID | ID of the CUDA stream that will use this copy when using MultiStream (used if USE_PINNED_MEMORY is defined). |
First, check for valid, overlapping copies within the same device. There can be more than one such copy, e.g., two overlapping valid copies if neither has been modified, or two non-overlapping valid copies if at least one of them has been written.
sizeUpdStr is passed by reference and is updated inside the called function; this applies to each of the copy steps described here.
If the "src" copy has modified contents, those contents are copied to the current "dst" copy, but the modified flag stays set on "src" only. If "dst" is then only read, there is no problem: "src" keeps its modified flag, and the host or other copies that need the contents get them from "src". If "dst" is written, there is also no problem: later code in this function marks "src" as invalid, since "dst" then holds the latest contents, and the host or other copies can copy from "dst".
At any point in time, there can be more than one valid copy per device for a container.
Next, handle any parts (ranges) of the copy that cannot be found in valid copies present in the current device memory.
If the main (host) copy is valid, copy from there, since copying from valid copies in other device memories is unlikely to be much faster than a host-to-device (HTD) transfer.
If the main copy is invalid, look for copies in other device memories.
If peer access is enabled for all of them, transfer directly from the copies on the other GPUs, i.e., copy from valid overlapping copies in the other device memories.
It is possible that some parts have still not been copied (because they are not present in any device memory); these can be copied from the host.
If peer access is not enabled, copy all overlapping "modified" copies from the other device memories back to the main host copy, and then copy from there. This does not guarantee that the main copy is valid, since there may be non-overlapping copies on the current or other devices that are modified and not yet written back to the main copy, but it at least ensures that the overlapping parts are safe to copy.
In that case, all overlapping copies from other devices are copied back to the host and marked as invalid: each one is first copied back to the host (which internally sets its modified flag to false) and is then removed from the list of copies that still need to be written back to the host.
Note: the loop condition at this point is redundant, since the copy is not updated inside the loop and m_numOfRanges is not modified there.
Finally, perform the actual copy from all possible sources (HTD, and device-to-device (DTD) transfers both within the same device and from other devices), which internally sets the m_valid flag for this copy, and reset the ranges to the default range, which is the whole copy.
References skepu::DeviceMemPointer_CU< T >::copiesOverlapInf(), skepu::DeviceMemPointer_CU< T >::copyAllRangesToDevice(), skepu::DeviceMemPointer_CU< T >::copyInfFromHostToDevice(), skepu::DeviceMemPointer_CU< T >::deviceDataHasChanged(), skepu::Environment< T >::getInstance(), skepu::DeviceMemPointer_CU< T >::isCopyValid(), MAX_COPYINF_SIZE, MAX_GPU_DEVICES, and skepu::DeviceMemPointer_CU< T >::resetRanges().
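The source priority described above (same device first, then host, then other devices) can be sketched as a simple planning step over element ranges. The Source struct, the planCopy function and the source labels are hypothetical and exist only for this illustration; the sketch also ignores the peer-access and validity checks that the real function performs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one data source: a valid copy covering the
// half-open element range [begin, end).
struct Source {
    std::string name;
    std::size_t begin;
    std::size_t end;
};

// For each element of the destination range [0, n), records which source it
// would be taken from. Sources are tried in the order given, so earlier
// entries have higher priority. Elements that no source covers keep the
// label "none".
std::vector<std::string> planCopy(std::size_t n,
                                  const std::vector<Source>& sources) {
    std::vector<std::string> plan(n, "none");
    for (const Source& s : sources) {
        for (std::size_t i = s.begin; i < s.end && i < n; ++i) {
            if (plan[i] == "none") plan[i] = s.name;
        }
    }
    return plan;
}

void planExample() {
    // Priority order: same device, then host, then other devices.
    std::vector<Source> sources = {
        {"same-device", 0, 3},
        {"host", 2, 6},
        {"other-device", 5, 8},
    };
    std::vector<std::string> plan = planCopy(8, sources);
    assert(plan[0] == "same-device");
    assert(plan[2] == "same-device");   // overlap resolved by priority
    assert(plan[4] == "host");
    assert(plan[5] == "host");
    assert(plan[7] == "other-device");
}
```

The point of the sketch is that one destination copy may end up assembled from several sources, each contributing only the ranges that higher-priority sources could not supply.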
bool skepu::Vector< T >::empty | ( | ) | const |
Please refer to the documentation of std::vector.
Referenced by skepu::Map< MapFunc >::operator()().
Vector< T >::iterator skepu::Vector< T >::end | ( | ) |
Please refer to the documentation of std::vector.
Referenced by skepu::Scan< ScanFunc >::CL(), skepu::MapArray< MapArrayFunc >::CL(), skepu::MapReduce< MapFunc, ReduceFunc >::CL(), skepu::MapOverlap< MapOverlapFunc >::CL(), skepu::Reduce< ReduceFunc, ReduceFunc >::CL(), skepu::Map< MapFunc >::CL(), skepu::Scan< ScanFunc >::CPU(), skepu::MapReduce< MapFunc, ReduceFunc >::CPU(), skepu::MapOverlap< MapOverlapFunc >::CPU(), skepu::Map< MapFunc >::CPU(), skepu::MapArray< MapArrayFunc >::CPU(), skepu::Reduce< ReduceFunc, ReduceFunc >::CPU(), skepu::Scan< ScanFunc >::CU(), skepu::MapReduce< MapFunc, ReduceFunc >::CU(), skepu::MapOverlap< MapOverlapFunc >::CU(), skepu::MapArray< MapArrayFunc >::CU(), skepu::Map< MapFunc >::CU(), skepu::Reduce< ReduceFunc, ReduceFunc >::CU(), skepu::Scan< ScanFunc >::OMP(), skepu::MapReduce< MapFunc, ReduceFunc >::OMP(), skepu::MapOverlap< MapOverlapFunc >::OMP(), skepu::MapArray< MapArrayFunc >::OMP(), skepu::Map< MapFunc >::OMP(), and skepu::Reduce< ReduceFunc, ReduceFunc >::OMP().
void skepu::Vector< T >::flush | ( | ) |
Flushes the vector: synchronizes it with the device, then releases all device allocations.
void skepu::Vector< T >::flush_CL | ( | ) |
Flushes the vector.
First it updates the vector from all its device allocations, then it releases all allocations.
void skepu::Vector< T >::flush_CU | ( | ) |
Flushes the vector.
First it updates the vector from all its device allocations, then it deletes all allocations.
Vector< T >::proxy_elem skepu::Vector< T >::front | ( | ) |
Please refer to the documentation of std::vector.
Returns a proxy_elem instead of an ordinary element. The proxy_elem usually behaves like an ordinary element, but there might be exceptions.
const T & skepu::Vector< T >::front | ( | ) | const |
Please refer to the documentation of std::vector.
void skepu::Vector< T >::invalidateDeviceData | ( | ) | inline |
Invalidates all device data that this vector has allocated, i.e., marks the data of its device copies as invalid.
Referenced by skepu::MapArray< MapArrayFunc >::CPU(), skepu::MapArray< MapArrayFunc >::OMP(), and skepu::Vector< T >::randomize().
bool skepu::Vector< T >::isModified_CU | ( | unsigned int | deviceID | ) |
Can be used to query whether vector is modified on a device or not.
Referenced by skepu::MapReduce< MapFunc, ReduceFunc >::operator()(), skepu::Map< MapFunc >::operator()(), skepu::Scan< ScanFunc >::operator()(), skepu::MapOverlap< MapOverlapFunc >::operator()(), skepu::MapArray< MapArrayFunc >::operator()(), and skepu::Reduce< ReduceFunc, ReduceFunc >::operator()().
bool skepu::Vector< T >::isVectorOnDevice_CU | ( | unsigned int | deviceID | ) |
Can be used to query whether vector is already available on a device or not.
Referenced by skepu::MapReduce< MapFunc, ReduceFunc >::operator()(), skepu::Map< MapFunc >::operator()(), skepu::Scan< ScanFunc >::operator()(), skepu::MapOverlap< MapOverlapFunc >::operator()(), skepu::MapArray< MapArrayFunc >::operator()(), and skepu::Reduce< ReduceFunc, ReduceFunc >::operator()().
void skepu::Vector< T >::load | ( | const std::string & | filename, |
size_type | numElements = 0 |
) | inline |
Loads the vector from a file.
Reads a variable number of elements from a file. In the file, all elements should be in ASCII on one line, with whitespace between elements. Mainly for testing purposes.
filename | Name of file to load from. |
numElements | The number of elements to load. Default value 0 means all values. |
References skepu::Vector< T >::clear(), and skepu::Vector< T >::push_back().
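The file format described above can be read with plain stream extraction. The following is a minimal sketch that assumes only what the description states; readElements is a hypothetical helper, not part of SkePU, and it reads from any std::istream (a std::ifstream would be used for an actual file).

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Reads whitespace-separated elements from `in`. numElements == 0 means
// "read everything", mirroring the default described above.
template <typename T>
std::vector<T> readElements(std::istream& in, std::size_t numElements = 0) {
    std::vector<T> result;
    T value;
    // The count is checked before extracting, so no extra element is consumed.
    while ((numElements == 0 || result.size() < numElements) && in >> value) {
        result.push_back(value);
    }
    return result;
}

void readExample() {
    std::istringstream all("1 2 3 4 5");
    assert(readElements<int>(all).size() == 5);

    std::istringstream firstTwo("1 2 3 4 5");
    std::vector<int> v = readElements<int>(firstTwo, 2);
    assert(v.size() == 2);
    assert(v[1] == 2);
}
```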
Vector< T >::size_type skepu::Vector< T >::max_size | ( | ) | const |
Please refer to the documentation of std::vector.
bool skepu::Vector< T >::operator!= | ( | const Vector< T > & | c1 | ) |
Please refer to the documentation of std::vector.
References skepu::Vector< T >::updateHost().
T & skepu::Vector< T >::operator() | ( | const size_type | index | ) |
Behaves like operator[] but does not synchronize with the device. Can be used when accessing many elements quickly, so that synchronization overhead does not affect performance. Make sure to synchronize with the device properly, e.g. by calling updateHost(), before use.
index | Index to a specific element of the vector. |
bool skepu::Vector< T >::operator< | ( | const Vector< T > & | c1 | ) |
Please refer to the documentation of std::vector.
References skepu::Vector< T >::size(), and skepu::Vector< T >::updateHost().
bool skepu::Vector< T >::operator<= | ( | const Vector< T > & | c1 | ) |
Please refer to the documentation of std::vector.
References skepu::Vector< T >::size(), and skepu::Vector< T >::updateHost().
Vector< T > & skepu::Vector< T >::operator= | ( | const Vector< T > & | other | ) |
Please refer to the documentation of std::vector.
References skepu::Vector< T >::updateHost().
bool skepu::Vector< T >::operator== | ( | const Vector< T > & | c1 | ) |
Please refer to the documentation of std::vector.
References skepu::Vector< T >::updateHost().
bool skepu::Vector< T >::operator> | ( | const Vector< T > & | c1 | ) |
Please refer to the documentation of std::vector.
References skepu::Vector< T >::size(), and skepu::Vector< T >::updateHost().
bool skepu::Vector< T >::operator>= | ( | const Vector< T > & | c1 | ) |
Please refer to the documentation of std::vector.
References skepu::Vector< T >::size(), and skepu::Vector< T >::updateHost().
Vector< T >::proxy_elem skepu::Vector< T >::operator[] | ( | const size_type | index | ) |
Please refer to the documentation of std::vector.
Returns a proxy_elem instead of an ordinary element. The proxy_elem usually behaves like an ordinary element, but there might be exceptions.
const T & skepu::Vector< T >::operator[] | ( | const size_type | index | ) | const |
Please refer to the documentation of std::vector.
void skepu::Vector< T >::pop_back | ( | ) |
Please refer to the documentation of std::vector.
void skepu::Vector< T >::push_back | ( | const T & | val | ) |
Please refer to the documentation of std::vector.
Referenced by skepu::Scan< ScanFunc >::CU(), and skepu::Vector< T >::load().
void skepu::Vector< T >::randomize | ( | int | min = 0, |
int | max = RAND_MAX |
) | inline |
Randomizes the vector.
Sets each element of the vector to a random number between min and max. The numbers are generated as integers but are cast to the type of the vector.
min | The smallest number an element can become. |
max | The largest number an element can become. |
References skepu::Vector< T >::invalidateDeviceData(), skepu::max(), skepu::min(), and skepu::Vector< T >::size().
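The "generate as integer, then cast to the element type" behaviour can be sketched as follows. randomizeSketch is a hypothetical helper; the use of std::mt19937 with std::uniform_int_distribution, and the assumption that both bounds are inclusive, are choices made for this illustration and are not taken from the SkePU source.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Fills `v` with random integers in [min, max] (both bounds inclusive in
// this sketch), cast to the element type T.
template <typename T>
void randomizeSketch(std::vector<T>& v, int min, int max, std::mt19937& gen) {
    std::uniform_int_distribution<int> dist(min, max);
    for (T& x : v) {
        x = static_cast<T>(dist(gen));   // integer value, stored as T
    }
}

void randomizeExample() {
    std::mt19937 gen(12345);             // fixed seed, for reproducibility
    std::vector<float> v(16);
    randomizeSketch(v, 0, 9, gen);
    for (float x : v) {
        assert(x >= 0.0f && x <= 9.0f);
    }
}
```

A consequence of generating integers first is that a floating-point vector only ever receives whole-number values.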
void skepu::Vector< T >::releaseDeviceAllocations | ( | ) | inline |
Removes the data copies allocated on devices.
void skepu::Vector< T >::reserve | ( | size_type | size | ) |
Please refer to the documentation of std::vector.
void skepu::Vector< T >::resize | ( | size_type | num, |
T | val = T() |
||
) |
Please refer to the documentation of std::vector.
Referenced by skepu::Generate< GenerateFunc >::CL(), skepu::Scan< ScanFunc >::CL(), skepu::MapArray< MapArrayFunc >::CL(), skepu::MapOverlap< MapOverlapFunc >::CL(), skepu::Generate< GenerateFunc >::CPU(), skepu::Scan< ScanFunc >::CPU(), skepu::MapOverlap< MapOverlapFunc >::CPU(), skepu::MapArray< MapArrayFunc >::CPU(), skepu::cpu_tune_wrapper_map(), skepu::cpu_tune_wrapper_maparray(), skepu::cpu_tune_wrapper_mapoverlap(), skepu::cpu_tune_wrapper_mapreduce(), skepu::cpu_tune_wrapper_reduce(), skepu::Generate< GenerateFunc >::CU(), skepu::Scan< ScanFunc >::CU(), skepu::MapOverlap< MapOverlapFunc >::CU(), skepu::MapArray< MapArrayFunc >::CU(), skepu::cuda_tune_wrapper_map(), skepu::cuda_tune_wrapper_maparray(), skepu::cuda_tune_wrapper_mapoverlap(), skepu::cuda_tune_wrapper_mapreduce(), skepu::cuda_tune_wrapper_reduce(), skepu::Generate< GenerateFunc >::OMP(), skepu::Scan< ScanFunc >::OMP(), skepu::MapOverlap< MapOverlapFunc >::OMP(), skepu::MapArray< MapArrayFunc >::OMP(), skepu::omp_tune_wrapper_map(), skepu::omp_tune_wrapper_maparray(), skepu::omp_tune_wrapper_mapoverlap(), skepu::omp_tune_wrapper_mapreduce(), skepu::omp_tune_wrapper_reduce(), and skepu::Map< MapFunc >::operator()().
void skepu::Vector< T >::save | ( | const std::string & | filename | ) | inline |
Saves content of vector to a file.
Outputs the vector as text on one line with space between elements to the specified file. Mainly for testing purposes.
filename | Name of file to save to. |
References skepu::Vector< T >::at(), and skepu::Vector< T >::size().
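The one-line, space-separated output format can be sketched with a small stream helper. writeElements is a hypothetical name; whether the real function writes a trailing space or newline is not stated here, so this sketch writes neither.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes the elements of `v` to `out` on one line, separated by single
// spaces. A std::ofstream could be passed to write to an actual file.
template <typename T>
void writeElements(std::ostream& out, const std::vector<T>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (i != 0) out << ' ';
        out << v[i];
    }
}

void writeExample() {
    std::ostringstream out;
    writeElements(out, std::vector<int>{1, 2, 3});
    assert(out.str() == "1 2 3");
}
```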
Vector< T >::size_type skepu::Vector< T >::size | ( | ) | const |
Please refer to the documentation of std::vector.
Referenced by skepu::Generate< GenerateFunc >::CL(), skepu::Scan< ScanFunc >::CL(), skepu::MapArray< MapArrayFunc >::CL(), skepu::MapOverlap< MapOverlapFunc >::CL(), skepu::Generate< GenerateFunc >::CPU(), skepu::Scan< ScanFunc >::CPU(), skepu::MapOverlap< MapOverlapFunc >::CPU(), skepu::MapArray< MapArrayFunc >::CPU(), skepu::Generate< GenerateFunc >::CU(), skepu::Scan< ScanFunc >::CU(), skepu::MapArray< MapArrayFunc >::CU(), skepu::MapOverlap< MapOverlapFunc >::CU(), skepu::Generate< GenerateFunc >::OMP(), skepu::Scan< ScanFunc >::OMP(), skepu::MapOverlap< MapOverlapFunc >::OMP(), skepu::MapArray< MapArrayFunc >::OMP(), skepu::MapReduce< MapFunc, ReduceFunc >::operator()(), skepu::Map< MapFunc >::operator()(), skepu::Scan< ScanFunc >::operator()(), skepu::MapOverlap< MapOverlapFunc >::operator()(), skepu::MapArray< MapArrayFunc >::operator()(), skepu::Reduce< ReduceFunc, ReduceFunc >::operator()(), skepu::Vector< T >::operator<(), skepu::Vector< T >::operator<=(), skepu::Vector< T >::operator>(), skepu::Vector< T >::operator>=(), skepu::Vector< T >::randomize(), and skepu::Vector< T >::save().
void skepu::Vector< T >::swap | ( | Vector< T > & | from | ) |
Please refer to the documentation of std::vector.
References skepu::Vector< T >::updateHostAndReleaseDeviceAllocations().
Vector< T >::device_pointer_type_cl skepu::Vector< T >::updateDevice_CL | ( | T * | start, |
size_type | numElements, | ||
Device_CL * | device, | ||
bool | copy | ||
) |
Update device with vector content.
Update device with a vector range. If the vector does not have an allocation on the device for the current range, a new allocation is created and, if specified, the vector data is also copied to the device. Newly allocated ranges are saved to m_deviceMemPointers_CL so that the vector can keep track of where and what it has stored on devices.
start | Pointer to first element in range to be updated with device. |
numElements | Number of elements in range. |
device | Pointer to the device that should be synched with. |
copy | Boolean value that tells whether to only allocate or also copy vector data to device. True copies, False only allocates. |
References skepu::DeviceMemPointer_CL< T >::copyHostToDevice(), and skepu::Device_CL::getDeviceID().
Referenced by skepu::MapArray< MapArrayFunc >::CL().
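The "allocate on first use, optionally copy, and remember the range" bookkeeping can be sketched with a map keyed by device and range. Everything in this sketch is hypothetical: FakeDeviceBuffer and AllocationTracker are invented names, device memory is simulated with host memory, and the sketch assumes that data is copied only when the allocation is first created, which the description above does not spell out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device allocation: a buffer in host memory.
struct FakeDeviceBuffer {
    std::vector<int> data;
    bool holdsCopy = false;   // true once host data has been copied in
};

class AllocationTracker {
public:
    // Key: (device id, offset of range start, number of elements).
    using Key = std::tuple<int, std::size_t, std::size_t>;

    // Returns the buffer for the given range on the given device, creating
    // it on first use. Data is copied only when `copy` is true and the
    // buffer has just been created. The caller must ensure that
    // offset + numElements does not exceed host.size().
    FakeDeviceBuffer& update(const std::vector<int>& host, std::size_t offset,
                             std::size_t numElements, int deviceId, bool copy) {
        Key key(deviceId, offset, numElements);
        auto it = buffers_.find(key);
        if (it == buffers_.end()) {
            FakeDeviceBuffer buf;
            buf.data.resize(numElements);
            if (copy) {
                auto first = host.begin() + static_cast<std::ptrdiff_t>(offset);
                std::copy(first, first + static_cast<std::ptrdiff_t>(numElements),
                          buf.data.begin());
                buf.holdsCopy = true;
            }
            it = buffers_.emplace(key, std::move(buf)).first;
        }
        return it->second;
    }

    std::size_t allocationCount() const { return buffers_.size(); }

private:
    std::map<Key, FakeDeviceBuffer> buffers_;
};

void trackerExample() {
    std::vector<int> host = {10, 20, 30, 40};
    AllocationTracker tracker;
    FakeDeviceBuffer& a = tracker.update(host, 1, 2, 0, true);
    assert(a.data[0] == 20 && a.data[1] == 30);
    tracker.update(host, 1, 2, 0, true);    // same range and device: reused
    assert(tracker.allocationCount() == 1);
    tracker.update(host, 0, 4, 0, false);   // new range: allocate only
    assert(tracker.allocationCount() == 2);
}
```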
Vector< T >::device_pointer_type_cu skepu::Vector< T >::updateDevice_CU | ( | T * | start, |
size_type | numElements, | ||
unsigned int | deviceID, | ||
bool | copy, | ||
bool | writeAccess, | ||
bool | markOnlyLocalCopiesInvalid = false , |
||
unsigned int | streamID = 0 |
||
) |
Update device with vector content.
Update device with a vector range. If the vector does not have an allocation on the device for the current range, a new allocation is created and, if specified, the vector data is also copied to the device. Newly allocated ranges are saved to m_deviceMemPointers_CU so that the vector can keep track of where and what it has stored on devices.
start | Pointer to first element in range to be updated with device. |
numElements | Number of elements in range. |
deviceID | Integer specifying the device that should be synched with. |
copy | Boolean value that tells whether to only allocate or also copy vector data to device. True copies, False only allocates. |
writeAccess | Specifies whether this copy is going to be read or written. |
markOnlyLocalCopiesInvalid | An optimization for multi-GPU execution: set to true to mark only the parent copy and the local copies within that device's memory as invalid. |
streamID | ID of the CUDA stream that will use this vector range when using MultiStream (define USE_MULTI_STREAM and USE_PINNED_MEMORY). |
m_noValidDeviceCopy is an optimization flag that is true when there is no valid device copy. It is used to skip the invalidDeviceCopy function call, just as updateHost() is only called when m_valid is not set.
TODO: Before returning, mark all other copies as invalid if this copy is being written and they overlap with it.
This copy is added to the modified list, which keeps track of copies whose modified data has not yet been written back.
First, the parent copy is marked invalid.
Then the overlapping copies are handled. Overlap is possible with GPU-to-GPU transfers and in some other cases, e.g. map(v1 RW); ... map2(..., v1 Written);
If a copy is not fully overlapped, a transfer is needed, since some of its data should be written back to device memory; if it is fully overlapped, no update is needed because it is overwritten by the current copy. The copy is then removed from the modified list, as it no longer needs to be written back, and is marked invalid.
TODO: Mark all overlapping copies from all devices as invalid. The same overlap handling as above is applied to those copies.
TODO: Update the main copy's valid flag: set it to true (valid) if no modified device copy exists.
Check all overlapping copies from all devices as invalid
References skepu::DeviceMemPointer_CU< T >::copyHostToDevice(), skepu::DeviceMemPointer_CU< T >::doCopiesOverlap(), skepu::Environment< T >::getInstance(), skepu::DeviceMemPointer_CU< T >::isCopyValid(), and MAX_GPU_DEVICES.
Referenced by skepu::MapArray< MapArrayFunc >::CU(), skepu::cuda_tune_wrapper_map(), skepu::cuda_tune_wrapper_maparray(), skepu::cuda_tune_wrapper_mapoverlap(), skepu::cuda_tune_wrapper_mapreduce(), and skepu::cuda_tune_wrapper_reduce().
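The invalidation rule hinted at by writeAccess and markOnlyLocalCopiesInvalid can be sketched as a range-overlap check over a list of copies. CopyInfo, rangesOverlap and invalidateOverlapping are hypothetical names made up for this illustration. The sketch only models the stated intent (copies that overlap a written copy become stale, optionally restricted to the same device) and leaves out the write-back of partially overlapped copies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record of one device copy covering elements [begin, end).
struct CopyInfo {
    int device;
    std::size_t begin;
    std::size_t end;
    bool valid;
};

// True if the two half-open ranges share at least one element.
bool rangesOverlap(const CopyInfo& a, const CopyInfo& b) {
    return a.begin < b.end && b.begin < a.end;
}

// After copy `written` is handed out for writing, every other copy that
// overlaps it becomes stale. With onlyLocal set, only copies on the same
// device are considered.
void invalidateOverlapping(std::vector<CopyInfo>& copies, std::size_t written,
                           bool onlyLocal) {
    const CopyInfo target = copies[written];
    for (std::size_t i = 0; i < copies.size(); ++i) {
        if (i == written) continue;
        if (onlyLocal && copies[i].device != target.device) continue;
        if (rangesOverlap(copies[i], target)) copies[i].valid = false;
    }
}

void invalidateExample() {
    std::vector<CopyInfo> copies = {
        {0, 0, 8, true},    // written copy on device 0
        {0, 4, 12, true},   // overlaps, same device
        {1, 6, 10, true},   // overlaps, other device
        {1, 8, 16, true},   // does not overlap
    };
    invalidateOverlapping(copies, 0, false);
    assert(copies[0].valid);
    assert(!copies[1].valid);
    assert(!copies[2].valid);
    assert(copies[3].valid);
}
```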
void skepu::Vector< T >::updateHost | ( | ) | const | inline |
Updates the vector from its device allocations.
The m_valid logic is only implemented for the CUDA backend. The OpenCL backend still uses the old memory management mechanism.
Referenced by skepu::MapArray< MapArrayFunc >::CPU(), skepu::MapArray< MapArrayFunc >::OMP(), skepu::Vector< T >::operator!=(), skepu::Vector< T >::operator<(), skepu::Vector< T >::operator<=(), skepu::Vector< T >::operator=(), skepu::Vector< T >::operator==(), skepu::Vector< T >::operator>(), skepu::Vector< T >::operator>=(), and skepu::Vector< T >::Vector().
void skepu::Vector< T >::updateHostAndInvalidateDevice | ( | ) | inline |
First updates the vector from its device allocations. Then invalidates the data allocated on devices, i.e., marks the data of the device copies as invalid.
void skepu::Vector< T >::updateHostAndReleaseDeviceAllocations | ( | ) | inline |
First updates the vector from its device allocations. Then removes the data copies allocated on devices.
Referenced by skepu::Vector< T >::swap().
std::ostream & operator<< | ( | std::ostream & | output, |
Vector< T > & | vec |
) | friend |
Overloaded stream operator, for testing purposes.
Outputs the vector on one line with space between elements to the chosen stream.