The interface of the classes expects the elements to be stored in row-major order, though. That makes it easier to interface with other C++ code.
Some classes do not use these basic routines. The classes that
are specialized for particular sizes have their own optimized routines, which
do the calculations as quickly as possible. Loops are avoided
in those classes; the needed statements are instead written out as a
(sometimes long) series. The functions don't get very big anyway, since the
number of elements in those classes is rather small. Loops can't be avoided
in the normal classes, since the number of elements isn't known at compile time.
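The difference between the two kinds of classes can be illustrated with a minimal sketch. The names `Vec3`, `VecN` and `add` are hypothetical and chosen only for this example; they are not lightmat's own classes or routines.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-size vector: the size is known at compile time,
// so the addition is written out as a straight series of statements.
struct Vec3 {
    double e[3];
};

inline Vec3 add(const Vec3& a, const Vec3& b) {
    Vec3 r;
    r.e[0] = a.e[0] + b.e[0];
    r.e[1] = a.e[1] + b.e[1];
    r.e[2] = a.e[2] + b.e[2];
    return r;
}

// Hypothetical general vector: the size is only known at run time,
// so a loop over the elements cannot be avoided.
struct VecN {
    std::vector<double> e;
};

inline VecN add(const VecN& a, const VecN& b) {
    assert(a.e.size() == b.e.size());  // operands must have the same length
    VecN r;
    r.e.resize(a.e.size());
    for (std::size_t i = 0; i < a.e.size(); ++i) {
        r.e[i] = a.e[i] + b.e[i];
    }
    return r;
}
```

The fixed-size version stays short because there are only three elements to write out; the same approach would not scale to a size chosen at run time.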
There are different routines for element-wise assignment, addition,
subtraction, multiplication and division of two arrays or
of an array and a scalar. There are also variants of those routines
which put the result either in a separate array or in one of the arrays
already supplied. Putting the result in the same array as one of the operands
reduces the code needed to index the arrays. All
these routines have unrolled loops in order to increase the speed a little
more.
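A rough sketch of what such routines could look like is given here. The names `ew_plus` and `ew_plus_same`, the argument order and the unrolling factor of four are assumptions made for illustration; the actual lightmat routines may differ in all of these.

```cpp
#include <cassert>
#include <cstddef>

// res[i] = a[i] + b[i], with the loop unrolled four times (assumed factor).
inline void ew_plus(std::size_t n, const double* a, const double* b, double* res) {
    assert(a && b && res);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        res[i]     = a[i]     + b[i];
        res[i + 1] = a[i + 1] + b[i + 1];
        res[i + 2] = a[i + 2] + b[i + 2];
        res[i + 3] = a[i + 3] + b[i + 3];
    }
    for (; i < n; ++i) {  // leftover elements when n is not a multiple of 4
        res[i] = a[i] + b[i];
    }
}

// Overload for a scalar and an array: res[i] = s + a[i].
inline void ew_plus(std::size_t n, double s, const double* a, double* res) {
    assert(a && res);
    for (std::size_t i = 0; i < n; ++i) {
        res[i] = s + a[i];
    }
}

// In-place variant: b[i] = b[i] + a[i]. Only two arrays have to be indexed.
inline void ew_plus_same(std::size_t n, const double* a, double* b) {
    assert(a && b);
    for (std::size_t i = 0; i < n; ++i) {
        b[i] += a[i];
    }
}
```

The two `ew_plus` functions show how one name can cover both the array/array and the scalar/array case through overloading.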
There are also some routines for calculating dot products and inner products. Two routines can calculate dot products. They differ in that one of them allows the step between elements in one of the arrays to be greater than one. That feature is needed by some other routines, one of which is the routine that calculates matrix-vector products. Both routines for dot product calculation have unrolled loops.
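The following sketch shows why a step greater than one is useful. It assumes, purely for illustration, that a matrix is stored column by column, so that the elements of one row lie `rows` positions apart. The names `dot_strided` and `matvec`, their signatures and the unrolling factor of two are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Dot product of n elements. Consecutive elements of `a` are `stride`
// positions apart, while `b` is contiguous. Unrolled two times.
inline double dot_strided(std::size_t n, const double* a, std::size_t stride,
                          const double* b) {
    assert(stride >= 1);
    double sum = 0.0;
    std::size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        sum += a[i * stride] * b[i] + a[(i + 1) * stride] * b[i + 1];
    }
    if (i < n) {  // leftover element when n is odd
        sum += a[i * stride] * b[i];
    }
    return sum;
}

// y = M * x for a rows x cols matrix stored column by column (assumption).
// Row r starts at m[r] and its elements are `rows` positions apart.
inline void matvec(std::size_t rows, std::size_t cols, const double* m,
                   const double* x, double* y) {
    for (std::size_t r = 0; r < rows; ++r) {
        y[r] = dot_strided(cols, m + r, rows, x);
    }
}
```

With a stride of one, `dot_strided` reduces to the ordinary dot product of two contiguous arrays.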
The routines for calculating inner products use the routines that
calculate dot products. All routines needed for calculating inner products
between any combination of vectors, matrices and tensors of rank 3 or
4 exist. There are, however, no routines whose result would be a tensor of
rank higher than 4, since such a result can't be represented in any
way. All routines use the routines for dot products, either directly or via
another routine for calculating inner products.
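How one inner-product routine can be built on another can be sketched as follows. The example assumes, for illustration only, that the first index varies fastest in memory; under that assumption the product of a rank 3 tensor and a vector is the same computation as a matrix-vector product. All names and signatures are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Dot product where the elements of `a` are `stride` positions apart.
inline double dot_strided(std::size_t n, const double* a, std::size_t stride,
                          const double* b) {
    assert(stride >= 1);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i * stride] * b[i];
    }
    return sum;
}

// y = M * x, with M stored so that the row index varies fastest (assumption).
inline void matvec(std::size_t rows, std::size_t cols, const double* m,
                   const double* x, double* y) {
    for (std::size_t r = 0; r < rows; ++r) {
        y[r] = dot_strided(cols, m + r, rows, x);
    }
}

// R(i,j) = sum over k of T(i,j,k) * v(k), with T(i,j,k) stored at
// t[i + d0 * (j + d1 * k)]. Seen as flat memory, T is a (d0*d1) x d2
// matrix, so the work can be delegated to matvec.
inline void tensor3_vec(std::size_t d0, std::size_t d1, std::size_t d2,
                        const double* t, const double* v, double* r) {
    matvec(d0 * d1, d2, t, v, r);
}
```

The result `r` holds the d0 x d1 matrix in the same storage order as the tensor.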
Function | Comment |
light_assign | Assign the elements in an array the values or elements from another array, or assign all elements in an array one value. |
light_plus | Assign the elements in an array the element-wise addition of elements in two other arrays, or assign the elements in an array the sum of the value of one argument and the elements of another array. |
light_minus | Assign the elements in an array the element-wise subtraction of elements in two other arrays, or assign the elements in an array the difference of the value of one argument and the elements of another array. |
light_mult | Assign the elements in an array the element-wise multiplication of elements in two other arrays, or assign the elements in an array the product of the value of one argument and the elements of another array. |
light_divide | Assign the elements in an array the element-wise division of elements in two other arrays, or assign the elements in an array the quotient of the value of one argument and the elements of another array. |
light_plus_same | Add the elements in an array to the elements in another array and put the result in the second array, or add the value of one argument to the elements of an array and put the result in the same array. |
light_minus_same | Subtract the value of the elements in an array from the elements in another array and put the result in the second array. |
light_mult_same | Multiply the elements in an array with the elements in another array and put the result in the second array, or multiply the value of one argument with the elements of an array and put the result in the same array. |
light_dot | Calculate the dot product of two vectors. |
light_gemv | Calculate the inner product of a matrix and a vector. |
light_gevm | Calculate the inner product of a vector and a matrix. |
light_gemm | Calculate the inner product of two matrices. |
light_ge3v | Calculate the inner product of a tensor of rank 3 and a vector. |
light_gev3 | Calculate the inner product of a vector and a tensor of rank 3. |
light_ge3m | Calculate the inner product of a tensor of rank 3 and a matrix. |
light_gem3 | Calculate the inner product of a matrix and a tensor of rank 3. |
light_ge33 | Calculate the inner product of two tensors of rank 3. |
light_ge4v | Calculate the inner product of a tensor of rank 4 and a vector. |
light_gev4 | Calculate the inner product of a vector and a tensor of rank 4. |
light_ge4m | Calculate the inner product of a tensor of rank 4 and a matrix. |
light_gem4 | Calculate the inner product of a matrix and a tensor of rank 4. |
All basic routines are listed in the table above. Note that some routines are overloaded and can hence have more than one use, although similar in nature.
All classes use these routines whenever possible. That makes
it easy to change something in those routines, e.g. the degree of unrolling,
in order to tune lightmat for speed on a lightmat user's own computer,
compiler and calculations.
The ordinary lightmat calculation routines can usually be inlined,
though, whereas calling a BLAS routine incurs an extra function call. Using
BLAS is therefore probably only a good idea if large vectors, matrices or
tensors are used in the calculations. Your mileage may vary depending on the
computer and compiler that are used. See also preprocessor
definitions.
Routines for other classes usually do nothing more than call special constructors which do the actual work. For example, the operator+ functions usually contain only a return statement whose argument is a call to a constructor. The reason is that the C++ compilers that have been tested produce faster code if a function returns an object that is constructed in the return statement itself. The alternative would be to create an uninitialized object local to the function, set its elements to their correct values and then return the object. The object must then be copied when it is returned (since it's a local variable), and that takes time. Having a constructor call as the argument to the return statement can avoid that copying. That is,
    classA oper(classA a, classA b) {
        // Calculation is done in constructor
        return classA(a, b, oper_enum);
    }

is faster than

    classA oper(classA a, classA b) {
        classA res;
        // Do calculation here
        return res;
    }
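A self-contained version of the comparison above might look like the sketch below. The class `Vec2`, the tag type `Op` and the function `add_via_local` are hypothetical names used only for this illustration.

```cpp
#include <cassert>

enum Op { OpAdd, OpSub };  // hypothetical operation tag

class Vec2 {
public:
    Vec2() : x(0.0), y(0.0) {}
    Vec2(double x_, double y_) : x(x_), y(y_) {}

    // "Calculating" constructor: builds the result of a (op) b directly.
    Vec2(const Vec2& a, const Vec2& b, Op op)
        : x(op == OpAdd ? a.x + b.x : a.x - b.x),
          y(op == OpAdd ? a.y + b.y : a.y - b.y) {
        assert(op == OpAdd || op == OpSub);
    }

    double x, y;
};

// Variant 1: the result is constructed in the return statement.
inline Vec2 operator+(const Vec2& a, const Vec2& b) {
    return Vec2(a, b, OpAdd);
}

// Variant 2: a named local object is filled in and then returned.
inline Vec2 add_via_local(const Vec2& a, const Vec2& b) {
    Vec2 res;
    res.x = a.x + b.x;
    res.y = a.y + b.y;
    return res;
}
```

Both variants give the same result. Since C++17 the first form is guaranteed not to copy the returned object, and many compilers can also elide the copy in the second form, so the measured difference depends on the compiler and language standard in use.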
A few calculation functions do create a local object which is later
returned. This is needed by a few functions that do conversions between
different element types, e.g. from int to double. It is also used by the
indexing functions which do not return a single element but instead,
for example, a vector or matrix. The reason for these exceptions is that
some special copying is needed for which no constructors exist.
The creation of a temporary variable can be avoided if assignment isn't used when saving the result of the calculation. The result can instead be put directly into a new variable that is constructed at the same time.
    doubleNN A, B, C;
    A = B + C;            // a temporary variable is needed
    doubleNN D = B + C;   // no temporary variable is needed
This avoids the construction and destruction of an extra variable,
and it also avoids the call to the assignment operator. A constructor
for the new variable needs to be called, but the result variable
has to be constructed somewhere anyway.
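The effect on the assignment operator can be made visible with a small counting class. This is only a sketch: `Counted` and the two helper functions are hypothetical and stand in for a lightmat class such as doubleNN.

```cpp
#include <cassert>

// Hypothetical stand-in for a matrix class; counts calls to operator=.
struct Counted {
    double v;
    static int assignments;

    explicit Counted(double v_ = 0.0) : v(v_) {}
    Counted(const Counted& o) : v(o.v) {}
    Counted& operator=(const Counted& o) {
        v = o.v;
        ++assignments;
        return *this;
    }
};
int Counted::assignments = 0;

inline Counted operator+(const Counted& a, const Counted& b) {
    return Counted(a.v + b.v);
}

// Result stored by assignment: B + C is a temporary, then operator= runs.
inline int assignments_when_assigning() {
    Counted::assignments = 0;
    Counted A, B(1.0), C(2.0);
    A = B + C;
    assert(A.v == 3.0);
    return Counted::assignments;
}

// Result stored by initialization: D is constructed from B + C,
// and the assignment operator is never called.
inline int assignments_when_initializing() {
    Counted::assignments = 0;
    Counted B(1.0), C(2.0);
    Counted D = B + C;
    assert(D.v == 3.0);
    return Counted::assignments;
}
```

The first helper reports one call to the assignment operator and the second reports none. Whether the temporary itself is also eliminated in the second case depends on the compiler and language standard.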
limiterror(s.size == 3);
It may also be necessary to add new functions, or friends, to existing classes in order to integrate the new class with the old classes. Remember to write conversion functions from the specialized sizes of classes to the more general classes when appropriate. For example, it should be possible to convert an object of a new double22 class to an object of type doubleNN.
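One common way to provide such a conversion is a converting constructor in the general class. The sketch below uses the hypothetical classes `Mat22` and `MatNN` (not lightmat's double22 and doubleNN, whose real interfaces are not shown here) and assumes row-major storage with zero-based indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical specialized 2x2 class with fixed storage.
struct Mat22 {
    double e[4];  // row-major: e[0] e[1] / e[2] e[3]
};

// Hypothetical general class whose size is chosen at run time.
class MatNN {
public:
    MatNN(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), e_(rows * cols, 0.0) {}

    // Converting constructor from the specialized class.
    MatNN(const Mat22& m) : rows_(2), cols_(2), e_(m.e, m.e + 4) {}

    double operator()(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_);
        return e_[r * cols_ + c];
    }
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<double> e_;
};

// A function written once for the general class.
inline double trace(const MatNN& m) {
    assert(m.rows() == m.cols());
    double t = 0.0;
    for (std::size_t i = 0; i < m.rows(); ++i) {
        t += m(i, i);
    }
    return t;
}
```

Because the conversion is implicit, a `Mat22` object can be passed directly to `trace`, so functions written for the general class also accept the new specialized class.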
In the template notation, classes are implemented as templates, for instance
class light4<T>.
The same code can therefore be used for both objects with elements
of type double and int.
The user specifies a variable as
light4<double> foo
This means that foo is a vector with 4 elements of
type double.
Other element types can be used; however, some arithmetic
operations on them will not be defined.
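The idea can be sketched with a small class template. `Vec4<T>` is a hypothetical stand-in written for this example and is not the actual definition of light4<T>.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 4-element vector template, in the spirit of light4<T>.
template <class T>
class Vec4 {
public:
    Vec4() {
        for (std::size_t i = 0; i < 4; ++i) e_[i] = T();
    }
    Vec4(T a, T b, T c, T d) {
        e_[0] = a; e_[1] = b; e_[2] = c; e_[3] = d;
    }
    T& operator[](std::size_t i) {
        assert(i < 4);
        return e_[i];
    }
    const T& operator[](std::size_t i) const {
        assert(i < 4);
        return e_[i];
    }

private:
    T e_[4];
};

// Element-wise addition; compiles only for element types that support +.
template <class T>
Vec4<T> operator+(const Vec4<T>& a, const Vec4<T>& b) {
    return Vec4<T>(a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]);
}
```

The same source serves `Vec4<double>` and `Vec4<int>`. In this sketch an element type without an addition operator can still be stored in a `Vec4`, but using `operator+` on it fails to compile.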