LightMat implementation notes

These are some implementation notes. Read the User Manual first.

Memory usage in classes

One memory area for each object

Packages like math.h++ and M++ need to dynamically allocate memory for objects. The ability to have different views in different objects of the same data-area means that the objects need to maintain reference counts of how many objects that use one memory-area. This leads to extra overhead when maintaining the reference counts and when allocating/deallocating memory. The lightmat classes try to avoid this by having one memory area for each object (no need for reference counts) and by letting each object have a fixed area on the stack for the elements. However, there is also a performance disadvantage of not being able to share the same data area in several objects. This will make assignment slower. However, see the section on Temporary Variables for a tip on how to avoid unnecessary assignments.

Static and dynamic memory

Since the amount of memory that is statically allocated on the stack need to be fixed at compile time for the code this means that objects with few elements will not use all of the memory on the stack, and that objects with many elements will have to allocate more memory dynamically in any case. However, if the default size of the allocated memory on the stack is chosen carefully, then an increase in performance can be expected without excessive use of memory. The amount of memeory allocated by default is given in the table above???.

Storage for arrays and order of elements

The usual way to store matrices in C++ is as an array of arrays. That means that the elements are stored in row major order, i.e. all elements in the first row are stored first, then the second row and so on. Fortran stores matrices in column major order, i.e. all elements in the first column are stored first etc. Tensors are stored in a similar way. A lot of code for manipulating matrices has been written in Fortran, for example BLAS. Lightmat will therefore also store the elements in column major order internally since that will make it easier to interface to routines like BLAS within the classes.



1
2
3
4
5
6
A 3 x 2 matrix
1 2 3 4 5 6
Row major order
1 3 5 2 4 6
Column major order

The interface for the classes expect the elements to be stored in row major order though. That will make it easier to interface to other C++ code.

Dynamic resizing of objects

When assigning the result of some calculation to an existing object then the size of the object will change its size to the size of the result of the calculation. See Assignment for more information. This can sometimes lead to allocation of more memory during calculations. Dynamic resizing is allowed because it is convenient to be able to create a variable that will contain the result of a calculation without knowing the size of the resulting matrix beforehand. This functionality also does not slow down other operations that can be done on the object.

The basic internal routines for calculation

This chapter applies to int, double and complex arrays.

Most computations in LightMat are based on a set of basic routines for addition, dot product and similar operations. They are all template functions, so they can be used for different types of elements (but currently there are problems to compile template code). These basic routines operate on arrays of elements. Matrices and tensors should be stored in column major order for correct results.

Some classes do not use these basic routines. The classes which are specialized for different sizes have their own optimized routines which do the calculations as quickly as possible. That means that loops are avoided in the classes, the needed statements are instead in a (sometimes) long serie. The functions doesn't get very big anyway since the number of elements in those classes are rather few. Loops can't be avoided in the normal classes since the number of elements aren't known at compile-time.

There are different routines for element-wise assignment, addition, subtraction, multiplication and division of two arrays with elements or of an array and a scalar. There are also different variants of those routines which either put the result in some other array or in one of the already supplied arrays. Putting the result in the same array as some of the operands are taken from reduces the code which are needed to index the arrays. All these routines have unrolled loops in order to increase the speed a little more.

There are also some routines for calculating dot products and inner products. Two routines can calculate dot products. They differ in that one of them allows the step between elements in one of the arrays to be greater than one. That feature is needed by some other routines, one is the routine that calculate matrix-vector products. Both routines for dot product calculation have unrolled loops.

The routines for calculating inner products use the routines that calculate dot products. All needed routines for calculating inner products between any combinations of vectors, matrices and tensors of rank 3 or 4 exist. No routines which produce a result which is a tensor with a higher rank than 4 exist though (since the result can't be represented in any way). All routines use the routines for dot product directly or via another routine for calculating inner products.

Function Comment
light_assign Assign the elements in an array the values or elements from another array, or assign all elements in an array one value.
light_plus Assign the elements in an array the element-wise addition of elements in two other arrays, or
assign the elements in an array the sum of the value of one argument and the elements of another array.
light_minus Assign the elements in an array the element-wise subtraction of elements in two other arrays, or assign the elements in an array the difference of the value of one argument and the elements of another array.
light_mult Assign the elements in an array the element-wise multiplication of elements in two other arrays, or assign the elements in an array the product of the value of one argument and the elements of another array.
light_divide Assign the elements in an array the element-wise division of elements in two other arrays, or assign the elements in an array the quote of the value of one argument and the elements of another array.
light_plus_same Add the elements in an array to the elements in another array and put the result in the second array, or add the value of one argument to the elements of an array and put the result in the same array.
light_minus_same Subtract the value of the elements in an array from the elements in another array and put the result in the second array.
light_mult_same Multiply the elements in an array with the elements in another array and put the result in the second array, or multiply the value of one argument with the elements of an array and put the result in the same array.
light_dot Calculate the dot product of two vectors
light_gemv Calculate the inner product of a matrix and a vector.
light_gevm Calculate the inner product of a vector and a matrix.
light_gemm Calculate the inner product of two matrices.
light_ge3v Calculate the inner product of a tensor of rank 3 and a vector.
light_gev3 Calculate the inner product of a vector and a tensor of rank 3.
light_ge3m Calculate the inner product of a tensor of rank 3 and a matrix.
light_gem3 Calculate the inner product of a matrix and a tensor of rank 3.
light_ge33 Calculate the inner product of two tensors of rank 3.
light_ge4v Calculate the inner product of a tensor of rank 4 and a vector.
light_gev4 Calculate the inner product of a vector and a tensor of rank 4.
light_ge4m Calculate the inner product of a tensor of rank 4 and a matrix.
light_gem4 Calculate the inner product of a matrix and a tensor of rank 4.

All basic routines are listed in the table above. Note that some routines are overloaded and can hence have more than one use, although similar in nature.

All classes use these routines whenever possible and that makes it easy to change something, e.g. the degree of unrolling, in those routines in order to customize lightmat for speed for a lightmat-user's own computer. compiler and calculations.

Using LightMat with BLAS

There also exist specializations of the template routines that calculate dot product and inner products involving vectors and matrices with elements of type double. Those routines call Fortran BLAS routines which are available for many kinds of computers. The BLAS routines are usually very fast since they are highly optimized for the respective computers.

The normal lightmat calculation routines can normally be inlined though, and calling a BLAS routine incurs an extra function call. So using BLAS is probably only a good idea if large vectors, matrices or tensors are used in the calculations. Your mileage may vary depending on the computer and compiler that is used. See also preprocessor defintions.

Calculations with BLAS are supported for double arrays (not for  int and nad npt for complex). 

Optimizing calculations and constructors

The routines that only use the special classes which have predefined sizes create, when possible, the results from calculations with the constructors that initializes all elements to their values.

Routines for other classes usually do nothing more than call special constructors which do the actual work. E.g. the operator+ functions usually only contain a return statement those argument is a call to a constructor. The reason for doing like that is that the C++ compilers that have been tested produce a faster code if the functions return a in the return statement constructed object. The other way to do it would be to create a, for the function, local and uninitialized object, set the elements to their correct values and then return the object. The object must then be copied (since it's a local variable) when it's returned and that takes time. Having a constructor as the argument to the return-statement can be used to avoid that copying. I.e.:

classA oper(classA a, classA b){
// Calculation is done in constructor
return classA(a, b, oper_enum); }
is faster than
classA oper(classA a, classA b)
{
classA res;
// Do calculation here
return res;
}

The constructors that do the calculations call the basic routines in when possible. Constructors that exist are ones that calculate addition, subtraction, multiplication, division, dot product, inner product, transpose, raise to the power and absolute value. There also exist constructors that take a function as an argument and apply it to one or two objects. Those constructors are used by a lot of trigonometric functions.

A few calculation functions create a local object which is later returned. This is needed by a few functions that do conversions between different element types, e.g. from int to double. It is also used by the indexing functions which does not return a single element but instead, for example, a vector or matrix. The reason for these exceptions is that some special copying is needed for which no constructors exist.

Temporary Variables

When doing a calculation and assigning the result to a variable, then a temporary variable usually is created by the compiler. The result of the calculation is put into the temporary variable and the variable is then given as an argument to the assignment operator.

The creation of a temporary variable can be avoided if assignment isn't used when saving the result of the calculation. The result can instead be put directly into a new variable that is constructed at the same time.

  
  doubleNN A,B,C;
  A=B+C;                 //a temporary variable is needed
  doubleNN D=B+C;      // no temporary variable is  needed

This will avoid the construction and destruction of an extra variable and it will also avoid the call to the assignment operator. A constructor for the new variable need to be called though, but the result-variable has to be constructed somewhere anyway.

Unrolling

The loops that do most calculations and copying of data in LightMat are in the source file light_basic.icc. Some compilers may produce faster code if the loops are written in some special way. The loops in light_basic.icc can easily be changed by any programmer if that is the case.

Note on assignments

A lot of these functions use another function named light_assign that does the actual copying of elements. The loop in the function light_assign is unrolled in order to make it faster. The degree of unrolling can also easily be changed since the code for all assignments is in only one place. Also see `The basic routines for calculation'' for more information on light_assign.

Making Extensions to LightMat

Some tips for adding new functionality to LightMat is provided below. As a general rule: look at how the existing functions/classes works and mimic it where appropriate.

Functions

There are some things that can useful to remember when adding a new function to a class and when changing an existing function.

New Classes

When making a new LightMat class, e.g. a new size, then it's easiest to copy one of the existing ones and change it where needed.

It may also be necessary to add new functions, or friends, to existing classes in order to integrate the new class with old classes. Remember to write conversion functions from specialized sizes of classes to the more general classes when appropriate. I.e. it shall be possible to convert an object of a new double22 class to an object of type doubleNN.

Template notation (Not available)

Note: Our experiments with various compilers show that template implementations (SparcCompiler, GNU C++ and MicroSoft Visual C++) vary in syntactical limitations. Furthermore, diagnostics produced by the compilers is not sufficient in order to compile our tests. We spent several days in attempts to go through compilation with no success. Therefore implementation of our library with templates is delayed until better compilers will be produced. You can check whether your compiler produces meaningful messages by setting LIGHTMAT_TEMPLATE preprocessor definition. See preprocessor definition table.

In the template notation classes are implemented as templates, for instance class light4<T> . The same code can therefore be used for both objects with elements of type double and int.
The user specifies a variable as light4<double> foo This means that foo is a vector with 4 elements of type double. Other types of elements can be used, however, some arithmetical operations with them will not be defined.

Non-template notation

Unfortunately, some compilers do not support templates in full scale. We suggest to use classes in non-template notation.


Vadim Engelson
Last modified: Mar 23 1998