The Asynchronous-PRAM Programming Language ForkLight

Source-to-source Compiler for all machines supporting the P4 library

This page is permanently under reconstruction. Last update Nov. 24, 1998 - Christoph W. Kessler

Asynchronous PRAM model

The Asynchronous PRAM (Asynchronous Parallel Random Access Machine) is a MIMD multiprocessor with a shared memory and with a private memory for each processor. Memory access time needs not be uniform, but sequential memory consistency is assumed.

The Asynchronous-PRAM programming language ForkLight

ForkLight is a control-synchronous, parallel programming language for the Asynchronous PRAM model.
Control-Synchronicity means that all processors belonging to the same (active) group are working on the same basic block.
Like its predecessor Fork95, ForkLight is based on ANSI C with extensions for management of shared and private address subspaces and variables, and recursively nested levels of control synchronicity by hierarchical construction of processor subgroups.

The language design of ForkLight has been developed in 1997/98 by Christoph W. Kessler and Helmut Seidl, partially based on their previous work on Fork95.

Documentation for ForkLight:

Example programs

ForkLight Compiler

The ForkLight compiler flcc, written by C. W. Kessler, is a source-to-source compiler that emits C code with calls to the P4 library routines. This offers portability of the generated code over a wide range of parallel architectures, including (multiprocessor) SOLARIS workstations and the SB-PRAM.

The current beta version 1.0 of the compiler, including all sources, is available by ftp here (about 425 KB gzip'ed tar file).

For bug reports or further information please contact me by email at chrke \at
flcc is partially based on the lcc1.9 C compiler.

Hardware platforms:

You need an installation of the P4 library package and an ANSI-C compiler like gcc.

Installation and compiler options (postscript, 1 page)

First Performance Results:

The parallel architectures running P4 that are currently accessible to us are (1) multiprocessor SOLARIS workstations and (2) the SB-PRAM. We have done some measurements for both of them.

1. multiprocessor SUN server

We have tested our compiler on a loaded 4 processor SUN server running SOLARIS 2.5.1.
This machine has no atomic fetch-add accessible via the P4 routines; thus atomic memory access is sequentialized. Nevertheless this configuration is useful for testing purposes.

We obtained the following run times (in milliseconds) on some small example programs.


We have ported the compiler to the SB-PRAM. An important improvement has been reached by exploiting the native fetch-add operators which do not lead to sequentialization, in comparison to standard P4 which does not efficiently support atomic fetch-add or atomic-add.

There is an operational 128 PE prototype of the SB-PRAM at Saarbrücken running the SB-PRAM operating system PRAMOS. Due to some operating system restrictions, only up to 124 PE's can be used for the application program, the other ones (one PE per processor board) are reserved as I/O servers by PRAMOS.

For our mergesort example program using (a) the optimized sequential qsort() function, as listed above and (b) a hand-written sequential quicksort routine, we obtain the following run time results:

p=1 p=2 p=4 p=8 p=16 p=32 p=64
(a) N=1000 573 ms 373 ms 232 ms 142 ms 88 ms 57 ms 39 ms
(b) N=10000 2892 ms 4693 ms 3865 ms 1571 ms 896 ms 509 ms 290 ms

For the parallel quicksort program, using a handwritten sequential quicksort routine, we obtain the following execution times:

p=1 p=2 p=4 p=8 p=16 p=32 p=64
N=1000 1519 ms 807 ms 454 ms 259 ms 172 ms 145 ms 130 ms

This page by Christoph W. Kessler.