PARDS:
A library for PARallel programs with Dataflow Synchronization

Table of contents

(Original version is in Japanese. Still under translation...)

  1. Introduction
    1. Abstract
    2. Environment
  2. How to use
    1. First step
    2. SyncList<T>
    3. SyncQueue<T>
  3. Example of programs
    1. Quick Sort
    2. bzip2
  4. Reference
    1. Global functions
    2. Sync
    3. SyncList
    4. SyncQueue
    5. Other notes
  5. Conclusion
    1. Design policy
    2. Future work

Introduction

This is a library for writing parallel programs on UNIX OSes. With this library, you can write parallel programs more easily than by using libraries such as pthread directly. As a practical example, bzip2 has been parallelized using this library.

Abstract

Today, it is becoming difficult to make a single processor faster, and using multiple processors is a popular way to achieve high performance. This is true not only for engineering workstations (EWS), but also for PCs and embedded processors.

However, it is not easy to write programs for parallel machines. Programmers usually have to use a library such as pthread together with locks and/or semaphores. This is not easy, and it tends to cause bugs that are quite difficult to debug, because the behavior of the program can change from run to run.

This library offers a more intuitive and easier way to write parallel programs. It offers synchronization variables of type Sync<T>, together with SyncList<T> and SyncQueue<T>, which are built on them.

For example, if "a" is declared as "Sync<int> a", you can perform operations such as "a.read()" and "a.write(1)". The operation "a.write(1)" sets the contents of "a" to 1, and the operation "a.read()" returns the contents of "a". This functionality can be used for inter-process communication.

Here, the operation "a.read()" stops (blocks) until the operation "a.write(1)" is executed. This is called dataflow synchronization. The write operation can be applied only once to the same variable (more precisely, write operations after the first one cannot change the contents).
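
The blocking read and the write-once rule can be illustrated in standard C++. The following is only a minimal sketch using threads, a mutex and a condition variable. It is not how this library is implemented (the library uses processes and System V IPC), and the names OnceCell and once_cell_demo are hypothetical names made up for this illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for Sync<T>; NOT the PARDS implementation.
template <typename T>
class OnceCell {
public:
    // Store the value only if nothing has been written yet.
    void write(const T& v) {
        std::lock_guard<std::mutex> lock(m_);
        if (written_) return;          // later writes are ignored
        value_ = v;
        written_ = true;
        cv_.notify_all();              // wake up every blocked reader
    }
    // Block until the first write has happened, then return the value.
    T read() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return written_; });
        return value_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool written_ = false;
    T value_{};
};

// A reader thread blocks in read() while this thread writes twice.
int once_cell_demo() {
    OnceCell<int> a;
    int seen = 0;
    std::thread reader([&] { seen = a.read(); });  // waits for a.write(1)
    a.write(1);
    a.write(2);                                    // has no effect
    reader.join();
    assert(a.read() == 1);                         // the first value is kept
    return seen;
}
```

Calling once_cell_demo() returns the value seen by the reader thread, which is the value of the first write.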

SyncList<T> is a list of Sync<T>, and SyncQueue<T> is a SyncList<T> whose length is limited.

Environment

This library is for C++ and works on UNIX OSes; it uses "fork" and System V IPC (shared memory and semaphores). The development environment is Red Hat Linux 9, but the library should work on other UNIXes.

How to use

First step

I will explain the usage using the samples.cc file in the samples directory. Here is the main function of samples.cc with comments.

int main()
{
  pards_init();      // ...(main:1) Library initialization
  
  Sync<int> a, b, c; // ...(main:2) Decl. of sync vars

  SPAWN(add(1,b,c)); // ...(main:3) fork add, wait for b, executed 3rd
  SPAWN(add(1,a,b)); // ...(main:4) fork add, wait for a, executed 2nd

  a.write(3);        // ...(main:5) executed 1st

  int v = c.read();  // ...(main:6) wait for add(1,b,c) of (main:3)

  printf("value = %d\n",v);
  pards_finalize();  // ...(main:7) Finalize of the library
}

To use the library, you need to call pards_init() (main:1). In addition, to finalize the library, you need to call pards_finalize() (main:7).

Synchronization variables are declared like Sync<int> a, b, c (main:2). In this case, the variables can hold values of type "int".

The function add(1,b,c) is SPAWNed at (main:3). This means that the function add(1,b,c) is forked as a process. (SPAWN is implemented as a macro.)

Here, the function add is defined as follows:

void add(int i, Sync<int> a, Sync<int> b)
{
  int val;

  val = i + a.read(); // ...(add:1) a.read() waits for a to be written
  b.write(val);       // ...(add:2) b.write writes a value
}

This function adds the 1st argument and the 2nd argument, and returns the result through the 3rd argument. The type of the 1st argument is plain "int", and that of the 2nd and 3rd arguments is Sync<int>.

"a.read()" in (add:1) blocks until the value of a is written. After the value of a is written, execution resumes and the value is obtained. The value is then added to the 1st argument, and the result is stored in the variable val.

In (add:2), val is written to b. This makes the processes that are waiting for b's value resume execution.

Back to main. The 2nd argument of the function add that is forked in (main:3) has not yet been written by any process. Therefore, this function blocks for a while.

Likewise, the 2nd argument of the function add that is forked in (main:4) has not been written at this point, so this add also blocks.

Then, 3 is written to a in (main:5). This makes the function add that was forked in (main:4) resume. After it executes, 4 is written to b.

After the value is written to b, the function add that is forked in (main:3) also restarts. After execution, 5 is written to c.

The value of c is read in (main:6). This also blocks until a value is written to c. Therefore, it waits for the execution of the function add that is forked in (main:3).

As you can see, inter-process communication and synchronization between the processes forked by SPAWN can be realized using Sync<int> variables.
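
The order of events described above can be imitated with the C++ standard library. The sketch below is an analogy only: it uses threads, std::promise and std::shared_future instead of SPAWN and Sync<int>, and the function name chain_demo is hypothetical. Note one difference: a second set_value() on a std::promise throws an exception, while a second write to a Sync<T> simply leaves the value unchanged.

```cpp
#include <cassert>
#include <future>
#include <thread>
#include <utility>

// Thread-based counterpart of add(): waits for `in`, then fulfils `out`.
void add(int i, std::shared_future<int> in, std::promise<int> out) {
    out.set_value(i + in.get());   // in.get() blocks until the value exists
}

int chain_demo() {
    std::promise<int> a, b, c;
    std::shared_future<int> fa = a.get_future().share();
    std::shared_future<int> fb = b.get_future().share();
    std::future<int> fc = c.get_future();

    std::thread t1(add, 1, fb, std::move(c));  // waits for b, finishes last
    std::thread t2(add, 1, fa, std::move(b));  // waits for a, finishes second
    a.set_value(3);                            // happens first
    int v = fc.get();                          // waits for the first thread
    t1.join();
    t2.join();
    assert(fb.get() == 4);                     // intermediate value written to b
    return v;
}
```

As in samples.cc, the thread that was started first can only finish after the thread that was started second, because each one waits for its input value.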

Here, multiple writes to the same variable cannot change the value. This kind of variable is called a single-assignment variable.

You can write an algorithm like first-come-first-served using this functionality.
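
One way to picture first-come-first-served with a write-once variable is the following sketch. It is an analogy in standard C++ with threads, not code for this library: std::promise stands in for the synchronization variable, and try_write and first_wins_demo are hypothetical helpers. Because std::promise reports a second write with an exception rather than ignoring it, the helper catches that exception.

```cpp
#include <cassert>
#include <future>
#include <thread>

// Try to write v; returns true only for the caller that got there first.
bool try_write(std::promise<int>& p, int v) {
    try {
        p.set_value(v);
        return true;
    } catch (const std::future_error&) {
        return false;               // the value was already set by someone else
    }
}

// Two workers race to publish their id; exactly one of them wins.
bool first_wins_demo() {
    std::promise<int> slot;
    std::future<int> result = slot.get_future();
    bool won1 = false, won2 = false;

    std::thread w1([&] { won1 = try_write(slot, 1); });
    std::thread w2([&] { won2 = try_write(slot, 2); });
    w1.join();
    w2.join();

    int winner = result.get();      // id of the first writer
    assert(won1 != won2);           // exactly one write was accepted
    return winner == (won1 ? 1 : 2);
}
```

Which worker wins can differ from run to run, but the stored value always belongs to the worker whose write came first.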

Implementation

As mentioned above, SPAWN is implemented as a macro that calls fork().

Sync<T> uses System V IPC to realize inter-process communication; shared memory is allocated in pards_init().

In addition, System V IPC semaphores are used to block and resume processes and to provide mutual exclusion for shared memory access.
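
The general mechanism described above can be sketched directly with fork() and System V IPC. The code below is only an illustrative sketch of the technique, not the library's source: the names sem_add, semun_arg and sysv_demo are hypothetical, error handling is minimal, and a second semaphore for mutual exclusion is left out.

```cpp
#include <cassert>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Add delta to semaphore 0 of the set; a negative delta blocks until possible.
static bool sem_add(int semid, short delta) {
    struct sembuf op;
    op.sem_num = 0;
    op.sem_op = delta;
    op.sem_flg = 0;
    return semop(semid, &op, 1) == 0;
}

// The child writes 42 to shared memory; the parent blocks until it is there.
int sysv_demo() {
    int result = -1;
    int shmid = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    void* mem = reinterpret_cast<void*>(-1);
    if (shmid >= 0 && semid >= 0) mem = shmat(shmid, nullptr, 0);

    if (mem != reinterpret_cast<void*>(-1)) {
        // On Linux the caller has to define this union for semctl().
        union semun_arg { int val; struct semid_ds* buf; unsigned short* array; } arg;
        arg.val = 0;                              // 0 means "not written yet"
        semctl(semid, 0, SETVAL, arg);

        int* cell = static_cast<int*>(mem);
        pid_t pid = fork();
        if (pid == 0) {                           // child: the writer
            *cell = 42;
            sem_add(semid, 1);                    // wake up the reader
            _exit(0);
        }
        if (pid > 0) {                            // parent: the reader
            if (sem_add(semid, -1)) result = *cell;  // blocks until the write
            waitpid(pid, nullptr, 0);
        }
        shmdt(mem);
    }
    // Release the shared resources explicitly.
    if (shmid >= 0) shmctl(shmid, IPC_RMID, nullptr);
    if (semid >= 0) semctl(semid, 0, IPC_RMID);
    return result;
}
```

The function returns -1 if any of the IPC calls fails, and the value written by the child otherwise.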

A Sync<T> variable stores only a pointer to shared memory and the IDs of semaphores. Therefore, it can be passed by value to functions (as in the arguments of the function add in samples.cc). Of course, you can also pass these variables as pointers or references.

The important point is that even if you modify a global variable in a SPAWNed function, the change does not affect other processes, because we use fork instead of pthread. Changing global variables in threads is a typical cause of bugs that are hard to fix, but this library does not cause such bugs. In addition, SPAWNed functions can read global and local variables that were set before SPAWN, because fork() logically copies the entire memory space (to be exact, the copy occurs only when a write to the memory happens).
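
The isolation that fork() gives to ordinary global variables can be checked with a small sketch. It uses plain fork() and waitpid() rather than SPAWN, and the names g_counter and fork_isolation_demo are hypothetical.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int g_counter = 10;   // ordinary global variable, not in shared memory

// Returns the parent's view of g_counter after the child changed its own copy.
int fork_isolation_demo() {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        // The child can read the value that was set before the fork ...
        bool saw_old_value = (g_counter == 10);
        g_counter = 99;            // ... and this write touches only its own copy
        _exit(saw_old_value ? 0 : 1);
    }
    int status = 0;
    waitpid(pid, &status, 0);
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return -1;
    return g_counter;              // unchanged in the parent
}
```

The parent still sees the original value after the child has exited, which is why values that must cross process boundaries have to go through synchronization variables.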

Allocation and release of resources

If we use many synchronization variables or the program runs for a long time, we need to release resources (shared memory and semaphores). Here, shared memory and semaphores are shared between multiple processes, so it is dangerous to release them in the destructor; even if the resources are no longer needed in the process that writes a value to the synchronization variable, the process that reads the value still needs them. Therefore, in this library you basically need to release resources explicitly.

For variables allocated on the stack, you need to call free(). Of course, free() should be called only when no other process is referring to the resources. Typically, there is one writer process and one reader process, and free() can be called just after the read is finished. An example of free() is in fib.cc in the samples directory.

If you allocate a synchronization variable using new, not only the value inside the variable but also the memory for Sync<T> itself is stored in the shared memory area. In this case, you can simply use delete to release both the shared resources and the memory area for the synchronization variable.

The reason for this specification is that I wanted to make it similar to that of SyncList<T>. I will explain SyncList<T> next.

SyncList<T>

SyncList<T> is used in the generator-consumer pattern. In this pattern, one process creates a list of values (the generator), and another process uses these values (the consumer). By using different processes for generating and consuming the list, pipeline parallel processing becomes possible.

I will explain this using listsample.cc in the samples directory.

int main()
{
  pards_init();

  SyncList<int> *a;      // ...(main:1) declaration of first cell of the list
  a = new SyncList<int>; // ...(main:2) allocation of the list cell

  SPAWN(generator(a));   // ...(main:3) fork generator process
  SPAWN(consumer(a));    // ...(main:4) fork consumer process

  pards_finalize();
}

First, the first "cell" of the list is declared and allocated at (main:1) and (main:2). Then, the generator process and the consumer process are forked at (main:3) and (main:4). The first cell of the list is passed to both processes.

Next, let's look at the definition of the generator process.

void generator(SyncList<int> *a)
{
  int i;
  SyncList<int> *current, *nxt;  
  current = a;                 // ...(gen:1) assign the argument to current

  for(i = 0; i < 10; i++){
    current->write(i);         // ...(gen:2) write a value to the current list cell
    printf("writer:value = %d\n",i);
    nxt = new SyncList<int>;   // ...(gen:3) allocate new list cell
    current->writecdr(nxt);    // ...(gen:4) set the allocated cell as cdr of the current cell
    current = nxt;             // ...(gen:5) set the allocated cell as the current cell
    sleep(1);                  // ...(gen:6) "wait" to show the behavior
  }
  current->write(i);
  printf("writer:value = %d\n",i);
  current->writecdr(0);        // ...(gen:7) terminate the list using 0
}

The generator process creates a list and writes values into it. As with Sync<T>, a value is written to a list cell using write() (gen:2).

The next cell of the list is created using new at (gen:3). Then the cell is connected to the previous cell using writecdr() at (gen:4).

Here, a new cell must be created using "new"; do not connect a cell that is allocated on the stack, because the consumer process cannot read the memory of a cell on the stack. A cell allocated using new is stored in the shared memory, so the consumer process can read it.

Because "new" of SyncList<T> has to allocate shared memory, I made "new" of Sync<T> allocate shared memory as well.
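
The idea of a "new" that places objects in memory visible to other processes can be sketched with a class-specific operator new. This is an assumption-laden illustration, not the library's allocator: it uses an anonymous shared mapping from mmap() instead of System V shared memory to keep the code short, it never reuses storage, and Pool, g_pool, SharedCell and shared_new_demo are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical pool that is mapped once, before fork(), so that every
// process forked later sees the same bytes at the same address.
struct Pool {
    std::size_t used;                                    // bytes handed out so far
    alignas(std::max_align_t) unsigned char bytes[4096];
};
static Pool* g_pool = nullptr;

struct SharedCell {
    int value;
    SharedCell* next;

    // Class-specific operator new: carve the cell out of the shared pool.
    static void* operator new(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        n = (n + a - 1) / a * a;                         // keep later cells aligned
        if (!g_pool || g_pool->used + n > sizeof(g_pool->bytes)) throw std::bad_alloc();
        void* p = g_pool->bytes + g_pool->used;
        g_pool->used += n;
        return p;
    }
    static void operator delete(void*) {}                // sketch: storage is not reused
};

int shared_new_demo() {
    void* m = mmap(nullptr, sizeof(Pool), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED) return -1;
    g_pool = static_cast<Pool*>(m);                      // zero-filled, so used == 0

    SharedCell* head = new SharedCell{1, nullptr};
    int result = -1;
    pid_t pid = fork();
    if (pid == 0) {                                      // child: append a cell
        head->next = new SharedCell{2, nullptr};
        _exit(0);
    }
    if (pid > 0) {
        waitpid(pid, nullptr, 0);                        // wait until the child is done
        // The cell created by the child is readable here because it is in the pool.
        if (head->next) result = head->value + head->next->value;
    }
    munmap(m, sizeof(Pool));
    g_pool = nullptr;
    return result;
}
```

If the child had created its cell on the stack or with the ordinary global new, the parent would find nothing valid behind head->next, which is the situation the text warns about.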

The list is built by repeating the above steps in the for loop. To show the behavior, a 1-second wait is inserted at the end of the loop (gen:6). The list is terminated by 0 (gen:7).

Next, let's look at the definition of the consumer process.

void consumer(SyncList<int> *a)
{
  SyncList<int> *current,*prev;
  current = a;

  while(1){
    printf("reader:value = %d\n",
                current->read()); // ...(cons:1) read the value of the cell and print it
    prev = current;               // ...(cons:2) save the current cell
    current = current->readcdr(); // ...(cons:3) extract the cdr of the current cell
                                  //             and make it the current cell
    
    delete prev;                  // ...(cons:4) delete the used cell
    if(current == 0) break;       // ...(cons:5) check the termination
  }
}

The value of the cell is read and printed at (cons:1). As with Sync<T>, this read blocks until the value is written.

The current cell is saved at (cons:2). The cdr of the current cell is extracted and becomes the current cell (cons:3). Like read(), readcdr() blocks until the cdr is written.

After the cdr is read, the previous cell is no longer needed, so it is deleted at (cons:4). Here, "delete" releases the memory in the shared memory area and also releases the semaphores.

Lastly, termination is checked at (cons:5).

The output of this program should be like this:

writer:value = 0
reader:value = 0
writer:value = 1
reader:value = 1
writer:value = 2
reader:value = 2
...

The consumer process waits for the writes of the generator process. Therefore, the output above appears step by step, about one value per second.
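
The generator-consumer behavior can be imitated in standard C++ as a rough analogy. The sketch below uses threads and std::shared_ptr rather than processes and shared memory, so no explicit delete is needed and the sleep is omitted. Cell, its promise-based members and list_demo are hypothetical names for this illustration, not part of the library.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical thread-based stand-in for a SyncList<int> cell.
struct Cell {
    std::promise<int> value_p;                    // written once by the generator
    std::promise<std::shared_ptr<Cell>> cdr_p;    // written once by the generator
    std::shared_future<int> value = value_p.get_future().share();
    std::shared_future<std::shared_ptr<Cell>> cdr = cdr_p.get_future().share();
};

// Writes 0..last into the list, one cell per value, then terminates it.
void generator(std::shared_ptr<Cell> current, int last) {
    for (int i = 0; i < last; ++i) {
        current->value_p.set_value(i);            // like write(i)
        std::shared_ptr<Cell> nxt = std::make_shared<Cell>();
        current->cdr_p.set_value(nxt);            // like writecdr(nxt)
        current = nxt;
    }
    current->value_p.set_value(last);
    current->cdr_p.set_value(std::shared_ptr<Cell>());  // end of the list
}

// Reads values until the end of the list; every get() blocks until written.
std::vector<int> consumer(std::shared_ptr<Cell> current) {
    std::vector<int> out;
    while (current) {
        out.push_back(current->value.get());      // like read()
        std::shared_ptr<Cell> next = current->cdr.get();  // like readcdr()
        current = next;                           // the old cell is freed when unused
    }
    return out;
}

std::vector<int> list_demo() {
    std::shared_ptr<Cell> head = std::make_shared<Cell>();
    std::vector<int> seen;
    std::thread gen(generator, head, 10);
    std::thread con([&] { seen = consumer(head); });
    gen.join();
    con.join();
    assert(!seen.empty());
    return seen;
}
```

The consumer receives the values in the order in which the generator wrote them, regardless of how the two threads are scheduled.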

Abbreviation

Since the list creation and consumption described above is a typical pattern, I prepared an abbreviated notation that reduces the amount of code.

First, the operation "create a new list cell, and connect it to the current list cell" is written as follows:

  nxt = new SyncList<int>; 
  current->writecdr(nxt);
  current = nxt;

To write this concisely, there is a create() member function that "creates a new SyncList<T> variable, connects it to the target object, and returns the newly created variable". Using this member function, the above example can be written as follows:

  current = current->create();

Now, the temporary variable nxt is no longer needed.

Next, the operation "extract the cdr from the current cell, make it the current cell, and delete the previous cell" is written as follows:

  prev = current;
  current = current->readcdr();
  delete prev;

To write this concisely, there is a release() member function that "extracts the cdr, deletes the cell, and then returns the cdr". Using this, the above example can be written as follows:

  current = current->release();

Using these abbreviated notations, you can write programs concisely. An example that uses these notations is in listsample2.cc.
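
How the two shorthands combine the longer steps can be shown with a plain, single-process sketch. PlainCell and shorthand_demo are hypothetical names; the struct does no blocking and uses ordinary heap memory, so it only illustrates what create() and release() are described as doing, not how SyncList<T> implements them.

```cpp
#include <cassert>

// Hypothetical, non-blocking stand-in for a list cell.
struct PlainCell {
    int value = 0;
    PlainCell* cdr = nullptr;

    void write(int v) { value = v; }
    int read() const { return value; }
    void writecdr(PlainCell* c) { cdr = c; }
    PlainCell* readcdr() const { return cdr; }

    // new + writecdr + "return the new cell" in one step
    PlainCell* create() {
        PlainCell* nxt = new PlainCell;
        writecdr(nxt);
        return nxt;
    }
    // readcdr + delete + "return the cdr" in one step
    PlainCell* release() {
        PlainCell* nxt = readcdr();
        delete this;               // valid only for cells allocated with new
        return nxt;
    }
};

int shorthand_demo() {
    PlainCell* head = new PlainCell;
    PlainCell* current = head;
    for (int i = 0; i < 3; ++i) {
        current->write(i);
        current = current->create();   // replaces the three-line version
    }
    current->write(3);
    current->writecdr(nullptr);        // terminate the list
    assert(head->readcdr() != nullptr);

    int sum = 0;
    current = head;
    while (current != nullptr) {
        sum += current->read();
        current = current->release();  // replaces prev/readcdr/delete
    }
    return sum;                        // 0 + 1 + 2 + 3
}
```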