A Distributed Sat Solver: Dynamic Worksharing

Since we know that we can safely share work by sharing the contents of the workstack, the next step is to organise our processes so they can cooperate and effectively share work and (spectacularly?) speed up our solver.

An Example With two Processes

We will begin in a very simple scenario where we have two processes, process 1 and process 2, available to do work and a single formula as input.

We will consider a process as having two important states.

If it has work available, it works
If there is no work avaiable, it requests it from another process

We will first start up both of our processes, giving a formula to process 1 allowing it to begin work. Process 1's workstack will initially contain a single entry with the first variable assignment. A single item workstack cannot be shared, so process 2 will have to wait until at least two pieces of work are available.

Process 2 has no work initially, but can request work from process 1. However, there is a chance that this will happen before process 1 has had a chance to produce enough to share. In this case process 1 will respond saying that it has no work to share. Process 2 responds to having it's work request rejected by going to sleep for a while and then requesting work again later. The expectation is that work will be continually generated and will become available in the future. Process 1, meanwhile, will follow the Davis-Putnam algorithm fattening its workstack as it searches for a solution. Now after a brief sleep process 2 can make a second work request and process 1 will most likely have some work to give. After sharing is complete our solver carries on doing twice as much work as before.

This scheme should work for two processes. But clearly, for scalability and our own personal satisfaction, we would like to extend this approach to an arbitrarily large number of cooperating processes.

A Side Note on Termination

It should be noted that in this example we conspicuously neglected an important aspect of our sat solver; termination!

In the case where one process finds a satisfying assignment to the formula being worked on the answer must be communicated back to the caller, while the remaining processes need to be shut down.

In Erlang we can link processes to one another in such a way that terminating one process will cause other linked processes to terminate. This give us a convenient mechanism for halting processes when a satisfying assignment is found. However, in the case where a formula is unsatisfiable this approach is not sufficient.

A formula is known to be unsatisfiable only when every process has an empty workstack. In the two process example we could add some communication that determines whether at least one process has work available. But this approach will not scale well in a many-processes solution. Determining that all work has been exhausted with an arbitrary, or just very large, number of cooperating processes is probably the most exciting and fun aspect of worksharing. It's also a bit fiddly and will have to be covered in a separate post.

Many Processes

In the example of two cooperating processes given above, it was convenient that every process knew about every other process that was working on the same problem. This meant they could easily request work from all other processes and if any was available it would be provided. However, this will not be as effective if we start adding additional processes.

If we allow an arbitrary number of working processes we will need to take a different approach. Linking every process together will prevent scalability as processes spend time and memory dealing with their enormous process lists. For this reason we a better network structure to limit the number of directly linked processes without compromising global communication.

For this purpose we have developed a list network structure for organising processes and allowing (eventual) total communication between all working processes. Working processes are organised in a bi-directionally linked list. The list does not loop so there are always two processes (called the top and bottom respectively) connected to a single process.

When a message, such as a work request, is sent across the network we can easily have the message bounce from process to process along the list until a process with sharable work receives it. If the work request message itself contains the pid (a pid in erlang uniquely identifies a process allowing us to send it messages) of the requesting process we can respond directly without disturbing any other processes.

This does introduce some overhead in having each process mediating messages. But in testing this overhead has been shown to be minimal compared to the work involved in grinding away looking for satisfying assignments to a boolean formula.

As a side note this list structure actually began life as a tree and was later converted into a list. The tree structure has some very charming properties and will be covered in a future post.

Masters, Slaves and Control Issues

The process list network described above is markedly different from the more common master slave architecture used to distribute sat solvers. With our approach it is hoped that we can avoid a strangled master process and facillitate effortless and natural scaling. A lovely dream.

A Distributed Sat Solver

Monday, June 8, 2009

Dynamic Worksharing

No comments:

Post a Comment

Followers

Blog Archive

About Me