Lenny Domnitser’s
domnit.org

This is a static archive of the domnit.org blog,
which Lenny Domnitser wrote between 2006 and 2009.

Procs: Run Python Functions in Parallel Processes

Sixth Python post in three weeks, woo!

Python FAQ: “be creative with dividing the work up between multiple processes rather than multiple threads.”

I wrote a library that takes a list of functions and their parameters, executes them all in parallel child processes, and returns the results. It’s called procs.

Code, docs. The code is just one file, and a small one at that.

Using procs

Suppose you have parallelizable code that looks like this:

result1 = slow_function(5, 'hello', foo=3.14)
result2 = slower_function(x=10, y=3)

With procs, instead of calling it directly, you make process specifications that almost look like the call:

from procs import *
proc1 = proc(slow_function, 5, 'hello', foo=3.14)
proc2 = proc(slower_function, x=10, y=3)

Then call them:

results = call([proc1, proc2])
result1, result2 = tuple(results)

results, in the example above, is a generator that yields the results of the functions in the order they were passed to call. Each result is yielded as soon as it is ready, but always in order. This has an upside and a downside: every result pairs up predictably with its call, but one slow early function can hold back later results that are already finished.
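For instance, a minimal sketch (reusing the made-up slow_function and slower_function from above) that handles each result as it arrives:

for result in call([proc(slow_function, 5, 'hello', foo=3.14),
                    proc(slower_function, x=10, y=3)]):
    print(result)  # printed in list order, each as soon as it is ready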

In the likely case that one function is called several times with different parameters, procs also provides pmap, a map-style interface to parallel calling. It shows up again in the next example.
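As a quick sketch, assuming pmap takes a function and an iterable just like the builtin map:

from procs import pmap

def square(x):  # a made-up stand-in for some slow function
    return x * x

results = pmap(square, [1, 2, 3])  # one child process per element
print(list(results))               # [1, 4, 9], in input order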

Really using procs

Let’s see how it fares on a Wide Finder benchmark. I started with Fredrik Lundh’s wf-6.py, removed more code than I added, and got the expected slightly slower result: my laptop runs his code in about 0.64 time with negligible clock, and mine in about 0.64 time with 0.01 clock. Since it is better suited to the way procs works, I create a process for each chunk right away; for the given dataset, that means 4 worker processes running on my 2 CPU cores.

In my version, all of Fredrik’s parallelization code is taken out, and replaced with:

import procs
from functools import partial

# process and getchunks come from wf-6.py; each 50 MB chunk of FILE
# is handled in its own child process.
result = procs.pmap(partial(process, FILE),
                    getchunks(FILE, 50*1024*1024))

Notice how nicely functional code can be parallelized: my wide finder can be converted back to a serial finder by changing the above lines to:

from functools import partial
result = map(partial(process, FILE),
             getchunks(FILE, 50*1024*1024))

If you have some serial code like this, procs might be able to easily widen it.

Bugs

Procs doesn’t handle exceptions yet. If a child process fails, it zombifies with no indication that anything went wrong.
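For instance, a minimal sketch of the failure mode, using a made-up function that always raises:

from procs import *

def fails():
    raise RuntimeError('oops')

results = call([proc(fails)])
list(results)  # the child dies and zombifies; no exception reaches the parent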

There are probably other bugs. Please tell me.

Conclusion

If you are interested in procs, please email me or leave comments with bug reports, patches, and such. I don’t know enough about this problem to put out something I’d call 1.0 by myself, but if Open Source Happens, I think we could have something good.

Again: code, docs.

Update 2007-10-11: Exception handling added.