Python: Some Concurrency Tricks

Here are a few concurrency tricks if you're stuck using threads. I used these tricks years ago to write a Swing application in Jython, and I found them to be helpful enough to warrant a blog post, albeit a few years delayed.

First, let's suppose you have a UI and you want to talk to an external program written in another language that might occasionally block. Use the main thread for the UI and use a separate thread to coordinate with the external program. In general, most things that might block should have their own thread.

Name your threads.

It's likely that certain code should only be run by the UI thread, and that other code should never be run by it. At the top of each method, do an assertion on the thread name.
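Here is a minimal sketch of naming a thread and asserting on that name. The helper `assert_thread`, the function `talk_to_external_program`, and the thread name "external-io" are all made up for illustration; only `threading.Thread(name=...)` and `threading.current_thread()` come from the standard library.

```python
import threading


def assert_thread(expected_name):
    """Fail fast if the calling thread is not the one we expect."""
    actual = threading.current_thread().name
    assert actual == expected_name, f"expected {expected_name!r}, got {actual!r}"


results = []


def talk_to_external_program():
    # Only the worker thread should ever run this function.
    assert_thread("external-io")
    results.append(threading.current_thread().name)


# Give the thread a name when you create it.
worker = threading.Thread(target=talk_to_external_program, name="external-io")
worker.start()
worker.join()
```

Calling `talk_to_external_program()` directly from the main thread would trip the assertion, which is the point: the mistake shows up immediately instead of as a subtle race later. Keep in mind that running Python with `-O` strips assert statements.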

Avoid sharing data. Sharing data involves mutexes and the like, which are generally painful and easy to mess up. Instead, constrain each bit of data to a single thread. If you need to interact with that data from another thread, "ask the other thread for help".

The way I like to do this is to use the queue module which takes care of its own locking. You can pass requests from one thread to another via a queue. In some cases, I even found it helpful to pass a callback on the queue. That's like saying, "Call this callback, but do it from your own thread."
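As a rough sketch of passing a callback on a queue, something like the following would work. It uses the Python 3 module name `queue` (the module is called `Queue` in Python 2 and Jython 2.x). The names `requests`, `STOP`, `worker_loop`, and `record_thread` are hypothetical, and the sentinel used for shutdown is just one possible convention.

```python
import queue
import threading

requests = queue.Queue()
STOP = object()  # hypothetical sentinel telling the worker to exit
ran_on = []


def worker_loop():
    while True:
        callback = requests.get()  # blocks until something arrives
        if callback is STOP:
            break
        callback()  # runs on the worker thread, not the caller's thread


def record_thread():
    # Records which thread actually executed the callback.
    ran_on.append(threading.current_thread().name)


worker = threading.Thread(target=worker_loop, name="worker")
worker.start()

# The main thread asks the worker to run the callback on its behalf.
requests.put(record_thread)
requests.put(STOP)
worker.join()
```

The caller never touches the worker's data. It only hands over a function, and the queue does all the locking.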

Much of this approach now falls under the label of "actors". An actor is the combination of an input queue, a thread, and an object. I like to think of them as "living objects that you talk to asynchronously." However, even years before I had heard the term actor or knew how to code in Erlang, these tricks were still useful.
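To make "input queue plus thread plus object" concrete, here is a small sketch of an actor. `CounterActor` and its message names ("increment", "get", "stop") are invented for this example and are not taken from any actor library. Replies come back on a queue supplied by the sender.

```python
import queue
import threading


class CounterActor:
    """A hypothetical actor: one inbox, one thread, one piece of private state."""

    def __init__(self):
        self._inbox = queue.Queue()
        self._count = 0  # touched only by the actor's own thread
        self._thread = threading.Thread(target=self._run, name="counter-actor")
        self._thread.start()

    def send(self, message, reply_to=None):
        # Safe to call from any thread; the queue handles the locking.
        self._inbox.put((message, reply_to))

    def _run(self):
        while True:
            message, reply_to = self._inbox.get()
            if message == "stop":
                break
            if message == "increment":
                self._count += 1
            elif message == "get" and reply_to is not None:
                reply_to.put(self._count)

    def join(self):
        self._thread.join()


actor = CounterActor()
replies = queue.Queue()
for _ in range(3):
    actor.send("increment")
actor.send("get", reply_to=replies)
count = replies.get(timeout=5)  # messages are handled in order, so this is 3
actor.send("stop")
actor.join()
```

Because the inbox is first in, first out and only one thread reads it, the three increments are always applied before the "get" is answered, with no explicit lock anywhere in the code.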

Comments

Anonymous said…
It just so happens I am writing an application using this very architecture. I made a class ThreadedQueueManager(threading.Thread) then a bunch of subclasses off that. Each step of the process is a subclass (pretty much each step interacts with some external resource) which gets a Job from a Queue.Queue, does a little part of the process, and passes it along the pipeline by putting the processed Job into an output Queue.Queue. So far so good.

What I need though is a synchronized Set class. Basically I want a thread which polls the external resources, sees if they are available and puts them on an available Queue.Queue. Think of a kind of poolable resource.

My question: is the built-in Set class thread safe (beyond the global interpreter lock), or is there a PoolableResource class available somewhere that doesn't require dealing with too many low-level mutexes, locks, etc.?

Sorry for being lazy. I am happy to see this architectural pattern getting some air. It has worked pretty well for me so far creating a resource manager which parallelizes jobs across several subordinate processing resources.
The first ever 'major' Python application I wrote, PyPsp, was kind of a wrapper around ffmpeg so that I could convert videos to play on my PSP (it did other stuff too). I did something like this: create another thread that runs ffmpeg using subprocess, then read the output from ffmpeg and insert it into a Queue object. The main thread of the program was just the cmd mainloop.

Alex Martelli also says that "always arrange for a single thread to deal with any given object or subsystem that is external to the program (such as a file, a database, a GUI, or a network connection)." - Python In A Nutshell, page 350.
> What I need though is a synchronized Set class. Basically I want a thread which polls the external resources, sees if they are available and puts them on an available Queue.Queue. Think of a kind of poolable resource...

> My question: is the built-in Set class thread safe (beyond the global interpreter lock), or is there a PoolableResource class available somewhere

I seriously doubt that the built-in set class is thread safe by way of mutexes. It's probably thread safe by way of the GIL, as you suggest. Somewhere, somebody has probably made a synchronized list or dict, but I don't know of one specifically. Again, perhaps it doesn't matter thanks to the GIL.

I'm a little confused. If you really have more than one thread trying to write to the set, that violates the whole principle of my post. My basic idea is that you avoid having multiple threads sharing data, with the exception of queues, which are thread safe. You can always have multiple threads posting things to the same queue, and you can have a thread listening on the other side of that queue adding things to a set. You can also have multiple threads posting "Give me something from the set" requests onto the queue.
> Alex Martelli also says that "always arrange for a single thread to deal with any given object or subsystem that is external to the program (such as a file, a database, a GUI, or a network connection)." - Python In A Nutshell, page 350.

Yep, makes sense, although I'm not sure I'd want to do it for files unless NFS is involved ;)
Anonymous said…
> I'm a little confused. If you really have more than one thread trying to write to the set, that violates the whole principle of my post.

What happened in my app was this. I was chugging along fine with Threads and Queues until I needed to write a ResourcePollerManager. This Thread would poll a static list of external resource providers and put their addresses on a Queue if they were free. So the ResourcePollerManager is the only Thread putting stuff in the Queue, but there might be a couple of Threads pulling these resources off.

The problems arose when I realized that I needed a Set, not a Queue. The ResourcePollerManager might see that a resource has been free for several minutes and put its address into the Queue multiple times. This would be a bad thing, since the consumer threads would pull that address off many times when in fact the resource was no longer free after the first time it got pulled off. That's how I came to think about Sets, but I wanted the thing to be synchronized beyond the GIL (maybe I just like things to be "correct", or maybe I was hoping to avoid a 1 in 10^6 race condition bug).

So the gist of it is that I am still using the same architectural pattern of Threads passing data through Queues, but in this case Queue would not do what I needed.
I see. A queue is not a place to store things. It is a way of communicating asynchronously.

If you use a synchronized set, that's cool, but that's not what this post is about. A synchronized set is actually the normal way a Java programmer might set things up. Again, that's not necessarily bad, but it's not what I'm talking about.

If you follow the spirit of what this post is about, then your resource poller thread would keep a private set of available resources. If another thread needed a resource, it would put a request into the queue asking for one. Then, the resource poller would respond with the resource.

The queues are for talking asynchronously, not for storing stuff. In fact, this "actor" approach says that each thread should have an input queue where you put messages for it to digest.

Hence, you give me a message on my queue, and I respond to you with a message on your queue.
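A sketch of how that might look for the resource problem discussed above follows. Everything here is hypothetical: the class name `ResourcePool`, the message kinds ("free", "acquire", "stop"), and the address "host-a". The actual polling of external resources is left out; the sketch assumes the poller reports results as "free" messages on the owner's inbox. The set lives inside one thread, so no other thread ever touches it.

```python
import queue
import threading


class ResourcePool:
    """Hypothetical owner thread for a private set of free resources."""

    def __init__(self):
        self.inbox = queue.Queue()
        self._free = set()   # private to the owner thread
        self._waiting = []   # reply queues of consumers waiting for a resource
        self._thread = threading.Thread(target=self._run, name="resource-pool")
        self._thread.start()

    def _run(self):
        while True:
            kind, payload = self.inbox.get()
            if kind == "stop":
                break
            if kind == "free":
                self._free.add(payload)        # a set ignores duplicates
            elif kind == "acquire":
                self._waiting.append(payload)  # payload is a reply queue
            # Hand out resources while there is both supply and demand.
            while self._free and self._waiting:
                self._waiting.pop(0).put(self._free.pop())

    def join(self):
        self._thread.join()


pool = ResourcePool()

# The poller reports the same resource as free three times.
for _ in range(3):
    pool.inbox.put(("free", "host-a"))

# Two consumers each ask for a resource and wait on their own reply queue.
first_reply = queue.Queue()
second_reply = queue.Queue()
pool.inbox.put(("acquire", first_reply))
pool.inbox.put(("acquire", second_reply))

first = first_reply.get(timeout=5)
try:
    second = second_reply.get(timeout=0.2)
except queue.Empty:
    second = None  # nothing left: the duplicate reports were collapsed

pool.inbox.put(("stop", None))
pool.join()
```

The first consumer receives "host-a" and the second receives nothing, even though the resource was reported free three times. The queues only carry messages, and the set does the storing.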