
Python: Some Concurrency Tricks

Here are a few concurrency tricks if you're stuck using threads. I used these tricks years ago to write a Swing application in Jython, and I found them to be helpful enough to warrant a blog post, albeit a few years delayed.

First, let's suppose you have a UI and you want to talk to an external program, written in another language, whose calls might occasionally block. Use the main thread for the UI and use a separate thread to coordinate with the external program. In general, most things that might block should have their own thread.
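Here's a minimal sketch of that split. The "external program" is just a child Python process standing in for the real one, and the names run_external, results, and the thread name "external" are made up for illustration. The worker thread does the blocking reads and hands each line to the main thread over a queue.

```python
import queue
import subprocess
import sys
import threading


def run_external(cmd, results):
    """Runs on the worker thread: start the program and forward its output."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    for line in proc.stdout:        # may block; that's fine on this thread
        results.put(line.rstrip("\n"))
    proc.wait()
    results.put(None)               # sentinel: the program has finished


# Stand-in for the real external program (an assumption of this sketch).
cmd = [sys.executable, "-c", "print('hello'); print('world')"]

results = queue.Queue()
worker = threading.Thread(target=run_external, args=(cmd, results), name="external")
worker.start()

# The main (UI) thread only ever touches the queue, never the process.
lines = []
while True:
    item = results.get()
    if item is None:
        break
    lines.append(item)
worker.join()
```

In a real UI you probably wouldn't block on results.get() like this. You'd more likely call results.get_nowait() from a timer so that the event loop keeps running.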

Name your threads.

It's likely that certain code should only be run by the UI thread, and that other code should never be run by it. At the top of each such method, assert that the current thread's name is the one you expect.
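As a rough sketch, the naming and the assertions might look like this. The thread names "ui" and "worker" and the helper assert_thread are hypothetical, chosen only for this example.

```python
import threading

UI_THREAD = "ui"            # hypothetical names for this sketch
WORKER_THREAD = "worker"


def assert_thread(expected):
    """Fail loudly if the caller is running on the wrong thread."""
    actual = threading.current_thread().name
    assert actual == expected, f"expected thread {expected!r}, got {actual!r}"


def redraw():
    assert_thread(UI_THREAD)        # UI-only code
    return "redrawn"


def poll_external():
    assert_thread(WORKER_THREAD)    # worker-only code
    return "polled"


# Name the main thread, then call UI code from it.
threading.current_thread().name = UI_THREAD
ui_result = redraw()

outcome = {}


def worker_body():
    outcome["poll"] = poll_external()
    try:
        redraw()                    # wrong thread: the assertion fires
    except AssertionError as exc:
        outcome["error"] = str(exc)


worker = threading.Thread(target=worker_body, name=WORKER_THREAD)
worker.start()
worker.join()
```

Keep in mind that assert statements are stripped when Python runs with -O, so this is a development-time safety net rather than a runtime guarantee.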

Avoid sharing data. Sharing data involves mutexes and the like, which are generally painful and easy to mess up. Instead, constrain each bit of data to a single thread. If you need to interact with that data from another thread, "ask the other thread for help".

The way I like to do this is to use the queue module (named Queue in Python 2), which takes care of its own locking. You can pass requests from one thread to another via a queue. In some cases, I even found it helpful to pass a callback on the queue. That's like saying, "Call this callback, but do it from your own thread."
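A small sketch of the callback idea, with made-up names (requests, counts, owner_loop). The dict counts belongs to the "owner" thread alone. Other threads never touch it; they put a callback on the queue and the owner runs it.

```python
import queue
import threading

requests = queue.Queue()
counts = {}     # owned by the "owner" thread; nobody else reads or writes it
ran_on = []     # records which thread actually ran each callback


def owner_loop():
    while True:
        callback = requests.get()
        if callback is None:        # sentinel: shut down
            break
        callback()                  # runs on the owner thread


def producer(key):
    def bump():
        counts[key] = counts.get(key, 0) + 1
        ran_on.append(threading.current_thread().name)

    # "Call this callback, but do it from your own thread."
    requests.put(bump)


owner = threading.Thread(target=owner_loop, name="owner")
owner.start()

producers = [threading.Thread(target=producer, args=(k,)) for k in ("a", "b", "a")]
for p in producers:
    p.start()
for p in producers:
    p.join()

requests.put(None)
owner.join()
```

The only object shared between threads here is the queue itself, which does its own locking.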

Much of this approach now falls under the label of "actors". An actor is the combination of an input queue, a thread, and an object. I like to think of them as "living objects that you talk to asynchronously." However, even years before I had heard the term actor or knew how to code in Erlang, these tricks were still useful.
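To make "an input queue, a thread, and an object" concrete, here's a toy actor. The class CounterActor and its tuple-based message format are inventions for this sketch, not something from a library.

```python
import queue
import threading


class CounterActor:
    """An object whose state is only ever touched by its own thread."""

    def __init__(self, name):
        self._inbox = queue.Queue()
        self._total = 0
        self._thread = threading.Thread(target=self._run, name=name)
        self._thread.start()

    def send(self, message, reply_to=None):
        """Asynchronous: returns as soon as the message is queued."""
        self._inbox.put((message, reply_to))

    def stop(self):
        self._inbox.put((None, None))
        self._thread.join()

    def _run(self):
        while True:
            message, reply_to = self._inbox.get()
            if message is None:
                break
            if message[0] == "add":
                self._total += message[1]
            elif message[0] == "get" and reply_to is not None:
                reply_to.put(self._total)   # answer on the caller's queue


counter = CounterActor("counter")
counter.send(("add", 2))
counter.send(("add", 3))

replies = queue.Queue()
counter.send(("get",), reply_to=replies)
total = replies.get()
counter.stop()
```

Because a single sender's messages are handled in the order they were queued, the "get" here sees both "add" messages.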

Comments

Anonymous said…
It just so happens I am writing an application using this very architecture. I made a class ThreadedQueueManager(threading.Thread) then a bunch of subclasses off that. Each step of the process is a subclass (pretty much each step interacts with some external resource) which gets a Job from a Queue.Queue, does a little part of the process, and passes it along the pipeline by putting the processed Job into an output Queue.Queue. So far so good.

What I need though is a synchronized Set class. Basically I want a thread which polls the external resources, sees if they are available and puts them on an available Queue.Queue. Think of a kind of poolable resource.

My question is the built in Set class thread safe (beyond the global interpreter lock) or is there a PoolableResource class available somewhere without having to deal with too many low level mutexes, locks, etc?

Sorry for being lazy. I am happy to see this architectural pattern getting some air. It has worked pretty well for me so far creating a resource manager which parallelizes jobs across several subordinate processing resources.
The first ever 'major' Python application I wrote - PyPsp - was kind of a wrapper around ffmpeg so that I could convert videos to play on my PSP (it did other stuff too). Well, I did something like this. I created another thread that runs ffmpeg using subprocess, reads the output from ffmpeg, and inserts it into a Queue object. The main thread of the program was just the cmd mainloop.

Alex Martelli also says that "always arrange for a single thread to deal with any given object or subsystem that is external to the program (such as a file, a database, a GUI, or a network connection)." - Python In A Nutshell, page 350.
jjinux said…
> What I need though is a synchronized Set class. Basically I want a thread which polls the external resources, sees if they are available and puts them on an available Queue.Queue. Think of a kind of poolable resource...

> My question is the built in Set class thread safe (beyond the global interpreter lock) or is there a PoolableResource class available somewhere

I seriously doubt that the built-in set class is thread safe by way of mutexes. It's probably thread safe by way of the GIL, as you suggest. Somewhere, somebody has probably made a synchronized list or dict, but I don't know of one specifically. Again, perhaps it doesn't matter thanks to the GIL.

I'm a little confused. If you really have more than one thread trying to write to the set, that violates the whole principle of my post. My basic idea is that you avoid having multiple threads sharing data, with the exception of queues, which are thread safe. You can always have multiple threads posting things to the same queue, and you can have a thread listening on the other side of that queue adding things to a set. You can also have multiple threads posting "Give me something from the set" requests onto the queue.
jjinux said…
> Alex Martelli also says that "always arrange for a single thread to deal with any given object or subsystem that is external to the program (such as a file, a database, a GUI, or a network connection)." - Python In A Nutshell, page 350.

Yep, makes sense, although I'm not sure I'd want to do it for files unless NFS is involved ;)
Anonymous said…
> I'm a little confused. If you really have more than one thread trying to write to the set, that violates the whole principle of my post.

What happened in my app was this. I was chugging along fine with Threads and Queues until I needed to write a ResourcePollerManager. This Thread would poll a static list of external resource providers and put their address on a Queue if they were free. So the ResourcePollerManager is the only Thread putting stuff in the Queue, but there might be a couple Threads pulling these resources off.

The problems arose when I realized that I needed a Set, not a Queue. The ResourcePollerManager might see that a resource has been free for several minutes and put its address into the Queue multiple times. This would be a bad thing since the consumer threads would pull that address off many times when in fact the resource was not free after the first time it got pulled off. That's how I came to think about Sets, but I wanted the thing to be synchronized beyond the GIL (maybe I just like things to be "correct" or maybe I was hoping to avoid a 1 in 10^6 race condition bug).

So the gist of it is that I am still using the same architectural pattern of Threads passing data through Queues, but in this case Queue would not do what I needed.
jjinux said…
I see. A queue is not a place to store things. It is a way of communicating asynchronously.

If you use a synchronized set, that's cool, but that's not what this post is about. A synchronized set is actually the normal way a Java programmer might set things up. Again, that's not necessarily bad, but it's not what I'm talking about.

If you follow the spirit of what this post is about, then your resource poller thread would keep a private set of available resources. If another thread needed a resource, it would put a request into the queue asking for one. Then, the resource poller would respond with the resource.

The queues are for talking asynchronously, not for storing stuff. In fact, this "actor" approach says that each thread should have an input queue where you put messages for it to digest.

Hence, you give me a message on my queue, and I respond to you with a message on your queue.
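Here's roughly what I mean, as a sketch. The message kinds ("acquire", "release", "stop"), the function resource_manager, and the address "host-a" are all made up. To keep it short, the manager is seeded with a static list rather than really polling anything.

```python
import queue
import threading


def resource_manager(inbox, addresses):
    available = set(addresses)      # private to this thread
    waiting = []                    # reply queues of consumers still waiting
    while True:
        kind, payload = inbox.get()
        if kind == "stop":
            break
        if kind == "release":
            available.add(payload)  # it's a set, so duplicates collapse
        elif kind == "acquire":
            waiting.append(payload)
        # Hand out resources while there is both supply and demand.
        while waiting and available:
            waiting.pop(0).put(available.pop())


inbox = queue.Queue()
manager = threading.Thread(
    target=resource_manager, args=(inbox, ["host-a"]), name="resources"
)
manager.start()

mine = queue.Queue()                # my own input queue for replies
inbox.put(("release", "host-a"))    # reporting it free twice is harmless
inbox.put(("acquire", mine))
first = mine.get()

inbox.put(("acquire", mine))        # nothing is free, so no reply yet
inbox.put(("release", first))       # giving it back satisfies that request
second = mine.get()

inbox.put(("stop", None))
manager.join()
```

The set never leaves the manager thread, so it doesn't need to be synchronized at all. The duplicate "release" doesn't result in the same address being handed out twice.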
