Python: A Response to Glyph's Blog Post on Concurrency

If you haven't seen it yet, Glyph wrote a great blog post on concurrency called Unyielding. This blog post is a response to that blog post.

Over the years, I've tried each of the approaches he talks about while working at various companies. I've known all of the arguments for years. However, I think his blog post is the best blog post I've read for the arguments he is trying to make. Nice job, Glyph!

In particular, I agree with his statements:

What I hope I’ve demonstrated is that if you agree with me that threading has problematic semantics, and is difficult to reason about, then there’s no particular advantage to using microthreads, beyond potentially optimizing your multithreaded code for a very specific I/O bound workload.
There are no shortcuts to making single-tasking code concurrent. It's just a hard problem, and some of that hard problem is reflected in the difficulty of typing a bunch of new concurrency-specific code.

In this blog post, I'm not really disputing his core message. Rather, I'm just pointing out some details and distinctions.

First of all, it threw me off when he mentioned JavaScript since JavaScript doesn't have threads. In the browser, it has web workers which are like processes, and in Node, it has a mix of callbacks, deferreds, and yield. However, reading his post a second time, all he said was that JavaScript had "global shared mutable state". He never said that it had threads.

The next thing I'd like to point out is that there are some real readability differences between the different approaches. Glyph did a good job of arguing that it's difficult to reason about concurrency when you use threads. However, if you ignore race conditions for a moment: I think it's certainly true that threads, explicit coroutines, and green threads are easier to read than callbacks and deferreds. That's because they let you write code in a more traditional, linear fashion. Even though I can do it, using callbacks and deferreds always causes my brain to hurt ;) Perhaps I just need more practice.
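To make that readability difference concrete, here's a small sketch contrasting the two styles. All of the function names here are hypothetical, purely for illustration:

```python
# Callback style: control flow is inverted, and each step nests one
# level deeper.  (All of these functions are made up for illustration.)
def fetch_user_cb(user_id, callback):
    callback({"id": user_id, "name": "alice"})

def fetch_orders_cb(user, callback):
    callback(["order-1", "order-2"])

def show_orders_cb(user_id, done):
    def on_user(user):
        def on_orders(orders):
            done("%s has %d orders" % (user["name"], len(orders)))
        fetch_orders_cb(user, on_orders)
    fetch_user_cb(user_id, on_user)

# Linear style (what threads, green threads, and explicit coroutines
# give you): the same logic reads top to bottom.
def fetch_user(user_id):
    return {"id": user_id, "name": "alice"}

def fetch_orders(user):
    return ["order-1", "order-2"]

def show_orders(user_id):
    user = fetch_user(user_id)
    orders = fetch_orders(user)
    return "%s has %d orders" % (user["name"], len(orders))
```

Both versions compute the same thing, but in the callback version the "next step" is always buried inside a nested function.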

Another thing to note is that the type of application matters a lot when you need to address concurrency concerns. For instance, if you're building a UI, you don't want any computationally heavy work done on the UI thread. In Android, for example, you do as little CPU-heavy and IO-heavy work as possible on the UI thread, pushing that work off onto other threads instead.

Other things to consider are IO bound vs. CPU bound and stateful vs. stateless.

Threads are fine, if all of the following are true:

  • You're building a stateless web app.
  • You're IO bound.
  • All mutable data is stored in a per-request context object, in per-request instances, or in thread-local storage.
  • You have no module-level or class-level mutable data.
  • You're not doing things like creating new classes or modules on the fly.
  • In general, threads don't interact with each other.
  • You keep your application state in a database.
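As a sketch of the "per-request context" and "thread-local storage" points above (the names here are mine, not from any particular framework), Python's threading.local gives each thread its own independent view of an object's attributes:

```python
import threading

# Each thread sees its own independent copy of attributes set on this
# object, so request handlers can stash per-request state without locks.
request_context = threading.local()

def handle_request(request_id, results):
    # Hypothetical request handler: store per-request state, then read
    # it back after "doing work" to show no other thread clobbered it.
    request_context.request_id = request_id
    results[request_id] = request_context.request_id

results = {}
threads = [threading.Thread(target=handle_request, args=(i, results))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread only ever sees the request_id it set itself, which is exactly why this style of app can dodge most race conditions.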

Sure, there's always going to be some global, shared, mutable data, such as sys.modules, but in practice, Python itself protects that using the GIL.

I've built apps such as the above in a multithreaded way for years, and I've never run into any race conditions. The difference between this sort of app and the app that led to Glyph's "buggiest bug" is that he was writing a very stateful application server.

I'd also like to point out that it's important to not overlook the utility of UNIX processes. Everyone knows how useful the multiprocessing module is and that processes are the best approach in Python for dealing with CPU bound workloads (because you don't have to worry about the GIL).
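Here's a minimal sketch of that pattern using multiprocessing.Pool. The cpu_heavy function is a stand-in for real work:

```python
from multiprocessing import Pool

def cpu_heavy(n):
    # Pure-Python arithmetic holds the GIL the whole time, so threads
    # wouldn't run this in parallel; separate processes sidestep that.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # Each chunk of work runs in its own process, on its own core.
        totals = pool.map(cpu_heavy, [100000] * 4)
        print(totals)
```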

However, using a pre-fork model is also a great way of building stateless web applications. If you have to handle a lot of requests, but you don't have to handle a very large number simultaneously, pre-forked processes are fine. The upside is that the code is both easy to read (because it doesn't use callbacks or deferreds), and it's easy to reason about (because you don't have the race conditions that threads have). Hence, a pre-fork model is great for programmer productivity. The downside is that each process can eat up a lot of memory. Of course, if your company makes it to the point where hardware efficiency costs start outweighing programmer efficiency costs, you have what I like to call a "nice to have problem". PHP and Ruby on Rails have both traditionally used a pre-fork approach.
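Here's a minimal sketch of the pre-fork idea, assuming a POSIX system; a real server would add signal handling, worker respawning, graceful shutdown, and actual request parsing:

```python
import os
import socket

def prefork_workers(listen_sock, num_workers):
    """Fork workers that each run a blocking accept loop on the shared
    listening socket.  Returns the child pids to the parent."""
    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:  # child: handle one connection at a time, forever
            while True:
                conn, _ = listen_sock.accept()
                data = conn.recv(1024)
                conn.sendall(data.upper())  # stand-in for real handling
                conn.close()
        pids.append(pid)
    return pids
```

Because each worker handles one request at a time in plain, linear code, there are no callbacks and no mutable state shared between workers; the kernel load-balances incoming connections across the blocked accept calls.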

I'm also a huge fan of approaches such as Erlang's, which give you what is conceptually a process without the overhead of real UNIX processes.

As Glyph hinted at, this is a really polarizing issue, and there really are no easy, perfect-in-every-way solutions. Any time concurrency is involved, there are always going to be some things you need to worry about regardless of which approach to concurrency you take. That's why we have things like databases, transactions, etc. It's really easy to fall into religious arguments about the best approach to concurrency at an application level. I think it's really helpful to be frank and honest about the pros and cons of each approach.

That being said, I do look forward to one day trying out Guido's asyncio module.
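For a taste of what that looks like, here's a tiny sketch of the linear-reading concurrent code asyncio enables (written with the newer async/await spelling; the fetch function and its names are made up):

```python
import asyncio

async def fetch(name, delay):
    # Simulated I/O-bound work; await hands control back to the event
    # loop so other coroutines can run while this one waits.
    await asyncio.sleep(delay)
    return name + ": done"

async def main():
    # This reads linearly, but both "fetches" wait concurrently.
    return await asyncio.gather(fetch("a", 0.01), fetch("b", 0.01))

print(asyncio.run(main()))
```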

James said…
This is a really interesting read and I can't recall whether or not I read Glyph's blog post on the matter! I will read it today sometime... I have a great level of respect for Glyph and the work he's done.

It would be interesting for me I think to write about this as well since this is now the 10th year of development for a kind of concurrency (application) framework (circuits). I and many developers over the years have tried very hard to make the so-called "brain hurting" of callbacks and deferreds as minimal as possible. In short circuits does not employ the use of callbacks and deferreds per se but rather events and promises which I've found much easier to both reason about and follow.

I guess I'll re-read both blog posts (yours and Glyph's) and formulate my own response as well :) Another 2c worth can't hurt? :)

jjinux said…
Sounds good. Nice to meet you, by the way :)

I'd be interested in hearing how deferreds in Twisted differ from promises in Circuits. Usually those terms are treated as synonymous.

Also, registering to listen for an event is a lot like registering yourself with a deferred. At some level, you're still registering a callback, right?

Your description of Circuits reminds me of Flight.js which a buddy of mine wrote.
James said…
For a minute there I thought you knew both me and circuits :) But I mistook your blog for this blog post: -- which had a very similar theme/style to your own blog :)

Re deferreds vs. promises -- I always took deferreds in Twisted (at least) to be more tightly bound to callbacks (callbacks and errbacks) where you "register" a function that gets called upon success or failure of some part of a chain.

In contrast circuits (whilst it has the notion of callbacks, if you will, in its core -- we call them event handlers) has promises which behave more like proxied values which become the value when ready. Since circuits started its development in ~2002 and became "circuits" the name and branded project in 2004 it shared nothing in common with Twisted in terms of design or behavior.

James said…
Oh yes and nice to meet you too! :)

jjinux said…
I see, so the way your code is called is different. However, you still can't write linear code such as:

value = query_value_from_database()

print value

James said…
Actually in circuits you can :)

James said…
My apologies for the short reply, but you can do something like this in circuits 3.0 (being released soon):

from circuits import Component, Event

class bar(Event):
    """bar Event"""

class App(Component):

    def bar(self):
        return "Hello World!"

    def foo(self):
        x = yield self.call(bar())

A contrived and trivial example, I know; but the core concepts here show how you could scale this to something far more complex, e.g., firing an event to a database engine that performs several I/O operations and does some CPU-bound work before returning a result, waiting for the completion of that event, and returning the result.

circuits 3.x utilizes Python generators to create a sort of coroutine-based control structure on top of its event-driven architecture. So you can do things like: wait for an event to occur, call an event synchronously (asynchronously underneath), etc.

This allows us to reason about "event handlers" and what they do with the "event" (data) as they participate in a large system. Scale in terms of complexity is derived from building larger more complex components from simpler ones as demonstrated by circuits.web and many applications written atop this.
