
Async: To Be or Not To Be

Just because I have to use a callback-oriented style on the client doesn't mean I want to use a callback-oriented style on the server. Now, before anyone gets all upset and tells me that I don't know the difference between async and a kitchen sink, let me explain :)

The client is necessarily an event-oriented place. If I don't know which button the user is going to press, it makes a lot of sense to use a different callback for each button. The server is different. If I'm waiting for the result of a database query before I can continue processing a request, it sure is convenient to just block and wait.
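To make the two styles concrete, here is a minimal sketch of the same request handler written both ways. Everything in it is made up for illustration: `FAKE_DB`, `query_async`, `query_blocking`, and the toy `pending` queue that stands in for an event loop. It isn't how any particular framework does it.

```python
from collections import deque

# Hypothetical in-memory "database" used only for illustration.
FAKE_DB = {"alice": 3, "bob": 5}

# Callbacks waiting to run; a toy stand-in for an event loop.
pending = deque()


def query_async(user, callback):
    """Callback style: the result is delivered later, via the callback."""
    pending.append(lambda: callback(FAKE_DB[user]))


def query_blocking(user):
    """Blocking style: the result is returned directly to the caller."""
    return FAKE_DB[user]


def handle_request_callbacks(user, respond):
    # The logic is split between this function and a nested callback.
    def on_count(count):
        respond(f"{user} has {count} messages")

    query_async(user, on_count)


def handle_request_blocking(user):
    # The same logic reads from top to bottom.
    count = query_blocking(user)
    return f"{user} has {count} messages"


responses = []
handle_request_callbacks("alice", responses.append)
while pending:  # drain the toy event loop
    pending.popleft()()

blocking_response = handle_request_blocking("alice")
```

Both handlers produce the same response. The difference is only in how the code reads.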

My key point is that it's important to separate the style you want to code in from the performance and scalability characteristics you want. You shouldn't necessarily pick a callback-oriented style just because you want the performance and scalability characteristics of asynchronous networking APIs.

My two favorite examples are gevent and Erlang, but Go is similar. When you code using gevent or Erlang, your code looks like synchronous, blocking code. However, under the covers, they use asynchronous networking APIs. Now, before anyone tells me that it's impossible, buggy, or that it'll never work, let me point out that these tricks have been in production for decades at Ericsson, Yahoo Groups, and IronPort (now part of Cisco).

Furthermore, I should point out that asynchronous networking APIs aren't a perfect fit for every problem. For instance, if your goal is to send 10 gigabytes of information to another server, it turns out that synchronous networking APIs will actually outperform asynchronous networking APIs. Asynchronous networking APIs are so popular because they can handle a larger number of clients than synchronous networking APIs can and because they use less memory than a large number of threads, each of which has to have its own stack. gevent and Erlang can handle a large number of clients, don't use up much memory, and don't require a real OS-level stack per client.
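Here's a rough sketch of why lightweight tasks are cheap. To be clear, this is not how gevent or Erlang are implemented; I'm using plain Python generators as a stand-in, and `client` and `run_all` are names I made up. The point is only that ten thousand "clients" can make progress inside a single OS thread, each one holding a small generator object instead of a full thread stack.

```python
from collections import deque


def client(client_id, results):
    """One simulated client; each bare `yield` stands in for waiting on IO."""
    request = f"request-{client_id}"
    yield  # pretend to wait for the request to arrive
    response = request.upper()
    yield  # pretend to wait for the response to be sent
    results.append(response)


def run_all(tasks):
    """Round-robin scheduler: resume each task in turn until it finishes."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
            ready.append(task)  # the task still has work left
        except StopIteration:
            pass  # the task finished


results = []
run_all(client(i, results) for i in range(10_000))
```

A real system would resume a task only when its socket is actually ready, rather than cycling through every task on every pass.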

So what's my problem with the callback-oriented style? I find it a lot harder to read. I've coded projects in Twisted, Node.js, etc., and I prefer the gevent approach. You get roughly the same performance and scalability characteristics, but with much easier to read code. Of course, what's readable to me may not be readable to other people. I've met people who are perfectly happy using Twisted Web 1 and don't think that callback-oriented code poses any real challenge.

If you're interested in hearing more about my thoughts on async and concurrency, check out my other blog posts, which include a link to my Dr. Dobb's Journal article on Python concurrency.

Comments

verte said…
I know that a lot of arguments about async these days are around performance, but the primary motivation for it actually is conceptual. I'm sure you've read the problem with threads, since it gets thrown around on #python quite a bit, but a better one-page explanation of the conceptual simplicity of async is the distributed computing example in E.

I imagine the sync/async divide is similar to the functional/stateful divide, where there are implementation details that drive performance issues, but the more interesting aspect is a matter of how we think about problems, and what problems become significantly simpler to understand when posed asynchronously. Something you may have considered is what it would take to design a sensible memory model for python or javascript in a threaded vs an async world (if you imagine they are mutually exclusive).
Sam Rushing said…
Event-driven stuff doesn't scale. 8^)

You might be able to write an event-driven HTTP server. But once you try to combine it with an event-driven DNS resolver and an event-driven database client, the state space has exploded.

You might then write a set of tools - help from the compiler or runtime - to essentially reinvent a cooperative threading package. And that works great. But at some point the difference between your threading package and the next one comes down to semantics.

The true Holy Grail of [server] scalability would be to route around the barrier presented by the operating system [which is itself a coroutine system] and have an OS with no kernel/user wall and without artificial limits on scalability. Depending on your politics, you could call it an "in-kernel server" or a "user-space tcp stack".
Peter Zsoldos said…
I don't have enough experience with C# 5's await keyword to judge it properly, but it sure makes for an easier read. I wonder if something like that would be possible with some helper function in Python... E.g.: with await(asyncMethodInvocation): process result

And the point of async not always being the right solution is certainly a valid one!
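For anyone wondering the same thing: Python 3.5 and later have `async` and `await` built into the language, so no helper function is needed. A minimal sketch using the standard library's asyncio follows; `fetch_greeting` is a made-up stand-in for the async method invocation, and the short sleep stands in for real network IO.

```python
import asyncio


async def fetch_greeting(name):
    """Hypothetical async operation; the sleep stands in for network IO."""
    await asyncio.sleep(0.01)
    return f"hello, {name}"


async def main():
    # Reads like blocking code, but the event loop is free to run other
    # tasks while this coroutine is suspended at the await.
    result = await fetch_greeting("world")
    return result.upper()


processed = asyncio.run(main())
```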
jjinux said…
verte, sorry I haven't read anything more than the abstract for "the problem with threads", and I haven't read "the distributed computing example in E." It's going to take me a while to get to those. In the meantime, do you care to summarize?

I actually wasn't arguing for threads. I don't actually like threads. Erlang has something it calls processes, which I think is ideal. gevent has greenlets, which are actually a lot more deterministic than threads.
jjinux said…
Peter, C#'s await keyword reminds me of EventMachine in Ruby. It looks like it's a way to use blocks as callbacks. Is that right?

I do think having blocks makes callback-oriented programming easier to read, but it's still not as easy to read as, say, gevent's approach.
verte said…
The distributed computing example is a summary, so I won't try to summarise it here. The introduction is about a page.

"The problem with threads" deals with the explosive nondeterminism resulting from shared-state concurrency. To be clear, I would also consider gevent to be shared state concurrency*, in that you can't look at a function and be able to tell from its body if it contains a context switch or not - that could be hidden away inside some function that we call.

This is a significant burden on the programmer. As someone who maintains a moderately sized Swing application rife with concurrency bugs, I think I can say that until programmers are forced to think about concurrency from the outset, maintenance is a battle between introducing new code and trying to figure out the way it interacts with existing locks and tasks.

But don't take my word for it: the article mentions a concrete example of an application written by concurrency experts that mysteriously deadlocked once they bought a machine with more cores.

In general, I'm all for runtimes and compilers that figure out details so you don't have to. I like dynamic types, I like garbage collection. But concurrency is a more complicated subject, I think, and it deserves very explicit language from the programmer. (Concurrent Haskell is an interesting example - the language is functional, the concurrency features serve only to give greater control over communication to the programmer).

* important side note: finalisers actually introduce shared state concurrency in many languages, Python included. See, e.g., "unexpected concurrency".
jjinux said…
verte, thank you for the excellent comment! In general, I agree with you.

gevent is non-deterministic, but it's not as bad as threads, which can context switch at any time. Since it can only context switch when doing IO, the problem isn't nearly as heinous. Sure, that's not perfect, but it's a lot easier for me to wrap my brain around than threads.
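Here's a small sketch of what I mean. I'm using the standard library's asyncio as a stand-in for gevent, so the switch points are spelled out with `await`, whereas in gevent they'd be hidden inside blocking-looking calls (which is verte's objection). `safe_increment`, `racy_increment`, and `run` are names I made up, and `asyncio.sleep(0)` plays the role of IO.

```python
import asyncio


async def safe_increment(state, times):
    for _ in range(times):
        value = state["count"]       # read
        state["count"] = value + 1   # write; no switch can happen in between
        await asyncio.sleep(0)       # simulated IO: the only switch point


async def racy_increment(state, times):
    for _ in range(times):
        value = state["count"]       # read
        await asyncio.sleep(0)       # simulated IO between the read and write
        state["count"] = value + 1   # write based on a possibly stale read


async def run(worker, tasks=10, times=100):
    state = {"count": 0}
    await asyncio.gather(*(worker(state, times) for _ in range(tasks)))
    return state["count"]


safe_total = asyncio.run(run(safe_increment))
racy_total = asyncio.run(run(racy_increment))
```

With no switch between the read and the write, `safe_total` comes out to the full 1000. Once an IO point sits between them, updates get lost and `racy_total` ends up lower. So shared state is still a hazard, but only at the places where IO can happen.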

As for multi-threading in Swing, I wrote some quick tricks here (http://jjinux.blogspot.com/2007/12/python-some-concurrency-tricks.html). Basically, I avoid mutable, shared state like the plague.
gus said…
Since you mentioned it, I'm curious: why didn't IronPort use Erlang instead of developing a new concurrency framework for a slow interpreted language like Python?

If you were starting today (2013), do you think Erlang is the right tool for network appliances like IronPort's?

BTW, great blog!

Cheers,

Gus
jjinux said…
Thanks, Gus. Erlang wasn't as popular back then as it is now. My guess is that none of the early IronPort people even knew about it. In contrast, Sam Rushing already knew how to solve the async problem in Python.

Python will never be as good as Erlang at what Erlang does. Hence, for certain network servers, it makes a lot of sense to use Erlang. However, Python has so many other advantages that it probably makes sense to use Python (and gevent) as the "main" language for a company.
