PyCon: HTTP in Python: which library for what task?

The talk was given by a Mercurial developer who works on Google Code.

When you think of HTTP, what you're probably thinking of is RFC 1945, i.e. HTTP/1.0. HTTP/1.1 (RFC 2616) has features that you're probably not remembering to handle correctly. Overlooking one of them resulted in a bug that took him a very long time to diagnose.

HTTP/1.1 allows pipelining, which means you send several requests right away without waiting for each response, and the responses then come back in the same order over the same socket.
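To make the framing concrete, here is a minimal sketch in Python 3 of what pipelining looks like on the wire. The helper names `build_pipelined_requests` and `split_responses` are made up for illustration and are not from the talk or from any library. The parser assumes every response carries a Content-Length header, which real servers do not guarantee.

```python
def build_pipelined_requests(host, paths):
    """Return the bytes for several GET requests concatenated back to back."""
    requests = []
    for path in paths:
        requests.append(
            f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")
        )
    return b"".join(requests)


def split_responses(data):
    """Split back-to-back responses into (status_line, body) pairs.

    Assumption: every response is framed with Content-Length.
    Chunked bodies are out of scope for this sketch.
    """
    responses = []
    while data:
        head, _, rest = data.partition(b"\r\n\r\n")
        lines = head.split(b"\r\n")
        status_line = lines[0].decode("ascii")
        length = 0
        for line in lines[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        responses.append((status_line, rest[:length]))
        data = rest[length:]  # whatever follows belongs to the next response
    return responses
```

The responses carry no request identifier, so the client has to match them to requests purely by order. That is why one slow response holds up everything queued behind it.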

Using chunked encoding lets you keep the connection open between requests even if you don't know in advance how much data will be streamed. Did you know that you can send additional headers (trailers) after the final chunk?
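For reference, a small sketch of the chunked wire format as I understand it from RFC 2616: each chunk is a hex length, CRLF, the data, CRLF. A zero-length chunk ends the body, and any extra headers (trailers) follow that last chunk. The function names `encode_chunked` and `decode_chunked` are my own, and the decoder assumes it is handed one complete, well-formed body.

```python
def encode_chunked(chunks, trailers=None):
    """Encode an iterable of byte strings as a chunked HTTP/1.1 body."""
    out = []
    for chunk in chunks:
        if not chunk:
            continue  # a zero-length chunk would end the body early
        out.append(b"%x\r\n" % len(chunk))
        out.append(chunk + b"\r\n")
    out.append(b"0\r\n")  # last-chunk marker
    for name, value in (trailers or {}).items():
        out.append(f"{name}: {value}\r\n".encode("ascii"))
    out.append(b"\r\n")  # blank line ends the trailer section
    return b"".join(out)


def decode_chunked(data):
    """Decode a complete chunked body into (body_bytes, trailers_dict)."""
    body = []
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # Ignore any chunk extensions after ';' on the size line.
        size_field = data[pos:line_end].split(b";", 1)[0]
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break
        body.append(data[pos:pos + size])
        pos += size + 2  # skip the chunk data and its trailing CRLF
    trailers = {}
    while True:
        line_end = data.index(b"\r\n", pos)
        line = data[pos:line_end]
        pos = line_end + 2
        if not line:
            break
        name, _, value = line.partition(b":")
        trailers[name.strip().decode("ascii")] = value.strip().decode("ascii")
    return b"".join(body), trailers
```

Because the length travels with each chunk, the receiver knows where the body ends without the sender closing the socket. That is what allows the connection to be reused.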

Did you know the "100 Continue" response gets sent before you finish sending the body?
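Here is a rough sketch of the "100 Continue" handshake, using a local socket pair and a thread that plays the server, so nothing touches the network. Everything in it (`fake_server`, `post_with_expect`, `demo`, the `/upload` path and the `example.invalid` host) is invented for illustration. A real client would also need to send the body anyway if no interim response arrives within a timeout.

```python
import socket
import threading


def fake_server(conn):
    """Read request headers, send the interim 100 response, then read the body."""
    buf = b""
    while b"\r\n\r\n" not in buf:
        buf += conn.recv(4096)
    head, _, body = buf.partition(b"\r\n\r\n")
    length = 0
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    # The headers look acceptable, so tell the client to go ahead.
    conn.sendall(b"HTTP/1.1 100 Continue\r\n\r\n")
    while len(body) < length:
        body += conn.recv(4096)
    reply = b"got " + body
    conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % len(reply) + reply)
    conn.close()


def post_with_expect(conn, body):
    """Send headers with Expect: 100-continue and hold the body until told to proceed."""
    head = (
        b"POST /upload HTTP/1.1\r\n"
        b"Host: example.invalid\r\n"
        b"Expect: 100-continue\r\n"
        b"Content-Length: %d\r\n\r\n" % len(body)
    )
    conn.sendall(head)
    interim = b""
    while b"\r\n\r\n" not in interim:
        interim += conn.recv(4096)
    if not interim.startswith(b"HTTP/1.1 100"):
        raise RuntimeError("server did not send 100 Continue")
    conn.sendall(body)  # only now does the body go out
    final = b""
    while True:
        data = conn.recv(4096)
        if not data:  # the server closed its end
            break
        final += data
    return interim, final


def demo():
    client, server = socket.socketpair()
    client.settimeout(5)
    server.settimeout(5)
    thread = threading.Thread(target=fake_server, args=(server,))
    thread.start()
    try:
        return post_with_expect(client, b"payload")
    finally:
        thread.join()
        client.close()
```

The point of the handshake is that a server can reject a large upload based on the headers alone, before the client has spent bandwidth on the body.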

httplib (used by urllib2) is very minimal. It doesn't even do SSL certificate validation! It doesn't support keepalive. It doesn't have unit tests.
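In Python 3, httplib is named `http.client`. As a minimal sketch, assuming Python 3, this is one way to be explicit about certificate validation instead of relying on whatever the default happens to be. The helper names `verified_context` and `make_verified_connection` are my own.

```python
import http.client
import ssl


def verified_context():
    """Build an SSL context that checks the certificate chain and the hostname."""
    return ssl.create_default_context()


def make_verified_connection(host, port=443):
    """Create (but do not open) an HTTPS connection that validates certificates."""
    return http.client.HTTPSConnection(
        host, port, context=verified_context(), timeout=10
    )
```

Constructing the connection object does not touch the network. The TLS handshake, and therefore any certificate error, only happens on the first request.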

httplib2 is just a wrapper around httplib that adds some stuff.

There is PycURL. It's based on libcurl, which is the gold standard for HTTP libraries. However, it's not very Pythonic and it has a steep learning curve.

twisted.web.http only supports HTTP/1.0 [or perhaps it doesn't support very much of HTTP/1.1].

The author is working on a new library. It uses select and is thus non-blocking. It has "100 Continue" support. It has lots of unit tests. However, it doesn't support pipelining.
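I don't know what the library's API looks like, so the following is not its code. It is only a small sketch of the general select technique in Python 3, with a made-up helper called `read_available`: ask the OS which sockets have data ready and read only from those, so that no single idle socket can block the program.

```python
import select
import socket


def read_available(socks, timeout=1.0):
    """Return {socket: bytes} for the sockets that became readable in time."""
    readable, _, _ = select.select(socks, [], [], timeout)
    # recv() will not block here because select reported data is waiting.
    return {sock: sock.recv(4096) for sock in readable}


def demo():
    """Two local socket pairs; only one of them has data waiting."""
    a, a_peer = socket.socketpair()
    b, b_peer = socket.socketpair()
    try:
        a_peer.sendall(b"ping")
        result = read_available([a, b], timeout=1.0)
        return result.get(a), b in result
    finally:
        for sock in (a, a_peer, b, b_peer):
            sock.close()
```

In the demo, `a` is reported as readable and `b` is simply skipped. That skipping is the whole trick behind handling many connections in one thread.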

Using httplib (via urllib2) is okay if your needs are simple.

PycURL is awesome if you can tolerate the steep learning curve.

You can get the author's new library from http://code.google.com/p/py-nonblocking-http/.

Comments

Sam Rushing said…
HTTP/1.1 pipelining etc dates back to 1996. The fact that many implementations are still broken in 2011 is a great demonstration of the folly of making a 400kB RFC. It also shows how hard it is to fix the Unix forking-with-stuff-in-the-buffer problem, which also plagues SMTP.

All protocols should *begin* with a sized packet-stream (like chunking), and add features like transaction ID's and out-of-order replies on top of it.

For posterity, the speaker was mistaken about twisted.web.client. It supports most HTTP/1.1 features.

> HTTP/1.1 pipelining etc dates back to 1996. The fact that many implementations are still broken in 2011 is a great demonstration of the folly of making a 400kB RFC. It also shows how hard it is to fix the Unix forking-with-stuff-in-the-buffer problem, which also plagues SMTP.

Good point.

> All protocols should *begin* with a sized packet-stream (like chunking), and add features like transaction ID's and out-of-order replies on top of it.

Certainly that was a motivating idea for ZeroMQ.

> For posterity, the speaker was mistaken about twisted.web.client. It supports most HTTP/1.1 features.

Thanks for the correction. I suspected he might be wrong.

Alex, thanks for all the links to the videos :)