HTTP in Python: which library for what task?
The talk was given by a Mercurial developer who works on Google Code.
When you think of HTTP, what you're probably thinking of is HTTP/1.0 (RFC 1945). HTTP/1.1 (RFC 2616) has features that you're probably not remembering to handle correctly. This resulted in a bug that took him a very long time to diagnose.
HTTP/1.1 allows pipelining, which means you send all the requests right away and then wait for the responses, which come back in order over the same socket.
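A sketch of what a pipelined stream looks like on the wire (host and paths are made up): two GET requests are written back-to-back before any response is read, and the server must answer them in the same order.

```python
# Sketch: building an HTTP/1.1 pipelined request stream by hand.
# Both requests are sent before reading either response.

def build_pipelined_requests(host, paths):
    """Concatenate several GET requests into one byte string."""
    requests = []
    for path in paths:
        requests.append(
            "GET %s HTTP/1.1\r\n"
            "Host: %s\r\n"
            "\r\n" % (path, host)
        )
    return "".join(requests).encode("ascii")

# Illustrative host/paths, not from the talk:
stream = build_pipelined_requests("example.com", ["/a", "/b"])
```

In a real client you would write `stream` to one socket and then parse two responses off it in order, which is exactly the bookkeeping that makes pipelining easy to get wrong.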
Using chunked encoding lets you keep the connection open even when you don't know in advance how much data will be streamed. Did you know that you can send additional (trailer) headers after the final chunk?
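A hedged sketch of the chunked transfer coding itself, including a trailer header (strictly, trailers come after the final zero-length chunk; the header name and value below are invented for illustration):

```python
# Sketch: encoding a body with HTTP/1.1 chunked transfer coding.
# Each chunk is prefixed by its size in hex; a zero-length chunk
# ends the body, and trailer headers may follow it.

def encode_chunked(chunks, trailers=None):
    out = []
    for chunk in chunks:
        out.append(b"%x\r\n" % len(chunk))  # chunk size in hex
        out.append(chunk + b"\r\n")
    out.append(b"0\r\n")                    # zero-length chunk ends the body
    for name, value in (trailers or []):    # trailers go after the last chunk
        out.append(b"%s: %s\r\n" % (name, value))
    out.append(b"\r\n")
    return b"".join(out)

# Hypothetical trailer, e.g. a checksum computed while streaming:
body = encode_chunked([b"hello ", b"world"],
                      trailers=[(b"X-Checksum", b"abc123")])
```

Trailers are useful precisely because they let you send metadata (like a checksum) that you can only compute after streaming the body.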
Did you know the server sends its interim "100 Continue" response before you finish sending the request body?
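The client side of that exchange, as a sketch (helper name, host, and path are made up): the client sends its headers with "Expect: 100-continue", waits for the server's interim "HTTP/1.1 100 Continue" status line, and only then transmits the body.

```python
# Sketch: the headers a client sends when it wants permission
# before uploading a (possibly large) body.

def expect_continue_request(host, path, body_length):
    return (
        "POST %s HTTP/1.1\r\n"
        "Host: %s\r\n"
        "Content-Length: %d\r\n"
        "Expect: 100-continue\r\n"
        "\r\n" % (path, host, body_length)
    ).encode("ascii")

# Illustrative values; after sending this, the client should read
# the "100 Continue" interim response before writing the body.
headers = expect_continue_request("example.com", "/upload", 1024)
```

A naive client that only starts reading after it has written everything can mis-handle this interim response, which is one of those HTTP/1.1 features people forget about.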
httplib (used by urllib2) is very minimal. It doesn't even do SSL certificate validation! It doesn't support keepalive. It doesn't have unit tests.
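As a hedged aside on what "no SSL certificate validation" means (shown with the modern `ssl` module rather than httplib itself): old httplib wrapped the socket in SSL without ever checking the peer's certificate, whereas a validating client requires a trusted certificate chain and a matching hostname.

```python
import ssl

# A default context validates the certificate chain and the hostname:
context = ssl.create_default_context()
assert context.verify_mode == ssl.CERT_REQUIRED
assert context.check_hostname is True

# The unsafe, old-httplib-style behaviour corresponds to a context
# that performs no validation at all (documented but discouraged):
unsafe = ssl._create_unverified_context()
```

Without validation, the connection is still encrypted, but you have no idea who you are talking to, so a man-in-the-middle is trivially possible.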
httplib2 is just a wrapper around httplib that adds some features on top (caching, among other things).
There is PycURL. It's based on libcurl, which is the gold standard for HTTP libraries. However, it's not very Pythonic and it has a steep learning curve.
twisted.web.http only supports HTTP/1.0 [or perhaps it doesn't support very much of HTTP/1.1].
The author is working on a new library. It uses select and is thus non-blocking. It has "100 Continue" support. It has lots of unit tests. However, it doesn't support pipelining.
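The select-based, non-blocking style can be sketched like this (a socketpair stands in for a real HTTP connection; this is my illustration, not the library's actual code): nothing blocks on recv(); instead select() reports which sockets are readable.

```python
import select
import socket

# Sketch: select()-driven, non-blocking I/O. One end of the pair
# plays the "server" and writes a status line; the other end waits
# for readability via select() instead of blocking on recv().

a, b = socket.socketpair()
a.setblocking(False)
b.setblocking(False)

b.sendall(b"HTTP/1.1 200 OK\r\n")

# Wait up to 1 second for `a` to become readable, then drain it.
readable, _, _ = select.select([a], [], [], 1.0)
data = b""
if a in readable:
    data = a.recv(4096)

a.close()
b.close()
```

The same loop generalizes to many sockets at once, which is what makes this style attractive for an HTTP client library.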
Using httplib (via urllib2) is okay if your needs are simple.
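The "simple needs" case really is only a few lines. A sketch using the Python 3 spelling, `urllib.request` (this module is `urllib2` in Python 2); the URL and header are illustrative, and the actual network fetch via `urlopen(req)` is skipped to keep the example offline:

```python
from urllib.request import Request, urlopen  # "urllib2" in Python 2

# Build a GET request with a custom header; urlopen(req) would
# perform the fetch and return a file-like response object.
req = Request("http://example.com/feed",
              headers={"User-Agent": "notes-example/0.1"})
```

For one-off fetches of well-behaved URLs this is fine; it's the keepalive, validation, and HTTP/1.1 corner cases above where httplib falls down.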
PycURL is awesome if you can tolerate the steep learning curve.
You can get the author's new library from http://code.google.com/p/py-nonblocking-http/.