Tuesday, February 27, 2007

PyCon: Interactive Parallel and Distributed Computing with IPython

This was my favorite talk, aside from the OLPC keynote.
  • CPUs are getting more parallel these days; they aren't necessarily getting faster.
  • The free ride of relying on Moore's law to solve your performance problems isn't working anymore.
  • We need to embrace parallelism, but it's hard.
  • The IPython physicists have problems that simply can't be solved in a reasonable amount of time using a single CPU.
  • They wanted an interactive, Mathematica-like environment to do parallel computing.
  • They have been able to implement multiple paradigms on their foundation, including MPI, tuple spaces, MapReduce, etc.
  • They're using Twisted.
  • It's like coding in multiple Python shells on a bunch of computers at the same time.
  • Each machine returns the value of the current statement, and they are aggregated into a list.
  • You can also specify which machines should execute the current statement.
  • You can move data to and from the other machines.
  • They implemented something like deferreds (aka "promises") so that you can immediately start using a value in an expression even while it's busy being calculated in the background.
  • They've tested their system on 128 machines.
  • The system can automatically distribute a script that you want to execute.
  • The system knows how to automatically partition and distribute the data to work on using the "scatter" and "gather" commands.
  • It knows how to do load balancing and task farming.
  • Their overhead is very low. It's appropriate for tasks that take as little as 0.1 seconds. (This is a big contrast to Hadoop.)
  • You can "talk to" currently running, distributed tasks.
  • It also works non-interactively.
  • All of this stuff is currently on the "saw" branch of IPython.
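The scatter/gather idea above can be sketched in plain Python. This is just an illustration of the concept, not IPython's actual API: the `scatter`, `gather`, and `work` helpers here are hypothetical, and a thread pool stands in for the remote engines.

```python
from concurrent.futures import ThreadPoolExecutor

def scatter(data, n):
    """Partition data into n roughly equal chunks, one per worker."""
    k, r = divmod(len(data), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + k + (1 if i < r else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

def gather(chunks):
    """Flatten per-worker results back into a single list."""
    return [item for chunk in chunks for item in chunk]

def work(chunk):
    # Stand-in for whatever each engine would compute on its slice.
    return [x * x for x in chunk]

def parallel_map(data, n_workers=4):
    """Scatter the data, run work() on each chunk concurrently, gather results."""
    chunks = scatter(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as ex:
        results = list(ex.map(work, chunks))
    return gather(results)
```

In the real system, each chunk would live on a different machine, and `gather` would pull the pieces back over the network.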
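The deferred/promise idea — kicking off a computation and only blocking when the value is actually needed — can be sketched with a future. Note this is a stand-in using the standard library, not the Twisted-based deferreds the talk described, and `slow_square` is a hypothetical example task.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_square(x):
    # Simulate a long-running computation on a remote engine.
    time.sleep(0.1)
    return x * x

with ThreadPoolExecutor() as ex:
    fut = ex.submit(slow_square, 7)  # returns immediately with a "promise"
    # ... other work can happen here while the result is computed ...
    total = fut.result() + 1         # blocks only at the point the value is used
```

IPython's version went further: the pending result could be dropped straight into an expression, and the blocking happened transparently.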

1 comment:

Asheesh Laroia said...

Wow, this sounds really slick. I wrote some custom hacks to run parallel jobs with Python in the past; I had a Makefile that would scp data to a randomly picked machine, run a job, and copy the results back, and "make -j23" would do the right thing.

I'd be much happier not having to have done that myself, though! It wasn't the most flexible thing in the world....