- CPUs are getting more parallel these days; they aren't necessarily getting faster.
- The free ride of relying on Moore's law to solve your performance problems isn't working anymore.
- We need to embrace parallelism, but it's hard.
- The IPython physicists have problems that simply can't be solved in a reasonable amount of time using a single CPU.
- They wanted an interactive, Mathematica-like environment to do parallel computing.
- They have been able to implement multiple paradigms on their foundation, including MPI, tuple spaces, MapReduce, etc.
- They're using Twisted.
- It's like coding in multiple Python shells on a bunch of computers at the same time.
- Each machine returns the value of the current statement, and they are aggregated into a list.
- You can also specify which machines should execute the current statement.
- You can move data to and from the other machines.
- They implemented something like deferreds (aka "promises"), so you can start using a value in an expression even while it's still being computed in the background.
- They've tested their system on 128 machines.
- The system can automatically distribute a script that you want to execute.
- The system knows how to automatically partition and distribute the data to work on using the "scatter" and "gather" commands.
- It knows how to do load balancing and task farming.
- Their overhead is very low; it's appropriate for tasks that take as little as 0.1 seconds. (This is a sharp contrast to Hadoop.)
- You can "talk to" currently running, distributed tasks.
- It also works non-interactively.
- All of this stuff is currently on the "chainsaw" branch of IPython.
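The "multiple Python shells at once" model above can be sketched with the standard library. This is not the IPython API; `Engine` and `MultiEngine` are hypothetical names, and real engines would live on other machines rather than in threads. Executing a statement fans out to every engine (or a chosen subset) and the results come back aggregated into a list:

```python
from concurrent.futures import ThreadPoolExecutor

class Engine:
    """A stand-in for one remote Python shell: a namespace plus an eval step."""
    def __init__(self, eid):
        self.ns = {"id": eid}  # each engine knows its own id

    def execute(self, expr):
        # Evaluate the statement in this engine's namespace, return its value.
        return eval(expr, self.ns)

class MultiEngine:
    """Hypothetical sketch of fan-out execution across several 'shells'."""
    def __init__(self, n):
        self.engines = [Engine(i) for i in range(n)]
        self.pool = ThreadPoolExecutor(max_workers=n)

    def execute(self, expr, targets=None):
        # targets=None runs on all engines; otherwise only on the listed ones.
        chosen = self.engines if targets is None else [self.engines[i] for i in targets]
        # Run the same statement everywhere; aggregate the values into a list.
        return list(self.pool.map(lambda e: e.execute(expr), chosen))

mc = MultiEngine(4)
print(mc.execute("id * 10"))             # one result per engine
print(mc.execute("id", targets=[1, 3]))  # only engines 1 and 3 execute it
```

The list-of-results return value mirrors the note above: every machine evaluates the current statement, and the caller sees all the answers at once.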
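The deferred/promise idea can be illustrated with `concurrent.futures.Future`, which plays the same role: a placeholder you can build further work on before the value exists. Again a minimal sketch, not IPython's (Twisted-based) implementation; `slow_square` is an invented stand-in for a remote computation:

```python
import time
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor()

def slow_square(x):
    time.sleep(0.1)  # pretend this runs on a remote engine
    return x * x

# 'pending' is a promise for a value still being computed in the background.
pending = pool.submit(slow_square, 7)

# We can immediately use the promise in a further expression; the second
# task only blocks on the first when its own worker actually runs it.
doubled = pool.submit(lambda f: f.result() * 2, pending)

print(doubled.result())  # 98
```

Twisted's deferreds chain callbacks instead of blocking on `.result()`, but the programming experience is the same: keep typing expressions against values that are still in flight.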
Tuesday, February 27, 2007
PyCon: Interactive Parallel and Distributed Computing with IPython
This was my favorite talk, aside from the OLPC keynote.