This was my favorite talk, aside from the OLPC keynote.
- CPUs are getting more parallel these days; they aren't necessarily getting faster.
- The free ride of relying on Moore's law to solve your performance problems isn't working anymore.
- We need to embrace parallelism, but it's hard.
- The IPython physicists have problems that simply can't be solved in a reasonable amount of time using a single CPU.
- They wanted an interactive, Mathematica-like environment to do parallel computing.
- They have been able to implement multiple paradigms on their foundation, including MPI, tuple spaces, MapReduce, etc.
- They're using Twisted.
- It's like coding in multiple Python shells on a bunch of computers at the same time.
- Each machine returns the value of the current statement, and the results are aggregated into a list.
- You can also specify which machines should execute the current statement.
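The system described in the talk was the ancestor of today's ipyparallel package. Here's a minimal sketch of that aggregated-result style using the modern API, assuming a local cluster started with something like `ipcluster start -n 4` (the engine count is arbitrary):

```python
import ipyparallel as ipp

rc = ipp.Client()     # connect to the running cluster
dview = rc[:]         # a "direct view" of all engines
dview.block = True    # wait for results synchronously

# Run the same statement on every engine; pulling the variable back
# yields a list with one entry per engine.
dview.execute("import socket; host = socket.gethostname()")
print(dview["host"])  # e.g. ['node1', 'node1', 'node2', 'node2']

# Target a subset of machines by slicing the client.
rc[0:2].execute("x = 2 + 2")
```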
- You can move data to and from the other machines.
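Continuing that session, moving data to and from the engines looks roughly like this in the modern API (the variable names here are made up):

```python
# Push data out to every engine, then pull it back.
dview.push({"params": {"alpha": 0.5}})  # 'params' now exists on each engine
print(dview.pull("params"))             # one copy per engine, as a list

# Dictionary-style sugar for the same push/pull:
dview["beta"] = 1.25
print(dview["beta"])
```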
- They implemented something like deferreds (aka "promises") so that you can immediately start using a value in an expression even while it's busy being calculated in the background.
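In ipyparallel, the promise-like object is an AsyncResult. A sketch, continuing the session above (slow_square is just a stand-in for real work):

```python
def slow_square(n):
    import time        # imported on the engine that runs this
    time.sleep(5)      # stand-in for an expensive computation
    return n * n

# apply_async returns an AsyncResult (a promise) immediately.
future = dview.apply_async(slow_square, 12)
print(future.ready())  # False while the work is still in flight
print(future.get())    # blocks until done: one 144 per engine
```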
- They've tested their system on 128 machines.
- The system can automatically distribute a script that you want to execute.
- The system knows how to automatically partition and distribute the data to work on using the "scatter" and "gather" commands.
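Both of those map onto the modern API directly; continuing the session above (the script name and the squaring work are just placeholders):

```python
# Ship a whole script to every engine (the filename is hypothetical).
dview.run("simulation.py")

# Partition a list across the engines, work on the pieces, reassemble.
dview.scatter("chunk", list(range(16)))           # each engine gets a slice
dview.execute("partial = [i * i for i in chunk]")
print(dview.gather("partial"))                    # [0, 1, 4, 9, ..., 225]
```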
- It knows how to do load balancing and task farming.
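Task farming goes through a load-balanced view rather than a direct one; a sketch, reusing slow_square from above:

```python
lview = rc.load_balanced_view()

# Each call is farmed out to whichever engine frees up next.
result = lview.map_async(slow_square, range(32))
print(result.get())  # [0, 1, 4, ..., 961], in order
```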
- Their overhead is very low; it's appropriate for tasks that take as little as 0.1 seconds. (This is a big contrast with Hadoop.)
- You can "talk to" currently running, distributed tasks.
- It also works non-interactively.
- All of this stuff is currently on the "saw" branch of IPython.