Recently, I had to improve the messaging performance of a mesh network of, say, 10 nodes. The original code used a simple RPC system that sent JSON framed as netstrings. Every message to every node opened a new connection. On a cluster of 10 nodes, I was getting 30 messages per second.
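For anyone who hasn't seen netstrings, here's a rough sketch of what framing a JSON message that way looks like. This isn't the original code (which I can't share); the function names and the request fields are made up for illustration.

```python
import json


def encode_netstring(payload):
    """Frame payload bytes as a netstring: <length>:<payload>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def decode_netstring(data):
    """Return (payload, remaining bytes) for the first netstring in data."""
    length_text, sep, rest = data.partition(b":")
    if not sep:
        raise ValueError("missing ':' after length")
    length = int(length_text)
    # The payload must be followed by exactly one trailing comma.
    if len(rest) < length + 1 or rest[length:length + 1] != b",":
        raise ValueError("truncated or malformed netstring")
    return rest[:length], rest[length + 1:]


# Hypothetical RPC request; the field names are illustrative only.
request = {"method": "ping", "params": []}
frame = encode_netstring(json.dumps(request).encode("utf-8"))
payload, remainder = decode_netstring(frame)
```

The length prefix lets the receiver know exactly how many bytes to read, so the JSON body never has to be scanned for a delimiter.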
I enhanced the code by using persistent connections. I also switched from RPC (i.e. using a whole roundtrip that blocks the whole connection) to message passing (i.e. passing a message doesn't necessarily result in a response and doesn't tie up the socket). This improved the performance to 300 messages per second.
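To make the difference concrete, here's a minimal sketch of one-way messages over a single long-lived connection. It's an illustration under assumptions, not my actual code: `socket.socketpair()` stands in for a persistent TCP connection between two nodes, the messages are newline-delimited JSON rather than netstrings to keep it short, and `send_message` is a made-up helper name.

```python
import json
import socket


def send_message(sock, message):
    """Send one newline-delimited JSON message without waiting for a reply."""
    sock.sendall(json.dumps(message).encode("utf-8") + b"\n")


# socketpair() stands in for one long-lived connection between two nodes.
node_a, node_b = socket.socketpair()

# Fire off several messages back to back over the same connection.
# Nothing here blocks on a response, unlike an RPC roundtrip.
for seq in range(3):
    send_message(node_a, {"seq": seq, "body": "hello"})
node_a.close()  # closing signals end-of-stream to the reader

# The receiving node reads messages off the stream until it ends.
with node_b, node_b.makefile("rb") as stream:
    received = [json.loads(line) for line in stream]
```

The sender never pays for connection setup more than once, and it never sits idle waiting for an answer it doesn't need.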
Next, my buddy encouraged me to try out ZeroMQ. Man, was I amazed! I hit something like 1800 messages per second on a cluster of 10 nodes! I can only imagine what ZeroMQ was doing in order to hit this number. Perhaps it was batching messages more intelligently (in my experience, that's an amazingly effective technique).
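Here's roughly what I mean by batching, as a small sketch. To be clear, this is not ZeroMQ's implementation, just the general idea: the `BatchingSender` class and its `max_batch_bytes` threshold are hypothetical, and a plain list stands in for the socket so the number of writes is easy to see.

```python
class BatchingSender:
    """Collects small messages and writes them out in one call per batch."""

    def __init__(self, write, max_batch_bytes=4096):
        self._write = write          # callable taking bytes, e.g. sock.sendall
        self._max = max_batch_bytes  # hypothetical flush threshold
        self._pending = []
        self._size = 0

    def send(self, frame):
        """Queue a frame; flush once the pending bytes reach the threshold."""
        self._pending.append(frame)
        self._size += len(frame)
        if self._size >= self._max:
            self.flush()

    def flush(self):
        """Write all pending frames as a single chunk."""
        if self._pending:
            self._write(b"".join(self._pending))
            self._pending.clear()
            self._size = 0


# A list stands in for the socket: each element is one underlying write.
writes = []
sender = BatchingSender(writes.append, max_batch_bytes=64)
for i in range(10):
    sender.send(b"message-%02d\n" % i)
sender.flush()  # push out whatever is left over
```

Ten small messages end up in far fewer underlying writes, which is where the savings come from when each write carries a fixed overhead.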
I ran the same test on a range of cluster sizes, from 2 to 10. Plotted against cluster size, the performance graph had a sawtooth shape, and the shape was consistent between runs. My guess is that ZeroMQ doesn't have one node send the message directly to all the other nodes. Rather, the message filters through the cluster in a tree-like manner.
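To show what I mean by "tree-like", here's a toy sketch of the kind of relay I'm guessing at. It only illustrates my guess and says nothing about what ZeroMQ really does internally. The numbering scheme (node `i` forwards to nodes `i * fanout + 1` onward) and the fan-out of 2 are assumptions I picked for the example.

```python
def children(node, cluster_size, fanout=2):
    """Nodes that `node` forwards to in a hypothetical relay tree."""
    first = node * fanout + 1
    return [c for c in range(first, first + fanout) if c < cluster_size]


def relay_rounds(cluster_size, fanout=2):
    """Forwarding hops needed before the deepest node has the message."""
    rounds, frontier = 0, [0]  # node 0 originates the message
    while True:
        frontier = [c for n in frontier
                    for c in children(n, cluster_size, fanout)]
        if not frontier:
            return rounds
        rounds += 1


# Hop count for each cluster size from 2 to 10.
hops = {size: relay_rounds(size) for size in range(2, 11)}
```

In a scheme like this the hop count stays flat for a while and then jumps when the tree needs another level, so the cost per node wouldn't change smoothly with cluster size.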
Anyway, I'm sorry I can't share the graphs or the code, but let me just say that I was very impressed with ZeroMQ!