Skip to main content

Databases: More Atomic Cluster Commits

I'm currently reading the book Scalable Internet Architectures, and I'm enjoying it.

On page 141, he describes two-phase commits:
The basic idea is that the node attempting the transaction will notify its peers that it is about to commit, and they will react by preparing the transaction and notifying the originating node that they are ready to commit. The second phase is noticing any aborts in the process and then possibly following through on the commit (hopefully everywhere)...The two-phase commit is not perfect, but it is considered sufficient for applications such as stock trading and banking...Despite its shortcomings, the only common "correct" multimaster replication technique used today is the 2PC, and, due to its overhead costs, it is not widely used for geographic replications.
I didn't catch the phrase "not perfect" the first time I read that section, so I spent the day wondering how the hell they implemented fully atomic transactions across a cluster of machines. What happens if all the machines agree that they're right about to finish the commit and then one of the machines goes down right before writing the final bit to the transaction log and the others don't? I'll readily admit I'm no database expert, so I figured I must be missing something. I fought with it for a while, and I was very pleased to re-read that section and see the phrase "not perfect".

Well, how could you do a perfect cluster-wide commit with absolutely no race conditions? Let's assume the machines are all near each other since 2PC doesn't work across the country anyway without a ton of lag. I have a silly brute force solution that may not be perfect, but is a bit closer to atomic (i.e. it relies on the hardware in the same way mutexes do):

Start with battery-backed RAID cards. However, design the RAID cards so that they have one input wire and one output wire specifically to address this problem. Now, put all the machines in a circuit using this wire:
_______________
| |
|_N1_N2_N3_N4_|
On each of the RAID cards is a switch. If all the machines open their switches, current can flow. If any one of them closes their switch, current doesn't flow. It's the hardware equivalent of an AND operation; did I mention I'm not a hardware guy? Anyway, my commit system proceeds pretty much the same as the 2PC. When the machines have agreed that they're ready to do the final commit, the kernel setups up the sector to be written to disk and lets the RAID card take over. Then, they each open their switches. When the current starts flowing, the RAID card recognizes this signal to write the sector containing the data for the transaction log saying that the commit took place. If the RAID card is, for some reason, unable to perform its duties, it shuts down and declares that it's broken. I.e. the machine will die, but there won't be inconsistency.

Well, there may be holes in this scheme. Perhaps it was written about 30 years ago just like all the other interesting CS problems. It does require specialized hardware. All those reservations aside, I think this leads to a more atomic cluster commit.

Comments

Anonymous said…
The trick is that there's a master node with a log. If a node goes down before writing that final bit, when it gets back up the master will replay its log and all will be fine.
jjinux said…
What happens if:

The master writes its final log message.

Then the master manages to tell half the nodes that it wrote the final log message.

Then it dies.

The other half of the nodes think that the master did not succeed, so they roll back.

Sorry if this is a silly, basic question. This is all new stuff to me.
Anonymous said…
You should look up the 3 phase commit http://en.wikipedia.org/wiki/Three-phase-commit_protocol
bong said…
You might find the discussion about two-phase commit in Java Transaction Processing interesting especially the parts that discuss what can happen when the commit phase fails. Don't let the Java portions throw you off, they cover transaction processing very well.
jjinux said…
> Java Transaction Processing

I wish I had a copy to read about it. Is there any way you can summarize in a paragraph or two?

Popular posts from this blog

Drawing Sierpinski's Triangle in Minecraft Using Python

In his keynote at PyCon, Eben Upton, the Executive Director of the Rasberry Pi Foundation, mentioned that not only has Minecraft been ported to the Rasberry Pi, but you can even control it with Python. Since four of my kids are avid Minecraft fans, I figured this might be a good time to teach them to program using Python. So I started yesterday with the goal of programming something cool for Minecraft and then showing it off at the San Francisco Python Meetup in the evening.

The first problem that I faced was that I didn't have a Rasberry Pi. You can't hack Minecraft by just installing the Minecraft client. Speaking of which, I didn't have the Minecraft client installed either ;) My kids always play it on their Nexus 7s. I found an open source Minecraft server called Bukkit that "provides the means to extend the popular Minecraft multiplayer server." Then I found a plugin called RaspberryJuice that implements a subset of the Minecraft Pi modding API for Bukkit s…

Apple: iPad and Emacs

Someone asked my boss's buddy Art Medlar if he was going to buy an iPad. He said, "I figure as soon as it runs Emacs, that will be the sign to buy." I think he was just trying to be funny, but his statement is actually fairly profound.

It's well known that submitting iPhone and iPad applications for sale on Apple's store is a huge pain--even if they're free and open source. Apple is acting as a gatekeeper for what is and isn't allowed on your device. I heard that Apple would never allow a scripting language to be installed on your iPad because it would allow end users to run code that they hadn't verified. (I don't have a reference for this, but if you do, please post it below.) Emacs is mostly written in Emacs Lisp. Per Apple's policy, I don't think it'll ever be possible to run Emacs on the iPad.

Emacs was written by Richard Stallman, and it practically defines the Free Software movement (in a manner of speaking at least). Stal…

JavaScript: Porting from react-css-modules to babel-plugin-react-css-modules (with Less)

I recently found a bug in react-css-modules that prevented me from upgrading react-mobx which prevented us from upgrading to React 16. Then, I found out that react-css-modules is "no longer actively maintained". Hence, whether I wanted to or not, I was kind of forced into moving from react-css-modules to babel-plugin-react-css-modules. Doing the port is mostly straightforward. Once I switched libraries, the rest of the port was basically:
Get ESLint to pass now that react-css-modules is no longer available.Get babel-plugin-react-css-modules working with Less.Get my Karma tests to at least build.Get the Karma tests to pass.Test things thoroughly.Fight off merge conflicts from the rest of engineering every 10 minutes ;) There were a few things that resulted in difficult code changes. That's what the rest of this blog post is about. I don't think you can fix all of these things ahead of time. Just read through them and keep them in mind as you follow the approach above.…