

Showing posts from January, 2009

A REST-RPC Hybrid for Distributed Computing

It's everything the Web was never meant to do...and so much more!

While I was doing some consulting for Warren DeLano of PyMOL, the two of us envisioned a REST-RPC hybrid for use in distributed computing.

Imagine an RPC system built on top of REST. (I know that REST enthusiasts tend to really dislike the RPC style, but bear with me.) That means two things. First, you're calling remote functions. Second, you should take advantage of HTTP as much as possible: URLs, proxies, links, the various HTTP verbs, and so on.

Imagine every RPC function takes one JSON value as input and provides one JSON value as output. Such a system could be used with relative ease in a wide range of programming languages, even C. I'm thinking of a P2P system with many servers that talk to each other using RESTful Web APIs, but do so in an XML-RPC sort of way. However, instead of using XML, they use JSON.

Here's what I think URLs should look like: http://server:port/path/to/objec…
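Here is a minimal sketch of the one-JSON-value-in, one-JSON-value-out idea, using only Python's standard library. It assumes, purely for illustration, that each function lives at its own URL path and is called with an HTTP POST whose body is the JSON argument. The `FUNCTIONS` registry, the `rpc` decorator, `dispatch()`, and the `/math/add` path are hypothetical names, not the URL layout proposed above.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical registry: URL path -> function taking one JSON value, returning one.
FUNCTIONS = {}


def rpc(path):
    """Register a function under a URL path (illustrative helper)."""
    def register(func):
        FUNCTIONS[path] = func
        return func
    return register


@rpc("/math/add")
def add(numbers):
    # One JSON value in (a list), one JSON value out (a number).
    return sum(numbers)


def dispatch(path, body):
    """Decode the JSON argument, call the function at `path`, encode the result.

    Returns (http_status, response_bytes).
    """
    func = FUNCTIONS.get(path)
    if func is None:
        return 404, json.dumps({"error": "no such function"}).encode()
    try:
        arg = json.loads(body or b"null")  # an empty body means null
    except ValueError:
        return 400, json.dumps({"error": "invalid JSON"}).encode()
    try:
        result = func(arg)
    except Exception as exc:  # report failures to the caller as JSON, too
        return 500, json.dumps({"error": str(exc)}).encode()
    return 200, json.dumps(result).encode()


class RPCHandler(BaseHTTPRequestHandler):
    """Plain HTTP on the outside, so ordinary HTTP tooling still applies."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, payload = dispatch(self.path, self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

To serve it, something like `HTTPServer(("localhost", 8000), RPCHandler).serve_forever()` would do, after which `curl -d '[1, 2, 3]' http://localhost:8000/math/add` should print `6`. Keeping `dispatch()` separate from the handler makes the JSON plumbing testable without a socket; a fuller version might also expose side-effect-free functions through GET so that HTTP caches and proxies can help.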

Logic: Occam's Razor

I just read the summary of Occam's Razor on Wikipedia, and it turns out that most people, including me, don't understand what William of Ockham was really trying to say. Specifically, it does not mean "All other things being equal, the simplest solution is the best." Here's the quote:

Ockham's razor (sometimes spelled Occam's razor) is a principle attributed to the 14th-century English logician and Franciscan friar, William of Ockham. The principle states that the explanation of any phenomenon should make as few assumptions as possible, eliminating those that make no difference in the observable predictions of the explanatory hypothesis or theory. The principle is often expressed in Latin as the lex parsimoniae ("law of parsimony" or "law of succinctness"): "entia non sunt multiplicanda praeter necessitatem", roughly translated as "entities must not be multiplied beyond necessity". An alternative version "Pluralitas non est …

Linux: Fun with Big Files

Recently, I was playing with a 150G compressed XML file containing a Wikipedia dump. Trying to delete the file gave me a fun glimpse into some Linux behavior that I normally wouldn't notice.

I had a job running that was parsing the file. I hit Ctrl-C to kill the job. Then I deleted the file. "rm" returned immediately. I thought to myself, wow, that was fast. Hmm, I'm guessing that all it had to do was unlink the file. I would have figured it would have taken longer to free all of the file's data blocks.

I ran "df -h" to see if there was now 150G more free space on my drive. There was no new free space. Hmm, that's weird. I futzed around for a bit. I started cycling through my tabs in screen. I discovered that I had only killed the job that was tailing one of the files, not the actual job itself.

This reminded me that Linux uses reference counting for files. Even if you can't get to a file through the filesystem, a file might still exist becaus…
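The reference-counting behavior can be seen from Python on Linux with a short sketch: write a file, `unlink()` it while a descriptor is still open (which is essentially what `rm` does), and observe that the data is still readable even though the link count has dropped to zero. The helper names `unlink_while_open()` and `deleted_but_open()` are hypothetical, and the second one relies on the Linux-specific `/proc/<pid>/fd` symlinks, which the kernel marks with a " (deleted)" suffix once the name is gone. From the shell, `lsof +L1` reports similar information.

```python
import os
import tempfile


def unlink_while_open(data=b"still here"):
    """Write a file, unlink it while a descriptor is open, then read it back.

    Returns (bytes_read, link_count_after_unlink).
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, data)
        os.unlink(path)                  # like `rm`: removes the name only
        nlink = os.fstat(fd).st_nlink    # 0 names left, but the inode lives on
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, len(data)), nlink
    finally:
        os.close(fd)                     # last reference gone: space is freed


def deleted_but_open(pid="self"):
    """List /proc/<pid>/fd targets that point at unlinked files (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    found = []
    try:
        names = os.listdir(fd_dir)
    except OSError:
        return found                     # no /proc (not Linux) or no permission
    for name in names:
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue                     # descriptor was closed in the meantime
        if target.endswith(" (deleted)"):
            found.append(target)
    return found
```

Running `deleted_but_open()` against the PID of a stuck parsing job (rather than `"self"`) would show whether it is still holding the "deleted" file, and therefore the disk space, open.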

Personal: Looking for Work

Hey guys, I'm looking for work.

I'd prefer to work part-time from home since I'm already working part-time from home on another startup. I'm better at building startups from scratch than I am at rescuing crufty code. I'm better at engineering scalable systems than I am at whipping out throw-away prototypes in a hurry.

I have clean code, a friendly demeanor, and great references. Here's my resume.

By the way, sorry for the advertisement ;)

Computer Science: The Autoturing Test

Can a personality construct in a virtual world apply a Turing test to itself?

In Neuromancer, William Gibson plays around with the idea of personality constructs. Dixie is a hacker who died, but his "personality" was recorded to a ROM. Within the matrix, you can interact with Dixie, and in fact, Dixie won't know he's a personality construct until you tell him.

Another thing that happens in the book is that Case flatlines. When he flatlines, time slows to a crawl, and he proceeds to "live" within the matrix at a more fundamental level.

My question is, is there some test that Case and Dixie can apply to themselves that will help each of them to figure out who is the real human?

Of course, this question is kind of meaningless at this point. It assumes that we'll someday be able to create personality constructs, but that they won't be the same as "the real thing."

Nonetheless, the deeper question remains. Is there a test that a human and an AI…

Virtualization: VirtualBox

I've been using VMware Fusion, but I decided to give VirtualBox a try. It's from Sun. To summarize:

It seems faster than VMware Fusion.

It's free and mostly open source.

It's just a bit rougher around the edges.

What do I mean by "mostly open source"? There are two versions. According to their docs:

The VirtualBox Open Source Edition (OSE) is the one that has been released under the GPL and comes with complete source code. It is functionally equivalent to the full VirtualBox package, except for a few features that primarily target enterprise customers. This gives us a chance to generate revenue to fund further development of VirtualBox.

Please note that the Open Source Edition does not include an installer or setup utilities, as it is mainly aimed at developers and Linux distributors.

What this means in practice is that the open source version isn't easy to use, since there are no precompiled binaries and no installer. Hence, you're stuck with the free, but not…

Python: Parsing Wikipedia Dumps Using SAX

Wikipedia provides a massive dump containing all edits on all articles. It's about 150GB and takes about a week to download. The file is

Clearly, to parse such a large file, you can't use a DOM API, which builds the entire document tree in memory. You must use a streaming API such as SAX. There is a Python library that parses this file and shoves it into a database, but I actually don't want the data in a database. Here's some code to parse the data, or at least the parts I care about:

Updated! Fixed the code to account for the fact that the characters() method must do its own buffering. Fixed an encoding issue.

#!/usr/bin/env python

"""Parse the enwiki-latest-pages-meta-history.xml file."""

from __future__ import with_statement

from contextlib import closing
from StringIO import StringIO
from optparse import OptionParser
import sys
from xml.sax import make_parser
from xml.sax.handler import ContentHandler

from blueplate.parsing.tsv import create_def…
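Since the update note is about characters() needing its own buffering, here is a minimal Python 3 sketch of that pattern (the script above is Python 2). A SAX parser is allowed to hand one text node to `characters()` in several pieces (text around an entity such as `&amp;` may arrive as separate calls), so the handler collects chunks in a list and joins them in `endElement()`. The `PageHandler` class and the simplified `<page>`/`<title>`/`<revision>` layout are assumptions for illustration; the real dump contains many more elements, which this handler simply ignores.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class PageHandler(ContentHandler):
    """Collect (title, revision_count) pairs from a MediaWiki-style dump."""

    def __init__(self):
        super().__init__()
        self.pages = []         # list of (title, revision_count)
        self._buffer = []       # text chunks for the element being captured
        self._capturing = False
        self._title = None
        self._revisions = 0

    def startElement(self, name, attrs):
        if name == "page":
            self._title = None
            self._revisions = 0
        elif name == "title":
            # Start a fresh buffer; the text may arrive in several pieces.
            self._buffer = []
            self._capturing = True
        elif name == "revision":
            self._revisions += 1

    def characters(self, content):
        # Never assume one call per text node: accumulate chunks instead.
        if self._capturing:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self._title = "".join(self._buffer)
            self._capturing = False
        elif name == "page":
            self.pages.append((self._title, self._revisions))
```

Because only the current title is buffered and each finished page is reduced to a small tuple, memory use stays flat no matter how large the file is. For the real dump, something like `xml.sax.parse(bz2.open(path, "rb"), handler)` would stream a bzip2-compressed file without unpacking it to disk first (`path` is a placeholder).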

Vim: jVi

jVi is a plugin for NetBeans that provides Vim-like key bindings. The good news is that it's close enough to be comfortable instead of frustrating. It's better than most Vi emulation modes (including the one in Komodo Edit) and it's way better than the key bindings provided by NetBeans (of course, that's a matter of opinion). The bad news is that certain key features like rectangular select (Ctrl-v) and rewrapping block comments (gq}) don't work. So far, those are my two biggest complaints.

First of all, installing the plugin was painless. I downloaded it using my browser, unzipped it, and installed it via the Tools :: Plugins menu item in NetBeans. Easy peasy.

Next, I went down the list of complaints I had about the Vim key bindings in Komodo Edit and tried each of them in jVi. Many things were fixed. Some still didn't work. Here is a list of my discoveries:

Using ":e filename" to open a file doesn't work.

Using Ctrl-o to go back to where yo…

IDE: NetBeans

After trying out Komodo Edit, I decided to give NetBeans a whirl. Here's the summary: NetBeans is a pleasant-to-use, reasonably well-polished IDE that mysteriously seems to be missing certain key features that even Komodo Edit has. If I were to put my finger on it, I'd say that NetBeans is better at being an IDE (doing things such as code completion, code tips, etc.), but has a worse editor (for instance, it lacks a rectangle selection mode and it has no option to rewrap a multi-line comment block).

From the Web Site

Here are some high-level bits from the web site along with some of my own comments:

In addition to full support of all Java platforms (Java SE, Java EE, Java ME, and JavaFX), the NetBeans IDE 6.5 is the ideal tool for software development with PHP, Ajax and JavaScript, Groovy and Grails, Ruby and Ruby on Rails, and C/C++.

Discover the joys of Python programming with the NetBeans IDE for Python Early Access. Enjoy great editor features such as code completion, semant…

C++: Counting Function Calls

How many function calls are involved in executing this piece of C++ (from a Qt project)?

/**
 * Given a QString, safely escape it properly for sh. For example, given
 * $`"\a\" return \$\`\"\a\\".
 */
QString ConfIO::writeString(const QString s)
{
    QString ret;

    for (int i = 0; i < s.length(); i++) {
        QChar c = s[i];

        if (c == '$' || c == '`' || c == '"' || c == '\\')
            ret += '\\';
        ret += c;
    }

    return ret;
}

If you don't count any function calls made by .length(), etc., I've counted 21 so far!
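For comparison, a similar count can be taken mechanically in Python. The sketch below is a hypothetical Python port of the same escaping logic (`sh_escape()` and `count_calls()` are my own names) that uses `sys.setprofile()` to tally Python-level (`call`) and C-level (`c_call`) call events. Its numbers say nothing about the C++ count above, and they can vary between Python versions.

```python
import sys
from collections import Counter


def sh_escape(s):
    """Python port of the writeString() idea: backslash-escape $ ` " and \\."""
    out = []
    for ch in s:
        if ch in '$`"\\':
            out.append("\\")
        out.append(ch)
    return "".join(out)


def count_calls(func, *args):
    """Run func(*args) and tally the call events the profiler hook sees."""
    counts = Counter()

    def profiler(frame, event, arg):
        # 'call' = a Python function was entered; 'c_call' = a builtin/C function.
        if event in ("call", "c_call"):
            counts[event] += 1

    sys.setprofile(profiler)
    try:
        result = func(*args)
    finally:
        # The call that switches profiling off may itself be counted.
        sys.setprofile(None)
    return result, counts
```

For example, `count_calls(sh_escape, '$`"\\a')` returns the escaped string along with a `Counter` in which every `out.append()` and the final `"".join()` show up as `c_call` events, much as each `+=` in the C++ version is a hidden function call.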

Python: Builds of PyWebkitGtk and Webkit-Glib-Gtk

I saw this on python-announce, and all I can say is "What the heck?" I think this means you can write a Python application and have it compile down to an Ajax application or a desktop application, but I could be wrong:

webkit-glib-gtk provides gobject bindings to webkit's DOM model. pywebkitgtk provides python bindings to the gobject bindings of webkit's DOM model.

files are available for download at:

separate pre-built .debs for AMD64 and i386 Debian are included, for pywebkitgtk and webkit-gtk with gobject bindings to the DOM model. if you have seen OLPC/SUGAR's "hulahop", or if you have used Gecko / XUL DOM bindings, or KDE's KHTMLPart DOM bindings, you will appreciate the value of webkit-glib-gtk. pywebkitgtk with glib/gobject bindings basically brings pywebkitgtk on a par with hulahop.

if you find the thought of pywebkitgtk with glib bindi…