Saturday, December 29, 2007

OpenSocial talk at Google

I went to an OpenSocial talk at Google about a month ago. Sorry it's taken me so long to write a summary. Hopefully it won't be completely out of date ;) Here are a bunch of random notes:
  • The talk was held in the same room where the Bay Area Python Interest Group normally holds its meetings. However, I knew right away that something was different when I got there half an hour early and the room was already filling up. During the meeting, I counted rows and columns and estimated that about 200 people were present.

  • Google was making a big deal of the meeting. They were providing dinner, which they don't do for our group.

  • The JavaScript in the examples looked strangely verbose.

  • Security is not defined in the spec. Dealing with third-party JavaScript is a challenge. Facebook's answer was FBJS. At least at that point in time, OpenSocial didn't have a well-defined answer to that problem.

  • One of the demos crashed.

  • They're taking the lowest common denominator approach and letting people build extensions on top of it.

  • There were something like nine speakers. One thing that was really strange was how many of them had strong foreign accents.

  • If you have an OpenSocial application running on multiple networks, and the same person has an account on several of them and uses your app on each, there's nothing in the spec that lets you know it's the same user. If you try to match users by email address, and one of the networks allows users to sign up without verifying their email address, it'd be really easy to masquerade as another user, totally sabotaging any OpenSocial application that relies on that match.

  • Facebook has a policy against caching a user's friends for more than a day. OpenSocial does not define any such policies. It's up to the individual networks. This may prove to be a challenge for app writers and policy enforcers alike.

  • OpenSocket is a Facebook application that acts as a socket for OpenSocial applications. It is iframe based. Facebook said, "We think it's cool." However, it may have legal difficulties because the apps that plug into it must obey Facebook's TOS, yet it has no way to enforce that.

  • The OpenSocial spec still has a lot of ambiguities.

  • Apache Shindig is an open source container for OpenSocial applications.

  • Many people obviously want a way for their identities to span networks. I've argued for a long time that OpenSocial was solving the wrong problem. I don't want something to make it easier for me to write apps for multiple social networks. I want something to make it easier for me to cope with the fact that my friends use multiple networks.

  • Strangely enough, Windows laptops dominated among the speakers. A few of the speakers used Macs, but it was a lower percentage than usual.

  • OpenSocial is an improvement because it lets developers move away from screen-scraping MySpace, and it lets users get away from cutting and pasting code snippets into their profiles.

  • It was a very long meeting.

  • One of the speakers was German. He was writing an application to let users collaboratively solve jigsaw puzzles. He took a dig at Americans saying, "Collaborate? We just like to blow things up!" It wasn't a very popular application.

  • A lot of the people who were excited about OpenSocial are developers who were "late to the Facebook party" and viewed OpenSocial as a chance to be ahead of the curve.

Books: Isaac Asimov Predicted Wikipedia

I'm reading Isaac Asimov's book, "The Beginning and the End". I can't get enough Asimov, which is good, considering he's one of the most prolific authors who ever lived.

"The Democracy of Learning" (Chapter 3 of "The Beginning and the End") is a short essay that appeared in "Know", a short-lived magazine put out by the makers of the "Encyclopaedia Britannica". It's ironic that Asimov was writing for the makers of the "Encyclopaedia Britannica" when he wrote:
And I look forward to the time when computerization will place in every home a terminal connected to some central library which will place, in facsimile, or on the television screen, the resources of human generations at the very fingertips of even the least of humanity. But that, alas, was not in my time.
Well, Asimov, you're right. Wikipedia is incredible, and I'm sorry you missed it.

Tuesday, December 25, 2007

Python: Some Concurrency Tricks

Here are a few concurrency tricks if you're stuck using threads. I used these tricks years ago to write a Swing application in Jython, and I found them to be helpful enough to warrant a blog post, albeit a few years delayed.

First, let's suppose you have a UI and you want to talk to an external program written in another language that might occasionally block. Use the main thread for the UI and use a separate thread to coordinate with the external program. In general, most things that might block should have their own thread.

Name your threads.

It's likely that certain code should only be run by the UI thread, and other code should only be run by the worker thread. At the top of each such method, do an assertion on the thread name.
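Here is a minimal sketch of naming threads and asserting on them. It assumes the UI runs on Python's main thread; the worker's name and the two functions are hypothetical stand-ins for real UI code and real calls to an external program.

```python
import threading

UI_THREAD = threading.main_thread().name  # Assumption: the UI runs here.
WORKER_THREAD = 'external-program'        # Hypothetical name for the worker.


def assert_on_thread(expected_name):
    """Fail loudly if called from the wrong thread."""
    actual = threading.current_thread().name
    assert actual == expected_name, (
        'expected thread %r, but running on %r' % (expected_name, actual))


def update_status_label(text):
    """Hypothetical UI code: only the UI thread may touch widgets."""
    assert_on_thread(UI_THREAD)
    return 'status: ' + text


def send_to_external_program(log):
    """Hypothetical blocking call: only the worker thread may make it."""
    assert_on_thread(WORKER_THREAD)
    log.append('request sent')


log = []
worker = threading.Thread(target=send_to_external_program, args=(log,),
                          name=WORKER_THREAD)
worker.start()
worker.join()
print(update_status_label('ready'), log)
```

Calling send_to_external_program(log) directly from the UI thread would raise AssertionError, which is exactly the point. Keep in mind that Python skips assert statements under `python -O`, so an explicit check that raises may be safer in production code.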

Avoid sharing data. Sharing data requires mutexes, etc., which are generally painful and easy to mess up. Instead, constrain each bit of data to a single thread. If you need to interact with that data from another thread, "ask the other thread for help".

I like to do this with the queue module, which takes care of its own locking. You can pass requests from one thread to another via a queue. In some cases, I even found it helpful to pass a callback on the queue. That's like saying, "Call this callback, but do it from your own thread."
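A minimal sketch of the callback-on-a-queue trick, assuming Python 3 (the module is spelled Queue in Python 2). One thread owns a dictionary; other threads never touch it directly and instead hand the owner a callback. The names owner_loop, ask, and add_beer are hypothetical.

```python
import queue
import threading

requests = queue.Queue()


def owner_loop():
    """Runs on the thread that owns `data`; nothing else touches it."""
    data = {}
    while True:
        callback = requests.get()
        if callback is None:  # Sentinel: shut down.
            break
        callback(data)  # "Call this callback, but do it from your own thread."


def ask(func):
    """From any thread: run func(data) on the owner thread and wait for the result."""
    reply = queue.Queue(maxsize=1)
    requests.put(lambda data: reply.put(func(data)))
    return reply.get()


def add_beer(data):
    """Hypothetical request; it only ever runs on the owner thread."""
    data['beer'] = data.get('beer', 0) + 12
    return data['beer']


owner = threading.Thread(target=owner_loop, name='data-owner', daemon=True)
owner.start()
first = ask(add_beer)
second = ask(add_beer)
requests.put(None)
owner.join()
print(first, second)  # 12 24
```

The queue does all the locking. A sturdier version would also pass exceptions back through the reply queue; in this sketch, a callback that raises kills the owner thread and leaves ask() waiting forever.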

Much of this approach now falls under the label of "actors". An actor is the combination of an input queue, a thread, and an object. I like to think of them as "living objects that you talk to asynchronously." However, even years before I had heard the term actor or knew how to code in Erlang, these tricks were still useful.
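Put together, an actor can be sketched as an object that owns its own inbox and thread. This is a minimal illustration under the assumptions above, not a library API; Actor, Counter, and the message format (a method name plus arguments) are hypothetical.

```python
import queue
import threading


class Actor:
    """An object + a thread + an input queue; messages are handled one at a time."""

    def __init__(self, name):
        self.inbox = queue.Queue()
        self.thread = threading.Thread(target=self._run, name=name, daemon=True)
        self.thread.start()

    def send(self, method_name, *args):
        """Asynchronously ask the actor to call one of its own methods."""
        self.inbox.put((method_name, args))

    def stop(self):
        self.inbox.put(None)
        self.thread.join()

    def _run(self):
        while True:
            message = self.inbox.get()
            if message is None:  # Sentinel: shut down.
                break
            method_name, args = message
            getattr(self, method_name)(*args)


class Counter(Actor):
    """Hypothetical actor: `total` is only ever touched by the actor's own thread."""

    def __init__(self):
        self.total = 0  # Set up state before the thread starts.
        super().__init__('counter')

    def add(self, n):
        self.total += n

    def report(self, reply_queue):
        reply_queue.put(self.total)


counter = Counter()
for n in range(1, 5):
    counter.send('add', n)
replies = queue.Queue()
counter.send('report', replies)
total = replies.get()  # The inbox is FIFO, so this arrives after all four adds.
counter.stop()
print(total)  # 10
```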

Friday, December 14, 2007

Python: Using PyFacebook in Pylons

I overhauled the WSGI, Paste, and Pylons support for PyFacebook. Large portions of it had suffered from bit rot. Everything should be working now. I wrote a demo application in Pylons and it went smoothly. Best of all, I even documented my work ;)

Wednesday, December 12, 2007

Books: Ajax in Action

This is a review of Ajax in Action.

It's amazing how much the JavaScript world has changed.

This book has a relaxing style, and it was enjoyable to read. However, it no longer represents what I think of as "modern" JavaScript. For instance, it doesn't cover closures until appendix B, and even then it tells the reader to avoid them. These days, having studied Dojo, jQuery, and Douglas Crockford's videos, it's clear that closures are at the heart of how modern JavaScript is written.

The copyright for this book is 2006, yet the index doesn't even mention Firebug, YUI, Dojo, or jQuery, which are now staples of the JavaScript community. Dojo is at least mentioned in the list of Ajax frameworks and libraries, but the others aren't.

This book is an interesting relic from that period when Ajax was first gaining popularity, before the major JavaScript frameworks had gained a foothold. These days, for those wanting to learn modern JavaScript, I recommend watching Douglas Crockford's videos instead.

Thursday, December 06, 2007

It Had Too Many Functions

Just in case you haven't seen this, this is frickin' hilarious:

It Had Too Many Functions

Python: Getting Genshi to Output FBML in Pylons

This took me quite a while to figure out, so I'm going to blog it for the sake of Google. To get Pylons to tell Genshi to output XHTML, so that the FBML tags for Facebook don't get stripped, edit your environment.py and add:
# Customize templating options via this variable
tmpl_options = config['buffet.template_options']

# Without this, all the FBML tags get stripped.
tmpl_options['genshi.default_format'] = 'xhtml'
My templates now start with:
<fb:fbml xmlns="http://www.w3.org/1999/xhtml"
xmlns:py="http://genshi.edgewall.org/"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:fb="fbml">

Thursday, November 29, 2007

Web: Flock

I've been trying out Flock lately as an alternative to Firefox. Although it's based on the same rendering engine, I've heard that Flock is more stable. Flock is interesting in the way it integrates with the many social networking and blogging sites I use, but since I'm a minimalist, it's not really my cup of tea. It's been more stable than Firefox, but it doesn't support the Foxmarks plugin, so I can't synchronize my bookmarks with it. This put me in sort of an awkward position since all my bookmarks are in Firefox, and although I was able to import them, I'm not ready to give up Firefox and Foxmarks yet. I've heard that it at least supports Firebug, which is another must-have for me.

Well, I've been using Flock for a week or so now, and I can really see why some people would like it. However, it's just not my thing, and it finally crashed on me. It's probably Mozilla's fault that it crashed, but I'm now tempted to switch back to Firefox.

In summary, Flock is cool, but I sure wish Mozilla were smaller and more stable. Personally, I blame C++ ;)

Updated:

I just ran out of disk space, which could be the reason Flock crashed. That is, perhaps, more forgivable ;)

Wednesday, November 28, 2007

Python: Walking Recursive Generators

import types

__docformat__ = "restructuredtext"


def walk_recursive_generators(generator):

    """Walk a tree of generators without yielding things recursively.

    Let's suppose you have this:

    >>> def generator0():
    ...     yield 3
    ...     yield 4
    ...
    >>> def generator1():
    ...     yield 2
    ...     for i in generator0():
    ...         yield i
    ...     yield 5
    ...
    >>> def generator2():
    ...     yield 1
    ...     for i in generator1():
    ...         yield i
    ...     yield 6
    ...
    >>> for i in generator2():
    ...     print(i)
    ...
    1
    2
    3
    4
    5
    6

    Notice the way the generators are recursively yielding values. This
    library uses a technique called "bounce" that is usually used to
    implement stackless interpreters. It lets you write:

    >>> def generator0():
    ...     yield 3
    ...     yield 4
    ...
    >>> def generator1():
    ...     yield 2
    ...     yield generator0()
    ...     yield 5
    ...
    >>> def generator2():
    ...     yield 1
    ...     yield generator1()
    ...     yield 6
    ...
    >>> for i in walk_recursive_generators(generator2()):
    ...     print(i)
    ...
    1
    2
    3
    4
    5
    6

    Look Ma! No recursive yields!

    """

    stack = [generator]
    while stack:
        for x in stack[-1]:
            if isinstance(x, types.GeneratorType):
                stack.append(x)  # Recurse.
                break
            else:
                yield x
        else:
            # The generator on top of the stack is exhausted.
            stack.pop()


def _test():
    import doctest
    doctest.testmod()


if __name__ == "__main__":
    _test()

Updates:

  • Don't insert at the beginning of the list. Append to the end.
  • Call the generator before passing it. That is more flexible, and safer too, since you can check against the GeneratorType.
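One payoff of the explicit stack is depth. The sketch below nests a few thousand generators; the walker is repeated so the block runs on its own, and countdown is a hypothetical example generator. Delegating with a plain `for ... yield` loop at that depth may run into Python's recursion limit, whereas the walker only grows a list.

```python
import types


def walk_recursive_generators(generator):
    """Same walker as above: an explicit stack replaces recursive yields."""
    stack = [generator]
    while stack:
        for x in stack[-1]:
            if isinstance(x, types.GeneratorType):
                stack.append(x)  # Descend into the nested generator.
                break
            else:
                yield x
        else:
            stack.pop()  # This generator is exhausted; resume its parent.


def countdown(n):
    """Hypothetical example: yields n, then hands off to a nested generator."""
    yield n
    if n > 0:
        yield countdown(n - 1)


values = list(walk_recursive_generators(countdown(5000)))
print(values[:3], values[-1], len(values))  # [5000, 4999, 4998] 0 5001
```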

Friday, November 23, 2007

On Paul Graham and Joel Spolsky

I've often enjoyed reading the works of Paul Graham and Joel Spolsky, but there's always been something that bothered me about them, especially Paul Graham. Paul Graham writes his arguments like a mathematical proof. Each step in the process seems reasonable, and by the time you reach the end, you don't feel like there's any room to disagree. However, I just don't think life is so black and white.

My buddy Alex Jacobson finally explained it to me. They are good "storytellers". Apparently, this is even part of the culture in New York, where Joel is from. Hence, it's enjoyable to listen to their arguments. However, there's a problem with good storytellers. Their tales are often so enjoyable that it's easy to be lulled into a false sense of security and overlook the exaggerations and half-truths. As with listening to a good talk-show host, it's easy to forget to be objective.

For instance, it's somewhat frustrating to listen to Paul Graham's constant preaching about Lisp in comparison to any other programming language. Now, I like Lisp. I think it's fantastic. However, I think Paul Graham's arguments in favor of Lisp can occasionally be closed-minded. For instance, in Beating the Averages, he portrays languages as lying along a single continuum of sophistication:
As long as our hypothetical Blub programmer is looking down the power continuum, he knows he's looking down. Languages less powerful than Blub are obviously less powerful, because they're missing some feature he's used to.
That's a seductive argument. It's so seductive that it's easy to overlook the assumption that a continuum even makes sense. There are tons of features that Lisp doesn't have that are useful for writing applications. In fact, I like to think there's a master set of language features and each programming language simply picks some subset of that master set. Lisp has macros. Cool. However, Haskell has a really neat type system and lazy evaluation. Where does Haskell lie on the continuum in comparison to Lisp? I will remind you that Haskell too has a macro system (Template Haskell). Liskell is a programming language that uses Lisp syntax on top of Haskell in order to enjoy all the benefits of both. Does that mean Liskell is the highest on the power continuum? Maybe, but it's far more likely that there simply is no continuum.

Anyway, I'm not advocating that we stop reading Paul Graham, Joel Spolsky, etc. I'm just advocating that we maintain the use of our own brains. Not everything they say is gospel.

Happy Hacking!

Thursday, November 15, 2007

Books: Founders at Work

I just finished reading Founders at Work: Stories of Startups' Early Days. I've been nibbling away at it over the last five months or so. It's now one of my favorite books. If you work at a startup or are thinking of starting a startup, this book is a must-read!

I like to interview, so when I'm looking for a new job, I tend to interview a lot. The last time I was looking for a job, I felt like this book was my personal guidebook to Silicon Valley (which loosely includes San Francisco). So many of the people and companies I was interviewing with were in the book. I felt like I was getting the inside scoop. Even when they weren't in the book, the book gave me insights on what a good startup looks like.

Now that I've finished building Free or Best Offer, it's time for me to look for my next startup. I wonder where the book will lead me next. Although they aren't mentioned explicitly in the book, I'm currently leaning toward Metaweb.

Tuesday, November 13, 2007

Free or Best Offer

Today, we launched our Facebook app Free or Best Offer. I wrote most of the code, whereas my co-worker Alex Jacobson designed most of the features. We've been working on it for the last few months.

In other news, I finally finished reading "Agile Web Development with Rails" cover-to-cover. 685 pages! Man, my head hasn't hurt this badly since I decided to read "Design Patterns: Elements of Reusable Object-Oriented Software" cover-to-cover.

Anyway, it's been quite a day. If you get a chance, check out my app and get a free beer ;)

Friday, November 09, 2007

OOP: Alan Kay

In "Dreaming in Code" on page 289, Alan Kay, the creator of Smalltalk and of the windowing paradigm, said, "I made up the term object-oriented...and I can tell you, I did not have C++ in mind." In fact, he said that the OOP that we see today is a bastardization of his original ideas.

One thing that he had in mind was that objects would be actors. An actor is an object + a thread + a message queue. Just imagine if each object had its own thread and they passed messages to each other asynchronously. This idea is gaining popularity these days in languages like Erlang, Scala, and Groovy. I like to say that what we have today is one employee (i.e., a thread) wearing many hats (i.e., running the code for many objects) versus one employee per hat.

For a while I've been intrigued by Alan Kay's thoughts on OOP. My buddy Mike Cheponis sent me this, Dr. Alan Kay on the Meaning of "Object-Oriented Programming".

Interestingly enough, the two OO-haters I know actually agree with Kay's original thoughts on OOP.

Wednesday, November 07, 2007

Python: Giving a Talk at Google

I'll be giving a talk on various approaches to concurrency in Python, with an emphasis on network servers, at the next BayPiggies meeting, which is being held at Google this Thursday.

Find out more.

Saturday, October 27, 2007

'fsck' Apple, I'll take my freedom!

<rant mode>
The newest version of OS X just came out, and my buddy was telling me about all its great new features. Many of those features have existed in the Linux world for years; some haven't.

He sent me email saying, "There is a big chasm and OS X is driving at 1000 miles an hour to close it on the Unix side. Now if Linux gets it's act together and does all the things OS X does correctly...cool!" Hmm, if "Linux gets its act together..."

Macs are nice. I won't deny that. However, let's face it. A big part of the success of OS X was that they were able to make use of existing software like FreeBSD, KHTML, bash, Python, gcc, etc. OS X can "drive at 1000 miles an hour" because there's so much outstanding open source code to draw on. The Linux world can't keep up with all the innovation in OS X because things like Cocoa, Aqua, and Quartz aren't open source. Frankly, I'd love to take someone else's software, tweak it, package it up in a nice shiny box, and sell it for an exorbitant profit! Sure, they open sourced Darwin, but I need another derivation of BSD like I need another Python Web framework!

My response? 'fsck' Apple, I'll take my freedom!

I want the freedom to run my OS on the cheapest hardware I can find (aka my $375 Compaq).

I want to download my software from the Internet instead of wasting trees on packaging and time on a FedEx truck.

I want to share copies of my software with my buddies.

When it doesn't do what I want, I want to fix it myself so that it does.

When I'm trying to debug a problem in my code, I want to look at the source code for the library I'm using.

I want to write software for my phone without feeling like an outlaw, and I don't want to wait a year to do it.

I want to use my phone on whatever carrier is cheapest.

If I do switch carriers, I don't want to end up with a $300 iBrick.

Having been an open source advocate for years, I'm beginning to see that Stallman might be crazy, but he's also right. Open source software isn't always technically better. Sometimes it's a lot worse. Free software is about freedom.

'fsck' Apple, I'll take my freedom!
</rant mode>

Friday, October 26, 2007

Linux: Xubuntu 7.10 on a Compaq Presario C500

I was frustrated with Ubuntu 7.10 on my Compaq Presario C500, so I thought I'd give Xubuntu a try. So far, I really like it. It's crazy fast, and I have almost a gig of RAM free :)

Like Ubuntu, the non-standard display resolution worked correctly out of the box. Sound works, although it was crackly during install. In Ubuntu, suspend crashed my machine, but hibernate worked; I haven't tried it under Xubuntu.

Note that since the wireless card doesn't work by default, it's best to be plugged into a wired network during install. The installer makes use of the Internet connection to download various things.

Since this is a laptop, it's best to turn on sub-pixel hinting in Applications >> User Interface Preferences.

I'm not sure if it's needed for the instructions below, but I always like to enable all repositories:
  Applications >> System >> Synaptic Package Manager:
    Settings >> Repositories:
      Click on all of them except source code.
      Unclick cdrom.
      On the updates tab:
        gutsy-security
        gutsy-updates
I found out last time that you really don't want to rely on the bcm43xx driver for this wireless card. ndiswrapper is really the way to go:
  apt-get update
  apt-get install ndiswrapper-utils-1.9
  apt-get install build-essential
  apt-get install linux-headers-`uname -r`
  wget http://ftp.us.dell.com/network/R151517.EXE
  rmmod bcm43xx
  modprobe ndiswrapper
  unzip -a R151517.EXE
  cd DRIVER
  ndiswrapper -i bcmwl5.inf
  ndiswrapper -l
  ndiswrapper -m
  echo ndiswrapper >> /etc/modules
  echo blacklist bcm43xx >> /etc/modprobe.d/blacklist
  Reboot.
Due to the location and sensitivity of the touchpad, I find it necessary to turn off tapping and scrolling:
  apt-get install gsynaptics
  Added 'Option "SHMConfig" "true"' to the "Synaptics Touchpad" section of
    /etc/X11/xorg.conf and logged back in.
  Ran gsynaptics as a normal user:
    Disabled tapping and scrolling.
    Adjusted the sensitivity very slightly or else it gets set to zero on the
      next login.
Unfortunately, I have to configure gsynaptics every time I log in. For some reason, it's not remembering my settings. This is currently my biggest complaint.

Simply plugging in my printer was sufficient to configure it. Nice ;)

By default, plugging in your headphones does not disable the built-in speakers. However, a friendly reader of my blog posted a workaround:
  echo 'options snd-hda-intel model=laptop' >> /etc/modprobe.d/alsa-base
  Reboot.

Thursday, October 18, 2007

Linux: Ubuntu 7.10 on a Compaq Presario C500

I just installed Ubuntu 7.10 (Gutsy Gibbon) on my Compaq Presario C500. Things went really well. Have I mentioned how much I love this little $375 laptop?

My laptop has enough room for Windows Vista (which I never use) and two copies of Linux. I like to keep around the old version of Ubuntu while upgrading to the new in case something goes wrong and I need a working system. This time, it recognized the other copy of Linux and migrated the users and their settings. By settings, I mean the settings for Gaim, Mozilla, and Evolution. This seems like a rather odd feature, considering it didn't copy all of the other files. Nonetheless, it didn't hurt anything.

Note that it's probably better to be plugged into a wired network during the install so that it can set up repositories and download security updates. I wasn't, so I had to set up the repositories later.

Happily, the weird resolution (1280x800) just worked this time. Unfortunately, attempting to suspend crashed my machine, but hibernating still works. Sound works, even though it was crackly during install. Using my headphones still doesn't disable the speakers, which is a known bug in ALSA.

Here are my instructions for getting wireless to work:
  Enable repositories:
    System >> Administration >> Synaptic Package Manager:
      Settings >> Repositories:
        Click on all of them except source code.
        Unclick cdrom.
        On the updates tab:
          gutsy-security
          gutsy-updates
  System >> Administration >> Restricted Driver Manager:
    Click enabled.
    Download from Internet.
  At this point, my blue wireless light came on.
  If it doesn't, try pressing the wireless button.
  Click on the correct icon on your panel to pick an access point.
My buddy, Adam Ulvi, has the same laptop, but is having more trouble than I am with his wireless card. However, it was also giving him a hard time in Ubuntu 7.04.

Due to the location and sensitivity of the touchpad, I find it necessary to turn off tapping and scrolling:
  apt-get install gsynaptics
  Added 'Option "SHMConfig" "true"' to the "Synaptics Touchpad" section of
    /etc/X11/xorg.conf and logged back in.
  System >> Preferences >> Touchpad:
    Disabled tapping and scrolling.
    Adjusted the sensitivity very slightly or else it gets set to zero on the
      next login.
Because this is a laptop, the fonts look better if you turn on subpixel smoothing in System >> Preferences >> Appearance. Note that the location of the font preferences has changed. While you're in there, you can click on the "Visual effects" tab if you want to turn on more eye candy.

Anyway, I'm happy. I've been waiting for this release for like a month, so today kind of felt like Christmas ;)

Monday, October 15, 2007

Computer Science: What's Wrong with CS Research

I absolutely love this blog post: What's wrong with CS research.

I'm a wannabe language designer. I've written three articles on Haskell, but I adore Python. I've been thinking of going back to get my Ph.D. so that I can try to move the industry forward. I could never figure out why programming language research had to be so dang complex or mathematized.

A lot of his points matched the points I made in one of my articles, Everything Your Professor Failed to Tell You About Functional Programming, especially in the "What's Up with All the Math?" section.

I'm so glad that I read this post! I feel like I've been set straight. Now I know that hanging out with Guido is probably more useful than trying to understand all those crazy research papers ;)

Monday, September 24, 2007

Link: Web 2.0 ... The Machine is Us/ing Us

Kelly Yancey linked to this video: Web 2.0 ... The Machine is Us/ing Us. Just in case you missed it like I did, I'm linking to it too ;)

Friday, September 21, 2007

Browsers

So it seems like Firefox is having problems and everyone has been complaining about it a lot. I sure hope they fix it quickly. It's crashing on me constantly when I indulge in my YouTube addiction, and it uses up an ever-increasing amount of memory. A lot of my friends are enjoying Opera, but I just can't bring myself to install a proprietary browser.

I decided to give KDE and Konqueror another shot. I'm actually pretty pleased with it. Although Konqueror doesn't support two of my favorite Web sites, Gmail and YouTube, it is very stable and very snappy. Even better, it uses half as much memory as Firefox. It has a menu option to open the current page in Firefox, which is helpful for the times it doesn't work. This matches what a lot of Mac users do: they use a mix of Firefox and Safari.

On the other hand, something strange happened when I was installing KDE. I accidentally uninstalled the Ubuntu flashplugin-nonfree package. Today, when I went to YouTube, Firefox asked me to install the Flash plugin locally (i.e. not system-wide). Since all I had to do was hit OK, I did. Voila, Firefox stopped crashing all the time. Weird.

Wednesday, September 12, 2007

Computer Science: Prototypal Match Templates

In object-oriented programming languages, you can subclass an existing class and override a few of its methods. This allows you to take an existing piece of code and tweak it for your own use. However, it's only as granular as the methods that you are overriding. If you want to change one line in a 30-line method, you either have to refactor that 30-line method into several methods (which is the right thing to do if you're in control of the code) or you have to copy the 30 lines and modify that one line (which may be the only thing you can do if you're not in control of the code). Sometimes I actually do both. If I'm using a third-party library that has a 30-line function that I want to change one line of, I copy the whole function into my class, and then refactor it there as if I were refactoring the superclass.

Genshi has a cool mechanism called match templates. I assume XSLT has this too, but since I don't know XSLT, I can't say for certain. Genshi's mechanism lets you write an HTML template and say things like "Every time you see HTML that matches this XPath expression, replace it with this HTML". It turns out that this is a really flexible way of doing templating. It makes it really easy to set up a global look-and-feel and then customize it however you want on a per-template basis. You just write match templates that "tweak" the global look-and-feel. Unlike the Template Method design pattern, the person writing the global look-and-feel doesn't need to do anything to set you up. He doesn't need to create "hook" divs for you to override or anything like that. You can tweak anything you like.

I wonder if the same thing might be useful as a replacement for object-oriented inheritance. Instead of subclassing a class and then overriding some of its methods, you subclass a class and then write match templates that "tweak" the code in the superclass. I think "prototypal match templates" are a good name for this, because you're taking a prototype piece of code and then tweaking it to your needs as if you were doing text substitutions. You would need something like XPath that would make sense for the programming language you're using, but that's not too hard to imagine.

Ok, let me show you what I have in mind. Let's start with how I would do things today:
class Greeter:

    """Let's pretend this is in a third-party module."""

    def greet(self, sex):
        print 'Howdy,',
        if sex == 'female':
            print 'good looking!'
        else:
            print 'stranger.'
Here's my subclass:
class PoliticallyCorrectGreeter(Greeter):

    def greet(self, sex):
        """I either have to refactor or I have to duplicate code here."""
        print 'Howdy,',
        if sex == 'female':
            print 'person of the opposite sex.'
        else:
            print 'stranger.'
If I had prototypal match templates (including some sort of XPath-like syntax for Python syntax), I could write something like:
class PoliticallyCorrectGreeter(Greeter):

    match def[name='greet']/if/print[args[0]]:
        'person of the opposite sex.'
Ok, I can imagine that many people are going to hate this idea. That's why I'm turning off comments...just joking ;)

One valid complaint is that this breaks encapsulation. I'm overriding a method in a way that requires knowledge of the implementation. That's a fair point. However, I'd like to punt on this issue. When I'm subclassing something, I often need to understand the implementation of the superclass anyway to do what I need to do. I think that if you subclass a class, you're "closer" to that class than if you were just using it. If the superclass's implementation changes, it'll break my code. That's okay. I can look at how it's changed and fix it. That's just a normal part of my life as a modern programmer. Furthermore, I think there are smart ways to use this feature and not-so-smart ways to use this feature. It's a hammer--don't hit yourself over the head with it ;)

Next, I'm sure there's a Lisp programmer out there somewhere saying, "Yeah, been there done that. Haven't you heard of macros?" That's a good point too. Lisp is nice because the syntax tree is just Lisp data. That's one of the nicest things about Lisp syntax. However, an XPath-like syntax for navigating an AST for, say, Python would let Python programmers use some of the same tricks that Lisp programmers use. Greenspun's Tenth Rule says, "Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified bug-ridden slow implementation of half of Common Lisp," but I sure do like the way that Python makes Lisp-based ideas more readable and accessible to the masses. Back when Lisp was created, doing things like what I'm talking about just didn't make sense. Making the AST normal Lisp data was a brilliant move. These days with strong reflection in scripting languages, it does make sense to play with the AST in your code. You can have your syntax and tweak it too :-D
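Python's standard ast module already gets part of the way there. Below is a rough sketch, not the XPath-style syntax imagined above: it parses a stand-in copy of the superclass's source, swaps one string constant inside a named method using an ast.NodeTransformer, and builds a subclass whose method is the tweaked copy. It assumes Python 3.8+ (for ast.Constant), returns strings instead of printing, and keeps the source in a string so it doesn't depend on inspect.getsource() finding a file. ReplaceString and tweak_method are hypothetical names.

```python
import ast
import textwrap

# Stand-in for the third-party module's source code.
GREETER_SOURCE = textwrap.dedent('''
    class Greeter:
        def greet(self, sex):
            if sex == 'female':
                return 'Howdy, good looking!'
            else:
                return 'Howdy, stranger.'
''')


class ReplaceString(ast.NodeTransformer):
    """Replace one string constant, but only inside the named method."""

    def __init__(self, method_name, old, new):
        self.method_name, self.old, self.new = method_name, old, new
        self.inside = False

    def visit_FunctionDef(self, node):
        was_inside = self.inside
        self.inside = node.name == self.method_name
        self.generic_visit(node)  # Visit the method body with `inside` set.
        self.inside = was_inside
        return node

    def visit_Constant(self, node):
        if self.inside and node.value == self.old:
            return ast.copy_location(ast.Constant(self.new), node)
        return node


def tweak_method(base, source, method_name, transformer, new_name):
    """Build a subclass of base whose method is a tweaked copy from source."""
    tree = transformer.visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    namespace = {}
    exec(compile(tree, '<tweaked>', 'exec'), namespace)
    tweaked = namespace[base.__name__].__dict__[method_name]
    return type(new_name, (base,), {method_name: tweaked})


# The "superclass", as it would normally be imported.
_ns = {}
exec(GREETER_SOURCE, _ns)
Greeter = _ns['Greeter']

PoliticallyCorrectGreeter = tweak_method(
    Greeter, GREETER_SOURCE, 'greet',
    ReplaceString('greet', 'Howdy, good looking!',
                  'Howdy, person of the opposite sex.'),
    'PoliticallyCorrectGreeter')

print(PoliticallyCorrectGreeter().greet('female'))  # Howdy, person of the opposite sex.
```

The encapsulation caveat shows up immediately: if the superclass's string ever changes, this tweak silently matches nothing and the subclass quietly inherits the new behavior.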

Tuesday, September 11, 2007

Python: PyWeek


I just finished PyWeek! (Here's the code.) It's a contest where you have one week to write a video game using PyGame. We also used PGU, which is a pretty helpful library for writing video games. This time, my buddy Adam Ulvi participated with me. We took two days off from work and wrote about 800 lines of code. I'm proud to say we created what I think is a pretty impressive and fun game.

I leaned pretty heavily on object-oriented programming this time. I know a lot of people like to talk smack about OOP, but I often find it helpful. The player, the enemies, and even the artillery all subclass a class called SuperSprite that takes care of things like integrating per-frame movement, per-frame animation, collisions, etc. OOP lets me say "these guys should all act exactly the same except in these slightly different ways." Often, the subclass is no more than a few class-level constants, like a different image. Sometimes they behave slightly differently, which is achieved by overriding a single method.

Since I'm also a fan of functional programming, I also relied heavily on closures for doing animation. Each subclass of SuperSprite has an animator_func attribute, which refers to whatever function I happen to be using for animation at the time. Hence, when you destroy an enemy, I set that enemy's animator_func to a closure that animates the enemy's explosion:
    def create_destroyed_animator(self):

        """This is the animator for when the enemy is damaged.

        And by "damaged" I mean exploding. This function returns a closure.

        """

        def f():
            f.count += 1
            n = f.count // 3  # Switch to the next image every 3 frames.
            if n <= 2:
                self.image = self.g.images['explosion-%s' % n][0]
            else:
                self.after_destroyed_func()

        def task():
            self.invincible = True

        play_wav('explosion.wav')
        f.count = 0
        self.g.post_frame_tasks.append(task)
        return f
The closure plays number games with the frame count to switch images every few frames and then finally go away. This made it really easy to animate explosions using three successive images. We used this same trick to animate a bunch of different things.
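Stripped of PyGame, the trick is a closure that keeps its own frame counter as a function attribute and maps that counter to an image index. The sketch below is a hypothetical, self-contained version: make_frame_animator, the image names, and the on_done callback are illustrative stand-ins, not the game's actual API.

```python
def make_frame_animator(images, frames_per_image, on_done):
    """Return a closure that advances the animation by one frame per call.

    Each image is returned for frames_per_image consecutive calls. Once the
    images run out, the closure calls on_done() and returns None.
    """
    def animate():
        # Integer division turns the running frame count into an image index.
        index = animate.count // frames_per_image
        animate.count += 1
        if index < len(images):
            return images[index]
        on_done()
        return None

    # The counter lives on the function object, just like f.count above.
    animate.count = 0
    return animate


if __name__ == '__main__':
    finished = []
    explode = make_frame_animator(
        ['explosion-0', 'explosion-1', 'explosion-2'],
        frames_per_image=3,
        on_done=lambda: finished.append(True))
    for frame in range(10):
        print(frame, explode())
```

Unlike the game code, this version reads the index before incrementing, so the first image also gets a full three frames (in the original, the first image shows for two). Like the original, it calls on_done on every call after the last image, so on_done should remove the sprite or swap out the animator. In Python 3, a nonlocal variable could replace the function attribute.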

Another neat thing is that the turrets always turn to face you when they shoot. My trigonometry is pretty rusty and I was pretty tired when I came up with that code, so I'm just glad that I was able to remember sohcahtoa.
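SOH CAH TOA gets you there, but math.atan2 does the quadrant bookkeeping for you. Here's a minimal sketch of aiming a turret; it isn't the game's code. angle_to_target is a hypothetical helper, and it assumes the turret image points right (0 degrees) before rotation and that the result is passed to pygame.transform.rotate, which rotates counterclockwise.

```python
import math


def angle_to_target(turret_pos, target_pos):
    """Return the angle in degrees from turret_pos to target_pos.

    Positions are (x, y) screen coordinates where y grows downward, as in
    PyGame. 0 degrees points right and angles increase counterclockwise.
    """
    dx = target_pos[0] - turret_pos[0]
    dy = turret_pos[1] - target_pos[1]  # Flip y so "up" is positive.
    # atan2 handles every quadrant (and dx == 0) without the case analysis
    # that tan(theta) = opposite / adjacent would need.
    return math.degrees(math.atan2(dy, dx))


if __name__ == '__main__':
    print(angle_to_target((100, 100), (200, 100)))  # Target to the right.
    print(angle_to_target((100, 100), (100, 0)))    # Target straight up.
```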

All in all, I'm pretty thrilled about what we came up with. There are many programmers in the world who are far more talented than I am, but it's nice to actually create something in so short a time span and say, "Hey, look what I can do!"

Thursday, September 06, 2007

Python: Useful Utility for PGU's leveledit

I'm participating in PyWeek right now, and I'm using PGU. If you're not using PGU, you can skip this post.

If you're like me, you sometimes get confused about when you're editing the tiles and when you're editing the background. My buddy drew an entire level, and the tiles and background were totally messed up. Rather than redo the entire level, I wrote a little utility to force all the tiles into the background. It's quick-and-dirty, but quite useful when you need it:
#!/usr/bin/env python
#
# Copyright 2007 Adam Ulvi, Shannon Behrens
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

"""Take a TGA file, and force any tiles onto the background.

This is to make up for UI interface "difficulties" in PGU's leveledit.

"""

import os
import sys

import pygame

__docformat__ = "restructuredtext"


try:
    if (not len(sys.argv) == 3 or
        not os.path.isfile(sys.argv[1])):
        raise ValueError
except:  # Catch all exceptions, not just ValueErrors.
    print >> sys.stderr, "usage: force_background.py INPUT.tga OUTPUT.tga"
    sys.exit(2)

in_f, out_f = sys.argv[1], sys.argv[2]
in_img = pygame.image.load(in_f)
out_img = in_img.copy()
w, h = in_img.get_width(), in_img.get_height()
for y in range(h):
    for x in range(w):
        (tile, bg, code, alpha) = in_img.get_at((x, y))
        if tile:
            bg, tile = tile, 0
        out_img.set_at((x, y), (tile, bg, code, alpha))
pygame.image.save(out_img, out_f)

Saturday, August 25, 2007

Computer Science: Smart People Have Weird Hangups

Have you ever noticed that smart, interesting people have weird technical hangups? Often, they take a good idea to its logical conclusion in such a manner that it dominates their lives. For instance:

I'm an open source fanatic. I'll put up with a lesser product if that's what it takes to stay open source. For instance, I think Apple has a better desktop experience than Ubuntu does, and I also think they have slicker laptops than Dell has. However, I refuse to buy a MacBook because it's not open source, despite the fact that all the people around me have MacBooks, even my heroes Guido van Rossum and Bram Moolenaar.

My buddy Mike C. hates OOP. Mike's a wicked sharp guy from MIT, so he's earned the right to his opinion. (As an aside, it's strange how vehemently many Lisp hackers hate OOP, despite the fact that OO systems exist for Common Lisp.) It'd be one thing if he were simply a fan of Scheme over Java, but Mike often codes in Python and refuses to use OOP.

My co-worker Alex J. has many strange technical hangups. He's also been coding in Haskell for nine years, which I think qualifies him as "smart and interesting". Alex hates RDBMSs because of the way they use disks. Clearly, in the 70s RDBMSs were an important way to abstract disk usage, but he argues that memory is plentiful enough these days that we should no longer optimize for disk usage. The thing that irritates him most is that when you query a database, you don't know whether it's going to be able to answer the query from memory or have to incur several 9 ms seek penalties. He calls this a "leaky abstraction". Sounds somewhat reasonable, right? Except Alex takes it to the point where it literally pains him to work at any company that uses an RDBMS.

I have a former co-worker named Jesse M. who hates the Web. He's a senior architect at IronPort Systems; he's the type of guy who can get big things done. Jesse's a user interface purist, and he argues that the Web is a fundamentally broken user interface. The fact that every Web site looks and behaves slightly differently is horrible from a user interface perspective. His pet peeve is that text areas have scroll bars while the page itself also has a scroll bar. Personally, I had never thought to be irritated by that. It's reasonable for a user interface purist to gripe about the Web, but Jesse takes it to what might be considered unreasonable levels. He almost entirely refuses to use the Web. He occasionally writes scripts to scrape data from Web pages just so he can avoid using a Web browser. In fact, he opposed adopting a company wiki because it would require a Web browser.

My buddy Sam R. loves asynchronous networking, but hates writing asynchronous code. These days Erlang is popular enough that people are beginning to see that you can write asynchronous code without breaking everything into callbacks. However, Sam figured this stuff out years ago. He was the inspiration behind Stackless Python and wrote the mail server for eGroups (i.e. Yahoo Groups). There are cases where the performance benefits of asynchronous networking are simply unnecessary, cases where it'd be nice to simply use the built-in server libraries. However, Sam would rather write everything from scratch, hacking at the innermost bits of Python, than put up with synchronous networking APIs or having to write Twisted code. By the way, I'm a convert to Sam's religion. I think Twisted is awful ;)
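To make the "without breaking everything into callbacks" idea concrete, here is a minimal sketch using Python 3's asyncio, a much later standard-library module (not Stackless, and not Sam's code). Each coroutine reads top to bottom like synchronous code; every await is a point where the event loop may switch to another connection, all on one thread. The names handle_echo, ask, and main, and the upper-casing echo protocol, are made up for illustration.

```python
import asyncio


async def handle_echo(reader, writer):
    """Server side: read one line and send it back upper-cased."""
    # Each await may suspend this coroutine so others can run; no callbacks.
    line = await reader.readline()
    writer.write(line.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def ask(port, message):
    """Client side: send one line and return the server's reply."""
    reader, writer = await asyncio.open_connection('127.0.0.1', port)
    writer.write(message.encode() + b'\n')
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply.decode().strip()


async def main():
    # Port 0 asks the OS for any free port.
    server = await asyncio.start_server(handle_echo, '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        # Both clients run concurrently on a single thread.
        return await asyncio.gather(ask(port, 'spam'), ask(port, 'eggs'))


if __name__ == '__main__':
    print(asyncio.run(main()))
```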

Guido van Rossum has an interesting hangup. He built restrictions into the very syntax of Python in order to force programmers to write more readable code. Did you ever edit a piece of C where the indentation didn't match the braces? Guido's response was to write a language where it's not possible to indent the code in a way that conflicts with the meaning of the code.

Similarly, in many functional languages, you can write code like
a = if winner:
        1
    else:
        2
Here, "if" is itself an expression that returns a value. You can do this in pretty much every functional language and in Ruby too. However, Guido feels that the following is more readable:
if winner:
    a = 1
else:
    a = 2
Guido's response: Python makes a distinction between expressions and statements that prevents you from writing the code in a way that Guido feels is less readable.
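Python did later soften this a little: since Python 2.5 (PEP 308), there's a conditional expression that covers the simple case of choosing between two values. The "if" statement is still a statement, so the first snippet above remains invalid Python. The names pick and winner below are just illustrative.

```python
def pick(winner):
    # Conditional expression (Python 2.5+, PEP 308):
    # <value if true> if <condition> else <value if false>
    a = 1 if winner else 2
    return a


if __name__ == '__main__':
    print(pick(True), pick(False))
```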

Well, since it's my blog, I can confess to one more over-the-top hangup. I love style guides. I think following style guides improves code readability. In fact, I have the style guides for C (BSD), Java, Python, and Perl mostly memorized. Sounds like a good idea, right? The problem is that I'm so obsessed with style guides that I have a hard time reading code that doesn't follow the style guide. During code reviews, if a chunk of code doesn't follow the smallest requirement in the Python style guide, I'm so distracted that I can barely focus at all on what the code does or whether it has any bugs.

It's interesting to see how smart, interesting people can take reasonable ideas to their logical conclusions in such a way that those ideas dominate their lives. Sometimes it's in a way that is not only disproportionate to the subject at hand, but even occasionally harmful to them overall.

Thursday, August 23, 2007

What's Going on with my Wireless?

Yesterday, my Dish Network satellite went out and took 20 minutes to come back.

Also yesterday, my wife's Bluetooth headset kept disassociating itself from her phone. She had to reset it several times.

Last night, my access point stopped working. From the access point, I could ping my DNS server. From my laptop, I could establish a wired and/or wireless connection to the access point, but from my laptop, I could not ping my DNS server. The same was true of my wife's laptop. I hadn't changed anything on my AP in months. Finally, I gave up, restored it to factory defaults, and set it up again from scratch. Now it works.

What's going on? Was there a solar flare I didn't hear about? Might it be that my 100-year-old house is not providing "clean" electricity? Any ideas? Weird.

Monday, August 20, 2007

Computer Science: Popular Languages Never Die

It's interesting to me that while a modern Web application seems to have a shelf life of two years, popular programming languages never die. This isn't news, but I thought I'd just point out a few:
FORTRAN
FORTRAN is still a favorite among scientists.
COBOL
COBOL is still alive and well in ERPs and banking systems.
C
C isn't dead by a long shot. Kernels (e.g. Linux) and interpreters (e.g. Python) are still written in C.
Lisp
Even though Lisp was first created nearly 50 years ago, Lisp is still used at various companies like Orbitz, and rest assured that as long as Paul Graham lives, he'll never stop talking about it ;)
APL
APL seems dead, but it's not. Every once in a while, I'll meet a strange hacker who can translate a long algorithm into a single magical incantation of funny symbols in APL.
Forth
Forth is alive and well at the firmware level.
Pascal
Pascal's not dead. It's still being taught as a first programming language.
Ada
Ada is still being used by the military.
JavaScript
You might ask why I bring up JavaScript since it's clearly everywhere. I'm sure that if its designer had known that it was going to be the single most widespread programming language interpreter on the planet, it would have gotten a bit more of a design review ;) JavaScript is wonderful and horrible at the same time, and I highly doubt it will die within my lifetime, even though it'd be wonderful if we could replace it.
Prolog
I recently found out that Prolog is still being used in various natural language processing contexts.
Sed and Awk
Many hackers still use Sed and Awk in shell scripts when the complexity of a larger language like Perl isn't justified.
BASIC
It makes me sad, but there are still kids who learn programming by way of BASIC.
Smalltalk
Smalltalk is still alive and well in projects like Squeak and Seaside.
Assembly
One might think that the only reason to code in assembly is to write the backend for a compiler or to write boot code for an operating system, but assembly is still used anytime your resources are scarce, and you need to code close to the machine.
So the next time someone says that Java is dead, know that he's dead wrong. Java will be around at least as long as COBOL.

Saturday, August 18, 2007

Ruby: All Your Method are Belong to Me

Ruby has a curious approach to protecting instance variables, constants, and private methods.

I've often heard Java programmers criticize Python because it doesn't enforce privacy in any way. Personally, I think that it'd be great if Python could be fully sandboxed like JavaScript can, but sandboxing is a completely separate topic. Preventing a programmer who works on my team from calling a method that I've named _private_method isn't all that interesting to me. If he sees the fact that I've named the method with a leading underscore, and he still feels the need to call it, so be it.
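For comparison, here's how little Python enforces. A single leading underscore is purely a naming convention, and a double leading underscore only triggers name mangling (meant to avoid accidental clashes in subclasses), which anyone can bypass by spelling out the mangled name. The class and method names below are illustrative.

```python
class SuperSecret(object):

    def _private_method(self):
        # One leading underscore is only a convention; nothing enforces it.
        return "Wombats!"

    def __mangled(self):
        # Two leading underscores trigger name mangling: Python stores
        # this method as _SuperSecret__mangled.
        return "More wombats!"


if __name__ == '__main__':
    obj = SuperSecret()
    print(obj._private_method())        # Works; the underscore is just a hint.
    try:
        obj.__mangled()                 # No attribute by this name outside the class.
    except AttributeError:
        print("Blocked by name mangling.")
    print(obj._SuperSecret__mangled())  # The mangled name is still reachable.
```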

Ruby does provide private instance variables, constants, and private methods, but really, those are just suggestions.

For instance, if you override a constant, you just get a warning:
irb(main):001:0> A = 1
=> 1
irb(main):002:0> A = 2
(irb):2: warning: already initialized constant A
=> 2
irb(main):003:0> puts A
2
=> nil
If you have an object, and you want to call a private method, you can just inject a method into that object in order to get access to the private method:
class SuperSecret
  private
  def secret
    puts "Wombats!"
  end
end

obj = SuperSecret.new
begin
  puts obj.secret
rescue
  puts "Yep, it blocked me properly."  # Yep, it gets blocked.
end

def obj.hack_the_secret
  secret
end

obj.hack_the_secret  # Prints "Wombats!"
You can use the same "inject a method" trick to get access to instance variables:
def obj.get_a
  @a
end
In no way am I criticizing Ruby for this behavior. As I said, I think it's a bad situation if you can't trust your team members. I just wanted to point out that in Ruby, the protection mechanisms are really just suggestions ;)

Thursday, August 16, 2007

Treo 650 on Ubuntu 7.04 (Feisty Fawn)

I got my Treo 650 working under Ubuntu 7.04. I think some stuff is broken, because this is harder than it should be.

Create /etc/udev/rules.d/10-local.rules with:
BUS=="usb", SYSFS{serial}=="PalmSN12345678", KERNEL=="ttyUSB[13579]*", SYMLINK="treo"
Then run sudo /etc/init.d/udev restart

Add visor to the end of /etc/modules.

Run sudo modprobe visor

Set up JPilot. The device should be /dev/treo. The speed should be 57600. Yes, I know this shouldn't matter for USB devices, but it won't work if you don't set it.

Remember to hit the hardware sync button and then the JPilot sync button.

Here are some random tips:

Pay attention to the logs: sudo tail -f /var/log/messages

See what /dev/treo is being set to: ls -l /dev/treo

Make sure your user is a member of the dialout group. Mine was by default.

Operating Systems: OpenDarwin Shutting Down

I totally missed this: OpenDarwin is shutting down:
OpenDarwin has failed to achieve its goals in 4 years of operation, and moves further from achieving these goals as time goes on...The original notions of developing the Mac OS X and Darwin sources has not panned out. Availability of sources, interaction with Apple representatives, difficulty building and tracking sources, and a lack of interest from the community have all contributed to this.
I can't say I'm surprised. When it comes to playing fair in the open source world, I simply trust the Linux guys more than I trust Apple. Besides, Darwin isn't even the most interesting thing about OS X--Cocoa is. Tragically, it's closed source.

As you all know, I've been pondering operating systems lately. I just don't think people are going to tolerate Apple's walled garden / vendor lock-in forever. I don't get the sense that Vista is a huge success. Based on my attendance at Linux Expo for the last seven years in a row, Linux seems to be somewhat quiet these days, at least on the desktop side. It makes me wonder what's going to happen on the desktop.

Maybe the desktop is dead, killed by the Web and Google. Maybe the desktop is going to be reborn via the likes of Adobe AIR. I sure hope not. I don't need any more proprietary systems from Adobe! Maybe the desktop and the world-wide Web are both less important these days now that Facebook is functioning as a social operating system.

I don't know, but I do think there is some room for innovation.

Saturday, August 11, 2007

Python: Coding in the Debugger for Beginners

Python has a wonderful interactive interpreter (i.e. shell). IPython is a third-party Python shell that's even nicer, but that's a topic for another post. It's fairly common to code in the shell until you have the code working correctly, and then copy-and-paste it into your program. Developing super-interactively is a great way to keep bugs at bay.

However, sometimes you need more setup before you can start coding. For instance, when writing a Web app in, say, Pylons, you might need an actual request and a database connection before you can start coding what you want. You might even need a form POST before you can start. Ideally, you'd be able to start the shell from the middle of your application at just the right spot. I'm pretty sure that someone out there knows how to get IPython to do the right thing, but I find pdb, the Python debugger, really helpful for this purpose.

First of all, add the following wherever you want to break into the debugger, "import pdb; pdb.set_trace()". If you need some more help about the debugger itself, type "?". Now, you'll want to see what variables you have to play with, so use "dir()". Let's say the "request.environ" object has what you want, but you're not sure under what key. Well, start by printing it, "p request.environ". If you need a nicer looking version, try "from pprint import pprint; pprint(request.environ)". If you find the right key, say 'spam', then print that, "pprint(request.environ['spam'])". Maybe spam is an object, and you need to know what methods it has, "pprint(dir(request.environ['spam']))".

In the Python shell, you can use "help(x)" to find out more about x if x is a module, class, or function. If x is an object, try "help(x.__class__)". In the debugger, "help(x)" doesn't do what you'd expect because pdb treats "help" as one of its own commands; prefix the line with "!" (as in "!help(x)") to make pdb run it as Python. You can also do things like "p x.__class__.__doc__".

The debugger isn't very good for writing new functions or multi-line statements, but it's great for evaluating simple expressions. Once you have the code you want, you can copy-and-paste it back into your program and repeat the process. Writing code in this way is really helpful so that you can deal with the details one at a time without trying to debug a larger program that might have more than one bug. That's not to say I don't make heavy use of print-statement debugging, but using pdb is great if you want to "look around."
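If you want a full shell (multi-line statements, help(), and all) at a specific spot in your program, the standard library's code module can provide one without pdb. Here's a minimal sketch; handle_request and its environ dict are hypothetical stand-ins for a real Pylons request. Note that local=locals() hands the shell a snapshot of the function's variables, so assignments you make in the shell don't change the function's own locals. In Python 3.2 and later, pdb also has an interact command that does much the same thing from inside the debugger.

```python
import code


def handle_request(environ):
    """Stand-in for a web handler; environ is a hypothetical request dict."""
    total = len(environ)
    # Open a regular Python shell right here. The shell sees a snapshot of
    # this function's local variables (environ, total). End the shell
    # (Ctrl-D on Unix) to let the function continue.
    code.interact(banner="Entering shell; Ctrl-D to continue.",
                  local=locals())
    return total


if __name__ == '__main__':
    handle_request({'spam': 1, 'eggs': 2})
```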

Friday, August 10, 2007

Facebook: HTTP GET

On the Internet, no one knows you're a dog.

On Facebook, no one knows you're a GET ;)

Tuesday, August 07, 2007

Python: Database Migrations

As part of my day job, I've written a Rails-style database migration script. This lets you write migrations from one version of a schema to the next. This allows you to develop schemas iteratively. It also lets you upgrade or downgrade the schema. Best of all, if an attempted upgrade fails, it can back it out even if you're not using transactions. Of course, this is based on writing "up" and "down" routines--it's practical, not magical.
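Stripped of SQLAlchemy, Pylons, and MySQL, the "up and down routines" idea fits in a few lines. This is a simplified sketch, not the script itself: it swaps in the standard library's sqlite3 module so it runs anywhere, and apply_up and the table names are hypothetical. It applies each up atom in order, remembers the matching down atom, and if anything fails, runs the remembered down atoms newest-first before re-raising.

```python
import sqlite3


def apply_up(connection, migration):
    """Apply the up half of each (up, down) pair; back out on failure."""
    undo = []
    try:
        for up, down in migration:
            connection.execute(up)
            undo.append(down)
    except Exception:
        # Undo only what actually ran, newest first, then re-raise.
        for down in reversed(undo):
            connection.execute(down)
        raise


migration = [
    ("CREATE TABLE a (id INTEGER)", "DROP TABLE IF EXISTS a"),
    ("CREATE TABLE b (id INTEGER)", "DROP TABLE IF EXISTS b"),
    ("INSERT INTO garbage VALUES (1)", "DELETE FROM garbage"),  # Fails.
]

if __name__ == '__main__':
    conn = sqlite3.connect(':memory:')
    try:
        apply_up(conn, migration)
    except sqlite3.OperationalError as e:
        print("Backed out after:", e)
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    print(tables)  # [] because both CREATE TABLEs were undone.
```

The real script adds transactions, revision tracking, and downgrades on top of this core loop.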

I'm releasing this code in the hope that others will find it useful. It's well-written, solid, and well-tested. This is the type of thing you could probably write in a day. I took four, and polished the heck out of it.

It uses SQLAlchemy to talk to the database. However, that doesn't mean you have to use SQLAlchemy. Personally, I like writing table create statements by hand. You can do either.

My database configuration is stored in a .ini file à la Paste / Pylons. Hence, the script takes a .ini file to retrieve the database configuration. If you don't use Pylons, but you still want to use my script, that's an easy change to make. Migrations are stored in Python modules named like "${yourpackage}.db.migration_${number}.py". Again, I use Pylons to figure out what "${yourpackage}" is, but that's easy enough to change.

The name of my Pylons app is "multicosmic", and the script is installed in my application. You'll need to change the name to match your app.

Start by creating directories and __init__.py files for "multicosmic/db" and "multicosmic/scripts".

First, there's a migrate script in "multicosmic/scripts/migrate.py":
#!/usr/bin/env python

"""This is a script to apply database migrations.

Run this script with the -h flag to get usage information.

Migration Modules
-----------------

Each migration is a module stored in
``${appname}/db/migration_${revision}.py`` where revision starts at 000
(i.e. an empty database). Each such module should have a module-level
global named migration containing a list of pairs of atoms. For
instance::

    migration = [
        # (up, down)
        ("CREATE TABLE A(...)", "DROP TABLE IF EXISTS A"),
        ("CREATE TABLE B(...)", "DROP TABLE IF EXISTS B")
    ]

The up and down atoms may either be SQL strings, or they may be
functions that accept a SQLAlchemy connection.

Since I'm using SQLAlchemy, you might wonder why I'm writing actual SQL.
I like to use the SQLAlchemy ORM. However, when creating tables in
MySQL, there are so many fancy options that I find it easier to write
the SQL by hand.

Error Handling
--------------

* If something goes wrong when down migrating, just let the exception
  propagate.

* If something goes wrong when up migrating, complain, try to back it
  out, and then let the exception propagate. If backing it out fails,
  just let that exception propagate.

* Use transactions as appropriate. There are a lot of cases in
  MySQL where transactions aren't supported. Hence, backing things
  out is sometimes necessary. However, it's also possible that a
  transaction might roll back, and then the code to back things out
  runs anyway. It's best to make your down atoms idempotent. For
  instance, use "DROP TABLE IF EXISTS" rather than just "DROP
  TABLE".

Avoiding SQLAlchemy, Pylons, Paste, and Python 2.5
--------------------------------------------------

I'm using SQLAlchemy, but that doesn't force you to use SQLAlchemy in
the rest of your app. I'm using Paste's configuration mechanism because
that's how my database configuration information is stored. Passing a
CONFIG.ini to the script meets the needs of Paste and Pylons users
perfectly. If you're not one of those users and you want to use my
script, it's easy to subclass it and do something differently.
Similarly, if you're not using Python 2.5, I'm happy to remove the
Python 2.5-isms. Let's talk!

"""

# Copyright: "Shannon -jj Behrens <jjinux@gmail.com>"
# License: I am contributing this code to the Pylons project under the same license as Pylons.

from __future__ import with_statement

from contextlib import contextmanager, closing
from glob import glob
from optparse import OptionParser
import os
import re
import sys
import traceback

from paste.deploy import loadapp
from pylons import config as conf
from pylons.database import create_engine

__docformat__ = "restructuredtext"


class Migrate:

    """This is the main class that runs the migrations."""

    def __init__(self, args=None):
        """Set everything up, but don't run the migrations.

        args
          This defaults to ``sys.argv[1:]``.

        """
        self.setup_option_parser(args)

    def setup_option_parser(self, args):
        """Parse command line arguments."""
        self.args = args
        usage = "usage: %prog [options] CONFIG.ini"
        self.parser = OptionParser(usage=usage)
        self.parser.add_option('-r', '--revision', type='int',
                               help='schema revision; defaults to most current')
        self.parser.add_option('-p', '--print-revision', action="store_true",
                               default=False,
                               help='print current revision and exit')
        self.parser.add_option("-v", "--verbose", action="store_true",
                               default=False)
        (self.options, self.args) = self.parser.parse_args(self.args)
        if len(self.args) != 1:
            self.parser.error("Expected exactly one argument for CONFIG.ini")

    def run(self):
        """Run the migrations.

        All database activity starts from here.

        """
        self.load_configuration()
        self.engine = create_engine()
        self.engine.echo = bool(self.options.verbose)
        with closing(self.engine.connect()) as self.connection:
            self.find_migration_modules()
            self.find_desired_revision()
            self.find_current_revision()
            if self.options.print_revision:
                print self.current_revision
                return
            self.find_desired_migrations()
            self.print_overview()
            for migration in self.desired_migrations:
                self.apply_migration(migration)

    def load_configuration(self):
        """Load the configuration."""
        try:
            loadapp('config:%s' % self.args[0], relative_to=os.getcwd())
        except OSError, e:
            self.parser.error(str(e))
        dburi = conf.get('sqlalchemy.dburi')
        if not dburi:
            self.parser.error("%s: No sqlalchemy.dburi found" % self.args[0])

    def find_migration_modules(self):
        """Figure out what migrations exist.

        They should start at 000.

        """
        package = conf['pylons.package']
        module = __import__(package + '.db', fromlist=['db'])
        dirname = os.path.dirname(module.__file__)
        glob_pattern = os.path.join(dirname, 'migration_*.py')
        files = glob(glob_pattern)
        files.sort()
        # A real list (not map()), since it's iterated twice below.
        basenames = [os.path.basename(f) for f in files]
        for (i, name) in enumerate(basenames):
            expected = 'migration_%03d.py' % i
            if name != expected:
                raise ValueError("Expected %s, got %s" % (expected, name))
        self.migration_modules = []
        for name in basenames:
            name = name[:-len('.py')]
            module = __import__('%s.db.%s' % (package, name),
                                fromlist=[name])
            self.migration_modules.append(module)

    def find_desired_revision(self):
        """Find the target revision."""
        len_migration_modules = len(self.migration_modules)
        if self.options.revision is None:
            self.desired_revision = len_migration_modules - 1
        else:
            self.desired_revision = self.options.revision
        if (self.desired_revision < 0 or
            self.desired_revision >= len_migration_modules):
            self.parser.error(
                "Revision argument out of range [0, %s]" %
                (len_migration_modules - 1))

    def find_current_revision(self):
        """Figure out what revision we're currently at."""
        if self.connection.execute(
                "SHOW TABLES LIKE 'revision'").rowcount == 0:
            self.current_revision = 0
        else:
            result = self.connection.execute(
                "SELECT revision_id FROM revision")
            self.current_revision = int(result.fetchone()[0])

    def find_desired_migrations(self):
        """Figure out which migrations need to be applied."""
        self.find_migration_range()
        self.desired_migrations = [
            self.migration_modules[i]
            for i in self.migration_range
        ]

    def find_migration_range(self):

        """Figure out the range of the migrations that need to be applied."""

        if self.current_revision <= self.desired_revision:

            # Don't reapply the current revision. Do apply the
            # desired revision.

            self.step = 1
            self.migration_range = range(self.current_revision + self.step,
                                         self.desired_revision + self.step)
        else:

            # Unapply the current revision. Don't unapply the
            # desired revision.

            self.step = -1
            self.migration_range = range(self.current_revision,
                                         self.desired_revision, self.step)

    def print_overview(self):
        """If verbose, tell the user what's going on."""
        if self.options.verbose:
            print "Current revision:", self.current_revision
            print "Desired revision:", self.desired_revision
            print "Direction:", ("up" if self.step == 1 else "down")
            print "Migrations to be applied:", self.migration_range

    def apply_migration(self, migration):
        """Apply the given migration list.

        migration
          This is a migration module.

        """
        name = migration.__name__
        revision = self.parse_revision(name)
        if self.options.verbose:
            print "Applying migration:", name
        if self.step == -1:
            with self.manage_transaction():
                for (up, down) in reversed(migration.migration):
                    self.apply_atom(down)
                self.record_revision(revision - 1)
        else:
            undo_atoms = []
            try:
                with self.manage_transaction():
                    for (up, down) in migration.migration:
                        self.apply_atom(up)
                        undo_atoms.append(down)
                    self.record_revision(revision)
            except Exception, e:
                print >> sys.stderr, "An exception occurred:"
                traceback.print_exc()
                print >> sys.stderr, "Trying to back out migration:", name
                with self.manage_transaction():
                    for down in reversed(undo_atoms):
                        self.apply_atom(down)
                print >> sys.stderr, "Backed out migration:", name
                print >> sys.stderr, "Re-raising original exception."
                raise

    def apply_atom(self, atom):
        """Apply the given atom. Let exceptions propagate."""
        if isinstance(atom, basestring):
            self.connection.execute(atom)
        else:
            atom(self.connection)

    def parse_revision(self, s):
        """Given a module name, return the revision number at its end.

        For instance, ``multicosmic.db.migration_003`` gives 3. Raise a
        ValueError on failure.

        """
        # Anchor at the end so digits in the package name are ignored.
        match = re.search(r'(\d+)$', s)
        if match is None:
            raise ValueError("Couldn't find a revision in: %s" % s)
        return int(match.group(1))

    def record_revision(self, revision):
        """Record the given revision.

        The current revision is stored in a table named revision.
        There's nothing to do if revision is 0.

        """
        if revision != 0:
            self.connection.execute("UPDATE revision SET revision_id = %s",
                                    revision)
        self.current_revision = revision

    @contextmanager
    def manage_transaction(self):
        """Manage a database transaction.

        Usage::

            with self.manage_transaction():
                ...

        """
        transaction = self.connection.begin()
        try:
            yield
            transaction.commit()
        except:
            transaction.rollback()
            raise


if __name__ == '__main__':
    Migrate().run()
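The off-by-one reasoning in find_migration_range is the subtle part, so here's the same arithmetic pulled out into a standalone function (a hypothetical refactoring for illustration, not part of the script) with a few worked cases. Migration i takes the schema from revision i - 1 to revision i, so going from revision 3 down to 1 means unapplying migrations 3 and 2.

```python
def migration_range(current, desired):
    """Return (step, revisions) the way Migrate.find_migration_range does.

    Going up, skip the current revision but include the desired one.
    Going down, undo the current revision but stop before the desired one.
    """
    if current <= desired:
        return 1, list(range(current + 1, desired + 1))
    return -1, list(range(current, desired, -1))


if __name__ == '__main__':
    print(migration_range(0, 3))  # (1, [1, 2, 3])
    print(migration_range(3, 1))  # (-1, [3, 2])
    print(migration_range(2, 2))  # (1, []); nothing to do.
```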
It comes with two migrations.

multicosmic/db/migration_000.py:
"""This is the first migration.

It doesn't really do anything; it represents an empty database. It
makes sense that a database at revision 0 should be empty.

"""

__docformat__ = "restructuredtext"


migration = []
multicosmic/db/migration_001.py:
"""Create the revision table with a revision_id column."""

__docformat__ = "restructuredtext"


# I'm using a creative whitespace style that makes it readable both here
# and when printed.

migration = [
    ("""\
CREATE TABLE revision (
    revision_id INT NOT NULL
) ENGINE = INNODB""",
     """\
DROP TABLE IF EXISTS revision"""),

    # Subsequent migrations don't need to manage this value. The
    # migrate.py script will take care of it.

    ("""\
INSERT INTO revision (revision_id) VALUES (1)""",
     """\
DELETE FROM revision""")
]
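migration_001 uses only SQL strings. As the script's docstring says, an atom can also be a function that accepts the connection, which is handy when a step needs a loop or some logic. Here's a hypothetical migration_002.py in that style; the setting table and the seed_settings / unseed_settings names are made up for illustration, and the SQL sticks to syntax that both MySQL and SQLite accept.

```python
"""Hypothetical migration_002: add a setting table and seed it."""

__docformat__ = "restructuredtext"


def seed_settings(connection):
    """Up atom written as a function; it receives the open connection."""
    # A function atom can loop or branch, which a single SQL string can't.
    for statement in ("INSERT INTO setting (name) VALUES ('theme')",
                      "INSERT INTO setting (name) VALUES ('language')"):
        connection.execute(statement)


def unseed_settings(connection):
    """Down atom; it only runs if seed_settings ran, and it's safe to repeat."""
    connection.execute("DELETE FROM setting")


migration = [
    ("CREATE TABLE setting (name VARCHAR(50) NOT NULL)",
     "DROP TABLE IF EXISTS setting"),
    (seed_settings, unseed_settings),
]
```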
Last of all, there are test cases in multicosmic/tests/functional/test_migrate_script.py:
"""Test that the migrate script works."""

# Copyright: "Shannon -jj Behrens <jjinux@gmail.com>"
# License: I am contributing this code to the Pylons project under the same license as Pylons.

from cStringIO import StringIO
import sys

from nose.tools import assert_raises
from sqlalchemy.exceptions import SQLError

from multicosmic.scripts.migrate import Migrate
from multicosmic.db.migration_001 import migration as migration_001

__docformat__ = "restructuredtext"

BASE_ARGS = ['-v', 'test.ini']


def setup_module():
    _do_migration(0)


def teardown_module():
    _do_migration()


def test_setup_option_parser():
    migrate = Migrate(['-r1'] + BASE_ARGS)
    assert migrate.options.revision == 1
    assert migrate.options.verbose


def test_bad_up_migration():
    orig_stderr = sys.stderr
    fake_stderr = StringIO()
    migration_001.append(("INSERT INTO garbage", "DELETE FROM garbage"))
    sys.stderr = fake_stderr
    try:
        migrate = Migrate(['-r1'] + BASE_ARGS)
        assert_raises(SQLError, migrate.run)
    finally:
        sys.stderr = orig_stderr
        migration_001.pop()
    assert fake_stderr.getvalue()
    migrate = Migrate(['-p'] + BASE_ARGS)
    migrate.run()
    assert migrate.current_revision == 0


def test_bad_down_migration():
    _do_migration(1)
    migration_001.append(("INSERT INTO garbage", "DELETE FROM garbage"))
    try:
        migrate = Migrate(['-r0'] + BASE_ARGS)
        assert_raises(SQLError, migrate.run)
    finally:
        migration_001.pop()
    migrate = Migrate(['-p'] + BASE_ARGS)
    migrate.run()
    assert migrate.current_revision == 1


def _do_migration(revision=None):
    """Construct and run the Migrate class. Return it."""
    args = BASE_ARGS
    if revision is not None:
        args = ['-r%s' % revision] + BASE_ARGS
    migrate = Migrate(args)
    migrate.run()
    if revision is None:
        assert migrate.current_revision > 0
    else:
        assert migrate.current_revision == revision
    return migrate
I use nose for my tests. You can find out more about using nose with Pylons, including things to watch out for, here.

If this code works out for you, leave me a comment :)

Monday, August 06, 2007

Vim: VimOutliner


I make heavy use of a nicely indented notes file and a TODO file. Until recently, I had never used an outline editor, even though my files were basically outlines. I saw my buddy, Alex Jacobson, using his outline editor, and I decided to try out the one for Vim. Within a couple hours, I was hooked!

Actually, there are several outline plugins for Vim, but I think that VimOutliner is the best.
  • It has nice syntax highlighting for the different levels.
  • It manages Vim's folding as you would expect.
  • It understands how to put a paragraph of text under a heading and how to automatically turn on line wrapping.
  • It supports checkboxes, and it's really smart about working with them.
  • It supports inter-document linking.
  • It has a nice menu, so you don't have to memorize the documentation before getting started.
Best of all, since it's a Vim plugin, it fits right in with my blazing-fast Vim editing skills.

Friday, August 03, 2007

Pondering Operating Systems

For a long time, my goal has been to develop a higher-level, natively-compiled programming language, and then to develop a proof-of-concept kernel in it. Well, someone else beat me to the punch.

House is a proof-of-concept operating system written in Haskell. It has some simple graphics, a TCP/IP stack, etc. Naturally, it's just a research project, but achieving proof of concept was my goal too.

On that subject, I'm also keeping my eye on Microsoft's Singularity. It's a microkernel, and much of it is written in C#. Unlike most microkernels, the different components do not run in separate address spaces. The VM does protection in software, rather than hardware. I had been toying with this idea too, but my buddy Mike Cheponis informed me that VM/360 did it decades ago.

Is anyone other than me bummed that BeOS never took off? I'm sadly coming to the conclusion that Linux might not ever make it on the desktop. It's just not a priority. Too many great hackers use Linux on their servers and Macs as their laptops. There's always hope that Haiku might recreate BeOS in a completely open source way, but it would have been a lot easier if Be had simply open sourced it in the first place.

In the meantime, SkyOS thinks that there is room for another easy-to-use, featureful, proprietary OS. Apple succeeded at this. BeOS failed. It's hard for me to get excited about a new proprietary OS. I'd sooner buy a Mac (although I still haven't been fully de-Stallmanized).

Speaking of which, has anyone else noticed how few non-UNIXy, open-source operating systems there are? Maybe it's true that Systems Software Research is Irrelevant.

Well, now that House has shown that you can write a kernel in Haskell, I think I need a new goal. Maybe I'll go solve world hunger. I've heard there's a little squabble going on in the Middle East that could use some attention. Maybe I'll go write an entire operating system in modern assembly; oh wait, it's already been done ;)

Tuesday, July 17, 2007

Python: Look What the Stork Dragged In

Well, this is supposed to be "a purely technical blog concerning topics such as Python, etc.", so let me start by showing off a quick little Python utility that I had to write at a moment's notice:
#!/usr/bin/env python

"""Help Gina-Marie time her contractions."""

import time

SECS_PER_MIN = 60


last_start = None
while True:
    print "Press enter when the contraction starts.",
    raw_input()
    start = time.time()
    if last_start:
        print "It's been %s minutes %s seconds since last contraction." \
            % divmod(int(start - last_start), SECS_PER_MIN)
    last_start = start
    print "Press enter when the contraction stops.",
    raw_input()
    stop = time.time()
    print "Contraction lasted %s seconds." % int(stop - start)
    print

If you want to find out more, read the comments ;)

Thursday, July 05, 2007

Humor: Haskell vs. Lisp

Haskell is such a purely functional programming language, it makes Lisp feel imperative in comparison!

Friday, June 29, 2007

Computer Science: Coping with Unknown Types

What do "void *" (a la C), polymorphism (a la C++ classes), interfaces (a la Java), generics (a la C++ templates), and duck typing (a la Python) all have to do with one another? They're all ways in which you can write code that works with types that you didn't envision when writing the code.

A "void *" in C is a pointer to something of unspecified type. You can't do very much with it unless you know what type the something is. However, you can still pass it around. You can store it in a list or tree. You can take it and later pass it back to a callback function. All of these things are useful, and, in fact, this functionality still exists in Java in a much safer form: instead of casting to "void *", you cast to "Object".

Polymorphism in languages like C++ and Java lets you take an object and call methods on it without necessarily knowing exactly which subclass the object is a member of. Let's suppose there is a class named Fruit with a method named peel, and let's suppose there are two subclasses named Apple and Orange. If you have a list of apples and oranges, you can loop over that list and call peel on each fruit. Even better, if someone later creates a Lemon class, and slips a few lemons into that list, your code will still know how to peel them.

However, what if you don't want to subclass Fruit? What happens if you have an object that knows how to peel itself as well as a ton of other things? Do you need to subclass multiple classes? An interface in Java (or a typeclass in Haskell) lets you say that your object knows how to peel itself, without requiring any specific subclassing. Instead, it can implement some Peelable interface, and that's close enough. Hence, instead of peeling a list of fruit, your code can now peel a list of objects that each implement the Peelable interface. Those objects might not be related at all, and they're free to implement all sorts of interfaces aside from just the Peelable interface.
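Python has no `implements` keyword, but since Python 3.8 the standard `typing.Protocol` class can describe a Peelable-style interface. Unlike Java's interfaces it is structural rather than nominal: any class with a matching `peel()` method qualifies, with no subclassing or declaration. Here is a minimal sketch. `Peelable`, `Banana`, `Tangerine`, and `peel_all` are made-up names for illustration:

```python
from typing import Iterable, List, Protocol, runtime_checkable


@runtime_checkable
class Peelable(Protocol):
    """Any object with a peel() method returning str matches this protocol."""

    def peel(self) -> str:
        ...


class Banana:
    def peel(self) -> str:
        return "peeled banana"


class Tangerine:
    # No shared base class with Banana; it just has the same method.
    def peel(self) -> str:
        return "peeled tangerine"


def peel_all(items: Iterable[Peelable]) -> List[str]:
    """A static type checker accepts any items that have a matching peel()."""
    return [item.peel() for item in items]


print(peel_all([Banana(), Tangerine()]))  # ['peeled banana', 'peeled tangerine']
# @runtime_checkable lets isinstance() check that a peel attribute exists;
# it does not check the method's signature.
print(isinstance(Banana(), Peelable))  # True
print(isinstance(42, Peelable))  # False
```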

Generics (which C++ calls templates) let you write code and leave blanks in it that can be filled in later. Generics are an interesting subject, and the question really comes down to: what kinds of things can you leave blank?

Generics in Java are actually pretty weak. It used to be that if you wanted a list, you had to have a list of Objects (remember the "void *" trick?). You didn't know exactly what was in the list. These days, with Java generics, you can tell the compiler that you're creating a list, and that the list can only contain Apples. The list is generic, and you're "filling in the blank" with the type Apple. However, generics in Java are limited. For instance, you can't write generic code that says "Create a new [blank]". (If I'm wrong, please leave a comment!)
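The "Create a new [blank]" limitation comes from type erasure. Java compiles generic code once and discards the type argument, so `new T()` has nothing to construct at runtime. The usual workaround is to pass in a `Class<T>` or a factory object. Python sidesteps the question because classes are ordinary values, so the "blank" can simply be passed as an argument. Here is a small sketch using the standard `typing` module. `make_several` and `Apple` are hypothetical:

```python
from typing import List, Type, TypeVar

T = TypeVar("T")


class Apple:
    def __repr__(self) -> str:
        return "Apple()"


def make_several(cls: Type[T], count: int) -> List[T]:
    """'Create a new [blank]' count times, where the blank is cls."""
    return [cls() for _ in range(count)]


apples = make_several(Apple, 3)
print(apples)  # [Apple(), Apple(), Apple()]
```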

C++ templates are more powerful. You can do all the same things that you can do in Java, but you can also do things like create a template that says "Create a new [blank]". The differences have to do with how the compiler implements templates. When you tell the compiler that you want to "fill in the blank" with an Apple, i.e. "Create a new Apple", that's called instantiating the template. By the way, this is something that happens at compile time. Now, let's suppose you have a template for lists, and you want a "list of Apple" and a "list of Orange". One way the compiler can implement this is to take the code for list and fill in all the blanks with Apple, then take the code for list and fill in all the blanks with Orange. You'd end up with two slightly different versions of the same code in the compiled binary. I don't know how modern C++ compilers do it, and feel free to call me ignorant, but it really makes me suspicious when I see how big C++ binaries are compared to C binaries ;)

Generics in functional programming languages like Haskell are even more impressive. If a function takes a fruit and then peels it, Haskell can automatically figure out that the function will work with any object that can be peeled. The impressive thing is that it can in many cases automatically infer this interface at compile time! You don't even have to tell the compiler that you're trying to write generic code. (Note, I'm handwaving a little about when you do and don't need to use type classes.)

Duck typing (also known as latent typing) achieves the same goal, but does so using runtime checks. Hence, if you write a function that takes an object and peels it, you don't need to subclass anything or write an interface. However, at runtime, the interpreter will figure out if the object actually knows how to peel itself. On the one hand, you don't get as many compile-time safety checks, but on the other hand, it's really easy to understand. You can accept whatever objects you want, and call any methods you want, and if it doesn't actually work at runtime, you'll get a nice exception that you can respond to in a controlled manner. There's an old joke that says that C++ is like juggling chainsaws in full body armor, whereas Python is like juggling rubber chickens. Even better, you can do tricks like have the same object respond to any method. For instance, you can call any method on a proxy object, and it will just proxy that method call to the object it is acting as a proxy for. The same proxy object can proxy any object with any interface.
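Here is a small Python sketch of both ideas. The first is a function that peels whatever it is handed and deals with failure at runtime. The second is a proxy that forwards any method call to the wrapped object via `__getattr__`. The class names are made up for illustration:

```python
class Orange:
    def peel(self):
        return "orange peeled"


class Potato:
    # Not a fruit and not declared as anything; it just has peel().
    def peel(self):
        return "potato peeled"


class Rock:
    pass


def peel_everything(things):
    """Peel each thing; whether it can be peeled is only checked at runtime."""
    results = []
    for thing in things:
        try:
            results.append(thing.peel())
        except AttributeError:
            # The "nice exception" we can respond to in a controlled manner.
            results.append("can't peel %s" % type(thing).__name__)
    return results


class Proxy:
    """Forward any attribute lookup (and so any method call) to a target."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Only called when normal lookup on the Proxy itself fails.
        return getattr(self._target, name)


print(peel_everything([Orange(), Potato(), Rock()]))
# ['orange peeled', 'potato peeled', "can't peel Rock"]
print(Proxy(Potato()).peel())  # potato peeled
```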

Ok, that was a pretty basic overview of a bunch of related language features. As I said, they're all ways in which you can write code that works with types that you didn't envision when writing the code. Now, take a minute and think about that problem and the many different ways to solve it. If you wrote a new language, how might you solve it differently? Leave me a comment below!

Wednesday, June 27, 2007

10 Reasons Big Projects Suck

Have you ever noticed that big projects inevitably get a bad rap? Here are 10 reasons why:

  1. Let's assume for a moment that there's one bug for every 100 lines of code. If a big project has 10 times as much code as a small project, it has 10 times as many bugs. In reality, because big projects are harder to understand and intrinsically harder to change quickly, it probably has more than 10 times as many bugs.

  2. If a big project implements some feature A, there is bound to be some bug in it. That proves that the big project is buggy. Furthermore, inevitably, the feature isn't exactly what you need. That means it's inflexible.

  3. If, on the other hand, the smaller project doesn't implement feature A, it can't possibly have the same bug the big project has. Hence, it's not buggy. Furthermore, since you'll need to implement feature A yourself, you'll probably implement exactly what you need. That means it's more flexible.

  4. Furthermore, there are a lot of people who don't even want feature A. That proves that the big project is bloated.

  5. If a developer is a member of a big project, he is probably already using it in production, and he doesn't much care what some young, know-it-all kid says about his code. Ever wonder why Microsoft doesn't seem to care when people criticize it? They're too busy making money!

  6. However, if a developer is a member of a small project, he can afford to make fun of the big project. No one knows who he is, so they surely can't insult his work. He has security by obscurity!

  7. Furthermore, since so many people have worked on the large project, he can insult it vehemently without feeling morally responsible for insulting another person's hard work. It's like a shoplifter who shoplifts small items from large stores thinking the large store is too big to care.

  8. Let's suppose 1 out of every 10 projects succeeds. The other 9 will make claims that turn out to be false, but since they don't succeed, no one remembers. The 10th project, on the other hand, will make claims that turn out to be true, so it has instant credibility. Hence, it is free to make claims, and many people won't even bother to verify or question those claims...at least until it becomes a big project and people start realizing that 9 out of 10 of its claims are actually false.

  9. If you only need to implement 1 feature, you can do so in code that is very simple and direct. Now, if you need to implement 10 features, there is bound to be some duplication. Hence, you can either a) live with the duplication, or b) refactor. If you live with the duplication, your project will be plagued with bugs that need to be fixed in multiple places. (Don't repeat yourself!) However, if you refactor the code, you'll end up with code that is (necessarily) more complex than when you only needed to implement 1 feature. Younger coders may not even be able to understand the code at all. Hence, they'll just call it stupid, bloated, and overly complicated.

  10. If a project is successful, it'll make it into production. Furthermore, people will need new features in the product. In implementing those new features, it may be necessary to refactor. When you refactor, you may need to decide whether to a) keep the existing API, b) rewrite the API, or c) create a compatibility layer. If you keep the existing API, you'll have to somehow "tack on" the additional functionality within that API. This may result in a hideous, unintuitive API. If you rewrite the API, you'll break everyone's code. If you provide a compatibility layer, you'll end up with twice as many APIs to support. Hence, implementing new features is the fastest way to end up with legacy cruft!

If you know more reasons why big projects suck, post them below! :-D

Monday, June 25, 2007

Random Comments from Google Developer Day

I went to Google Developer Day. Yeah, yeah, I know, that was weeks ago, and I'm only finally blogging about it now. Better late than never! Here are some random, sparse comments:

Keynote

There were about 1500-5000 developers worldwide attending this event. A ton of APIs were launched in 2006. The speaker mentioned Yahoo Pipes. Google Mashup Editor is a mashup of mashups. I felt overwhelmed pretty quickly. Gears is about offline access for Web apps. It supports all major browsers and all major platforms. It's pretty weird to see SQL in JavaScript. It's based on SQLite. There is a managed "sync" process. Google Reader will soon work offline. They're working closely with Adobe (e.g. Apollo). It was weird to hear the Adobe guy say, "Works on Linux". Sergey has a great sense of humor.

Gears Talk

You can configure a set of URLs for it to capture for use offline. This stuff is stored in a place separate from the normal browser cache. I saw a bit of code, "rs.getFieldByName('name')". Ugh! Why must people force JavaScript to look like Java? Don't they know that they could do "rs.name"? They implemented a worker pool so you can run JavaScript in the background. These are processes, so they don't have shared state. You can pass code as strings between the processes. They're adding full text search to SQLite.
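Gears itself exposed a JavaScript API. For comparison, the same "just let me write rs.name" idea can be sketched with Python's built-in sqlite3 module, which accepts a custom `row_factory` callable that receives `(cursor, row)`. `namespace_factory` and the `people` table below are hypothetical:

```python
import sqlite3
from types import SimpleNamespace


def namespace_factory(cursor, row):
    """Hypothetical row factory that allows row.name style access."""
    # cursor.description holds one tuple per column; item 0 is the name.
    names = [column[0] for column in cursor.description]
    return SimpleNamespace(**dict(zip(names, row)))


conn = sqlite3.connect(":memory:")
conn.row_factory = namespace_factory
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.execute("INSERT INTO people VALUES (?, ?)", ("Ada", 36))
row = conn.execute("SELECT name, age FROM people").fetchone()
print(row.name, row.age)  # Ada 36
conn.close()
```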

Google Infrastructure Talk

Google was still at Stanford in '97. In their current design for servers, they went back to not using cases for the servers. They're still using low end hardware. Note that GFS is not at the kernel level. They have 200+ clusters. MapReduce is not used for user search. It's more for heavy duty tasks like indexing. BigTable is pretty amazing. It's a distributed, multi-dimensional, sparse map. They have fine-grained load balancing and fast recovery. They have distributed locks and a locking service. Their largest [BigTable?] is 3000TB on several thousand machines. I asked, and he said that open sourcing GFS "isn't unthinkable".
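To make "distributed, multi-dimensional, sparse map" a bit more concrete: BigTable's map is indexed by a row key, a column key, and a timestamp. Here is a single-machine toy of just that data model. `ToyTable` and the sample keys are hypothetical. It ignores everything that makes BigTable interesting at scale, including distribution, persistence, load balancing, and recovery:

```python
from collections import defaultdict


class ToyTable:
    """Toy sparse map: (row key, column key, timestamp) -> value.

    Only cells that were actually written take up space.
    """

    def __init__(self):
        # row -> column -> list of (timestamp, value), newest first
        self._rows = defaultdict(dict)

    def put(self, row, column, timestamp, value):
        versions = self._rows[row].setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before timestamp (None means latest)."""
        for ts, value in self._rows.get(row, {}).get(column, []):
            if timestamp is None or ts <= timestamp:
                return value
        return None  # empty cell: nothing stored for it


table = ToyTable()
table.put("com.example.www", "contents:", 3, "<html>v3")
table.put("com.example.www", "contents:", 5, "<html>v5")
table.put("com.example.www", "anchor:other.example", 9, "Example")
print(table.get("com.example.www", "contents:"))  # <html>v5
print(table.get("com.example.www", "contents:", timestamp=4))  # <html>v3
print(table.get("com.example.www", "anchor:missing"))  # None
```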

Google Web Toolkit

Ajax lets the server be stateless. The Java IDE they're using works with Google Web Toolkit even though the Java is being compiled down to JavaScript. Even setting breakpoints works, although to do this they're using a "hosted Web browser". A big benefit of GWT is that the IDE's refactoring support can be used on code that is getting compiled to JavaScript. All the compiling is done behind the scenes, so it feels more like editing a scripting language than editing a compiled language. In general, they prefer functionality over "bling". Hence, they prefer native UI elements rather than recreating all elements from scratch. GWT does the right thing with browser history. They use property files for I18N. They can catch errors in the property files at compile time. They do have nice advanced widgets. Font size changes are handled gracefully. The functional demos were pretty impressive. GWT takes care of managing the image cache really nicely. The compiler only puts in the JS libraries you actually need. Cute quote: "Even though it's open source, we decided to document it." If you're using GWT, you don't need to be an expert in browser quirks, you just need to know Java. GWT supports inline JavaScript.

Alex Martelli's Design Patterns in Python Talk

This is the third time I've seen this talk, and this time I was able to understand everything he said ;)

Theorizing the Data: Avoiding the Capital Mistake

This was a great talk about statistical approaches to linguistics. Papers using probability and statistics were really big at the ACM in 2006. Everyone is fighting the spam problem. The speaker emphasized that more data leads to better results, which is why he went to Google. Lots of data makes for good machine learning, which in turn makes for more usable language translations. In automated translation, nothing matters more than statistics. Getting hints from linguists wasn't all that helpful when they tried it. It would appear that humans may learn language by having a statistical understanding of patterns; after all, there are too many rules with too many exceptions.
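As a toy illustration of learning from counts rather than rules, here is a bigram model. It counts which word follows which and predicts the most frequent follower. More text means better counts. The corpus and function names are made up, and real statistical translation systems are far more elaborate:

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def most_likely_next(model, word):
    """Predict the most frequent follower, or None for an unseen word."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigrams(corpus)
print(most_likely_next(model, "the"))  # cat
print(most_likely_next(model, "fish"))  # None
```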

Tuesday, June 19, 2007

Computer Science: Smart Code Reloading

How do you reload code at a per-module level? How do you deal with the data that the module might contain?

Reloading code on the fly is something that the original Lisp machines were famous for. Erlang/OTP is famous for this too. In my own project, Aquarium, which is a Python Web application framework, I used to do this trick as well.

In Python, reloading code is relatively easy (with a bunch of caveats having to do with import "graphs" and inheritance hierarchies). However, what do you do with the data? When you reload the module, the old data in that module is lost.
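A quick way to see the data problem in modern Python is `importlib.reload`. The sketch below writes a throwaway module to a temporary directory, accumulates some module-level data, and then reloads it. The module name `counter_mod` is made up:

```python
import importlib
import pathlib
import sys
import tempfile

# Write a throwaway module to a temporary directory so we can reload it.
tmpdir = tempfile.mkdtemp()
pathlib.Path(tmpdir, "counter_mod.py").write_text("hits = []\n")
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

counter_mod = importlib.import_module("counter_mod")
counter_mod.hits.append("request 1")
counter_mod.hits.append("request 2")
print(len(counter_mod.hits))  # 2

# reload() re-executes the module's top-level code, so "hits = []" runs
# again and the data collected so far is gone.
counter_mod = importlib.reload(counter_mod)
print(len(counter_mod.hits))  # 0
```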

I've always wondered how the Lisp guys did it. How did they cope with changes in the data format? If you have a list of tuples of length 3, what happens if the new code expects a list of tuples of length 4?

In Rails land, they have database migration scripts. Hence, you specify the entire schema as an iterative set of changes to the database, starting from an empty database. You can also back out a migration if things don't work out.
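The same up/down idea can be sketched in Python with the standard sqlite3 module. This is not Rails' API. The `MIGRATIONS` layout (a list of `(up_sql, down_sql)` pairs per migration) and the `migrate` and `current_revision` helpers are hypothetical. Revision N means the first N migrations have been applied, and the revision is kept in SQLite's `PRAGMA user_version` header field:

```python
import sqlite3

# Each migration is a list of (up_sql, down_sql) pairs.
MIGRATIONS = [
    [("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
      "DROP TABLE users")],
    [("CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)",
      "DROP TABLE posts")],
]


def current_revision(conn):
    # SQLite reserves an integer in the file header for the application.
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate(conn, target):
    """Apply 'up' steps or back out 'down' steps until we reach target."""
    rev = current_revision(conn)
    while rev < target:
        for up_sql, _ in MIGRATIONS[rev]:
            conn.execute(up_sql)
        rev += 1
        conn.execute("PRAGMA user_version = %d" % rev)  # PRAGMAs take no params
    while rev > target:
        rev -= 1
        # Undo this migration's statements in reverse order.
        for _, down_sql in reversed(MIGRATIONS[rev]):
            conn.execute(down_sql)
        conn.execute("PRAGMA user_version = %d" % rev)
    conn.commit()


conn = sqlite3.connect(":memory:")
migrate(conn, 2)
print(current_revision(conn))  # 2
migrate(conn, 1)  # back out the posts table
print(current_revision(conn))  # 1
```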

I'm going to make a hypothesis. I suspect Erlang/OTP already does it this way using Mnesia, their in-process, distributed database. First of all, you don't keep any state at the module level. In true functional style, data is on the "stack" (although how the language is implemented is something else). Data that needs to survive a module reload is stored in an in-process "database". Note, I'm using the term "database" loosely, and I'm definitely not talking about SQL. To change the data format of the data stored in the "database", you write a migration. Hence, when you reload a module, you get new code, and you migrate the old data.
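Here is a rough Python sketch of that hypothesis, using the list of 3-tuples that becomes a list of 4-tuples from the question above. It is not how Erlang/OTP or Mnesia actually implement upgrades. `STORE`, `DATA_MIGRATIONS`, and `upgrade` are hypothetical names:

```python
# State that must survive a reload lives in one store, tagged with a
# format version, rather than in module-level globals.
STORE = {"version": 1, "orders": [("alice", "widget", 2), ("bob", "gadget", 1)]}


def orders_v1_to_v2(orders):
    """The new code expects 4-tuples, so add a default price of 0.0."""
    return [(who, item, qty, 0.0) for who, item, qty in orders]


# Maps "from version" to the function that upgrades data to version + 1.
DATA_MIGRATIONS = {1: orders_v1_to_v2}


def upgrade(store, target_version):
    """Run migrations in order until the data matches the new code."""
    while store["version"] < target_version:
        step = DATA_MIGRATIONS[store["version"]]
        store["orders"] = step(store["orders"])
        store["version"] += 1
    return store


# After loading code that expects version 2, migrate the old data.
upgrade(STORE, 2)
print(STORE["orders"][0])  # ('alice', 'widget', 2, 0.0)
```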

(Thanks go to Alex Jacobson and Mike Cheponis for many stimulating discussions.)

Monday, June 18, 2007

Linux: Ubuntu 7.04 on a Compaq Presario C500

I got Ubuntu 7.04 working on a new Compaq Presario C500 laptop. It's running really well, and it only cost me $479 :-D

The Ubuntu installer voluntarily resized sda1 (the primary partition) to 41,595 MB. I'm super impressed that it knows how to resize an NTFS partition! Hence, dual-booting Ubuntu and Windows Vista was really easy. By the way, I left sda2 alone. It's 5,946 MB, and it contains the Compaq restore image.

By the way, does anyone else feel that Compaq computers running Windows are simply an ad delivery mechanism? The default 512 MB of RAM is scarcely usable in Vista. Fortunately, it's just fine under Ubuntu.

I set up wireless using ndiswrapper.

I kept hitting the touchpad with my thumb, which was messing me up when I was typing. Hence, I did:
  • apt-get install gsynaptics
  • Set "SHMConfig" to "true" in the touchpad section of /etc/X11/xorg.conf.
  • I restarted X.
  • I ran the touchpad preferences utility at System :: Preferences :: Touchpad. I disabled tapping.
  • The next time I logged in, the sensitivity was turned down all the way. Hence, I had to use the touchpad preferences utility to turn it up again.
To get the screen to work at the right resolution, I ran "apt-get install 915resolution" and rebooted.

I've noticed that suspend does not work, but hibernate does.

Anyway, I'm happy :-D

Thursday, June 14, 2007

Ruby: I'm on DZone Again!

Wahoo! I ended up on DZone again! This time, it was for Ruby: A Python Programmer's Perspective. I'm pretty excited that I've made it onto DZone using multiple different languages! :-D

Wednesday, June 13, 2007

Ruby: A Python Programmer's Perspective

As a "language lawyer", it's fun to learn new languages and see how they differ in subtle ways. Here are some of the many ways Ruby is different from Python, etc. Most of these aren't necessarily good or bad, they're just different. Looking at the differences, it's fun to try to peek into the design decisions behind the languages. If you've noticed more interesting differences, post them below as comments!

In Ruby, class names, module names, and constants must begin with an uppercase letter. Actually, this reminds me of Haskell.

Ruby uses "end" instead of indentation. That's fine unless you're a Python programmer like me who keeps forgetting to type "end" ;)

Ruby doesn't have true keyword arguments like Python. Instead, if you pass ":symbol => value" pairs to a function, they get put into a single hash. Python can act like Ruby using the "**kwargs" syntax, but Ruby cannot act like Python; it cannot explicitly declare which keyword arguments are acceptable in the function signature.
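Here is a small illustration of the Python side, using a made-up `connect` function. Declared keyword arguments reject typos, whereas `**kwargs` collects everything into one dict, much like the Ruby hash:

```python
def connect(host, port=5432, timeout=None):
    """Keyword arguments are declared right in the signature."""
    return (host, port, timeout)


def connect_loose(host, **kwargs):
    """Ruby-style: every keyword argument lands in a single dict."""
    return (host, kwargs)


print(connect("db", timeout=5))        # ('db', 5432, 5)
print(connect_loose("db", timeout=5))  # ('db', {'timeout': 5})

try:
    connect("db", tiemout=5)  # misspelled keyword
except TypeError as exc:
    print("rejected:", exc)
```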

Ruby is like Perl in that the last value of a function is its implicit return value. In Python, if there isn't an explicit return statement, the return value is implicitly None.

Ruby does not make a distinction between expressions and statements like Python does. Hence, you can do:

  a = if 5 > 6
        7
      else
        puts "hi"
      end
This is like Scheme and ML.

Ruby is much more clever than Python at figuring out how to translate end of lines into statements. For instance, the following works in Ruby, but not in Python.

  a = 2 +
    2
Python would require parentheses (or a trailing backslash).

I'm still trying to figure out the proper style in Ruby for when you should use parentheses in function calls and when you should leave them out. The distinction is idiomatic.

Ruby's string interpolation syntax is '"foo #{2 + 2}"'. Python uses '"foo %s" % (2 + 2,)'.

The syntax for declaring a class method (what Java calls a static method) is strange, since Python uses "self" for something very different:

  def self.my_class_method
    puts "hi"
  end
I must admit that "@a" is easier to type than Python's "self.a" without any significant loss in readability.
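For comparison, the closest Python equivalents are `@classmethod` and `@staticmethod`. A classmethod receives the class as `cls`, much like `self` inside Ruby's `def self.my_class_method`. A staticmethod receives neither. The `Greeter` classes are a made-up example that also shows `self.name` where Ruby would write `@name`:

```python
class Greeter:
    greeting = "hi"

    def __init__(self, name):
        self.name = name  # Ruby would write @name = name

    @classmethod
    def my_class_method(cls):
        # cls is Greeter here, or a subclass if called through one.
        return cls.greeting

    @staticmethod
    def my_static_method():
        # Like a Java static method: no instance and no class passed in.
        return "hello"


class LoudGreeter(Greeter):
    greeting = "HI"


print(Greeter.my_class_method())      # hi
print(LoudGreeter.my_class_method())  # HI
print(Greeter("Ada").name)            # Ada
```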

Single quoted strings in Ruby are like single quoted strings in Perl or like raw strings in Python. They get less interpretation.

Instance variables are never directly accessible outside the class in Ruby, unlike in Python or even Java.

In Python, you may use a publicly accessible member on day one and change it to a property on day two behind everyone else's back. In Ruby, you use an attribute on day one. Fortunately, the syntax is very convenient, "attr_accessor :name". This is much more succinct than explicit getters and setters in Java.
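Here is a minimal sketch of the Python "day two" move, using made-up `PersonV1` and `PersonV2` classes. Code that reads or assigns `person.name` doesn't change when the attribute becomes a property:

```python
class PersonV1:
    """Day one: a plain public attribute."""

    def __init__(self, name):
        self.name = name


class PersonV2:
    """Day two: the same p.name syntax, now routed through a property."""

    def __init__(self, name):
        self.name = name  # goes through the setter below

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        if not value.strip():
            raise ValueError("name must not be blank")
        self._name = value.strip()


# Calling code is identical for both versions.
for cls in (PersonV1, PersonV2):
    person = cls("  Ada ")
    print(cls.__name__, repr(person.name))
# PersonV1 '  Ada '
# PersonV2 'Ada'
```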

Ruby has protected and private members, which Python and Perl purposely chose to leave out. A private member in Ruby is even more private than a private member in Java. If you have two instances of the same class, in Java one instance can access the other instance's private members, but that's not true in Ruby.

Ruby uses modules as mixins instead of using multiple inheritance to support mixins.

Ruby embraces what Python calls "monkey patching".

Python programmers generally try to avoid using eval, but I don't think that's the case in Ruby.

Ruby uses "items << item" to append to a list. Python uses "items.append(item)". PHP uses "items[] = item". This is one place where every language does it differently.

Ruby has "%w{ foo bar bat }", which is like Perl's "qw( foo bar bat )". Python doesn't have a construct for this, but you can use "'foo bar bat'.split()".

Ruby makes heavy use of symbols, like Scheme and Erlang (which calls them atoms). Python doesn't have symbols, so strings are used instead.

Ruby uses "elsif", whereas Python uses "elif".

Ruby doesn't need a colon in control structures, but it does require an end of line or a semicolon. Hence, to do a one liner in Python, it's:

  if 11 > 10: print "yep"
whereas in Ruby it's:
  if 11 > 10; puts "yep"; end
As everyone knows, Ruby supports blocks. Personally, the use of "|a, b|" to denote arguments to the block seems really strange to me. Who uses pipes for arguments? They don't even pair like parentheses do!

In Python, there's a much stronger emphasis on passing functions. I'm sure that it's possible in Ruby, but it's more natural to pass a block instead.

Ruby has a very different syntax for exception handling than most languages:

  begin
    a = some_func
  rescue FuncFailed
    puts "I'm hosed!"
  end
When you unmarshal an object in Ruby, all the class definitions have to be loaded already. Python will import them for you, assuming they can be imported.

Ruby allows function names like "empty!" and "empty?" which is clearly a matter of taste, but I like it. This is probably inspired by Scheme.

For some reason, it seems like using "help(String)" in Ruby is pretty slow, whereas using "help(str)" in Python is pretty fast. I wonder if Ruby doesn't have the docstrings attached to the object at runtime like Python does. In Python, this stuff is always loaded unless you use "-OO" for optimization.
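Here is a tiny illustration of Python docstrings living on the object at runtime, using a made-up `peel` function:

```python
def peel(fruit):
    """Return fruit with its peel removed."""
    return fruit


# The docstring is an ordinary attribute on the function object, which is
# what help() reads. Running python with -OO strips it, leaving None.
print(peel.__doc__)
```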

The rest of these comments are inspired by: http://books.rubyveil.com/books/ThingsNewcomersShouldKnow

What Ruby calls "'foo'[0]", Python calls "ord('foo'[0])". What Python calls "'foo'[0]", Ruby calls "'foo'[0,1]" or "'foo'[0].chr".

In Ruby, a failed lookup in a hash returns a default value, which is usually nil. You can set a different default if you want. Python will raise a KeyError if you try to access a key that doesn't exist. However, Python 2.5 added collections.defaultdict (and the __missing__ hook for dict subclasses), so you can now get a default value for missing keys. Now I know where they got that idea from ;)

In Ruby, you say "(hash[key] ||= []) << value", whereas in Python, you say "hash.setdefault(key, []).append(value)."
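Here is a sketch of the Python options side by side: `collections.defaultdict`, the `setdefault` idiom, and a `dict` subclass with `__missing__`. The last one behaves more like a Ruby hash with a default, because it returns the default without storing it. The word list and the `DefaultingDict` name are made up:

```python
from collections import defaultdict

words = ["apple", "avocado", "banana"]

# Python 2.5+: missing keys are filled in by calling the factory (list here).
groups = defaultdict(list)
for word in words:
    groups[word[0]].append(word)

# The same grouping with a plain dict, like Ruby's (hash[key] ||= []) << value.
plain = {}
for word in words:
    plain.setdefault(word[0], []).append(word)


class DefaultingDict(dict):
    """Like Ruby's Hash.new(0): missing keys read as 0 but are not stored."""

    def __missing__(self, key):
        return 0


counts = DefaultingDict()
print(dict(groups))           # {'a': ['apple', 'avocado'], 'b': ['banana']}
print(plain == dict(groups))  # True
print(counts["never-set"], "never-set" in counts)  # 0 False
```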

In Python, it's "len(obj)" (i.e. len is a generic function). In Ruby, it's "obj.length" (i.e. polymorphism is used). This difference seems to happen a lot.

In Ruby, strings are mutable. Hence, "s.upcase" returns an upcased version of s, whereas "s.upcase!" actually modifies s. Python strings are immutable.

Ruby doesn't have tuples (i.e. immutable arrays).

Ruby doesn't have as strong a notion of immutable objects as Python does. For instance, you may use mutable objects as hash keys in Ruby; Python forbids this for its built-in mutable types such as lists and dicts. If you do change the value of a key in Ruby, you may want to let the hash recalculate the hash values for all the keys via "my_hash.rehash".
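Here is the Python rule in action. A tuple of immutable values works as a key, but a list (or a tuple containing a list) raises `TypeError` because it isn't hashable:

```python
lookup = {(1, 2): "tuple keys are fine"}

for key in ([1, 2], (1, [2])):
    try:
        lookup[key] = "mutable key"
    except TypeError as exc:
        # Hashing fails for lists, and for tuples that contain a list.
        print(type(key).__name__, "rejected:", exc)
```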

Ruby will let you assign a new value to a variable defined outside your scope:

  i = 0
  (0..2).each do |i|
    puts "inside block: i = #{i}"
  end
  puts "outside block: i = #{i}" # -> 'outside block: i = 2'
This was not previously possible in Python without using a workaround. However, Python 3.0 is gaining this feature via the "nonlocal" keyword (PEP 3104), which works much like the "global" keyword.
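Here is a small sketch of `nonlocal` in Python 3, using a made-up counter. Without the `nonlocal` line, `count += 1` would raise `UnboundLocalError`, because the assignment makes `count` local to `increment`:

```python
def make_counter():
    count = 0

    def increment():
        nonlocal count  # rebind the variable in make_counter's scope
        count += 1
        return count

    return increment


counter = make_counter()
counter()
counter()
print(counter())  # 3
```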

Coming at it from a different angle, in Java, you use explicit variable declarations to assign scope. This is true too in Perl when you use "my". In Python, an assignment automatically sets scope. Hence, you can shadow variables in Java, Python, and Perl. Ruby tries really hard to avoid shadowing. Hence, whoever assigns to the variable first sets the scope. Shadowing can still happen in rare cases (see here), but it's a lot less likely.

Ruby assignments are expressions. Hence, you can do:

  while line = gets
    puts line
  end
Python purposely left this out because it's too easy to confuse "=" and "==". Hence, in Python you would write:
  for line in sys.stdin:
      print line
Ruby has two sets of logical operators. They have different precedences. Hence, "a = b && c" means "a = (b && c)", whereas "a = b and c" means "(a = b) and c". I'm going to agree with Guido on this one and say this is just too confusing.

Ruby has an === operator and case statements. This feature is a lot closer to the match feature in ML languages than anything in Python:

  case my_var
  when MyClass
    puts "my_var is an instance of MyClass"
  when /foo/
    puts "my_var matches the regex"
  end
This is really neat. Notice that, thankfully, Ruby doesn't require "break" the way the C-inspired languages do.

In Ruby, only false and nil are considered as false in a Boolean expression. In particular, 0 (zero), "" or '' (empty string), [] (empty array), and {} (empty hash) are all considered as true.

In Python, 0, "", '', [], and {} are all considered False. In general, Python has a very rich notion of "truthiness", and you can define the "truthiness" of your own objects.
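Here is a minimal sketch of how you'd define "truthiness" for your own classes, using made-up `Basket` and `Account` classes. It also shows the earlier `len()` point: `len(obj)` is a generic function, but it dispatches to the object's `__len__` method:

```python
class Basket:
    """len() and truthiness both come from special methods."""

    def __init__(self, items=()):
        self.items = list(items)

    def __len__(self):
        # len(basket) calls this; bool(basket) falls back to it when
        # the class doesn't define __bool__.
        return len(self.items)


class Account:
    def __init__(self, balance):
        self.balance = balance

    def __bool__(self):
        # Python 3 spelling; Python 2 called this __nonzero__.
        return self.balance > 0


print(len(Basket(["apple", "pear"])))        # 2
print(bool(Basket()), bool(Basket(["x"])))   # False True
print(bool(Account(0)), bool(Account(10)))   # False True
```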

In Ruby, if s is a string, you may write "s += 'a'" or "s << 'a'". The first creates a new object. The second modifies s. If you modify s, that may "surprise" other pieces of code that also have a reference to s. Python strings are simply immutable so this can't happen.
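Here is the Python side, sketched. Augmented assignment on a string rebinds the name to a new object, so other references never see the change. To get Ruby-style shared mutation, you have to opt in with a mutable type such as a list:

```python
s = "abc"
t = s        # two names bound to the same string object
s += "d"     # builds a new string and rebinds s; t is unaffected
print(s, t)  # abcd abc

chars = list("abc")  # use a mutable list if you really want in-place edits
alias = chars
chars.append("d")
print(alias)  # ['a', 'b', 'c', 'd']
```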

Ok, that's all for now! In my opinion, they're both great languages. If you know about more fun differences, post them below!