Tuesday, August 05, 2008

Python: Memory Conservation Tip: Temporary dbms

A dbm is an on-disk hash mapping from strings to strings. The shelve module is a simple wrapper around the anydbm module that takes care of pickling the values. It's nice because it mimics the dict API so well. It's simple and useful. However, one thing that isn't so simple is trying to use a temporary file for the dbm.

The problem is that shelve uses anydbm, which uses whichdb. When you create a temporary file securely, you get back an open file handle, which means the file already exists on disk. There's no secure way to get a temporary file that hasn't been created yet. Since the file already exists, whichdb tries to figure out what format it uses. Since the file is empty, the detection fails and you get a big explosion.

The solution is to use a temporary directory. The next question is, how do you make sure that temporary directory gets cleaned up without reams of code? Well, just like with temporary files, you can delete the temporary directory even if your code still has an open file handle referencing a file in the temporary directory. Don't ya just love UNIX ;)
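To see that UNIX behavior in isolation, here's a minimal sketch that uses a plain file instead of a dbm. It assumes a POSIX system, and the names `read_after_delete`, `demo-`, and `scratch` are made up for illustration:

```python
import os
import shutil
from tempfile import mkdtemp


def read_after_delete(payload):
    """Write and read through a handle whose directory is already gone."""
    tmpd = mkdtemp(prefix='demo-')           # hypothetical prefix
    path = os.path.join(tmpd, 'scratch')     # hypothetical file name
    f = open(path, 'w+')
    shutil.rmtree(tmpd)                      # the name and the directory vanish
    try:
        f.write(payload)                     # the open handle still works on POSIX
        f.seek(0)
        return f.read(), os.path.exists(tmpd)
    finally:
        f.close()                            # only now is the storage released
```

The second element of the returned tuple reports whether the directory still exists after the removal, which makes it easy to confirm that the handle really did outlive its directory.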

Here's some code:
import os
import shelve
import shutil
from tempfile import mkdtemp

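# Create a private temporary directory and put the dbm inside it.
tmpd = mkdtemp('', 'myprogram-')
filename = os.path.join(tmpd, 'mydbm')
dbm = shelve.open(filename, flag='n')
# Remove the directory right away; the open dbm keeps working on UNIX.
shutil.rmtree(tmpd)
# I can continue to use dbm for as long as I'd like.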
On my system, the shelve module ends up using the dbm module, which creates two files. Furthermore, my tests end up exercising this code in four different places. Despite all of that, since tmpd is removed immediately, no matter how fast I type ls -l, I never even see the directory ;)
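If you need this to run somewhere that refuses to delete open files (Windows, for instance), one alternative is to hold off on the cleanup until the shelf is closed. Here's a rough sketch of that variation. It is not the delete-immediately trick: it trades that property for portability. `temporary_shelf` and its `dir` parameter are names I made up for illustration:

```python
import os
import shelve
import shutil
from contextlib import contextmanager
from tempfile import mkdtemp


@contextmanager
def temporary_shelf(prefix='myprogram-', dir=None):
    """Yield a shelf in a fresh temp directory; remove the directory on exit."""
    tmpd = mkdtemp(prefix=prefix, dir=dir)
    try:
        shelf = shelve.open(os.path.join(tmpd, 'mydbm'), flag='n')
        try:
            yield shelf
        finally:
            shelf.close()      # close first, so nothing is open during cleanup
    finally:
        shutil.rmtree(tmpd, ignore_errors=True)
```

Usage would look like `with temporary_shelf() as db:` followed by ordinary dict-style access. The directory is visible while the block runs, so unlike the version above you can catch it with ls -l.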


Anonymous said...

Nice trick, but it won't work in windows.

Shannon -jj Behrens said...

What is this "windows" thing you refer to? ;)