Skip to main content

Posts

Showing posts from October, 2008

Ruby: An Interesting Block Pattern

Ruby has blocks, which enable all sorts of interesting idioms. I'm going to show one that will be familiar to Rails enthusiasts, but was new to me.

I was reading some code in a book, and it had the following:def if_found(obj)
if obj
yield
else
render :text => "Not found.", :status => "404 Not Found"
false
end
endHere's how you call it:if_found(obj) do
# We have a valid obj. Render something with it.
endThe code in the block will only execute if the obj was found. If it wasn't found, the response will already have been taken care of.

I've been in the same situation in Python (using Pylons), and I coded something like:def handle_not_found(obj):
if not obj:
return render_404_page()
return NoneHere's how you call it:response = handle_not_found(obj)
if response:
return response
# Otherwise, continue normally.Pylons likes to return responses, whereas render in Ruby works as a side effect whose return value isn…

Python: Some Notes on lxml

I wrote a webcrawler that uses lxml, XPath, and Beautiful Soup to easily pull data from a set of poorly formatted Web pages. In summary, it works, and I'm quite happy :)

The script needs to pull data from hundreds of Web pages, but not millions, so I opted to use threads. The script actually takes the list of things to look for as a set of XPath expressions on the command line, which makes it super flexible. Let me give you some hints for the parts that I found difficult.

First of all, here's how to install it. If you're using Ubuntu, then:apt-get install libxslt1-dev libxml2-dev
# I also have python-dev, build-essentials, etc. installed.
easy_install lxml
easy_install BeautifulSoupIf you're using MacPorts, doport install py25-lxml
easy_install BeautifulSoupThe FAQ states that if you use MacPorts, you may encounter difficulties because you will have multiple versions of libxml and libxslt installed. For instance, the following may segfault:python -c "import webbrow…

Python: Permission denied: '/var/www/.python-eggs'

I have a Pylons app, and I got the following exception in my logs:The following error occurred while trying to extract file(s) to the Python egg
cache:

[Errno 13] Permission denied: '/var/www/.python-eggs'

The Python egg cache directory is currently set to:

/var/www/.python-eggs

Perhaps your account does not have write access to this directory? You can
change the cache directory by setting the PYTHON_EGG_CACHE environment
variable to point to an accessible directory.The problem is that the app was running as www-data (which was the user created for nginx and Apache). www-data's home directory is /var/www, but it doesn't have write access to it. (I'm afraid of allowing write access so that it can unpack eggs into that directory because that directory is the web root. In general, you should be careful of what you put in the web root.)

There are a few ways to address this problem. One is to make sure to always use --always-unzip when installing eggs. Another is to cr…