Thursday, February 05, 2009

Ruby: A Python Programmer's Perspective Part III

This is a somewhat random list of things that were interesting or surprising to me when I read Ruby for Rails. The previous post in the series is Ruby: A Python Programmer's Perspective Part II.

I've mentioned before that Ruby has symbols which are similar to, but distinct from, strings (which are mutable). The syntax is :foo. To create a symbol with spaces, use :"foo bar".

Ruby has two syntax for ranges. ".." includes both endpoints. "..." does not include the right endpoint and thus behaves like slices in Python do:
>> (0...2).each {|x| p x}
0
1
=> 0...2

>> (0..2).each {|x| p x}
0
1
2
=> 0..2
Since the "+" operator is just syntactic sugar for the "+" method, the following are equivalent:
>> 1 + 1
=> 2
>> 1.+(1)
=> 2
Whenever possible, I think interpreters should raise an exception if you misspell something. Hence, I found the following interesting:
>> def f
>> p @undefed_instance_variable # This prints nil.
>> p undefined_local # This raises a NameError.
>> end
=> nil
>> f
nil
NameError: undefined local variable or method `undefined_local' for main:Object
from (irb):39:in `f'
from (irb):41
nil and false are the only two objects that evaluate to false:
>> if []: puts "Still true"; end
Still true
=> nil
In Python, "truthiness" is a lot more complex. For instance, [1] is true, whereas [] is false, and in general, classes can define their own notion of truthiness.

Ruby has "==", "===", ".eql", and ".equal". "===" is for case statements. "==" is related to ">=", etc. ".eql" tests that the two objects have the "same content". Usually, "==" and ".eql" return the same thing, but their implementation is different. ".equal" is reserved for object identity. Hence:
>> "a" == "a"
=> true
>> "a".eql?("a")
=> true
>> "a".equal?("a")
=> false
>> "a" === "a"
=> true
If you want to see all of a class's instance methods, use the "instance_methods" method. If you don't want to see inherited methods, pass false to "instance_methods":
>> class C
>> def f
>> end
>> end
=> nil
>>
?> C.instance_methods
=> ["inspect", "f", "clone", "method", "public_methods",
"instance_variable_defined?", "equal?", "freeze", "methods", "respond_to?",
"dup", "instance_variables", "__id__", "object_id", "eql?", "require", "id",
"singleton_methods", "send", "taint", "frozen?", "instance_variable_get",
"__send__", "instance_of?", "to_a", "type", "protected_methods",
"instance_eval", "==", "display", "===", "instance_variable_set", "kind_of?",
"extend", "to_s", "hash", "class", "tainted?", "=~", "private_methods", "gem",
"nil?", "untaint", "is_a?"]
>>
?> C.instance_methods(false)
=> ["f"]
Using Class.instance_methods.sort is helpful too.

When you are appending one string onto the end of another string, using += does not modify the original string, whereas << does:
>> s = "foo"
=> "foo"
>> t = s
=> "foo"
>> s += "bar"
=> "foobar"
>> s
=> "foobar"
>> t
=> "foo"
>> u = t
=> "foo"
>> u << "bar"
=> "foobar"
>> u
=> "foobar"
>> t
=> "foobar"
Indexing a string returns that character's ASCII value, which is a bit surprising. Slicing does what you would expect:
=> "abc"
>> s[1]
=> 98
>> s[1,1]
=> "b"
However, indexing a string to set a character works as expected:
>> s[1] = 'B'
=> "B"
>> s
=> "aBc"
Arrays have a "concat" method. "concat" and "+" operate as you might expect. "concat" is conceptually "concat!" because it modifies the array, but it's spelled without the "!":
>> s = [1, 2]
=> [1, 2]
>> t = [3, 4]
=> [3, 4]
>> s + t
=> [1, 2, 3, 4]
>> s
=> [1, 2]
>> s.concat(t)
=> [1, 2, 3, 4]
>> s
=> [1, 2, 3, 4]
>> t
=> [3, 4]
To get both the values and indexes of an array as you loop over it, use the each_with_index method:
>> ['a', 'b', 'c'].each_with_index {|x, i| puts "#{i} => #{x}"}
0 => a
1 => b
2 => c
There are a couple ways to create a hash. The second syntax is a bit weird to my eyes:
>> {:a => :b}
=> {:a=>:b}

>> Hash[:a => :b]
=> {:a=>:b}
By default, you'll get nil if you try to lookup a key that doesn't exist in a hash. As I've mentioned before, I prefer to get exceptions by default if I misspell things. Fortunately, the fetch method exists:
>> h = {:a => :b}
=> {:a=>:b}
>> h[:c]
=> nil
>> h.fetch(:c)
IndexError: key not found
from (irb):50:in `fetch'
from (irb):50
The default nil behavior of Ruby hashes is particular frustrating to me since Ruby uses hashes for keyword arguments. It's really easy for a misspelled option to lead to a silent bug:
>> def f(arg, options)
>> p arg
>> if options[:excited]
>> puts "Oh, yeah!" # This doesn't get printed.
>> end
>> end
=> nil
>>
?> f("hello", :excitid => true) # Oops, typo.
"hello"
=> nil
I think real support for keyword arguments is better. Here's the Python:
>>> def f(arg, excited=False):
... print arg
... if excited:
... print "Yeah!"
...
>>> f("hello", excitid=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: f() got an unexpected keyword argument 'excitid'
You can still use the Ruby approach in Python using the "**kargs" syntax, but it's not the default.

I was a little surprised that you could connect two statements with a semicolon inside parenthesis:
>> (puts 'hi'; puts 'there') unless 100 < 10
hi
there
=> nil
Iterating over a string yields lines, not characters like it does in Python:
>> "foo\nbar".each {|line| puts line}
foo
bar
=> "foo\nbar"
When using regular expressions, the =~ operator returns the index of the match, whereas the #match method returns a MatchData object:
>> /o/ =~ 'foo'
=> 1
>> /o/.match('foo')
=> #<MatchData:0x35d5cc>
Be careful. The MatchData object stores captures starting at 1, whereas the #captures method stores captures starting at 0:
>> match = /(o)/.match('foo')
=> #<MatchData:0x357654>
>> match[1]
=> "o"
>> match.captures[0]
=> "o"
See my earlier post concerning "^" vs. "\A" in regular expressions. Ruby and Python are subtly different. Whereas I would normally use "^" in Python, to get the same behavior, I would use "\A" in Ruby.

If you want to treat a string as an array of characters, use '"foobar".split(//)'.

String#scan is a pretty cool method that repeatedly tries to match a regular expression while iterating over a string. Unlike #match, it returns all of the matches, instead of just the first:
>> s = "John Doe was the father of Jane Doe."
=> "John Doe was the father of Jane Doe."
>> s.scan(/([A-Z]\w+)\s+([A-Z]\w+)/)
=> [["John", "Doe"], ["Jane", "Doe"]]
Okay, that's it for this post, but I've got more coming :)

3 comments:

Philip Jenvey said...

Note that Python's re.findall is analogous to String#scan

Shannon -jj Behrens said...

Phil, nice! I didn't know about that.

Shannon -jj Behrens said...

The next post in the series is here:

http://jjinux.blogspot.com/2009/02/ruby-python-programmers-perspective_9838.html