
PyCon: Advanced Python Tutorials

I took Raymond Hettinger's Advanced Python I and II tutorials. These are my notes. See the website for more details: I and II.

Here's the source code for Python 2 and Python 3.

Raymond is the author of itertools, the set class, the key argument to the sort function, parts of the decimal module, etc.

He said nice things about "Python Essential Reference".

He said nice things about the library reference for Python. If you install Python, it'll get installed.

Read the docs for the built-in functions. It's time well-invested.

He likes Emacs and Idle. He uses a Mac.

Use the dis module to disassemble code. That's occasionally useful.

Use rlcompleter to add tab completion to the Python shell.

Use "python -m test.pystone" to test how fast your machine is.

Show "python -m turtle" to your kids.

Don't be afraid to look at the source code for a module.

He likes "itty", a tiny web framework.

The decimal module is 6000 lines long!

Idle has more stuff than I thought, although I still think PyCharm is better.

He seems to use Idle to browse code and Emacs to edit code.

Use function.__name__ to get a function's name.

Use a bound method to save on method lookup in a tight loop. Notice the naming pattern:
s = []
s_append = s.append
He is very optimistic about PyPy. He thinks it'll become the de facto standard for Python use.

Here are some optimization tips:
  • Replace global lookups (and builtin lookups) by setting local aliases.
  • Use bound methods to avoid method lookups.
  • Minimize pure-python function calls inside a loop.
A new stack frame is created for every function call.
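Here's a minimal sketch of the first two tips combined (the function names and data are mine, not his):

```python
import math

def norms(rows):
    return [math.sqrt(sum(x * x for x in row)) for row in rows]

def norms_localized(rows, sqrt=math.sqrt, sum=sum):
    # Global and builtin lookups become fast local lookups via default args.
    result = []
    result_append = result.append   # bound method saved once, not per iteration
    for row in rows:
        result_append(sqrt(sum(x * x for x in row)))
    return result
```

Both functions compute the same thing; the second just pays for the name lookups once instead of on every iteration.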

You should only need to use speedups like the above in a handful of places such as inner loops.

Listening to him explain how expensive even simple things in Python are makes me want to switch to Go ;)

Manually inline function calls in some cases.

Here's how to time code:
from timeit import Timer
stmt = "s_append(1)"                    # the code being timed (an example)
setup = "s = []; s_append = s.append"   # run once, before timing
print min(Timer(stmt, setup).repeat(7, 20))
"Loop invariant code motion" is an optimization technique where you move stuff outside the loop where possible.
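A small before/after sketch of hoisting a loop invariant (the names and numbers are made up):

```python
def scaled_slow(values, total):
    out = []
    for v in values:
        out.append(v * (1.0 / total))   # 1.0 / total recomputed every pass
    return out

def scaled_fast(values, total):
    ratio = 1.0 / total                  # loop-invariant: computed once
    out = []
    out_append = out.append              # bound-method lookup hoisted too
    for v in values:
        out_append(v * ratio)
    return out
```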

"Vectorization" [according to him] means replacing CPython's eval-loop with a C function that does all the work for you. For instance, he suggests moving from list comprehensions to map where it makes sense.
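For instance (my example, not his; Python 3 syntax), mapping a builtin method over a list keeps the loop in C:

```python
data = ["  spam ", " eggs", "ham  "]

# list comprehension: the loop runs in the eval-loop
stripped_lc = [s.strip() for s in data]

# map with a builtin method: the loop runs in C
stripped_map = list(map(str.strip, data))
```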

Use to parallelize map.
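The notes drop the tool's name; multiprocessing.Pool.map is one standard stdlib way to do it (my example, Python 3):

```python
from multiprocessing import Pool

def square(x):
    return x * x

if __name__ == "__main__":
    # Pool.map has the same shape as map, but fans work out to worker processes.
    with Pool(2) as pool:
        print(pool.map(square, range(5)))
```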

He keeps plugging PyPy.

itertools.repeat(2, 100) repeats 2 over-and-over again, 100 times.

itertools.count(0) counts starting at 0.
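A quick check of both (count is infinite, so islice is handy for taking a finite slice):

```python
from itertools import count, islice, repeat

assert list(repeat(2, 4)) == [2, 2, 2, 2]            # 2, four times
assert list(islice(count(0), 5)) == [0, 1, 2, 3, 4]  # first five of an infinite count
```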

In some cases, switching to itertools can get you most of the performance benefits that you might get by switching to C.

Here are the optimization techniques he covered: vectorize, localize, use bound methods, move loop invariants out of the loop, and reduce the number of Python function calls.

itertools now has new functions called permutations, combinations, and product (which gives you the cartesian product of two sequences).

You can use these to generate all the possible test cases given a set of states.
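A sketch of generating test cases from a set of states with these functions (the states are my example):

```python
from itertools import combinations, permutations, product

states = ['on', 'off']

# every assignment of a state to each of two switches
cases = list(product(states, repeat=2))

# orderings and fixed-size subsets, for comparison
orders = list(permutations('ab'))
pairs = list(combinations('abc', 2))
```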

Think of itertools.product as the functional approach to nested for loops:
for t in product([0, 1], repeat=3):
    print t
is the same as:
for a in [0, 1]:
    for b in [0, 1]:
        for c in [0, 1]:
            print (a, b, c)
Other useful things:

vars(foo) == foo.__dict__

Use dir(foo) to get the public API for foo.

sorted(vars(collections).keys()) is essentially the same as dir(collections).

"Everything in Python is based on dictionaries."

He said that Guido added OOP to Python in a weekend.

Raymond showed code that simulated classes using just functions and dicts.
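A sketch of the idea (not Raymond's actual code): an "instance" is just a dict, and "methods" are plain functions that take it as their first argument:

```python
def make_point(x, y):
    return {'x': x, 'y': y}        # the "instance" is a dict

def point_move(self, dx, dy):      # a "method" is a plain function
    self['x'] += dx
    self['y'] += dy

p = make_point(1, 2)
point_move(p, 3, 4)
```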

"import antigravity" launches the famous XKCD cartoon on Python in a browser.

Use "" to open a URL in a browser.

ChainMap is a new tool in Python 3.3 to do a chain of lookups in a list of mappings.

"I used to be a high frequency trader. I helped destroy the world's economy. Sorry 'bout that."

Using collections.namedtuple is a great way to improve the readability of code.

There are many useful, valid uses for exec and eval. He criticized people who think that exec and eval are universally evil.

collections.namedtuple is based on exec.

He showed Python code generation (i.e. generating code as a string and then passing it to exec). Using a piece of Python data that acts as a DSL, you can generate some Python code and pass it to exec. You can generate code for other programming languages just as well as Python.

He thinks that showing a little bit of code is better than letting people download slides.

Here's a trick: subclass an object, and add methods for all the double under methods in order to add logging. This lets you track how the method was used. You can use this to evaluate stuff symbolically instead of arithmetically. For instance, subclass int, add methods for things like __add__, and keep track of how __add__ was called.
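A minimal sketch of that trick (my code, not his): an int subclass whose __add__ records how it was called:

```python
class TracedInt(int):
    log = []   # shared record of how operations were used

    def __add__(self, other):
        # track the call symbolically, then also compute the result
        TracedInt.log.append(('__add__', int(self), int(other)))
        return TracedInt(int(self) + int(other))

total = TracedInt(2) + TracedInt(3)
```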

Polymorphism and operator overloading let you create custom classes that do additional stuff that numbers can't.

He showed function dispatch, like:
getattr(self, 'do_' + cmd)(arg)
See the cmd module.
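A tiny dispatcher in that style (the names are made up; see cmd.Cmd for the real thing):

```python
class Repl(object):
    def do_greet(self, arg):
        return 'hello, ' + arg

    def do_shout(self, arg):
        return arg.upper()

    def onecmd(self, cmd, arg):
        # look up the handler by name, as cmd.Cmd does
        return getattr(self, 'do_' + cmd)(arg)
```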

Python's grammar is in Grammar/Grammar in the source code.

He showed how PLY puts Lex and Yacc expressions in docstrings. I.e. PLY uses docstrings to hold a DSL that PLY understands.

He showed loops with else clauses.

Knuth was the one who first came up with the idea of adding something like an else clause to a loop.

The idea that you shouldn't return in the middle of a function is advice from days gone by that no longer makes sense.

The nice thing about the way Python's intervals work is:
s[2:5] + s[5:8] == s[2:8]
Copy a list: c = s[:]

Clear a list: del s[:]

Another way to clear a list: s[:] = []

In Python 3.X, a copy method was added to the list class. They're also adding a clear method to lists, to match all the other collections.

You can use itemgetter and attrgetter for the key function when calling list.sort. There's also methodcaller.

Use locale.strxfrm for the key function when sorting strings for locale-aware sorting.

Sort has a keyword argument named reverse.

To sort with two keys, use two passes:
s.sort(key=attrgetter('lastname'))           # Secondary key
s.sort(key=attrgetter('age'), reverse=True) # Primary key
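A complete, runnable version of the two-pass sort (the sample data is mine); it works because Python's sorts are stable:

```python
from collections import namedtuple
from operator import attrgetter

Person = namedtuple('Person', 'lastname age')
people = [Person('Ng', 30), Person('Adams', 30), Person('Ng', 25)]

people.sort(key=attrgetter('lastname'))            # secondary key first
people.sort(key=attrgetter('age'), reverse=True)   # primary key last
```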
"deque" is pronounced "deck". It gives you O(1) appends and pops from both ends.

"deque" is a "double ended queue".

He also mentioned defaultdict, Counter, and OrderedDict. Counter is a dict that knows how to count.

Here's how to use a namedtuple:
Point = namedtuple('Point', 'x y')
p = Point(10, 20)
Here's how to use a defaultdict:
d = defaultdict(list)
dict.__missing__ gets called if you lookup something that isn't in the dict. You can subclass dict and just add a __missing__ method.
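A sketch of a dict subclass with __missing__ (my example):

```python
class Doubler(dict):
    def __missing__(self, key):
        # called by dict.__getitem__ when the key isn't found
        value = key * 2
        self[key] = value
        return value

d = Doubler()
```

The first lookup of a missing key computes and stores the value; later lookups hit the stored entry.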

Idle has nice tab completion in the shell. It also has a nice menu item to look up modules by name so you can find the source easily.

You can use __getattr__ to introduce tracing.

He pronounces "__missing__" as "dunder missing". "dunder" is an abbreviation for "double underscore".

Writing "d.x" implicitly calls __getattribute__, which works roughly as follows:
  • Check the class tree for a data descriptor; if found, invoke it.
  • Check the instance dictionary.
  • Check the class tree; if the attribute found is a (non-data) descriptor, invoke it.
  • Check for __getattr__; if it exists, invoke it.
  • Otherwise, raise AttributeError.
OrderedDict is really helpful when you must remember the order. This helps if you're going to move to a dict temporarily and then want stuff to come back out in the same order that it went in.

Each of the methods in OrderedDict has the same big O as the respective methods in dict. (Presumably, the constants are different.)

Here is Raymond's documentation on descriptors.

Here's a descriptor:
class Desc(object):

    def __get__(self, obj, objtype):
        # obj will be None if the descriptor is invoked on the class.
        print "Invocation!"
        return obj.x + 10

class A(object):

    plus_ten = Desc()

    def __init__(self, x):
        self.x = x

a = A(5)
If you attach a descriptor to an instance instead of a class, it won't work.

There is more than one __getattribute__ method:
A.x => type.__getattribute__(A, 'x')
a.x => object.__getattribute__(a, 'x')
By overriding __getattribute__, you "own the dot".

"Super Considered Super" was a blog post he wrote to refute "Super Considered Harmful".

__mro__ gives you the method resolution order.

super() doesn't necessarily go to the parent class of the current class. It's all about the instance's ancestor tree. super() might go to some other part of the instance's MRO, some part that your class doesn't necessarily know about.

Functions are descriptors. If you attach a function to a class dictionary, it'll add the magic for bound methods.

A.__dict__['func'] returns a normal function. A.func returns an unbound method. A().func returns a bound method.

Here is an example of using slots.

Here is another example of using slots:
class A(object):
    __slots__ = 'x', 'y'
If you have an instance of a class that uses slots, then it won't have a __dict__ attribute.
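A quick check of that (Python 3 syntax):

```python
class Point(object):
    __slots__ = ('x', 'y')

p = Point()
p.x = 1

# slots replace the per-instance __dict__ with fixed storage
assert not hasattr(p, '__dict__')
```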

The type metaclass controls how classes are created. It supplies them with __getattribute__.

"Descriptors are how most of the modern features of Python were built."

At this point in the day, my brain was dead, and he was about to start talking about Unicode. I'm not sure that saving Unicode for the end of the day is the best strategy ;)

He said that "unicode" should be called "unitable".

Unicode is a mapping from code points to characters. The glyphs are not part of Unicode. They're part of a font rendering engine.

There are more than 100k unicode code points.

Microsoft and Apple worked hard on Arial so that it has glyphs for almost every codepoint.

from unicodedata import category, name

Arabic and Chinese have their own glyphs for digits. int works correctly with all the different ways to write numbers.

There are two ways to write an O with an umlaut because of combining characters.

Use "unicodedata.normalize('NFC', s)" to normalize the combining characters.
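For example (Python 3 string literals):

```python
import unicodedata

single = '\u00f6'       # ö as one code point
combined = 'o\u0308'    # o plus a combining diaeresis

assert single != combined                                 # many-to-one problem
assert unicodedata.normalize('NFC', combined) == single   # NFC composes them
```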

Arabic and Hebrew are written right-to-left--but not for numbers!

There are unicode control characters to switch which direction you're writing:
U+200E is for left-to-right
U+200F is for right-to-left
If you slice a string, you might accidentally chop off the unicode control character which causes the text to be backwards.

Just google for "bidi unicode" to get lots of help.

Most machines are little endian, but the Internet is big endian. Computers byte swap a lot, but they do it in hardware.

Code pages assume that the only people in the world are "us and the Americans." Everyone else gets question marks.

Encodings with "utf" in them do not lose information for any language. Any other encoding does.

If you use UTF-8, you lose the ability to get O(1) random access to characters in the string.

UTF-8 gives you some compression compared to fixed-width encodings, but not much.

The three main unicode problems are "many-to-one, one-to-many, and bidi."

Doubly encoding something or doubly decoding something is a super common problem.

If some characters don't display, it's probably a font problem. Try Arial.

The "one true encoding" is "UTF-8" (according to Tim Berners-Lee).

UTF-8 is a superset of ASCII.

UTF-8 has holes. I.e. there are some number combinations that are not valid.

There's a lot of data in the world that is still encoded in UCS2. It's a two byte encoding.

It was a presidential order that caused us to move from EBCDIC to ASCII.

It was the Chinese government that decided UCS2 was not acceptable.

UTF-16-BE is a superset of UCS2.

There are only a handful of Chinese characters that don't fit into UCS2. The treble clef is a character that won't fit in UCS2.

To figure out what encoding something is in, HTTP has headers and email has MIME types.

If a browser wants to guess at an encoding, it'll try all the encodings and look for character frequency distributions. You can fool such a browser by giving it a page that says, "which witch has which witch?"

Mojibake is when you get your characters mixed up because you guessed the encoding wrong.
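A quick demonstration of mojibake via a wrong decode (Python 3, my example):

```python
text = 'café'
wrong = text.encode('utf-8').decode('latin-1')   # decoded with the wrong codec

# 'é' (U+00E9) is two bytes in UTF-8, so it comes back as two Latin-1 characters
assert wrong == 'cafÃ©'
```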

