Monday, August 04, 2008

Python: Memory Conservation Tip: Nested Dicts

I'm working with a large amount of data, and I have a data structure that looks like:
pair_counts[(a, b)] = count
It turns out that in my situation, I can save memory by switching to:
pair_counts[a][b] = count
Naturally, the normal rules of premature optimization apply: I wrote for readability, waited until I ran out of memory, did lots of profiling, and then optimized as little as possible.

In my small test case, this dropped my memory usage from 84mb to 61mb.

3 comments:

Sidharth Shah said...

Considering a,b are string could pair[a+"\t"+b] = count be of any use? I havent profiled it, but was just wondering.

Shannon -jj Behrens said...

Compared to nesting the dicts, a+"\t"+b duplicates a for everything that it is related to:

a\tb
a\tc
a\td
b\tc

Nesting the dicts gets rid of the duplication. Of course, you never know for certain what will happen until you try it ;)

Mark said...

Assuming you don't care about edge direction, what about:

def normalize(a,b):
if(a > b):
return (b,a)
return (a,b)

pair_counts[normalize(a,b)] = count