Monday, August 04, 2008

Python: Memory Conservation Tip: Nested Dicts

I'm working with a large amount of data, and I have a data structure that looks like:
pair_counts[(a, b)] = count
It turns out that in my situation, I can save memory by switching to:
pair_counts[a][b] = count
Naturally, the normal rules of premature optimization apply: I wrote for readability, waited until I ran out of memory, did lots of profiling, and then optimized as little as possible.

In my small test case, this dropped my memory usage from 84mb to 61mb.


Sidharth Shah said...

Considering a,b are string could pair[a+"\t"+b] = count be of any use? I havent profiled it, but was just wondering.

Shannon -jj Behrens said...

Compared to nesting the dicts, a+"\t"+b duplicates a for everything that it is related to:


Nesting the dicts gets rid of the duplication. Of course, you never know for certain what will happen until you try it ;)

Mark said...

Assuming you don't care about edge direction, what about:

def normalize(a,b):
if(a > b):
return (b,a)
return (a,b)

pair_counts[normalize(a,b)] = count