I'm working with a large amount of data, and I have a data structure that looks like:
pair_counts[(a, b)] = count

It turns out that in my situation, I can save memory by switching to:
pair_counts[a][b] = count

Naturally, the normal rules of premature optimization apply: I wrote for readability, waited until I ran out of memory, did lots of profiling, and then optimized as little as possible.
In my small test case, this dropped my memory usage from 84 MB to 61 MB.
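For concreteness, the two layouts look roughly like this (the names and the pairs list here are illustrative stand-ins for the real data):

from collections import defaultdict

pairs = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]

# Flat layout: one dict, one tuple key per pair.
flat_counts = {}
for a, b in pairs:
    flat_counts[(a, b)] = flat_counts.get((a, b), 0) + 1

# Nested layout: outer dict keyed on a, inner dict keyed on b.
nested_counts = defaultdict(dict)
for a, b in pairs:
    nested_counts[a][b] = nested_counts[a].get(b, 0) + 1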
Comments
If your data is a set of pairs where many share a first element, like:

a	b
a	c
a	d
b	c

then the flat dict repeats that shared first element in every tuple key. Nesting the dicts gets rid of the duplication. Of course, you never know for certain what will happen until you try it ;)
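One rough way to see where the flat version spends memory (sys.getsizeof is shallow, so this only tallies the per-key tuple objects, not what they point to):

import sys

pairs = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
flat_counts = {(a, b): 1 for a, b in pairs}

# Each entry in the flat dict carries its own 2-tuple key object.
tuple_bytes = sum(sys.getsizeof(key) for key in flat_counts)
print(tuple_bytes)  # typically 4 * 56 bytes on 64-bit CPython

The nested version needs no tuples at all and stores each outer key once, at the cost of the inner dicts' own overhead.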
If the order within a pair doesn't matter, you can also normalize the key so that (a, b) and (b, a) share a single entry:

def normalize(a, b):
    # Always put the smaller element first.
    if a > b:
        return (b, a)
    return (a, b)

pair_counts[normalize(a, b)] = count
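For example, with the normalize above, both orderings of a pair accumulate into one entry (values here are made up):

pair_counts = {}
for a, b in [("x", "y"), ("y", "x"), ("x", "y")]:
    key = normalize(a, b)
    pair_counts[key] = pair_counts.get(key, 0) + 1

print(pair_counts)  # {('x', 'y'): 3}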