24 Million entries and I need to what?

Tod Hansmann plug.org at todandlorna.com
Sun Dec 29 01:31:03 MST 2013

On 12/28/2013 2:01 PM, Sasha Pachev wrote:
> However, the exercise of computing and comparing a lot of hashes within some
> reasonable time is very interesting. Knowing how to do it efficiently
> is a skill that will come handy at some point in your life.
In all honesty, I think this is the opposite of an interesting problem.
This is more of an exercise for a second or third year CS student.  It's
a trade-off between CPU time and memory on the computing side, and
between memory and storage access time on the comparison side.  The
computing side is mostly CPU bound unless you're on a microprocessor
architecture of some sort, because the memory isn't going to matter
unless your algorithm uses a LOT of memory (think scrypt).
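
To make the computing side concrete, here is a minimal Python sketch.  It
assumes SHA-256 and plain string entries, neither of which is specified
in this thread, and the function name hash_entries is made up for
illustration.  It hashes one entry at a time, so memory use stays flat
and the work is essentially all CPU.

```python
import hashlib


def hash_entries(entries):
    """Yield (entry, hex digest) pairs lazily, one entry at a time.

    SHA-256 is an assumption for illustration; swap in whatever hash
    the real data set uses.
    """
    for entry in entries:
        digest = hashlib.sha256(entry.encode("utf-8")).hexdigest()
        yield entry, digest


if __name__ == "__main__":
    for entry, digest in hash_entries(["alpha", "beta", "alpha"]):
        print(digest, entry)
```

Because it is a generator, the same loop works whether the entries come
from a list or are streamed from a file line by line.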

As far as comparison goes, your data structure is going to be an array
of strings for a small set, or maybe a hash map.  For a large set, you'd
better be using a tree for indexes or you're going to have a bad time.
What algorithm you put that data structure through will depend on your
hardware and priorities (speed? data analysis? data set size?).  Really,
someone who's had a data structures course should be able to bang out
the comparison side of things in an hour or less, depending on

This is, of course, ignoring the idea that a sysadmin would be looking
at this from a "pipe sort to uniq" solution, which we established
wouldn't be helpful anyway.
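
For the comparison side described above, a rough Python sketch of both
options might look like the following.  The function names are
hypothetical, and since Python has no built-in tree, a sorted list
searched with bisect stands in for the tree index here (same O(log n)
lookup, but not an actual tree).

```python
import bisect


def duplicates_with_set(hashes):
    """Hash-map approach: return the set of hashes seen more than once."""
    seen = set()
    dupes = set()
    for h in hashes:
        if h in seen:
            dupes.add(h)
        else:
            seen.add(h)
    return dupes


def build_index(hashes):
    """Build a sorted, de-duplicated list to use as an ordered index."""
    return sorted(set(hashes))


def in_index(index, h):
    """Binary-search the sorted index for h; O(log n) per lookup."""
    i = bisect.bisect_left(index, h)
    return i < len(index) and index[i] == h


if __name__ == "__main__":
    hashes = ["c3", "a1", "b2", "a1"]
    print(duplicates_with_set(hashes))
    index = build_index(hashes)
    print(in_index(index, "b2"), in_index(index, "zz"))
```

The set version is the quick answer when everything fits in memory; the
sorted index is closer to what you'd want once the set is large enough
that ordered, on-disk structures start to matter.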

Anyway, those are my thoughts on the subject.

-Tod Hansmann
