24 Million entries and I need to what?
plug.org at todandlorna.com
Sun Dec 29 01:31:03 MST 2013
On 12/28/2013 2:01 PM, Sasha Pachev wrote:
> However, the exercise of computing and comparing a lot of hashes within some
> reasonable time is very interesting. Knowing how to do it efficiently
> is a skill that will come handy at some point in your life.
In all honesty, I think this is the opposite of an interesting problem.
This is more of an exercise for a second- or third-year CS student. It's
a trade-off between CPU time and memory on the computing side, and
memory and storage access time on the comparison side. The computing
side is mostly CPU bound unless you're on a microprocessor architecture
of some sort, because the memory isn't going to matter unless your
algorithm uses a LOT of memory (think scrypt).
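
A minimal sketch of the computing side, assuming SHA-256 and entries that are already in memory as strings (the thread specifies neither, and the name hash_entries is made up for illustration). A fast digest like this keeps almost no state, so the cost is nearly all CPU:

```python
import hashlib

def hash_entries(entries):
    """Return the SHA-256 hex digest of each entry (entries assumed to be str)."""
    # Each digest is computed independently; memory use per hash is tiny,
    # so throughput is limited by CPU rather than RAM.
    return [hashlib.sha256(e.encode("utf-8")).hexdigest() for e in entries]

# Hypothetical sample data: identical entries produce identical digests.
digests = hash_entries(["alpha", "beta", "alpha"])
```

Because each entry is hashed independently, this step could be split across cores if CPU time turns out to be the bottleneck.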
As far as comparison, your data structure is going to be an array of
strings for a small set, or maybe a hash map. For a large set, you'd
better be using a tree for the index or you're going to have a bad time.
What algorithm you put that data structure through will depend on your
hardware and priorities (speed? data analysis? data set size?). Really,
someone who's had a data structures course should be able to bang out
the comparison side of things in an hour or less, depending on
This is, of course, ignoring the idea that a sysadmin would be looking
at this from a "pipe sort to uniq" solution which we established
wouldn't be helpful anyway.
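
For the comparison side discussed above, a rough sketch of the two ends of the range: a hash set for data that fits in memory, and a sorted list with binary search standing in for a tree index. All names and the sample digests here are hypothetical, and a real 24-million-entry job might want an on-disk index instead:

```python
import bisect

def find_duplicates(digests):
    """Return the set of digests that occur more than once (hash-set approach)."""
    seen = set()
    dupes = set()
    for d in digests:
        if d in seen:
            dupes.add(d)
        else:
            seen.add(d)
    return dupes

def in_sorted_index(index, digest):
    """Binary-search a sorted list of digests; O(log n) like a balanced tree lookup."""
    i = bisect.bisect_left(index, digest)
    return i < len(index) and index[i] == digest

# Hypothetical sample digests, shortened for readability.
sample = ["a1", "b2", "a1", "c3"]
dupes = find_duplicates(sample)
index = sorted(set(sample))
```

The set version is the quickest to write when memory is not a concern; the sorted index trades an up-front sort for compact storage and logarithmic lookups.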
Anyway, those are my thoughts on the subject.