24 Million entries and I need to what?
nicholas4 at gmail.com
Fri Dec 27 09:02:13 MST 2013
Good question. You could load the lot into a Ruby Hash. Hash lookups
are super fast.
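A minimal sketch of that approach, assuming the digests sit one per line in a file called hashes.txt (the file name and method name are just for illustration):

```ruby
require 'set'

# Build an in-memory set from a file of hex digests, one per line.
# For 24 million entries this costs a few GB of RAM, but each
# membership test afterwards is O(1) on average.
def load_hashes(path)
  known = Set.new
  File.foreach(path) { |line| known.add(line.chomp) }
  known
end

# Usage sketch: check candidates arriving on stdin.
#   known = load_hashes('hashes.txt')
#   $stdin.each_line do |line|
#     candidate = line.chomp
#     puts candidate if known.include?(candidate)
#   end
```

Building the set is a one-time cost; after that, each lookup is a hash probe rather than a scan of the file.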
You could also binary-search the file directly, since it's already sorted.
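Since the file is already sorted, a seek-based binary search avoids loading it into memory at all. A sketch, assuming every line has the same width (a 64-character SHA-256 hex digest plus a newline; the constant and method name are made up):

```ruby
# Binary-search a sorted file of fixed-width lines without reading
# it into RAM. LINE_LEN is an assumption: 64 hex chars + "\n".
LINE_LEN = 65

def in_sorted_file?(path, target, line_len = LINE_LEN)
  File.open(path, 'rb') do |f|
    lo = 0
    hi = f.size / line_len - 1
    while lo <= hi
      mid = (lo + hi) / 2
      f.seek(mid * line_len)          # jump straight to line `mid`
      line = f.read(line_len).chomp
      case target <=> line
      when 0  then return true
      when -1 then hi = mid - 1
      else         lo = mid + 1
      end
    end
  end
  false
end
```

Each lookup touches at most about log2(24,000,000) ≈ 25 lines of the file, so it stays fast even from disk, at the price of requiring fixed-width lines.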
On Fri, Dec 27, 2013 at 1:59 AM, S. Dale Morrey <sdalemorrey at gmail.com> wrote:
> So here's the problem...
> I'm exploring the strength of the SHA256 algorithm.
> Specifically I'm looking for the possibility of a hash collision.
> To that end I took a dictionary of common words and phrases and ran them
> through the algorithm.
> Now I've got a list of 24 million strings stored one per line in a flat
> text file.
> The file is just shy of 1GB. Not too bad considering the dictionary I
> borrowed was about 700MB.
> Now I want to check for collisions in random space. I have another process
> generating other seemingly random strings, and I want to check the hashes of
> those random strings against this file in the shortest time per lookup
> possible.
> I already used sort and now the hashes are in alphabetical order.
> So now I need to find a way to do the comparison as quickly as possible.
> If the string is a match, I need to store the new string and its
> initialization vector.
> I'm thinking grep would be good for this, but it seems to take a couple of
> seconds to come back when searching for a single item. I don't see any way to
> have it read stdin and look for a list.
> I'd like to do this with posix tools, but I'm thinking I may have to write
> my own app to slurp it up into a table of some sort. A database is a
> possibility I guess, but the latency seems like it might be higher than
> some sort of in-memory caching.
> Just wondering, what would be the fastest way to do this?
> PLUG: http://plug.org, #utah on irc.freenode.net
> Unsubscribe: http://plug.org/mailman/options/plug
> Don't fear the penguin.