24 Million entries and I need to what?

Nicholas Stewart nicholas4 at gmail.com
Fri Dec 27 09:02:13 MST 2013


Good question.  You could load the lot into a Ruby Hash.  Hash lookups
are constant time on average, so each check would be very fast.
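
A minimal sketch of that approach (the filename hashes.txt and reading the
candidate strings from stdin are assumptions for the example, not details
from your setup):

  require 'digest'

  # Load the 24 million precomputed hashes into a Hash; lookups are
  # constant time on average.  Assumes one hex digest per line in
  # hashes.txt; expect the table to take a few GB of RAM.
  known = {}
  File.foreach('hashes.txt') { |line| known[line.chomp] = true }

  # SHA256-hash each candidate string arriving on stdin and check it
  # against the table, printing any match together with its digest.
  STDIN.each_line do |candidate|
    candidate.chomp!
    digest = Digest::SHA256.hexdigest(candidate)
    puts "match: #{candidate} #{digest}" if known.key?(digest)
  end

A Set from the standard library would work just as well; the Hash is only
used here because that's the structure suggested above.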

You could also br
Thank you,
Nicholas Stewart


On Fri, Dec 27, 2013 at 1:59 AM, S. Dale Morrey <sdalemorrey at gmail.com> wrote:
> So here's the problem...
>
> I'm exploring the strength of the SHA256 algorithm.
> Specifically I'm looking for the possibility of a hash collision.
>
> To that end I took a dictionary of common words and phrases and ran them
> through the algorithm.
> Now I've got a list of 24 million strings stored one per line in a flat
> text file.
> The file is just shy of 1GB.  Not too bad considering the dictionary I
> borrowed was about 700MB.
>
> Now I want to check for collisions in random space.  I have another process
> generating other seemingly random strings, and I want to check the hashes of
> those random strings against this file in the shortest amount of time per
> lookup possible.
>
> I already used sort and now the hashes are in alphabetical order.
>
> So now I need to find a way to do the comparison as quickly as possible.
> If the string is a match I need to store the new string and its
> initialization vector.
>
> I'm thinking grep would be good for this, but it seems to take a couple of
> seconds to come back when searching for a single item.  I don't see any way
> to have it read from stdin and search for a whole list at once.
> I'd like to do this with POSIX tools, but I'm thinking I may have to write
> my own app to slurp it all into a table of some sort.  A database is a
> possibility, I guess, but the latency seems like it would be higher than
> some sort of in-memory caching.
>
> Just wondering, what would be the fastest way to do this?
>
> /*
> PLUG: http://plug.org, #utah on irc.freenode.net
> Unsubscribe: http://plug.org/mailman/options/plug
> Don't fear the penguin.
> */

