24 Million entries and I need to what?
levipearson at gmail.com
Fri Dec 27 15:43:00 MST 2013
On Fri, Dec 27, 2013 at 1:59 AM, S. Dale Morrey <sdalemorrey at gmail.com> wrote:
> So here's the problem...
> I'm exploring the strength of the SHA256 algorithm.
> Specifically I'm looking for the possibility of a hash collision.
Man, talk about a lot of unhelpful answers. You don't need an answer
to your question, you need to rethink what you're doing entirely.
Hash collisions are always possible when the set of outputs is smaller
than the set of inputs. You don't need to run any programs to know
this. Furthermore, the algorithm is fully deterministic, so the
probability of collisions is fixed for any fixed input parameters. But
considering the fact that the algorithm was designed to make hash
collisions as unlikely as possible, you should realize that the
collision probability is going to be vanishingly tiny.
And even if, by some cosmically unlikely chance, your 24 million
random strings happen to have a pair of distinct strings that hash to
the same value--you still have not really explored the strength of
SHA256. The strength is a probabilistic thing, and finding a
collision through sheer luck does not affect the probability at all.
The strength of a hash function is determined by the evenness of its
distribution of outputs for given inputs. A perfect hash maps its
inputs to the full range of possible outputs completely evenly, with
no output more likely than any other. For a hash function with 256
bits of output, the number of output buckets is 2^256, which is an
astronomically huge number. I'll leave it to you to calculate the
probability of generating a collision with 24 million samples given
the assumption of a perfectly even distribution, but I think you'll
find it's not worth trying.
If you are unfamiliar with the math behind the test you're trying to
run, you should take a look at Wikipedia's article on the Birthday
Paradox (http://en.wikipedia.org/wiki/Birthday_Paradox) for an
introduction. Don't just read it; run the numbers. Figure out how
much RAM, CPU time, and hard drive space you'd need to have a
reasonable chance of finding a collision (assuming SHA256 has a
perfectly even output distribution). It will be a much better use of
your time than what you're doing now, and it might even lead you to
some ideas that could actually approach the problem of measuring the
strength of SHA256 instead of just wasting electricity.
More information about the PLUG