24 Million entries and I need to what?

Levi Pearson levipearson at gmail.com
Fri Dec 27 21:28:01 MST 2013

On Fri, Dec 27, 2013 at 8:19 PM, S. Dale Morrey <sdalemorrey at gmail.com> wrote:
> Well Levi you would be quite correct if the intent were to actually seek a
> collision across 256 bits of space.  That is not what I'm going for here.
>  In my mind detecting a collision would be evidence of a flawed
> implementation of the hashing algorithm which is what my experiment is
> actually seeking to uncover.  So I'm checking the specific implementation
> to rule out a flaw, not nessecarily the algorithm.

You might try thinking about it this way:

Let's say we have an algorithm, X, that performs some mathematical
function. This means for every value in the input domain, there will
be a single value from the output co-domain that the function pairs it
with. Because this function is defined by an algorithm, we have an
unambiguous recipe for determining which value from the co-domain is
paired with some arbitrary value from the input domain. We need only
follow the algorithm!

Now, we have a computer program, Y, which purports to be a correct
implementation of algorithm X.  How might we go about verifying that
it is indeed a correct implementation?

What you have proposed is that we take some bit of knowledge about the
function defined by X, namely that it has the property P that it is
extremely unlikely that two different values from the domain will map
to the same value in the co-domain.  So if we run Y a very large
number of times, and we have no collisions (i.e., it has not produced
a counterexample to P), does this give us confidence that Y is a
correct manifestation of X?  In fact, it might make us *slightly* more
confident that it is correct, but hardly enough to matter.

Consider the program Y_fake that implements an entirely different
algorithm than X, but which also has a property P, at least to the
extent that is measurable on your equipment in a reasonable amount of
time. Your test cannot differentiate between Y_fake and Y in any sense
other than a very vague and probabilistic one, and will probably only
find differences after the expense of a great deal of time and
computing power.

I suggest you skip backwards a couple of paragraphs and revisit the
real question: Is Y a correct implementation of algorithm X?  How can
we verify it?

There are a couple of approaches one could take, but I trust you can
think of a better one than your current approach.


More information about the PLUG mailing list