# 24 Million entries and I need to what?

Mister Ed Mister.Ed at AgoraCart.com
Fri Dec 27 21:44:39 MST 2013

```On 12/27/2013 9:28 PM, Levi Pearson wrote:
> On Fri, Dec 27, 2013 at 8:19 PM, S. Dale Morrey <sdalemorrey at gmail.com> wrote:
>> Well Levi you would be quite correct if the intent were to actually seek a
>> collision across 256 bits of space.  That is not what I'm going for here.
>>   In my mind detecting a collision would be evidence of a flawed
>> implementation of the hashing algorithm which is what my experiment is
>> actually seeking to uncover.  So I'm checking the specific implementation
>> to rule out a flaw, not nessecarily the algorithm.
>
> You might try thinking about it this way:
>
> Let's say we have an algorithm, X, that performs some mathematical
> function. This means for every value in the input domain, there will
> be a single value from the output co-domain that the function pairs it
> with. Because this function is defined by an algorithm, we have an
> unambiguous recipe for determining which value from the co-domain is
> paired with some arbitrary value from the input domain. We need only
>
> Now, we have a computer program, Y, which purports to be a correct
> implementation of algorithm X.  How might we go about verifying that
> it is indeed a correct implementation?
>
> What you have proposed is that we take some bit of knowledge about the
> function defined by X, namely that it has the property P that it is
> extremely unlikely that two different values from the domain will map
> to the same value in the co-domain.  So if we run Y a very large
> number of times, and we have no collisions (i.e., it has not produced
> a counterexample to P), does this give us confidence that Y is a
> correct manifestation of X?  In fact, it might make us *slightly* more
> confident that it is correct, but hardly enough to matter.
>
> Consider the program Y_fake that implements an entirely different
> algorithm than X, but which also has a property P, at least to the
> extent that is measurable on your equipment in a reasonable amount of
> time. Your test cannot differentiate between Y_fake and Y in any sense
> other than a very vague and probabilistic one, and will probably only
> find differences after the expense of a great deal of time and
> computing power.
>
> I suggest you skip backwards a couple of paragraphs and revisit the
> real question: Is Y a correct implementation of algorithm X?  How can
> we verify it?
>
> There are a couple of approaches one could take, but I trust you can
> think of a better one than your current approach.
>
>         --Levi

Levi, if we could add the heart of a teacher to that mathematical
fundamentalist thinking, we'd have either a coding super soldier to
contend with and/or the engineering equiv of yoda: seeking the wrong
goal, you are. Rephrase the question, you should. Strong with the
numbers Levi is.

```