24 Million entries and I need to what?
adamstevenson at gmail.com
Fri Dec 27 14:01:55 MST 2013
Selecting the correct data structure is key. The fastest option is a hash
table, with O(1) lookups. With 24 million 256-bit hashes, the raw data alone
is 24,000,000 x 32 bytes = 768 MB, so your space requirements are about
800-900 megs of memory before hash table overhead.
Try using a

std::unordered_map<sha_bits, bool>  // C++11 (stl::hash_map was never standardized)
HashMap<sha_bits, Boolean>          // Java (generics require the boxed Boolean)
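Since you only care about membership, a set does the same job without the
dummy values. Here's a minimal Java sketch of the idea; the file name and the
one-hex-hash-per-line format are assumptions on my part, since you haven't
said how the 24 million hashes are stored:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class HashLookup {
    public static void main(String[] args) throws IOException {
        // Pre-size the set so it never has to rehash 24M entries.
        Set<String> known = new HashSet<String>(48000000);

        // Assumed input: one 64-char hex SHA-256 per line in hashes.txt.
        BufferedReader in = new BufferedReader(new FileReader("hashes.txt"));
        String line;
        while ((line = in.readLine()) != null) {
            known.add(line.trim().toLowerCase());
        }
        in.close();

        // O(1) membership test per candidate hash.
        String candidate =
            "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855";
        System.out.println(known.contains(candidate));
    }
}

One caveat: Java Strings carry a lot of per-object overhead, so 24 million of
them will need several gigs, not 800 megs. Wrapping the raw 32-byte digests
in ByteBuffers (which hash and compare by content) gets you much closer to
the 768 MB floor.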
If you need to compress the space down, you can use a trie, which stores
shared key prefixes only once.
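A minimal sketch of that idea as a bitwise trie over the raw digest bytes
(class and method names here are illustrative, not from any library):

class BitTrie {
    // Each level consumes one bit of the key, so digests that share
    // a prefix share nodes.
    private static final class Node {
        Node zero, one;
    }

    private final Node root = new Node();

    void insert(byte[] key) {
        Node n = root;
        for (int i = 0; i < key.length * 8; i++) {
            int bit = (key[i / 8] >> (7 - (i % 8))) & 1;
            if (bit == 0) {
                if (n.zero == null) n.zero = new Node();
                n = n.zero;
            } else {
                if (n.one == null) n.one = new Node();
                n = n.one;
            }
        }
    }

    boolean contains(byte[] key) {
        Node n = root;
        for (int i = 0; i < key.length * 8; i++) {
            int bit = (key[i / 8] >> (7 - (i % 8))) & 1;
            n = (bit == 0) ? n.zero : n.one;
            if (n == null) return false;
        }
        return true;
    }
}

In practice you'd want the radix (Patricia) variant that collapses
single-child chains; random hashes share only short prefixes, so a
node-per-bit trie as written saves little on its own.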
If you want quick and dirty, you could just use memcached and still probably
get the performance you want. I would avoid using a database if at all
possible.
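If you try memcached, a rough sketch using the spymemcached client might look
like this; the host, port, and key scheme are my assumptions:

import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;

public class MemcachedCheck {
    public static void main(String[] args) throws Exception {
        // Assumes memcached is running locally on its default port.
        MemcachedClient client = new MemcachedClient(
                new InetSocketAddress("127.0.0.1", 11211));

        // Seed one key per known hash; the value is just a placeholder.
        String hash =
            "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855";
        client.set(hash, 0, "1"); // expiration 0 = never expire

        // A non-null get() means the candidate hash is in the set.
        System.out.println(client.get(hash) != null);
        client.shutdown();
    }
}

Just be sure to start memcached with enough memory (-m 2048 or so), or
entries will get evicted and you'll see false negatives.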
On Fri, Dec 27, 2013 at 10:55 AM, S. Dale Morrey <sdalemorrey at gmail.com> wrote:
> I would love for you to tell me that, but I'm still trying to verify a
> particular implementation of the algorithm.
> Your CDB file idea is a good one. I'm going to investigate it further.
> On Fri, Dec 27, 2013 at 10:50 AM, Steve Meyers <steve at plug.org> wrote:
> > On 12/27/13 10:43 AM, S. Dale Morrey wrote:
> >> Yes, that is exactly what I'm doing. I'm checking the propensity for a
> >> random string of characters to have a hash collision with an existing
> >> known
> >> set of words given an unsalted hashing algorithm.
> > Can I save you the trouble by telling you how unlikely it is to happen?
> > Since your 24 million existing hashes are static, I'd look into how big a
> > CDB file would be, and then use that to check for collisions.
> > Steve
> PLUG: http://plug.org, #utah on irc.freenode.net
> Unsubscribe: http://plug.org/mailman/options/plug
> Don't fear the penguin.