24 Million entries and I need to what?

Lonnie Olson lists at kittypee.com
Fri Dec 27 13:33:12 MST 2013


On Fri, Dec 27, 2013 at 10:29 AM, Lonnie Olson <lists at kittypee.com> wrote:
> Or just use some other binary indexing method.  MySQL, Postgres, SQLite, BDB.

Proof of concept in SQLite.

# Generate hashes
for word in $(cat /usr/share/dict/words); do echo -en "$word\t"; \
echo "$word" | sha256sum; done > words
# Create a basic database schema
sqlite3 words.db
CREATE TABLE hashes (id INTEGER PRIMARY KEY, hash TEXT, value TEXT);
.quit
# Insert hashes into database
awk '{print "INSERT INTO hashes (hash,value) VALUES (\"" $2 "\",\"" $1 \
"\");"}' words | sqlite3 words.db
# Create an index on the hash column
sqlite3 words.db
CREATE UNIQUE INDEX hash ON hashes (hash);
.quit
# Query database for hash values
awk '{print "SELECT value FROM hashes WHERE hash=\"" $2 "\";"}' words \
| sqlite3 words.db
# Or attempt an INSERT and check the exit status: the unique index
# rejects a hash that is already present, so a failure means a duplicate
echo "INSERT INTO hashes (hash,value) VALUES
('f2ca1bb6c7e907d06dafe4687e579fce76b37e4e93b7605022da52e6ccc26fd2','test');" \
| sqlite3 words.db || echo "COLLISION FOUND"
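
With the full 24 million entries, running each INSERT as its own
transaction will likely be the slow part, since SQLite commits after
every statement unless told otherwise. Here is a rough sketch of the
same load wrapped in a single BEGIN/COMMIT. The function names
hash_words and to_sql are made up for illustration, and it assumes the
hashes table from above already exists. It also switches to
single-quoted SQL strings and doubles any apostrophe in a word, which is
an extra precaution and not something the commands above do.

```shell
# Hypothetical helpers for illustration; not part of the original commands.

# Read one word per line on stdin, print "word<TAB>sha256" per line.
# The hash covers the word plus a trailing newline, the same input
# that `echo "$word" | sha256sum` hashes.
hash_words() {
    while IFS= read -r word; do
        hash=$(printf '%s\n' "$word" | sha256sum | cut -d' ' -f1)
        printf '%s\t%s\n' "$word" "$hash"
    done
}

# Turn "word<TAB>hash" lines into INSERT statements wrapped in one
# transaction, so SQLite commits once instead of once per row.
to_sql() {
    awk -F'\t' -v q="'" '
        BEGIN { print "BEGIN;" }
        {
            gsub(q, q q, $1)   # double any single quote inside the word
            print "INSERT INTO hashes (hash,value) VALUES (" q $2 q "," q $1 q ");"
        }
        END { print "COMMIT;" }'
}
```

Usage would look something like:

hash_words < /usr/share/dict/words | to_sql | sqlite3 words.db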

