Hash vs list
Levi Pearson
levi at cold.org
Fri Apr 21 20:51:59 MDT 2006
On Apr 21, 2006, at 6:20 PM, Jeff Schroeder wrote:
> I'm curious whether searching for a single value in a hash is
> generally
> (always?) faster than searching for it in a list. The programming I'm
> doing is in PHP, but I think it's a question with a general answer.
Simple answer: No, searching for a single value in a hash is not
always faster. The difference is that hash lookup is O(1) while
lookup in an unordered list is O(n). Since you said you haven't had
a data structures class, I'll assume you haven't learned O-notation
and give a little summary.
What O-notation gives you a measure of is the time-complexity of an
algorithm. This means it lets you know how the amount of time scales
up as the problem size (represented by n) grows. So, O(1) means the
amount of time stays the same regardless of n. O(n) means the time
to solve the problem grows linearly with n. So, if it takes x
milliseconds to look up a single value in an unordered list of size
1, it takes n*x to look up a single value in a list of size n. Other
common complexities are O(log n) (better than O(n) )and O(n^2) (worse
than O(n) ).
Anyway, what O-notation doesn't tell you is what the constant factors
in the algorithm are. It's all about how the algorithm scales. This
means that if you know your problem will always be a certain size,
the optimal algorithm might be different than if you know your
problem set will grow, depending on the constant factors.
Now, back to reality. Hash tables offer O(1) lookup which is
implemented by running a hashing function on the key value. The
result of this function is an index to an array. If there was a
collision in the hash table (common with large tables), there is a
small list to search through to find the desired element. If there
was no collision, the value is stored directly in the array. So, you
see there is a bit of overhead involved. There's also quite a bit of
space overhead as well, since the table array is often rather sparse.
So, for small lists, a naive search algorithm may actually be faster
than a hash lookup. This will vary widely on the hash and search
implementations, though, and certainly the hash will look better and
better as the problem size increases.
I hope that made some sense. Let me know if there's anything that
needs clarification.
--Levi
More information about the PLUG
mailing list