Hash vs list

Levi Pearson levi at cold.org
Fri Apr 21 20:51:59 MDT 2006


On Apr 21, 2006, at 6:20 PM, Jeff Schroeder wrote:

> I'm curious whether searching for a single value in a hash is  
> generally
> (always?) faster than searching for it in a list.  The programming I'm
> doing is in PHP, but I think it's a question with a general answer.

Simple answer:  No, searching for a single value in a hash is not  
always faster.  The difference is that hash lookup is O(1) while  
lookup in an unordered list is O(n).  Since you said you haven't had  
a data structures class, I'll assume you haven't learned O-notation  
and give a little summary.

What O-notation gives you a measure of is the time-complexity of an  
algorithm.  This means it lets you know how the amount of time scales  
up as the problem size (represented by n) grows.  So, O(1) means the  
amount of time stays the same regardless of n.  O(n) means the time  
to solve the problem grows linearly with n.  So, if it takes x  
milliseconds to look up a single value in an unordered list of size  
1, it takes n*x to look up a single value in a list of size n.  Other  
common complexities are O(log n) (better than O(n) )and O(n^2) (worse  
than O(n) ).

Anyway, what O-notation doesn't tell you is what the constant factors  
in the algorithm are.  It's all about how the algorithm scales.  This  
means that if you know your problem will always be a certain size,  
the optimal algorithm might be different than if you know your  
problem set will grow, depending on the constant factors.

Now, back to reality.  Hash tables offer O(1) lookup which is  
implemented by running a hashing function on the key value.  The  
result of this function is an index to an array.  If there was a  
collision in the hash table (common with large tables), there is a  
small list to search through to find the desired element.  If there  
was no collision, the value is stored directly in the array.  So, you  
see there is a bit of overhead involved.  There's also quite a bit of  
space overhead as well, since the table array is often rather sparse.

So, for small lists, a naive search algorithm may actually be faster  
than a hash lookup.  This will vary widely on the hash and search  
implementations, though, and certainly the hash will look better and  
better as the problem size increases.

I hope that made some sense.  Let me know if there's anything that  
needs clarification.

		--Levi



More information about the PLUG mailing list