Current results of "dictionary word count" programs...

Hans Fugal hans at fugal.net
Mon Mar 13 19:18:18 MST 2006


Nifty, but don't forget that these results are confounded: the programmer
and the algorithm vary along with the language, so the numbers should be
taken with a grain of salt. That means you should have labeled them
as "Bryan's Java version" and so on.

If you want to compare languages, you're going to have to get rid of the
confounding factors: settle on one algorithm, have several proficient
programmers from each language review the code to make sure there are
no stupid inefficiencies, and run it on various input sizes and in
varying orders.
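
A rough sketch of what such a harness could look like, in Python. It
takes "varying orders" to mean shuffling which program runs first in each
round, which is only one reading. The name time_interleaved and the
command lists you would pass in are hypothetical, not anything from
Bryan's setup.

```python
import random
import statistics
import subprocess
import time


def time_interleaved(commands, runs=5, seed=0):
    """Time each command `runs` times, shuffling the run order every round.

    `commands` maps a label (e.g. "Bryan's Java version") to an argv list.
    Returns {label: (best, worst, average)} in wall-clock seconds.
    """
    rng = random.Random(seed)
    samples = {label: [] for label in commands}
    for _ in range(runs):
        order = list(commands)
        rng.shuffle(order)  # no program always benefits from a warm cache
        for label in order:
            start = time.perf_counter()
            subprocess.run(commands[label], check=True,
                           stdout=subprocess.DEVNULL)
            samples[label].append(time.perf_counter() - start)
    return {label: (min(s), max(s), statistics.mean(s))
            for label, s in samples.items()}
```

Calling it once per input size, with the same input file handed to every
program, would give a best/worst/average table per size.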

I'd also like to see the maximum memory footprint, but I don't know
how to get it. Maybe with a memory debugger like valgrind or something?
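
One possible way to get at it on a Unix system, sketched in Python: ask
the kernel for the peak resident set size of a finished child process
via getrusage. This is an assumption about what "memory footprint"
should mean here (peak RSS, not heap usage as a tool like valgrind would
see it), and peak_rss_of is a made-up helper name.

```python
import resource
import subprocess


def peak_rss_of(cmd):
    """Run `cmd` to completion and return the peak RSS of child processes.

    ru_maxrss is in kilobytes on Linux (bytes on macOS). RUSAGE_CHILDREN
    is a high-water mark over every child already waited for, so call
    this from a fresh harness process for each program being measured.
    """
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
```

GNU time's -v option reports the same kind of figure as "Maximum
resident set size" if you would rather not write any code.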


OTOH, if you want to compare algorithms, these numbers are close enough
to work with: go for more varied input that gets quite big, like Jason's
KJV input, and you can see each algorithm's order of growth regardless
of the constant factor imposed by the language itself.
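
To illustrate what I mean by seeing the order, here is a small Python
sketch. It assumes the task is "count how many input words appear in the
dictionary" and compares two lookup strategies I picked for illustration
(a linear scan and a hash set); neither is claimed to be what any of the
submitted programs does, and the input is synthetic.

```python
import time


def synthetic_input(n):
    """Return (dictionary, text_words), n entries each, about half overlapping."""
    dictionary = [f"w{i}" for i in range(n)]
    text_words = [f"w{i}" for i in range(n // 2, n + n // 2)]
    return dictionary, text_words


def count_with_list(dictionary, text_words):
    """Linear scan per word: work grows like len(text) * len(dictionary)."""
    entries = list(dictionary)
    return sum(1 for word in text_words if word in entries)


def count_with_set(dictionary, text_words):
    """Hash lookup per word: work grows like len(text) + len(dictionary)."""
    entries = set(dictionary)
    return sum(1 for word in text_words if word in entries)


def timings_by_size(count, sizes=(1000, 2000, 4000)):
    """Time `count` at each input size and return (size, seconds) pairs."""
    rows = []
    for n in sizes:
        dictionary, text_words = synthetic_input(n)
        start = time.perf_counter()
        count(dictionary, text_words)
        rows.append((n, time.perf_counter() - start))
    return rows
```

If the time roughly quadruples each time the size doubles, you are
looking at the quadratic version, whatever language it is written in; if
it roughly doubles, it is the linear one.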

On Mon, 13 Mar 2006 at 12:33 -0700, Bryan Sant wrote:
> Here are the current results for the "count the dictionary words in a
> file" programs submitted thus far.
> 
> All programs were executed on my Ubuntu 5.10 system:
> IBM Thinkpad T43 - Intel Pentium M 2.26GHz
> 
> I used /usr/share/dict/words as the dictionary (which contains 96274 words).
> I used an uncompressed copy of /usr/share/doc/bash/changelog.gz as the
> input file (which contains 42362 words).
> 
> C++ (GCC 4.0.2 with -O2)
> ------
> LOC:  34
> Best Time:  0.804
> Worst Time:  2.087
> Avg. Time:  1.44
> 
> Java (Sun 1.5)
> ------
> LOC:  35
> Best Time:  1.247
> Worst Time:  1.622
> Avg. Time:  1.54
> 
> Ruby 1.8.3 ("scripted" version)
> ------
> LOC:  18
> Best Time: 1.966
> Worst Time:  3.297
> Avg. Time:  2.35
> 
> Python 2.4.2 (bad algorithm?)
> ------
> LOC: 6
> Best Time: 31.724
> Worst Time: 32.417
> Avg. Time:  31.98
> 
> I'm still trying to get the Lisp version to work (I have a load
> error).  I'd like a good PHP and Perl version as well as a better
> Python version (the Python version isn't producing accurate output and
> is WAY slower than is reasonable).
> 
> -Bryan

-- 
Hans Fugal ; http://hans.fugal.net
 
There's nothing remarkable about it. All one has to do is hit the 
right keys at the right time and the instrument plays itself.
    -- Johann Sebastian Bach