Current results of "dictionary word count" programs...
jonathan at carnageblender.com
Mon Mar 13 13:06:57 MST 2006
On Mon, 13 Mar 2006 12:33:48 -0700, "Bryan Sant" <bryan.sant at gmail.com> wrote:
> Python 2.4.2 (bad algorithm?)
> LOC: 6
> Best Time: 31.724
> Worst Time: 32.417
> Avg. Time: 31.98
> I'm still trying to get the lisp version to work (I have a load
> error). I'd like a good PHP and Perl version as well as a better
> Python version (the python version isn't producing accurate output and
> is WAY slower than is reasonable).
Ouch. Yeah, Tyler's python code is pretty screwed up. (Ab)using list
comprehensions like that means you materialize the whole data set in
memory, and using a list for lookup instead of a dict turns every
membership test into a linear scan of the word list rather than a
constant-time hash lookup. Here's my quick-and-dirty version.
- Accepts input on stdin if no file is specified on the command line,
  which makes testing via pipes easier.
- For consistency, the empty string is not considered a word (even
  though it's in my dictionary). Otherwise, there is a question in my
  mind as to whether " foo" should be one word or three or even two.
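import sys

WORDS_FNAME = '/usr/share/dict/words'
# A dict keyed by word gives hash lookups; every count starts at 0.
words = dict((line.rstrip(), 0) for line in file(WORDS_FNAME))
# Read the named file if one was given, otherwise stdin (2.4 has no ternary).
source = len(sys.argv) > 1 and file(sys.argv[1]) or sys.stdin
for line in source:
    # split() with no argument never yields the empty string.
    for word in line.strip().split():
        if word in words:
            words[word] += 1
for word, count in words.iteritems():
    if count > 0:
        print word, count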
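
For anyone who wants to try the same idea without a words file handy, here is a rough sketch of the dict-based approach as a function. It assumes a current Python 3 rather than the 2.4 used in this thread, and the name count_dictionary_words and the sample data are made up for illustration. It is not meant as the benchmark entry, just a way to see the lookup and the whitespace handling in isolation.

```python
import io


def count_dictionary_words(dictionary_lines, text_lines):
    """Return {word: count} for dictionary words that occur in text_lines."""
    # A dict gives hash-based lookup, so "word in counts" does not scan
    # the whole dictionary the way a list membership test would.
    counts = {line.rstrip(): 0 for line in dictionary_lines}
    counts.pop("", None)  # the empty string is not treated as a word
    for line in text_lines:
        # split() with no argument splits on runs of whitespace and never
        # yields empty strings, so leading or doubled spaces add no tokens.
        for word in line.split():
            if word in counts:
                counts[word] += 1
    return {word: n for word, n in counts.items() if n > 0}


if __name__ == "__main__":
    # Hypothetical stand-ins for the words file and the input text.
    dictionary = io.StringIO("apple\nbanana\n\ncherry\n")
    text = io.StringIO("  apple banana apple\nkiwi  banana\n")
    for word, n in sorted(count_dictionary_words(dictionary, text).items()):
        print(word, n)
```

Passing any iterable of lines (an open file, sys.stdin, a list) works for both arguments, which keeps the pipe-friendly behaviour of the version above. Splitting with an explicit separator, as in line.split(" "), would instead produce empty strings for leading or repeated spaces, which is where the one-word-or-two question comes from.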
C++ is history repeated as tragedy. Java is history repeated as farce. --Scott McKay