Interesting little regex

Alan Young alansyoungiii at gmail.com
Thu Feb 23 10:10:08 MST 2006


I know, replying to myself.

Parsing the KJV Bible took about 7 seconds with this:

#!/usr/bin/perl -w

use strict;

# Slurp the whole file into one string.
my $text = do {
  open my $T, '<', './kjv10.txt' or die "Couldn't open kjv10.txt: $!\n";
  local $/;  # no record separator, so <$T> returns the entire file
  <$T>;
};

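my %unique;

# $^N holds the text of the most recently closed group (the word).
# The (??{ }) block bumps that word's count, then returns "(?=)"
# (always matches) the first time a word is seen and "(?!)" (always
# fails) on every later sighting.
$text =~ s{(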
             (\b\w+(?:['-]+\w+)*\b)
             (??{!$unique{$^N}++?"(?=)":"(?!)"})
           )
          }{
           $1
          }xg;

print "$_ => $unique{$_}\n" for sort keys %unique;
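
For comparison, here is a minimal sketch of the same tally without the
experimental (??{ }) construct. It swaps in a different technique: a
plain match loop that bumps a hash entry per word. count_words is a
made-up helper name, not something from the script above. One difference
worth noting: nothing can fail after the word is captured, so the engine
never retries with a shorter match, and a repeated word like "don't"
should not also get tallied as "don" and "t".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): tally every
# word in a string and return a reference to a word => count hash.
sub count_words {
    my ($text) = @_;
    my %count;
    # Same word pattern as the script above; /g in a while loop walks
    # the string one match at a time.
    while ($text =~ /(\b\w+(?:['-]+\w+)*\b)/g) {
        $count{$1}++;
    }
    return \%count;
}

my $count = count_words("In the beginning God created the heaven and the earth.");
print "$_ => $count->{$_}\n" for sort keys %$count;
```

On the sample sentence, "the" is counted three times and every other
word once.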

(It was pointed out that that's not a completely fair timing ... the
7 seconds also covers loading the 4 MB file into memory.)
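
One way to make the timing fairer is to clock the load and the scan
separately. The sketch below assumes the core Time::HiRes module and
the same ./kjv10.txt path; elapsed is a hypothetical helper, and the
scan is a plain counting loop standing in for the substitution, so the
numbers it prints are not the 7 seconds quoted above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical helper: run a code ref, return elapsed wall-clock seconds.
sub elapsed {
    my ($code) = @_;
    my $start = Time::HiRes::time();
    $code->();
    return Time::HiRes::time() - $start;
}

my $file = './kjv10.txt';   # assumed: same file as the script above

if (-r $file) {
    my $text;

    # Time only the slurp.
    my $load = elapsed(sub {
        open my $T, '<', $file or die "Couldn't open $file: $!\n";
        local $/;
        $text = <$T>;
    });

    # Time only the word scan.
    my %count;
    my $scan = elapsed(sub {
        $count{$1}++ while $text =~ /(\b\w+(?:['-]+\w+)*\b)/g;
    });

    printf "load: %.3f s, scan: %.3f s\n", $load, $scan;
}
else {
    warn "$file not readable, nothing to time\n";
}
```
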
--
Alan


