Interesting little regex

Alan Young alansyoungiii at gmail.com
Fri Feb 24 11:46:37 MST 2006


> no wonder it took so long. you matched the null string between each pair
> of word boundaries. you need a +, not * there.

Thanks.
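
For anyone following along, here is a minimal sketch of the * versus + point. The pattern that caused the problem isn't quoted in this message, so the character class from the suggestion further down stands in for it, and the sample sentence is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample text, not taken from the thread.
my $text = "it's a well-known fact";

# With *, the pattern can succeed without consuming any characters,
# so a /g match also returns zero-length (empty) matches.
my @star = $text =~ /([\w'-]*)/g;

# With +, every match must consume at least one character.
my @plus = $text =~ /([\w'-]+)/g;

my $empty = grep { $_ eq '' } @star;
printf "* : %d matches, %d of them empty\n", scalar @star, $empty;
printf "+ : %d matches, none empty\n", scalar @plus;
```

The empty matches are where the extra time goes on a large file: each one is a successful match that does no useful work.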

> i understand the boolean thing as i said previously. i was asking why
> you used it there. i see no reason if all you are doing is word
> counting.

Yeah, that's what I said.  We realized we didn't need it for the
unique words.  What we were doing originally though was pulling the
unique occurrences out of a string of text:

$a = 'abcde' x 200;

What are the unique occurrences of text in that string?  That's what
the regex was solving.

The original purpose of the regex is still valid; it's just what I did
with it that was wrong.

>         $unique{$1}++ while $text =~ m/([\w'-]+)/g ;
>
> use the benchmark module to compare the speeds. make sure you don't do
> destructive parsing which some of your examples seem to do.

#!/usr/bin/perl -w

use strict;

use File::Slurp;
use Benchmark qw( cmpthese );

my $text = read_file( './kjv10.txt' );

my %unique;

sub substitution {
    $text =~ s{(([\w'-]+)(?{$unique{$^N}++}))}{$1}g;
    %unique = ();
}

sub while_loop1 {
    1 while $text =~ m{(([\w'-]+)(?{!$unique{$^N}++}))}g;
    %unique = ();
}

sub while_loop2 { $unique{$1}++ while $text =~ m/([\w'-]+)/g ; %unique = () }

cmpthese( -60, {
  'substitution' => \&substitution,
  'while loop 1' => \&while_loop1,
  'while loop 2' => \&while_loop2,
});

             s/iter substitution while loop 1 while loop 2
substitution   2.97           --         -33%         -61%
while loop 1   2.00          49%           --         -42%
while loop 2   1.15         159%          73%           --
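
As a sanity check that the in-regex code block and the plain loop count the same thing, here is a small sketch on a made-up sentence instead of kjv10.txt. The names %by_code and %by_loop are only for this illustration, and they are package variables on the assumption that lexicals inside (?{ }) may misbehave on older perls.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Small stand-in text; the benchmark reads ./kjv10.txt instead.
my $text = "the cat saw the dog; the dog didn't see the cat";

# Package variables (an assumption for safety on older perls).
our ( %by_code, %by_loop );

# Count inside the regex: $^N holds the text of the most recently
# closed capture group at the point the code block runs.
1 while $text =~ m{([\w'-]+)(?{ $by_code{$^N}++ })}g;

# Count in the loop body instead, using $1 after each match.
$by_loop{$1}++ while $text =~ m/([\w'-]+)/g;

# Print each word with both counts side by side.
for my $word ( sort keys %by_loop ) {
    printf "%-7s %d %d\n", $word, $by_loop{$word}, $by_code{$word};
}
```

Both hashes should end up with identical counts, so the timing difference above comes from how the counting is done, not from what is counted.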

--
Alan


