Interesting little regex

justin gedge jgedge at amis.com
Tue Feb 28 11:51:28 MST 2006


Jayce^ wrote:

> actually though, for any who don't know:
> In perl regex, there are flags at the end (to be fixed, perl6 has them 
> rightly in the front) that modify your engine.  One of these is the 
> underutilized x modifier.  When added, it makes whitespace 
> insignificant in your regex, so you can nicely tab/space/newline in 
> order to actually be readable...  Oh, and you can/should add comments 
> also. so:
>
> s/(foo|bar)//;
>
> can be
> s/
>     (        # either
>         foo
>       |        # or...
>         bar
>     )
>  //x;
>
>
> very simplified, but the basic point is there.  Make things readable, 
> comment, etc....
>
> -- 
> Jayce^
>
I strongly recommend using this feature to make your REGEXes easier to 
decipher when you get back to them a few months later.  I also advise 
that you get the REGEX working first, and then add the comments.  I 
wrote a fairly lengthy REGEX to parse a Synopsys report and found that 
some of the /special/ characters in my comments were still affecting the 
interpreter.  Wish I could remember which chars gave trouble, but be 
aware that NOT everything after the # is treated as a comment.  It's 
still a great feature; just add the comments incrementally and re-test 
as you go to make sure it still works for you.  The code around the 
REGEX looks like:

[report to parse was read into $content var]

  while($content =~ m/                       # use regex match as key for while loop
      Startpoint:\ (\S+)                     # $1 will be startpoint
        \s+\((.+)\)\n                        # $2 will be desc of startpoint
        \s\sEndpoint:\ (\S+)                 # $3 will be endpoint
        \s+\((.+)\)\n                        # $4 will be desc of endpoint
        \s\sPath\ Group:\ (\S+)\n            # $5 will be pathgroup
        \s\sPath\ Type:\ (\S+)\n             # $6 will be pathtype
        [\s\S]+?                             # lazily skip ahead to the slack line
        slack\ \(VIOLATED\)\s+(-\d+\.\d\d)\n # $7 will be slack
        /gx ) {                              #

    $startpoint = $1;
    $start_desc = $2;
    $endpoint   = $3;
    $end_desc   = $4;
    $pathgroup  = $5;
    $pathtype   = $6;
    $slack      = $7;

    ...process data etc...}
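
For anyone who wants to try the loop without a Synopsys report handy, 
here is a minimal self-contained sketch.  The sample text in $content is 
made up and only shaped to fit the pattern above (a real report will 
look different), and the @violations array and its hash keys are 
illustrative names, not part of the original script.

```perl
use strict;
use warnings;

# Made-up sample input, shaped only to fit the pattern below.
my $content = join '',
    "  Startpoint: u_core/reg_a\n",
    "              (rising edge-triggered flip-flop)\n",
    "  Endpoint: u_core/reg_b\n",
    "            (rising edge-triggered flip-flop)\n",
    "  Path Group: clk\n",
    "  Path Type: max\n",
    "\n",
    "  data arrival time    1.50\n",
    "  slack (VIOLATED)     -0.25\n";

my @violations;   # hypothetical container, one hash per violated path
while ($content =~ m/
        Startpoint:\ (\S+)                    # capture 1: startpoint
        \s+\((.+)\)\n                         # capture 2: desc of startpoint
        \s\sEndpoint:\ (\S+)                  # capture 3: endpoint
        \s+\((.+)\)\n                         # capture 4: desc of endpoint
        \s\sPath\ Group:\ (\S+)\n             # capture 5: path group
        \s\sPath\ Type:\ (\S+)\n              # capture 6: path type
        [\s\S]+?                              # lazily skip to the slack line
        slack\ \(VIOLATED\)\s+(-\d+\.\d\d)\n  # capture 7: negative slack
    /gx) {
    push @violations, {
        startpoint => $1, start_desc => $2,
        endpoint   => $3, end_desc   => $4,
        pathgroup  => $5, pathtype   => $6,
        slack      => $7,
    };
}

# One summary line per violated path.
printf "%s -> %s  slack %s\n", @{$_}{qw(startpoint endpoint slack)}
    for @violations;
```

Collecting the captures into a hash right away is just one option; the 
point is to copy $1..$7 somewhere before another match overwrites them.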

Probably the /most/ painful REGEX I've written yet, but it was part of a 
script that let me parse through 100+ reports and condense the info 
down to something readable.
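
On the characters that cause trouble under the x modifier: I can't say 
which ones bit in the script above, but a few documented candidates are 
easy to check.  Unescaped spaces in the pattern are ignored, a literal # 
has to be escaped, and the pattern delimiter still ends the pattern even 
when it sits inside a comment.  The sketch below is only an 
illustration; the variable names and test strings are made up.

```perl
use strict;
use warnings;

# 1. Under x, an unescaped space in the pattern is ignored,
#    so the first pattern really looks for "PathGroup".
my $plain   = 'Path Group: clk' =~ m/Path Group/x  ? 1 : 0;
my $escaped = 'Path Group: clk' =~ m/Path\ Group/x ? 1 : 0;

# 2. A literal "#" must be escaped, or it starts a comment.
my $hash = 'a#b' =~ m/ a \# b /x ? 1 : 0;

# 3. The closing delimiter is found before comments are recognized,
#    so a slash inside a comment ends a slash-delimited pattern early.
#    The broken version is kept in a string and compiled with eval
#    so that this script itself still compiles.
my $broken = q{ 'x' =~ m/ x   # matches 1/2 of the input
                /x };
my $result   = eval $broken;
my $compiled = defined $result ? 1 : 0;

# 4. Picking a different delimiter makes the same comment harmless.
my $braces = 'x' =~ m{ x   # matches 1/2 of the input
                     }x ? 1 : 0;

print "unescaped space: $plain, escaped space: $escaped, escaped hash: $hash\n";
print "slash in comment compiles: ", ($compiled ? 'yes' : 'no'), "\n";
print "brace-delimited version matches: ", ($braces ? 'yes' : 'no'), "\n";
```

Switching to m{...}x only moves the problem: an unbalanced brace in a 
comment would then be the character to watch for.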


Justin Gedge



