Perl::Critic::Policy::RegularExpressions::ProhibitComplexRegexes(3)



NAME

       Perl::Critic::Policy::RegularExpressions::ProhibitComplexRegexes -
       Split long regexps into smaller "qr//" chunks.


AFFILIATION

       This Policy is part of the core Perl::Critic distribution.


DESCRIPTION

       Big regexps are hard to read, perhaps even the hardest part of Perl.  A
       good practice is to write digestible chunks of regexp and put them
       together.  This policy flags any regexp that is longer than "N"
       characters, where "N" is a configurable value that defaults to 60.  If
       the regexp uses the "x" flag, then the length is computed after
       stripping out any comments and whitespace.
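       As an illustration of that practice (a sketch, not code from
       Perl::Critic), the hypothetical pattern below matches a date-time such
       as "2019-06-04 14:11:52".  The one-line version is well over the
       default 60 characters; the second version builds the same pattern from
       small, named qr// chunks.  When a qr// object is interpolated, Perl
       wraps it in a non-capturing group, so an alternation such as the one
       in $month stays self-contained.

```perl
use strict;
use warnings;

# Hypothetical example: one long pattern with every piece spelled out inline.
my $monolithic =
    qr/^\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])[ T](?:[01]\d|2[0-3]):[0-5]\d:[0-5]\d$/;

# The same pattern assembled from small, named qr// chunks.
my $year    = qr/\d{4}/;
my $month   = qr/0[1-9]|1[0-2]/;          # 01-12
my $day     = qr/0[1-9]|[12]\d|3[01]/;    # 01-31 (no per-month check)
my $hour    = qr/[01]\d|2[0-3]/;          # 00-23
my $min_sec = qr/[0-5]\d/;                # 00-59
my $sep     = qr/[ T]/;                   # space or "T" between date and time

# Interpolated qr// objects are wrapped in (?^:...), so each alternation
# above stays grouped when it is embedded here.
my $date     = qr/$year-$month-$day/;
my $time     = qr/$hour:$min_sec:$min_sec/;
my $datetime = qr/^$date$sep$time$/;

for my $stamp ('2019-06-04 14:11:52', '2019-13-04 14:11:52') {
    printf "%s: %s\n", $stamp, ($stamp =~ $datetime ? 'match' : 'no match');
}
```

       Each named chunk is short, and a name such as $day documents what its
       alternation is for, which the one-line version cannot do.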

       Unfortunately the use of descriptive (and therefore longish) variable
       names can cause regexps to be in violation of this policy, so
       interpolated variables are counted as 4 characters no matter how long
       their names actually are.
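       The counting rule can be approximated in a few lines of Perl.  The
       helper approx_regex_length below is hypothetical and is not
       Perl::Critic's own code; its comment and whitespace stripping is
       deliberately naive (for instance, it does not special-case "#" or
       spaces inside a character class), so treat it only as an illustration
       of the rule.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT Perl::Critic's implementation. It approximates
# the policy's rule for a regexp body passed in as a plain string:
#   - with the /x flag, comments and whitespace do not count;
#   - every interpolated $scalar or @array counts as 4 characters.
sub approx_regex_length {
    my ($body, $has_x_flag) = @_;
    if ($has_x_flag) {
        $body =~ s/\#[^\n]*//g;    # naive: treats every # as a comment start
        $body =~ s/\s+//g;         # naive: also strips spaces inside [...]
    }
    $body =~ s/[\$\@]\w+/XXXX/g;   # each interpolated variable becomes 4 chars
    return length $body;
}

# A long, descriptive variable name costs only 4 characters.
print approx_regex_length('^$prefix_with_a_long_name-\d+$', 0), "\n";    # 10
# Under /x, the comments and layout whitespace are ignored.
print approx_regex_length("\\d+   # digits\n  - \\w+ # word\n", 1), "\n"; # 7
```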


CASE STUDY

       As an example, look at the regexp used to match email addresses in
       Email::Valid::Loose (lightly tweaked to wrap for POD):

           (?x-ism:(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]
           \000-\037\x80-\xff])|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015
           "]*)*")(?:(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[
           \]\000-\037\x80-\xff])|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n
           \015"]*)*")|\.)*\@(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,
           ;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\]
           )(?:\.(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000
           -\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\]))*)

       which is constructed from the following code:

           my $esc         = '\\\\';
           my $period      = '\.';
           my $space       = '\040';
           my $open_br     = '\[';
           my $close_br    = '\]';
           my $nonASCII    = '\x80-\xff';
           my $ctrl        = '\000-\037';
           my $cr_list     = '\n\015';
           my $qtext       = qq/[^$esc$nonASCII$cr_list\"]/; # "
           my $dtext       = qq/[^$esc$nonASCII$cr_list$open_br$close_br]/;
           my $quoted_pair = qq<$esc>.qq<[^$nonASCII]>;
           my $atom_char   = qq/[^($space)<>\@,;:\".$esc$open_br$close_br$ctrl$nonASCII]/;# "
           my $atom        = qq<$atom_char+(?!$atom_char)>;
           my $quoted_str  = qq<\"$qtext*(?:$quoted_pair$qtext*)*\">; # "
           my $word        = qq<(?:$atom|$quoted_str)>;
           my $domain_ref  = $atom;
           my $domain_lit  = qq<$open_br(?:$dtext|$quoted_pair)*$close_br>;
           my $sub_domain  = qq<(?:$domain_ref|$domain_lit)>;
           my $domain      = qq<$sub_domain(?:$period$sub_domain)*>;
           my $local_part  = qq<$word(?:$word|$period)*>; # This part is modified
           $Addr_spec_re   = qr<$local_part\@$domain>;

       If you read the code from bottom to top, it is quite readable.  You
       can even see the one violation of RFC 822 that Tatsuhiko Miyagawa
       deliberately put into Email::Valid::Loose to allow periods more freely
       in the local part of an address (the part before the "@").  Look for
       the "|\." in the upper regexp to see that same deviation.
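       The effect of that change can be seen in a cut-down sketch.  It
       assumes a simplified atom character class, [A-Za-z0-9_-], in place of
       the full RFC 822 set and omits quoted strings; only the shapes of the
       two local-part rules are carried over: RFC 822's word *("." word)
       and the loose $word(?:$word|$period)*.

```perl
use strict;
use warnings;

# Simplified stand-ins (assumption): not the full RFC 822 character sets.
my $atom_char = qr/[A-Za-z0-9_-]/;
my $atom      = qr/$atom_char+(?!$atom_char)/;
my $period    = qr/\./;
my $word      = $atom;    # quoted strings omitted for brevity

# RFC 822 shape: word *("." word), i.e. single periods between words.
my $strict_local = qr/^$word(?:$period$word)*$/;
# Loose shape, as in "$word(?:$word|$period)*": periods may repeat or trail.
my $loose_local  = qr/^$word(?:$word|$period)*$/;

for my $local ('john.doe', 'john..doe', 'john.') {
    printf "%-10s strict=%s loose=%s\n", $local,
        ($local =~ $strict_local ? 'yes' : 'no'),
        ($local =~ $loose_local  ? 'yes' : 'no');
}
```

       With these inputs, both rules accept "john.doe", but only the loose
       rule accepts "john..doe" and "john.".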

       One could certainly argue that the top regexp could be rewritten more
       legibly with "m//x" and comments.  But the bottom version is self-
       documenting and, for example, doesn't repeat "\x80-\xff" 18 times.
       Furthermore, it's much easier to compare the bottom version against
       the source BNF grammar in RFC 822 to judge whether the implementation
       is sound even before running tests.


CONFIGURATION

       This policy allows regexps up to "N" characters long, where "N"
       defaults to 60.  To use a different limit, set "max_characters" in
       your .perlcriticrc file like this:

           [RegularExpressions::ProhibitComplexRegexes]
           max_characters = 40


CREDITS

       Initial development of this policy was supported by a grant from the
       Perl Foundation.


AUTHOR

       Chris Dolan <cdolan@cpan.org>


COPYRIGHT

       Copyright (c) 2007-2011 Chris Dolan.  Many rights reserved.

       This program is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.  The full text of this license can
       be found in the LICENSE file included with this module.




perl v5.28     Perl::Critic::Policy::RegularExpressions::ProhibitComplexRegexes(3)

perl-critic 1.134.0 - Generated Tue Jun 4 14:11:52 CDT 2019