MCE::Grep(3)          User Contributed Perl Documentation         MCE::Grep(3)




NAME

       MCE::Grep - Parallel grep model similar to the native grep function


VERSION

       This document describes MCE::Grep version 1.846


SYNOPSIS

        ## Exports mce_grep, mce_grep_f, and mce_grep_s
        use MCE::Grep;

        ## Array or array_ref
        my @a = mce_grep { $_ % 5 == 0 } 1..10000;
        my @b = mce_grep { $_ % 5 == 0 } \@list;

        ## Important; pass an array_ref for nested (deep) input data
        my @c = mce_grep { $_->[1] % 2 == 0 } [ [ 0, 1 ], [ 0, 2 ], ... ];
        my @d = mce_grep { $_->[1] % 2 == 0 } \@deeply_list;

        ## File path, glob ref, IO::All::{ File, Pipe, STDIO } obj, or scalar ref
        ## Workers read the file directly and do not involve the manager process
        my @e = mce_grep_f { /pattern/ } "/path/to/file"; # efficient

        ## Involves the manager process, therefore slower
        my @f = mce_grep_f { /pattern/ } $file_handle;
        my @g = mce_grep_f { /pattern/ } $io;
        my @h = mce_grep_f { /pattern/ } \$scalar;

        ## Sequence of numbers (begin, end [, step, format])
        my @i = mce_grep_s { $_ % 3 == 0 } 1, 10000, 5;
        my @j = mce_grep_s { $_ % 3 == 0 } [ 1, 10000, 5 ];

        my @k = mce_grep_s { $_ % 3 == 0 } {
           begin => 1, end => 10000, step => 5, format => undef
        };


DESCRIPTION

       This module provides a parallel grep implementation via Many-Core
       Engine (MCE).  MCE incurs a small overhead from passing data between
       processes, so a fast code block runs faster with the native grep.
       The relative overhead shrinks as the code block becomes more expensive.

        my @m1 =     grep { $_ % 5 == 0 } 1..1000000;          ## 0.065 secs
        my @m2 = mce_grep { $_ % 5 == 0 } 1..1000000;          ## 0.194 secs

       Chunking, enabled by default, greatly reduces this overhead behind the
       scenes.  The time for mce_grep below also includes the time for data
       exchanges between the manager and worker processes. The benefit of
       parallelization grows as the code block consumes more CPU time.

        my @m1 =     grep { /[2357][1468][9]/ } 1..1000000;    ## 0.353 secs
        my @m2 = mce_grep { /[2357][1468][9]/ } 1..1000000;    ## 0.218 secs
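
       The idea behind chunking can be sketched serially in plain Perl. The
       chunked_grep helper below is hypothetical and is not part of MCE; it
       only illustrates that handing over several items at once pays the
       hand-off cost once per chunk rather than once per element. In MCE each
       chunk would go to a worker process, whereas this sketch runs inline.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MCE: apply a filter chunk by chunk.
sub chunked_grep {
   my ($code, $chunk_size, @input) = @_;
   my @result;

   # splice removes up to $chunk_size items from the front of @input;
   # the loop ends once @input is empty.
   while (my @chunk = splice(@input, 0, $chunk_size)) {
      # In MCE a worker would receive @chunk; here it is filtered inline.
      push @result, grep { $code->($_) } @chunk;
   }

   return @result;
}

my @multiples = chunked_grep(sub { $_[0] % 5 == 0 }, 4, 1..20);
print "@multiples\n";
```

       Results keep the input order because the chunks are processed and
       appended in sequence.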

       Even faster is mce_grep_s; useful when input data is a range of
       numbers.  Workers generate sequences mathematically among themselves
       without any interaction from the manager process. Two arguments are
       required for mce_grep_s (begin, end). Step defaults to 1 if begin is
       smaller than end, otherwise -1.

        my @m3 = mce_grep_s { /[2357][1468][9]/ } 1, 1000000;  ## 0.165 secs
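
       The default for step described above can be illustrated with a serial
       sketch. The expand_sequence helper is hypothetical and is not part of
       MCE; it merely expands (begin, end [, step]) the way the text
       describes, without any worker processes.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MCE: expand a sequence serially,
# applying the documented default for step.
sub expand_sequence {
   my ($beg, $end, $step) = @_;

   # Step defaults to 1 if begin is smaller than end, otherwise -1.
   $step = ($beg < $end) ? 1 : -1 unless defined $step;
   die "step must be non-zero\n" if $step == 0;

   my @seq;
   if ($step > 0) {
      for (my $n = $beg; $n <= $end; $n += $step) { push @seq, $n }
   }
   else {
      for (my $n = $beg; $n >= $end; $n += $step) { push @seq, $n }
   }
   return @seq;
}

my @up   = expand_sequence(1, 5);        # step defaults to 1
my @down = expand_sequence(5, 1);        # step defaults to -1
my @by5  = expand_sequence(1, 20, 5);    # explicit step
print "@up | @down | @by5\n";
```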

       Although this document is about MCE::Grep, the MCE::Stream module can
       write results immediately without waiting for all chunks to complete.
       This is made possible by passing the reference to an array (in this
       case @m4 and @m5).

        use MCE::Stream default_mode => 'grep';

        my @m4; mce_stream \@m4, sub { /[2357][1468][9]/ }, 1..1000000;

           ## Completed in 0.203 secs. This is amazing considering the
           ## overhead for passing data between the manager and workers.

        my @m5; mce_stream_s \@m5, sub { /[2357][1468][9]/ }, 1, 1000000;

           ## Completed in 0.120 secs. As with mce_grep_s, a sequence
           ## specification turns out to be faster due to less overhead
           ## for the manager process.

       A common scenario is grepping for pattern(s) inside a massive log file.
       Notice how the benefit of parallelism grows as the pattern becomes more
       complex.  Testing was done against a 300 MB file containing 250k lines.

        use MCE::Grep;

        my @m; open my $LOG, "<", "/path/to/log/file" or die "$!\n";

        @m = grep { /pattern/ } <$LOG>;                      ##  0.756 secs
        @m = grep { /foobar|[2357][1468][9]/ } <$LOG>;       ## 24.681 secs

        ## Parallelism with mce_grep. This involves the manager process
        ## due to processing a file handle.

        @m = mce_grep { /pattern/ } <$LOG>;                  ##  0.997 secs
        @m = mce_grep { /foobar|[2357][1468][9]/ } <$LOG>;   ##  7.439 secs

        ## Even faster with mce_grep_f. Workers access the file directly
        ## with zero interaction from the manager process.

        my $log_path = "/path/to/file";
        @m = mce_grep_f { /pattern/ } $log_path;                  ##  0.112 secs
        @m = mce_grep_f { /foobar|[2357][1468][9]/ } $log_path;   ##  6.840 secs


PARSING HUGE FILES

       MCE::Grep cannot quickly determine whether a chunk contains a match,
       because it does not know the pattern inside the code block. Use the
       following snippet as a template to achieve better performance.
       Also, take a look at examples/egrep.pl, included with the distribution.

        use MCE::Loop;

        MCE::Loop::init {
           max_workers => 8, use_slurpio => 1
        };

        my $pattern  = 'karl';
        my $hugefile = 'very_huge.file';

        my @result = mce_loop_f {
           my ($mce, $slurp_ref, $chunk_id) = @_;

           ## Quickly determine if a match is found.
           ## Process slurped chunk only if true.

           if ($$slurp_ref =~ /$pattern/m) {
              my @matches;

              if ($^O ne 'MSWin32') {
                 ## Reading through an in-memory file handle is fast on
                 ## Unix. Performance degrades drastically on Windows
                 ## beyond 4 workers.

                 open my $MEM_FH, '<', $slurp_ref or die "$!\n";
                 binmode $MEM_FH, ':raw';
                 while (<$MEM_FH>) { push @matches, $_ if (/$pattern/); }
                 close   $MEM_FH;
              }
              else {
                 ## Therefore, use the following construct on Windows.

                 while ( $$slurp_ref =~ /([^\n]+\n)/mg ) {
                    my $line = $1; # save $1 to not lose the value
                    push @matches, $line if ($line =~ /$pattern/);
                 }
              }

              ## Gather matched lines.

              MCE->gather(@matches);
           }

        } $hugefile;

        print join('', @result);
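
       The two-stage check in the template can be tried on small in-memory
       strings without MCE. The match_lines helper and the sample chunks
       below are hypothetical; the sketch only shows the control flow of
       testing a whole chunk first and splitting it into lines afterwards.

```perl
use strict;
use warnings;

my $pattern = 'karl';

# Hypothetical helper, not part of MCE: return matching lines of a chunk.
sub match_lines {
   my ($slurp_ref) = @_;

   # Stage 1: one regex pass over the whole chunk; skip it if nothing matches.
   return () unless $$slurp_ref =~ /$pattern/m;

   # Stage 2: walk the chunk line by line, keeping the matching lines.
   my @matches;
   while ($$slurp_ref =~ /([^\n]+\n)/g) {
      my $line = $1;    # copy $1 before the next match overwrites it
      push @matches, $line if $line =~ /$pattern/;
   }
   return @matches;
}

my $chunk_a = "alice\nkarl one\nbob\nkarl two\n";
my $chunk_b = "nothing\nto see\n";

my @found_a = match_lines(\$chunk_a);   # two matching lines
my @found_b = match_lines(\$chunk_b);   # rejected in stage 1
print @found_a;
```

       Note that the expression [^\n]+\n only captures lines ending in a
       newline, so a final line without a trailing newline is not captured
       by this construct.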


OVERRIDING DEFAULTS

       The following lists the options that may be overridden when loading
       the module.

        use Sereal qw( encode_sereal decode_sereal );
        use CBOR::XS qw( encode_cbor decode_cbor );
        use JSON::XS qw( encode_json decode_json );

        use MCE::Grep
            max_workers => 4,                # Default 'auto'
            chunk_size => 100,               # Default 'auto'
            tmp_dir => "/path/to/app/tmp",   # $MCE::Signal::tmp_dir
            freeze => \&encode_sereal,       # \&Storable::freeze
            thaw => \&decode_sereal          # \&Storable::thaw
        ;

       From MCE 1.8 onwards, Sereal 3.015+ is loaded automatically if
       available.  Specify "Sereal => 0" to use Storable instead.

        use MCE::Grep Sereal => 0;
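
       The freeze and thaw options take code references. As a rough sketch,
       and assuming a replacement pair must follow the calling convention of
       the Storable defaults (a reference in, a byte string out, and back),
       a candidate pair can be checked for a clean round trip before handing
       it to MCE::Grep. The sample data is hypothetical.

```perl
use strict;
use warnings;
use Storable ();

# Same calling convention as the documented defaults.
my $freeze = \&Storable::freeze;   # reference in, byte string out
my $thaw   = \&Storable::thaw;     # byte string in, reference out

# Hypothetical sample data resembling a chunk of nested input.
my $chunk = [ [ 0, 1 ], [ 0, 2 ], [ 0, 3 ] ];
my $copy  = $thaw->( $freeze->($chunk) );

print scalar(@$copy), " items, last is $copy->[2][1]\n";
```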


CUSTOMIZING MCE

       MCE::Grep->init ( options )
       MCE::Grep::init { options }

       The init function accepts a hash of MCE options. The gather option, if
       specified, is ignored due to being used internally by the module.

        use MCE::Grep;

        MCE::Grep::init {
           chunk_size => 1, max_workers => 4,

           user_begin => sub {
              print "## ", MCE->wid, " started\n";
           },

           user_end => sub {
              print "## ", MCE->wid, " completed\n";
           }
        };

        my @a = mce_grep { $_ % 5 == 0 } 1..100;

        print "\n", "@a", "\n";

        -- Output

        ## 2 started
        ## 3 started
        ## 1 started
        ## 4 started
        ## 3 completed
        ## 4 completed
        ## 1 completed
        ## 2 completed

        5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100


API DOCUMENTATION

       MCE::Grep->run ( sub { code }, list )
       mce_grep { code } list

       Input data may be defined using a list or an array reference. Unlike
       MCE::Loop, MCE::Flow, and MCE::Step, specifying a hash reference as
       input data isn't allowed.

        ## Array or array_ref
        my @a = mce_grep { /[2357]/ } 1..1000;
        my @b = mce_grep { /[2357]/ } \@list;

        ## Important; pass an array_ref for nested (deep) input data
        my @c = mce_grep { $_->[1] =~ /[2357]/ } [ [ 0, 1 ], [ 0, 2 ], ... ];
        my @d = mce_grep { $_->[1] =~ /[2357]/ } \@deeply_list;

        ## Not supported
        my @z = mce_grep { ... } \%hash;

       MCE::Grep->run_file ( sub { code }, file )
       mce_grep_f { code } file

       Passing a file path is the fastest of these. Workers communicate the
       next offset position among themselves with zero interaction by the
       manager process.

       "IO::All" { File, Pipe, STDIO } is supported since MCE 1.845.

        my @c = mce_grep_f { /pattern/ } "/path/to/file";  # faster
        my @d = mce_grep_f { /pattern/ } $file_handle;
        my @e = mce_grep_f { /pattern/ } $io;              # IO::All
        my @f = mce_grep_f { /pattern/ } \$scalar;

       MCE::Grep->run_seq ( sub { code }, $beg, $end [, $step, $fmt ] )
       mce_grep_s { code } $beg, $end [, $step, $fmt ]

       Sequence may be defined as a list, an array reference, or a hash
       reference.  The functions require both begin and end values to run.
       Step and format are optional. The format is passed to sprintf (% may be
       omitted below).

        my ($beg, $end, $step, $fmt) = (10, 20, 0.1, "%4.1f");

        my @f = mce_grep_s { /[1234]\.[5678]/ } $beg, $end, $step, $fmt;
        my @g = mce_grep_s { /[1234]\.[5678]/ } [ $beg, $end, $step, $fmt ];

        my @h = mce_grep_s { /[1234]\.[5678]/ } {
           begin => $beg, end => $end,
           step => $step, format => $fmt
        };
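
       The effect of the format argument can be previewed serially with
       sprintf. The sketch below does not use MCE. It assumes that a missing
       leading % is simply prepended, which is an assumption about the
       module's behavior rather than something this document spells out.

```perl
use strict;
use warnings;

my ($beg, $end, $step, $fmt) = (10, 20, 0.1, "4.1f");   # '%' omitted

# Assumption: a missing leading '%' is simply prepended.
$fmt = "%$fmt" unless $fmt =~ /^%/;

# Derive each value from its index to avoid accumulating rounding error.
my $count = int( ($end - $beg) / $step + 1e-9 );
my @seq   = map { sprintf($fmt, $beg + $_ * $step) } 0 .. $count;

# The same filter as in the mce_grep_s examples above.
my @hits = grep { /[1234]\.[5678]/ } @seq;
print scalar(@hits), " matches, first is $hits[0]\n";
```

       Because the code block sees the formatted string, the pattern matches
       against text such as "11.5" rather than the raw number.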

       MCE::Grep->run ( sub { code }, iterator )
       mce_grep { code } iterator

       An iterator reference may be specified for input_data. Iterators are
       described under section "SYNTAX for INPUT_DATA" at MCE::Core.

        my @a = mce_grep { $_ % 3 == 0 } make_iterator(10, 30, 2);
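
       The make_iterator function is not defined in this document. A minimal
       sketch, assuming the closure style of iterators described in
       MCE::Core, might look like the following: the function returns a code
       reference that yields the next value on each call and nothing once
       the range is exhausted. The sketch drains the iterator serially and
       does not involve MCE.

```perl
use strict;
use warnings;

# Hypothetical definition of the make_iterator helper used above.
sub make_iterator {
   my ($n, $max, $step) = @_;

   return sub {
      return if $n > $max;      # exhausted: return nothing

      my $current = $n;
      $n += $step;
      return $current;
   };
}

# Drain the iterator serially to see what it yields.
my $iter = make_iterator(10, 30, 2);
my @seen;
while (defined(my $value = $iter->())) {
   push @seen, $value;
}

my @hits = grep { $_ % 3 == 0 } @seen;
print "@hits\n";
```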


MANUAL SHUTDOWN

       MCE::Grep->finish
       MCE::Grep::finish

       Workers persist after a run whenever possible. Shutdown occurs
       automatically when the script terminates. Call finish when workers
       are no longer needed.

        use MCE::Grep;

        MCE::Grep::init {
           chunk_size => 20, max_workers => 'auto'
        };

        my @a = mce_grep { ... } 1..100;

        MCE::Grep::finish;


INDEX

       MCE(3), MCE::Core(3)


AUTHOR

       Mario E. Roy, <marioeroy AT gmail DOT com>



perl v5.28.2                      2019-08-27                      MCE::Grep(3)
