File: gawk.info,  Node: Simple Sed,  Next: Igawk Program,  Prev: Extract Program,  Up: Miscellaneous Programs

11.3.8 A Simple Stream Editor
-----------------------------

The 'sed' utility is a "stream editor", a program that reads a stream of
data, makes changes to it, and passes it on.  It is often used to make
global changes to a large file or to a stream of data generated by a
pipeline of commands.  Although 'sed' is a complicated program in its
own right, its most common use is to perform global substitutions in the
middle of a pipeline:

     COMMAND1 < orig.data | sed 's/old/new/g' | COMMAND2 > result

   Here, 's/old/new/g' tells 'sed' to look for the regexp 'old' on each
input line and replace every occurrence on that line with the text
'new'.  This is similar to 'awk''s 'gsub()' function
(*note String Functions::).
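
   As a minimal sketch of that comparison, the following one-liner
performs the same substitution with 'gsub()'.  The sample input text is
made up for illustration:

```shell
# Same substitution as sed 's/old/new/g', written with gsub().
# With no third argument, gsub() edits $0 in place.
echo "old wine in old bottles" | awk '{ gsub(/old/, "new"); print }'
# prints: new wine in new bottles
```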

   The following program, 'awksed.awk', accepts at least two
command-line arguments: the pattern to look for and the text to replace
it with.  Any additional arguments are treated as data file names to
process.  If none are provided, the standard input is used:

     # awksed.awk --- do s/foo/bar/g using just print
     #    Thanks to Michael Brennan for the idea

     function usage()
     {
         print "usage: awksed pat repl [files...]" > "/dev/stderr"
         exit 1
     }

     BEGIN {
         # validate arguments
         if (ARGC < 3)
             usage()

         RS = ARGV[1]
         ORS = ARGV[2]

         # don't use arguments as files
         ARGV[1] = ARGV[2] = ""
     }

     # look ma, no hands!
     {
         if (RT == "")
             printf "%s", $0
         else
             print
     }

   The program relies on 'gawk''s ability to have 'RS' be a regexp, as
well as on the setting of 'RT' to the actual text that terminates the
record (*note Records::).

   The idea is to have 'RS' be the pattern to look for.  'gawk'
automatically sets '$0' to the text between matches of the pattern.
This is text that we want to keep, unmodified.  Then, by setting 'ORS'
to the replacement text, a simple 'print' statement outputs the text we
want to keep, followed by the replacement text.

   There is one wrinkle to this scheme, which is what to do if the last
record doesn't end with text that matches 'RS'.  Using a 'print'
statement unconditionally would append the replacement text to that
record, which is not correct.  However, if the file does not end in
text that matches 'RS', 'gawk' sets 'RT' to the null string for the
last record.  In this case, we print '$0' using 'printf'
(*note Printf::), which does not append 'ORS'.

   The 'BEGIN' rule handles the setup, checking for the right number of
arguments and calling 'usage()' if there is a problem.  Then it sets
'RS' and 'ORS' from the command-line arguments and sets 'ARGV[1]' and
'ARGV[2]' to the null string, so that they are not treated as file names
(*note ARGC and ARGV::).
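
   The 'ARGV' handling can be tried on its own.  This sketch is not part
of 'awksed.awk'; the arguments 'old' and 'new' and the 'read:' label are
placeholders chosen for illustration:

```shell
# Without the BEGIN rule, awk would try to open files named "old" and "new".
# Null ARGV elements are skipped, so awk falls back to standard input.
echo data | awk 'BEGIN { ARGV[1] = ARGV[2] = "" } { print "read:", $0 }' old new
# prints: read: data
```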

   The 'usage()' function prints an error message and exits.  Finally,
the single rule handles the printing scheme outlined earlier, using
'print' or 'printf' as appropriate, depending upon the value of 'RT'.
