File: autoconf.info,  Node: Limitations of Usual Tools,  Prev: Limitations of Builtins,  Up: Portable Shell

11.15 Limitations of Usual Tools
================================

The small set of tools you can expect to find on any machine can still
include some limitations you should be aware of.

‘awk’
     Don't leave white space before the opening parenthesis in a user
     function call.  Posix does not allow this and GNU Awk rejects it:

          $ gawk 'function die () { print "Aaaaarg!"  }
                  BEGIN { die () }'
          gawk: cmd. line:2:         BEGIN { die () }
          gawk: cmd. line:2:                      ^ parse error
          $ gawk 'function die () { print "Aaaaarg!"  }
                  BEGIN { die() }'
          Aaaaarg!

     Posix says that if a program contains only ‘BEGIN’ actions, and
     contains no instances of ‘getline’, then the program merely
     executes the actions without reading input.  However, traditional
     Awk implementations (such as Solaris 10 ‘awk’) read and discard
     input in this case.  Portable scripts can redirect input from
     ‘/dev/null’ to work around the problem.  For example:

          awk 'BEGIN {print "hello world"}' </dev/null

‘egrep’
     The empty alternative is not portable.  For instance:

          > printf "foo\n|foo\n" | $EGREP '^(|foo|bar)$'
          |foo
          > printf "bar\nbar|\n" | $EGREP '^(foo|bar|)$'
          bar|
          > printf "foo\nfoo|\n|bar\nbar\n" | $EGREP '^(foo||bar)$'
          foo
          |bar

     For more information about what can appear in portable extended
     regular expressions, *note (grep)Problematic Expressions::.

     ‘$EGREP’ also suffers the limitations of ‘grep’ (*note Limitations
     of Usual Tools: grep.).

‘expr’
     Not all implementations obey the Posix rule that ‘--’ separates
     options from arguments; likewise, not all implementations provide
     the extension to Posix that the first argument can be treated as
     part of a valid expression rather than an invalid option if it
     begins with ‘-’.  When performing arithmetic, use ‘expr 0 + $var’
     if ‘$var’ might be a negative number, to keep ‘expr’ from
     interpreting it as an option.

     No ‘expr’ keyword starts with ‘X’, so use ‘expr X"WORD" : 'XREGEX'’
     to keep ‘expr’ from misinterpreting WORD.

     Don't use ‘length’, ‘substr’, ‘match’ and ‘index’.
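
     As a small illustration of the two idioms above, the following
     sketch adds to a possibly negative number and strips a leading dash
     from a word.  The variable names and values are assumptions made
     for this example only.

```shell
# Hypothetical inputs for illustration.
var=-5
word=-n

# The leading "0 +" keeps a negative $var from being parsed as an option.
sum=`expr 0 + "$var" + 2`

# The leading X keeps $word from being mistaken for an option or keyword.
rest=`expr "X$word" : 'X-\(.*\)'`

echo "$sum $rest"
```

     Note that ‘expr’ exits with status 1 when its result is zero or
     empty, so a script that checks the status must allow for that.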

‘expr’ (‘|’)
     You can use ‘|’.  Although Posix does require that ‘expr ''’ return
     the empty string, it does not specify the result when you ‘|’
     together the empty string (or zero) with the empty string.  For
     example:

          expr '' \| ''

     Posix 1003.2-1992 returns the empty string for this case, but
     traditional Unix returns ‘0’ (Solaris is one such example).  In
     Posix 1003.1-2001, the specification was changed to match
     traditional Unix's behavior (which is bizarre, but it's too late to
     fix this).  Please note that the same problem does arise when the
     empty string results from a computation, as in:

          expr bar : foo \| foo : bar

     Avoid this portability problem by avoiding the empty string.

‘expr’ (‘:’)
     Portable ‘expr’ regular expressions should use ‘\’ to escape only
     characters in the string ‘$()*.123456789[\^{}’.  For example,
     alternation, ‘\|’, is common but Posix does not require its
     support, so it should be avoided in portable scripts.  Similarly,
     ‘\+’ and ‘\?’ should be avoided.

     Portable ‘expr’ regular expressions should not begin with ‘^’.
     Patterns are automatically anchored so leading ‘^’ is not needed
     anyway.

     On the other hand, the behavior of the ‘$’ anchor is not portable
     on multi-line strings.  Posix is ambiguous whether the anchor
     applies to each line, as was done in older versions of the GNU Core
     Utilities, or whether it applies only to the end of the overall
     string, as in Coreutils 6.0 and most other implementations.

          $ baz='foo
          > bar'
          $ expr "X$baz" : 'X\(foo\)$'

          $ expr-5.97 "X$baz" : 'X\(foo\)$'
          foo

     The Posix standard is ambiguous as to whether ‘expr 'a' : '\(b\)'’
     outputs ‘0’ or the empty string.  In practice, it outputs the empty
     string on most platforms, but portable scripts should not assume
     this.  For instance, the QNX 4.25 native ‘expr’ returns ‘0’.

     One might think that a way to get a uniform behavior would be to
     use the empty string as a default value:

          expr a : '\(b\)' \| ''

     Unfortunately this behaves exactly as the original expression; see
     the ‘expr’ (‘|’) entry for more information.

     Some ancient ‘expr’ implementations (e.g., Solaris 10
     ‘/usr/ucb/expr’) have a silly length limit that causes ‘expr’ to
     fail if the matched substring is longer than 120 bytes.  In this
     case, you might want to fall back on ‘echo|sed’ if ‘expr’ fails.
     Nowadays this is of practical importance only for the rare
     installer who mistakenly puts ‘/usr/ucb’ before ‘/usr/bin’ in
     ‘PATH’ on Solaris 10.

     On Mac OS X 10.4, ‘expr’ mishandles the pattern ‘[^-]’ in some
     cases.  For example, the command
          expr Xpowerpc-apple-darwin8.1.0 : 'X[^-]*-[^-]*-\(.*\)'

     outputs ‘apple-darwin8.1.0’ rather than the correct ‘darwin8.1.0’.
     This particular case can be worked around by substituting ‘[^--]’
     for ‘[^-]’.

     Don't leave, there is some more!

     The QNX 4.25 ‘expr’, in addition to preferring ‘0’ to the empty
     string, has a funny behavior in its exit status: it's always 1 when
     parentheses are used!

          $ val=`expr 'a' : 'a'`; echo "$?: $val"
          0: 1
          $ val=`expr 'a' : 'b'`; echo "$?: $val"
          1: 0

          $ val=`expr 'a' : '\(a\)'`; echo "$?: $val"
          1: a
          $ val=`expr 'a' : '\(b\)'`; echo "$?: $val"
          1: 0

     In practice this can be a big problem if you are ready to catch
     failures of ‘expr’ programs with some other method (such as using
     ‘sed’), since you may get twice the result.  For instance

          $ expr 'a' : '\(a\)' || echo 'a' | sed 's/^\(a\)$/\1/'

     outputs ‘a’ on most hosts, but ‘aa’ on QNX 4.25.  A simple
     workaround consists of testing ‘expr’ and using a variable set to
     ‘expr’ or to ‘false’ according to the result.
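
     The workaround just described might be sketched as follows.  The
     variable name ‘my_expr’ and the sample file name are hypothetical;
     this is one possible shape for the probe, not a definitive
     implementation.

```shell
# Probe once whether expr prints the match and also exits with status 0.
# The variable name my_expr is hypothetical.
if expr a : '\(a\)' >/dev/null 2>&1; then
  my_expr=expr
else
  my_expr=false
fi

# Extract the part after the last slash; fall back on sed when expr
# is unusable (my_expr=false) or fails.
file=dir/name.c
base=`$my_expr "X$file" : 'X.*/\(.*\)' 2>/dev/null ||
      echo "X$file" | sed 's%^X.*/%%'`
echo "$base"
```

     When the probe fails, ‘false’ prints nothing and only the ‘sed’
     branch produces output, so the result appears once.  The sketch
     still assumes the matched text is neither empty nor ‘0’, since
     ‘expr’ reports failure in those cases too.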

     Tru64 ‘expr’ incorrectly treats the result as a number, if it can
     be interpreted that way:

          $ expr 00001 : '.*\(...\)'
          1

     On HP-UX 11, ‘expr’ only supports a single sub-expression.

          $ expr 'Xfoo' : 'X\(f\(oo\)*\)$'
          expr: More than one '\(' was used.

‘fgrep’
     Although Posix stopped requiring ‘fgrep’ in 2001, a few traditional
     hosts (notably Solaris 10) do not support the Posix replacement
     ‘grep -F’.  Also, some traditional implementations do not work on
     long input lines.  To work around these problems, invoke
     ‘AC_PROG_FGREP’ and then use ‘$FGREP’.

     Tru64/OSF 5.1 ‘fgrep’ does not match an empty pattern.

‘find’
     Many operands of GNU ‘find’ are not standardized by Posix and are
     missing on many platforms.  These nonportable operands include
     ‘-follow’, ‘-maxdepth’, ‘-mindepth’, ‘-printf’, and ‘,’.  See the
     Posix spec for ‘find’
     (https://pubs.opengroup.org/onlinepubs/9699919799/utilities/find.html)
     for ‘find’ operands that should be portable nowadays.

     The replacement of ‘{}’ is guaranteed only if the argument is
     exactly _{}_, not if it's only a part of an argument.  For
     instance, on HP-UX 11:

          $ touch foo
          $ find . -name foo -exec echo "{}-{}" \;
          {}-{}

     while GNU ‘find’ reports ‘./foo-./foo’.  Posix allows either
     behavior.
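
     One way to stay within the guaranteed behavior is to pass ‘{}’ as
     a complete argument to an inner shell and build the string there.
     The following is a minimal sketch; the function name ‘show_twice’
     is hypothetical.

```shell
# Hypothetical helper: for each file named $2 under directory $1,
# print its path twice, joined by a dash.
show_twice () {
  # {} is a complete argument here; the inner shell sees it as $1.
  find "$1" -name "$2" -exec sh -c 'echo "$1-$1"' sh '{}' \;
}
```

     For example, ‘show_twice . foo’ is intended to print ‘./foo-./foo’
     whether or not ‘find’ substitutes ‘{}’ inside larger arguments.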

‘grep’
     Portable scripts can rely on the ‘grep’ options ‘-c’, ‘-l’, ‘-n’,
     and ‘-v’, but should avoid other options.  For example, don't use
     ‘-w’, as Posix does not require it and Irix 6.5.16m's ‘grep’ does
     not support it.  Also, portable scripts should not combine ‘-c’
     with ‘-l’, as Posix does not allow this.

     Some of the options required by Posix are not portable in practice.
     Don't use ‘grep -q’ to suppress output, because traditional ‘grep’
     implementations (e.g., Solaris 10) do not support ‘-q’.  Don't use
     ‘grep -s’ to suppress output either, because Posix says ‘-s’ does
     not suppress output, only some error messages; also, the ‘-s’
     option of traditional ‘grep’ behaved like ‘-q’ does in most modern
     implementations.  Instead, redirect the standard output and
     standard error (in case the file doesn't exist) of ‘grep’ to
     ‘/dev/null’.  Check the exit status of ‘grep’ to determine whether
     it found a match.
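
     The redirection approach can be wrapped in a small function, as in
     this sketch.  The name ‘has_match’ is hypothetical, and the sketch
     assumes the pattern does not begin with ‘-’.

```shell
# Hypothetical helper: succeed if file $2 has a line matching the basic
# regular expression $1; print nothing either way.
has_match () {
  grep "$1" "$2" >/dev/null 2>&1
}

# A missing file is reported only through the exit status.
if has_match 'foo' /nonexistent/file; then
  echo found
else
  echo 'not found'
fi
```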

     The QNX4 implementation fails to count lines with ‘grep -c '$'’,
     but works with ‘grep -c '^'’.  Other alternatives for counting
     lines are to use ‘sed -n '$='’ or ‘wc -l’.
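
     A sketch of the ‘sed’ alternative follows; the function name
     ‘count_lines’ is hypothetical.  Note that this form prints nothing
     at all for an empty file, which callers may need to treat as zero.

```shell
# Hypothetical helper: print the number of lines in file $1.
count_lines () {
  sed -n '$=' "$1"
}
```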

     Some traditional ‘grep’ implementations do not work on long input
     lines.  On AIX the default ‘grep’ silently truncates long lines on
     the input before matching.

     Also, traditional implementations do not support multiple regexps
     with ‘-e’: they either reject ‘-e’ entirely (e.g., Solaris 10) or
     honor only the last pattern (e.g., IRIX 6.5 and NeXT). To work
     around these problems, invoke ‘AC_PROG_GREP’ and then use ‘$GREP’.

     Another possible workaround for the multiple ‘-e’ problem is to
     separate the patterns by newlines, for example:

          grep 'foo
          bar' in.txt

     except that this fails with traditional ‘grep’ implementations and
     with OpenBSD 3.8 ‘grep’.

     Traditional ‘grep’ implementations (e.g., Solaris 10) do not
     support the ‘-E’ or ‘-F’ options.  To work around these problems,
     invoke ‘AC_PROG_EGREP’ and then use ‘$EGREP’, and similarly for
     ‘AC_PROG_FGREP’ and ‘$FGREP’.  Even if you are willing to require
     support for Posix ‘grep’, your script should not use both ‘-E’ and
     ‘-F’, since Posix does not allow this combination.

     Portable ‘grep’ regular expressions should use ‘\’ only to escape
     characters in the string ‘$()*.123456789[\^{}’.  For example,
     alternation, ‘\|’, is common but Posix does not require its support
     in basic regular expressions, so it should be avoided in portable
     scripts.  Solaris and HP-UX ‘grep’ do not support it.  Similarly,
     the following escape sequences should also be avoided: ‘\<’, ‘\>’,
     ‘\+’, ‘\?’, ‘\`’, ‘\'’, ‘\B’, ‘\b’, ‘\S’, ‘\s’, ‘\W’, and ‘\w’.
     For more information about what can appear in portable regular
     expressions, *note (grep)Problematic Expressions::.

     Posix does not specify the behavior of ‘grep’ on binary files.  An
     example where this matters is using BSD ‘grep’ to search text that
     includes embedded ANSI escape sequences for colored output to
     terminals (‘\033[m’ is the sequence to restore normal output); the
     behavior depends on whether input is seekable:

          $ printf 'esc\033[mape\n' > sample
          $ grep . sample
          Binary file sample matches
          $ cat sample | grep .
          escape

‘join’
     On NetBSD, ‘join -a 1 file1 file2’ mistakenly behaves like ‘join -a
     1 -a 2 1 file1 file2’, resulting in a usage warning; the workaround
     is to use ‘join -a1 file1 file2’ instead.

     On platforms with the BusyBox tools, the ‘join’ command is entirely
     missing.  As a workaround, you can simulate special cases of the
     ‘join’ command using an ‘awk’ script.  For an example, see
     .
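
     As one possible shape for such a script, here is a minimal sketch
     of an inner join on the first field.  It is an illustration only:
     the function name ‘awk_join’ is hypothetical, and the sketch
     assumes a Posix ‘awk’, blank-separated fields, a non-empty first
     file, and keys that are unique within the first file.  Unlike
     ‘join’, it does not require sorted input, and its output follows
     the order of the second file.

```shell
# Hypothetical helper: for each line of file $2 whose first field also
# starts a line of file $1, print the line from $1 followed by the
# remaining fields of the line from $2.
awk_join () {
  awk '
    NR == FNR { left[$1] = $0; next }
    $1 in left {
      line = left[$1]
      for (i = 2; i <= NF; i++) line = line " " $i
      print line
    }
  ' "$1" "$2"
}
```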

‘ln’
     The ‘-f’ option is portable nowadays.

     Symbolic links are not available on some systems; use ‘$(LN_S)’ as
     a portable substitute.

     For versions of the DJGPP before 2.04, ‘ln’ emulates symbolic links
     to executables by generating a stub that in turn calls the real
     program.  This feature also works with nonexistent files like in
     the Posix spec.  So ‘ln -s file link’ generates ‘link.exe’, which
     attempts to call ‘file.exe’ if run.  But this feature only works
     for executables, so ‘cp -p’ is used instead for these systems.
     DJGPP versions 2.04 and later have full support for symbolic links.

‘ls’
     The portable options are ‘-acdilrtu’.  Current practice is for ‘-l’
     to output both owner and group, even though ancient versions of
     ‘ls’ omitted the group.

     On ancient hosts, ‘ls foo’ sent the diagnostic ‘foo not found’ to
     standard output if ‘foo’ did not exist.  Hence a shell command like
     ‘sources=`ls *.c 2>/dev/null`’ did not always work, since it was
     equivalent to ‘sources='*.c not found'’ in the absence of ‘.c’
     files.  This is no longer a practical problem, since current ‘ls’
     implementations send diagnostics to standard error.

     The behavior of ‘ls’ on a directory that is being concurrently
     modified is not always predictable, because of a data race where
     cached information returned by ‘readdir’ does not match the current
     directory state.  In fact, MacOS 10.5 has an intermittent bug where
     ‘readdir’, and thus ‘ls’, sometimes lists a file more than once if
     other files were added or removed from the directory immediately
     prior to the ‘ls’ call.  Since ‘ls’ already sorts its output, the
     duplicate entries can be avoided by piping the results through
     ‘uniq’.

‘mkdir’
     Combining the ‘-m’ and ‘-p’ options, as in ‘mkdir -m go-w -p DIR’,
     often leads to trouble.  FreeBSD ‘mkdir’ incorrectly attempts to
     change the permissions of DIR even if it already exists.  HP-UX
     11.23 and IRIX 6.5 ‘mkdir’ often assign the wrong permissions to
     any newly-created parents of DIR.

     Posix does not clearly specify whether ‘mkdir -p foo’ should
     succeed when ‘foo’ is a symbolic link to an already-existing
     directory.  The GNU ‘mkdir’ succeeds, but Solaris 10 ‘mkdir’ fails.

     Traditional ‘mkdir -p’ implementations suffer from race conditions.
     For example, if you invoke ‘mkdir -p a/b’ and ‘mkdir -p a/c’ at the
     same time, both processes might detect that ‘a’ is missing, one
     might create ‘a’, then the other might try to create ‘a’ and fail
     with a ‘File exists’ diagnostic.  Solaris 10 ‘mkdir’ is vulnerable,
     and other traditional Unix systems are probably vulnerable too.
     This possible race is harmful in parallel builds when several Make
     rules call ‘mkdir -p’ to construct directories.  You may use
     ‘install-sh -d’ as a safe replacement, for example by setting
     ‘MKDIR_P='/path/to/install-sh -d'’ in the environment of
     ‘configure’, assuming the package distributes ‘install-sh’.

‘mkfifo’
‘mknod’
     The GNU Coding Standards state that ‘mknod’ is safe to use on
     platforms where it has been tested to exist; but it is generally
     portable only for creating named FIFOs, since device numbers are
     platform-specific.  Autotest uses ‘mkfifo’ to implement parallel
     testsuites.  Posix states that behavior is unspecified when opening
     a named FIFO for both reading and writing; on at least Cygwin, this
     results in failure on any attempt to read or write to that file
     descriptor.

‘mktemp’
     Shell scripts can use temporary files safely with ‘mktemp’, but it
     does not exist on all systems.  A portable way to create a safe
     temporary file name is to create a temporary directory with mode
     700 and use a file inside this directory.  Both methods prevent
     attackers from gaining control, though ‘mktemp’ is far less likely
     to fail gratuitously under attack.

     Here is sample code to create a new temporary directory ‘$dir’
     safely:

          # Create a temporary directory $dir in $TMPDIR (default /tmp).
          # Use mktemp if possible; otherwise fall back on mkdir,
          # with $RANDOM to make collisions less likely.
          : "${TMPDIR:=/tmp}"
          {
            dir=`
              (umask 077 && mktemp -d "$TMPDIR/fooXXXXXX") 2>/dev/null
            ` &&
            test -d "$dir"
          } || {
            dir=$TMPDIR/foo$$-$RANDOM
            (umask 077 && mkdir "$dir")
          } || exit $?

‘mv’
     The only portable options are ‘-f’ and ‘-i’.

     Moving individual files between file systems is portable (it was in
     Unix version 6), but it is not always atomic: when doing ‘mv new
     existing’, there's a critical section where neither the old nor the
     new version of ‘existing’ actually exists.

     On some systems moving files from ‘/tmp’ can sometimes cause
     undesirable (but perfectly valid) warnings, even if you created
     these files.  This is because ‘/tmp’ belongs to a group that
     ordinary users are not members of, and files created in ‘/tmp’
     inherit the group of ‘/tmp’.  When the file is copied, ‘mv’ issues
     a diagnostic without failing:

          $ touch /tmp/foo
          $ mv /tmp/foo .
          error→mv: ./foo: set owner/group (was: 100/0): Operation not permitted
          $ echo $?
          0
          $ ls foo
          foo

     This annoying behavior conforms to Posix, unfortunately.

     Moving directories across mount points is not portable; use ‘cp’
     and ‘rm’ instead.

     DOS variants cannot rename or remove open files, and do not support
     commands like ‘mv foo bar >foo’, even though this is perfectly
     portable among Posix hosts.

‘od’

     In MacOS X versions prior to 10.4.3, ‘od’ does not support the
     standard Posix options ‘-A’, ‘-j’, ‘-N’, or ‘-t’, or the XSI
     option, ‘-s’.  The only supported Posix option is ‘-v’, and the
     only supported XSI options are those in ‘-bcdox’.  The BSD
     ‘hexdump’ program can be used instead.

     In some versions of some operating systems derived from Solaris 11,
     ‘od’ prints decimal byte values padded with zeros rather than with
     spaces:

          $ printf '#!' | od -A n -t d1 -N 2
                   035 033

     instead of

          $ printf '#!' | od -A n -t d1 -N 2
                    35  33

     We have observed this on both OpenIndiana and OmniOS; Illumos may
     also be affected.  As a workaround, you can use octal output
     (option ‘-t o1’).
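
     A sketch of the octal workaround follows.  The function name
     ‘first_two_octal’ is hypothetical; it relies on word splitting to
     discard whatever padding ‘od’ emits around the values.

```shell
# Hypothetical helper: print the first two bytes of standard input as
# octal values separated by a single space.
first_two_octal () {
  # The dummy word x keeps "set" from misparsing od's output.
  set x `od -A n -t o1 -N 2`
  shift
  echo "$*"
}

printf '#!' | first_two_octal
```

     For the two bytes ‘#!’ this is expected to print ‘043 041’.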

‘rm’
     The ‘-f’ and ‘-r’ options are portable.

     It is not portable to invoke ‘rm’ without options or operands.  On
     the other hand, Posix now requires ‘rm -f’ to silently succeed when
     there are no operands (useful for constructs like ‘rm -rf
     $filelist’ without first checking if ‘$filelist’ was empty).  But
     this was not always portable; at least NetBSD ‘rm’ built before
     2008 would fail with a diagnostic.
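
     On hosts where an empty operand list is still a concern, the check
     can be made explicit, as in this sketch.  The function name
     ‘remove_list’ is hypothetical, and the sketch assumes the file
     names contain no white space or glob characters.

```shell
# Hypothetical helper: remove the files named in the blank-separated
# list $1, doing nothing when the list is empty.
remove_list () {
  # $1 is deliberately unquoted in the rm call so that it is split.
  test -z "$1" || rm -f $1
}
```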

     A file might not be removed even if its parent directory is
     writable and searchable.  Many Posix hosts cannot remove a mount
     point, a named stream, a working directory, or a last link to a
     file that is being executed.

     DOS variants cannot rename or remove open files, and do not support
     commands like ‘rm foo >foo’, even though this is perfectly portable
     among Posix hosts.

‘rmdir’
     Just as with ‘rm’, some platforms refuse to remove a working
     directory.

‘sed’
     Patterns should not include the separator (unless escaped), even as
     part of a character class.  In conformance with Posix, the Cray
     ‘sed’ rejects ‘s/[^/]*$//’: use ‘s%[^/]*$%%’.  Even when escaped,
     patterns should not include separators that are also used as ‘sed’
     metacharacters.  For example, GNU sed 4.0.9 rejects ‘s,x\{1\,\},,’,
     while sed 4.1 strips the backslash before the comma before
     evaluating the basic regular expression.
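
     For instance, the ‘%’ form mentioned above can be used to strip the
     last component of a path.  This is a minimal sketch; the function
     name ‘strip_last’ is hypothetical.

```shell
# Hypothetical helper: remove everything after the last slash of $1,
# keeping the slash itself.
strip_last () {
  # % is the separator, so / may appear freely in the bracket expression.
  printf '%s\n' "$1" | sed 's%[^/]*$%%'
}

strip_last /usr/local/bin
```

     For the sample path this is expected to print ‘/usr/local/’.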

     Avoid empty patterns within parentheses (i.e., ‘\(\)’).  Posix does
     not require support for empty patterns, and Unicos 9 ‘sed’ rejects
     them.

     Unicos 9 ‘sed’ loops endlessly on patterns like ‘.*\n.*’.

     Sed scripts should not use branch labels longer than 7 characters
     and should not contain comments; AIX 5.3 ‘sed’ rejects indented
     comments.  HP-UX sed has a limit of 99 commands (not counting ‘:’
     commands) and 48 labels, which cannot be circumvented by using more
     than one script file.  It can execute up to 19 reads with the ‘r’
     command per cycle.  Solaris ‘/usr/ucb/sed’ rejects usages that
     exceed a limit of about 6000 bytes for the internal representation
     of commands.

     Avoid redundant ‘;’, as some ‘sed’ implementations, such as NetBSD
     1.4.2's, incorrectly try to interpret the second ‘;’ as a command:

          $ echo a | sed 's/x/x/;;s/x/x/'
          sed: 1: "s/x/x/;;s/x/x/": invalid command code ;

     Some ‘sed’ implementations have a buffer limited to 4000 bytes, and
     this limits the size of input lines, output lines, and internal
     buffers that can be processed portably.  Likewise, not all ‘sed’
     implementations can handle embedded ‘NUL’ or a missing trailing
     newline.

     Remember that ranges within a bracket expression of a regular
     expression are only well-defined in the ‘C’ (or ‘POSIX’) locale.
     Meanwhile, support for character classes like ‘[[:upper:]]’ is not
     yet universal, so if you cannot guarantee the setting of ‘LC_ALL’,
     it is better to spell out a range ‘[ABCDEFGHIJKLMNOPQRSTUVWXYZ]’
     than to rely on ‘[A-Z]’.
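
     Keeping the spelled-out list in a shell variable makes it easier to
     reuse.  The following sketch assumes hypothetical names ‘upper’ and
     ‘mask_upper’.

```shell
# Hypothetical variable holding every uppercase ASCII letter.
upper=ABCDEFGHIJKLMNOPQRSTUVWXYZ

# Hypothetical helper: replace each uppercase letter of $1 with "_".
mask_upper () {
  printf '%s\n' "$1" | sed "s/[$upper]/_/g"
}

mask_upper 'aBcD'
```

     For the sample word this is expected to print ‘a_c_’.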

     Additionally, Posix states that regular expressions are only
     well-defined on characters.  Unfortunately, there exist platforms
     such as MacOS X 10.5 where not all 8-bit byte values are valid
     characters, even though that platform has a single-byte ‘C’ locale.
     And Posix allows the existence of a multi-byte ‘C’ locale, although
     that does not yet appear to be a common implementation.  At any
     rate, it means that not all bytes will be matched by the regular
     expression ‘.’:

          $ printf '\200\n' | LC_ALL=C sed -n /./p | wc -l
          0
          $ printf '\200\n' | LC_ALL=en_US.ISO8859-1 sed -n /./p | wc -l
          1

     Portable ‘sed’ regular expressions should use ‘\’ only to escape
     characters in the string ‘$()*.123456789[\^n{}’.  For example,
     alternation, ‘\|’, is common but Posix does not require its
     support, so it should be avoided in portable scripts.  Solaris
     ‘sed’ does not support alternation; e.g., ‘sed '/a\|b/d'’ deletes
     only lines that contain the literal string ‘a|b’.  Similarly, ‘\+’
     and ‘\?’ should be avoided.

     Anchors (‘^’ and ‘$’) inside groups are not portable.

     Nested parentheses in patterns (e.g., ‘\(\(a*\)b*\)’) are quite
     portable to current hosts, but were not supported by some ancient
     ‘sed’ implementations like SVR3.

     Some ‘sed’ implementations, e.g., Solaris, restrict the special
     role of the asterisk ‘*’ to one-character regular expressions and
     back-references, and the special role of interval expressions
     ‘\{M\}’, ‘\{M,\}’, or ‘\{M,N\}’ to one-character regular
     expressions.  This may lead to unexpected behavior:

          $ echo '1*23*4' | /usr/bin/sed 's/\(.\)*/x/g'
          x2x4
          $ echo '1*23*4' | /usr/xpg4/bin/sed 's/\(.\)*/x/g'
          x

     The ‘-e’ option is mostly portable.  However, its argument cannot
     start with ‘a’, ‘c’, or ‘i’, as this runs afoul of a Tru64 5.1 bug.
     Also, its argument cannot be empty, as this fails on AIX 5.3.  Some
     people prefer to use ‘-e’:

          sed -e 'COMMAND-1' \
              -e 'COMMAND-2'

     as opposed to the equivalent:

          sed '
            COMMAND-1
            COMMAND-2
          '

     The following usage is sometimes equivalent:

          sed 'COMMAND-1;COMMAND-2'

     but Posix says that this use of a semicolon has undefined effect if
     COMMAND-1's verb is ‘{’, ‘a’, ‘b’, ‘c’, ‘i’, ‘r’, ‘t’, ‘w’, ‘:’, or
     ‘#’, so you should use semicolon only with simple scripts that do
     not use these verbs.

     Posix up to the 2008 revision requires the argument of the ‘-e’
     option to be a syntactically complete script.  GNU ‘sed’ allows
     passing multiple script fragments, each as the argument of a
     separate ‘-e’ option, that are then combined, with newlines between
     the fragments, and a future Posix revision may allow this as well.
     This approach is not portable with script fragments ending in
     backslash; for example, the ‘sed’ programs on Solaris 10, HP-UX 11,
     and AIX don't allow splitting in this case:

          $ echo a | sed -n -e 'i\
          0'
          0
          $ echo a | sed -n -e 'i\' -e 0
          Unrecognized command: 0

     In practice, however, this technique of joining fragments through
     ‘-e’ works for multiple ‘sed’ functions within ‘{’ and ‘}’, even if
     that is not specified by Posix:

          $ echo a | sed -n -e '/a/{' -e s/a/b/ -e p -e '}'
          b

     Commands inside { } brackets are further restricted.  Posix 2008
     says that they cannot be preceded by addresses, ‘!’, or ‘;’, and
     that each command must be followed immediately by a newline,
     without any intervening blanks or semicolons.  The closing bracket
     must be alone on a line, other than white space preceding or
     following it.  However, a future version of Posix may standardize
     the use of addresses within brackets.

     Contrary to yet another urban legend, you may portably use ‘&’ in
     the replacement part of the ‘s’ command to mean "what was matched".
     All descendants of Unix version 7 ‘sed’ (at least; we don't have
     first hand experience with older ‘sed’ implementations) have
     supported it.

     Posix requires that you must not have any white space between ‘!’
     and the following command.  It is OK to have blanks between the
     address and the ‘!’.  For instance, on Solaris:

          $ echo "foo" | sed -n '/bar/ ! p'
          error→Unrecognized command: /bar/ ! p
          $ echo "foo" | sed -n '/bar/! p'
          error→Unrecognized command: /bar/! p
          $ echo "foo" | sed -n '/bar/ !p'
          foo

     Posix also says that you should not combine ‘!’ and ‘;’.  If you
     use ‘!’, it is best to put it on a command that is delimited by
     newlines rather than ‘;’.

     Also note that Posix requires that the ‘b’, ‘t’, ‘r’, and ‘w’
     commands be followed by exactly one space before their argument.
     On the other hand, no white space is allowed between ‘:’ and the
     subsequent label name.

     If a sed script is specified on the command line and ends in an
     ‘a’, ‘c’, or ‘i’ command, the last line of inserted text should be
     followed by a newline.  Otherwise some ‘sed’ implementations (e.g.,
     OpenBSD 3.9) do not append a newline to the inserted text.

     Many ‘sed’ implementations (e.g., MacOS X 10.4, OpenBSD 3.9,
     Solaris 10 ‘/usr/ucb/sed’) strip leading white space from the text
     of ‘a’, ‘c’, and ‘i’ commands.  Prepend a backslash to work around
     this incompatibility with Posix:

          $ echo flushleft | sed 'a\
          >    indented
          > '
          flushleft
          indented
          $ echo flushleft | sed 'a\
          > \   indented
          > '
          flushleft
             indented

     Posix requires that with an empty regular expression, the last
     non-empty regular expression from either an address specification
     or substitution command is applied.  However, busybox 1.6.1
     complains when using a substitution command with a replacement
     containing a back-reference to an empty regular expression; the
     workaround is repeating the regular expression.

          $ echo abc | busybox sed '/a\(b\)c/ s//\1/'
          sed: No previous regexp.
          $ echo abc | busybox sed '/a\(b\)c/ s/a\(b\)c/\1/'
          b

     Scripts that need to match word boundaries should be aware that
     the available syntax differs between implementations, as Posix
     does not specify one.

                          \<      \b      [[:<:]]
          Solaris 10      yes     no      no
          Solaris XPG4    yes     no      error
          NetBSD 5.1      no      no      yes
          FreeBSD 9.1     no      no      yes
          GNU             yes     yes     error
          busybox         yes     yes     error

‘sed’ (‘t’)
     Some old systems have ‘sed’ that "forget" to reset their ‘t’ flag
     when starting a new cycle.  For instance on MIPS RISC/OS, and on
     IRIX 5.3, if you run the following ‘sed’ script (the trailing
     ‘#’ labels are not actually part of the texts):

          s/keep me/kept/g  # a
          t end             # b
          s/.*/deleted/g    # c
          :end              # d

     on

          delete me         # 1
          delete me         # 2
          keep me           # 3
          delete me         # 4

     you get

          deleted
          delete me
          kept
          deleted

     instead of

          deleted
          deleted
          kept
          deleted

     Why?  When processing line 1, (c) matches, therefore sets the ‘t’
     flag, and the output is produced.  When processing line 2, the ‘t’
     flag is still set (this is the bug).  Command (a) fails to match,
     but ‘sed’ is not supposed to clear the ‘t’ flag when a substitution
     fails.  Command (b) sees that the flag is set, therefore it clears
     it, and jumps to (d), hence you get ‘delete me’ instead of
     ‘deleted’.  When processing line 3, ‘t’ is clear, (a) matches, so
     the flag is set, hence (b) clears the flag and jumps.  Finally,
     since the flag is clear, line 4 is processed properly.

     There are two things one should remember about ‘t’ in ‘sed’.
     Firstly, always remember that ‘t’ jumps if _some_ substitution
     succeeded, not only the immediately preceding substitution.
     Therefore, always use a fake ‘t clear’ followed by a ‘:clear’ on
     the next line, to reset the ‘t’ flag where needed.

     Secondly, you cannot rely on ‘sed’ to clear the flag at each new
     cycle.

     One portable implementation of the script above is:

          t clear
          :clear
          s/keep me/kept/g
          t end
          s/.*/deleted/g
          :end

‘sed’ (‘w’)

     When a script contains multiple commands to write lines to the same
     output file, BusyBox ‘sed’ mistakenly opens a separate output
     stream for each command.  This can cause one of the commands to
     "win" and the others to "lose", in the sense that their output is
     discarded.  For example:

          sed -n -e '
            /a/w xxx
            /b/w xxx
          ' <
