7 Reporting Bugs
Email bug reports to bug-sed@gnu.org. Also, please include the output of ‘sed --version’ in the body of your report if at all possible.
Please do not send a bug report like this:
while building frobme-1.3.4
$ configure
error--> sed: file sedscr line 1: Unknown option to 's'
If GNU sed doesn’t configure your favorite package, take a few extra minutes to identify the specific problem and make a stand-alone test case. Unlike with other programs such as C compilers, making such test cases for sed is quite simple.
A stand-alone test case includes all the data necessary to perform the test, and the specific invocation of sed that causes the problem.
The smaller a stand-alone test case is, the better. A test case should not involve something as far removed from sed as “try to configure frobme-1.3.4”. Yes, that is in principle enough information to look for the bug, but that is not a very practical prospect.
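As an illustration only, a stand-alone report can be as small as the sketch below. The input line and the substitution are hypothetical placeholders; the point is that the data, the exact invocation, the expected output, and the actual output all travel together, along with the version string.

```shell
# Version of the sed under test; include this line in the report.
sed --version | head -n 1

# All the input the test needs, followed by the exact invocation.
actual=$(printf 'hello world\n' | sed 's/world/sed/')
expected='hello sed'

# State what was expected and what was actually printed.
printf 'expected: %s\nactual:   %s\n' "$expected" "$actual"
```

A report built this way can be pasted into a shell and rerun by whoever reads it, without needing the package that first triggered the problem.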
Here are a few commonly reported bugs that are not bugs.
‘N’ command on the last line
Most versions of sed exit without printing anything when the ‘N’ command is issued on the last line of a file. GNU sed prints pattern space before exiting unless of course the ‘-n’ command switch has been specified. This choice is by design.
For example, the behavior of
sed N foo bar
would, with the traditional behavior, depend on whether foo has an even or an odd number of lines (10). Or, when writing a script to read the next few lines following a pattern match, traditional implementations of sed would force you to write something like
/foo/{ $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N; $!N }
instead of just
/foo/{ N;N;N;N;N;N;N;N;N; }
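The difference is easy to observe from a shell. The sketch below assumes GNU sed is installed as ‘sed’; the three-line input is made up so that the second ‘N’ is issued on the last line. Setting ‘POSIXLY_CORRECT’ to a non-empty value makes GNU sed follow the traditional behavior, which gives a side-by-side comparison with a single binary.

```shell
# Three lines of input: the second N is issued on the last line.
gnu_out=$(printf '1\n2\n3\n' | sed N)
printf 'GNU behavior:\n%s\n' "$gnu_out"           # all three lines are printed

# With POSIXLY_CORRECT set, GNU sed exits without printing the
# pattern space when N finds no next line, as traditional seds do.
posix_out=$(printf '1\n2\n3\n' | POSIXLY_CORRECT=1 sed N)
printf 'traditional behavior:\n%s\n' "$posix_out" # line 3 is lost
```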
In any case, the simplest workaround is to use ‘$d;N’ in scripts that rely on the traditional behavior, or to set the ‘POSIXLY_CORRECT’ variable to a non-empty value.
Regex syntax clashes (problems with backslashes)
sed uses the POSIX basic regular expression syntax. According to the standard, the meaning of some escape sequences is undefined in this syntax; notable in the case of sed are ‘\|’, ‘\+’, ‘\?’, ‘\`’, ‘\'’, ‘\<’, ‘\>’, ‘\b’, ‘\B’, ‘\w’, and ‘\W’.
As in all GNU programs that use POSIX basic regular expressions, sed interprets these escape sequences as special characters. So, ‘x\+’ matches one or more occurrences of ‘x’. ‘abc\|def’ matches either ‘abc’ or ‘def’.
This syntax may cause problems when running scripts written for other seds. Some sed programs have been written with the assumption that ‘\|’ and ‘\+’ match the literal characters ‘|’ and ‘+’. Such scripts must be modified by removing the spurious backslashes if they are to be used with modern implementations of sed, like GNU sed.
On the other hand, some scripts use ‘s|abc\|def||g’ to remove occurrences of either ‘abc’ or ‘def’. While this worked until sed 4.0.x, newer versions interpret this as removing the string ‘abc|def’. This is again undefined behavior according to POSIX, and this interpretation is arguably more robust: older seds, for example, required that the regex matcher parsed ‘\/’ as ‘/’ in the common case of escaping a slash, which is again undefined behavior; the new behavior avoids this, and this is good because the regex matcher is only partially under our control.
In addition, this version of sed supports several escape characters (some of which are multi-character) to insert non-printable characters in scripts (‘\a’, ‘\c’, ‘\d’, ‘\o’, ‘\r’, ‘\t’, ‘\v’, ‘\x’). These can cause similar problems with scripts written for other seds.
‘-i’ clobbers read-only files
In short, ‘sed -i’ will let you delete the contents of a read-only file, and in general the ‘-i’ option (see section Invocation) lets you clobber protected files. This is not a bug, but rather a consequence of how the Unix filesystem works.
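A minimal sketch of this effect, assuming GNU sed and a writable scratch directory created with ‘mktemp -d’ (the file name and contents are made up for illustration). The inode number is recorded before and after the edit to show that the name ends up referring to a different file.

```shell
# Work in a scratch directory so nothing real is touched.
dir=$(mktemp -d)
printf 'old contents\n' > "$dir/file"
chmod 444 "$dir/file"                        # make the file read-only

inode_before=$(ls -i "$dir/file" | awk '{print $1}')
sed -i 's/old/new/' "$dir/file"              # succeeds despite mode 444
inode_after=$(ls -i "$dir/file" | awk '{print $1}')
new_contents=$(cat "$dir/file")

printf 'contents now: %s\n' "$new_contents"
# Differing inode numbers show that the name now refers to a new file.
printf 'inode before: %s, after: %s\n' "$inode_before" "$inode_after"

rm -rf "$dir"                                # clean up the scratch directory
```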
The permissions on a file say what can happen to the data in that file, while the permissions on a directory say what can happen to the list of files in that directory. ‘sed -i’ never opens for writing a file that is already on disk. Rather, it works on a temporary file that is finally renamed to the original name: if you rename or delete files, you’re actually modifying the contents of the directory, so the operation depends on the permissions of the directory, not of the file. For this same reason, sed does not let you use ‘-i’ on a writable file in a read-only directory, and will break hard or symbolic links when ‘-i’ is used on such a file.
‘0a’ does not work (gives an error)
There is no line 0. 0 is a special address that is only used to treat addresses like ‘0,/RE/’ as active when the script starts: if you write ‘1,/abc/d’ and the first line includes the word ‘abc’, then that match would be ignored because address ranges must span at least two lines (barring the end of the file); but what you probably wanted is to delete every line up to the first one including ‘abc’, and this is obtained with ‘0,/abc/d’.
‘[a-z]’ is case insensitive
You are encountering problems with locales. POSIX mandates that ‘[a-z]’ uses the current locale’s collation order – in C parlance, that means using strcoll(3) instead of strcmp(3). Some locales have a case-insensitive collation order, others don’t.
Another problem is that ‘[a-z]’ tries to use collation symbols. This only happens if you are on the GNU system, using GNU libc’s regular expression matcher instead of compiling the one supplied with GNU sed. In a Danish locale, for example, the regular expression ‘^[a-z]$’ matches the string ‘aa’, because this is a single collating symbol that comes after ‘a’ and before ‘b’; ‘ll’ behaves similarly in Spanish locales, or ‘ij’ in Dutch locales.
To work around these problems, which may cause bugs in shell scripts, set the ‘LC_COLLATE’ and ‘LC_CTYPE’ environment variables to ‘C’.
‘s/.*//’ does not clear pattern space
This happens if your input stream includes invalid multibyte sequences. POSIX mandates that such sequences are not matched by ‘.’, so that ‘s/.*//’ will not clear pattern space as you would expect. In fact, there is no way to clear sed’s buffers in the middle of the script in most multibyte locales (including UTF-8 locales). For this reason, GNU sed provides a ‘z’ command (for ‘zap’) as an extension.
To work around these problems, which may cause bugs in shell scripts, set the ‘LC_COLLATE’ and ‘LC_CTYPE’ environment variables to ‘C’.
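Several of the entries above can be checked directly from a shell. The sketch below assumes GNU sed is installed as ‘sed’ and is recent enough to provide the ‘z’ command; all sample inputs are made up for illustration. ‘LC_ALL=C’ is used in place of setting ‘LC_COLLATE’ and ‘LC_CTYPE’ separately, because ‘LC_ALL’ overrides both.

```shell
# Regex syntax clashes: \+ and \| act as operators in GNU sed.
plus_out=$(printf 'xxx\n' | sed 's/x\+/Y/')              # Y
alt_out=$(printf 'abc def\n' | sed 's/abc\|def/X/g')     # X X

# Address 0: the first input line already contains abc.
one_out=$(printf 'abc\nx\nabc\ny\n' | sed '1,/abc/d')    # deletes lines 1-3
zero_out=$(printf 'abc\nx\nabc\ny\n' | sed '0,/abc/d')   # deletes line 1 only

# [a-z] in the C locale: an uppercase letter is outside the range.
range_out=$(printf 'B\n' | LC_ALL=C sed -n '/[a-z]/p')   # prints nothing

# The byte \377 is not valid UTF-8; z empties the pattern space
# whatever the locale, so the following s command sees an empty line.
zap_out=$(printf 'a\377b\n' | sed 'z;s/^$/EMPTY/')       # EMPTY

printf '%s\n' "$plus_out" "$alt_out" "$one_out" "$zero_out" "$range_out" "$zap_out"
```

Each command carries its own input, so any one of them can be lifted out and used as the stand-alone test case this section asks for.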
This document was generated on January 5, 2013 using texi2html 5.0.