18.1.1 Handling Multibyte and Varying-Width Characters
diff, diff3 and sdiff treat each line of input as a string of unibyte
characters. This can mishandle multibyte characters in some cases.
For example, when asked to ignore spaces, diff does not properly
ignore a multibyte space character.
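The space-handling problem can be reproduced outside of diff. The following Python sketch is not how diff is implemented; it only contrasts a byte-oriented comparison with a character-oriented one. The helper names are hypothetical, and U+3000 IDEOGRAPHIC SPACE in UTF-8 is an assumed example of a multibyte space.

```python
# Illustrative only: these helpers are hypothetical and are not part of diff.

ASCII_SPACE = b" \t\n\r\v\f"


def equal_ignoring_space_bytes(a: bytes, b: bytes) -> bool:
    """Compare two lines after dropping ASCII whitespace, one byte at a time."""
    def strip(line: bytes) -> bytes:
        return bytes(c for c in line if c not in ASCII_SPACE)
    return strip(a) == strip(b)


def equal_ignoring_space_chars(a: bytes, b: bytes, encoding: str = "utf-8") -> bool:
    """Decode first, then drop every character Unicode classifies as whitespace."""
    def strip(line: bytes) -> str:
        return "".join(ch for ch in line.decode(encoding) if not ch.isspace())
    return strip(a) == strip(b)


# U+3000 occupies three bytes in UTF-8 (E3 80 80), none of which is an ASCII space.
left = "foo\u3000bar".encode("utf-8")
right = "foo bar".encode("utf-8")

print(equal_ignoring_space_bytes(left, right))  # False: the three bytes survive
print(equal_ignoring_space_chars(left, right))  # True: U+3000 is dropped as a space
```

The byte-wise version is cheap because it never decodes its input. Any fix along the lines of the second function would have to avoid that decoding cost when the locale is unibyte.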
Also, diff currently assumes that each byte is one column wide, and
this assumption is incorrect in some locales, e.g., locales that use
UTF-8 encoding. This causes problems with the ‘-y’ or
‘--side-by-side’ option of diff.
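To see why the one-byte-per-column assumption breaks side-by-side alignment, compare the byte length of a UTF-8 string with an estimate of its display width. This is a rough Python sketch under stated assumptions, not diff's code. The function names are hypothetical, and the width rule is a simplification: combining marks count as zero columns, East Asian wide and fullwidth characters as two, and everything else as one. A C program would more likely rely on the POSIX wcwidth function.

```python
import unicodedata

# Illustrative only: hypothetical helpers, not part of diff.


def byte_width(text: str, encoding: str = "utf-8") -> int:
    """Width under the assumption that every byte occupies one column."""
    return len(text.encode(encoding))


def display_width(text: str) -> int:
    """Approximate terminal columns (simplified; ignores control characters)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn over the previous character
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


for sample in ("abc", "\u00e9", "\u65e5\u672c"):
    print(repr(sample), byte_width(sample), display_width(sample))
```

For plain ASCII the two measures agree. For U+00E9 the byte count is 2 but the width is 1, and for the two CJK characters the byte count is 6 but the width is 4, so padding computed from bytes would misplace the column separator.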
These problems need to be fixed without unduly affecting the performance of the utilities in unibyte environments.
The IBM GNU/Linux Technology Center Internationalization Team has
proposed some patches to support internationalized diff; see
<http://oss.software.ibm.com/developer/opensource/linux/patches/i18n/diffutils-2.7.2-i18n-0.1.patch.gz>.
Unfortunately, these patches are incomplete and apply to an older
version of diff, so more work needs to be done in this area.