6.14.12 Handling of Unicode byte order marks.
This section documents the finer points of Guile’s handling of Unicode byte order marks (BOMs). A byte order mark (U+FEFF) is typically found at the start of a UTF-16 or UTF-32 stream, to allow readers to reliably determine the byte order. Occasionally, a BOM is found at the start of a UTF-8 stream, but this is much less common and not generally recommended.
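The helper below is a minimal sketch of how a reader can determine the byte order from a BOM. It inspects the first bytes of a stream and reports which byte order the encoded U+FEFF implies. bom-byte-order is a hypothetical name used only for illustration; it is not a Guile procedure. Following the rules in this section, it assumes the encoding family (UTF-16 or UTF-32) is already known and resolves only the byte order.

```scheme
(use-modules (rnrs bytevectors))   ; bytevector-length, bytevector-u8-ref

;; Hypothetical helper, NOT part of Guile: given the declared encoding
;; family ("UTF-16" or "UTF-32") and the first bytes of a stream, return
;; 'big or 'little according to a leading BOM (U+FEFF), or #f if the
;; stream does not start with one.
(define (bom-byte-order encoding bv)
  ;; #t if BV begins with the byte values in the list BYTES.
  (define (prefix? bytes)
    (and (<= (length bytes) (bytevector-length bv))
         (let loop ((i 0) (bs bytes))
           (or (null? bs)
               (and (= (bytevector-u8-ref bv i) (car bs))
                    (loop (+ i 1) (cdr bs)))))))
  (cond ((string=? encoding "UTF-16")
         (cond ((prefix? '(#xFE #xFF)) 'big)          ; FE FF
               ((prefix? '(#xFF #xFE)) 'little)       ; FF FE
               (else #f)))
        ((string=? encoding "UTF-32")
         (cond ((prefix? '(#x00 #x00 #xFE #xFF)) 'big)
               ((prefix? '(#xFF #xFE #x00 #x00)) 'little)
               (else #f)))
        (else (error "bom-byte-order: expected UTF-16 or UTF-32" encoding))))

(bom-byte-order "UTF-16" #vu8(#xFF #xFE #x41 #x00))   ; => little
(bom-byte-order "UTF-32" #vu8(#x00 #x00 #xFE #xFF))   ; => big
(bom-byte-order "UTF-16" #vu8(#x00 #x41))             ; => #f
```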
Guile attempts to handle BOMs automatically, and in accordance with the recommendations of the Unicode Standard, when the port encoding is set to UTF-8, UTF-16, or UTF-32. In brief, Guile automatically writes a BOM at the start of a UTF-16 or UTF-32 stream, and automatically consumes one from the start of a UTF-8, UTF-16, or UTF-32 stream.
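The sketch below shows both halves of that automatic behavior using in-memory bytevector ports. It assumes a Guile version that implements the BOM handling described in this section. open-bytevector-input-port and open-bytevector-output-port come from the (ice-9 binary-ports) module, and bytevector->u8-list comes from (rnrs bytevectors). The procedure names first-char and encode-via-port are hypothetical helpers written for this example.

```scheme
(use-modules (ice-9 binary-ports)   ; bytevector input/output ports
             (rnrs bytevectors))    ; bytevector->u8-list

;; Hypothetical helper: read the first character of BV as text in ENCODING.
(define (first-char bv encoding)
  (let ((port (open-bytevector-input-port bv)))
    (set-port-encoding! port encoding)
    (read-char port)))

;; Hypothetical helper: write STR as text in ENCODING and return the
;; bytes produced, as a list of integers.
(define (encode-via-port str encoding)
  (call-with-values open-bytevector-output-port
    (lambda (port get-bytevector)
      (set-port-encoding! port encoding)
      (display str port)
      (force-output port)              ; flush any buffered output first
      (bytevector->u8-list (get-bytevector)))))

;; Reading: a leading BOM is consumed; for UTF-16 it also selects the byte order.
(first-char #vu8(#xFF #xFE #x41 #x00) "UTF-16")   ; => #\A (little-endian BOM)
(first-char #vu8(#xEF #xBB #xBF #x41) "UTF-8")    ; => #\A (UTF-8 BOM skipped)

;; Writing: a BOM is produced for UTF-16 (big endian is the current default).
(encode-via-port "A" "UTF-16")                     ; => (254 255 0 65)
```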
As specified in the Unicode Standard, a BOM is only handled specially at the start of a stream, and only if the port encoding is set to UTF-8, UTF-16, or UTF-32. If the port encoding is set to UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE, then BOMs are not handled specially, and none of the special handling described in this section applies.
- To ensure that Guile will properly detect the byte order of a UTF-16 or UTF-32 stream, you must perform a textual read before any writes, seeks, or binary I/O. Guile will not attempt to read a BOM unless a read is explicitly requested at the start of the stream.
- If a textual write is performed before the first read, then an arbitrary byte order will be chosen. Currently, big endian is the default on all platforms, but that may change in the future. If you wish to explicitly control the byte order of an output stream, set the port encoding to UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE, and explicitly write a BOM (#\xFEFF) if desired.
- If set-port-encoding! is called in the middle of a stream, Guile treats this as a new logical “start of stream” for purposes of BOM handling, and will forget about any BOMs that had previously been seen. Therefore, it may choose a different byte order than had been used previously. This is intended to support multiple logical text streams embedded within a larger binary stream.
- Binary I/O operations are not guaranteed to update Guile’s notion of whether the port is at the “start of the stream”, nor are they guaranteed to produce or consume BOMs.
- For ports that support seeking (e.g. normal files), the input and output streams are considered linked: if the user reads first, then a BOM will be consumed (if appropriate), but later writes will not produce a BOM. Similarly, if the user writes first, then later reads will not consume a BOM.
- For ports that do not support seeking (e.g. pipes, sockets, and terminals), the input and output streams are considered independent for purposes of BOM handling: the first read will consume a BOM (if appropriate), and the first write will also produce a BOM (if appropriate). However, the input and output streams will always use the same byte order.
- Seeks to the beginning of a file will set the “start of stream” flags. Therefore, a subsequent textual read or write will consume or produce a BOM. However, unlike set-port-encoding!, if a byte order had already been chosen for the port, it will remain in effect after a seek, and cannot be changed by the presence of a BOM. Seeks anywhere other than the beginning of a file clear the “start of stream” flags.
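This sketch shows how to control the byte order explicitly, as recommended above. The port encoding is set to UTF-16LE, so Guile writes no BOM of its own, and #\xFEFF is written as an ordinary first character. Reading the result back with encoding UTF-16 then lets that BOM select little endian. utf-16le-with-bom is a hypothetical helper name, and the example assumes a Guile version that implements the behavior described in this section.

```scheme
(use-modules (ice-9 binary-ports)   ; bytevector input/output ports
             (rnrs bytevectors))    ; bytevector->u8-list

;; Hypothetical helper: encode STR as UTF-16 little endian, preceded by
;; an explicitly written BOM.  Returns a bytevector.
(define (utf-16le-with-bom str)
  (call-with-values open-bytevector-output-port
    (lambda (port get-bytevector)
      (set-port-encoding! port "UTF-16LE")   ; fixed byte order, no automatic BOM
      (write-char #\xFEFF port)              ; explicit BOM, encoded as FF FE
      (display str port)
      (force-output port)
      (get-bytevector))))

(bytevector->u8-list (utf-16le-with-bom "A"))   ; => (255 254 65 0)

;; A reader using "UTF-16" consumes the BOM and picks little endian from it.
(let ((port (open-bytevector-input-port (utf-16le-with-bom "A"))))
  (set-port-encoding! port "UTF-16")
  (read-char port))                              ; => #\A
```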
This document was generated on April 20, 2013 using texi2html 5.0.