6.1.10 Unicode (UCS-2) Strings
UCS-2 strings cannot be read by the standard reader, but UTF-8 strings
can. The special syntax for UTF-8 is described by the
regular expression:
#u"([^]|\")*"
The library functions for Unicode string processing are:
- bigloo procedure: make-ucs2-string k
- bigloo procedure: make-ucs2-string k char
- bigloo procedure: ucs2-string k …
- bigloo procedure: ucs2-string-length s-ucs2
- bigloo procedure: ucs2-string-ref s-ucs2 k
- bigloo procedure: ucs2-string-set! s-ucs2 k char
- bigloo procedure: ucs2-string=? s-ucs2a s-ucs2b
- bigloo procedure: ucs2-string-ci=? s-ucs2a s-ucs2b
- bigloo procedure: ucs2-string<? s-ucs2a s-ucs2b
- bigloo procedure: ucs2-string>? s-ucs2a s-ucs2b
- bigloo procedure: ucs2-string<=? s-ucs2a s-ucs2b
- bigloo procedure: ucs2-string>=? s-ucs2a s-ucs2b
- bigloo procedure: ucs2-string-ci<? s-ucs2a s-ucs2b
- bigloo procedure: ucs2-string-ci>? s-ucs2a s-ucs2b
- bigloo procedure: ucs2-string-ci<=? s-ucs2a s-ucs2b
- bigloo procedure: ucs2-string-ci>=? s-ucs2a s-ucs2b
- bigloo procedure: subucs2-string s-ucs2 start end
- bigloo procedure: ucs2-string-append s-ucs2 …
- bigloo procedure: ucs2-string->list s-ucs2
- bigloo procedure: list->ucs2-string chars
- bigloo procedure: ucs2-string-copy s-ucs2
- bigloo procedure: ucs2-string-fill! s-ucs2 char
Stores char in every element of the given s-ucs2 and returns an unspecified value.
- bigloo procedure: ucs2-string-downcase s-ucs2
Builds a newly allocated ucs2-string with lower case letters.
- bigloo procedure: ucs2-string-upcase s-ucs2
Builds a newly allocated ucs2-string with upper case letters.
- bigloo procedure: ucs2-string->utf8-string s-ucs2
- bigloo procedure: utf8-string->ucs2-string string
Convert UCS-2 strings to (or from) UTF-8 encoded ascii strings.
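As an illustration of how the conversion procedures combine with the UCS-2 procedures listed above, here is a minimal sketch. It only uses procedures named in this section; the module name and the variable names are hypothetical, and the input is plain ASCII so that it is trivially valid UTF-8.

```scheme
(module ucs2-roundtrip)   ;; hypothetical module name

;; Build a UCS-2 string from an ASCII (hence well formed UTF-8) string.
(define u (utf8-string->ucs2-string "hello"))

;; Number of UCS-2 characters in the string.
(print (ucs2-string-length u))

;; Case conversion builds a newly allocated UCS-2 string.
(define up (ucs2-string-upcase u))

;; Convert back to a UTF-8 encoded string before printing.
(print (ucs2-string->utf8-string up))

;; Case-insensitive comparison of the two UCS-2 strings.
(print (ucs2-string-ci=? u up))
```

Converting at the boundary and keeping UCS-2 strings internally is one possible design; the opposite choice (UTF-8 everywhere) works with the utf8- procedures described next.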
- bigloo procedure: utf8-string? string [strict #f]
Returns #t if and only if the argument string is a well formed UTF8 string. Otherwise returns #f.
If the optional argument strict is #t, half utf16-surrogates are rejected. The optional argument strict defaults to #f.
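A short sketch of the predicate on a well formed and an ill formed byte sequence. It assumes that strict is passed as a second positional argument, which the signature above suggests but does not state outright; the module and variable names are hypothetical.

```scheme
(module utf8-check)   ;; hypothetical module name

;; Bytes E2 82 AC: the UTF-8 encoding of the euro sign (U+20AC).
(define euro "\xe2\x82\xac")

;; The byte FF never occurs in well formed UTF-8.
(define stray "\xff")

(print (utf8-string? euro))
(print (utf8-string? stray))

;; Assumption: the optional strict flag is given positionally.
(print (utf8-string? euro #t))
```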
- bigloo procedure: utf8-string-length string
Returns the number of characters of a UTF8 string. It raises an error if the string is not a well formed UTF8 string (i.e., it does not satisfy the utf8-string? predicate).
- bigloo procedure: utf8-string-ref string i
Returns the character (represented as an UTF8 string) at the position i in string.
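The difference between byte counts and character counts can be sketched as follows. The example assumes that positions are zero-based, as for ordinary strings; the module and variable names are hypothetical.

```scheme
(module utf8-length)   ;; hypothetical module name

;; "a", then the euro sign (three bytes in UTF-8), then "b".
(define s (string-append "a" "\xe2\x82\xac" "b"))

;; string-length counts bytes, utf8-string-length counts characters.
(print (string-length s))
(print (utf8-string-length s))

;; The character at position 1, returned as a UTF8 string
;; (assuming zero-based positions).
(print (utf8-string-ref s 1))
```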
- library procedure: utf8-substring string start [end]
string must be a string, and start and end must be exact integers satisfying:
0 <= START <= END <= (string-length STRING)
The optional argument end defaults to (utf8-string-length STRING).
utf8-substring returns a newly allocated string formed from the characters of STRING beginning with index START (inclusive) and ending with index END (exclusive).
If the argument string is not a well formed UTF8 string an error is raised. Otherwise, the result is also a well formed UTF8 string.
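A minimal sketch of utf8-substring on a string that mixes one-byte and multi-byte characters. It assumes that START and END are character indices rather than byte offsets, which is how the description above reads; the module and variable names are hypothetical.

```scheme
(module utf8-sub)   ;; hypothetical module name

;; Characters: "a", euro sign (three bytes), "b", "c".
(define s (string-append "a" "\xe2\x82\xac" "bc"))

;; Characters 1 (inclusive) to 3 (exclusive): the euro sign and "b".
(define sub (utf8-substring s 1 3))

(print (utf8-string-length sub))
(print (utf8-string? sub))

;; With end omitted, the substring runs to the end of the string.
(print (utf8-substring s 2))
```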
- bigloo procedure: iso-latin->utf8 string
- bigloo procedure: iso-latin->utf8! string
- bigloo procedure: utf8->iso-latin string
- bigloo procedure: utf8->iso-latin! string
- bigloo procedure: utf8->iso-latin-15 string
- bigloo procedure: utf8->iso-latin-15! string
Encode and decode iso-latin strings into utf8. The functions iso-latin->utf8!, utf8->iso-latin! and utf8->iso-latin-15! may return, as result, the string they receive as argument.
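A round trip through the non-destructive variants might look like the sketch below. The input is the ISO-8859-1 encoding of "café", in which the byte E9 stands for the e with acute accent; the module and variable names are hypothetical.

```scheme
(module latin-roundtrip)   ;; hypothetical module name

;; "café" in ISO-8859-1: four bytes, the last one being E9.
(define latin "caf\xe9")

;; Encoding to UTF-8 turns the byte E9 into a two-byte sequence.
(define utf8 (iso-latin->utf8 latin))

(print (string-length latin))
(print (string-length utf8))
(print (utf8-string? utf8))

;; Decoding should give back the original byte string.
(define back (utf8->iso-latin utf8))
(print (string=? back latin))
```

The variants ending in ! may reuse their argument, so they are best avoided when the original string is still needed.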
- bigloo procedure: cp1252->utf8 string
- bigloo procedure: cp1252->utf8! string
- bigloo procedure: utf8->cp1252 string
- bigloo procedure: utf8->cp1252! string
Encode and decode cp1252 strings into utf8. The functions cp1252->utf8! and utf8->cp1252! may return, as result, the string they receive as argument.
- bigloo procedure: 8bits->utf8 string table
- bigloo procedure: 8bits->utf8! string table
- bigloo procedure: utf8->8bits string inv-table
- bigloo procedure: utf8->8bits! string inv-table
These are the general conversion routines used internally by iso-latin->utf8 and cp1252->utf8. They convert any 8 bits string into its equivalent UTF-8 representation and vice versa.
The argument table should be either #f, which means that the basic (i.e., iso-latin-1) 8bits -> UTF8 conversion is used, or a vector of at most 127 entries containing strings of characters. This table contains the encodings for the 8 bits characters whose codes range from 128 to 255.
The table is not required to be complete. That is, it is not required to give the whole character encoding set. Only the characters that need a non-iso-latin canonical representation must be given. For instance, the CP1252 table can be defined as:
(define cp1252
   '#("\xe2\x82\xac" ;; 0x80
      ""             ;; 0x81
      "\xe2\x80\x9a" ;; 0x82
      "\xc6\x92"     ;; 0x83
      "\xe2\x80\x9e" ;; 0x84
      "\xe2\x80\xa6" ;; 0x85
      "\xe2\x80\xa0" ;; 0x86
      "\xe2\x80\xa1" ;; 0x87
      "\xcb\x86"     ;; 0x88
      "\xe2\x80\xb0" ;; 0x89
      "\xc5\xa0"     ;; 0x8a
      "\xe2\x80\xb9" ;; 0x8b
      "\xc5\x92"     ;; 0x8c
      ""             ;; 0x8d
      "\xc5\xbd"     ;; 0x8e
      ""             ;; 0x8f
      ""             ;; 0x90
      "\xe2\x80\x98" ;; 0x91
      "\xe2\x80\x99" ;; 0x92
      "\xe2\x80\x9c" ;; 0x93
      "\xe2\x80\x9d" ;; 0x94
      "\xe2\x80\xa2" ;; 0x95
      "\xe2\x80\x93" ;; 0x96
      "\xe2\x80\x94" ;; 0x97
      "\xcb\x9c"     ;; 0x98
      "\xe2\x84\xa2" ;; 0x99
      "\xc5\xa1"     ;; 0x9a
      "\xe2\x80\xba" ;; 0x9b
      "\xc5\x93"     ;; 0x9c
      ""             ;; 0x9d
      "\xc5\xbe"     ;; 0x9e
      "\xc5\xb8"))   ;; 0x9f
The argument inv-table is an inverse table that can be built from a table using the function inverse-utf8-table.
- procedure: inverse-utf8-table vector
Inverts a UTF8 table into an object suitable for utf8->8bits and utf8->8bits!.
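Putting the table-driven routines together, a round trip could be sketched as follows. The one-entry table is a hypothetical, deliberately incomplete table that only maps the code 0x80 to the euro sign; it relies on the statement above that tables need not be complete. The module and variable names are also hypothetical.

```scheme
(module table-roundtrip)   ;; hypothetical module name

;; Hypothetical partial table: entry 0 gives the UTF-8 encoding
;; of the 8 bits code 0x80 (here, the euro sign).
(define table '#("\xe2\x82\xac"))

;; Inverse table, as expected by utf8->8bits.
(define inv (inverse-utf8-table table))

;; An 8 bits string containing the code 0x80 followed by " 5".
(define raw "\x80 5")

;; Encode with the table, then check the result is UTF-8.
(define utf8 (8bits->utf8 raw table))
(print (utf8-string? utf8))

;; Decode with the inverse table and compare with the input.
(define back (utf8->8bits utf8 inv))
(print (string=? back raw))
```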
This document was generated on March 31, 2014 using texi2html 5.0.