Appendix B On Label Separators
Some strings contain characters whose NFKC-normalized form contains the ASCII dot (0x2E, “.”). Examples of such characters are U+2024 (ONE DOT LEADER) and U+248C (DIGIT FIVE FULL STOP). These strings have the interesting property that their IDNA ToASCII output contains embedded dots. For example:
ToASCII (hi U+248C com) = hi5.com
ToASCII (räksmörgås U+2024 com) = xn--rksmrgs.com-l8as9u
These examples demonstrate the two general cases. In the first, the ASCII dot is part of an output that does not begin with the IDN prefix xn--. In the second, the dot is part of an output that is prefixed with xn--.
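The two cases can be reproduced with a minimal Python sketch. It is not a full ToASCII implementation: the hypothetical helper `to_ascii_label` applies only NFKC normalization and Punycode, and skips the other Nameprep steps (case mapping, prohibited-character and bidi checks) as well as the length checks. It assumes Python 3.7 or later.

```python
import unicodedata

ACE_PREFIX = "xn--"  # the IDN prefix discussed above


def to_ascii_label(label: str) -> str:
    """Simplified single-label ToASCII (assumption: NFKC + Punycode only)."""
    normalized = unicodedata.normalize("NFKC", label)
    if normalized.isascii():
        # Case 1: the result is plain ASCII, so no prefix is added.
        return normalized
    # Case 2: non-ASCII remains, so the whole label is Punycode-encoded.
    return ACE_PREFIX + normalized.encode("punycode").decode("ascii")


print(to_ascii_label("hi\u248ccom"))                         # hi5.com
print(to_ascii_label("r\u00e4ksm\u00f6rg\u00e5s\u2024com"))  # xn--rksmrgs.com-l8as9u
```

In both outputs the dot comes from normalization, not from the input, which is why it ends up inside what is still a single label.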
The input strings are, from the DNS point of view, a single label. The IDNA algorithm translates one label at a time, so the output is expected to be a single label as well. What is important here is to make sure the DNS resolver receives the correct query. The DNS protocol does not use the dot to delimit labels on the wire; instead, it uses length-value pairs. Thus the correct queries would be for {7}hi5.com and {22}xn--rksmrgs.com-l8as9u respectively.
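As an illustration of the length-value encoding, the following Python sketch builds the wire form of a query name from a list of labels. The function name `encode_qname` is hypothetical and not part of any library. Unlike the {n} notation used here, the sketch also appends the zero-length root label that terminates a name on the wire.

```python
def encode_qname(labels):
    """Encode byte-string labels as length-prefixed DNS wire-format labels."""
    out = bytearray()
    for label in labels:
        if not 1 <= len(label) <= 63:
            raise ValueError("a DNS label must be 1 to 63 octets long")
        out.append(len(label))  # one length octet ...
        out += label            # ... followed by the label octets, dots included
    out.append(0)               # zero-length root label ends the name
    return bytes(out)


# The whole string, embedded dot included, is sent as one label.
print(encode_qname([b"hi5.com"]))                      # b'\x07hi5.com\x00'
print(encode_qname([b"xn--rksmrgs.com-l8as9u"])[0])    # 22
```

Passing the labels as a list, instead of a dotted string, keeps the embedded dot from being mistaken for a label boundary.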
Some implementations (1) have decided that these input strings are potentially confusing for the user. The string hi U+248C com looks like hi5.com on systems that support Unicode properly. These implementations do not follow RFC 3490. They yield:
ToASCII (hi U+248C com) = hi5.com
ToASCII (räksmörgås U+2024 com) = xn--rksmrgs-5wao1o.com
The DNS queries they perform are {3}hi5{3}com and {18}xn--rksmrgs-5wao1o{3}com respectively. Arguably, this leads to a better user experience, and suggests that the IDNA specification is sub-optimal in this area.
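One way to obtain this non-conforming result is to normalize the whole string first and only then split it into labels. The Python sketch below assumes that ordering. It is a guess at the general approach, not a description of how any particular implementation works, and the helper name `to_ascii_split_after_nfkc` is hypothetical. As a simplification it applies only NFKC and Punycode, not the full Nameprep profile.

```python
import unicodedata


def to_ascii_split_after_nfkc(name: str) -> str:
    """Normalize first, then treat every resulting dot as a label separator."""
    normalized = unicodedata.normalize("NFKC", name)
    encoded = []
    for label in normalized.split("."):
        if label.isascii():
            encoded.append(label)
        else:
            encoded.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(encoded)


print(to_ascii_split_after_nfkc("hi\u248ccom"))
# hi5.com
print(to_ascii_split_after_nfkc("r\u00e4ksm\u00f6rg\u00e5s\u2024com"))
# xn--rksmrgs-5wao1o.com
```

For the first input the text output is the same as under RFC 3490, but here it denotes two labels instead of one, which is what changes the DNS query.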
This document was generated on February 1, 2012 using texi2html 5.0.