4.3.4 No string concatenation
-----------------------------
Hardcoded string concatenation is sometimes used to construct English
strings:
strcpy (s, "Replace ");
strcat (s, object1);
strcat (s, " with ");
strcat (s, object2);
strcat (s, "?");
In order to present to the translator only entire sentences, and also
because in some languages the translator might want to swap the order of
‘object1’ and ‘object2’, it is necessary to change this to use a format
string:
sprintf (s, "Replace %s with %s?", object1, object2);
String concatenation operator
-----------------------------
In many programming languages, a particular operator denotes string
concatenation at runtime (or possibly at compile time, if the compiler
supports that).
• In C++, string concatenation of ‘std::string’ objects is denoted by
the ‘+’ operator.
• In Python, string concatenation is denoted by the ‘+’ operator.
• In Java, string concatenation is denoted by the ‘+’ operator.
• In C#, string concatenation is denoted by the ‘+’ operator.
• In JavaScript and TypeScript, string concatenation is denoted by
the ‘+’ operator.
• In Go, string concatenation is denoted by the ‘+’ operator.
• In Ruby, string concatenation is denoted by the ‘+’ operator.
• In Shell, string concatenation is denoted by mere juxtaposition of
strings.
• In awk, string concatenation is denoted by mere juxtaposition of
strings.
• In Lua, string concatenation is denoted by the ‘..’ operator.
• In Modula-2, string concatenation is denoted by the ‘+’ operator.
• In D, string concatenation is denoted by the ‘~’ operator.
• In OCaml, string concatenation is denoted by the ‘^’ operator.
• In Smalltalk, string concatenation is denoted by the ‘,’ operator.
• In Vala, string concatenation is denoted by the ‘+’ operator.
• In Perl, string concatenation is denoted by the ‘.’ operator.
• In PHP, string concatenation is denoted by the ‘.’ operator.
So, for example, in Java, you would change
System.out.println("Replace "+object1+" with "+object2+"?");
into a statement involving a format string:
System.out.println(
    MessageFormat.format("Replace {0} with {1}?",
                         new Object[] { object1, object2 }));
Similarly, in C#, you would change
Console.WriteLine("Replace "+object1+" with "+object2+"?");
into a statement involving a format string:
Console.WriteLine(
    String.Format("Replace {0} with {1}?", object1, object2));
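To see what the concatenation operator does to the translatable strings, here is a minimal Python sketch. The recording function `_` is a hypothetical stand-in for `gettext`: it only logs each string passed to it and returns it unchanged, which approximates the list of messages a translator would be given.

```python
# Hypothetical stand-in for gettext: record every msgid, return it unchanged.
seen = []

def _(msgid):
    seen.append(msgid)
    return msgid

def ask_concatenated(object1, object2):
    # Each fragment is marked on its own, so three messages are produced.
    return _("Replace ") + object1 + _(" with ") + object2 + _("?")

def ask_formatted(object1, object2):
    # The whole sentence is a single message; values are inserted afterwards.
    return _("Replace %s with %s?") % (object1, object2)

ask_concatenated("a.txt", "b.txt")
fragments = list(seen)
seen.clear()
ask_formatted("a.txt", "b.txt")
sentences = list(seen)

print(fragments)  # ['Replace ', ' with ', '?']
print(sentences)  # ['Replace %s with %s?']
```

Both functions return the same English text, but only the second gives the translator an entire sentence to work with.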
Strings with embedded expressions
---------------------------------
In some programming languages, it is possible to have strings with
embedded expressions. The expressions can refer to variables of the
program. The value of such an expression is converted to a string and
inserted in place of the expression; but no formatting function is
called.
• In Python, _f-strings_ can contain expressions, as in
‘f"Hello, {name}!"’.
• In C#, since C# 6.0, _interpolated strings_ can contain
expressions, as in ‘$"Hello, {name}!"’.
• In JavaScript, since ES6, and in TypeScript, _template literals_
can contain expressions, as in ‘`Hello, ${name}!`’.
• In Ruby, _interpolated strings_ can contain expressions, as in
‘"Hello, #{name}!"’.
• In Shell, double-quoted strings can contain references to
variables, along with default values and string operations, as in
‘"Hello, $name!"’ or ‘"Hello, ${name}!"’.
• In D, _interpolation expression sequences_ can contain expressions,
as in ‘i"Hello, $(name)!"’.
• In Tcl, strings are subject to _variable substitution_, as in
‘"Hello, $name!"’.
• In Perl, _interpolated strings_ can contain expressions, as in
‘"Hello, $name!"’.
• In PHP, string literals are subject to _variable parsing_, as in
‘"Hello, $name!"’.
These cases are effectively string concatenation as well, just with a
different syntax.
So, for example, in Python, you would change
print (f'Replace {object1.name} with {object2.name}?')
into a statement involving a format string:
print ('Replace %(name1)s with %(name2)s?'
       % { 'name1': object1.name, 'name2': object2.name })
or equivalently
print ('Replace {name1} with {name2}?'
       .format(name1 = object1.name, name2 = object2.name))
And in JavaScript, you would change
print (`Replace ${object1.name} with ${object2.name}?`)
into a statement involving a format string:
print ('Replace %s with %s?'.format(object1.name, object2.name))
Specifically in JavaScript, an alternative is to use a _tagged_ template
literal:
print (TAG`Replace ${object1.name} with ${object2.name}?`)
and pass an option ‘--tag=TAG:FORMAT’ to ‘xgettext’.
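The reason embedded expressions behave like concatenation can be shown with a short Python sketch. The catalog, its single invented entry, and the `_` lookup function are hypothetical stand-ins for a real message catalog and for `gettext`.

```python
# Hypothetical catalog with one invented entry, keyed by the format string.
CATALOG = {"Hello, {name}!": "Good day, {name}!"}

def _(msgid):
    # Stand-in for gettext: return the translation, or the msgid itself.
    return CATALOG.get(msgid, msgid)

name = "Ada"

# f-string: the expression is evaluated first, so the lookup key is
# "Hello, Ada!", which no catalog can be expected to contain.
interpolated = _(f"Hello, {name}!")

# Format string: the lookup sees the template; the value is inserted later.
formatted = _("Hello, {name}!").format(name=name)

print(interpolated)  # Hello, Ada!
print(formatted)     # Good day, Ada!
```

The f-string version silently falls back to the untranslated text, because the string that reaches the lookup already contains the runtime value.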
Format strings with embedded named references
---------------------------------------------
Format strings with embedded named references are different: they are
suitable for internationalization, because a call to the ‘gettext’
function (which returns a translated format string) can be inserted
_before_ the argument values are substituted for the placeholders.
The format string types that allow embedded named references are:
• *note Shell format strings: sh-format.
• In Python, those *note Python format strings: python-format. that
take a dictionary as argument, and the *note Python brace format
strings: python-format.
• In Ruby, those *note Ruby format strings: ruby-format. that take a
hash table as argument.
• In Perl, the *note Perl brace format strings: perl-format.
The ‘<inttypes.h>’ macros
-------------------------
A similar case is compile-time concatenation of strings. The ISO C 99
include file ‘<inttypes.h>’ contains a macro ‘PRId64’ that can be
used as a formatting directive for outputting an ‘int64_t’ integer
through ‘printf’. It expands to a constant string, usually "d" or "ld"
or "lld" or something similar, depending on the platform. Assume you
have code like
printf ("The amount is %0" PRId64 "\n", number);
The ‘gettext’ tools and library have special support for these
‘<inttypes.h>’ macros. You can therefore simply write
printf (gettext ("The amount is %0" PRId64 "\n"), number);
The PO file will contain the string "The amount is %0<PRId64>\n". The
translators will provide a translation containing "%0<PRId64>" as well,
and at runtime the ‘gettext’ function's result will contain the
appropriate constant string, "d" or "ld" or "lld".
This works only for the predefined ‘<inttypes.h>’ macros. If you
have defined your own similar macros, let's say ‘MYPRId64’, that are not
known to ‘xgettext’, the solution is to change the code like this:
char buf1[100];
sprintf (buf1, "%0" MYPRId64, number);
printf (gettext ("The amount is %s\n"), buf1);
This means that you put the platform-dependent code in one statement,
and the internationalization code in a different statement. Note that a
buffer length of 100 is safe, because all available hardware integer
types are limited to 128 bits, and to print a 128-bit integer one needs
at most 54 characters, regardless of whether it is printed in decimal,
octal or hexadecimal.
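The buffer-size reasoning can be checked with a few lines of Python arithmetic. This is only a sanity check under stated assumptions: it counts bare digits (plus a minus sign in the decimal case), with no padding, grouping, or base prefix, and it does not count the terminating NUL byte that the C buffer also needs.

```python
# Widest textual form of a 128-bit integer in the three bases mentioned.
BITS = 128

widths = {
    # Most negative signed value, so the count includes the '-' sign.
    "decimal": len(str(-(2 ** (BITS - 1)))),
    # Largest unsigned value, printed without any prefix.
    "octal": len(format(2 ** BITS - 1, "o")),
    "hexadecimal": len(format(2 ** BITS - 1, "x")),
}

print(widths)  # {'decimal': 40, 'octal': 43, 'hexadecimal': 32}
```

All three widths are below the 54-character bound, so a 100-byte buffer leaves ample room.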