
8.17 Character Sets

If the program you are debugging uses a different character set to represent characters and strings than the one GDB uses itself, GDB can automatically translate between the character sets for you. The character set GDB uses we call the host character set; the one the inferior program uses we call the target character set.

For example, if you are running GDB on a GNU/Linux system, which uses the ISO Latin 1 character set, but you are using GDB's remote protocol (see section Debugging Remote Programs) to debug a program running on an IBM mainframe, which uses the EBCDIC character set, then the host character set is Latin 1, and the target character set is EBCDIC. If you give GDB the command set target-charset EBCDIC-US, then GDB translates between EBCDIC and Latin 1 as you print character or string values, or use character and string literals in expressions.

GDB has no way to automatically recognize which character set the inferior program uses; you must tell it, using the set target-charset command, described below.

Here are the commands for controlling GDB's character set support:

set target-charset charset

Set the current target character set to charset. We list the character set names GDB recognizes below, but if you type set target-charset followed by <TAB><TAB>, GDB will list the target character sets it supports.

set host-charset charset

Set the current host character set to charset.

By default, GDB uses a host character set appropriate to the system it is running on; you can override that default using the set host-charset command.

GDB can only use certain character sets as its host character set. We list the character set names GDB recognizes below, and indicate which can be host character sets, but if you type set host-charset followed by <TAB><TAB>, GDB will list the host character sets it supports.

set charset charset

Set the current host and target character sets to charset. As above, if you type set charset followed by <TAB><TAB>, GDB will list the names of the character sets that can be used for both host and target.

show charset

Show the names of the current host and target charsets.

show host-charset

Show the name of the current host charset.

show target-charset

Show the name of the current target charset.

GDB currently includes support for the following character sets:

ASCII

Seven-bit U.S. ASCII. GDB can use this as its host character set.

ISO-8859-1

The ISO Latin 1 character set. This extends ASCII with accented characters needed for French, German, and Spanish. GDB can use this as its host character set.

EBCDIC-US
IBM1047

Variants of the EBCDIC character set, used on some of IBM's mainframe operating systems. (GNU/Linux on the S/390 uses U.S. ASCII.) GDB cannot use these as its host character set.

Note that these are all single-byte character sets. More work inside GDB is needed to support multi-byte or variable-width character encodings, like the UTF-8 and UCS-2 encodings of Unicode.

Here is an example of GDB's character set support in action. Assume that the following source code has been placed in the file ‘charset-test.c’:

 
#include <stdio.h>

char ascii_hello[]
  = {72, 101, 108, 108, 111, 44, 32, 119,
     111, 114, 108, 100, 33, 10, 0};
char ibm1047_hello[]
  = {200, 133, 147, 147, 150, 107, 64, 166,
     150, 153, 147, 132, 90, 37, 0};

int
main (void)
{
  printf ("Hello, world!\n");
  return 0;
}

In this program, ascii_hello and ibm1047_hello are arrays containing the string ‘Hello, world!’ followed by a newline, encoded in the ASCII and IBM1047 character sets.

We compile the program, and invoke the debugger on it:

 
$ gcc -g charset-test.c -o charset-test
$ gdb -nw charset-test
GNU gdb 2001-12-19-cvs
Copyright 2001 Free Software Foundation, Inc.
…
(gdb)

We can use the show charset command to see what character sets GDB is currently using to interpret and display characters and strings:

 
(gdb) show charset
The current host and target character set is `ISO-8859-1'.
(gdb)

For the sake of printing this manual, let's use ASCII as our initial character set:

 
(gdb) set charset ASCII
(gdb) show charset
The current host and target character set is `ASCII'.
(gdb)

Let's assume that ASCII is indeed the correct character set for our host system. In other words, let's assume that if GDB prints characters using the ASCII character set, our terminal will display them properly. Since our current target character set is also ASCII, the contents of ascii_hello print legibly:

 
(gdb) print ascii_hello
$1 = 0x401698 "Hello, world!\n"
(gdb) print ascii_hello[0]
$2 = 72 'H'
(gdb)

GDB uses the target character set for character and string literals you use in expressions:

 
(gdb) print '+'
$3 = 43 '+'
(gdb)

The ASCII character set uses the number 43 to encode the ‘+’ character.

GDB relies on the user to tell it which character set the target program uses. If we print ibm1047_hello while our target character set is still ASCII, we get gibberish:

 
(gdb) print ibm1047_hello
$4 = 0x4016a8 "\310\205\223\223\226k@\246\226\231\223\204Z%"
(gdb) print ibm1047_hello[0]
$5 = 200 '\310'
(gdb)

If we type the set target-charset command followed by <TAB><TAB>, GDB tells us the character sets it supports:

 
(gdb) set target-charset
ASCII       EBCDIC-US   IBM1047     ISO-8859-1
(gdb) set target-charset

We can select IBM1047 as our target character set, and examine the program's strings again. Now the ASCII string is wrong, but GDB translates the contents of ibm1047_hello from the target character set, IBM1047, to the host character set, ASCII, and they display correctly:

 
(gdb) set target-charset IBM1047
(gdb) show charset
The current host character set is `ASCII'.
The current target character set is `IBM1047'.
(gdb) print ascii_hello
$6 = 0x401698 "\110\145%%?\054\040\167?\162%\144\041\012"
(gdb) print ascii_hello[0]
$7 = 72 '\110'
(gdb) print ibm1047_hello
$8 = 0x4016a8 "Hello, world!\n"
(gdb) print ibm1047_hello[0]
$9 = 200 'H'
(gdb)

As above, GDB uses the target character set for character and string literals you use in expressions:

 
(gdb) print '+'
$10 = 78 '+'
(gdb)

The IBM1047 character set uses the number 78 to encode the ‘+’ character.

