Simple XML Subset Parser
Parses a subset of XML
Functions
Types and Values
enum | GMarkupError
#define | G_MARKUP_ERROR
enum | GMarkupParseFlags
typedef | GMarkupParseContext
struct | GMarkupParser
enum | GMarkupCollectType
Description
The "GMarkup" parser is intended to parse a simple markup format that's a subset of XML. This is a small, efficient, easy-to-use parser. It should not be used if you expect to interoperate with other applications generating full-scale XML. However, it's very useful for application data files, config files, etc. where you know your application will be the only one writing the file. Full-scale XML parsers should be able to parse the subset used by GMarkup, so you can easily migrate to full-scale XML at a later time if the need arises.
GMarkup is not guaranteed to signal an error on all invalid XML; the parser may accept documents that an XML parser would not. However, XML documents that are not well-formed are not considered valid GMarkup documents. (Being well-formed is a weaker condition than being valid; see the XML specification for definitions of these terms.)
Simplifications to XML include:
Only UTF-8 encoding is allowed
No user-defined entities
Processing instructions, comments and the doctype declaration are "passed through" but are not interpreted in any way
No DTD or validation
The markup format does support:
Elements
Attributes
5 standard entities: &amp; &lt; &gt; &quot; &apos;
Character references
Sections marked as CDATA
Functions
g_markup_escape_text ()
gchar * g_markup_escape_text (const gchar *text, gssize length);
Escapes text so that the markup parser will parse it verbatim. Less than, greater than, ampersand, etc. are replaced with the corresponding entities. This function would typically be used when writing out a file to be parsed with the markup parser.
Note that this function doesn't protect whitespace and line endings from being processed according to the XML rules for normalization of line endings and attribute values.
Note also that this function will produce character references in the range of &#x1; ... &#x1f; for all control characters except for tab, newline and carriage return. The character references in this range are not valid XML 1.0, but they are valid XML 1.1 and will be accepted by the GMarkup parser.
g_markup_printf_escaped ()
gchar * g_markup_printf_escaped (const char *format, ...);
Formats arguments according to format, escaping all string and character arguments in the fashion of g_markup_escape_text(). This is useful when you want to insert literal strings into XML-style markup output, without having to worry that the strings might themselves contain markup.
    const char *store = "Fortnum & Mason";
    const char *item = "Tea";
    char *output;

    output = g_markup_printf_escaped ("<purchase>"
                                      "<store>%s</store>"
                                      "<item>%s</item>"
                                      "</purchase>",
                                      store, item);
Since 2.4
g_markup_vprintf_escaped ()
gchar * g_markup_vprintf_escaped (const char *format, va_list args);
Formats the data in args according to format, escaping all string and character arguments in the fashion of g_markup_escape_text(). See g_markup_printf_escaped().
Since 2.4
g_markup_parse_context_end_parse ()
gboolean g_markup_parse_context_end_parse (GMarkupParseContext *context, GError **error);
Signals to the GMarkupParseContext that all data has been fed into the parse context with g_markup_parse_context_parse().
This function reports an error if the document isn't complete, for example if elements are still open.
g_markup_parse_context_free ()
void g_markup_parse_context_free (GMarkupParseContext *context);
Frees a GMarkupParseContext.
This function can't be called from inside one of the GMarkupParser functions or while a subparser is pushed.
g_markup_parse_context_get_position ()
void g_markup_parse_context_get_position (GMarkupParseContext *context, gint *line_number, gint *char_number);
Retrieves the current line number and the number of the character on that line. Intended for use in error messages; there are no strict semantics for what constitutes the "current" line number other than "the best number we could come up with for error messages."
g_markup_parse_context_get_element ()
const gchar * g_markup_parse_context_get_element (GMarkupParseContext *context);
Retrieves the name of the currently open element.
If called from the start_element or end_element handlers this will give the element_name as passed to those functions. For the parent elements, see g_markup_parse_context_get_element_stack().
Since 2.2
g_markup_parse_context_get_element_stack ()
const GSList * g_markup_parse_context_get_element_stack (GMarkupParseContext *context);
Retrieves the element stack from the internal state of the parser.
The returned GSList is a list of strings where the first item is the currently open tag (as would be returned by g_markup_parse_context_get_element()) and the next item is its immediate parent.
This function is intended to be used in the start_element and end_element handlers where g_markup_parse_context_get_element() would merely return the name of the element that is being processed.
Since 2.16
g_markup_parse_context_get_user_data ()
gpointer g_markup_parse_context_get_user_data (GMarkupParseContext *context);
Returns the user_data associated with context.
This will either be the user_data that was provided to g_markup_parse_context_new() or to the most recent call of g_markup_parse_context_push().
Returns
the provided user_data. The returned data belongs to the markup context and will be freed when g_markup_parse_context_free() is called.
Since 2.18
g_markup_parse_context_new ()
GMarkupParseContext * g_markup_parse_context_new (const GMarkupParser *parser, GMarkupParseFlags flags, gpointer user_data, GDestroyNotify user_data_dnotify);
Creates a new parse context. A parse context is used to parse marked-up documents. You can feed any number of documents into a context, as long as no errors occur; once an error occurs, the parse context can't continue to parse text (you have to free it and create a new parse context).
Parameters
parser | a GMarkupParser
flags | one or more GMarkupParseFlags
user_data | user data to pass to GMarkupParser functions
user_data_dnotify | user data destroy notifier called when the parse context is freed
g_markup_parse_context_parse ()
gboolean g_markup_parse_context_parse (GMarkupParseContext *context, const gchar *text, gssize text_len, GError **error);
Feed some data to the GMarkupParseContext.
The data need not be valid UTF-8; an error will be signaled if it's invalid. The data need not be an entire document; you can feed a document into the parser incrementally, via multiple calls to this function. Typically, as you receive data from a network connection or file, you feed each received chunk of data into this function, aborting the process if an error occurs. Once an error is reported, no further data may be fed to the GMarkupParseContext; all errors are fatal.
Parameters
context | a GMarkupParseContext
text | chunk of text to parse
text_len | length of text in bytes
error | return location for a GError
g_markup_parse_context_push ()
void g_markup_parse_context_push (GMarkupParseContext *context, const GMarkupParser *parser, gpointer user_data);
Temporarily redirects markup data to a sub-parser.
This function may only be called from the start_element handler of a GMarkupParser. It must be matched with a corresponding call to g_markup_parse_context_pop() in the matching end_element handler (except in the case that the parser aborts due to an error).
All tags, text and other data between the matching tags are redirected to the subparser given by parser. user_data is used as the user_data for that parser. user_data is also passed to the error callback in the event that an error occurs. This includes errors that occur in subparsers of the subparser.
The end tag matching the start tag for which this call was made is handled by the previous parser (which is given its own user_data), which is why g_markup_parse_context_pop() is provided to allow "one last access" to the user_data provided to this function. In the case of error, the user_data provided here is passed directly to the error callback of the subparser and g_markup_parse_context_pop() should not be called. In either case, if user_data was allocated then it ought to be freed from both of these locations.
This function is not intended to be directly called by users interested in invoking subparsers. Instead, it is intended to be used by the subparsers themselves to implement a higher-level interface.
As an example, see the following implementation of a simple parser that counts the number of tags encountered.
    typedef struct
    {
      gint tag_count;
    } CounterData;

    static void
    counter_start_element (GMarkupParseContext  *context,
                           const gchar          *element_name,
                           const gchar         **attribute_names,
                           const gchar         **attribute_values,
                           gpointer              user_data,
                           GError              **error)
    {
      CounterData *data = user_data;

      data->tag_count++;
    }

    static void
    counter_error (GMarkupParseContext *context,
                   GError              *error,
                   gpointer             user_data)
    {
      CounterData *data = user_data;

      g_slice_free (CounterData, data);
    }

    static GMarkupParser counter_subparser =
    {
      counter_start_element,
      NULL,
      NULL,
      NULL,
      counter_error
    };
In order to allow this parser to be easily used as a subparser, the following interface is provided:
    void
    start_counting (GMarkupParseContext *context)
    {
      CounterData *data = g_slice_new (CounterData);

      data->tag_count = 0;
      g_markup_parse_context_push (context, &counter_subparser, data);
    }

    gint
    end_counting (GMarkupParseContext *context)
    {
      CounterData *data = g_markup_parse_context_pop (context);
      int result;

      result = data->tag_count;
      g_slice_free (CounterData, data);

      return result;
    }
The subparser would then be used as follows:
    static void
    start_element (context, element_name, ...)
    {
      if (strcmp (element_name, "count-these") == 0)
        start_counting (context);

      // else, handle other tags...
    }

    static void
    end_element (context, element_name, ...)
    {
      if (strcmp (element_name, "count-these") == 0)
        g_print ("Counted %d tags\n", end_counting (context));

      // else, handle other tags...
    }
Since 2.18
g_markup_parse_context_pop ()
gpointer g_markup_parse_context_pop (GMarkupParseContext *context);
Completes the process of a temporary sub-parser redirection.
This function exists to collect the user_data allocated by a matching call to g_markup_parse_context_push(). It must be called in the end_element handler corresponding to the start_element handler during which g_markup_parse_context_push() was called. You must not call this function from the error callback; the user_data is provided directly to the callback in that case.
This function is not intended to be directly called by users interested in invoking subparsers. Instead, it is intended to be used by the subparsers themselves to implement a higher-level interface.
Since 2.18
g_markup_parse_context_ref ()
GMarkupParseContext * g_markup_parse_context_ref (GMarkupParseContext *context);
Increases the reference count of context.
Since 2.36
g_markup_parse_context_unref ()
void g_markup_parse_context_unref (GMarkupParseContext *context);
Decreases the reference count of context. When its reference count drops to 0, it is freed.
Since 2.36
g_markup_collect_attributes ()
gboolean g_markup_collect_attributes (const gchar *element_name, const gchar **attribute_names, const gchar **attribute_values, GError **error, GMarkupCollectType first_type, const gchar *first_attr, ...);
Collects the attributes of the element from the data passed to the GMarkupParser start_element function, dealing with common error conditions and supporting boolean values.
This utility function is not required to write a parser but can save a lot of typing.
The element_name, attribute_names, attribute_values and error parameters passed to the start_element callback should be passed unmodified to this function.
Following these arguments is a list of "supported" attributes to collect. Each entry in the list consists of three arguments: a GMarkupCollectType, the attribute name, and a pointer to the storage location. It is an error to specify multiple attributes with the same name. If any attribute not in the list appears in the attribute_names array then an unknown attribute error will result.
The GMarkupCollectType field specifies the type of collection to perform and whether a given attribute must appear or is optional.
The attribute name is simply the name of the attribute to collect.
The pointer should be of the appropriate type (see the descriptions under GMarkupCollectType) and may be NULL in case a particular attribute is to be allowed but ignored.
This function deals with issuing errors for missing attributes (of type G_MARKUP_ERROR_MISSING_ATTRIBUTE), unknown attributes (of type G_MARKUP_ERROR_UNKNOWN_ATTRIBUTE) and duplicate attributes (of type G_MARKUP_ERROR_INVALID_CONTENT) as well as parse errors for boolean-valued attributes (again of type G_MARKUP_ERROR_INVALID_CONTENT). In all of these cases FALSE will be returned and error will be set as appropriate.
Parameters
element_name | the current tag name
attribute_names | the attribute names
attribute_values | the attribute values
error | a pointer to a GError, or NULL
first_type | the GMarkupCollectType of the first attribute
first_attr | the name of the first attribute
... | a pointer to the storage location of the first attribute (or NULL), followed by more types, names and pointers, ending with G_MARKUP_COLLECT_INVALID
Since 2.16
Types and Values
enum GMarkupError
Error codes returned by markup parsing.
Members
G_MARKUP_ERROR_BAD_UTF8 | text being parsed was not valid UTF-8
G_MARKUP_ERROR_EMPTY | document contained nothing, or only whitespace
G_MARKUP_ERROR_PARSE | document was ill-formed
G_MARKUP_ERROR_UNKNOWN_ELEMENT | error should be set by GMarkupParser functions; element wasn't known
G_MARKUP_ERROR_UNKNOWN_ATTRIBUTE | error should be set by GMarkupParser functions; attribute wasn't known
G_MARKUP_ERROR_INVALID_CONTENT | error should be set by GMarkupParser functions; content was invalid
G_MARKUP_ERROR_MISSING_ATTRIBUTE | error should be set by GMarkupParser functions; a required attribute was missing
G_MARKUP_ERROR
#define G_MARKUP_ERROR g_markup_error_quark ()
Error domain for markup parsing. Errors in this domain will be from the GMarkupError enumeration. See GError for information on error domains.
enum GMarkupParseFlags
Flags that affect the behaviour of the parser.
Members
G_MARKUP_DO_NOT_USE_THIS_UNSUPPORTED_FLAG | flag you should not use
G_MARKUP_TREAT_CDATA_AS_TEXT | When this flag is set, CDATA marked sections are not passed literally to the passthrough function of the parser. Instead, the content of the section (without the <![CDATA[ and ]]>) is passed to the text function.
G_MARKUP_PREFIX_ERROR_POSITION | Normally errors caught by GMarkup itself have line/column information prefixed to them to let the caller know the location of the error. When this flag is set the location information is also prefixed to errors generated by the GMarkupParser implementation functions.
G_MARKUP_IGNORE_QUALIFIED | Ignore (don't report) qualified attributes and tags, along with their contents. A qualified attribute or tag is one that contains ':' in its name (i.e. is in another namespace). Since: 2.40.
GMarkupParseContext
typedef struct _GMarkupParseContext GMarkupParseContext;
A parse context is used to parse a stream of bytes that you expect to contain marked-up text.
See g_markup_parse_context_new(), GMarkupParser, and so on for more details.
struct GMarkupParser
    struct GMarkupParser {
      /* Called for open tags <foo bar="baz"> */
      void (*start_element)  (GMarkupParseContext *context,
                              const gchar         *element_name,
                              const gchar        **attribute_names,
                              const gchar        **attribute_values,
                              gpointer             user_data,
                              GError             **error);

      /* Called for close tags </foo> */
      void (*end_element)    (GMarkupParseContext *context,
                              const gchar         *element_name,
                              gpointer             user_data,
                              GError             **error);

      /* Called for character data */
      /* text is not nul-terminated */
      void (*text)           (GMarkupParseContext *context,
                              const gchar         *text,
                              gsize                text_len,
                              gpointer             user_data,
                              GError             **error);

      /* Called for strings that should be re-saved verbatim in this same
       * position, but are not otherwise interpretable.  At the moment
       * this includes comments and processing instructions.
       */
      /* text is not nul-terminated. */
      void (*passthrough)    (GMarkupParseContext *context,
                              const gchar         *passthrough_text,
                              gsize                text_len,
                              gpointer             user_data,
                              GError             **error);

      /* Called on error, including one set by other
       * methods in the vtable. The GError should not be freed.
       */
      void (*error)          (GMarkupParseContext *context,
                              GError              *error,
                              gpointer             user_data);
    };
Any of the fields in GMarkupParser can be NULL, in which case they will be ignored. Except for the error function, any of these callbacks can set an error; in particular the G_MARKUP_ERROR_UNKNOWN_ELEMENT, G_MARKUP_ERROR_UNKNOWN_ATTRIBUTE, and G_MARKUP_ERROR_INVALID_CONTENT errors are intended to be set from these callbacks. If you set an error from a callback, g_markup_parse_context_parse() will report that error back to its caller.
Members
start_element () | Callback to invoke when the opening tag of an element is seen.
end_element () | Callback to invoke when the closing tag of an element is seen. Note that this is also called for empty tags like <empty/>.
text () | Callback to invoke when some text is seen (text is always inside an element). Note that the text of an element may be spread over multiple calls of this function. If the G_MARKUP_TREAT_CDATA_AS_TEXT flag is set, this function is also called for the content of CDATA marked sections.
passthrough () | Callback to invoke for comments, processing instructions and doctype declarations; if you're re-writing the parsed document, write the passthrough text back out in the same position. If the G_MARKUP_TREAT_CDATA_AS_TEXT flag is not set, this function is also called for CDATA marked sections.
error () | Callback to invoke when an error occurs.
enum GMarkupCollectType
A mixed enumerated type and flags field. You must specify one type (string, strdup, boolean, tristate). Additionally, you may optionally bitwise OR the type with the flag G_MARKUP_COLLECT_OPTIONAL.
It is likely that this enum will be extended in the future to support other types.
Members
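G_MARKUP_COLLECT_INVALID | used to terminate the list of attributes to collect
G_MARKUP_COLLECT_STRING | collect the string pointer directly from the attribute_values[] array. Expects a parameter of type (const char **). If G_MARKUP_COLLECT_OPTIONAL is specified and the attribute isn't present then the pointer will be set to NULL
G_MARKUP_COLLECT_STRDUP | as with G_MARKUP_COLLECT_STRING, but expects a parameter of type (char **) and g_strdup()s the returned pointer. The pointer must be freed with g_free()
G_MARKUP_COLLECT_BOOLEAN | expects a parameter of type (gboolean *) and parses the attribute value as a boolean. Sets FALSE if the attribute isn't present. Valid boolean values consist of (case-insensitive) "false", "f", "no", "n", "0" and "true", "t", "yes", "y", "1"
G_MARKUP_COLLECT_TRISTATE | as with G_MARKUP_COLLECT_BOOLEAN, but in the case of a missing attribute a value is set that compares equal to neither FALSE nor TRUE. G_MARKUP_COLLECT_OPTIONAL is implied
G_MARKUP_COLLECT_OPTIONAL | can be bitwise ORed with the other fields. If present, allows the attribute not to appear. A default value is set depending on what value type is used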