File: xmltok.el.html
This implements an XML 1.0 parser. It also implements the XML
Namespaces Recommendation. It is designed to be conforming, but it
works a bit differently from a normal XML parser. An XML document
consists of a prolog and an instance. The prolog is parsed as a
single unit using xmltok-forward-prolog. The instance is
considered as a sequence of tokens, where a token is something like
a start-tag, a comment, a chunk of data or a CDATA section. The
tokenization of the instance is stateless: the tokenization of one
part of the instance does not depend on tokenization of the
preceding part of the instance. This allows the instance to be
parsed incrementally. The main entry point is xmltok-forward:
this can be called at any point in the instance provided it is
between tokens.
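The stateless property described above can be illustrated with a minimal sketch. It is written in Python rather than Emacs Lisp, and it does not reproduce xmltok's actual token grammar or API: the `TOKEN_RE` pattern and the `forward` and `tokenize` names are hypothetical and deliberately simplified. The point is only that each token is recognized from the text at the current position, so scanning may begin at any token boundary.

```python
import re

# Hypothetical, heavily simplified token grammar (NOT xmltok's real rules).
TOKEN_RE = re.compile(
    r"(?P<comment><!--.*?-->)"
    r"|(?P<cdata><!\[CDATA\[.*?\]\]>)"
    r"|(?P<end_tag></[^<>]+>)"
    r"|(?P<start_tag><[^<>/!?][^<>]*>)"
    r"|(?P<data>[^<]+)",
    re.DOTALL,
)


def forward(text, pos):
    """Scan one token starting at pos.

    Returns (token_type, end_position), or None at end of input.
    No state is carried over from earlier calls.
    """
    if pos >= len(text):
        return None
    match = TOKEN_RE.match(text, pos)
    if match is None:
        # Recover from a stray "<" by treating it as a one-character token.
        return ("not_well_formed", pos + 1)
    return (match.lastgroup, match.end())


def tokenize(text, pos=0):
    """Collect (token_type, start, end) triples from pos to the end of text."""
    tokens = []
    while True:
        result = forward(text, pos)
        if result is None:
            return tokens
        kind, end = result
        tokens.append((kind, pos, end))
        pos = end
```

Because `forward` depends only on `text` and `pos`, tokenizing from any token boundary yields the same tokens as the corresponding tail of a full scan. An editor can therefore rescan just the region after an edit instead of the whole instance.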
This is a non-validating XML 1.0 processor. It does not resolve parameter entities (including the external DTD subset) and it does not resolve external general entities.
It is non-conformant by design in the following respects.
1. It expects the client to detect aspects of well-formedness that
are not internal to a single token, specifically checking that
end-tags match start-tags and that the instance contains exactly
one element.
2. It expects the client to detect duplicate attributes. Detection
of duplicate attributes after expansion of namespace prefixes
requires the namespace processing state. Detection of duplicate
attributes before expansion of namespace prefixes does not require
that state, but it is redundant because the client already detects
duplicate attributes after expansion of namespace prefixes.
3. It allows the client to recover from well-formedness errors.
This is essential for use in applications where the document is
being parsed during the editing process.
4. It does not support documents that do not conform to the lexical
requirements of the XML Namespaces Recommendation (e.g. a document
with a colon in an entity name).
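The client-side checks in points 1 and 2 can be sketched as follows. This is an illustrative Python sketch, not xmltok's interface: the token tuples `("start", qname, attrs)` and `("end", qname)` and the function name `check_instance` are assumptions made for the example. It keeps a stack of open elements together with the namespace bindings in scope, which is the state a single token cannot supply.

```python
def check_instance(tokens):
    """Return a list of error strings for a sequence of simplified tokens.

    Hypothetical token shapes: ("start", qname, [(attr_qname, value), ...])
    and ("end", qname).
    """
    errors = []
    open_elements = []  # stack of (qname, prefix -> namespace URI bindings)
    root_count = 0
    for token in tokens:
        if token[0] == "start":
            _, qname, attrs = token
            if not open_elements:
                root_count += 1
            # Inherit the bindings in scope, then add this tag's declarations.
            bindings = dict(open_elements[-1][1]) if open_elements else {}
            for name, value in attrs:
                if name.startswith("xmlns:"):
                    bindings[name[len("xmlns:"):]] = value
            seen = set()
            for name, _value in attrs:
                if name == "xmlns" or name.startswith("xmlns:"):
                    continue  # namespace declarations are not ordinary attributes
                prefix, sep, local = name.partition(":")
                if not sep:
                    expanded = (None, name)  # unprefixed attribute: no namespace
                elif prefix in bindings:
                    expanded = (bindings[prefix], local)
                else:
                    errors.append(f"undeclared prefix in attribute {name}")
                    continue
                if expanded in seen:
                    errors.append(f"duplicate attribute {name} on <{qname}>")
                seen.add(expanded)
            open_elements.append((qname, bindings))
        elif token[0] == "end":
            _, qname = token
            if open_elements and open_elements[-1][0] == qname:
                open_elements.pop()
            else:
                errors.append(f"unexpected end-tag </{qname}>")
    for qname, _bindings in reversed(open_elements):
        errors.append(f"missing end-tag for <{qname}>")
    if root_count != 1:
        errors.append(f"expected exactly one root element, found {root_count}")
    return errors
```

For example, the attributes `a:id` and `b:id` look distinct before expansion, but if both prefixes are bound to the same namespace URI they expand to the same name, and only the check after expansion reports the duplicate.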
It is also non-conformant because the following features have not yet been implemented.
1. It does not implement default attributes. ATTLIST declarations
are parsed, but no checking is done on the content of attribute
value literals specifying default attribute values, and default
attribute values are not reported to the client.
2. It does not implement internal entities containing elements. If
an internal entity is referenced and parsing its replacement text
yields one or more tags, then the parser skips the reference and
reports this to the client.
3. It does not check the syntax of public identifiers in the DTD.
4. It allows some non-ASCII characters in certain situations where
it should not. For example, it only enforces XML 1.0's
restrictions on name characters strictly for ASCII characters. The
problem here is that XML's character model is based squarely on
Unicode, whereas Emacs's is not (as of version 21). It is not clear what
the right thing to do is.
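The leniency described in point 4 can be pictured with a short Python sketch. It is one possible reading of "strict for ASCII only", not a transcription of xmltok's regular expressions, and the name `is_name_lenient` is hypothetical. ASCII characters are checked against the XML 1.0 Name rules; every non-ASCII character is accepted without checking.

```python
import string

# ASCII characters that XML 1.0 allows at the start of a Name, and anywhere in it.
ASCII_NAME_START = set(string.ascii_letters + "_:")
ASCII_NAME_CHARS = ASCII_NAME_START | set(string.digits + "-.")


def is_name_lenient(s):
    """Check a candidate Name, enforcing the rules for ASCII characters only."""
    if not s:
        return False
    for i, ch in enumerate(s):
        if ord(ch) >= 0x80:
            continue  # non-ASCII: accepted unchecked (the leniency in question)
        allowed = ASCII_NAME_START if i == 0 else ASCII_NAME_CHARS
        if ch not in allowed:
            return False
    return True
```

Under this sketch `is_name_lenient("1item")` is false, as XML requires, but `is_name_lenient("a\u00d7b")` is true even though U+00D7 (the multiplication sign) is not a name character in XML 1.0.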
Defined variables (6)
xmltok-attributes | List containing attributes of last scanned element. |
xmltok-dtd | Information about the DTD used by ‘xmltok-forward’. |
xmltok-errors | List of errors detected by ‘xmltok-forward’ and ‘xmltok-forward-prolog’. |
xmltok-namespace-attributes | List containing namespace declarations of last scanned element. |
xmltok-replacement | String containing replacement for a character or entity reference. |
xmltok-standalone | Non-nil if there was an XML declaration specifying standalone="yes". |