File: xmltok.el

This implements an XML 1.0 parser. It also implements the XML Namespaces Recommendation. It is designed to be conforming, but it works somewhat differently from a typical XML parser. An XML document consists of a prolog followed by an instance. The prolog is parsed as a single unit using xmltok-forward-prolog. The instance is treated as a sequence of tokens, where a token is something like a start-tag, a comment, a chunk of data, or a CDATA section.

The tokenization of the instance is stateless: how one part of the instance is tokenized does not depend on how the preceding part was tokenized. This allows the instance to be parsed incrementally. The main entry point is xmltok-forward, which can be called at any point in the instance, provided that point lies between tokens.
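The stateless design can be illustrated with a toy tokenizer. This is a hypothetical Python sketch, not xmltok's code: `next_token`, `tokens_from`, and the `(type, start, end)` tuples are invented here, and only a small subset of XML is recognized (no processing instructions, no references, no `<` or `>` inside attribute values). Because each call looks only at the text and a start position, scanning can resume at any token boundary; errors are appended to a list instead of aborting the scan, loosely in the spirit of `xmltok-errors`.

```python
import re
from typing import List, Optional, Tuple

# A token is (type, start, end) and an error is (message, start, end).
# Both shapes are invented for this sketch; they are not xmltok's data structures.
Token = Tuple[str, int, int]
Error = Tuple[str, int, int]

# Toy tag pattern: no '<' or '>' allowed inside a tag, not even in attribute values.
_TAG = re.compile(r"<(/?)([A-Za-z_][\w.:-]*)[^<>]*?(/?)>")


def next_token(text: str, pos: int, errors: List[Error]) -> Optional[Token]:
    """Scan one token at POS, looking only at TEXT and POS (no parser state)."""
    if pos >= len(text):
        return None
    for opener, closer, kind in (("<!--", "-->", "comment"),
                                 ("<![CDATA[", "]]>", "cdata-section")):
        if text.startswith(opener, pos):
            end = text.find(closer, pos + len(opener))
            if end < 0:
                # Record the error and treat the rest of the text as this token.
                errors.append((f"Unclosed {kind}", pos, len(text)))
                return (kind, pos, len(text))
            return (kind, pos, end + len(closer))
    if text[pos] == "<":
        m = _TAG.match(text, pos)
        if m is None:
            # Recover: report the stray '<' and treat it as one character of data.
            errors.append(("Invalid markup", pos, pos + 1))
            return ("data", pos, pos + 1)
        if m.group(1):
            kind = "end-tag"
        elif m.group(3):
            kind = "empty-element"
        else:
            kind = "start-tag"
        return (kind, pos, m.end())
    end = text.find("<", pos)
    return ("data", pos, len(text) if end < 0 else end)


def tokens_from(text: str, pos: int = 0) -> Tuple[List[Token], List[Error]]:
    """Tokenize from POS to the end of TEXT; POS must be a token boundary."""
    tokens: List[Token] = []
    errors: List[Error] = []
    while (tok := next_token(text, pos, errors)) is not None:
        tokens.append(tok)
        pos = tok[2]
    return tokens, errors
```

Restarting at offset 19 of `'<a x="1"><!-- c --><b/>text</a>'` yields exactly the tail of the full token list, which is the property that makes incremental re-parsing after an edit cheap.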

This is a non-validating XML 1.0 processor. It does not resolve parameter entities (including the external DTD subset) and it does not resolve external general entities.

It is non-conformant by design in the following respects.

1. It expects the client to detect aspects of well-formedness that are not internal to a single token, specifically checking that end-tags match start-tags and that the instance contains exactly one element.

2. It expects the client to detect duplicate attributes. Detection of duplicate attributes after expansion of namespace prefixes requires the namespace processing state. Detection before expansion does not, but it is redundant, because the client will detect duplicate attributes after expansion anyway.

3. It allows the client to recover from well-formedness errors. This is essential for applications that parse the document while it is being edited.

4. It does not support documents that do not conform to the lexical requirements of the XML Namespaces Recommendation (e.g. a document with a colon in an entity name).
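The checks that items 1 and 2 leave to the client can be sketched roughly as follows. This is a hypothetical Python illustration operating on an invented, simplified event stream (it is not how xmltok represents tokens, and `check_instance` and `check_attributes` are made-up names). It matches end-tags against a stack, counts top-level elements, and detects duplicate attributes only after prefixes are expanded to namespace URIs, which is why that check needs namespace state: `a:id` and `b:id` look distinct before expansion but collide when both prefixes are bound to the same URI.

```python
from typing import Dict, List, Optional, Set, Tuple

# Invented, simplified event shapes (not xmltok's token representation):
#   ("start", qname, [(att_qname, value), ...])
#   ("empty", qname, [(att_qname, value), ...])
#   ("end", qname)
Attr = Tuple[str, str]


def check_attributes(atts: List[Attr], scope: Dict[str, str], errors: List[str]) -> None:
    """Report attributes that collide after prefixes are expanded to namespace URIs."""
    seen: Set[Tuple[Optional[str], str]] = set()
    for qname, _value in atts:
        if qname == "xmlns" or qname.startswith("xmlns:"):
            continue  # namespace declarations are handled separately
        prefix, _, local = qname.rpartition(":")
        uri: Optional[str] = None  # unprefixed attributes are in no namespace
        if prefix:
            if prefix not in scope:
                errors.append(f"undeclared prefix {prefix!r}")
                continue
            uri = scope[prefix]
        if (uri, local) in seen:
            errors.append(f"duplicate attribute {{{uri}}}{local}")
        seen.add((uri, local))


def check_instance(events: list) -> List[str]:
    """Client-side checks: end-tag matching, one root element, duplicate attributes."""
    errors: List[str] = []
    open_elements: List[str] = []
    # The xml prefix is always bound, per the Namespaces Recommendation.
    scopes: List[Dict[str, str]] = [{"xml": "http://www.w3.org/XML/1998/namespace"}]
    roots = 0
    for event in events:
        kind, name = event[0], event[1]
        if kind in ("start", "empty"):
            if not open_elements:
                roots += 1
            scope = dict(scopes[-1])  # inherit bindings from the parent element
            for q, v in event[2]:
                if q.startswith("xmlns:"):
                    scope[q[len("xmlns:"):]] = v
            check_attributes(event[2], scope, errors)
            if kind == "start":
                open_elements.append(name)
                scopes.append(scope)
        elif kind == "end":
            if open_elements and open_elements[-1] == name:
                open_elements.pop()
                scopes.pop()
            else:
                errors.append(f"mismatched end-tag {name!r}")
    if open_elements:
        errors.append(f"unclosed element {open_elements[-1]!r}")
    if roots != 1:
        errors.append(f"expected exactly one root element, found {roots}")
    return errors
```

Keeping these checks out of the tokenizer is what lets each token be scanned without knowing what came before it.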

There are also a number of things that have not yet been implemented that make it non-conformant.

1. It does not implement default attributes. ATTLIST declarations are parsed, but no checking is done on the content of attribute value literals specifying default attribute values, and default attribute values are not reported to the client.

2. It does not implement internal entities containing elements. If an internal entity is referenced and parsing its replacement text yields one or more tags, then it will skip the reference and report this to the client.

3. It does not check the syntax of public identifiers in the DTD.

4. It allows some non-ASCII characters in certain situations where it should not. For example, it enforces XML 1.0's restrictions on name characters strictly only for ASCII characters. The problem is that XML's character model is based squarely on Unicode, whereas Emacs's is not (as of version 21). It is not clear what the right thing to do is.
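A minimal sketch of the behaviour described in item 4, in Python with hypothetical function names (it approximates the described policy and is not xmltok's code): ASCII characters are checked exactly against XML 1.0's name rules, while every non-ASCII character is accepted unchecked. As a result a character such as U+00D7 (×), which XML 1.0 excludes from names, slips through.

```python
def is_name_start_char(ch: str) -> bool:
    """Strict for ASCII, permissive for everything else (assumed policy)."""
    if ord(ch) < 128:
        return ch.isalpha() or ch in "_:"
    return True  # not checked against XML 1.0's Unicode character tables


def is_name_char(ch: str) -> bool:
    """Name characters after the first: ASCII letters, digits, '.', '-', '_', ':'."""
    if ord(ch) < 128:
        return ch.isalnum() or ch in "._-:"
    return True  # same permissive treatment of non-ASCII characters


def is_name(s: str) -> bool:
    return bool(s) and is_name_start_char(s[0]) and all(is_name_char(c) for c in s[1:])
```

A fully conforming check would replace the two `return True` branches with lookups in the Unicode ranges listed in the XML 1.0 specification.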

Defined variables (6)

xmltok-attributes: List containing attributes of last scanned element.
xmltok-dtd: Information about the DTD used by ‘xmltok-forward’.
xmltok-errors: List of errors detected by ‘xmltok-forward’ and ‘xmltok-forward-prolog’.
xmltok-namespace-attributes: List containing namespace declarations of last scanned element.
xmltok-replacement: String containing replacement for a character or entity reference.
xmltok-standalone: Non-nil if there was an XML declaration specifying standalone="yes".
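The entries of `xmltok-attributes` are built by `xmltok-make-attribute`, whose parameters (listed among the functions) suggest a record of buffer positions rather than copied strings. The following Python dataclass is a hypothetical model of that idea only: field names mirror those parameters, but the real Emacs Lisp layout is not described here, positions are 0-based string offsets, and value positions are assumed to exclude the quotes. Prefix and local name are derived from the colon position, analogous to `xmltok-attribute-prefix` and `xmltok-attribute-local-name`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attribute:
    # Field names mirror xmltok-make-attribute's parameters; the layout is assumed.
    name_begin: int
    name_colon: Optional[int]  # offset of ':' in a prefixed name, else None
    name_end: int
    value_begin: Optional[int] = None  # assumed to point just inside the quote
    value_end: Optional[int] = None
    raw_normalized_value: Optional[str] = None

    def prefix(self, text: str) -> Optional[str]:
        """Text before the colon, or None for an unprefixed name."""
        if self.name_colon is None:
            return None
        return text[self.name_begin:self.name_colon]

    def local_name(self, text: str) -> str:
        """Text after the colon, or the whole name if there is no colon."""
        start = self.name_begin if self.name_colon is None else self.name_colon + 1
        return text[start:self.name_end]

    def raw_value(self, text: str) -> Optional[str]:
        """Unnormalized value text between the recorded positions."""
        if self.value_begin is None or self.value_end is None:
            return None
        return text[self.value_begin:self.value_end]
```

Storing positions keeps scanning cheap: strings are only extracted when the client actually asks for a name or value.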

Defined functions (36)

xmltok-add-attribute()
xmltok-add-error(MESSAGE &optional START END)
xmltok-add-prolog-region(TYPE START END)
xmltok-append-entity-def(D1 D2)
xmltok-attribute-local-name(ATT)
xmltok-attribute-name-colon(ATT)
xmltok-attribute-name-end(ATT)
xmltok-attribute-name-start(ATT)
xmltok-attribute-prefix(ATT)
xmltok-attribute-raw-normalized-value(ATT)
xmltok-attribute-refs(ATT)
xmltok-attribute-value(ATT)
xmltok-attribute-value-end(ATT)
xmltok-attribute-value-start(ATT)
xmltok-char-number(START END)
xmltok-define-entity(NAME VALUE)
xmltok-error-end(ERR)
xmltok-error-message(ERR)
xmltok-error-start(ERR)
xmltok-forward-prolog()
xmltok-get-declared-encoding-position(&optional LIMIT)
xmltok-handle-entity(START END &optional ATTRIBUTEP)
xmltok-handle-nested-entity(START END)
xmltok-make-attribute(NAME-BEGIN NAME-COLON NAME-END &optional VALUE-BEGIN VALUE-END RAW-NORMALIZED-VALUE)
xmltok-make-error(MESSAGE START END)
xmltok-merge-attributes()
xmltok-normalize-attribute(ATT)
xmltok-parse-entity(NAME-DEF)
xmltok-prolog-region-type(REQUIRED)
xmltok-require-next-token(&rest TYPES)
xmltok-require-token(&rest TYPES)
xmltok-save(&rest BODY)
xmltok-scan-after-amp(ENTITY-HANDLER)
xmltok-scan-char-ref(START END BASE)
xmltok-unicode-to-char(ARGUMENT)
xmltok-valid-char-p(N)
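The list includes `xmltok-scan-char-ref` and `xmltok-valid-char-p`, but their bodies are not shown here. The following is an independent Python sketch of what decoding a character reference involves: parse a decimal or `x`-prefixed hexadecimal number, check it against the XML 1.0 `Char` production, and record an error instead of failing. The function names and the `(message, start, end)` error tuple are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# '&#' followed by either lowercase 'x' and hex digits, or decimal digits, then ';'.
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def valid_char_p(n: int) -> bool:
    """XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]."""
    return (n in (0x9, 0xA, 0xD)
            or 0x20 <= n <= 0xD7FF
            or 0xE000 <= n <= 0xFFFD
            or 0x10000 <= n <= 0x10FFFF)


def scan_char_ref(text: str, start: int,
                  errors: List[Tuple[str, int, int]]) -> Tuple[Optional[str], int]:
    """Decode the character reference at START; return (replacement or None, end)."""
    m = _CHAR_REF.match(text, start)
    if m is None:
        errors.append(("Invalid character reference", start, start + 2))
        return None, start + 2
    n = int(m.group(1), 16) if m.group(1) else int(m.group(2))
    if not valid_char_p(n):
        errors.append(("Reference to invalid character", start, m.end()))
        return None, m.end()
    return chr(n), m.end()
```

The regex rejects forms such as `&#+65;` or `&#6_5;` that Python's `int()` would otherwise accept, so only the syntax allowed by XML 1.0 is decoded.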

Defined faces (0)