Function: sgml-xml-auto-coding-function
sgml-xml-auto-coding-function is a byte-compiled function defined in
mule.el.gz.
Signature
(sgml-xml-auto-coding-function SIZE)
Documentation
Determine whether the buffer is XML, and if so, its encoding.
This function is intended to be added to auto-coding-functions.
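As a hedged illustration of the sentence above: each function on auto-coding-functions is called with point at the start of the text to examine and a single argument, SIZE, the number of characters to look at, and it returns a coding system or nil. In a stock Emacs this function is normally already on that list, so the sketch below is only needed if the list has been customized.

```elisp
;; Ensure the XML detector is on the list.  `add-to-list' does nothing
;; if the function is already present, which is the usual case.
(add-to-list 'auto-coding-functions #'sgml-xml-auto-coding-function)
```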
Source Code
;; Defined in /usr/src/emacs/lisp/international/mule.el.gz
;;; Built-in auto-coding-functions:
(defun sgml-xml-auto-coding-function (size)
  "Determine whether the buffer is XML, and if so, its encoding.
This function is intended to be added to `auto-coding-functions'."
  (setq size (+ (point) size))
  (when (re-search-forward "\\`[[:space:]\n]*<\\?xml" size t)
    (let ((end (save-excursion
                 ;; This is a hack.
                 (re-search-forward "[\"']\\s-*\\?>" size t))))
      (when end
        (if (re-search-forward "encoding=[\"']\\(.+?\\)[\"']" end t)
            (let* ((match (match-string 1))
                   (sym-name (downcase match))
                   (sym-name
                    ;; https://www.w3.org/TR/xml/#charencoding says:
                    ;; "Entities encoded in UTF-16 MUST [...] begin
                    ;; with the Byte Order Mark."  The trick below is
                    ;; based on the fact that utf-16be/le don't
                    ;; specify BOM, while utf-16-be/le do.
                    (cond
                     ((equal sym-name "utf-16le") "utf-16-le")
                     ((equal sym-name "utf-16be") "utf-16-be")
                     (t sym-name)))
                   (sym (intern sym-name)))
              (if (coding-system-p sym)
                  ;; If the encoding tag is UTF-8 and the buffer's
                  ;; encoding is one of the variants of UTF-8, use the
                  ;; buffer's encoding.  This allows, e.g., saving an
                  ;; XML file as UTF-8 with BOM when the tag says UTF-8.
                  (let ((sym-type (coding-system-type sym))
                        (bfcs-type
                         (coding-system-type buffer-file-coding-system)))
                    ;; If the buffer is unibyte, its encoding is
                    ;; immaterial (it is just the default value of
                    ;; buffer-file-coding-system), so we ignore it.
                    ;; This situation happens when this function is
                    ;; called as part of visiting a file, as opposed
                    ;; to when saving a buffer to a file.
                    (if (and enable-multibyte-characters
                             ;; 'charset' and 'iso-2022' will signal
                             ;; an error in coding-system-equal, since
                             ;; they aren't coding-systems.  So test
                             ;; that up front.
                             (not (equal sym-type 'charset))
                             (not (equal sym-type 'iso-2022))
                             (coding-system-equal 'utf-8 sym-type)
                             (coding-system-equal 'utf-8 bfcs-type))
                        buffer-file-coding-system
                      sym))
                (message "Warning: unknown coding system \"%s\"" match)
                nil))
          ;; Files without an encoding tag should be UTF-8.  But users
          ;; may be naive about encodings, and have saved the file from
          ;; another editor that does not help them get the encoding right.
          ;; Detect the encoding and warn the user if it is detected as
          ;; something other than UTF-8.
          (let ((detected
                 (with-coding-priority '(utf-8)
                   (coding-system-base
                    (detect-coding-region (point-min) size t)))))
            ;; Pure ASCII always comes back as undecided.
            (if (memq detected
                      '(utf-8 utf-8-with-signature utf-8-hfs undecided))
                'utf-8
              (warn "File contents detected as %s.
Consider adding an encoding attribute to the xml declaration,
or saving as utf-8, as mandated by the xml specification." detected)
              detected)))))))
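
To see the control flow above in action, the following minimal sketch calls the function directly on a temporary buffer. The helper name `my/xml-declared-coding` is hypothetical and not part of Emacs; it follows the calling convention by leaving point at the start of the text and passing the number of characters to examine as SIZE.

```elisp
;; Hypothetical helper for illustration only; not part of mule.el.
(defun my/xml-declared-coding (text)
  "Return what `sgml-xml-auto-coding-function' reports for TEXT."
  (with-temp-buffer
    (insert text)
    ;; The function searches forward from point, so start at the top.
    (goto-char (point-min))
    ;; SIZE is the number of characters after point to examine.
    (sgml-xml-auto-coding-function (- (point-max) (point)))))

;; A declaration naming a non-UTF-8 coding system that Emacs knows.
(my/xml-declared-coding
 "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n<root/>\n")

;; No XML declaration at all: the outer `when' fails, so the result is nil.
(my/xml-declared-coding "<root/>\n")
```

With a declaration that names a known non-UTF-8 coding system, the first call should return the corresponding symbol (here `iso-8859-1`). An unrecognized name produces the "unknown coding system" message and a nil result.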