Function: sgml-xml-auto-coding-function

sgml-xml-auto-coding-function is a byte-compiled function defined in mule.el.gz.

Signature

(sgml-xml-auto-coding-function SIZE)
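According to the documentation of auto-coding-functions, SIZE is the number of characters, starting at point, that the function may examine. Every search in the body is bounded by point plus SIZE. The sketch below assumes a running Emacs, where mule.el is preloaded. It uses a hypothetical helper, my-xml-coding-within, to show that a SIZE too small to reach the end of the declaration yields nil.

```elisp
(defun my-xml-coding-within (text size)
  "Hypothetical helper: look for an XML declaration in the first SIZE
characters of TEXT using `sgml-xml-auto-coding-function'."
  (with-temp-buffer
    ;; Pin the buffer's coding system so the result does not depend
    ;; on the user's default `buffer-file-coding-system'.
    (setq buffer-file-coding-system 'utf-8-unix)
    (insert text)
    (goto-char (point-min))
    (sgml-xml-auto-coding-function size)))

(let ((decl "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n<root/>\n"))
  (list
   ;; SIZE covers the whole declaration: the encoding tag is honored.
   (my-xml-coding-within decl (length decl))
   ;; SIZE stops inside the declaration: "<?xml" is found, but the
   ;; closing "?>" is out of range, so the result is nil.
   (my-xml-coding-within decl 10)))
;; => (iso-8859-1 nil)
```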

Documentation

Determine whether the buffer is XML, and if so, its encoding.
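The XML check is strict about position. The regexp is anchored at the start of the buffer, so the declaration must come first, preceded only by whitespace. Here is a minimal sketch using a hypothetical helper, my-xml-detect. The expected results follow from the source code. Note that a pure-ASCII declaration without an encoding tag falls back to utf-8.

```elisp
(defun my-xml-detect (text)
  "Hypothetical helper: call `sgml-xml-auto-coding-function' on TEXT,
scanning the whole buffer from its beginning."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (sgml-xml-auto-coding-function (buffer-size))))

;; Leading whitespace before the declaration is allowed; with no
;; encoding tag and pure ASCII content the result is utf-8.
(my-xml-detect "  \n<?xml version=\"1.0\"?>\n<a/>\n")      ; => utf-8
;; No XML declaration at all: nil.
(my-xml-detect "<html><body/></html>\n")                  ; => nil
;; A comment before the declaration defeats the anchor: nil.
(my-xml-detect "<!-- c -->\n<?xml version=\"1.0\"?>\n")   ; => nil
```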

This function is intended to be added to auto-coding-functions.
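Here is a minimal sketch of registering the function, for example from an init file. The function may already be present in the default value of auto-coding-functions. That is why add-to-list is used: it leaves the list unchanged when the entry is already there.

```elisp
;; `auto-coding-functions' is defined in mule.el, which is preloaded.
;; `add-to-list' does nothing if the function is already a member.
(add-to-list 'auto-coding-functions #'sgml-xml-auto-coding-function)

;; Non-nil once the function is registered.
(memq #'sgml-xml-auto-coding-function auto-coding-functions)
```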

Source Code

;; Defined in /usr/src/emacs/lisp/international/mule.el.gz
;;; Built-in auto-coding-functions:

(defun sgml-xml-auto-coding-function (size)
  "Determine whether the buffer is XML, and if so, its encoding.
This function is intended to be added to `auto-coding-functions'."
  (setq size (+ (point) size))
  (when (re-search-forward "\\`[[:space:]\n]*<\\?xml" size t)
    (let ((end (save-excursion
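		 ;; This is a hack: treat the first quote followed by
		 ;; optional whitespace and "?>" as the end of the
		 ;; XML declaration.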
		 (re-search-forward "[\"']\\s-*\\?>" size t))))
      (when end
	(if (re-search-forward "encoding=[\"']\\(.+?\\)[\"']" end t)
	    (let* ((match (match-string 1))
                   (sym-name (downcase match))
                   (sym-name
                    ;; https://www.w3.org/TR/xml/#charencoding says:
                    ;; "Entities encoded in UTF-16 MUST [...] begin
                    ;; with the Byte Order Mark."  The trick below is
                    ;; based on the fact that utf-16be/le don't
                    ;; specify BOM, while utf-16-be/le do.
                    (cond
                     ((equal sym-name "utf-16le") "utf-16-le")
                     ((equal sym-name "utf-16be") "utf-16-be")
                     (t sym-name)))
		   (sym (intern sym-name)))
	      (if (coding-system-p sym)
                  ;; If the encoding tag is UTF-8 and the buffer's
                  ;; encoding is one of the variants of UTF-8, use the
                  ;; buffer's encoding.  This allows, e.g., saving an
                  ;; XML file as UTF-8 with BOM when the tag says UTF-8.
                  (let ((sym-type (coding-system-type sym))
                        (bfcs-type
                         (coding-system-type buffer-file-coding-system)))
                    ;; If the buffer is unibyte, its encoding is
                    ;; immaterial (it is just the default value of
                    ;; buffer-file-coding-system), so we ignore it.
                    ;; This situation happens when this function is
                    ;; called as part of visiting a file, as opposed
                    ;; to when saving a buffer to a file.
                    (if (and enable-multibyte-characters
                             ;; 'charset' and 'iso-2022' will signal
                             ;; an error in coding-system-equal, since
                             ;; they aren't coding-systems.  So test
                             ;; that up front.
                             (not (equal sym-type 'charset))
                             (not (equal sym-type 'iso-2022))
                             (coding-system-equal 'utf-8 sym-type)
                             (coding-system-equal 'utf-8 bfcs-type))
                        buffer-file-coding-system
		      sym))
		(message "Warning: unknown coding system \"%s\"" match)
		nil))
          ;; Files without an encoding tag should be UTF-8. But users
          ;; may be naive about encodings, and have saved the file from
          ;; another editor that does not help them get the encoding right.
          ;; Detect the encoding and warn the user if it is detected as
          ;; something other than UTF-8.
	  (let ((detected
                 (with-coding-priority '(utf-8)
                   (coding-system-base
                    (detect-coding-region (point-min) size t)))))
            ;; Pure ASCII always comes back as undecided.
            (if (memq detected
                      '(utf-8 utf-8-with-signature utf-8-hfs undecided))
                'utf-8
              (warn "File contents detected as %s.
  Consider adding an encoding attribute to the xml declaration,
  or saving as utf-8, as mandated by the xml specification." detected)
              detected)))))))
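The post-processing of the encoding tag can be exercised directly. This sketch uses a hypothetical helper, my-xml-coding, that lets the caller choose the buffer's coding system and whether the buffer is unibyte. The expected results follow the comments in the source: utf-16le is mapped to utf-16-le, and a UTF-8 tag keeps a UTF-8-variant buffer coding system only in a multibyte buffer.

```elisp
(defun my-xml-coding (text &optional buffer-coding unibyte)
  "Hypothetical helper: run `sgml-xml-auto-coding-function' on TEXT.
BUFFER-CODING (default `utf-8-unix') becomes `buffer-file-coding-system'.
If UNIBYTE is non-nil, the buffer is made unibyte first."
  (with-temp-buffer
    (when unibyte
      (set-buffer-multibyte nil))
    (setq buffer-file-coding-system (or buffer-coding 'utf-8-unix))
    (insert text)
    (goto-char (point-min))
    (sgml-xml-auto-coding-function (buffer-size))))

;; "UTF-16LE" is mapped to `utf-16-le', the variant that expects a BOM.
(my-xml-coding "<?xml version=\"1.0\" encoding=\"UTF-16LE\"?>\n")
;; => utf-16-le

;; UTF-8 tag, multibyte buffer, UTF-8-variant coding system:
;; the buffer's coding system (UTF-8 with BOM) is kept.
(my-xml-coding "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
               'utf-8-with-signature)
;; => utf-8-with-signature

;; Same tag in a unibyte buffer: the buffer's coding system is ignored.
(my-xml-coding "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
               'utf-8-with-signature t)
;; => utf-8
```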