Function: xml-find-file-coding-system

xml-find-file-coding-system is a byte-compiled function defined in mule.el.gz.

Signature

(xml-find-file-coding-system ARGS)

Documentation

Determine the coding system of an XML file without a declaration.

Strictly speaking, the file should be utf-8, but mistakes are made, and there are genuine cases where XML fragments are saved, with the encoding properly specified in a master document, or added by processing software.

Source Code

;; Defined in /usr/src/emacs/lisp/international/mule.el.gz
(defun xml-find-file-coding-system (args)
  "Determine the coding system of an XML file without a declaration.
Strictly speaking, the file should be utf-8, but mistakes are
made, and there are genuine cases where XML fragments are saved,
with the encoding properly specified in a master document, or
added by processing software."
  (if (eq (car args) 'insert-file-contents)
      (let ((detected
             (with-coding-priority '(utf-8)
               (coding-system-base
                (detect-coding-region (point-min) (point-max) t))))
            (bom (list (char-after 1) (char-after 2))))
        (cond
         ((equal bom '(#xFE #xFF))
          'utf-16be-with-signature)
         ((equal bom '(#xFF #xFE))
          'utf-16le-with-signature)
         ;; Pure ASCII always comes back as undecided.
         ((memq detected '(utf-8 undecided))
          'utf-8)
         ((eq detected 'utf-16le-with-signature) 'utf-16le-with-signature)
         ((eq detected 'utf-16be-with-signature) 'utf-16be-with-signature)
         (t
          (warn "File contents detected as %s.
  Consider adding an xml declaration with the encoding specified,
  or saving as utf-8, as mandated by the xml specification." detected)
          detected)))
    ;; Don't interfere with the user's wishes for saving the buffer.
    ;; We did what we could when the buffer was created to ensure the
    ;; correct encoding was used, or the user was warned, so any
    ;; non-conformity here is deliberate on the part of the user.
    'undecided))
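
Example

The branches above can be exercised by hand. This is a minimal sketch, not part of mule.el: the helper my/xml-coding-for-bytes and the file name "example.xml" are hypothetical, introduced only for illustration. It assumes the function sees the raw, undecoded bytes of the file in the current buffer, so the bytes are placed in a unibyte temporary buffer. In normal use Emacs calls the function itself while visiting a file, presumably through an entry in file-coding-system-alist.

```elisp
;; Hypothetical helper, not part of Emacs: run the function on raw bytes.
(defun my/xml-coding-for-bytes (bytes &optional operation)
  "Return what `xml-find-file-coding-system' picks for unibyte string BYTES.
OPERATION defaults to `insert-file-contents'."
  (with-temp-buffer
    ;; Keep the buffer unibyte so `char-after' returns the raw byte values.
    (set-buffer-multibyte nil)
    (insert bytes)
    ;; ARGS is (OPERATION . REST); only the car is inspected by the function.
    (xml-find-file-coding-system
     (list (or operation 'insert-file-contents) "example.xml"))))

;; Big-endian byte order mark, FE FF
(my/xml-coding-for-bytes (unibyte-string #xFE #xFF 0 ?<))
;; => utf-16be-with-signature

;; Little-endian byte order mark, FF FE
(my/xml-coding-for-bytes (unibyte-string #xFF #xFE ?< 0))
;; => utf-16le-with-signature

;; Any operation other than insert-file-contents, e.g. saving
(my/xml-coding-for-bytes "<root/>" 'write-region)
;; => undecided
```

The byte order mark checks run before the detection result is consulted, so a file that starts with FE FF or FF FE is classified as UTF-16 regardless of what detect-coding-region reports.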