Variable: htmlize-convert-nonascii-to-entities

htmlize-convert-nonascii-to-entities is a customizable variable defined in htmlize.el.

Value

t

Documentation

Whether non-ASCII characters should be converted to HTML entities.

When this is non-nil, characters with codes in the 128-255 range will be considered Latin 1 and rewritten as "&#CODE;". Characters with codes above 255 will be converted to "&#UCS;", where UCS denotes the Unicode code point of the character. If the code point cannot be determined, the character will be copied unchanged, as would be the case if the option were nil.
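
To make the conversion rule concrete, here is a minimal Emacs Lisp sketch. The helper `my-nonascii-to-entities` is hypothetical and is not htmlize's own code. The sketch assumes that "the code point cannot be determined" corresponds to `encode-char` returning nil for the `ucs` charset. Because Latin 1 and Unicode agree on codes 128-255, one rule covers both ranges. It handles only non-ASCII characters and does not escape `&`, `<` or `>`.

```elisp
;;; Hypothetical helper, for illustration only; not htmlize's own code.
(defun my-nonascii-to-entities (string)
  "Return STRING with each non-ASCII character replaced by \"&#UCS;\".
Characters whose Unicode code point cannot be determined are kept as-is."
  (mapconcat
   (lambda (char)
     ;; `encode-char' returns CHAR's code point in the `ucs' charset, or nil.
     (let ((ucs (and (> char 127) (encode-char char 'ucs))))
       (if ucs
           (format "&#%d;" ucs)
         ;; ASCII, or no known code point: copy the character unchanged.
         (char-to-string char))))
   string
   ""))

(my-nonascii-to-entities "café ©")
;; → "caf&#233; &#169;"
```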

When the option is nil, the non-ASCII characters are copied to HTML without modification. In that case, the web server and/or the browser must be set to understand the encoding that was used when saving the buffer. (You might also want to specify it by setting htmlize-html-charset.)
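
For example, an init file that keeps non-ASCII text unchanged and specifies the encoding might contain the following. The value "utf-8" is an assumption: it should name whichever encoding the HTML file is actually saved in.

```elisp
;; Copy non-ASCII characters unchanged instead of emitting entities.
(setq htmlize-convert-nonascii-to-entities nil)
;; Assumed charset; it must match the encoding used to save the HTML.
(setq htmlize-html-charset "utf-8")
```

Setting these with `setq` before htmlize is loaded is safe, because `defcustom` does not override a variable that already has a value.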

Note that in an HTML entity "&#CODE;", CODE is always a UCS code point, which has nothing to do with the charset the page is in. For example, "&#169;" *always* refers to the copyright symbol, regardless of the charset specified by the META tag or the charset sent by the HTTP server. In other words, "&#169;" is exactly equivalent to "&copy;".
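
This independence can be checked directly in Emacs. The sketch assumes a multibyte Emacs (version 23 or later), where `encode-char` with the `ucs` charset yields the Unicode code point. It uses `encode-coding-string` to show the bytes the same character occupies on disk under two different encodings.

```elisp
;; The entity number is the Unicode code point, whatever the page encoding.
(format "&#%d;" (encode-char ?© 'ucs))
;; → "&#169;"

;; The bytes written to disk, by contrast, depend on the encoding.
(string-to-list (encode-coding-string "©" 'utf-8))
;; → (194 169)
(string-to-list (encode-coding-string "©" 'iso-latin-1))
;; → (169)
```

The copyright sign takes two bytes in UTF-8 and one in Latin 1, yet "&#169;" names it in both cases.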

For most people htmlize will work fine with this option left at the default setting; don't change it unless you know what you're doing.

Source Code

;; Defined in ~/.emacs.d/elpa/htmlize-20250724.1703/htmlize.el
(defcustom htmlize-convert-nonascii-to-entities t
  "Whether non-ASCII characters should be converted to HTML entities.

When this is non-nil, characters with codes in the 128-255 range will be
considered Latin 1 and rewritten as \"&#CODE;\".  Characters with codes
above 255 will be converted to \"&#UCS;\", where UCS denotes the Unicode
code point of the character.  If the code point cannot be determined,
the character will be copied unchanged, as would be the case if the
option were nil.

When the option is nil, the non-ASCII characters are copied to HTML
without modification.  In that case, the web server and/or the browser
must be set to understand the encoding that was used when saving the
buffer.  (You might also want to specify it by setting
`htmlize-html-charset'.)

Note that in an HTML entity \"&#CODE;\", CODE is always a UCS code point,
which has nothing to do with the charset the page is in.  For example,
\"&#169;\" *always* refers to the copyright symbol, regardless of charset
specified by the META tag or the charset sent by the HTTP server.  In
other words, \"&#169;\" is exactly equivalent to \"&copy;\".

For most people htmlize will work fine with this option left at the
default setting; don't change it unless you know what you're doing."
  :type 'sexp
  :group 'htmlize)