Function: define-charset
define-charset is a byte-compiled function defined in mule.el.gz.
Signature
(define-charset NAME DOCSTRING &rest PROPS)
Documentation
Define NAME (symbol) as a charset with DOCSTRING.
The remaining arguments must come in pairs ATTRIBUTE VALUE. ATTRIBUTE
may be any symbol. The following have special meanings, and one of
:code-offset, :map, :subset, :superset must be specified.
:short-name
VALUE must be a short string to identify the charset. If omitted, NAME is used.
:long-name
VALUE must be a string longer than :short-name to identify the charset. If omitted, the value of the :short-name attribute is used.
:dimension
VALUE must be an integer 0, 1, 2, or 3, specifying the dimension of the charset's code points. If omitted, it is calculated from the value of the :code-space attribute.
:code-space
VALUE must be a vector of length at most 8 specifying the byte code
range of each dimension in this format:
[ MIN-1 MAX-1 MIN-2 MAX-2 ... ]
where MIN-N is the minimum byte value of the Nth dimension of a code point
and MAX-N is the maximum byte value of that dimension.
:min-code
VALUE must be an integer specifying the minimum code point of the charset. If omitted, it is calculated from :code-space. VALUE may be a cons (HIGH . LOW), where HIGH is the most significant 16 bits of the code point and LOW is the least significant 16 bits.
:max-code
VALUE must be an integer specifying the maximum code point of the charset. If omitted, it is calculated from :code-space. VALUE may be a cons (HIGH . LOW), where HIGH is the most significant 16 bits of the code point and LOW is the least significant 16 bits.
:iso-final-char
VALUE must be a character in the range 32 to 127 (inclusive) specifying the final char of the charset for ISO-2022 encoding. If omitted, the charset can't be encoded by ISO-2022 based coding-systems.
:iso-revision-number
VALUE must be an integer in the range 0..63, specifying the revision number of the charset for ISO-2022 encoding.
:emacs-mule-id
VALUE must be an integer that is either 0 or in the range 129..255. If
omitted, the charset can't be encoded by coding-systems of type emacs-mule.
:ascii-compatible-p
VALUE must be nil or t (default nil). If VALUE is t, the charset is compatible with ASCII, i.e. the first 128 code points map to ASCII.
:supplementary-p
VALUE must be nil or t. If the VALUE is t, the charset is supplementary, which means it is used only as a parent or a subset of some other charset, or it is provided just for backward compatibility.
:invalid-code
VALUE must be a nonnegative integer that can be used as an invalid code point of the charset. If the minimum code is 0 and the maximum code is greater than Emacs's maximum integer value, :invalid-code should not be omitted.
:code-offset
VALUE must be an integer added to the index number of a character to get the corresponding character code.
:map
VALUE must be a vector or a string.
If it is a vector, the format is [ CODE-1 CHAR-1 CODE-2 CHAR-2 ... ], where CODE-n is a code point of the charset, and CHAR-n is the corresponding character code.
If it is a string, it is the name of a file that contains the above
information. Each line of the file must be in this format:
0xXXX 0xYYY
where XXX is a hexadecimal representation of CODE-n and YYY is a
hexadecimal representation of CHAR-n. A line starting with # is a
comment line.
:subset
VALUE must be a list:
( PARENT MIN-CODE MAX-CODE OFFSET )
PARENT is a parent charset. MIN-CODE and MAX-CODE specify the range
of characters inherited from the parent. OFFSET is an integer value
to add to a code point of the parent charset to get the corresponding
code point of this charset.
:superset
VALUE must be a list of parent charsets. The charset inherits
characters from them. Each element of the list may be a cons (PARENT
. OFFSET), where PARENT is a parent charset, and OFFSET is an offset
value to add to a code point of PARENT to get the corresponding code
point of this charset.
:unify-map
VALUE must be a vector or a string.
If it is a vector, the format is [ CODE-1 CHAR-1 CODE-2 CHAR-2 ... ], where CODE-n is a code point of the charset, and CHAR-n is the corresponding Unicode character code.
If it is a string, it is the name of a file that contains the above information. The file format is the same as that described for the :map attribute.
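The (HIGH . LOW) form accepted by :min-code and :max-code can be read as plain bit arithmetic: HIGH supplies the upper 16 bits and LOW the lower 16 bits. The helper below is a minimal sketch of that reading only; its name, `my/charset-code-to-integer`, is hypothetical and not part of Emacs.

```elisp
;; Hypothetical helper (not an Emacs function): turn a :min-code or
;; :max-code style VALUE into one integer.
(defun my/charset-code-to-integer (value)
  "Return VALUE as a single integer code point.
VALUE is either an integer, returned unchanged, or a cons
\(HIGH . LOW) whose HIGH part holds the most significant 16 bits
and whose LOW part holds the least significant 16 bits."
  (if (consp value)
      ;; Shift HIGH left by 16 bits and merge in LOW.
      (logior (ash (car value) 16) (cdr value))
    value))

(my/charset-code-to-integer '(#x1 . #x0000))   ; => 65536   (#x10000)
(my/charset-code-to-integer '(#x10 . #xFFFF))  ; => 1114111 (#x10FFFF)
(my/charset-code-to-integer #x7F)              ; => 127
```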
Probably introduced at or before Emacs version 23.1.
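As an illustration of the attributes described above, here is a minimal sketch that defines a one-dimensional charset with :code-offset and then converts between code points and characters with `decode-char` and `encode-char`. The charset name `my-demo-upper` and its contents are assumptions made up for this example; they do not correspond to any charset shipped with Emacs.

```elisp
;; Hypothetical charset: code points 0..25 stand for the letters A..Z.
;; The name `my-demo-upper' is invented for this sketch.
(define-charset 'my-demo-upper
  "Demo charset: code points 0..25 stand for the letters A..Z."
  :short-name "DemoUpper"
  :code-space [0 25]   ; one dimension, byte range 0..25
  :code-offset ?A)     ; character code = index + 65

;; Code point -> character.
(decode-char 'my-demo-upper 0)    ; => 65 (?A)

;; Character -> code point.
(encode-char ?Z 'my-demo-upper)   ; => 25

;; A character outside the charset has no code point.
(encode-char ?a 'my-demo-upper)   ; => nil
```

Because :dimension is omitted here, it is derived from the two-element :code-space vector, and :long-name falls back to the :short-name value.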
Source Code
;; Defined in /usr/src/emacs/lisp/international/mule.el.gz
(defun define-charset (name docstring &rest props)
"Define NAME (symbol) as a charset with DOCSTRING.
The remaining arguments must come in pairs ATTRIBUTE VALUE. ATTRIBUTE
may be any symbol. The following have special meanings, and one of
`:code-offset', `:map', `:subset', `:superset' must be specified.
`:short-name'
VALUE must be a short string to identify the charset. If omitted,
NAME is used.
`:long-name'
VALUE must be a string longer than `:short-name' to identify the
charset. If omitted, the value of the `:short-name' attribute is used.
`:dimension'
VALUE must be an integer 0, 1, 2, or 3, specifying the dimension of
code-points of the charsets. If omitted, it is calculated from the
value of the `:code-space' attribute.
`:code-space'
VALUE must be a vector of length at most 8 specifying the byte code
range of each dimension in this format:
[ MIN-1 MAX-1 MIN-2 MAX-2 ... ]
where MIN-N is the minimum byte value of Nth dimension of code-point,
MAX-N is the maximum byte value of that.
`:min-code'
VALUE must be an integer specifying the minimum code point of the
charset. If omitted, it is calculated from `:code-space'. VALUE may
be a cons (HIGH . LOW), where HIGH is the most significant 16 bits of
the code point and LOW is the least significant 16 bits.
`:max-code'
VALUE must be an integer specifying the maximum code point of the
charset. If omitted, it is calculated from `:code-space'. VALUE may
be a cons (HIGH . LOW), where HIGH is the most significant 16 bits of
the code point and LOW is the least significant 16 bits.
`:iso-final-char'
VALUE must be a character in the range 32 to 127 (inclusive)
specifying the final char of the charset for ISO-2022 encoding. If
omitted, the charset can't be encoded by ISO-2022 based
coding-systems.
`:iso-revision-number'
VALUE must be an integer in the range 0..63, specifying the revision
number of the charset for ISO-2022 encoding.
`:emacs-mule-id'
VALUE must be an integer of 0, 129..255. If omitted, the charset
can't be encoded by coding-systems of type `emacs-mule'.
`:ascii-compatible-p'
VALUE must be nil or t (default nil). If VALUE is t, the charset is
compatible with ASCII, i.e. the first 128 code points map to ASCII.
`:supplementary-p'
VALUE must be nil or t. If the VALUE is t, the charset is
supplementary, which means it is used only as a parent or a
subset of some other charset, or it is provided just for backward
compatibility.
`:invalid-code'
VALUE must be a nonnegative integer that can be used as an invalid
code point of the charset. If the minimum code is 0 and the maximum
code is greater than Emacs's maximum integer value, `:invalid-code'
should not be omitted.
`:code-offset'
VALUE must be an integer added to the index number of a character to
get the corresponding character code.
`:map'
VALUE must be vector or string.
If it is a vector, the format is [ CODE-1 CHAR-1 CODE-2 CHAR-2 ... ],
where CODE-n is a code-point of the charset, and CHAR-n is the
corresponding character code.
If it is a string, it is a name of file that contains the above
information. Each line of the file must be this format:
0xXXX 0xYYY
where XXX is a hexadecimal representation of CODE-n and YYY is a
hexadecimal representation of CHAR-n. A line starting with `#' is a
comment line.
`:subset'
VALUE must be a list:
( PARENT MIN-CODE MAX-CODE OFFSET )
PARENT is a parent charset. MIN-CODE and MAX-CODE specify the range
of characters inherited from the parent. OFFSET is an integer value
to add to a code point of the parent charset to get the corresponding
code point of this charset.
`:superset'
VALUE must be a list of parent charsets. The charset inherits
characters from them. Each element of the list may be a cons (PARENT
. OFFSET), where PARENT is a parent charset, and OFFSET is an offset
value to add to a code point of PARENT to get the corresponding code
point of this charset.
`:unify-map'
VALUE must be vector or string.
If it is a vector, the format is [ CODE-1 CHAR-1 CODE-2 CHAR-2 ... ],
where CODE-n is a code-point of the charset, and CHAR-n is the
corresponding Unicode character code.
If it is a string, it is a name of file that contains the above
information. The file format is the same as what described for `:map'
attribute."
  (declare (indent defun))
  (when (vectorp (car props))
    ;; Old style code:
    ;;   (define-charset CHARSET-ID CHARSET-SYMBOL INFO-VECTOR)
    ;; Convert the argument to make it fit with the current style.
    (let ((vec (car props)))
      (setq props (convert-define-charset-argument name vec)
            name docstring
            docstring (aref vec 8))))
  (let ((attrs (mapcar 'list '(:dimension
                               :code-space
                               :min-code
                               :max-code
                               :iso-final-char
                               :iso-revision-number
                               :emacs-mule-id
                               :ascii-compatible-p
                               :supplementary-p
                               :invalid-code
                               :code-offset
                               :map
                               :subset
                               :superset
                               :unify-map
                               :plist))))
    ;; If :dimension is omitted, get the dimension from :code-space.
    (let ((dimension (plist-get props :dimension)))
      (or dimension
          (let ((code-space (plist-get props :code-space)))
            (setq dimension (if code-space (/ (length code-space) 2) 4))
            (setq props (plist-put props :dimension dimension)))))
    (let ((code-space (plist-get props :code-space)))
      (or code-space
          (let ((dimension (plist-get props :dimension)))
            (setq code-space (make-vector 8 0))
            (dotimes (i dimension)
              (aset code-space (1+ (* i 2)) #xFF))
            (setq props (plist-put props :code-space code-space)))))
    ;; If :emacs-mule-id is specified, update emacs-mule-charset-table.
    (let ((emacs-mule-id (plist-get props :emacs-mule-id)))
      (if (integerp emacs-mule-id)
          (aset emacs-mule-charset-table emacs-mule-id name)))
    (dolist (slot attrs)
      (setcdr slot (plist-get props (car slot))))
    ;; Make sure that the value of :code-space is a vector of 8
    ;; elements.
    (let* ((slot (assq :code-space attrs))
           (val (cdr slot))
           (len (length val)))
      (if (< len 8)
          (setcdr slot
                  (vconcat val (make-vector (- 8 len) 0)))))
    ;; Add :name and :docstring properties to PROPS.
    (setq props
          (cons :name (cons name (cons :docstring (cons docstring props)))))
    (or (plist-get props :short-name)
        (plist-put props :short-name (symbol-name name)))
    (or (plist-get props :long-name)
        (plist-put props :long-name (plist-get props :short-name)))
    (plist-put props :base name)
    (setcdr (assq :plist attrs) props)
    (apply 'define-charset-internal name (mapcar 'cdr attrs))))
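The defaulting of :dimension and :code-space in the source above can be tried in isolation. The following is a minimal sketch that mirrors only that part of the function; `my/charset-default-geometry` is a hypothetical name introduced here, and it defines no charset and does not call `define-charset-internal`.

```elisp
;; Hypothetical helper mirroring the :dimension / :code-space defaulting
;; performed by `define-charset'.  It only computes values.
(defun my/charset-default-geometry (props)
  "Return (DIMENSION . CODE-SPACE) derived from the plist PROPS.
DIMENSION falls back to half the length of :code-space, or to 4
when :code-space is also missing.  CODE-SPACE falls back to the
byte range 0..255 for each dimension and is padded with zeros to
8 elements."
  (let* ((code-space (plist-get props :code-space))
         (dimension (or (plist-get props :dimension)
                        (if code-space (/ (length code-space) 2) 4))))
    ;; No :code-space given: allow bytes 0..#xFF in every dimension.
    (unless code-space
      (setq code-space (make-vector 8 0))
      (dotimes (i dimension)
        (aset code-space (1+ (* i 2)) #xFF)))
    ;; Pad a short vector with zeros so it always has 8 elements.
    (when (< (length code-space) 8)
      (setq code-space
            (vconcat code-space
                     (make-vector (- 8 (length code-space)) 0))))
    (cons dimension code-space)))

(my/charset-default-geometry '(:code-space [32 127]))
;; => (1 . [32 127 0 0 0 0 0 0])
(my/charset-default-geometry '(:dimension 2))
;; => (2 . [0 255 0 255 0 0 0 0])
(my/charset-default-geometry '())
;; => (4 . [0 255 0 255 0 255 0 255])
```

Note that in the real function the zero padding is applied to the attribute slot passed to `define-charset-internal`, while a :code-space supplied by the caller stays unpadded in the property list; the sketch returns only the padded form.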