Function: define-charset

define-charset is a byte-compiled function defined in mule.el.gz.

Signature

(define-charset NAME DOCSTRING &rest PROPS)

Documentation

Define NAME (symbol) as a charset with DOCSTRING.

The remaining arguments must come in pairs ATTRIBUTE VALUE. ATTRIBUTE may be any symbol. The following have special meanings, and one of :code-offset, :map, :subset, :superset must be specified.

:short-name

VALUE must be a short string to identify the charset. If omitted, NAME is used.

:long-name

VALUE must be a string longer than :short-name to identify the charset. If omitted, the value of the :short-name attribute is used.

:dimension

VALUE must be an integer 0, 1, 2, or 3, specifying the dimension of code-points of the charset. If omitted, it is calculated from the value of the :code-space attribute.

:code-space

VALUE must be a vector of length at most 8 specifying the byte code range of each dimension in this format:
[ MIN-1 MAX-1 MIN-2 MAX-2 ... ]
where MIN-N is the minimum byte value of the Nth dimension of a code-point, and MAX-N is the maximum byte value of that dimension.
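The relationship between :code-space and :dimension can be sketched in a few lines. The two helper functions below are hypothetical (they are not part of Emacs); they only mirror the defaulting and padding steps visible in the source of define-charset further down, where the dimension is half the vector length and the vector is padded with zeros to 8 elements.

```elisp
;; Hypothetical helpers for illustration only; the names are made up.
(defun my-charset-dimension-from-code-space (code-space)
  "Return the dimension implied by CODE-SPACE, a vector of MIN/MAX pairs."
  (/ (length code-space) 2))

(defun my-charset-pad-code-space (code-space)
  "Return CODE-SPACE padded with zeros to exactly 8 elements."
  (let ((len (length code-space)))
    (if (< len 8)
        (vconcat code-space (make-vector (- 8 len) 0))
      code-space)))

;; A two-dimensional code space where each byte ranges over 33..126.
(my-charset-dimension-from-code-space [33 126 33 126]) ; => 2
(my-charset-pad-code-space [33 126 33 126]) ; => [33 126 33 126 0 0 0 0]
```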

:min-code

VALUE must be an integer specifying the minimum code point of the charset. If omitted, it is calculated from :code-space. VALUE may be a cons (HIGH . LOW), where HIGH is the most significant 16 bits of the code point and LOW is the least significant 16 bits.

:max-code

VALUE must be an integer specifying the maximum code point of the charset. If omitted, it is calculated from :code-space. VALUE may be a cons (HIGH . LOW), where HIGH is the most significant 16 bits of the code point and LOW is the least significant 16 bits.
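The (HIGH . LOW) form accepted by :min-code and :max-code splits a code point into two 16-bit halves. As a rough illustration of that arithmetic, the hypothetical helper below (not an Emacs function) recombines the halves into a single integer.

```elisp
;; Hypothetical helper for illustration only; the name is made up.
(defun my-charset-code-from-cons (value)
  "Return VALUE as an integer code point.
VALUE is either an integer or a cons (HIGH . LOW) of two 16-bit halves."
  (if (consp value)
      ;; HIGH supplies the upper 16 bits, LOW the lower 16 bits.
      (logior (ash (car value) 16) (cdr value))
    value))

(my-charset-code-from-cons '(#x0001 . #x2345)) ; => 74565 (#x12345)
(my-charset-code-from-cons 255)                ; => 255
```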

:iso-final-char

VALUE must be a character in the range 32 to 127 (inclusive) specifying the final char of the charset for ISO-2022 encoding. If omitted, the charset can't be encoded by ISO-2022 based coding-systems.

:iso-revision-number

VALUE must be an integer in the range 0..63, specifying the revision number of the charset for ISO-2022 encoding.

:emacs-mule-id

VALUE must be 0 or an integer in the range 129..255. If omitted, the charset can't be encoded by coding-systems of type emacs-mule.

:ascii-compatible-p

VALUE must be nil or t (default nil). If VALUE is t, the charset is compatible with ASCII, i.e. the first 128 code points map to ASCII.

:supplementary-p

VALUE must be nil or t. If VALUE is t, the charset is supplementary, which means it is used only as a parent or a subset of some other charset, or it is provided just for backward compatibility.

:invalid-code

VALUE must be a nonnegative integer that can be used as an invalid code point of the charset. If the minimum code is 0 and the maximum code is greater than Emacs's maximum integer value, :invalid-code should not be omitted.

:code-offset

VALUE must be an integer added to the index number of a character to get the corresponding character code.
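A minimal sketch of an offset-based charset may make this concrete. The charset name my-demo-alphabet and its docstring are invented for this example, and evaluating the form really does register a new charset in the running Emacs session, so it is best tried in a scratch session. With a one-dimensional code space starting at 0, the index of a code point is the code point itself.

```elisp
;; `my-demo-alphabet' is a made-up charset used only for illustration.
(define-charset 'my-demo-alphabet
  "Hypothetical 26-character charset covering the letters a through z."
  :code-space [0 25]   ; one dimension, code points 0..25
  :code-offset ?a)     ; index 0 corresponds to the character `a'

;; Code point -> character code: index plus offset.
(decode-char 'my-demo-alphabet 0)  ; => 97, i.e. ?a

;; Character -> code point: the inverse mapping.
(encode-char ?c 'my-demo-alphabet) ; => 2

;; :short-name was omitted, so it defaults to the symbol name.
(get-charset-property 'my-demo-alphabet :short-name)
;; => "my-demo-alphabet"
```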

:map

VALUE must be a vector or a string.

If it is a vector, the format is [ CODE-1 CHAR-1 CODE-2 CHAR-2 ... ], where CODE-n is a code-point of the charset, and CHAR-n is the corresponding character code.

If it is a string, it is the name of a file that contains the above information. Each line of the file must be in this format:
0xXXX 0xYYY
where XXX is a hexadecimal representation of CODE-n and YYY is a hexadecimal representation of CHAR-n. A line starting with # is a comment line.
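To show how the file form relates to the vector form, here is a small sketch that converts text in the line format described above into the flat [ CODE-1 CHAR-1 ... ] vector. The function is hypothetical and handles only what this docstring describes; Emacs loads map files with its own internal code, which may accept more than this sketch does.

```elisp
;; Hypothetical helper for illustration only; the name is made up.
(defun my-charset-parse-map-text (text)
  "Parse TEXT in the 0xXXX 0xYYY line format into a flat vector.
Lines that do not start with a hexadecimal pair, such as comment
lines starting with a hash sign, are skipped."
  (let (result)
    (dolist (line (split-string text "\n" t))
      (when (string-match
             "\\`0x\\([[:xdigit:]]+\\)[ \t]+0x\\([[:xdigit:]]+\\)" line)
        ;; Push CODE-n, then CHAR-n; the list is reversed at the end.
        (push (string-to-number (match-string 1 line) 16) result)
        (push (string-to-number (match-string 2 line) 16) result)))
    (vconcat (nreverse result))))

(my-charset-parse-map-text "# demo map\n0x00 0x61\n0x01 0x62\n")
;; => [0 97 1 98]
```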

:subset

VALUE must be a list:
( PARENT MIN-CODE MAX-CODE OFFSET )
PARENT is a parent charset. MIN-CODE and MAX-CODE specify the range of characters inherited from the parent. OFFSET is an integer value to add to a code point of the parent charset to get the corresponding code point of this charset.
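The OFFSET arithmetic can be illustrated without defining a charset. The helper below is hypothetical and assumes that MIN-CODE and MAX-CODE are expressed as code points of the parent charset; it returns the code point in the subset charset, or nil when the parent code point falls outside the inherited range.

```elisp
;; Hypothetical helper for illustration only; the name is made up.
(defun my-charset-subset-code (parent-code min-code max-code offset)
  "Return the subset code point for PARENT-CODE, or nil if not inherited.
Assumes MIN-CODE and MAX-CODE are code points of the parent charset."
  (when (<= min-code parent-code max-code)
    (+ parent-code offset)))

;; Inherit parent codes 65..90 and shift them down to 0..25.
(my-charset-subset-code 65 65 90 -65) ; => 0
(my-charset-subset-code 90 65 90 -65) ; => 25
(my-charset-subset-code 97 65 90 -65) ; => nil
```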

:superset

VALUE must be a list of parent charsets. The charset inherits characters from them. Each element of the list may be a cons (PARENT . OFFSET), where PARENT is a parent charset, and OFFSET is an offset value to add to a code point of PARENT to get the corresponding code point of this charset.

:unify-map

VALUE must be a vector or a string.

If it is a vector, the format is [ CODE-1 CHAR-1 CODE-2 CHAR-2 ... ], where CODE-n is a code-point of the charset, and CHAR-n is the corresponding Unicode character code.

If it is a string, it is the name of a file that contains the above information. The file format is the same as that described for the :map attribute.

Probably introduced at or before Emacs version 23.1.

Source Code

;; Defined in /usr/src/emacs/lisp/international/mule.el.gz
(defun define-charset (name docstring &rest props)
  "Define NAME (symbol) as a charset with DOCSTRING.
The remaining arguments must come in pairs ATTRIBUTE VALUE.  ATTRIBUTE
may be any symbol.  The following have special meanings, and one of
`:code-offset', `:map', `:subset', `:superset' must be specified.

`:short-name'

VALUE must be a short string to identify the charset.  If omitted,
NAME is used.

`:long-name'

VALUE must be a string longer than `:short-name' to identify the
charset.  If omitted, the value of the `:short-name' attribute is used.

`:dimension'

VALUE must be an integer 0, 1, 2, or 3, specifying the dimension of
code-points of the charsets.  If omitted, it is calculated from the
value of the `:code-space' attribute.

`:code-space'

VALUE must be a vector of length at most 8 specifying the byte code
range of each dimension in this format:
	[ MIN-1 MAX-1 MIN-2 MAX-2 ... ]
where MIN-N is the minimum byte value of Nth dimension of code-point,
MAX-N is the maximum byte value of that.

`:min-code'

VALUE must be an integer specifying the minimum code point of the
charset.  If omitted, it is calculated from `:code-space'.  VALUE may
be a cons (HIGH . LOW), where HIGH is the most significant 16 bits of
the code point and LOW is the least significant 16 bits.

`:max-code'

VALUE must be an integer specifying the maximum code point of the
charset.  If omitted, it is calculated from `:code-space'.  VALUE may
be a cons (HIGH . LOW), where HIGH is the most significant 16 bits of
the code point and LOW is the least significant 16 bits.

`:iso-final-char'

VALUE must be a character in the range 32 to 127 (inclusive)
specifying the final char of the charset for ISO-2022 encoding.  If
omitted, the charset can't be encoded by ISO-2022 based
coding-systems.

`:iso-revision-number'

VALUE must be an integer in the range 0..63, specifying the revision
number of the charset for ISO-2022 encoding.

`:emacs-mule-id'

VALUE must be an integer of 0, 129..255.  If omitted, the charset
can't be encoded by coding-systems of type `emacs-mule'.

`:ascii-compatible-p'

VALUE must be nil or t (default nil).  If VALUE is t, the charset is
compatible with ASCII, i.e. the first 128 code points map to ASCII.

`:supplementary-p'

VALUE must be nil or t.  If the VALUE is t, the charset is
supplementary, which means it is used only as a parent or a
subset of some other charset, or it is provided just for backward
compatibility.

`:invalid-code'

VALUE must be a nonnegative integer that can be used as an invalid
code point of the charset.  If the minimum code is 0 and the maximum
code is greater than Emacs's maximum integer value, `:invalid-code'
should not be omitted.

`:code-offset'

VALUE must be an integer added to the index number of a character to
get the corresponding character code.

`:map'

VALUE must be vector or string.

If it is a vector, the format is [ CODE-1 CHAR-1 CODE-2 CHAR-2 ... ],
where CODE-n is a code-point of the charset, and CHAR-n is the
corresponding character code.

If it is a string, it is a name of file that contains the above
information.   Each line of the file must be this format:
	0xXXX 0xYYY
where XXX is a hexadecimal representation of CODE-n and YYY is a
hexadecimal representation of CHAR-n.  A line starting with `#' is a
comment line.

`:subset'

VALUE must be a list:
	( PARENT MIN-CODE MAX-CODE OFFSET )
PARENT is a parent charset.  MIN-CODE and MAX-CODE specify the range
of characters inherited from the parent.  OFFSET is an integer value
to add to a code point of the parent charset to get the corresponding
code point of this charset.

`:superset'

VALUE must be a list of parent charsets.  The charset inherits
characters from them.  Each element of the list may be a cons (PARENT
. OFFSET), where PARENT is a parent charset, and OFFSET is an offset
value to add to a code point of PARENT to get the corresponding code
point of this charset.

`:unify-map'

VALUE must be vector or string.

If it is a vector, the format is [ CODE-1 CHAR-1 CODE-2 CHAR-2 ... ],
where CODE-n is a code-point of the charset, and CHAR-n is the
corresponding Unicode character code.

If it is a string, it is a name of file that contains the above
information.  The file format is the same as what described for `:map'
attribute."
  (when (vectorp (car props))
    ;; Old style code:
    ;;   (define-charset CHARSET-ID CHARSET-SYMBOL INFO-VECTOR)
    ;; Convert the argument to make it fit with the current style.
    (let ((vec (car props)))
      (setq props (convert-define-charset-argument name vec)
	    name docstring
	    docstring (aref vec 8))))
  (let ((attrs (mapcar 'list '(:dimension
			       :code-space
			       :min-code
			       :max-code
			       :iso-final-char
			       :iso-revision-number
			       :emacs-mule-id
			       :ascii-compatible-p
			       :supplementary-p
			       :invalid-code
			       :code-offset
			       :map
			       :subset
			       :superset
			       :unify-map
			       :plist))))

    ;; If :dimension is omitted, get the dimension from :code-space.
    (let ((dimension (plist-get props :dimension)))
      (or dimension
	  (let ((code-space (plist-get props :code-space)))
	    (setq dimension (if code-space (/ (length code-space) 2) 4))
	    (setq props (plist-put props :dimension dimension)))))

    (let ((code-space (plist-get props :code-space)))
      (or code-space
	  (let ((dimension (plist-get props :dimension)))
	    (setq code-space (make-vector 8 0))
	    (dotimes (i dimension)
	      (aset code-space (1+ (* i 2)) #xFF))
	    (setq props (plist-put props :code-space code-space)))))

    ;; If :emacs-mule-id is specified, update emacs-mule-charset-table.
    (let ((emacs-mule-id (plist-get props :emacs-mule-id)))
      (if (integerp emacs-mule-id)
	  (aset emacs-mule-charset-table emacs-mule-id name)))

    (dolist (slot attrs)
      (setcdr slot (purecopy (plist-get props (car slot)))))

    ;; Make sure that the value of :code-space is a vector of 8
    ;; elements.
    (let* ((slot (assq :code-space attrs))
	   (val (cdr slot))
	   (len (length val)))
      (if (< len 8)
	  (setcdr slot
		  (vconcat val (make-vector (- 8 len) 0)))))

    ;; Add :name and :docstring properties to PROPS.
    (setq props
	  (cons :name (cons name (cons :docstring (cons (purecopy docstring) props)))))
    (or (plist-get props :short-name)
	(plist-put props :short-name (symbol-name name)))
    (or (plist-get props :long-name)
	(plist-put props :long-name (plist-get props :short-name)))
    (plist-put props :base name)
    ;; We can probably get a worthwhile amount in purespace.
    (setq props
	  (mapcar (lambda (elt)
		    (if (stringp elt)
			(purecopy elt)
		      elt))
		  props))
    (setcdr (assq :plist attrs) props)

    (apply 'define-charset-internal name (mapcar 'cdr attrs))))