Function: rx--normalise-char-pattern
rx--normalise-char-pattern is a byte-compiled function defined in
rx.el.gz.
Signature
(rx--normalise-char-pattern FORM)
Documentation
Normalize FORM as a pattern matching a single character.
Characters become strings, `any' forms and character classes become
`rx--char-alt' forms, user definitions and `eval' forms are expanded,
and `or', `not' and `intersection' forms are normalized recursively.
An `rx--char-alt' form is shaped (rx--char-alt INTERVALS . CLASSES),
where INTERVALS is a sorted list of disjoint, nonadjacent intervals,
each a cons (FROM . TO) of characters, and CLASSES is an unordered
list of unique name-normalised character classes.
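To make the INTERVALS invariant concrete, here is a minimal sketch that brings an arbitrary list of character intervals into that shape. The function `my-normalise-intervals' is hypothetical and not part of rx.el. It assumes each interval is an inclusive cons (FROM . TO) with FROM <= TO, and it does not describe how `rx--parse-any' actually builds its result.

```elisp
(defun my-normalise-intervals (intervals)
  "Return INTERVALS as a sorted list of disjoint, nonadjacent intervals.
Each interval is an inclusive cons (FROM . TO) of characters with
FROM <= TO.  Hypothetical helper for illustration; not part of rx.el."
  (let ((sorted (sort (copy-sequence intervals)
                      (lambda (a b) (< (car a) (car b)))))
        (result nil))
    (dolist (iv sorted)
      (if (and result
               ;; IV overlaps or touches the last kept interval when it
               ;; starts no later than one past that interval's end.
               (<= (car iv) (1+ (cdr (car result)))))
          ;; Merge: extend the last kept interval so it also covers IV.
          (setcdr (car result) (max (cdr (car result)) (cdr iv)))
        ;; Otherwise start a new interval.  IV is copied so that the
        ;; `setcdr' above never modifies the caller's conses.
        (push (cons (car iv) (cdr iv)) result)))
    (nreverse result)))
```

For example, (my-normalise-intervals '((?d . ?f) (?a . ?b) (?c . ?c))) returns ((97 . 102)), that is ((?a . ?f)): the three intervals touch, so they collapse into one. Merging adjacent intervals as well as overlapping ones keeps the representation canonical in this sketch, so two equal character sets yield `equal' interval lists.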
Source Code
;; Defined in /usr/src/emacs/lisp/emacs-lisp/rx.el.gz
;; FIXME: flatten nested `or' patterns when performing char-pattern combining.
;; The only reason for not flattening is to ensure regexp-opt processing
;; (which we do for entire `or' patterns, not subsequences), but we
;; obviously want to translate
;; (or "a" space (or "b" (+ nonl) word) "c")
;; -> (or (in "ab" space) (+ nonl) (in "c" word))
;; FIXME: normalise `seq', both the construct and implicit sequences,
;; so that they are flattened, adjacent strings concatenated, and
;; empty strings removed. That would give more opportunities for regexp-opt:
;; (or "a" (seq "ab" (seq "c" "d") "")) -> (or "a" "abcd")
;; FIXME: Since `rx--normalise-char-pattern' recurses through `or', `not' and
;; `intersection', we may end up normalising subtrees multiple times
;; which wastes time (but should be idempotent).
;; One way to avoid this is to aggressively normalise the entire tree
;; before translating anything at all, but we must then recurse through
;; all constructs and probably copy them.
;; Such normalisation could normalise synonyms, eliminate `minimal-match'
;; and `maximal-match' and convert affected `1+' to either `+' or `+?' etc.
;; We would also consolidate the user-def lookup, both modern and legacy,
;; in one place.
(defun rx--normalise-char-pattern (form)
  "Normalize FORM as a pattern matching a single-character.
Characters become strings, `any' forms and character classes become
`rx--char-alt' forms, user-definitions and `eval' forms are expanded,
and `or', `not' and `intersection' forms are normalized recursively.
A `rx--char-alt' form is shaped (rx--char-alt INTERVALS . CLASSES)
where INTERVALS is a sorted list of disjoint nonadjacent intervals,
each a cons of characters, and CLASSES an unordered list of unique
name-normalised character classes."
  (defvar rx--builtin-forms)
  (defvar rx--builtin-symbols)
  (cond ((consp form)
         (let ((op (car form))
               (body (cdr form)))
           (cond ((memq op '(or |))
                  ;; Normalise the constructor to `or' and the args recursively.
                  (cons 'or (mapcar #'rx--normalise-char-pattern body)))
                 ;; Convert `any' forms and char classes now so that we
                 ;; don't need to do it later on.
                 ((memq op '(any in char))
                  (cons 'rx--char-alt (rx--parse-any body)))
                 ((memq op '(not intersection))
                  (cons op (mapcar #'rx--normalise-char-pattern body)))
                 ((eq op 'eval)
                  (rx--normalise-char-pattern (rx--expand-eval body)))
                 ((memq op rx--builtin-forms) form)
                 ((let ((expanded (rx--expand-def-form form)))
                    (and expanded
                         (rx--normalise-char-pattern expanded))))
                 (t form))))
        ;; FIXME: Should we expand legacy definitions from
        ;; `rx-constituents' here as well?
        ((symbolp form)
         (cond ((let ((class (assq form rx--char-classes)))
                  (and class
                       `(rx--char-alt nil . (,(cdr class))))))
               ((memq form rx--builtin-symbols) form)
               ((let ((expanded (rx--expand-def-symbol form)))
                  (and expanded
                       (rx--normalise-char-pattern expanded))))
               (t form)))
        ((characterp form)
         (char-to-string form))
        (t form)))
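The second FIXME above proposes normalising `seq' so that nested sequences are flattened, adjacent strings concatenated and empty strings removed. The sketch below shows one way to do that on the element list of a sequence. `my-rx-seq-splice' and `my-rx-normalise-seq' are hypothetical helpers, not rx.el functions. The sketch assumes `seq', `sequence', `:' and `and' are the sequence synonyms to splice, does not descend into other constructs such as `+' or `or', and leaves characters such as ?a unconverted, so they block string concatenation.

```elisp
(defun my-rx-seq-splice (forms)
  "Return a fresh list of FORMS with nested sequence forms spliced in.
Hypothetical helper for illustration; not part of rx.el."
  (mapcan (lambda (form)
            (if (and (consp form) (memq (car form) '(seq sequence : and)))
                ;; Recurse so that arbitrarily deep nesting is flattened.
                (my-rx-seq-splice (cdr form))
              (list form)))
          forms))

(defun my-rx-normalise-seq (forms)
  "Flatten FORMS, concatenate adjacent strings and drop empty strings.
Hypothetical helper for illustration; not part of rx.el."
  (let ((result nil))
    (dolist (form (my-rx-seq-splice forms))
      (cond
       ;; An empty string only matches the empty string, so removing
       ;; it does not change what the sequence matches.
       ((equal form ""))
       ;; Merge a string into an immediately preceding string.
       ((and (stringp form) (stringp (car result)))
        (setcar result (concat (car result) form)))
       (t (push form result))))
    (nreverse result)))
```

Applied to the body of (seq "ab" (seq "c" "d") ""), it returns ("abcd"), so the FIXME's example (or "a" (seq "ab" (seq "c" "d") "")) could become (or "a" "abcd") and be handed to regexp-opt as two plain strings.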