File: regexp-opt.el.html

The "opt" in "regexp-opt" stands for "optim\\\\(al\\\\|i[sz]e\\\\)".

This package generates a regexp from a given list of strings (which matches one of those strings) so that the regexp generated by:

(regexp-opt strings)

is equivalent to, but more efficient than, the regexp generated by:

(mapconcat 'regexp-quote strings "\\\\|")

For example:

(let ((strings '("cond" "if" "when" "unless" "while"
"let" "let*" "progn" "prog1" "prog2"
"save-restriction" "save-excursion" "save-window-excursion"
"save-current-buffer" "save-match-data"
"catch" "throw" "unwind-protect" "condition-case")))
  (concat "(" (regexp-opt strings t) "\\\\>"))
 => "(\\\\(c\\\\(atch\\\\|ond\\\\(ition-case\\\\)?\\\\)\\\\|if\\\\|let\\\\*?\\\\|prog[12n]\\\\|save-\\\\(current-buffer\\\\|excursion\\\\|match-data\\\\|restriction\\\\|window-excursion\\\\)\\\\|throw\\\\|un\\\\(less\\\\|wind-protect\\\\)\\\\|wh\\\\(en\\\\|ile\\\\)\\\\)\\\\>"

Searching using the above example regexp-opt regexp takes approximately two-thirds of the time taken using the equivalent mapconcat regexp.

Since this package was written to produce efficient regexps, not regexps efficiently, it is probably not a good idea to in-line too many calls in your code, unless you use the following trick with eval-when-compile:

(defvar definition-regexp
  (eval-when-compile
    (concat "^("
            (regexp-opt '("defun" "defsubst" "defmacro" "defalias"
                          "defvar" "defconst") t)
            "\\\\>")))

The byte-compile code will be as if you had defined the variable thus:

(defvar definition-regexp
  "^(\\\\(def\\\\(alias\\\\|const\\\\|macro\\\\|subst\\\\|un\\\\|var\\\\)\\\\)\\\\>")

Note that if you use this trick for all instances of regexp-opt and regexp-opt-depth in your code, regexp-opt.el would only have to be loaded at compile time. But note also that using this trick means that should regexp-opt.el be changed, perhaps to fix a bug or to add a feature to improve the efficiency of regexp-opt regexps, you would have to recompile your code for such changes to have effect in your code.

Originally written for font-lock.el, from an idea from Stig's hl319.el, with thanks for ideas also to Michael Ernst, Bob Glickstein, Dan Nicolaescu and Stefan Monnier. No doubt regexp-opt doesn't always produce optimal regexps, so code, ideas or any other information to improve things are welcome.

One possible improvement would be to compile '("aa" "ab" "ba" "bb") into "[ab][ab]" rather than "a[ab]\\\\|b[ab]". I'm not sure it's worth it but if someone knows how to do it without going through too many contortions, I'm all ears.

Defined variables (0)

Defined functions (4)

regexp-opt(STRINGS &optional PAREN)
regexp-opt-charset(CHARS)
regexp-opt-depth(REGEXP)
regexp-opt-group(STRINGS &optional PAREN LAX)

Defined faces (0)