File: regexp-opt.el.html
The "opt" in "regexp-opt" stands for "optim\\\\(al\\\\|i[sz]e\\\\)".
This package generates a regexp from a given list of strings (which matches one of those strings) so that the regexp generated by:
(regexp-opt strings)
is equivalent to, but more efficient than, the regexp generated by:
(mapconcat 'regexp-quote strings "\\\\|")
For example:
(let ((strings '("cond" "if" "when" "unless" "while"
"let" "let*" "progn" "prog1" "prog2"
"save-restriction" "save-excursion" "save-window-excursion"
"save-current-buffer" "save-match-data"
"catch" "throw" "unwind-protect" "condition-case")))
(concat "(" (regexp-opt strings t) "\\\\>"))
=> "(\\\\(c\\\\(atch\\\\|ond\\\\(ition-case\\\\)?\\\\)\\\\|if\\\\|let\\\\*?\\\\|prog[12n]\\\\|save-\\\\(current-buffer\\\\|excursion\\\\|match-data\\\\|restriction\\\\|window-excursion\\\\)\\\\|throw\\\\|un\\\\(less\\\\|wind-protect\\\\)\\\\|wh\\\\(en\\\\|ile\\\\)\\\\)\\\\>"
Searching using the above example regexp-opt regexp takes approximately
two-thirds of the time taken using the equivalent mapconcat regexp.
Since this package was written to produce efficient regexps, not regexps
efficiently, it is probably not a good idea to in-line too many calls in
your code, unless you use the following trick with eval-when-compile:
(defvar definition-regexp
(eval-when-compile
(concat "^("
(regexp-opt '("defun" "defsubst" "defmacro" "defalias"
"defvar" "defconst") t)
"\\\\>")))
The byte-compile code will be as if you had defined the variable thus:
(defvar definition-regexp
"^(\\\\(def\\\\(alias\\\\|const\\\\|macro\\\\|subst\\\\|un\\\\|var\\\\)\\\\)\\\\>")
Note that if you use this trick for all instances of regexp-opt and
regexp-opt-depth in your code, regexp-opt.el would only have to be loaded
at compile time. But note also that using this trick means that should
regexp-opt.el be changed, perhaps to fix a bug or to add a feature to
improve the efficiency of regexp-opt regexps, you would have to recompile
your code for such changes to have effect in your code.
Originally written for font-lock.el, from an idea from Stig's hl319.el, with
thanks for ideas also to Michael Ernst, Bob Glickstein, Dan Nicolaescu and
Stefan Monnier.
No doubt regexp-opt doesn't always produce optimal regexps, so code, ideas
or any other information to improve things are welcome.
One possible improvement would be to compile '("aa" "ab" "ba" "bb") into "[ab][ab]" rather than "a[ab]\\\\|b[ab]". I'm not sure it's worth it but if someone knows how to do it without going through too many contortions, I'm all ears.
Defined variables (0)
Defined functions (4)
regexp-opt | (STRINGS &optional PAREN) |
regexp-opt-charset | (CHARS) |
regexp-opt-depth | (REGEXP) |
regexp-opt-group | (STRINGS &optional PAREN LAX) |