Function: c-syntactic-re-search-forward

c-syntactic-re-search-forward is a byte-compiled function defined in cc-engine.el.gz.

Signature

(c-syntactic-re-search-forward REGEXP &optional BOUND NOERROR PAREN-LEVEL NOT-INSIDE-TOKEN LOOKBEHIND-SUBMATCH)

Documentation

Like re-search-forward, but only report matches that are found in syntactically significant text. I.e. matches in comments, macros or string literals are ignored. The start point is assumed to be outside any comment, macro or string literal, or else the content of that region is taken as syntactically significant text.
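
As a minimal sketch of the basic behavior, the following compares inputs where the only difference is whether the text occurs in code or in a literal. The helper name demo-significant-match-end is hypothetical and exists only for this illustration; it assumes CC Mode is available and that the buffer is in c-mode.

```elisp
(require 'cc-mode)

(defun demo-significant-match-end (text regexp)
  "Insert TEXT into a temporary C buffer and search for REGEXP from the start.
Return the end position of the first syntactically significant match,
or nil.  Hypothetical helper, for illustration only."
  (with-temp-buffer
    (c-mode)
    (insert text)
    (goto-char (point-min))
    ;; NOERROR is t, so a failed search returns nil instead of signaling.
    (c-syntactic-re-search-forward regexp nil t)))

;; "foo" inside the block comment is skipped; the identifier is reported.
(demo-significant-match-end "/* foo */ int foo;\n" "foo")

;; Here "foo" occurs only in a string literal and in a line comment,
;; so no significant match exists.
(demo-significant-match-end "char *s = \"foo\"; // foo\n" "foo")
```

A plain re-search-forward on the first input would stop inside the comment instead.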

NOERROR, in addition to the values nil, t, and <anything else> used in re-search-forward can also take the values before-literal and after-literal. In these cases, when BOUND is also given and is inside a literal, and a search fails, point will be left, respectively before or after the literal. Be aware that with after-literal, if a string or comment is unclosed at the end of the buffer, point may be left there, even though it is inside a literal there.
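
The two extra NOERROR values can be sketched as follows, assuming a C buffer in which BOUND falls in the middle of a block comment and the search cannot match. The helper demo-rest-of-line-after-failed-search is hypothetical; it returns the text between point and the end of the line so the final position is easy to see.

```elisp
(require 'cc-mode)

(defun demo-rest-of-line-after-failed-search (noerror)
  "Run a failing search whose BOUND lies inside a comment, using NOERROR.
Return the text from point to the end of the line afterwards.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (c-mode)
    (insert "int x; /* comment here */ int y;\n")
    (goto-char (point-min))
    ;; Place BOUND in the middle of the block comment.
    (let ((bound (save-excursion (search-forward "comment") (point))))
      (c-syntactic-re-search-forward "no_such_name" bound noerror))
    (buffer-substring-no-properties (point) (line-end-position))))

;; Point should end up just before the comment opener.
(demo-rest-of-line-after-failed-search 'before-literal)

;; Point should end up just after the comment closer.
(demo-rest-of-line-after-failed-search 'after-literal)
```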

If PAREN-LEVEL is non-nil, an additional restriction is added to ignore matches in nested paren sexps. The search will also not go outside the current list sexp, which has the effect that if the point should be moved to BOUND when no match is found (i.e. NOERROR is neither nil nor t), then it will be at the closing paren if the end of the current list sexp is encountered first.
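
A small sketch of PAREN-LEVEL, assuming point starts just inside an argument list: only commas at that paren level should be reported, and the search should not continue past the closing paren of the list. The helper demo-top-level-comma-ends and its arguments are hypothetical.

```elisp
(require 'cc-mode)

(defun demo-top-level-comma-ends (text start-after)
  "Collect end positions of commas at the paren level just after START-AFTER.
TEXT is inserted into a temporary C buffer, and point is placed right
after the first occurrence of the string START-AFTER.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (c-mode)
    (insert text)
    (goto-char (point-min))
    (search-forward start-after)
    (let (ends)
      ;; PAREN-LEVEL is t: matches in nested parens are ignored and the
      ;; search stops at the close paren of the enclosing list.
      (while (c-syntactic-re-search-forward "," nil t t)
        (push (point) ends))
      (nreverse ends))))

;; The comma inside "g(b, c)" and the one after "f(...)" are not collected.
(demo-top-level-comma-ends "f(a, g(b, c), d); x, y\n" "f(")
```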

If NOT-INSIDE-TOKEN is non-nil, matches in the middle of tokens are ignored. Things like multicharacter operators and special symbols (e.g. "`()" in Pike) are handled but currently not floating point constants.
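
To illustrate NOT-INSIDE-TOKEN, the sketch below searches for "bar" in a buffer where that text first appears as the tail of a longer identifier. The helper demo-search-bar is hypothetical and assumes a c-mode buffer.

```elisp
(require 'cc-mode)

(defun demo-search-bar (not-inside-token)
  "Search for \"bar\" in a small C snippet, passing NOT-INSIDE-TOKEN through.
Return the end position of the reported match, or nil.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (c-mode)
    (insert "int foobar; bar = 1;\n")
    (goto-char (point-min))
    (c-syntactic-re-search-forward "bar" nil t nil not-inside-token)))

;; Without the flag, the "bar" at the end of "foobar" counts as a match.
(demo-search-bar nil)

;; With the flag, that match starts mid-token and is skipped, so the
;; standalone identifier "bar" is reported instead.
(demo-search-bar t)
```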

If LOOKBEHIND-SUBMATCH is non-nil, it is taken as the number of a subexpression in REGEXP. The end of that submatch is used as the position to check for syntactic significance. If LOOKBEHIND-SUBMATCH isn't used, or if that subexpression didn't match, then the start position of the whole match is used instead. The "look behind" subexpression is never tested before the starting position, so it might be a good idea to include \= as a match alternative in it.
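
One way to use LOOKBEHIND-SUBMATCH is a whole-word style search where the regexp needs one character of leading context. In this sketch, submatch 1 is that context, with \= included as an alternative as suggested above, so the syntactic check happens at the start of "foo" rather than at the context character. The helper demo-whole-word-foo-end and the regexp are assumptions made for illustration.

```elisp
(require 'cc-mode)

(defun demo-whole-word-foo-end (text)
  "Find \"foo\" in TEXT when it is not preceded by an identifier character.
Return the end position of the reported match, or nil.
Hypothetical helper, for illustration only."
  (with-temp-buffer
    (c-mode)
    (insert text)
    (goto-char (point-min))
    ;; Submatch 1 is the \"look behind\" context: either the position where
    ;; the search starts (\\=) or one non-identifier character.
    (c-syntactic-re-search-forward
     "\\(\\=\\|[^[:alnum:]_]\\)foo" nil t nil nil 1)))

;; The first "foo" is in a line comment; the second starts a statement.
(demo-whole-word-foo-end "// foo\nfoo = 1;\n")

;; "xfoo" does not qualify because "x" is an identifier character.
(demo-whole-word-foo-end "int xfoo; foo();\n")
```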

Optimization note: Matches might be missed if the "look behind" subexpression can match the end of nonwhite syntactic whitespace, i.e. the end of comments or cpp directives. This is because the function skips over such things before resuming the search. On the other hand, it is not safe to assume that the "look behind" subexpression never matches syntactic whitespace.

Bug: Unbalanced parens inside cpp directives are currently not handled correctly (i.e. they don't get ignored as they should) when PAREN-LEVEL is set.

Note that this function might do hidden buffer changes. See the comment at the start of cc-engine.el for more info.
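
When calling this function from code outside CC Mode, one possible precaution is to wrap the call in the c-save-buffer-state macro from cc-defs.el, which is intended to keep such hidden changes from marking the buffer as modified. This is a sketch under that assumption; the wrapper name demo-search-preserving-buffer-state is hypothetical.

```elisp
(require 'cc-mode)

(defun demo-search-preserving-buffer-state (regexp)
  "Search for REGEXP in the current CC Mode buffer from point.
The call is wrapped so that hidden buffer changes do not alter the
buffer's modified flag.  Hypothetical wrapper, for illustration only."
  ;; `c-save-buffer-state' takes a `let*'-style VARLIST (empty here)
  ;; followed by body forms, and returns the value of the body.
  (c-save-buffer-state ()
    (c-syntactic-re-search-forward regexp nil t)))

(with-temp-buffer
  (c-mode)
  (insert "/* x */ int x;\n")
  (set-buffer-modified-p nil)
  (goto-char (point-min))
  (list (demo-search-preserving-buffer-state "x")
        (buffer-modified-p)))
```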

Source Code

;; Defined in /usr/src/emacs/lisp/progmodes/cc-engine.el.gz
;; Tools for doing searches restricted to syntactically relevant text.

(defun c-syntactic-re-search-forward (regexp &optional bound noerror
				      paren-level not-inside-token
				      lookbehind-submatch)
  "Like `re-search-forward', but only report matches that are found
in syntactically significant text.  I.e. matches in comments, macros
or string literals are ignored.  The start point is assumed to be
outside any comment, macro or string literal, or else the content of
that region is taken as syntactically significant text.

NOERROR, in addition to the values nil, t, and <anything else>
used in `re-search-forward' can also take the values
`before-literal' and `after-literal'.  In these cases, when BOUND
is also given and is inside a literal, and a search fails, point
will be left, respectively before or after the literal.  Be aware
that with `after-literal', if a string or comment is unclosed at
the end of the buffer, point may be left there, even though it is
inside a literal there.

If PAREN-LEVEL is non-nil, an additional restriction is added to
ignore matches in nested paren sexps.  The search will also not go
outside the current list sexp, which has the effect that if the point
should be moved to BOUND when no match is found (i.e. NOERROR is
neither nil nor t), then it will be at the closing paren if the end of
the current list sexp is encountered first.

If NOT-INSIDE-TOKEN is non-nil, matches in the middle of tokens are
ignored.  Things like multicharacter operators and special symbols
\(e.g. \"`()\" in Pike) are handled but currently not floating point
constants.

If LOOKBEHIND-SUBMATCH is non-nil, it's taken as a number of a
subexpression in REGEXP.  The end of that submatch is used as the
position to check for syntactic significance.  If LOOKBEHIND-SUBMATCH
isn't used or if that subexpression didn't match then the start
position of the whole match is used instead.  The \"look behind\"
subexpression is never tested before the starting position, so it
might be a good idea to include \\=\\= as a match alternative in it.

Optimization note: Matches might be missed if the \"look behind\"
subexpression can match the end of nonwhite syntactic whitespace,
i.e. the end of comments or cpp directives.  This since the function
skips over such things before resuming the search.  It's on the other
hand not safe to assume that the \"look behind\" subexpression never
matches syntactic whitespace.

Bug: Unbalanced parens inside cpp directives are currently not handled
correctly (i.e. they don't get ignored as they should) when
PAREN-LEVEL is set.

Note that this function might do hidden buffer changes.  See the
comment at the start of cc-engine.el for more info."

  (or bound (setq bound (point-max)))
  (if paren-level (setq paren-level -1))

  ;;(message "c-syntactic-re-search-forward %s %s %S" (point) bound regexp)

  (let ((start (point))
	tmp
	;; Start position for the last search.
	search-pos
	;; The `parse-partial-sexp' state between the start position
	;; and the point.
	state
	;; The current position after the last state update.  The next
	;; `parse-partial-sexp' continues from here.
	(state-pos (point))
	;; The position at which to check the state and the state
	;; there.  This is separate from `state-pos' since we might
	;; need to back up before doing the next search round.
	check-pos check-state
	;; Last position known to end a token.
	(last-token-end-pos (point-min))
	;; Set when a valid match is found.
	found)

    (condition-case err
	(while
	    (and
	     (progn
	       (setq search-pos (point))
	       (if (re-search-forward regexp bound noerror)
		   t
		 ;; Without the following, when PAREN-LEVEL is non-nil, and
		 ;; NOERROR is not nil or t, and the very first search above
		 ;; has just failed, point would end up at BOUND rather than
		 ;; just before the next close paren.
		 (when (and (eq search-pos start)
			    paren-level
			    (not (memq noerror '(nil t))))
		   (setq state (parse-partial-sexp start bound -1))
		   (if (eq (car state) -1)
		       (setq bound (1- (point)))))
		 nil))

	     (progn
	       (setq state (parse-partial-sexp
			    state-pos (match-beginning 0) paren-level nil state)
		     state-pos (point))
	       (if (setq check-pos (and lookbehind-submatch
					(or (not paren-level)
					    (>= (car state) 0))
					(match-end lookbehind-submatch)))
		   (setq check-state (parse-partial-sexp
				      state-pos check-pos paren-level nil state))
		 (setq check-pos state-pos
		       check-state state))

	       ;; NOTE: If we got a look behind subexpression and get
	       ;; an insignificant match in something that isn't
	       ;; syntactic whitespace (i.e. strings or in nested
	       ;; parentheses), then we can never skip more than a
	       ;; single character from the match start position
	       ;; (i.e. `state-pos' here) before continuing the
	       ;; search.  That since the look behind subexpression
	       ;; might match the end of the insignificant region in
	       ;; the next search.

	       (cond
		((elt check-state 7)
		 ;; Match inside a line comment.  Skip to eol.  Use
		 ;; `re-search-forward' instead of `skip-chars-forward' to get
		 ;; the right bound behavior.
		 (re-search-forward "[\n\r]" bound noerror))

		((elt check-state 4)
		 ;; Match inside a block comment.  Skip to the '*/'.
		 (search-forward "*/" bound noerror))

		((and (not (elt check-state 5))
		      (eq (char-before check-pos) ?/)
		      (not (c-get-char-property (1- check-pos) 'syntax-table))
		      (memq (char-after check-pos) '(?/ ?*)))
		 ;; Match in the middle of the opener of a block or line
		 ;; comment.
		 (if (= (char-after check-pos) ?/)
		     (re-search-forward "[\n\r]" bound noerror)
		   (search-forward "*/" bound noerror)))

		;; The last `parse-partial-sexp' above might have
		;; stopped short of the real check position if the end
		;; of the current sexp was encountered in paren-level
		;; mode.  The checks above are always false in that
		;; case, and since they can do better skipping in
		;; lookbehind-submatch mode, we do them before
		;; checking the paren level.

		((and paren-level
		      (/= (setq tmp (car check-state)) 0))
		 ;; Check the paren level first since we're short of the
		 ;; syntactic checking position if the end of the
		 ;; current sexp was encountered by `parse-partial-sexp'.
		 (if (> tmp 0)

		     ;; Inside a nested paren sexp.
		     (if lookbehind-submatch
			 ;; See the NOTE above.
			 (progn (goto-char state-pos) t)
		       ;; Skip out of the paren quickly.
		       (setq state (parse-partial-sexp state-pos bound 0 nil state)
			     state-pos (point)))

		   ;; Have exited the current paren sexp.
		   (if noerror
		       (progn
			 ;; The last `parse-partial-sexp' call above
			 ;; has left us just after the closing paren
			 ;; in this case, so we can modify the bound
			 ;; to leave the point at the right position
			 ;; upon return.
			 (setq bound (1- (point)))
			 nil)
		     (signal 'search-failed (list regexp)))))

		((setq tmp (elt check-state 3))
		 ;; Match inside a string.
		 (if (or lookbehind-submatch
			 (not (integerp tmp)))
		     ;; See the NOTE above.
		     (progn (goto-char state-pos) t)
		   ;; Skip to the end of the string before continuing.
		   (let ((ender (make-string 1 tmp)) (continue t))
		     (while (if (search-forward ender bound noerror)
				(progn
				  (setq state (parse-partial-sexp
					       state-pos (point) nil nil state)
					state-pos (point))
				  (elt state 3))
			      (setq continue nil)))
		     continue)))

		((save-excursion
		   (save-match-data
		     (c-beginning-of-macro start)))
		 ;; Match inside a macro.  Skip to the end of it.
		 (c-end-of-macro)
		 (cond ((<= (point) bound) t)
		       (noerror nil)
		       (t (signal 'search-failed (list regexp)))))

		((and not-inside-token
		      (or (< check-pos last-token-end-pos)
			  (< check-pos
			     (save-excursion
			       (goto-char check-pos)
			       (save-match-data
				 (c-end-of-current-token last-token-end-pos))
			       (setq last-token-end-pos (point))))))
		 ;; Inside a token.
		 (if lookbehind-submatch
		     ;; See the NOTE above.
		     (goto-char state-pos)
		   (goto-char (min last-token-end-pos bound))))

		(t
		 ;; A real match.
		 (setq found t)
		 nil)))

	     ;; Should loop to search again, but take care to avoid
	     ;; looping on the same spot.
	     (or (/= search-pos (point))
		 (if (= (point) bound)
		     (if noerror
			 nil
		       (signal 'search-failed (list regexp)))
		   (forward-char)
		   t))))

      (error
       (goto-char start)
       (signal (car err) (cdr err))))

    ;;(message "c-syntactic-re-search-forward done %s" (or (match-end 0) (point)))

    (if found
	(progn
	  (goto-char (match-end 0))
	  (match-end 0))

      ;; Search failed.  Set point as appropriate.
      (cond
       ((eq noerror t)
	(goto-char start))
       ((not (memq noerror '(before-literal after-literal)))
	(goto-char bound))
       (t (setq state (parse-partial-sexp state-pos bound nil nil state))
	  (if (or (elt state 3) (elt state 4))
	      (if (eq noerror 'before-literal)
		  (goto-char (elt state 8))
		(parse-partial-sexp bound (point-max) nil nil
				    state 'syntax-table))
	    (goto-char bound))))

      nil)))
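
The failure handling at the end of the function can be exercised with a short sketch. It runs a search that cannot match with different NOERROR values and reports the return value together with where point ended up. The helper demo-failed-search-outcome and the symbols it returns are hypothetical.

```elisp
(require 'cc-mode)

(defun demo-failed-search-outcome (noerror)
  "Run a search that cannot match, using NOERROR, and describe the outcome.
Return a list (RESULT WHERE), where RESULT is the return value (or the
symbol `signaled' if `search-failed' was raised) and WHERE says where
point was left.  Hypothetical helper, for illustration only."
  (with-temp-buffer
    (c-mode)
    (insert "int x;\n")
    (goto-char 5)                       ; start just before "x"
    (let ((result (condition-case nil
                      (c-syntactic-re-search-forward "no_such_name" nil noerror)
                    (search-failed 'signaled))))
      (list result
            (cond ((= (point) 5) 'at-start)
                  ((= (point) (point-max)) 'at-bound)
                  (t 'elsewhere))))))

(demo-failed-search-outcome nil)    ; error signaled, point restored to start
(demo-failed-search-outcome t)      ; nil returned, point restored to start
(demo-failed-search-outcome 'move)  ; nil returned, point moved to BOUND
```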