File: lex.el.html
This file handles the creation of lexical analyzers for different languages in Emacs Lisp. The purpose of a lexical analyzer is to convert a buffer into a list of lexical tokens. Each token contains the token class (such as 'number, 'symbol, 'IF, etc.) and the location in the buffer where it was found. Optionally, a token may also contain a string representing what is at the designated buffer location.
Tokens are pushed onto a token stream, which is simply a list of all the lexical tokens produced from the analyzed region. The token stream is then handed to the grammar, which parses the file.
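As an illustrative sketch (the exact token shape depends on the analyzers used), a lexical token is conventionally a cons of the token class and the bounds it covers; the positions below are hypothetical:

```elisp
;; Sketch of the conventional token shape, (CLASS START . END).
(let ((tok '(IF 1 . 3)))        ; class 'IF, covering buffer positions 1-3
  (list (car tok)               ; token class  => IF
        (cadr tok)              ; start        => 1
        (cddr tok)))            ; end          => 3
```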
; How it works
Each analyzer specifies a condition and forms. These conditions
and forms are assembled into a function by define-lex that does
the lexical analysis.
In the lexical analyzer created with define-lex, each condition
is tested for a given point. When the condition is true, the forms
run.
The forms can push a lexical token onto the token stream. The forms must also move the current analyzer point. If the analyzer point is moved without pushing a token, then the matched syntax is effectively ignored, or skipped.
Thus, starting at the beginning of a region to be analyzed, each condition is tested. One of them will match, a lexical token may be pushed, and the point is moved to the end of the matched text. At the new position, the process repeats until the end of the specified region is reached.
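The loop described above can be pictured with the following simplified sketch. It is not the actual expansion produced by define-lex, and the function name and the CONDITION-N/FORMS-N placeholders are hypothetical:

```elisp
;; Schematic only: each analyzer contributes one `cond' clause.
(defun my-hypothetical-lexer (start end)
  (goto-char start)
  (let ((semantic-lex-token-stream nil))
    (while (< (point) end)
      (cond (CONDITION-1 FORMS-1)    ; first matching analyzer wins
            (CONDITION-2 FORMS-2)    ; FORMS must advance point
            ;; ... one clause per analyzer, tested in order ...
            ))
    (nreverse semantic-lex-token-stream)))
```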
; How to use semantic-lex
To create a lexer for a language, use the define-lex macro.
The define-lex macro accepts a list of lexical analyzers. Each
analyzer is created with define-lex-analyzer, or one of the
derivative macros. A single analyzer defines a regular expression
to match text in a buffer, and a short segment of code to create
one lexical token.
Each analyzer has a NAME, DOC, a CONDITION, and possibly some
FORMS. The NAME is the name used in define-lex. The DOC
describes what the analyzer should do.
The CONDITION evaluates the text at the current point in the current buffer. If CONDITION is true, then the FORMS will be executed.
The purpose of the FORMS is to push new lexical tokens onto the list of tokens for the current buffer, and to move point after the matched text.
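For example, a lexer could be assembled from the built-in analyzers documented later in this file. The name my-simple-lexer is hypothetical, and the ordering shown is one plausible choice, since earlier analyzers take precedence:

```elisp
(define-lex my-simple-lexer
  "Hypothetical lexer built from standard Semantic analyzers."
  semantic-lex-ignore-whitespace      ; skip spaces and tabs
  semantic-lex-ignore-newline         ; skip newlines
  semantic-lex-ignore-comments        ; skip comments
  semantic-lex-number                 ; push 'number tokens
  semantic-lex-symbol-or-keyword      ; push 'symbol or keyword tokens
  semantic-lex-string                 ; push 'string tokens
  semantic-lex-paren-or-list          ; handle open parens/lists
  semantic-lex-close-paren            ; handle close parens
  semantic-lex-punctuation            ; push 'punctuation tokens
  semantic-lex-default-action)        ; fallback when nothing else matches
```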
Some macros for creating one analyzer are:
define-lex-analyzer - A generic analyzer associating any style of
condition to forms.
define-lex-regex-analyzer - Matches a regular expression.
define-lex-simple-regex-analyzer - Matches a regular expression,
and pushes the match.
define-lex-block-analyzer - Matches list syntax, and
handles open/close delimiters.
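As a sketch of the second macro, assuming the argument order NAME DOC REGEXP TOKEN-SYMBOL (the analyzer name and token class below are made up for illustration):

```elisp
;; Hypothetical analyzer: match dotted version strings like "1.2.3"
;; and push one 'VERSION token covering the match.
(define-lex-simple-regex-analyzer my-lex-version
  "Match dotted version numbers and push a VERSION token."
  "[0-9]+\\.[0-9]+\\.[0-9]+" 'VERSION)
```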
These macros are used by the grammar compiler when lexical
information is specified in a grammar:
define-lex- * -type-analyzer - Matches syntax specified in
a grammar, and pushes one token for it. The * would
be sexp for things like lists or strings, and
string for things that need to match some special
string, such as "\\\\." where a literal match is needed.
; Lexical Tables
There are tables of different symbols managed in semantic-lex.el. They are:
Lexical keyword table - A Table of symbols declared in a grammar
file with the %keyword declaration.
Keywords are used by semantic-lex-symbol-or-keyword
to create lexical tokens based on the keyword.
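For instance, given a grammar containing a %keyword declaration, the keyword table can be queried with semantic-lex-keyword-p. This is a sketch; it assumes the table for the current buffer was populated by that grammar:

```elisp
;; In the grammar file:
;;   %keyword IF "if"
;; Then, in a buffer whose lexer uses that grammar's keyword table:
(semantic-lex-keyword-p "if")   ; non-nil: "if" maps to the class IF
(semantic-lex-keyword-p "elif") ; nil, unless also declared
```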
Lexical type table - A table of symbols declared in a grammar
file with the %type declaration.
The grammar compiler uses the type table to create new
lexical analyzers. These analyzers are then used when
a new lexical analyzer is made for a language.
; Lexical Types
A lexical type defines a kind of lexical analyzer that will be automatically generated from a grammar file based on some predetermined attributes. For now, these two attributes are recognized:
* matchdatatype: defines the kind of lexical analyzer. That is:
- regexp: define a regexp analyzer (see
define-lex-regex-type-analyzer)
- string: define a string analyzer (see
define-lex-string-type-analyzer)
- block: define a block type analyzer (see
define-lex-block-type-analyzer)
- sexp: define a sexp analyzer (see
define-lex-sexp-type-analyzer)
- keyword: define a keyword analyzer (see
define-lex-keyword-type-analyzer)
* syntax: defines the syntax that matches a syntactic
expression. When the syntax is matched, the corresponding type
analyzer is entered and the resulting match data is
interpreted based on the kind of analyzer (see matchdatatype
above).
The following lexical types are predefined:
+-------------+---------------+--------------------------------+
| type | matchdatatype | syntax |
+-------------+---------------+--------------------------------+
| punctuation | string        | "\\(\\s.\\|\\s$\\|\\s'\\)+"    |
| keyword     | keyword       | "\\(\\sw\\|\\s_\\)+"           |
| symbol      | regexp        | "\\(\\sw\\|\\s_\\)+"           |
| string      | sexp          | "\\s\""                        |
| number      | regexp        | semantic-lex-number-expression |
| block | block | "\\s(\\|\\s)" |
+-------------+---------------+--------------------------------+
In a grammar you must use a %type expression to automatically generate the corresponding analyzers of that type.
Here is an example that auto-generates punctuation analyzers, with 'matchdatatype and 'syntax predefined (see the table above):
%type <punctuation> ;; will auto-generate this kind of analyzer
It is equivalent to writing:
%type <punctuation> syntax "\\(\\s.\\|\\s$\\|\\s'\\)+" matchdatatype string
;; Some punctuation tokens based on the type defined above
%token <punctuation> NOT "!"
%token <punctuation> NOTEQ "!="
%token <punctuation> MOD "%"
%token <punctuation> MODEQ "%="
; On the Semantic 1.x lexer
In semantic 1.x, the lexical analyzer was an all-purpose routine. To boost efficiency, the analyzer is now a series of routines that are assembled at build time into a single routine. This eliminates unneeded if statements, speeding up the lexer.
Defined variables (48)
semantic-flex-depth | Default flexing depth. |
semantic-flex-enable-bol | When flexing, report beginning of lines as syntactic elements. |
semantic-flex-enable-newlines | When flexing, report newlines as syntactic elements. |
semantic-flex-enable-whitespace | When flexing, report whitespace as syntactic elements. |
semantic-flex-extensions | Buffer local extensions to the lexical analyzer. |
semantic-flex-keywords-obarray | Buffer local keyword obarray for the lexical analyzer. |
semantic-flex-syntax-modifications | Changes to the syntax table for this buffer. |
semantic-flex-tokens | An alist of semantic token types. |
semantic-flex-unterminated-syntax-end-function | Function called when unterminated syntax is encountered. |
semantic-ignore-comments | Default comment handling. |
semantic-lex-analysis-bounds | The bounds of the current analysis. |
semantic-lex-analyzer | The lexical analyzer used for a given buffer. |
semantic-lex-beginning-of-line | Detect and create a beginning of line token (BOL). |
semantic-lex-block-streams | Streams of tokens inside collapsed blocks. |
semantic-lex-charquote | Detect and create charquote tokens. |
semantic-lex-close-paren | Detect and create a close parenthesis token. |
semantic-lex-comment-regex | Regular expression for identifying comment start during lexical analysis. |
semantic-lex-comments | Detect and create a comment token. |
semantic-lex-comments-as-whitespace | Detect comments and create a whitespace token. |
semantic-lex-current-depth | The current depth as tracked through lexical functions. |
semantic-lex-debug | When non-nil, debug the local lexical analyzer. |
semantic-lex-debug-analyzers | Non-nil means to debug analyzers with syntax protection. |
semantic-lex-default-action | The default action when no other lexical actions match text. |
semantic-lex-depth | Default lexing depth. |
semantic-lex-end-point | The end point as tracked through lexical functions. |
semantic-lex-ignore-comments | Detect and create a comment token. |
semantic-lex-ignore-newline | Detect and ignore newline tokens. |
semantic-lex-ignore-whitespace | Detect and skip over whitespace tokens. |
semantic-lex-maximum-depth | The maximum depth of parenthesis as tracked through lexical functions. |
semantic-lex-newline | Detect and create newline tokens. |
semantic-lex-newline-as-whitespace | Detect and create newline tokens. |
semantic-lex-number | Detect and create number tokens. |
semantic-lex-number-expression | Regular expression for matching a number. |
semantic-lex-open-paren | Detect and create an open parenthesis token. |
semantic-lex-paren-or-list | Detect open parenthesis. |
semantic-lex-punctuation | Detect and create punctuation tokens. |
semantic-lex-punctuation-type | Detect and create a punctuation type token. |
semantic-lex-reset-functions | Abnormal hook used by major-modes to reset lexical analyzers. |
semantic-lex-string | Detect and create a string token. |
semantic-lex-symbol-or-keyword | Detect and create symbol and keyword tokens. |
semantic-lex-syntax-modifications | Changes to the syntax table for this buffer. |
semantic-lex-syntax-table | Syntax table used by lexical analysis. |
semantic-lex-token-stream | The current token stream we are collecting. |
semantic-lex-tokens | An alist of semantic token types. |
semantic-lex-types-obarray | Buffer local types obarray for the lexical analyzer. |
semantic-lex-unterminated-syntax-end-function | Function called when unterminated syntax is encountered. |
semantic-lex-whitespace | Detect and create whitespace tokens. |
semantic-number-expression | See variable ‘semantic-lex-number-expression’. |