Digression into C

The copy-region-as-kill function (see copy-region-as-kill) uses the filter-buffer-substring function, which in turn uses the delete-and-extract-region function. The latter removes the contents of a region from the buffer and returns them; unless the caller saves what is returned, you cannot get them back.

Unlike the other code discussed here, the delete-and-extract-region function is not written in Emacs Lisp; it is written in C and is one of the primitives of the GNU Emacs system. Since it is very simple, I will digress briefly from Lisp and describe it here.

Like many of the other Emacs primitives, delete-and-extract-region is written as an instance of a C macro, a macro being a template for code. The complete macro looks like this:

c
DEFUN ("delete-and-extract-region", Fdelete_and_extract_region,
       Sdelete_and_extract_region, 2, 2, 0,
       doc: /* Delete the text between START and END and return it.  */)
  (Lisp_Object start, Lisp_Object end)
{
  validate_region (&start, &end);
  if (XFIXNUM (start) == XFIXNUM (end))
    return empty_unibyte_string;
  return del_range_1 (XFIXNUM (start), XFIXNUM (end), 1, 1);
}

Without going into the details of the macro writing process, let me point out that this macro starts with the word DEFUN. The word DEFUN was chosen since the code serves the same purpose as defun does in Lisp. (The DEFUN C macro is defined in emacs/src/lisp.h.)

The word DEFUN is followed by seven parts inside of parentheses:

  • The first part is the name given to the function in Lisp, delete-and-extract-region.

  • The second part is the name of the function in C, Fdelete_and_extract_region. By convention, it starts with ‘F’. Since C does not use hyphens in names, underscores are used instead.

  • The third part is the name for the C constant structure that records information on this function for internal use. It is the name of the function in C but begins with an ‘S’ instead of an ‘F’.

  • The fourth and fifth parts specify the minimum and maximum number of arguments the function can have. This function demands exactly 2 arguments.

  • The sixth part is nearly like the argument that follows the interactive declaration in a function written in Lisp: a letter followed, perhaps, by a prompt. The only difference from Lisp arises when there is no interactive specification to give. Then you write a 0 (a null pointer in C, standing in for a null string), as in this macro.

    If you were to specify arguments, you would place them between quotation marks. The C macro for goto-char includes "NGoto char: " in this position to indicate that the function expects a numeric prefix argument or, failing that, a number read with the given prompt; in this case, the number is a location in a buffer.

  • The seventh part is a documentation string, just like the one for a function written in Emacs Lisp. This is written as a C comment. (When you build Emacs, the program lib-src/make-docfile extracts these comments and uses them to make the documentation.)
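
To make the seven parts more concrete, here is a toy imitation of the idea in plain C. It is only a sketch under stated assumptions: TOY_DEFUN, toy_object, struct toy_subr, and toy_call2 are names invented for this illustration, the documentation is passed as an ordinary string rather than as a comment, and the macro handles only two-argument functions. The real DEFUN in lisp.h is more elaborate.

```c
#include <assert.h>

/* Toy stand-in for Lisp_Object (an assumption of this sketch).  */
typedef long toy_object;

/* Toy version of the record that the third DEFUN part names.  */
struct toy_subr
{
  const char *lisp_name;   /* part 1: the name seen from Lisp */
  int min_args;            /* part 4: minimum number of arguments */
  int max_args;            /* part 5: maximum number of arguments */
  const char *intspec;     /* part 6: interactive spec, or 0 */
};

/* Toy DEFUN.  It declares the C function (part 2), defines the record
   (part 3), and then begins the function definition, so the parameter
   list and body can follow the macro call.  The doc argument (part 7)
   is accepted but unused here.  */
#define TOY_DEFUN(lname, fnname, sname, minargs, maxargs, intspec, doc) \
  toy_object fnname (toy_object, toy_object);                           \
  const struct toy_subr sname = { lname, minargs, maxargs, intspec };   \
  toy_object fnname

TOY_DEFUN ("toy-add", Ftoy_add, Stoy_add, 2, 2, 0,
           "Return the sum of A and B.")
  (toy_object a, toy_object b)
{
  return a + b;
}

/* Call a two-argument primitive after checking its recorded arity.  */
toy_object
toy_call2 (const struct toy_subr *subr,
           toy_object (*fn) (toy_object, toy_object),
           toy_object a, toy_object b)
{
  assert (subr->min_args <= 2 && 2 <= subr->max_args);
  return fn (a, b);
}
```

With these definitions, toy_call2 (&Stoy_add, Ftoy_add, 2, 3) checks the recorded argument counts and then calls the function. The point of the sketch is only that one macro call yields both a C function whose name starts with ‘F’ and a record whose name starts with ‘S’.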

In a C macro, the formal parameters come next, with a statement of what kind of object they are, followed by the body of the macro. For delete-and-extract-region the body consists of the following four lines:

c
validate_region (&start, &end);
if (XFIXNUM (start) == XFIXNUM (end))
  return empty_unibyte_string;
return del_range_1 (XFIXNUM (start), XFIXNUM (end), 1, 1);

The validate_region function checks whether the values passed as the beginning and end of the region are the proper type and are within range. If the beginning and end positions are the same, the function returns an empty string.
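
Note that validate_region receives &start and &end, that is, pointers to the two values, which lets it change them for the caller. The following is a minimal sketch of that shape, not the Emacs source. The names toy_validate_region, TOY_BEG, and TOY_END are hypothetical, positions are plain long integers, and the function returns false where Emacs would signal an error. Putting the two positions in ascending order is an assumption of this sketch about what such a check usefully does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical buffer bounds for this sketch: valid positions run
   from TOY_BEG to TOY_END inclusive.  */
enum { TOY_BEG = 1, TOY_END = 100 };

/* Check that both positions are in range and put them in ascending
   order.  The arguments are pointers so that the caller's values can
   be modified.  Returns false if either position is out of range.  */
bool
toy_validate_region (long *start, long *end)
{
  if (*start < TOY_BEG || *start > TOY_END
      || *end < TOY_BEG || *end > TOY_END)
    return false;
  if (*start > *end)
    {
      long tmp = *start;
      *start = *end;
      *end = tmp;
    }
  return true;
}

/* Usage: a reversed but in-range pair is accepted and reordered.  */
void
toy_validate_region_demo (void)
{
  long start = 40, end = 10;
  assert (toy_validate_region (&start, &end));
  assert (start == 10 && end == 40);
}
```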

The del_range_1 function actually deletes the text. It is a complex function we will not look into. It updates the buffer and does other things. However, it is worth looking at the first two arguments passed to del_range_1. These are XFIXNUM (start) and XFIXNUM (end).

As far as the C language is concerned, start and end are two opaque values that mark the beginning and end of the region to be deleted. More precisely, and requiring more expert knowledge to understand, the two values are of type Lisp_Object, which might be a C pointer, a C integer, or a C struct; C code ordinarily should not care how Lisp_Object is implemented.

Lisp_Object widths depend on the machine, and are typically 32 or 64 bits. A few of the bits are used to specify the type of information; the remaining bits are used as content.

XFIXNUM is a C macro that extracts the relevant integer from the longer collection of bits; the type bits are discarded.
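
The idea of packing type bits and content into one word can be sketched in a few lines of C. The layout below is invented for illustration and is not Emacs's actual one: it assumes the low two bits hold the type and that the value 1 in those bits means an integer. All of the toy_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy object: one machine word holding tag bits and content.  */
typedef intptr_t toy_object;

/* Assumed layout: 2 low tag bits; tag value 1 marks an integer.  */
enum { TOY_TAG_BITS = 2, TOY_TAG_MASK = 3, TOY_TAG_FIXNUM = 1 };

/* Pack the integer N together with the integer tag.  Multiplying by 4
   is the same as shifting left by two bits, but stays well defined
   for negative values.  */
toy_object
toy_make_fixnum (intptr_t n)
{
  assert (n >= INTPTR_MIN / 4 && n <= INTPTR_MAX / 4);
  return n * (1 << TOY_TAG_BITS) + TOY_TAG_FIXNUM;
}

/* Return nonzero if OBJ carries the integer tag.  */
int
toy_fixnump (toy_object obj)
{
  return (obj & TOY_TAG_MASK) == TOY_TAG_FIXNUM;
}

/* Toy counterpart of XFIXNUM: discard the tag, keep the integer.  */
intptr_t
toy_xfixnum (toy_object obj)
{
  assert (toy_fixnump (obj));
  return (obj - TOY_TAG_FIXNUM) / (1 << TOY_TAG_BITS);
}

/* Usage: the stored word differs from the integer it carries.  */
void
toy_fixnum_demo (void)
{
  toy_object start = toy_make_fixnum (5);
  assert (start == 21);                /* 5 * 4 + tag 1 */
  assert (toy_xfixnum (start) == 5);
  assert (toy_xfixnum (toy_make_fixnum (-2)) == -2);
}
```

In this toy layout the integer 5 is stored as the word 21, which is why code that wants the number itself must go through the extraction step rather than use the object directly.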

The command in delete-and-extract-region looks like this:

c
del_range_1 (XFIXNUM (start), XFIXNUM (end), 1, 1);

It deletes the region between the beginning position, start, and the ending position, end, and returns the deleted text.
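
The documentation string in the macro says that the text is deleted and returned. As a rough sketch of that behavior, here is the same operation on an ordinary C string. This is not how del_range_1 works and says nothing about how Emacs stores buffer text; toy_delete_and_extract is a hypothetical name, and the only things borrowed from Emacs are that positions count from 1 and that the region stops just before end.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Remove the characters between the 1-based positions START and END
   from TEXT in place, and return them as a newly allocated string
   that the caller must free.  The character at END itself is kept.
   Returns NULL on a bad range or if allocation fails.  */
char *
toy_delete_and_extract (char *text, size_t start, size_t end)
{
  size_t len = strlen (text);
  if (start < 1 || end < start || end > len + 1)
    return NULL;

  size_t n = end - start;
  char *removed = malloc (n + 1);
  if (!removed)
    return NULL;
  memcpy (removed, text + start - 1, n);
  removed[n] = '\0';

  /* Close the gap, moving the terminating NUL as well.  */
  memmove (text + start - 1, text + end - 1, len - (end - 1) + 1);
  return removed;
}

/* Usage: extract "hello " and leave "world" behind.  */
void
toy_delete_and_extract_demo (void)
{
  char text[] = "hello world";
  char *removed = toy_delete_and_extract (text, 1, 7);
  assert (removed && strcmp (removed, "hello ") == 0);
  assert (strcmp (text, "world") == 0);
  free (removed);
}
```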

From the point of view of the person writing Lisp, Emacs is all very simple; but hidden underneath is a great deal of complexity to make it all work.