File: mail-extr.el.html

The entry point of this code is

   mail-extract-address-components: (address &optional all)

   Given an RFC-822-or-later ADDRESS, extract name and address.
   Returns a list of the form (FULL-NAME CANONICAL-ADDRESS).
   If no name can be extracted, FULL-NAME will be nil.
   ADDRESS may be a string or a buffer. If it is a buffer, the visible
    (narrowed) portion of the buffer will be interpreted as the address.
    (This feature exists so that the clever caller might be able to avoid
    consing a string.)
   If ADDRESS contains more than one RFC-822-or-later address, only
    the first is returned.

   If ALL is non-nil, that means return info about all the addresses
    that are found in ADDRESS. The value is a list of elements of
    the form (FULL-NAME CANONICAL-ADDRESS), one per address.
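
As a minimal sketch of the calling conventions described above (the addresses and header text are made-up examples, not taken from the test-case collection), the entry point can be called with a string, with ALL non-nil, or with a narrowed buffer:

```elisp
(require 'mail-extr)

;; String argument: returns a list (FULL-NAME CANONICAL-ADDRESS).
(mail-extract-address-components "John Smith <jsmith@example.com>")

;; ALL non-nil: returns a list with one (FULL-NAME CANONICAL-ADDRESS)
;; element per address found in the string.
(mail-extract-address-components
 "John Smith <jsmith@example.com>, jane@example.org" t)

;; Buffer argument: only the visible (narrowed) portion is parsed,
;; so the header name is excluded by narrowing rather than by
;; building a new string.
(with-temp-buffer
  (insert "From: John Smith <jsmith@example.com>\n")
  (goto-char (point-min))
  (search-forward "From: ")
  (narrow-to-region (point) (line-end-position))
  (mail-extract-address-components (current-buffer)))
```

The FULL-NAME element is subject to the heuristics listed under Features, so it may be nil or differ from the text as written.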

This code is a more correct (and more heuristic) parser than the code in rfc822.el. Despite its size, it is fairly fast.

There are two main benefits:

1. Higher probability of getting the correct full name for a human than
   any other package we know of. (On the other hand, it will cheerfully
   mangle non-human names/comments.)
2. Address part is put in a canonical form.

The interface is not yet carved in stone; please give us suggestions.

We have an extensive test-case collection of funny addresses if you want to work with the code. Developing this code requires frequent testing to make sure you're not breaking functionality. The test cases aren't included because they are over 100K.

If you find an address that mail-extr fails on, please send it to the maintainer along with what you think the correct results should be. We do not consider it a bug if mail-extr mangles a comment that does not correspond to a real human full name, although we would prefer that mail-extr return such a comment as-is.

Features:

* Full name handling:

  * knows where full names can be found in an address.
  * avoids using empty comments and quoted text.
  * extracts full names from mailbox names.
  * recognizes common formats for comments after a full name.
  * puts a period and a space after each initial.
  * understands & referring to the mailbox name, capitalized.
  * strips name prefixes like "Prof.", etc.
  * understands what characters can occur in names (not just letters).
  * figures out middle initial from mailbox name.
  * removes funny nicknames.
  * keeps suffixes such as Jr., Sr., III, etc.
  * reorders "Last, First" type names.

* Address handling:

  * parses rfc822 quoted text, comments, and domain literals.
  * parses rfc822 multi-line headers.
  * does something reasonable with rfc822 GROUP addresses.
  * handles many rfc822 noncompliant and garbage addresses.
  * canonicalizes addresses (after stripping comments/phrases outside <>).
    * converts ! addresses into .UUCP and %-style addresses.
    * converts rfc822 ROUTE addresses to %-style addresses.
    * truncates %-style addresses at leftmost fully qualified domain name.
    * handles local relative precedence of ! vs. % and @ (untested).
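
One way to explore the features listed above is to run a few differently shaped addresses through the parser and inspect the results. The helper name `my-mail-extr-try` and the sample addresses are hypothetical, and no results are shown here because they depend on the heuristics and on the user options described later:

```elisp
(require 'mail-extr)

(defun my-mail-extr-try (addresses)
  "Hypothetical helper: return (INPUT . RESULT) pairs for ADDRESSES.
RESULT is whatever `mail-extract-address-components' returns."
  (mapcar (lambda (address)
            (cons address (mail-extract-address-components address)))
          addresses))

;; A name in a trailing comment, a quoted "Last, First" phrase,
;; and a !-style (UUCP) address.
(my-mail-extr-try
 '("jsmith@example.com (John Q. Smith)"
   "\"Smith, John\" <jsmith@example.com>"
   "host!user"))
```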

It does almost no string creation. It primarily uses the built-in parsing routines with the appropriate syntax tables. This should result in greater speed.
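
To illustrate the general technique (this is a sketch, not the syntax tables mail-extr actually defines), a syntax table that treats parentheses as paired delimiters lets the built-in `forward-sexp` skip a nested RFC 822 comment in one step, without building intermediate strings. The function name `my-skip-leading-comment` is hypothetical:

```elisp
(defun my-skip-leading-comment (text)
  "Hypothetical helper: return TEXT with one leading (...) group removed.
Uses a syntax table and `forward-sexp' rather than string operations."
  (let ((table (make-syntax-table)))
    ;; Treat ( and ) as a matching open/close pair.
    (modify-syntax-entry ?\( "()" table)
    (modify-syntax-entry ?\) ")(" table)
    (with-temp-buffer
      (insert text)
      (goto-char (point-min))
      (with-syntax-table table
        ;; Jump over the balanced, possibly nested, parenthesized group.
        (forward-sexp 1))
      (skip-chars-forward " \t")
      (buffer-substring (point) (point-max)))))

(my-skip-leading-comment "(a (nested) comment) user@example.com")
;; → "user@example.com"
```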

TODO:

* handle all test cases. (This will take forever.)
* software to pick the correct header to use (e.g., "Senders-Name:").
* multiple addresses in the "From:" header (almost all of the necessary
  code is there).
* flag to not treat , as an address separator. (This is useful when
  there is a "From:" header but no "Sender:" header, because then there
  is only allowed to be one address.)
* mailbox name does not necessarily contain full name.
* fixing capitalization when it's all upper or lowercase. (Hard!)
* some of the domain literal handling is missing. (But I've never even
  seen one of these in a mail address, so maybe no big deal.)
* arrange to have syntax tables byte-compiled.
* speed hacks.
* delete unused variables.
* arrange for testing with different relative precedences of ! vs. @
  and %.
* insert documentation strings!
* handle X.400-gatewayed addresses according to RFC 1148.

Defined variables (7)

mail-extr-@-binds-tighter-than-!
    Whether the local mail transport agent looks at ! before @.
mail-extr-disable-voodoo
    If it is a regexp, names matching it will never be modified.
mail-extr-full-name-prefixes
    Matches prefixes to the full name that identify a person’s position.
mail-extr-guess-middle-initial
    Whether to try to guess middle initial from mail address.
mail-extr-ignore-realname-equals-mailbox-name
    Whether to ignore a name that is equal to the mailbox name.
mail-extr-ignore-single-names
    Whether to ignore a name that is just a single word.
mail-extr-mangle-uucp
    Whether to throw away information in UUCP addresses.
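
These variables can be bound around a single call to change the name heuristics temporarily. The following is a minimal sketch; the helper name `my-extract-conservatively`, the regexp, and the sample address are assumptions chosen for illustration:

```elisp
(require 'mail-extr)

(defun my-extract-conservatively (address)
  "Hypothetical helper: parse ADDRESS with fewer name heuristics."
  (let ((mail-extr-guess-middle-initial nil)   ; do not guess a middle initial
        (mail-extr-ignore-single-names nil)    ; keep one-word names
        ;; Names matching this regexp are never modified (here, names
        ;; consisting only of capital letters; the regexp is an example).
        (mail-extr-disable-voodoo "\\`[A-Z]+\\'"))
    (mail-extract-address-components address)))

(my-extract-conservatively "IBM <info@example.com>")
```

Binding with `let` keeps the change local to the call, so the user's customized values are left untouched.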

Defined functions (9)

mail-extr-demarkerize (MARKER)
mail-extr-markerize (POS)
mail-extr-nuke-char-at (POS)
mail-extr-nuke-outside-range (LIST-SYMBOL BEG-SYMBOL END-SYMBOL &optional NO-REPLACE)
mail-extr-safe-move-sexp (ARG)
mail-extr-undo-backslash-quoting (BEG END)
mail-extr-voodoo (MBOX-BEG MBOX-END CANONICALIZATION-BUFFER)
mail-extract-address-components (ADDRESS &optional ALL)
what-domain (DOMAIN)

Defined faces (0)