File: spam-stat.el.html
This implements spam analysis as described by Paul Graham in "A Plan for Spam". The analysis is based on the statistical distribution of words in your spam and non-spam mails. This information is kept in a hash table so that it can be consulted when new mail is examined. Before you begin, you therefore need a large collection of mails (Graham used 4000 non-spam and 4000 spam mails for his experiments).
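As a rough illustration of the statistics involved, the following sketch computes a per-word spam probability and a combined probability along the lines of Graham's essay. The function names prefixed with my- are hypothetical and are not part of spam-stat.el; the constants (doubling of good counts, the minimum of 5 occurrences, clamping to 0.01 and 0.99) follow the essay and may differ from what spam-stat itself does.

```elisp
;; Sketch of the formulas from "A Plan for Spam".
;; All `my-' names are hypothetical; this is not spam-stat's own code.

(defun my-graham-word-score (good bad ngood nbad)
  "Return the spam probability of a word, or nil if the word is too rare.
GOOD and BAD are the word's occurrence counts in non-spam and spam
mail.  NGOOD and NBAD are the numbers of non-spam and spam mails and
are assumed to be positive."
  (let ((g (* 2.0 good))   ; good counts are doubled to reduce false positives
        (b (float bad)))
    (unless (< (+ g b) 5)  ; ignore words seen fewer than 5 times
      (max 0.01
           (min 0.99
                (/ (min 1.0 (/ b nbad))
                   (+ (min 1.0 (/ g ngood))
                      (min 1.0 (/ b nbad)))))))))

(defun my-graham-combine (probs)
  "Combine the list of word probabilities PROBS into one spam probability."
  (let ((prod (apply #'* probs))
        (inv (apply #'* (mapcar (lambda (p) (- 1 p)) probs))))
    (/ prod (+ prod inv))))
```

For example, (my-graham-word-score 1 20 4000 4000) gives roughly 0.91, and (my-graham-combine '(0.99 0.99 0.01)) gives a value close to 0.99. In the essay only the 15 words whose probability lies furthest from 0.5 are combined.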
The main interface to spam-stat consists of the following functions:
spam-stat-buffer-is-spam -- called in a buffer, that buffer is considered to be a new spam mail; use this for new mail that has not been processed before
spam-stat-buffer-is-non-spam -- called in a buffer, that buffer is considered to be a new non-spam mail; use this for new mail that has not been processed before
spam-stat-buffer-change-to-spam -- called in a buffer, that buffer is no longer considered to be normal mail but spam; use this to change the status of a mail that has already been processed as non-spam
spam-stat-buffer-change-to-non-spam -- called in a buffer, that buffer is no longer considered to be spam but normal mail; use this to change the status of a mail that has already been processed as spam
spam-stat-save -- save the hash table to a file; the filename used is stored in the variable spam-stat-file
spam-stat-load -- load the hash table from a file; the filename used is stored in the variable spam-stat-file
spam-stat-score-word -- return the spam score for a word
spam-stat-score-buffer -- return the spam score for a buffer
spam-stat-split-fancy -- for fancy mail splitting; add the rule (: spam-stat-split-fancy) to the variable nnmail-split-fancy
This requires the following in your ~/.gnus file:
(require 'spam-stat)
(spam-stat-load)
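The following is a minimal sketch of how the functions listed above might be combined in a ~/.gnus file. The group name "mail.misc" and the two my- helper functions are assumptions made for illustration only; adapt the splitting rule to your own setup.

```elisp
;; A ~/.gnus sketch.  Group names and helper names are placeholders.
(require 'spam-stat)
(spam-stat-load)

;; Route incoming mail through the fancy splitting rules.
(setq nnmail-split-methods 'nnmail-split-fancy)
(setq nnmail-split-fancy
      '(| (: spam-stat-split-fancy)   ; may select the spam group
          "mail.misc"))               ; hypothetical fallback group

;; Hypothetical helper: learn the mail in FILE as new spam, then save.
(defun my-spam-stat-learn-spam-file (file)
  "Add the mail in FILE to the dictionary as new spam."
  (with-temp-buffer
    (insert-file-contents file)
    (spam-stat-buffer-is-spam)
    (spam-stat-save)))

;; Hypothetical helper: return the spam score of the mail in FILE.
(defun my-spam-stat-score-file (file)
  "Return the spam score for the mail in FILE."
  (with-temp-buffer
    (insert-file-contents file)
    (spam-stat-score-buffer)))
```

Working in a temporary buffer keeps the training and scoring calls away from any article buffer you have open, since these functions operate on the current buffer.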
Defined variables (19)
Variable | Description
spam-stat | Hash table used to store the statistics.
spam-stat-buffer | Buffer to use for scoring while splitting.
spam-stat-buffer-name | Name of the ‘spam-stat-buffer’.
spam-stat-coding-system | Coding system used for ‘spam-stat-file’.
spam-stat-dirty | Whether the spam-stat database needs saving.
spam-stat-file | File used to save and load the dictionary.
spam-stat-last-saved-at | Time stamp of last change of ‘spam-stat-file’ on this run.
spam-stat-max-buffer-length | Only the beginning of buffers will be analyzed.
spam-stat-max-word-length | Only words shorter than this will be considered.
spam-stat-nbad | The number of bad mails in the dictionary.
spam-stat-ngood | The number of good mails in the dictionary.
spam-stat-process-directory-age | Maximum age of files to be processed in directory, in days.
spam-stat-score-buffer-user-functions | List of additional scoring functions.
spam-stat-score-data | Raw data used in the last run of ‘spam-stat-score-buffer’.
spam-stat-split-fancy-spam-group | Name of the group where spam should be stored.
spam-stat-split-fancy-spam-threshold | Spam score threshold in ‘spam-stat-split-fancy’.
spam-stat-syntax-table | Syntax table used when processing mails for statistical analysis.
spam-stat-unknown-word-score | The score to use for unknown words.
spam-stat-washing-hook | Hook applied to each message before analysis.
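Several of these variables are meant to be set by the user. The fragment below is a sketch of such a configuration; the values shown are illustrative assumptions, not recommendations, and should be checked against the defaults of your spam-stat version.

```elisp
;; Illustrative settings only; every value here is an assumption.
(require 'spam-stat)

(setq spam-stat-file "~/.spam-stat.el"              ; dictionary file
      spam-stat-split-fancy-spam-group "mail.spam"  ; where spam is filed
      spam-stat-split-fancy-spam-threshold 0.9      ; score above which mail is spam
      spam-stat-unknown-word-score 0.2              ; score for unseen words
      spam-stat-max-word-length 15                  ; skip longer words
      spam-stat-max-buffer-length 10240             ; analyze only the beginning
      spam-stat-process-directory-age 90)           ; days

;; Load the dictionary after `spam-stat-file' has been set.
(spam-stat-load)
```

Setting spam-stat-file before calling spam-stat-load matters because the load function reads the filename from that variable.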