File: spam-stat.el.html

This implements spam analysis according to Paul Graham's "A Plan for Spam". The analysis rests on the statistical distribution of words in your spam and non-spam mail. This information is kept in a hash table so that it can be consulted when new mail is examined. Before you begin, you therefore need a large corpus of mail (Graham uses 4000 non-spam and 4000 spam mails for his experiments).
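Building the initial dictionary from existing mail might look like the following sketch. The directory paths are assumptions (they presume one message per file, as in an nnml-style layout); the functions are the ones listed under "Defined functions" on this page.

```elisp
(require 'spam-stat)

;; Assumed locations of already-sorted mail; adjust to your own setup.
(spam-stat-process-spam-directory "~/Mail/mail/spam")
(spam-stat-process-non-spam-directory "~/Mail/mail/misc")

;; Optionally shrink the table, then write it to `spam-stat-file'.
(spam-stat-reduce-size)
(spam-stat-save)
```

Evaluating these forms once, for example from the *scratch* buffer, should be enough; later sessions only need to load the saved file.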

The main interface to spam-stat consists of the following functions:

spam-stat-buffer-is-spam -- called in a buffer, that buffer is considered to be a new spam mail; use this for new mail that has not been processed before

spam-stat-buffer-is-non-spam -- called in a buffer, that buffer is considered to be a new non-spam mail; use this for new mail that has not been processed before

spam-stat-buffer-change-to-spam -- called in a buffer, that buffer is no longer considered to be normal mail but spam; use this to change the status of a mail that has already been processed as non-spam

spam-stat-buffer-change-to-non-spam -- called in a buffer, that buffer is no longer considered to be spam but normal mail; use this to change the status of a mail that has already been processed as spam

spam-stat-save -- save the hash table to the file; the filename used is stored in the variable spam-stat-file

spam-stat-load -- load the hash table from a file; the filename used is stored in the variable spam-stat-file

spam-stat-score-word -- return the spam score for a word

spam-stat-score-buffer -- return the spam score for a buffer

spam-stat-split-fancy -- for fancy mail splitting; add the rule (: spam-stat-split-fancy) to the variable nnmail-split-fancy

This requires the following in your ~/.gnus file:

(require 'spam-stat)
(spam-stat-load)
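
A fuller ~/.gnus fragment that also wires in the splitting rule could look like the sketch below. The group names "mail.spam" and "mail.misc" are assumptions chosen for illustration; the rest uses only the variables and functions named on this page plus nnmail-split-methods from Gnus.

```elisp
(require 'spam-stat)
(spam-stat-load)

(setq nnmail-split-methods 'nnmail-split-fancy
      ;; Assumed name of the group that receives detected spam.
      spam-stat-split-fancy-spam-group "mail.spam"
      ;; Try the spam rule first; everything else falls through
      ;; to an assumed default group.
      nnmail-split-fancy '(| (: spam-stat-split-fancy)
                             "mail.misc"))
```

Putting the spam rule first means a message is scored before any other split rule can claim it.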

Defined variables (19)

spam-stat -- Hash table used to store the statistics.
spam-stat-buffer -- Buffer to use for scoring while splitting.
spam-stat-buffer-name -- Name of the ‘spam-stat-buffer’.
spam-stat-coding-system -- Coding system used for ‘spam-stat-file’.
spam-stat-dirty -- Whether the spam-stat database needs saving.
spam-stat-file -- File used to save and load the dictionary.
spam-stat-last-saved-at -- Time stamp of last change of ‘spam-stat-file’ on this run.
spam-stat-max-buffer-length -- Only the beginning of buffers will be analyzed.
spam-stat-max-word-length -- Only words shorter than this will be considered.
spam-stat-nbad -- The number of bad mails in the dictionary.
spam-stat-ngood -- The number of good mails in the dictionary.
spam-stat-process-directory-age -- Maximum age of files to be processed in directory, in days.
spam-stat-score-buffer-user-functions -- List of additional scoring functions.
spam-stat-score-data -- Raw data used in the last run of ‘spam-stat-score-buffer’.
spam-stat-split-fancy-spam-group -- Name of the group where spam should be stored.
spam-stat-split-fancy-spam-threshold -- Spam score threshold in spam-stat-split-fancy.
spam-stat-syntax-table -- Syntax table used when processing mails for statistical analysis.
spam-stat-unknown-word-score -- The score to use for unknown words.
spam-stat-washing-hook -- Hook applied to each message before analysis.
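
Several of the variables above (spam-stat-ngood, spam-stat-nbad, spam-stat-unknown-word-score) feed the per-word and per-buffer scores. As a rough illustration, the formulas Graham gives in "A Plan for Spam" can be sketched as follows. The function names are hypothetical, and this is not necessarily how spam-stat.el itself computes its scores.

```elisp
;; Hypothetical helpers illustrating the formulas from "A Plan for Spam";
;; these are not functions defined by spam-stat.el.

(defun my-graham-word-score (good bad ngood nbad)
  "Return the spam probability of a single word.
GOOD and BAD are the word's occurrence counts in non-spam and
spam mail; NGOOD and NBAD are the numbers of non-spam and spam
mails in the corpus.  Assumes GOOD + BAD is greater than zero."
  (let* ((g (* 2.0 good))               ; good counts are doubled
         (b (float bad))
         (pg (min 1.0 (/ g ngood)))
         (pb (min 1.0 (/ b nbad))))
    ;; Clamp so that no single word is ever fully decisive.
    (max 0.01 (min 0.99 (/ pb (+ pg pb))))))

(defun my-graham-combine (scores)
  "Combine the word SCORES (a list of floats) into one probability."
  (let ((prod (apply #'* scores))
        (inv  (apply #'* (mapcar (lambda (p) (- 1.0 p)) scores))))
    (/ prod (+ prod inv))))
```

In Graham's essay only the most "interesting" words, those whose score lies furthest from 0.5, are passed to the combining step, and words never seen before receive a fixed default score, which is the role spam-stat-unknown-word-score plays here.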

Defined functions (35)

spam-stat-bad(ENTRY)
spam-stat-buffer-change-to-non-spam()
spam-stat-buffer-change-to-spam()
spam-stat-buffer-is-non-spam()
spam-stat-buffer-is-spam()
spam-stat-buffer-words()
spam-stat-buffer-words-with-scores()
spam-stat-compute-score(ENTRY)
spam-stat-count()
spam-stat-good(ENTRY)
spam-stat-install-hooks-function()
spam-stat-load()
spam-stat-make-entry(GOOD BAD)
spam-stat-process-directory(DIR FUNC)
spam-stat-process-non-spam-directory(DIR)
spam-stat-process-spam-directory(DIR)
spam-stat-reduce-size(&optional COUNT)
spam-stat-reset()
spam-stat-save(&optional FORCE)
spam-stat-score(ENTRY)
spam-stat-score-buffer()
spam-stat-score-buffer-user(&rest ARGS)
spam-stat-score-word(WORD)
spam-stat-set-bad(ENTRY VALUE)
spam-stat-set-good(ENTRY VALUE)
spam-stat-set-score(ENTRY VALUE)
spam-stat-split-fancy()
spam-stat-store-current-buffer()
spam-stat-store-gnus-article-buffer()
spam-stat-strip-xref()
spam-stat-test-directory(DIR &optional VERBOSE)
spam-stat-to-hash-table(ENTRIES)
spam-stat-unload-function()
spam-stat-unload-hook()
with-spam-stat-max-buffer-size(&rest BODY)

Defined faces (0)