Ticket #189 (new enhancement)

Opened 9 years ago

Add accuracy stats for SpamAssassin rules

Reported by: rjl
Owned by: rjl
Priority: low
Milestone: 1.1.0
Component: General
Version: 1.0.0 RC5
Severity: normal
Keywords: stats statistics spamassassin rules
Cc:

Description

We already track the SpamAssassin rules that are triggered by a given e-mail (in maia_sa_rules_triggered), but we're not currently analyzing these hits to determine how well each rule is performing. Since we have "confirmation" of the ham/spam status of a given e-mail by a given recipient, we ought to be able to keep totals for "correct" and "incorrect" diagnoses as additional columns in maia_sa_rules_triggered.

The rule's score indicates whether it is a spam rule (positive score) or a ham rule (negative score), so we can use this to decide which column to increment when a spam or ham item is confirmed. For instance, when a user confirms an item as spam, we look at the rules that were triggered by that mail item, and for any rules with positive scores we increment the "correct" column, while for any rules with negative scores we increment the "incorrect" column. We do the reverse for confirmed ham, obviously.
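A minimal sketch of that tallying logic in Python, assuming an in-memory dictionary as a stand-in for the proposed "correct" and "incorrect" columns. The names new_stats and record_confirmation and the rule names are hypothetical, and skipping rules with a score of exactly zero is an assumption, since the description above only covers positive and negative scores.

```python
from collections import defaultdict


def new_stats():
    # Hypothetical in-memory stand-in for the proposed "correct" and
    # "incorrect" columns, keyed by rule name.
    return defaultdict(lambda: {"correct": 0, "incorrect": 0})


def record_confirmation(stats, triggered_rules, confirmed_as):
    """Update the tallies when a user confirms an item as 'spam' or 'ham'.

    triggered_rules maps each rule name to the score it carried.
    """
    if confirmed_as not in ("spam", "ham"):
        raise ValueError("confirmed_as must be 'spam' or 'ham'")
    for rule, score in triggered_rules.items():
        if score == 0:
            # Assumption: a zero-score rule is neither a spam nor a ham rule.
            continue
        rule_says_spam = score > 0
        agrees = rule_says_spam == (confirmed_as == "spam")
        stats[rule]["correct" if agrees else "incorrect"] += 1


# Illustrative use with made-up rule names and scores.
stats = new_stats()
record_confirmation(stats, {"SPAM_RULE_A": 2.5, "HAM_RULE_B": -1.0}, "spam")
```

Here the spam confirmation credits the positive-score rule as "correct" and counts the negative-score rule as "incorrect"; a ham confirmation of the same item would do the opposite.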

Likewise, when a user later reports a false positive or false negative, we can reverse the points that were previously awarded for that item. Each rule that had been credited as "correct" has its "correct" column decremented and its "incorrect" column incremented, and vice versa, again depending on whether the rule's score is positive or negative.
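The reversal could be sketched as follows, assuming the tallies are held in a plain dictionary of rule name to counts. The function name reverse_confirmation and the rule names are hypothetical, and zero-score rules are skipped on the assumption that they were never tallied in the first place.

```python
def reverse_confirmation(stats, triggered_rules, previously_confirmed_as):
    """Move one point between columns for each rule on a reclassified item.

    previously_confirmed_as is the label ('spam' or 'ham') under which the
    item was originally tallied; the user has now reported the opposite.
    """
    if previously_confirmed_as not in ("spam", "ham"):
        raise ValueError("previously_confirmed_as must be 'spam' or 'ham'")
    for rule, score in triggered_rules.items():
        if score == 0:
            # Assumption: zero-score rules were never counted.
            continue
        was_correct = (score > 0) == (previously_confirmed_as == "spam")
        if was_correct:
            old, new = "correct", "incorrect"
        else:
            old, new = "incorrect", "correct"
        stats[rule][old] -= 1
        stats[rule][new] += 1


# Illustrative use: the item was first confirmed as spam, then reported
# as a false positive (it was really ham).
stats = {
    "SPAM_RULE_A": {"correct": 1, "incorrect": 0},
    "HAM_RULE_B": {"correct": 0, "incorrect": 1},
}
reverse_confirmation(stats, {"SPAM_RULE_A": 2.5, "HAM_RULE_B": -1.0}, "spam")
```

After the reversal, each rule's totals look as if the item had been confirmed as ham from the start.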

This would give Maia a better tool for analyzing the effectiveness of individual rules, comparable to the "mass-check" tool that SpamAssassin developers and rule-testers use. It would allow Maia to assess how often a given rule triggers on spam vs. ham, and therefore how prone it is to generating false positives/negatives. This will help administrators cull the rules that are not performing well.
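As a rough illustration of how the two totals could feed such a report, the sketch below computes a per-rule accuracy ratio and lists rules that fall below a cutoff. The function names, the min_hits and max_accuracy thresholds, and the rule names are all hypothetical; the ticket does not specify how "not performing well" should be defined.

```python
def rule_accuracy(correct, incorrect):
    """Fraction of confirmed hits on which the rule agreed with the user."""
    total = correct + incorrect
    if total == 0:
        return None  # no confirmed hits yet, so no meaningful ratio
    return correct / total


def underperforming(stats, min_hits=50, max_accuracy=0.9):
    """Return names of rules with enough hits and accuracy below the cutoff.

    Both thresholds are illustrative defaults, not values from the ticket.
    """
    flagged = []
    for rule, counts in stats.items():
        total = counts["correct"] + counts["incorrect"]
        if total < min_hits:
            continue  # too few confirmations to judge the rule
        if rule_accuracy(counts["correct"], counts["incorrect"]) < max_accuracy:
            flagged.append(rule)
    return sorted(flagged)


# Illustrative totals for three made-up rules.
stats = {
    "RULE_A": {"correct": 95, "incorrect": 5},
    "RULE_B": {"correct": 30, "incorrect": 30},
    "RULE_C": {"correct": 1, "incorrect": 3},
}
candidates = underperforming(stats)
```

Requiring a minimum number of confirmed hits keeps rarely triggered rules from being flagged on the strength of a handful of items.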
