The Heuristic Rules system compares the content of each message against a set of static rules to determine the likelihood that the message is spam. Each rule has a specific value, and a message's SpamAssassin score is adjusted by the value of every rule that the message matches. Rules and values are regularly adjusted to keep up with current trends in spam and junk email. SecurityGateway's SGSpamD can be configured to check for heuristic rule updates automatically at designated intervals, or you can check for updates manually.
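As a rough illustration of how rule values accumulate into a score, the following Python sketch sums the value of every rule that a message matches. The rule names, patterns, and values are invented for this example; they are not actual SpamAssassin or SGSpamD rules.

```python
import re

# Hypothetical rules as (name, pattern, value). Real rule sets are far larger
# and change over time through the heuristic rule updates.
RULES = [
    ("SUBJECT_ALL_CAPS", re.compile(r"^Subject: [A-Z !]+$", re.MULTILINE), 1.5),
    ("MENTIONS_FREE_MONEY", re.compile(r"free money", re.IGNORECASE), 2.0),
    ("HAS_UNSUBSCRIBE_LINK", re.compile(r"unsubscribe", re.IGNORECASE), -0.5),
]


def heuristic_score(raw_message: str) -> float:
    """Add up the value of every rule whose pattern matches the message."""
    return sum(value for _name, pattern, value in RULES if pattern.search(raw_message))


message = "Subject: WIN NOW!\n\nClaim your free money today."
print(heuristic_score(message))  # two rules match: 1.5 + 2.0
```

A rule can carry a negative value, as the third one does here, so matching it lowers the score instead of raising it.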
Bayesian Classification is a statistical process that can optionally be used to analyze spam and non-spam messages in order to increase the reliability of spam recognition over time. You can designate one folder for spam messages and one for non-spam messages, and these folders can be scanned manually or automatically at a designated interval. All of the messages in those folders will be analyzed and indexed, or "Bayesian Learned", so that new messages can be compared to them statistically to determine the likelihood that they are spam. The result of this Bayesian comparison can then increase or decrease a message's SpamAssassin score.
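To make the idea of a statistical comparison concrete, here is a deliberately simplified Python sketch of token-based Bayesian scoring. It is not the algorithm SGSpamD uses: the whitespace tokenizer, the add-one smoothing, the equal priors, and the log-odds combination are all assumptions chosen to keep the example short.

```python
import math
from collections import Counter


def tokenize(text: str) -> set:
    """Naive tokenizer (assumption): lowercase words split on whitespace."""
    return set(text.lower().split())


class TinyBayes:
    """Toy classifier that learns token counts from spam and non-spam samples."""

    def __init__(self) -> None:
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()
        self.spam_count = 0
        self.ham_count = 0

    def learn(self, text: str, is_spam: bool) -> None:
        """Index one message, counting each distinct token once."""
        if is_spam:
            self.spam_tokens.update(tokenize(text))
            self.spam_count += 1
        else:
            self.ham_tokens.update(tokenize(text))
            self.ham_count += 1

    def spam_probability(self, text: str) -> float:
        """Combine per-token evidence as a sum of log-odds, with equal priors."""
        log_odds = 0.0
        for token in tokenize(text):
            # Add-one smoothing keeps unseen tokens from producing zero.
            p_spam = (self.spam_tokens[token] + 1) / (self.spam_count + 2)
            p_ham = (self.ham_tokens[token] + 1) / (self.ham_count + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


classifier = TinyBayes()
classifier.learn("cheap pills online", is_spam=True)
classifier.learn("cheap watches", is_spam=True)
classifier.learn("meeting agenda attached", is_spam=False)
classifier.learn("lunch meeting tomorrow", is_spam=False)
print(classifier.spam_probability("cheap pills"))
print(classifier.spam_probability("meeting agenda"))
```

With only four learned messages the probabilities are crude, which is why a real classifier needs a sizeable sample of both kinds of mail before its results are trusted.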
Heuristic Rule Updates
Check for heuristic rule updates at midnight each night
Choose this option if you want SecurityGateway to check for heuristic rule updates automatically each day at midnight.
Check for heuristic rule updates once every [XX] hours
Choose this option and designate a value if you want SecurityGateway to check for heuristic rule updates automatically every certain number of hours instead of simply once per day.
Do not check for heuristic rule updates
Choose this option if you do not want SecurityGateway to check for heuristic rule updates automatically. You can still manually check for updates by using the "Click here to check..." option below.
Run SA-Update as part of the update process
Activate this check box if you wish to pull updates from updates.spamassassin.org in addition to updates from MDaemon Technologies. This helps keep your SpamAssassin rule sets current. This option is enabled by default.
Click here to check for heuristic rule updates now
Click this link to manually check for updates to the heuristic rules.
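The three scheduling choices above can be pictured with a short Python sketch that computes when the next automatic check would fall. The function name and the mode strings are hypothetical, and measuring the interval from the current time is an assumption; SecurityGateway's internal scheduler is not described here.

```python
from datetime import datetime, timedelta
from typing import Optional


def next_update_check(now: datetime, mode: str, interval_hours: int = 0) -> Optional[datetime]:
    """Return the next scheduled check time, or None when checks are disabled."""
    if mode == "midnight":
        # Midnight at the start of the following day.
        tomorrow = now.date() + timedelta(days=1)
        return datetime.combine(tomorrow, datetime.min.time())
    if mode == "interval":
        if interval_hours < 1:
            raise ValueError("interval_hours must be at least 1")
        # Assumption: the interval is counted from the current time.
        return now + timedelta(hours=interval_hours)
    if mode == "never":
        # Automatic checks are off; a manual check is still possible.
        return None
    raise ValueError(f"unknown mode: {mode!r}")


now = datetime(2024, 1, 15, 13, 30)
print(next_update_check(now, "midnight"))
print(next_update_check(now, "interval", interval_hours=6))
print(next_update_check(now, "never"))
```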
Bayesian Classification
Enable Bayesian classification
Check this box to enable SGSpamD's Bayesian classification system. Use this feature if you want each message's SpamAssassin score to be adjusted based on its comparison to the currently known Bayesian statistics.
The Bayesian classifier needs a sample of both spam and non-spam messages to analyze before it can begin adjusting a message's SpamAssassin score. This is the Bayesian Learning process, and it is necessary in order to have a sufficient pool of statistics to draw from when making the Bayesian comparison. Once you have given the Bayesian Learning system these messages to analyze, it will be sufficiently equipped to begin applying the results of a Bayesian comparison to each message's SpamAssassin score. As it continues to analyze more messages, the Bayesian classifications will become more accurate over time.
Non-spam messages which must be learned:
This is the number of messages designated as "non-spam" that must be analyzed before the Bayesian classifier will begin scoring messages. The default value is 200 messages.
Spam messages which must be learned:
This is the number of messages designated as "spam" that must be analyzed before the Bayesian classifier will begin scoring messages. The default value is 200 messages.
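The two minimums act together: Bayesian scoring starts only when both have been met. A minimal Python sketch of that gate follows; the function and parameter names are hypothetical, and treating a count exactly equal to the minimum as sufficient is an assumption.

```python
def bayes_scoring_active(spam_learned: int, nonspam_learned: int,
                         min_spam: int = 200, min_nonspam: int = 200) -> bool:
    """True once enough spam AND enough non-spam messages have been learned."""
    return spam_learned >= min_spam and nonspam_learned >= min_nonspam


print(bayes_scoring_active(spam_learned=250, nonspam_learned=150))  # non-spam sample too small
print(bayes_scoring_active(spam_learned=200, nonspam_learned=200))  # both minimums met
```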
Bayesian Learning
Schedule Bayesian learning for midnight each night
Choose this option if you want the Bayesian Learning system to analyze the messages contained in the designated spam and non-spam folders automatically, once per day, beginning each night at midnight.
Schedule Bayesian learning for once every [XX] hours
Choose this option and specify a value if you want the Bayesian Learning system to analyze the messages contained in the designated spam and non-spam folders automatically, once every specified number of hours, rather than each night at midnight.
Do not perform scheduled Bayesian learning
Choose this option if you do not wish to schedule Bayesian Learning. You can, however, still start the Bayesian Learning process manually at any time by clicking the "Click here to perform Bayesian learning now" link below.
Path to known spam directory (false negatives):
This is the path to the folder containing messages designated as spam. Spam messages can be placed here manually, or automatically using the Automatic Bayesian Learning options.
Path to non-spam directory (false positives):
This is the path to the folder containing messages designated as non-spam. Non-spam messages can be placed here manually, or automatically using the Automatic Bayesian Learning options.
Spam forwarding address:
Use this text box to designate an address to which your users can forward spam messages so that the Bayesian system can learn from them. The default address that SecurityGateway will use is "SpamLearn[@AnySGDomain.com]", but you can change it to whatever you choose. Messages sent to this address must be received via SMTP from a session that is authenticated using SMTP AUTH. Further, the messages must be forwarded to this address as attachments of type "message/rfc822". Any message of another type that is sent to this address will not be processed. Finally, when entering an address into this option, use only the mailbox portion of the address; do not include the "@" or the domain portion. For example, "Spam", "SpamLearn", "SpamMail", or the like are all acceptable values for this option. Messages can then be forwarded to that address at any of SecurityGateway's domains (e.g. SpamLearn@example.com, SpamLearn@company.mail, and so on).
Non-spam forwarding address:
Use this text box to designate an address to which your users can forward non-spam messages so that the Bayesian system can learn from them. The default address that SecurityGateway will use is "NonSpamLearn[@AnySGDomain.com]", but you can change it to whatever you choose. Messages sent to this address must be received via SMTP from a session that is authenticated using SMTP AUTH. Further, the messages must be forwarded to this address as attachments of type "message/rfc822". Any message of another type that is sent to this address will not be processed. Finally, when entering an address into this option, use only the mailbox portion of the address; do not include the "@" or the domain portion. For example, "NonSpam", "NonSpamLearn", "GoodMail", or the like are all acceptable values for this option. Messages can then be forwarded to that address at any of SecurityGateway's domains (e.g. NonSpamLearn@example.com, NonSpamLearn@company.mail, and so on).
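Most mail clients offer a "forward as attachment" command that produces the required "message/rfc822" attachment. For scripted submissions, the following Python sketch shows one way to build such a message with the standard library's email package. The helper name, the sample addresses, and the subject text are illustrative assumptions.

```python
from email.message import EmailMessage


def wrap_for_learning(original: EmailMessage, sender: str, learn_address: str) -> EmailMessage:
    """Build a new message that carries the original as a message/rfc822 attachment."""
    wrapper = EmailMessage()
    wrapper["From"] = sender
    wrapper["To"] = learn_address
    wrapper["Subject"] = "Bayesian learning sample"
    wrapper.set_content("Forwarded for Bayesian learning.")
    # Attaching an EmailMessage object creates a part of type message/rfc822.
    wrapper.add_attachment(original)
    return wrapper


# A stand-in for the message a user wants to report.
original = EmailMessage()
original["From"] = "sender@example.net"
original["To"] = "user@example.com"
original["Subject"] = "Limited time offer"
original.set_content("Buy now!")

wrapped = wrap_for_learning(original, "user@example.com", "SpamLearn@example.com")
print([part.get_content_type() for part in wrapped.iter_attachments()])
```

The wrapped message would still have to be submitted over an authenticated SMTP session, for example with smtplib's SMTP.login followed by send_message.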
Don't learn from messages larger than [XX] bytes
Because larger messages are generally not spam, and because analyzing them can require a great deal of processing, messages over 50,000 bytes will not be analyzed by default. You can use this option to adjust the size value if you choose, or you can disable it completely if you wish to go ahead and analyze messages regardless of size.
Click here to perform Bayesian learning now
Click this link at any time to initiate the Bayesian Learning process manually, in addition to any scheduled interval that you may have set.
Automatic Bayesian Learning
Enable Bayesian automatic learning
With Automatic Bayesian Learning you can designate Message Scoring thresholds for both legitimate (i.e. non-spam) messages and spam. Any message with a final Message Score below the non-spam threshold will be treated by automatic learning as non-spam, and any message scoring above the spam threshold will be treated as spam. Although it should be used with caution, automatic learning can be beneficial if you are careful in setting your threshold values, because it allows expired tokens that are removed from the database files (see Bayesian Database below) to be replaced automatically. It gives the Bayesian Learning system a constant, fresh supply of messages from which to learn and avoids the need for manual retraining to recover expired tokens.
Consider messages which score lower than [XX] to be legitimate
Messages with a Message Score below this value will be categorized as legitimate/non-spam messages for the purpose of Bayesian Learning.
...only learn non-spam from domain mail servers and authenticated sessions
Click this option if you wish to apply Automatic Bayesian Learning of legitimate mail only to messages coming in over an authenticated session or from one of your domain mail servers. When using this option, inbound messages from non-local sources will not be used for Bayesian learning regardless of their final Message Score, unless they come from a domain mail server or an authenticated source. However, you can still manually copy any legitimate messages to the designated "non-spam" folder listed above, thus giving the system those messages to learn from as well.
Consider messages which score more than [XX] to be spam
Messages with a Message Score above this value will be categorized as spam messages for the purpose of Bayesian Learning.
...only learn spam from inbound messages
Click this option if you wish to apply Automatic Bayesian Learning of spam mail to inbound messages only. When using this option, outgoing messages will not be used for Bayesian learning, regardless of their final Message Score. You can, however, still place messages manually in the "spam" folder listed above.
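The automatic learning options above combine into a small decision procedure, sketched here in Python. The function name, parameter names, and the threshold values in the usage lines are hypothetical; "trusted_source" stands for a message that arrived from a domain mail server or over an authenticated session.

```python
from typing import Optional


def auto_learn_class(score: float, nonspam_below: float, spam_above: float, *,
                     inbound: bool, trusted_source: bool,
                     nonspam_trusted_only: bool = False,
                     spam_inbound_only: bool = False) -> Optional[str]:
    """Return "non-spam", "spam", or None when the message is not auto-learned."""
    if score < nonspam_below:
        # Optional restriction: learn non-spam only from trusted sources.
        if nonspam_trusted_only and not trusted_source:
            return None
        return "non-spam"
    if score > spam_above:
        # Optional restriction: learn spam only from inbound messages.
        if spam_inbound_only and not inbound:
            return None
        return "spam"
    # Scores between the two thresholds are left alone.
    return None


# Example thresholds (assumptions, not product defaults).
print(auto_learn_class(0.0, 0.1, 12.0, inbound=True, trusted_source=False,
                       nonspam_trusted_only=True))
print(auto_learn_class(15.2, 0.1, 12.0, inbound=True, trusted_source=False,
                       spam_inbound_only=True))
```

Keeping a wide gap between the two thresholds means only messages the system is already quite sure about are learned, which is the cautious setup the description recommends.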
Bayesian Database
Enable Bayesian automatic token expiration
This option allows the Bayesian system to automatically expire database tokens whenever the number of tokens specified below is reached. Setting a token limit can prevent your Bayesian database from getting excessively large and slowing down processing.
Maximum Bayesian database tokens:
This is the maximum number of Bayesian database tokens allowed. When this number of tokens is reached, the Bayesian system removes the oldest tokens, reducing the total to 75% of this value or 100,000 tokens, whichever is higher. The number of tokens will never fall below the larger of those two values regardless of how many tokens are expired. Note: 150,000 database tokens is approximately 8 MB.
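The expiration target described above is simple arithmetic: the larger of 75% of the configured maximum and 100,000. A short Python sketch with a hypothetical function name:

```python
def tokens_after_expiration(max_tokens: int) -> int:
    """Token count the database is reduced to once max_tokens is reached."""
    seventy_five_percent = max_tokens * 3 // 4
    return max(seventy_five_percent, 100_000)


for limit in (120_000, 150_000, 400_000):
    print(limit, "->", tokens_after_expiration(limit))
```

With a maximum of 150,000 the database is trimmed to 112,500 tokens, while with a maximum of 120,000 the 100,000 floor applies because 75% of the limit would be only 90,000.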
Advanced
Maximum message processing threads (1-6):
Use this option to designate the maximum number of message processing threads that will be used by SGSpamD at any one time. You may set this value from 1 to 6 threads. The default is 4.
Maximum TCP connections per thread (10-200):
This is the maximum number of TCP connections to SGSpamD allowed per message processing thread at any one time. You may set this value from 10 to 200. The default is 200.
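Because the connection limit applies per thread, the two Advanced settings together suggest an upper bound on simultaneous connections to SGSpamD. The Python sketch below computes that bound; reading the total as threads multiplied by connections per thread is an inference from the wording above, and the function name is hypothetical.

```python
def max_total_connections(threads: int = 4, per_thread: int = 200) -> int:
    """Upper bound on simultaneous connections, validating the documented ranges."""
    if not 1 <= threads <= 6:
        raise ValueError("threads must be between 1 and 6")
    if not 10 <= per_thread <= 200:
        raise ValueError("per_thread must be between 10 and 200")
    return threads * per_thread


print(max_total_connections())        # defaults: 4 threads, 200 connections each
print(max_total_connections(6, 200))  # largest allowed configuration
```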