sa-learn will use about 20MB RAM, so if you don't have 64MB free memory
available during the migration (and after), the process may run very slowly. If
immediately after a reboot, you don't have 100MB free memory, I'm not sure I
would install MySQL. This migration procedure is extremely CPU intensive. You can expect
the system to slow to a crawl while sa-learn is backing up
and restoring the Bayes data. This may cause problems on a busy box so I
would take measures to limit mail traffic as much as possible. These instructions
assume you have built a Fedora Core 4 spamfilter box according to the instructions here:
http://www.freespamfilter.org/
and (like me) you have no prior experience with MySQL. I assume you are
running amavisd-new but I do not require that you are. Some aspects of
this HOWTO may differ considerably if you are not however. Absolutely no warranty,
use at your own risk. If you are building a new system, I strongly
recommend you first process at least a couple hundred messages prior to converting
Bayes to MySQL because some parts of this document will make more
sense if you currently have some data in your Bayes and auto-whitelist files.
Install MySQL 4.1:yum install mysql mysql-devel mysql-server
chkconfig --levels 235 mysqld on/etc/init.d/mysqld start See if it is running: lsof | grep LISTEN
Note that the FC4 installation provided the Perl modules SpamAssassin needs. (See 'Requirements' at http://svn.apache.org/repos/asf/spamassassin/branches/3.1/sql/README.bayes) Once MySQL is installed, log in: mysql -u rootFrom the mysql> prompt, create the two passwords required for root:
SET PASSWORD FOR 'root'@'localhost' = PASSWORD('roots_password');fc4.example.com is our FQDN hostname, you must change this to match your hostname. Now create the database we will use to store Bayes and auto-whitelist data: create database sa_bayes;We will now create a user called 'sa_user', create a password for 'sa_user' and allow 'sa_user' to do what it needs to the 'sa_bayes' database: GRANT SELECT, INSERT, UPDATE, DELETE ON sa_bayes.* TO 'sa_user'@'localhost' IDENTIFIED BY 'sa_users_password';Refresh privileges: FLUSH PRIVILEGES;Let's take a look at what we have done: SELECT Host, User FROM mysql.user;We are done for the moment, so: quit |
Navigate to where our Bayes data is currently stored:cd /var/amavis/.spamassassinAs the 'amavis' user, we need to create a backup file of our Bayes data that will later be restored to our SQL database. This process may take a while: su amavis -c 'sa-learn --sync --force-expire'
There are four files we need to grab from the source code: wget http://svn.apache.org/repos/asf/spamassassin/branches/3.1/tools/convert_awl_dbm_to_sqlThese commands create the tables we need, in the sa_bayes database.
mysql -u root -p sa_bayes < bayes_mysql.sqlNow we will configure SpamAssassin to use MySQL. Once we do this, don't restart amavisd-new until we are completely finished. If you have to restart amavisd-new before the process is complete, set local.cf back the way it was. Also note that we will comment out bayes_path and auto_whitelist_path: cp /etc/mail/spamassassin/local.cf /etc/mail/spamassassin/local.cf-pre-sqlvi /etc/mail/spamassassin/local.cfInsert text below, and comment out items as noted: bayes_store_module Mail::SpamAssassin::BayesStore::SQL bayes_sql_dsn DBI:mysql:sa_bayes:localhost bayes_sql_username sa_user bayes_sql_password sa_users_password auto_whitelist_factory Mail::SpamAssassin::SQLBasedAddrList user_awl_dsn DBI:mysql:sa_bayes:localhost user_awl_sql_username sa_user user_awl_sql_password sa_users_password #auto_whitelist_path /var/amavis/.spamassassin/auto-whitelist #bayes_path /var/amavis/.spamassassin/bayes #this next part will store all data in just one user's table, no matter who runs sa-learn bayes_sql_override_username amavis |
It is required we initialize the database by learing a message (once as 'root', and as once as 'amavis').sa-learn --spam gtube.txtsu amavis -c 'sa-learn --spam gtube.txt'We are now going to load in the data from our old auto-whitelist file. Run these commands to show what arguments are required to run the special program convert_awl_dbm_to_sql. You should still be in the
/var/amavis/.spamassassin directory where we downloaded this script:
chmod +x convert_awl_dbm_to_sql
Now that you have read and understood what is needed, you may run the program. Edit if necessary; make sure you select, copy and paste the entire text: ./convert_awl_dbm_to_sql --username amavis --dsn DBI:mysql:sa_bayes:localhost --dbautowhitelist /var/amavis/.spamassassin/auto-whitelist --sqlusername sa_user --sqlpassword sa_users_password --okNow log in to MySQL and issue these commands so we can see if it worked: mysql -u root -p
USE sa_bayes;Assuming you have been using auto-whitelist for at least a little while, you should see some data. Make a note of how many rows are in the set. Now we can quit: quitOK, now let's load in the Bayes data; this may take 5 minutes, or more than an hour and don't be surprised if your CPU load increases substantially during this process:
su amavis -c 'sa-learn --restore backup.txt'
On a healthy system that has been in use for a while there will be between 100,000 and 150,000 rows of data in the Bayes data tables. If you would like to view our progress, open another PuTTY session, log into MySQL, and monitor the progress:
mysql -u root -pYou can use the [up-arrow] to recall commands. While you wait, commands that provide interesting data: SHOW DATABASES;(should show 'amavis' and 'root') Once the restore is complete (our first PuTTY session has returned to the shell prompt), we will change all tables from MyISAM to InnoDB. Note that the 4.x version of MySQL we are using does not require any configuration changes to use InnoDB; 3.x versions do however. Log back into MySQL if necessary (and use the sa_bayes database), then issue these commands to convert our data:
mysql -u root -pIf everything went well, we can quit: quit
|
The data conversion is complete, now we can test amavisd-new:
amavisd stopSend a test mail through, and observe. You should see: debug: bayes: Database connection established debug: bayes: found bayes db version 3 and if use_auto_whitelist is enabled (rather, is not disabled): debug: SQL Based AWL: Connected to DBI:mysql:sa_bayes:localhost ... debug: Post AWL score: 0.0520000000000005 If everything looks OK, stop debugging with [Ctrl]+c and start amavisd-new: amavisd startOur password file changed, so update the copy Postfix uses:
cp /etc/passwd /var/spool/postfix/etc/passwd |
Now we will create a script to keep our auto-whitelist data trim. If you
are currently using my /etc/cron.weekly/trim_whitelist_weekly script, it will need
to be replaced. First let's create a new script:
vi /usr/sbin/trim-awl
and insert (always make sure you hit [Enter] at the end of the last line):
#!/bin/sh
Save and exit the file, then make the script executable:chmod +x /usr/sbin/trim-awl
Now create a file that the script will use for input:
vi /etc/trim-awl.sql
and insert:
USE sa_bayes;Save and exit, then create a cron job to run the script: cd /etc/cron.weeklyrm trim_whitelist_weeklyvi trim-sql-awl-weeklyand insert (please verify the paths of the files listed here):
#!/bin/sh
#
# Weekly maintenance of auto-whitelist for Fedora with amavisd-new and
# SpamAssassin using MySQL
test -e /usr/sbin/trim-awl && test -e /usr/local/sbin/amavisd && {
su - amavis /usr/sbin/trim-awl >/dev/null
}
exit 0
Save and exit, then make it executable:chmod +x trim-sql-awl-weekly
Then, to test it, run it: ./trim-sql-awl-weekly
Assuming you have been using auto_whitelist at least for a few days, log back in to MySQL, and note how many rows of data are in the awl table, compare it to the number I asked you to make a note of earlier. There should be less if the script is working (unless you previously did not have any entries with a 'count' of '1'):
mysql -u root -p
Do you have any free memory?free
Is this maybe an opportunity to reboot? |
|
Optional: If you would like finer control of auto-whitelist maintenance: Log in to mysql and add a new field "lastupdate" to the awl database, and then populate it (initialize it) with today's date: mysql -u root -p
Now, edit the trim-awl.sql file:
vi /etc/trim-awl.sql
and replace the simple-minded:
DELETE FROM awl WHERE count = 1;
with more sophisticated statements:
DELETE FROM awl WHERE lastupdate <= DATE_SUB(SYSDATE(), INTERVAL 6 MONTH);Feel free to modify the time intervals as you see fit. Each time a record is created or modified, the lastupdate field will reflect the current date and time. These statements not only get rid of "single hit" records 15 days or older, but also records with 6 months or more of inactivity. |
Last minute notes: If you upgrade to SpamAssassin 3.1 (does not apply to earler versions), and use MySQL >= 4.1, change:
Mail::SpamAssassin::BayesStore::SQLto:
Mail::SpamAssassin::BayesStore::MySQL
If you would like to make a backup of the Bayes data:
cd /var/amavis/.spamassassinIf you would like to see how many messages have been learned and how many tokens are in the database: su amavis -c 'sa-learn --dump magic' |