Feeding your mail gateway a proper spam diet

In a previous post I described the process of how to get a Linux based mail filtering gateway set up on your network to check for viruses and do some basic filtering, eventually delivering messages to your Exchange server.

In this post I will expand on the various ways to “train” and customize your SpamAssassin mail filter so that it does more checks to weed out spam and generally lowers the amount of junk making its way to your users’ inboxes.

There are a number of things that aren’t enabled by default in SpamAssassin.  Obviously this isn’t as efficient as we would like, so there is a little bit of extra leg work getting everything set up the way it should be.

Tightening up Postfix:

This is the first step to improving the efficiency of your filtering process.  There are a number of checks that can be enabled in the Postfix configuration file (under /etc/postfix/) to fight incoming spam.  I have appended these checks to the end of the configuration I posted previously; they lower the amount of spam getting through by requiring proper sending addresses, valid recipients, proper domains, etc.

smtpd_helo_required = yes
smtpd_sender_restrictions =
smtpd_recipient_restrictions =

Configuring these properties will cause an immediate drop in the amount of spam that makes its way through the filter so the importance of getting this implemented cannot be overstated.

Make your spam filter happy, feed it spam:

The next technique that I will discuss took me FOREVER to figure out, so I hope that by sharing what I have learned I will help people save time in their own implementations.  It didn’t help that IMAP wasn’t enabled on our Exchange server, but I will save that story for another day.

Essentially you want to get a good chunk of SPAM and HAM email messages onto your mail filter so that SpamAssassin can apply its Bayesian filtering techniques and learn how to classify incoming messages (statistical analysis stuff, I don’t know a lot about the specifics).
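For anyone curious about the "statistical analysis stuff", here is a toy sketch of the general idea behind Bayesian filtering. It is not SpamAssassin's actual implementation (its tokenizer and the way it combines probabilities are more involved); the function names and the sample messages are made up purely for illustration.

```python
import math
from collections import Counter


def train(messages):
    """Count how many messages each token appears in."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def spam_probability(text, spam_counts, ham_counts, n_spam, n_ham):
    """Combine per-token evidence into one spam probability (toy naive Bayes)."""
    # Work in log space so many small probabilities do not underflow.
    log_spam = math.log(n_spam / (n_spam + n_ham))
    log_ham = math.log(n_ham / (n_spam + n_ham))
    for token in set(text.lower().split()):
        # Add-one smoothing keeps unseen tokens from zeroing the result.
        log_spam += math.log((spam_counts[token] + 1) / (n_spam + 2))
        log_ham += math.log((ham_counts[token] + 1) / (n_ham + 2))
    # Convert the two log scores back into a probability between 0 and 1.
    return 1 / (1 + math.exp(log_ham - log_spam))


# Hypothetical training data, only here to make the sketch runnable.
spam = ["cheap pills online", "win cash now", "cheap cash offer"]
ham = ["meeting notes attached", "lunch on friday", "project notes for friday"]

spam_counts, ham_counts = train(spam), train(ham)
print(spam_probability("cheap cash", spam_counts, ham_counts, len(spam), len(ham)))
print(spam_probability("friday meeting notes", spam_counts, ham_counts, len(spam), len(ham)))
```

The first message scores well above 0.5 and the second well below it, which is the whole trick: the more real spam and ham the filter has seen, the better those per-token counts become.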

My first thought was to have users copy SPAM messages into a public folder on my Exchange server and pull the messages down directly to my mail gateway.  BUT that dream was shattered when I discovered that IMAP support for public folders had been dropped in the version of Exchange I am using (Exchange 2010).

So I dabbled with a few ideas that weren’t very graceful, the most notable of which was copying the Exchange public folder into Thunderbird then copying the mbox file from Thunderbird to the mail gateway, yuck.  I finally got some help from my friends over at ServerFault.  I basically had to install and configure fetchmail to go out and look for two specified mailboxes on my Exchange, one SPAM account (a spam collection account I created) and one HAM account (my personal inbox).

To install fetchmail issue the following command:

sudo aptitude install fetchmail

Next, we need to configure fetchmail to look at our specified IMAP accounts, so we need to edit the config file ~/.fetchmailrc

poll protocol IMAP port 993:
auth password user "domain/spamacct" with password "password" ssl
auth password user "domain/hamacct" with password "password" ssl

Modify the permissions so that only the specified user can read/write the config file

chmod 600 .fetchmailrc

Finally you should be able to pull the emails onto your mail gateway by issuing the following command:

fetchmail -a -v -n -k --folder inbox

At this point the mail should be on your mail server in the directory /var/spool/mail/USER.  The final step is to feed the mail into the Bayesian filter provided by SpamAssassin.  To do this, issue the following commands:

sa-learn --showdots --mbox --spam spam
sa-learn --showdots --mbox --ham ham

I had to rename the mail files to “spam” and “ham” when I first copied them to the server, but that should be easy enough to accomplish.
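If you would rather script this step, the following is a minimal sketch that counts the messages in an mbox file and then runs the same sa-learn command shown above. The helper names are hypothetical, and it assumes sa-learn is on the PATH and that the files really are in mbox format.

```python
import mailbox
import subprocess


def count_messages(mbox_path):
    """Return the number of messages stored in an mbox file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return len(box)
    finally:
        box.close()


def sa_learn_command(kind, mbox_path):
    """Build the sa-learn command line used in this post for one mbox file."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--showdots", "--mbox", "--" + kind, mbox_path]


def learn(kind, mbox_path):
    """Report the message count, then hand the file to sa-learn."""
    print(f"{mbox_path}: {count_messages(mbox_path)} messages")
    subprocess.run(sa_learn_command(kind, mbox_path), check=True)


# Example usage, assuming files named "spam" and "ham" in the current directory:
# learn("spam", "spam")
# learn("ham", "ham")
```

Counting first is a cheap sanity check: if the count is zero, the file probably is not a valid mbox and there is no point waiting on sa-learn.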

To check how the learning process is going we need to check the sa-learn database for the tokens, ham and spam it has received.  There are a few ways to check the database but the easiest I have found is to enter the following into the command line:

sa-learn --dump magic

This will output a number of results, the most important of which are the nspam, nham and ntokens values.  Here is a sample from the initial training stages of my spam filter:

bruticus@bruticus:~$ sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0        341          0  non-token data: nspam
0.000          0        210          0  non-token data: nham
0.000          0      69078          0  non-token data: ntokens
0.000          0 1318421928          0  non-token data: oldest atime
0.000          0 1319205954          0  non-token data: newest atime
0.000          0 1319142287          0  non-token data: last journal sync atime
0.000          0 1319142287          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

Ideally you want the nham and nspam values around or above the 1000 message mark, but the filter can begin working with as few as 200 of each.
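To keep an eye on those numbers without reading the table by hand, here is a small sketch that parses the dump output and checks it against the 200 message minimum mentioned above. The function names are my own, and the sample text is just the first few lines of the output shown earlier.

```python
def parse_magic(dump_text):
    """Extract the non-token counters from `sa-learn --dump magic` output."""
    counters = {}
    for line in dump_text.splitlines():
        if "non-token data:" not in line:
            continue  # skip the shell prompt and anything else unexpected
        fields, name = line.split("non-token data:", 1)
        # The value is the third whitespace-separated column of each line.
        counters[name.strip()] = int(fields.split()[2])
    return counters


def bayes_ready(counters, minimum=200):
    """True once both spam and ham counts have reached the minimum."""
    return (counters.get("nspam", 0) >= minimum
            and counters.get("nham", 0) >= minimum)


sample = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        341          0  non-token data: nspam
0.000          0        210          0  non-token data: nham
0.000          0      69078          0  non-token data: ntokens
"""

counters = parse_magic(sample)
print(counters["nspam"], counters["nham"], counters["ntokens"])  # 341 210 69078
print(bayes_ready(counters))  # True
```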

Also, I have read that the best way to train is to feed SpamAssassin the newest spam and ham messages that you have, so make sure to look for the newest messages to feed it.  I read that it has something to do with the Bayesian analysis.

NOTE:  Try to do the spam/ham learning step during off hours or a slow time, because processing all of the messages adds a tremendous amount of overhead to Postfix and also takes up a large chunk of the machine’s memory.

That’s it. The spam filter should be able to filter out even more messages now thanks to the Bayesian filtering that we just enabled.

Final Step:

This one may or may not be overkill; I just implemented it yesterday and haven’t had a chance to get any feedback from it yet.  If you are in a multi-language environment this addition may not be feasible either.  With this step we are going to enable a SpamAssassin plugin that attempts to detect the email language and filters out everything that isn’t either English or Spanish.

To do this we need to enable the plugin, so open up the SpamAssassin config /etc/spamassassin/v310.pre and uncomment the following line:

loadplugin Mail::SpamAssassin::Plugin::TextCat

Then we need to edit the main SpamAssassin configuration file (under /etc/spamassassin/) to filter out all languages other than English and Spanish.  These lines can be added anywhere; I chose to add them under the Bayesian filtering section.

ok_languages en es
ok_locales en es


That is pretty much it, at least for now. There are possibly a few other things to modify but I need to see how efficient the spam filter is at this point before I decide if I need to add any more layers.  I have a feeling that things are pretty good at this point and adding more filtering wouldn’t really add much value to the filter.

I am very satisfied with the results that I have attained with this project and hope to keep refining the process as I see fit.  At some point, though, I think I am just going to need to take a look at it and say “enough is enough”.  So, if you have any questions or ideas for improvement let me know, I would be glad to hear them.


Short Hiatus

I have a new job lined up as an “official” Systems Administrator that I will be starting here shortly, so the posts may be a little light in the next month or so, plus I haven’t had any really good ideas for topics (so if you have something you would like to see, let me know). Hopefully with this new job I will get some fresh new challenges and be able to blog about how I solved them or get ideas for other future posts.

This new position will be focused primarily on Windows and network administration, so I can foresee future posts focusing more heavily on those sorts of administration topics, although I do have some Linux plans for my own personal knowledge in the works right now which I can’t wait to write about either.

As for now, keep checking back. If the site is down it’s probably because I’m in the middle of moving or haven’t gotten my new internet connection set up. Once I’m all settled in I will start cranking out the posts again. If you have something cool you would like to share I would love to post it here for you, so let me know about that as well.

Setting up a spam filtering mail gateway for Exchange 2010

Sorry for the long boring title, I wasn’t sure what to call this.  There are a variety of components to this filtering system so it is hard to classify.  It has an MTA built into it, is a spam filter, a mail anti-virus solution, a graphing tool and has a log analysis component.

Alright, so let’s get going.  This has been an ongoing project for me at work as I had no prior experience in setting something like this up.  The first step for me was determining what sorts of tools were going to work best for me.  We are on a strict budget where I work, so any paid, third party solutions were out of the picture (Postini and GFI Mail Essentials were two that actually showed some promise).  Instead I had to take the Open Source route, which it turns out has a multitude of different options, whew!

Enter SpamAssassin.  This is the main service that I decided to build this system around.  It is easy to set up and get running and provides a robust spam filtering system, easy enough.  Here is the list of tools that I have put together for this system, based on Ubuntu Server 10.04 LTS with everything but SSH disabled initially:

postfix – mail transfer agent
spamassassin – spam filter
clamAV – anti-virus
amavisd-new – interface for postfix -> SA/clamAV
mailgraph – tool to visualize mail statistics
rrdtool – graphing tool for mailgraph to function

Configuring Postfix:

This piece was confusing to me initially so I hope that this guide will make things a little easier to understand.  If there are questions I will do my best to answer them through my own experience with this project.

Ok, the first step is to grab and install Postfix on the new server.

sudo aptitude install postfix

Next, we need to edit the Postfix configuration file under /etc/postfix/ so that it acts as the gateway for our Exchange server.  These are the settings that I currently have configured for my gateway, so you will need to alter yours accordingly.

smtpd_banner = $myhostname ESMTP $mail_name (Ubuntu)
biff = no

# appending .domain is the MUA's job.
append_dot_mydomain = no

# Uncomment the next line to generate "delayed mail" warnings
#delay_warning_time = 4h

readme_directory = no

# TLS parameters
smtpd_tls_session_cache_database = btree:${data_directory}/smtpd_scache
smtp_tls_session_cache_database = btree:${data_directory}/smtp_scache

# See /usr/share/doc/postfix/TLS_README.gz in the postfix-doc package for
# information on enabling SSL in the smtp client.

myhostname = computer.local.domain
mydomain = local.domain
myorigin = $mydomain
inet_interfaces = all
alias_maps = hash:/etc/aliases
alias_database = hash:/etc/aliases
myorigin = /etc/mailname
#mydestination = localhost, localhost.local.domain
mydestination =,, localhost
relay_domains =
relayhost =
mynetworks = [::ffff:]/104 [::1]/128
mailbox_size_limit = 0
recipient_delimiter = +
transport_maps = hash:/etc/postfix/transport
append_at_myorigin = no
local_recipient_maps =
smtpd_helo_required = yes
smtpd_recipient_restrictions =
 permit_mynetworks reject_unauth_destination

# Content filtering
content_filter = smtp-amavis:[]:10024

Now we need to configure Postfix to relay mail through our filter to our Exchange server. To do this we need to make sure our domain is the only place email gets forwarded to. Add this line to the file /etc/postfix/transport smtp:[]

This maps our external site “” to our Exchange server living comfortably inside the network. Finally, build the hash table for Postfix to use to forward mail

postmap /etc/postfix/transport

and then restart Postfix to update all of the new settings

sudo /etc/init.d/postfix restart
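To make the transport map idea a little more concrete, here is a rough sketch of how a lookup table like this resolves a recipient to a next hop. It only handles exact domain matches, which is a simplification of what Postfix actually does, and the domain example.com and the address 192.0.2.10 are placeholder values, not the ones from my setup.

```python
def parse_transport(text):
    """Parse transport-style lines of the form 'pattern transport:nexthop'."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        pattern, result = line.split(None, 1)
        transport, _, nexthop = result.partition(":")
        table[pattern.lower()] = (transport, nexthop)
    return table


def next_hop(table, recipient):
    """Look up the recipient's domain; None means no transport entry matched."""
    domain = recipient.rsplit("@", 1)[-1].lower()
    return table.get(domain)


# Hypothetical entry: mail for example.com is relayed over SMTP to 192.0.2.10.
# The square brackets tell Postfix to skip the MX lookup for the next hop.
example = "example.com smtp:[192.0.2.10]\n"

table = parse_transport(example)
print(next_hop(table, "user@example.com"))     # ('smtp', '[192.0.2.10]')
print(next_hop(table, "user@elsewhere.test"))  # None
```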

Configuring the Spam Filter:

Ok, so once everything is updated and you have configured postfix the way you want it, you should be able to start the installation/configuration process.

sudo aptitude install amavisd-new spamassassin clamav-daemon
sudo aptitude install libnet-dns-perl libmail-spf-query-perl pyzor razor

This will install all of the necessary items for the filtering system.  Next, we need to set up clamAV and amavisd-new to talk to each other.

sudo adduser clamav amavis
sudo adduser amavis clamav

To get these new settings to work (figured this out the hard way) we need to restart the amavis and clamAV services.

sudo /etc/init.d/clamav-daemon restart

Next, we need to enable virus and spam scanning in amavis by editing /etc/amavis/conf.d/15-content_filter_mode and uncommenting the following lines in the configuration:

@bypass_virus_checks_maps = (
   \%bypass_virus_checks, \@bypass_virus_checks_acl, \$bypass_virus_checks_re);

@bypass_spam_checks_maps = (
   \%bypass_spam_checks, \@bypass_spam_checks_acl, \$bypass_spam_checks_re);

Restart the amavis service for the changes to take effect.

sudo /etc/init.d/amavis restart

Ok, now we need to integrate these pieces into the postfix service. Edit the /etc/postfix/ and add these lines at the bottom

smtp-amavis     unix    -       -       -       -       2       smtp
        -o smtp_data_done_timeout=1200
        -o smtp_send_xforward_command=yes
        -o disable_dns_lookups=yes
        -o max_use=20 inet    n       -       -       -       -       smtpd
        -o content_filter=
        -o local_recipient_maps=
        -o relay_recipient_maps=
        -o smtpd_restriction_classes=
        -o smtpd_delay_reject=no
        -o smtpd_client_restrictions=permit_mynetworks,reject
        -o smtpd_helo_restrictions=
        -o smtpd_sender_restrictions=
        -o smtpd_recipient_restrictions=permit_mynetworks,reject
        -o smtpd_data_restrictions=reject_unauth_pipelining
        -o smtpd_end_of_data_restrictions=
        -o mynetworks=
        -o smtpd_error_sleep_time=0
        -o smtpd_soft_error_limit=1001
        -o smtpd_hard_error_limit=1000
        -o smtpd_client_connection_count_limit=0
        -o smtpd_client_connection_rate_limit=0
        -o receive_override_options=no_header_body_checks,no_unknown_recipient_checks

and this to the section immediately below the “pickup” transport service.

         -o content_filter=
         -o receive_override_options=no_header_body_checks

Finally, we need to restart the postfix service to update the changes.

sudo /etc/init.d/postfix restart

Everything should be in place now.  If you have a port forward pointing to your Exchange server on your firewall, now is the time to point the port forward to the new address.  Now we are ready to go!

Graphing Statistics

Now that everything is set up we will want a way to see what kind of work our new system is doing.  For a graphical representation we will use a tool called mailgraph to give us results in a nice pretty format.  To get started we will need to grab it and put it on our server.

sudo aptitude install rrdtool mailgraph

This should take care of most everything, but we want to be able to take a look at the results locally on our network in a browser

cp -p /usr/lib/cgi-bin/mailgraph.cgi /var/www/cgi-bin

The script should be executable so we simply need to point our browser at the new location.

http://ipaddress/cgi-bin/mailgraph.cgi or

Mailgraph in action

Given a little bit of time you should start seeing the graphs fill up with your mail data. W00t!