This short report started when I got curious about the assertion of some friends who work in computer security. The assertion was that most spam today comes from broadband networks, where home and small office computers would be used as relays by spammers. One friend had gone so far as to build rules for Postfix, a mail server, to deny access to any IP address within a long list of broadband networks, unless the email came from a known mail server.
I had already been blocking some spam using pattern matching rules for indecent Subject lines, and for certain domains that produce nothing but spam. For example, my network was connected through Cybertrails, via an ISDN link. I ran my own mail server, and never used their email services. But several usernames had been set up for me and my wife at a domain owned by Cybertrails. These addresses, like rik@kachina.net, were never used, but managed to attract large volumes of spam from the Cybertrails server. However, Cybertrails also provides broadband connections over ISDN, DSL, and wireless, so this may also be a factor, and one I've not explored.
In late 2003, besieged by spam, I installed SpamAssassin, and began to rely on it to help filter spam. My setup places all suspected spam in a Spam directory (using procmail), where each message gets assigned a number as a filename. I had been routinely emptying this directory, after checking for misfiled messages, but stopped on January 18, 2004. By April 22, I had acquired over 28,400 spam messages.
In late February, I wrote a few short Perl scripts so I could examine the domain names of the systems sending me spam. I used this data to write two magazine articles, one for ;login:, and the other for Network Magazine. A student asked if I could provide my data for reference, and I agreed to do so. Well, I decided to clean up the Perl scripts and improve my methodology a bit first.
Back in February, I had my own spam plus that of Brian Martin, a security researcher who really hates spam and who routinely sends email to the abuse account for the spammer's domain. The results showed that a lot of spam does come from broadband providers, with Comcast leading the pack.
The results were revealing, but not too surprising. Comcast.net and attbi.com together accounted for 15% of all spam received, with Roadrunner coming in third with 5%. Many other well-known broadband providers fit into the top twenty offenders, which together accounted for over half of all the spam we had received.
Not everything that appeared in the top twenty was a real domain, however. My script lumped all of .net.br into a single score, so I wanted to rewrite the script and try again. I had also used every Received line in the first experiment, with the unfortunate result that some spoofed Received lines (those added by the spam tool or spammer) also appeared in the results.
I wrote a Perl script that iterates through the files in my spam directory and looks for the first instance of Received in each file. The first instance is guaranteed to be one that my own mail server added, so it will never be spoofed. After an upgrade, SpamAssassin started working differently, capturing email and resending it locally, so I had to add a special case: a Received line that includes localhost is ignored in favor of the original Received line that occurs later in the message.
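The selection rule described above can be sketched roughly as follows. This is a Python stand-in, not the original Perl script, which is not reproduced here. The function name first_trusted_received is hypothetical, and treating any Received header that mentions localhost as the local re-delivery hop is an assumption based on the description above.

```python
from email import message_from_string
from typing import Optional


def first_trusted_received(raw_message: str) -> Optional[str]:
    """Return the topmost Received header not added by a local re-delivery.

    Hypothetical helper: headers are examined top to bottom, which is the
    reverse of the order in which mail servers added them.
    """
    msg = message_from_string(raw_message)
    for header in msg.get_all("Received", []):
        # Unfold continuation lines so the header becomes one line of text.
        line = " ".join(str(header).split())
        if "localhost" in line.lower():
            # Assumed special case: skip the hop added when the filter
            # re-injects the message locally.
            continue
        return line
    return None
```

Because each mail server prepends its own Received header, the first one that survives the localhost check should be the one written by the receiving server, which is why anything a spammer forged further down is never consulted.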
The Perl script parses the Received line to extract the domain name, skips it if it is unknown (the IP address of a spamming system that failed the reverse lookup), then outputs the last two or three parts of the domain name.
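A minimal sketch of that parsing step, again in Python rather than the original Perl. It assumes a Postfix-style header of the form "from HELO (reverse-name [ip]) by ...", and the rule for choosing between two and three parts is a guess at a workable heuristic: keep three parts when the name ends in a two-letter country code preceded by a generic label such as net or com. The names sending_domain and GENERIC_SECOND_LEVEL are hypothetical.

```python
import re
from typing import Optional

# Assumed Postfix-style format: "from HELO (reverse-name [ip]) by ..."
RECEIVED_RE = re.compile(r"from\s+\S+\s+\((\S+)\s+\[[^\]]+\]\)")

# Hypothetical list of generic labels used under country-code domains.
GENERIC_SECOND_LEVEL = {"com", "net", "org", "co", "ac", "gov", "edu", "or", "ne"}


def sending_domain(received_line: str) -> Optional[str]:
    """Reduce the reverse-DNS name in a Received line to its last 2 or 3 parts."""
    match = RECEIVED_RE.search(received_line)
    if match is None:
        return None
    host = match.group(1).lower().rstrip(".")
    if host == "unknown":
        # The reverse lookup failed, so there is no domain to report.
        return None
    labels = host.split(".")
    if len(labels) < 2:
        return None
    keep = 2
    # Names like host.isp.net.br need three parts, or all of .net.br
    # collapses into a single entry.
    if len(labels) >= 3 and len(labels[-1]) == 2 and labels[-2] in GENERIC_SECOND_LEVEL:
        keep = 3
    return ".".join(labels[-keep:])
```

With this heuristic, a name such as c-1-2-3-4.client.comcast.net reduces to comcast.net, while 200-1-2-3.dsl.telesp.net.br reduces to telesp.net.br instead of net.br.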
I ran that script on my spam directory to produce a rough results file. Then I ran the rough results through a UNIX pipeline that sorts them, counts the duplicates, and reverse sorts by count:
sort first-rcvd-out | uniq -c | sort -rn > first-rcv-sort
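For readers who would rather do the tally in a script, the same count-and-rank step can be approximated in Python. This is only a sketch: the function name count_domains is hypothetical, and ties are broken alphabetically here, which may not match the order sort -rn gives for equal counts.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def count_domains(domains: Iterable[str]) -> List[Tuple[int, str]]:
    """Count identical domain names and rank them, highest count first."""
    # Blank lines are ignored; surrounding whitespace is stripped.
    counts = Counter(d.strip() for d in domains if d.strip())
    # Sort by descending count, then by name so the output is deterministic.
    return sorted(((n, d) for d, n in counts.items()), key=lambda p: (-p[0], p[1]))


if __name__ == "__main__":
    sample = ["comcast.net", "attbi.com", "comcast.net", "rr.com"]
    for n, domain in count_domains(sample):
        print(f"{n:7d} {domain}")
```

The sample list is made-up input for illustration, not data from the spam directory.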
Note that 8,602 of the sending IP addresses did not have a reverse (PTR) DNS record, so they are unlikely to be legitimate mail servers. Well, one would hope that mail servers have both MX and PTR records. A total of 28,400 emails were processed on this run.
The top sources of spam, based on this report, are: