Saturday, August 9, 2008

Is one of your machines secretly a spambot?

Some times we just need facts on the table, automated.

In my previous blog post, I wondered aloud about publishing data about the machines that verifiably tried to spam us. The response was other than overwhelming, and with the script running once per day anyway, I now publish the results via the Name And Shame Robot page.

The annoucement below is very close to the text there, so by way of explanation, here is a gift to all my fellow spamd junkies out there:

We started actively greytrapping and publishing our list of greytrap addresses (almost exclusively addresses generated or made up elsewhere and harvested from our logs) during July 2007. The list of greytrap addresses is published on the Traplist page along with some commentary. You can find related comments in this blog post and its followups.

One byproduct of the greytrapping is a list of IP addresses that has tried to deliver mail to one or more of our greytrap addresses during the last 24 hours. The reasoning is, none of these addresses are valid, and any attempts at delivering to those addresses is more likely than not spam. You can download that list here as a raw list of IP addresses, or as a DNS zone file intended as a DNS blacklist here.

In early August 2008, I wrote a small script that copies (rsyncs, actually) the current list of trapped IP addresses as well as the spamd log off the firewall and for each IP address collects all log entries from the spamd log. The resulting file is rsynced back to the webserver, and you can view the latest version here.

The material here is useful mainly to the system administrators responsible for the machines that appear in it, or people who are interested in studying spammer or spambot behavior. Times are given according to the Europe/Oslo time zone (CET or CEST according to season), and if a date appears several times for an IP address entry, the reason is simply that the log data spans several years. The default syslog settings do not record the year.

In the data you will find several kinds of entries, most of them are pretty obvious and straightforward, others less so. The likely FAQ is, "what are the entries with no log data?". The answer is, the spamd here synchronizes with a spamds at other sites. The entries without log data entered our traplist through a sync operation, but the host did not attempt direct contact here.

The other likely question is, "what is that becks list?". It's what the rest of the world refers to as uatraps. I copied the data for that list into my config from Bob Beck's message on OpenBSD-misc and didn't notice that the list had an official name until much later.

Please note that this is not an up-to-the minute list. Depending on the number of hosts currently in the list of trapped addresses, the script's run time could be anything up to several hours. For that reason, the script starts at the time stated at the beginning of the report file and runs until it finishes generating. The last thing the script does is to rsync the report file to the webserver. For the time being, I archive older versions off-line.

This is now a totally hands-off, automated operation. The report is currently generated on a Pentium IV-class computer with few and only occasional other duties. If you have any comments or concerns, the address in the next sentence is the one I use for day to day email. If you find this data useful, donations of faster hardware or money (paypal to peter@bsdly.net or contact me for bank information) is of course welcome.

No comments:

Post a Comment

Note: Comments are moderated. On-topic messages will be liberated from the holding queue at semi-random (hopefully short) intervals.

I invite comment on all aspects of the material I publish and I read all submitted comments. I occasionally respond in comments, but please do not assume that your comment will compel me to produce a public or immediate response.

Please note that comments consisting of only a single word or only a URL with no indication why that link is useful in the context will be immediately recycled so those poor electrons get another shot at a meaningful existence.

If your suggestions are useful enough to make me write on a specific topic, I will do my best to give credit where credit is due.