I've been having pretty good success with MT-Blacklist in blocking (or removing) comment spam.
The approach it takes is interesting. Rather than trying to track IP addresses (which can be spoofed) or e-mail addresses (which can be spoofed even more easily), it looks at the URLs linked in the comment. If any of them are on the blacklist, the comment gets blocked.
(You can also use that to block certain text, but that's not the way it's designed to work.)
So, why not take that approach with e-mail spam? Aside from viruses and Trojan horses and the like, what really sets e-mail spam apart is the links to the commercial sites. The verbiage it's all wrapped up in makes little difference (as seen in some recent efforts to spoof Bayesian filters). If the link isn't usable and visible, the e-mail has done no good. And since URLs must be established to do any good, and that costs money and takes effort, it seems like a better way to strike back at spammers.
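To make the idea concrete, here is a minimal sketch in Python of what a URL-only mail filter might look like. It is not how MT-Blacklist itself is implemented; the blacklist entries, the regular expression, and the function names are all made up for illustration, and a real filter would also need to handle HTML mail and obfuscated links.

```python
import re
from urllib.parse import urlsplit

# Hypothetical blacklist entries, for illustration only.
BLACKLIST = {"spam-pills.example", "cheap-loans.example"}

# Deliberately simple pattern: grabs http/https links from plain text.
URL_RE = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)


def linked_hosts(body):
    """Return the set of lowercased hostnames linked from a message body."""
    hosts = set()
    for url in URL_RE.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:
            # Malformed URL (e.g. a broken IPv6 literal); skip it.
            continue
        if host:
            hosts.add(host.rstrip(".").lower())
    return hosts


def is_blacklisted(host, blacklist=BLACKLIST):
    """True if the host, or any parent domain of it, is on the blacklist."""
    parts = host.split(".")
    return any(".".join(parts[i:]) in blacklist for i in range(len(parts)))


def is_spam(body, blacklist=BLACKLIST):
    """Binary verdict: spam if any linked host is blacklisted."""
    return any(is_blacklisted(h, blacklist) for h in linked_hosts(body))
```

Under these assumptions, `is_spam("Buy now at http://www.spam-pills.example/offer")` is true because `www.spam-pills.example` falls under a blacklisted domain, while a message with no links, or only links to unlisted hosts, passes no matter what the surrounding text says.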
Are there any mail spam products that focus just on URLs? It really seems to me that would be a superior approach. Unless I'm missing something.
If there's something like this, I'd love to see it, too.
It just seems to me to be a very effective approach, adding a cost (time and money) to spammers and relying on something much harder to spoof.
Here are SpamAssassin's URL tests. Note the high positive numbers. What this means is your intuition holds.
Type | Description | Test name | Scores
uri | Uses a numeric IP address in URL | NUMERIC_HTTP_ADDR | 2.899 2.800 2.696 0.989
uri | Uses a dotted-decimal IP address in URL | NORMAL_HTTP_TO_IP | 0.427 0.211 0.617 0.100
uri | 'remove' URL contains an email address | HTTP_WITH_EMAIL_IN_URL | 0.569 0.156 0.249 0.200
uri | Uses %-escapes inside a URL's hostname | HTTP_ESCAPED_HOST | 1.101 2.403 1.001 1.509
uri | Uses control sequences inside a URL hostname | HTTP_CTRL_CHARS_HOST | 0.211 0 0 0
uri | Completely unnecessary %-escapes inside a URL | HTTP_EXCESSIVE_ESCAPES | 0.153 0.748 0.502 0.777
uri | URL uses words/phrases which indicate porn | PORN_4 | 1.375 1.302 2.498 1.888
uri | Frequent Spam content | WWW_CLIK4YOU_COM | 1
uri | Dotted-decimal IP address followed by CGI | IP_LINK_PLUS | 1
uri | URL of CGI script has unsubscribe or remove | UNSUB_SCRIPT | 0.865 0.691 1.156 1.999
uri | URL of page called "unsubscribe" | UNSUB_PAGE | 1
uri | URL of page called "remove" | REMOVE_PAGE | 1.052 0.822 1.189 0.501
uri | Includes a link to send a mail with a subject | MAILTO_WITH_SUBJ | 1
uri | Includes a link to a likely spammer email | MAILTO_TO_SPAM_ADDR | 0.496 1.052 0.286 0
uri | Includes a 'remove' email address | MAILTO_TO_REMOVE | 0.855 0.037 0.656 0
uri | Javascript protocol in a URI | JAVASCRIPT_URI | 1
uri | Uses non-standard port number for HTTP | WEIRD_PORT | 1.345 1.944 0.554 1.407
uri | URL contains username and (optional) password | USERPASS | 2.869 3.078 2.337 3.806
uri | Filename is just a '#'; probably a JS trick | URI_IS_POUND | 0 0.185 0 0
uri | Frequent Spam content | BTAMAIL_URL | 2.900 2.800 2.696 2.700
uri | Frequent Spam content | CHINA_URL | 2.900 0 0 0
uri | Includes a link to a likely spammer email | MAILTO_TO_B2BMAIL | 1
uri | Spam URL pattern, DailyPromotions redirect | DAILY_PL | 2.900 2.800 0 0
uri | Spam URL pattern, DailyPromotions server link | DAILY_PXE | 2.900 2.796 0 0
uri | Includes a link to a likely spammer domain | E_MAILPROMO_URL | 1
uri | Includes a link to a likely spammer domain | BARGAIN_URL | 2.899 2.796 2.274 2.700
uri | Frequent Spam Content | URI_PXLG | 2.900 2.800 2.800 2.700
uri | Contains a URL in the BZ top-level domain | BZ_TLD | 2.899 2.796 3.680 3.350
uri | Contains a URL in the BIZ top-level domain | BIZ_TLD | 0.747 0.784 0.782 0.100
uri | URI obscured with character entities | HTTP_ENTITIES_HOST | 1.059 0 1.719 1.920
uri | Message has URI for dollarmachine | URI_DOLLARMACHINE | 2.900 2.800 2.800 2.700
uri | Message has URI for hitbox.com | URI_HITBOX | 0 2.425 0 0
uri | Has Yahoo Redirect URI | YAHOO_REDIR | 4.300 2.621 4.100 4.100
uri | Message has link to mortgage URI | MORTGAGE_LINKS | 2.900 2.800 2.800 2.700
uri | Message has link to company offers | URI_OFFERS | 2.294 2.373 1.780 1.001
uri | Message has URI for bannedcd | URI_BANNEDCD | 2.900 2.800 2.800 2.700
uri | Message has URI for freeht | URI_FREEHT | 2.900 2.800 2.800 2.700
uri | Message has URI 4you | URI_4YOU | 1.705 1.341 2.696 2.513
Well, it looks like it's making weighted guesses on URL structure, which is probably worth something. I'd just as soon be able to easily identify URLs in a binary fashion.
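For anyone curious what "weighted guesses" amounts to in practice, here is a rough Python sketch. The test names and scores are taken from the first score column of the listing above, but the checks themselves are simplified stand-ins I wrote for illustration, not SpamAssassin's actual patterns, and the 5.0 threshold is an assumption on my part.

```python
import re
from urllib.parse import urlsplit

DOTTED_IP = re.compile(r"\d{1,3}(\.\d{1,3}){3}")
REQUIRED_SCORE = 5.0  # assumed threshold for calling a message spam


def _port(parts):
    """Return the explicit port, or None if absent or malformed."""
    try:
        return parts.port
    except ValueError:
        return None


# (test name, score, stand-in check). Scores come from the listing above;
# the checks are illustrative approximations only.
RULES = [
    ("NUMERIC_HTTP_ADDR", 2.899, lambda p: (p.hostname or "").isdigit()),
    ("NORMAL_HTTP_TO_IP", 0.427,
     lambda p: bool(DOTTED_IP.fullmatch(p.hostname or ""))),
    ("USERPASS", 2.869, lambda p: p.username is not None),
    ("WEIRD_PORT", 1.345, lambda p: _port(p) not in (None, 80, 443)),
]


def score_url(url):
    """Return (total score, names of the tests that fired) for one URL."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return 0.0, []
    hits = [(name, score) for name, score, check in RULES if check(parts)]
    return sum((s for _, s in hits), 0.0), [n for n, _ in hits]
```

With these stand-ins, `http://user:pw@10.0.0.1:8081/x` trips three tests and still lands just under the assumed threshold, so the URL alone would not condemn the message. That is the contrast with a blacklist lookup, which gives a flat yes or no on the link itself.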
Heh. Popfile read the notification from your comment as spam, Rich. :-)
I am on an open source project that deals with blog spam. It is hosted on SourceForge, and part of it is an e-mail discussion group. We had to come up with roundabout ways to refer to a certain Pfizer drug that deals with ED, because messages that name it get auto-deleted from the group. Thus, how do you discuss spam without being treated as spam?
That's actually pretty funny.
"To know recursion, you must know recursion."
I suppose you have to come up with your own nickname for it. "Bobdole" or something.
I can see the PDR now.
Known side effects: Dry mouth, high blood pressure, and a propensity to talk about oneself in the third person.
I note, looking at a review of CipherTrust's Ironmail 4.0 (a corporate anti-spam appliance) that it includes "URL filtering that can block messages that refer up to 18,000 URLs that spammers are trying to get users to visit." Among, of course, many other things.
I'd be interested in something that did that on the client end, though.