Update: This essay is now up at WLTC. If you like it, go vote for it!
They’re everywhere, and they’re annoying. They’re called CAPTCHAs and they’ve become a ubiquitous part of blog commenting. Bloggers use them as a quick and dirty solution to an annoying problem without consideration for the annoyance they will cause the reader.
I want to persuade all bloggers who are using them to please stop.
What are they?
CAPTCHA stands for “completely automated public Turing test to tell computers and humans apart”. I know it should really be CAPTTTTCHA but hey, I didn’t come up with the acronym.
Before bigots destroyed his life, Alan Turing posited the idea of a test to determine machine sentience. His test was designed to decide if a computer had achieved artificial intelligence. So far no computer has passed a Turing test, but the CAPTCHA uses the idea of a Turing test in reverse, testing if a supposed person is really a person and not a computer program pretending to be a person.
So a CAPTCHA is a test to make sure the person posting a comment (or anything else, but I’m concentrating on the blogging usage here) is really a person, and not a spam generator trying to post comments about card games, prescription drugs or sex. It usually involves an image showing some distorted text, requiring the user to type in what they see in the distorted text.
Why are they bad?
Anything that stops spammers is good, right? Well generally yes, but some things that stop spammers are better than others; so much better that the inferior solutions become unnecessary. There are many problems with CAPTCHAs:
- Any extra work required to comment is likely to deter some people from commenting at all.
- Sometimes the images are so distorted they’re almost impossible to read, even with perfect eyesight.
- CAPTCHAs are hackable. Spammers are smart, and they can get past many of our barriers.
- Visually impaired users are completely excluded (although there are audio CAPTCHAs available now).
- Dyslexics have a hard time too.
- There are better and less intrusive solutions.
What are these better solutions?
Hopefully by now I’ve convinced you that CAPTCHAs are not the best solution to the spam flood. Now it’s time to bring in the alternatives, but before I offer them, we should decide what our requirements are. An effective and non-intrusive spam blocker should:
- Require nothing or as little as possible from the valid commenter.
- Require as little effort as possible from the webmaster/blog owner.
- Work on as many blog platforms as possible, or have similar alternatives for other blogging platforms.
- Stop as much spam as possible.
- Not interfere with valid comments.
Here are the solutions which I feel best meet these requirements:
Centralized spam database
This is what I use, in the form of Akismet. The idea is that all spam comments get submitted to a central server. Each time someone comments on your blog the comment gets checked against the central database. If the comment looks like spam it is automatically flagged as such. The person leaving the comment didn’t have to do anything. The blogger just has to check for false positives occasionally. Everybody is happy.
So far Akismet has stopped over 15,000 comments from being published on my blog, with about three false positives (comments marked as spam which were not spam) that I know of and about five false negatives (spam comments that did not get marked as spam).
Akismet is designed for WordPress but will work with other blogging platforms, because the API is open.
The downside of this solution is the reliance you have on a central database. If the database goes down or disappears altogether then the spam flood will begin again. But while it’s around, why not take advantage of it?
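If you are curious what "checking against the central database" looks like from the blog's side, here is a minimal Python sketch of one check. It assumes the Akismet REST endpoint and field names as I understand them (`comment-check`, `blog`, `user_ip`, `user_agent`, `comment_content`, with a plain `true` or `false` reply), so confirm them against the official documentation before relying on it. The function names are my own, and nothing is sent over the network until you call `check_comment`.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_check_request(api_key, blog_url, user_ip, user_agent, content):
    """Build (but do not send) a POST request asking whether a comment is spam."""
    # Assumption: the API key forms the subdomain of the endpoint.
    url = f"https://{api_key}.rest.akismet.com/1.1/comment-check"
    body = urlencode({
        "blog": blog_url,            # front page of the blog being protected
        "user_ip": user_ip,          # IP address of the commenter
        "user_agent": user_agent,    # User-Agent header sent by the commenter
        "comment_type": "comment",
        "comment_content": content,  # the comment text itself
    }).encode("utf-8")
    return Request(url, data=body, method="POST")


def is_spam(response_text):
    """Interpret the reply body: 'true' means the service thinks it is spam."""
    return response_text.strip().lower() == "true"


def check_comment(request, send=urlopen):
    """Send the request and return True if the comment was flagged as spam."""
    with send(request) as response:
        return is_spam(response.read().decode("utf-8"))
```

The commenter never sees any of this; the only human work left is the blogger's occasional look through the spam queue for false positives.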
Comment analysis programs
Programs like the Bad Behavior plugin for WordPress take all comments received and analyze them for telltale signs of spamminess. Using data hidden in the HTTP headers, like user agent information, it is possible to tell if a comment came from a legitimate user or a spambot.
The downside of this kind of solution is that it has to be smarter than the spammers, and spammers are smart. Bad Behavior works very well though, or so I’ve heard; Akismet takes care of things so well that I haven’t needed extra solutions.
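To give a feel for header analysis, here is a toy Python sketch. These are not the rules any real plugin uses; the three checks and the list of suspicious user agent fragments are assumptions I made up for illustration, and a real tool needs far more (and far subtler) signals.

```python
# Hypothetical examples of user agents sent by scripts rather than browsers.
SUSPICIOUS_AGENT_FRAGMENTS = ("curl", "python-requests", "libwww-perl")


def looks_like_bot(headers):
    """Guess from the HTTP request headers whether a comment came from a bot."""
    # Header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    agent = normalized.get("user-agent", "").strip().lower()
    if not agent:
        return True  # real browsers always identify themselves
    if any(fragment in agent for fragment in SUSPICIOUS_AGENT_FRAGMENTS):
        return True  # a scripting library, not a browser
    if "accept" not in normalized:
        return True  # browsers say what content types they accept
    return False
```

A spammer can fake every one of these headers, which is exactly why this kind of tool has to keep getting smarter.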
Filtering, whitelisting and blacklisting
If your spam problem isn’t big enough to warrant external tools, you can probably get a fairly good spam filter going just with what your blogging software offers natively. You should be able to filter out comments which contain common spammy words (like phentermine, poker, viagra, holdem, etc.).
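Most blogging software does this for you from a settings page, but the idea is simple enough to sketch in a few lines of Python. The word list is the one from the paragraph above; the function name is just my own, and matching whole words only is a design choice to avoid flagging innocent words that merely contain a spammy one.

```python
import re

SPAM_WORDS = ("phentermine", "poker", "viagra", "holdem")

# Match whole words only, ignoring case, so "POKER" is caught but "poked" is not.
_SPAM_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in SPAM_WORDS) + r")\b",
    re.IGNORECASE,
)


def contains_spam_word(comment_text):
    """Return True if the comment mentions any word on the spam list."""
    return _SPAM_PATTERN.search(comment_text) is not None
```

A comment that trips the filter is better sent to moderation than deleted outright, because somebody, someday, will want to talk about their poker night.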
If spam is still getting through you can look at whitelisting; maybe your blog has an option like “only allow comments from people who have commented before” which is like an automatic whitelist after the first moderated comment is approved.
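That automatic whitelist is easy to picture in code. Here is a minimal Python sketch, assuming commenters are identified by email address; the class and method names are hypothetical, and a real blog would store the approved list in its database rather than in memory.

```python
class CommentModerator:
    """Hold first-time commenters for moderation; publish known ones directly."""

    def __init__(self):
        self.approved_authors = set()

    @staticmethod
    def _key(author_email):
        # Treat "Bob@Example.com " and "bob@example.com" as the same person.
        return author_email.strip().lower()

    def submit(self, author_email):
        """Return 'publish' for whitelisted authors, otherwise 'hold'."""
        if self._key(author_email) in self.approved_authors:
            return "publish"
        return "hold"

    def approve(self, author_email):
        """Called when the blogger approves a held comment."""
        self.approved_authors.add(self._key(author_email))
```

The reader only waits once; after their first comment is approved, everything else they post appears immediately.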
Blacklisting is trickier, but if you see spam constantly coming from the same source then you can blacklist that source. Most spammers will get around this easily though.
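If the "source" is an IP address or a range of them, a blacklist check can be sketched in Python like this. The addresses below are reserved documentation ranges standing in for real offenders, and the function name is my own.

```python
import ipaddress

# Placeholder entries: a whole range and a single address.
BLACKLISTED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_blacklisted(ip_text):
    """Return True if the commenter's IP falls inside a blacklisted range."""
    try:
        address = ipaddress.ip_address(ip_text)
    except ValueError:
        return False  # not a valid IP address, so it cannot match the list
    return any(address in network for network in BLACKLISTED_NETWORKS)
```

It also shows why spammers get around blacklists so easily: all they need is a different address.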
For a list of other spam busters, you can try this page, which is for WordPress, but the concepts still apply to other blog platforms.
CAPTCHAs are bad. They don’t test for humans; they test for smart, non-lazy humans with good eyesight, and for smart spambots that have CAPTCHAs all figured out. They are at best an annoyance and at worst discriminatory.
Using some or all of the suggestions I offered above, you can eliminate your spam problem without making your readers jump through hoops and without losing your own time dealing with the problem. If your chosen blogging platform doesn’t support these solutions, then think seriously about changing your platform. I heartily recommend WordPress for all your blogging needs, either hosted or your own installation.
My final piece of advice is for quitters. If you give up trying to deal with comment spam, or you give up blogging completely, please please please remember to disable commenting before abandoning your blog. Every spam comment that gets published is a victory for the spammers.
NB: This post is longer than my usual offerings because it’s my entry into the WLTC blogging essay competition.