Spam bots are evil.
A few years back I was a member on a little-used forum (vBulletin powered). The owner only logged in every couple of weeks or so. Without a helper the forum spam would pile up--until the owner logged in and cleared it out. I volunteered to help and he gave me access to some of the inner workings of the system.
By carefully reading the daily signups in the user database I learned some fascinating things.
- That forum had about 10x as many bots sign up as ever posted anything.
- Setting a timezone was optional during signup. The first timezone in the list was something crazy like 14 hours ahead of GMT, which is a cluster of tiny islands in the equatorial Pacific. Everyone, even the guy who was banned and kept coming back to sign up for new accounts, set the timezone selector--except the spam bots. Not a single spam bot bothered to set the timezone.
- The owner had added a nonsense question to the signup process and made it required: "If you could transform into anything at all, what would it be?" You'd think there's no wrong answer to a question like that, but spam bots always answered the same thing--their user name. Canpharm576's answer would be Canpharm576.
- The majority of spam signups didn't post, but they did add a link on their new user profile that pointed to an obviously spammy website. Not a strong indicator on its own, as many humans link to websites too. About 33% of humans added a link, versus about 98% of spam bots.
- That rev of vBulletin kept track of time to the nearest minute, and it kept track of when the new user downloaded the signup page and when they submitted it. Human: two or three minutes might pass. Spam bot: most would download and submit within the same minute. So the time a bot takes to fill out the form is probably a fraction of a second. The total time was dominated by download and response return times.
- There was a captcha, but I did some Googling and found professional spam-bot software that claimed to be able to read captchas, at least some of the time. Think about it. If the bot only gets the captcha right 10% of the time it's good enough. Bots don't get tired and they don't give up.
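To put a number on "10% is good enough", here is a small sketch of the arithmetic. It assumes each captcha attempt is independent and has the same success rate, which is my simplification and not something I measured on that forum. The function name is made up for illustration.

```python
def chance_of_getting_through(per_try_success: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds.

    Assumption: every try has the same success rate and tries do not
    influence each other.
    """
    if not 0.0 <= per_try_success <= 1.0:
        raise ValueError("per_try_success must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must not be negative")
    # The bot fails overall only if it fails every single try.
    return 1.0 - (1.0 - per_try_success) ** attempts


if __name__ == "__main__":
    for attempts in (1, 10, 50):
        print(attempts, round(chance_of_getting_through(0.10, attempts), 3))
```

Under those assumptions, a bot that is right only one time in ten gets in more often than not after ten tries, and almost always after fifty.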
The captcha is a Turing test. Its purpose is to discern which new user is human and which is a machine. But the whole signup process could be turned into a Turing test. Besides the criteria above, one could also use the following:
- Remember how many pages a new user visits on the site before they hit the signup page. Bots keep lists of signup pages. If a new user suddenly appears on the signup page with no other history, it's probably a bot.
- Have your HTML submit button, but wrap that in a JavaScript button, and then wrap that in a Flash button. When the user submits remember which button they used. Very few bots run JavaScript, and none I know of run Flash.
- Attach JavaScript to text boxes to measure keystroke cadence. Submit the variation in cadence (min/mean/max) with the signup form. Yes, many bots don't run JS, but this could catch those that do.
- Actually measure the time (to the second) the new user takes to fill in the form and submit it.
- Keep your captcha, but instead of treating it as pass/fail, just keep track of how well the new user did. 8 of 9 characters? Not perfect, but okay. Without a single point of failure one can afford to be generous.
- Might even try browser fingerprinting, but that could be overly invasive to real humans and might not be very effective against bots.
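As one example from the list above, timing the form to the second could be done by stamping the signup page with a signed token when it is served and checking it on submit. This is only a sketch using Python's standard library. The key, the token format, and the function names are all hypothetical and have nothing to do with how vBulletin does it.

```python
import hashlib
import hmac
import time

# Hypothetical secret; a real site would load this from private configuration.
SECRET_KEY = b"replace-with-a-real-secret"


def issue_form_token(now=None) -> str:
    """Return 'timestamp:signature' to embed in a hidden field of the signup form."""
    issued_at = int(time.time() if now is None else now)
    signature = hmac.new(SECRET_KEY, str(issued_at).encode(), hashlib.sha256).hexdigest()
    return f"{issued_at}:{signature}"


def seconds_to_submit(token: str, now=None):
    """Return seconds between page load and submit, or None if the token is bad."""
    try:
        issued_text, signature = token.split(":", 1)
        issued_at = int(issued_text)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(SECRET_KEY, issued_text.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison so the signature cannot be guessed piece by piece.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None  # timestamp was tampered with
    current = int(time.time() if now is None else now)
    return current - issued_at
```

Signing the timestamp matters because a bot that notices the hidden field could otherwise just backdate it. A missing or bad token is itself a signal worth recording.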
Some of these measures can trip up blind users. But checking the user's browser ID string could help. Spam bots lie about their browser, but they tend to lie in favor of a popular browser. Blind users who rely on a specialized browser may show up with a rare but easily identifiable ID string, though that won't cover everyone.
I think of this as something like a credit score. Each parameter is worth so many points. If you get a high score you get an account right away (after e-mail verification). A middle score gets set aside for human scrutiny. A low score gets the same success page as everyone else, saying the e-mail's been sent, but in reality the signup gets dropped. This could drive spam-bot operators nuts. Run the bot, get the success message, but no e-mail and no access. Go to the site manually and give the same responses as the bot, get the success message, and get the e-mail and the access. Run the bot, no access. Human, access.
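Here is a rough sketch of what that score could look like. The signals come from the lists above, but every point value, threshold, and field name below is invented for illustration. Real weights would have to be tuned against actual signup data.

```python
from dataclasses import dataclass


@dataclass
class Signup:
    username: str
    timezone_changed: bool      # did they touch the timezone selector?
    nonsense_answer: str        # answer to the "transform into anything" question
    has_profile_link: bool
    seconds_on_form: float      # time between loading and submitting the form
    pages_before_signup: int    # pages visited before reaching the signup page
    captcha_correct: int        # characters typed correctly
    captcha_length: int         # characters in the captcha


def score(s: Signup) -> int:
    """Add up points for human-looking behavior. All weights are made up."""
    points = 0
    if s.timezone_changed:
        points += 20
    if s.nonsense_answer.strip().lower() != s.username.strip().lower():
        points += 20
    if not s.has_profile_link:
        points += 5   # weak signal: plenty of humans add a link too
    if s.seconds_on_form >= 30:
        points += 20
    if s.pages_before_signup > 0:
        points += 15
    if s.captcha_length > 0:
        # Partial credit instead of pass/fail.
        points += round(20 * s.captcha_correct / s.captcha_length)
    return points


def decide(points: int, accept_at: int = 70, review_at: int = 40) -> str:
    """Map a score to one of three outcomes. Thresholds are hypothetical."""
    if points >= accept_at:
        return "accept"        # send the verification e-mail
    if points >= review_at:
        return "review"        # set aside for a human to look at
    return "silent_drop"       # show the success page, send nothing


if __name__ == "__main__":
    bot = Signup("Canpharm576", False, "Canpharm576", True, 0.5, 0, 9, 9)
    human = Signup("birdwatcher", True, "a dragon", False, 150, 4, 8, 9)
    print(decide(score(bot)), decide(score(human)))
```

Note that the bot in this example reads the captcha perfectly and still gets dropped, while the human misses a character and still gets in. That is the point of not having a single point of failure.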