Banner ad



comments

September 3, 1994
 
Printer-friendly version
By watching how spammers fail to create spam on my site, there seem to be three different types of spam creators: Playback spambots, form-filling spambots, and humans.

Playback bots

These are bots which have recorded POST data which they replay back to the form submission URL. A person visits the form the first time, and records the form data. Certain fields are marked as slots to be filled in with randomized spam later, but the structure of the form is played back verbatim each time. This includes the names of the fields, and the contents of hidden fields.

These bots don't even bother looking at the form as served by the site, but blindly post their canned data to the submission URL. Using unusual field names to avoid these bots will only work for a week or so, when they will then record the new field name, and begin posting with it.

A playback bot can be stopped by varying the hidden data on the form so that it will not be valid forever. A timestamp is a simple way to do this, making it possible to detect when old data is being replayed. The timestamp can be made tamper-proof by hashing it with a secret and including the hash in the hidden data of the form. Replaying can be further hindered by including the client's IP address in the hash, so that data can't even be immediately replayed across an army of spambots.
Form-filling bots

These bots read the form served by the site, and mechanically fill data into the fields. I don't know if they understand common field names (email, name, subject) or not. On my site, I've observed bots that look at the type of the field, and fill in data based on the type. Single-line edit controls (type=text) get name, email, and subject, while textareas get the comment body. Some bots will fill the same data into all the fields of the same type, while others will enter (for example) different first names into each of the single-line fields.

Form-filling bots can be stopped by including editable fields on the form that are invisible to people. These fields are called honeypots and are validated when the form data is posted. If they contain any text, then the submitter must be a bot, and the submission is rejected.

Using randomized obscured field names, and strict validation can also stop these bots. If the email field must have an @-sign, and the name field must not, and the bot can't tell which field is email and which is name, then the chances it will make a successful post have been greatly reduced.
Humans

These are actual people using your form. There's nothing you can do to stop them, other than to remove the incentive. They want link traffic. Use the rel="nofollow" attribute on all links, and be clear that you are doing it.




Comments

Post new comment

The content of this field is kept private and will not be shown publicly.
CAPTCHA
This question is for testing whether you are a human visitor and to prevent automated spam submissions.