Beginners Guide: Preventing Comment Spam - Webmaster Forum - Web Design, Programming and SEO forums

10-09-2007, 04:05 PM

PIXL

Beginners Guide: Preventing Comment Spam
provided by dizbe.com

Spam is no longer limited to email. If you run a Website on which you allow users to leave comments, you have undoubtedly faced the problem of comment spam.

The obvious initial conclusion is that spammers want to redirect some of your traffic to their sites, but their real aim is to increase their (or their clients’) ranking in search engines. Most search engines now factor the number of other Websites that link to a site into that site’s ranking. By leaving links in comments on your site, the spammers’ sites can achieve a slightly higher search engine ranking.
The spammers’ job is to get around spam-blockers and target the security of individual Websites. Though they occasionally do so manually, by far the most common forms of comment spam are produced by spam “bots” or scripts. Unfortunately, many site owners don’t run their Websites as their day job, which can make keeping up with spam bots difficult.

Rules of Thumb

When you find that your site is the victim of comment spam, it’s easy to react strongly, on a per-case basis, rather than look at the bigger picture. These Rules of Thumb should help you keep things in perspective.
The most important of these rules is: don’t take it personally. Spammers don’t want to degrade your site. They simply want to get people to their sites and make a larger profit.

1. Don’t Ban Specific IP Addresses

Don’t bother banning IP addresses. Although this is the most logical thing to do, it rarely helps much. Most comment-spammers bounce requests off other computers and servers, so you’ll likely never be able to eradicate them from your site entirely.
As an interview with a comment spammer at The Register explained, “So Sam (a comment spammer), like other link spammers, uses the thousands of ‘open proxies’ on the net. These are machines which, by accident (read: clueless sysadmins) or design (read: clueless managers) are set up so that anyone, anywhere, can access another Website through them. Usually intended for internal use, so a company only needs one machine facing the net, they’re actually hard to lock down completely.”

2. Don’t Allow HTML

There is no reason for a user of your site to be able to write HTML or JavaScript in a comment. This should be obvious, but there are many Websites that allow users to use either, or both, of these languages.

If you feel the need to allow the user to include links, there are a number of ways by which you can code to accommodate that functionality, without making your site vulnerable to attack. The most common method is to inform the user that all URLs will be converted to links automatically, then convert any content that starts with http:// to a link.
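The conversion described above can be sketched in a few lines of Python. This is only an illustration, not a complete sanitizer: the function name render_comment and the URL pattern are assumptions made for this example, and the pattern also accepts https://, which the text above does not mention. All markup is escaped first, so anything the user typed as HTML is displayed as plain text, and only then are bare URLs wrapped in links.

```python
import html
import re

# Deliberately simple pattern (an assumption for this sketch): a URL is
# "http://" or "https://" followed by any run of non-whitespace characters.
URL_PATTERN = re.compile(r"https?://\S+")

def render_comment(raw_text: str) -> str:
    """Escape all user-supplied markup, then turn bare URLs into links."""
    # Step 1: neutralise HTML and JavaScript by escaping <, >, &, and quotes.
    escaped = html.escape(raw_text, quote=True)

    # Step 2: wrap each URL in an anchor tag. The text is already escaped,
    # so the URL cannot break out of the href attribute.
    def make_link(match: re.Match) -> str:
        url = match.group(0)
        return f'<a href="{url}">{url}</a>'

    return URL_PATTERN.sub(make_link, escaped)

print(render_comment("<b>hi</b> see http://example.com/page"))
```

Escaping before linking is the important design choice here: doing it the other way round would escape the anchor tags the function itself generates.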

3. Use Non-Descriptive Form Names

Good programming requires the use of descriptive names, but in avoiding comment spam, you should stay away from names that describe a form’s fields. Form element names like “Comment” make it too easy for spam bots to find and fill in your comment form.
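One possible way to keep descriptive names in your own code while exposing only opaque names in the form is a small lookup table on the server. The field names below (f_k3, f_q9, f_z1) and the function decode_submission are made up for this sketch; any names that reveal nothing about a field's purpose would do.

```python
# Hypothetical mapping from the opaque names used in the HTML form
# to the descriptive names used inside the application.
FIELD_MAP = {
    "f_k3": "author",
    "f_q9": "email",
    "f_z1": "comment",
}

def decode_submission(form: dict) -> dict:
    """Translate opaque form-field names back to descriptive internal names.

    Fields missing from the submission come back as empty strings, and
    surrounding whitespace is trimmed.
    """
    return {
        internal: form.get(opaque, "").strip()
        for opaque, internal in FIELD_MAP.items()
    }

print(decode_submission({"f_k3": "Ann", "f_z1": " Hello "}))
```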

4. Use rel="nofollow" for All Links
If you allow site users to include links in their comments, add rel="nofollow" to the tag, as shown below:

<a href="http://www.dizbe.com" rel="nofollow">dizbe</a>

This technique tells search engine bots to disregard the link, so the spammer gains no benefit from adding links to your comments.
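If your links are generated by code, the attribute can be added in one place. The helper below is a sketch under assumptions: the name nofollow_link is invented for this example, and restricting links to http:// and https:// is an extra precaution that is not part of the rule above.

```python
import html

def nofollow_link(url: str, label: str) -> str:
    """Build an anchor tag for a user-supplied link, marked rel="nofollow"."""
    # Extra precaution (an assumption of this sketch): refuse schemes such
    # as "javascript:" by allowing only ordinary Web addresses.
    if not url.startswith(("http://", "https://")):
        raise ValueError("only http:// and https:// links are allowed")

    # Escape both parts so user input cannot inject extra attributes or tags.
    safe_url = html.escape(url, quote=True)
    safe_label = html.escape(label)
    return f'<a href="{safe_url}" rel="nofollow">{safe_label}</a>'

print(nofollow_link("http://www.dizbe.com", "dizbe"))
```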

Stopping Spam
You can stop comment spam via two approaches. The first tackles the problem before the comment is posted; the second addresses spamming after the fact.

1. Differentiate Between Spammers and Regular Users
Differentiating between spammers and regular users involves requiring your human posters to identify themselves as such through an extra step inserted in the commenting process. This is possibly the most widely used approach to avoiding comment spam, and includes two options.

The Turing Test
The most commonly used Turing Test (named after computer scientist Alan Turing) is called CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart).
It involves adding to your site’s comments area an image that contains a random piece of text. The text must be distorted or blurred just enough that a human can still read it, but a computer cannot. The commenter is asked to copy this text exactly into a form field before they submit the form.
This process makes it very likely that the commenter is a person, not a bot, although it does not stop a human being from spamming your comments section manually. As most spamming is carried out automatically by bots, this technique should stop most comment spam.
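The server-side half of such a test can be sketched as follows. This covers only generating the random text and checking the answer; drawing the distorted image is left out, as it needs an imaging library. The names new_challenge and check_answer are invented for this example, and ignoring case and surrounding spaces is a leniency chosen here, not something the description above requires.

```python
import hmac
import secrets
import string

# Leave out characters that are easily confused once distorted (O/0, I/1).
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "O0I1"
)

def new_challenge(length: int = 6) -> str:
    """Return random text to draw into the image and store in the session."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def check_answer(expected: str, submitted: str) -> bool:
    """Compare what the user typed with the stored challenge text."""
    # Case and surrounding spaces are ignored (a choice made in this sketch).
    typed = submitted.strip().upper().encode()
    return hmac.compare_digest(expected.upper().encode(), typed)

challenge = new_challenge()
print(challenge, check_answer(challenge, challenge.lower()))
```

The expected text must stay on the server (for example in the visitor's session) and should be discarded after one attempt, otherwise a bot could solve one image and reuse the answer.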

Advantages
Once it is in place, constant updating of your code is not necessary, as this technique does not depend on recognizing the content of the spam.

Disadvantages
Users are forced to perform an extra step, which does not benefit them, in order to submit their comments. This can be a serious drawback if your Website is just starting out and you’re trying to encourage people to comment.
More information on CAPTCHAs can be found at captcha.net. A free resource that explains how to use CAPTCHAs on your site is available at Human Verify.

User Authentication
Using this method, the site owner requires all users to set up a username and password before they can comment on the site. If the administrator then finds a user spamming the site, he or she can ban that username or email address.
This approach works in two ways. Firstly, spammers don’t want to be identified and therefore will be unlikely to sign up. Secondly, even if they do, the hurdle of having to take the time to sign up in order to spam (and be banned immediately) can be a strong deterrent. After all, there are many easier targets online than a site that has a user authentication system in place.

Advantages
Though code-driven, this solution involves a database of users and simplified user management, so it’s not too time-consuming. The banning of offending usernames may take some time, however.

Disadvantages
Before they can post, users are forced to perform numerous extra steps, for which they may see little benefit. This can be a serious drawback if your site is just starting out and you are trying to encourage usage.
One way to make the system run a little more smoothly may be to use cookies to automatically sign in repeat users each time they visit the site. This way, when they go to post, they need not be faced with the login form: they’re already logged in and can just start to post. This option doesn’t alleviate the initial signup process, however.
More information on creating a user authentication system can be found at Developer Fusion.
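The cookie-based sign-in and the banning of usernames described in this section could be combined roughly as below. This is a sketch under stated assumptions, not a full authentication system: SECRET_KEY, BANNED_USERS, and both function names are hypothetical, and a real site would also give the cookie an expiry date. The cookie carries the username plus a signature, so a visitor cannot simply type someone else's name into it.

```python
import hashlib
import hmac
from typing import Optional

SECRET_KEY = b"replace-with-a-long-random-secret"  # hypothetical value
BANNED_USERS = {"spammer42"}                       # hypothetical ban list

def _sign(username: str) -> str:
    """Return a hex signature that only the server can produce."""
    return hmac.new(SECRET_KEY, username.encode(), hashlib.sha256).hexdigest()

def make_login_cookie(username: str) -> str:
    """Build the cookie value to set after a successful login."""
    return f"{username}:{_sign(username)}"

def read_login_cookie(cookie: str) -> Optional[str]:
    """Return the username if the cookie is genuine and the user is not banned."""
    username, separator, signature = cookie.rpartition(":")
    if not separator:
        return None  # malformed cookie
    if not hmac.compare_digest(signature.encode(), _sign(username).encode()):
        return None  # forged or corrupted cookie
    if username in BANNED_USERS:
        return None  # banned users must not be signed in automatically
    return username

cookie = make_login_cookie("alice")
print(read_login_cookie(cookie))
```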

2. Catch Comment Spam After It Has Been Added

Catching comment spam will be necessary if you decide not to differentiate between spammers and human users. It may also be necessary if you have taken the steps above – some comment spam is almost inevitable.
This approach involves the creation of a check that occurs after the comment is submitted to identify whether it is spam or a legitimate post. Of course, you can go through posts manually before they’re made live, checking to ensure they’re not spam. But you can also automate the process: create a list of keywords that are common to spam, and check each post against this list. You can then weed out any comments that contain the offending words (which might include terms like Viagra, gambling, poker, meds, etc.).
This comparison can be done in various ways, and at a number of points during comment processing. Most programming languages make it very easy to check a string for given keywords. Make sure that the comparison ignores case, by converting both the comment and the keywords to lowercase (or uppercase) before you run it. If the comment is found to contain any of the keywords, the poster can be warned, and the comment deleted.
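A minimal version of this keyword check might look like the following. The keyword list is taken from the examples above; the function names are made up for this sketch. The comment is lowercased and split into whole words before comparison, which is one choice among several: it avoids flagging a harmless word that merely contains a keyword, but it also misses deliberately mangled spellings.

```python
import re

# Keywords from the examples in the text; a real list would be longer
# and would need regular updating.
SPAM_KEYWORDS = {"viagra", "gambling", "poker", "meds"}

def find_spam_keywords(comment: str) -> set:
    """Return the spam keywords that appear as whole words in the comment."""
    # Lowercase first so that "VIAGRA", "Viagra" and "viagra" all match.
    words = set(re.findall(r"[a-z0-9]+", comment.lower()))
    return words & SPAM_KEYWORDS

def is_spam(comment: str) -> bool:
    """True if the comment contains at least one listed keyword."""
    return bool(find_spam_keywords(comment))

print(is_spam("Cheap VIAGRA and Poker!"))
print(is_spam("Great article, thanks for sharing!"))
```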

Advantages
This approach does not require the commenter to take any extra steps, so the comments section remains simple and easy to use.

Disadvantages
As the spammer changes the words used by the spam bot, your keyword list must also be updated. This technique will also be difficult to implement if the spammer advertises products that are relevant to your Website, because your list of banned words might stop legitimate comments from being posted.

Managing Comment Spam
Spam will always be a problem. However, a well-designed site that has taken into consideration the common spamming techniques will be able to avoid most spam. The techniques we’ve explored here should help site owners battle comment spam effectively.
Ultimately, the Webmaster needs to adapt his/her techniques to deal with spam on an ongoing basis. The secret to success, then, is continual monitoring and adaptation to spammers’ changing tactics.

10-17-2007, 11:12 PM

Tiny

Brilliant, hope it will work. Hate spam and spammers, no need for it at all.

10-30-2007, 02:15 PM

JochenVDV

Great article on how to prevent comment spam, thanks for sharing!

10-31-2007, 03:34 PM

azariy

Nice article, and helpful.

11-29-2007, 12:33 PM

maZtah

I always use CAPTCHA for preventing spam. Also, another simple method to stop spam is to add another field to your form with a simple (math) question like: 3 + 6? The user can fill in 9, whereas a spambot doesn't know what's being asked.