It seems I'm getting a bit preoccupied with spam at the moment. A couple of days ago I posted that I thought
referer spam was being generated using meta-refresh. I've since stumbled across some evidence of this. Going through my server logs, I found a referral from:
**www.rob-smith.com/GlamourPhotographer/BlogIndexer.php**
The *s are to prevent a hyperlink - I won't give him the satisfaction. If you're going to take a look at it, make sure you've disabled meta-refresh in your browser; otherwise you may be generating more spam (and be aware that the site is NSFW).
This appears to be just a gateway page, but it uses a meta refresh to redirect to a random site. At the moment it just goes to the main site, so how do I know it was used for spamming?
This Google search is the giveaway. Google's cache, which is supposedly of that page, is actually of a completely different site: the one that happened to be the random redirect target when Googlebot visited.
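If you'd rather inspect a gateway page like this without letting a browser follow the redirect, a small script can pull out any meta refresh tags from the saved HTML. This is only a rough sketch using Python's standard library; the names `MetaRefreshFinder` and `find_meta_refresh` are my own invention, and it assumes you already have the page source as a string.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collects the content attribute of every <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() == "refresh":
            self.targets.append(attr.get("content", ""))


def find_meta_refresh(html):
    """Return the raw content strings of all meta refresh tags in html."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.targets
```

Each returned string is the raw `content` value (a delay, usually followed by the target URL), so you can see where the page would send a visitor without actually going there.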
This raises a potential problem when using an .htaccess file to block spam based on referer - could you be inadvertently blocking search engine spiders from crawling your site? If, like I do, you redirect to an external site when the referer is a known spammer, then a spider that passes this referer may index the other site instead of yours. Something similar will happen if you return a 403 Forbidden - the spider will record that against your URL and may not come back for some time.
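One way around this might be to check the user agent before acting on the referer, so that spiders are never redirected or given a 403. Here's a minimal sketch of that idea in Python rather than .htaccess rules. The blacklist entry, the crawler tokens and the function names are all illustrative assumptions, not a tested configuration, and bear in mind that user agents are trivially spoofed, so a spammer could pose as a spider to get through.

```python
from urllib.parse import urlparse

# Hypothetical blacklist; a real one would hold known spammer domains.
SPAM_REFERERS = ("spam-example.invalid",)

# Illustrative substrings found in some search engine user agents.
CRAWLER_TOKENS = ("googlebot", "slurp", "msnbot")


def is_crawler(user_agent):
    """True if the user agent contains one of the crawler tokens."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def is_spam_referer(referer):
    """True if the referer's host is a blacklisted domain or a subdomain of one."""
    host = urlparse(referer or "").hostname or ""
    return any(host == d or host.endswith("." + d) for d in SPAM_REFERERS)


def decide(referer, user_agent):
    """Return 'serve', 'serve_unlisted' or 'block' for a request."""
    if not is_spam_referer(referer):
        return "serve"
    if is_crawler(user_agent):
        # Let the spider crawl, but keep the referer off the recent referers list.
        return "serve_unlisted"
    return "block"
```

For example, `decide("http://www.spam-example.invalid/x", "Googlebot/2.1")` comes back as `serve_unlisted`, while the same referer from an ordinary browser user agent comes back as `block`.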
So where does that leave sites that have a 'recent referers' section but don't want to risk blocking search engine spiders? Unless your blog software has specific blacklists to block referer spam, you're USCWAP. Maybe the days of vanity lists (which in all honesty is what the recent referers sections are) are numbered.