My bug, nefarious scraper or a legitimate browser plugin?


#1

I’ve been faced with the following symptoms for some time.

I have links coded as :post or :put, so I can make sure that bots
aren’t hitting particular links.

But something is hitting them as :get, either through an error I’ve
made (like link_to not working well in some browsers?), or because
there are one or more plugins that preload urls, or because I have
scrapers.

Each day I’ll get 50-100 error messages - where routes aren’t found -
of this nature.

When I get many of these hits from the same ip address, I usually
assume a scraper, then block that ip address…but I don’t want to do
this in every case if it’s possible that something legitimate (a
pre-loader browser plugin) is causing this to happen.

Does anybody else see this kind of behaviour? How do you handle it?

thanks.
Jodi


#2

On Mar 30, 4:09 pm, Jodi S. removed_email_address@domain.invalid wrote:

I’ve been faced with the following symptoms for some time.

I have links coded as :post or :put, so I can make sure that bots
aren’t hitting particular links.

But something is hitting them as :get, either through an error I’ve
made (like link_to not working well in some browsers?), or because
there are one or more plugins that preload urls, or because I have
scrapers.

A browser with js turned off would also do this (or using a firefox
plugin like noscript to only have it on for certain websites)

Fred


#3

On 30-Mar-09, at 1:22 PM, Frederick C. wrote:

A browser with js turned off would also do this (or using a firefox
plugin like noscript to only have it on for certain websites)

Fred

ty Fred - yes. good thought - likely the simplest explanation.

will trap, then see where that leads me.

Jodi
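One way the "trap" could look, as a rough sketch only: a Rack-style middleware that records every GET landing on a path that should only be reached via POST/PUT, along with the headers that help tell a script-less browser, a prefetcher and a scraper apart. The class name MethodMismatchTrap, the path pattern and the in-memory hits array are all made up for illustration. The X-Moz: prefetch header is sent by some prefetchers, but not every preloader identifies itself, so its absence proves nothing.

```ruby
# Hypothetical middleware (name and storage are assumptions, not an
# existing library): remembers GET requests to POST/PUT-only paths.
class MethodMismatchTrap
  attr_reader :hits

  # protected_paths: regexps matching the URLs that are only linked
  # with :method => :post / :put in the views.
  def initialize(app, protected_paths)
    @app = app
    @protected_paths = protected_paths
    @hits = []
  end

  def call(env)
    path = env["PATH_INFO"].to_s
    if env["REQUEST_METHOD"] == "GET" &&
       @protected_paths.any? { |re| re.match(path) }
      @hits << {
        ip: env["REMOTE_ADDR"],
        path: path,
        user_agent: env["HTTP_USER_AGENT"],
        referer: env["HTTP_REFERER"],
        # true only when the client announces itself as a prefetcher
        prefetch: env["HTTP_X_MOZ"].to_s.downcase == "prefetch"
      }
    end
    # Always pass the request on; this only observes.
    @app.call(env)
  end
end
```

In a Rails app this would presumably be registered with config.middleware.use, and the hits written to a log or table rather than kept in memory.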


#4

This latest info rules out JS-off or a noscript plugin -

On 30-Mar-09, at 1:22 PM, Frederick C. wrote:

A browser with js turned off would also do this (or using a firefox
plugin like noscript to only have it on for certain websites)

Fred

Here’s the markup where the link is:

Phone Contex Roofing Company Ltd.Contact: Contex
Roofing Company Ltd.

The bot/human is reaching the url
"http://homestars.com/messages/201895-contex-roofing-company-ltd/new_company",
but you can see the href = ‘#’ - so something is scanning the html,
looking for urls to harvest.

So looks like either a bot or a page preloader…

I don’t mind pre-loaders - so I think I’ll see if I can find a pattern
in the plugins loaded…and if I don’t find a plugin then this could
work as a honeypot.

I guess I don’t have a specific question - merely symptoms - hopeful
that someone may have faced such a thing.

Jodi
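For what it's worth, the honeypot idea above could be sketched roughly like this: since no real link points at the url, every GET on it is suspect, and the only question is which ip addresses hit it often enough to block. The method name suspected_scrapers, the shape of each hit (a hash with :ip and :prefetch keys) and the threshold of 5 are assumptions for illustration, not something taken from the app.

```ruby
# Hypothetical helper: given recorded honeypot hits, return the ip
# addresses that look like scrapers rather than pre-loaders.
# Each hit is assumed to be a hash like { ip: "192.0.2.1", prefetch: false }.
def suspected_scrapers(hits, threshold = 5)
  hits.reject { |h| h[:prefetch] }                   # ignore self-declared prefetchers
      .group_by { |h| h[:ip] }                       # bucket remaining hits per ip
      .select { |_ip, group| group.size >= threshold }
      .keys
end
```

A client that announces itself as a prefetcher is ignored here, which is a judgment call: a scraper could send the same header, so the threshold and the exclusion would both need tuning against real traffic.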