Duplicating strange bot error

addis_a · July 20, 2014, 4:13am

I’m getting a 500 error on my website that obviously comes from a bot.
I’d like to duplicate that error so that I can try to suppress the email
message that gets sent to me.

The error contains:
(ArgumentError) "invalid %-encoding

It’s in a “show” action, so it’s a GET request. I can see the URL, and
that URL doesn’t contain any strange characters. When I put that URL in
a browser, everything works.

I notice, in the error message I receive, there is a bunch of non-ascii
text, and embedded in it is “Network Solutions Certificate Authority”.

There is no indication that I can see of how that info is being sent. Is
that in a cookie? Is there any other mechanism that a client can use to
send info to the server?

NOTE: This is NOT an https site.

As a last resort, I could suppress all “invalid %-encoding” errors, but
I would like to see that error if it really came from a real person.

I guess another approach would be to suppress all errors from
non-humans, but I’m not sure how to do that.

And ultimately, I’m curious about exactly what is being sent to the
server. I want to understand that.

paul · July 20, 2014, 10:09pm

Thanks. I see that the sender’s IP always has the form 183.60.x.x, with
the third number between 213 and 216.

I could just block those addresses and kick the can down the road.

If I could duplicate what the bot is sending, then I could take a stab
at the Rack filter. It seems like I should be able to do that with curl.
I’ll post if my experiments look useful, but if anyone has already
figured it out, please post.
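
If the bot's request really is a GET that carries a body, as suggested elsewhere in this thread, one way to approximate it without curl is a short Ruby script using the standard library's Net::HTTP. This is only a sketch under stated assumptions: the path, the payload, and the Content-Type header are placeholders, since the bot's actual headers and bytes are not shown in the thread, and `build_get_with_body` is a hypothetical helper name.

```ruby
require "net/http"

# Build a GET request that carries a body, which is unusual but can be sent.
# The path and payload are placeholders, not the bot's real data.
def build_get_with_body(path, payload)
  req = Net::HTTP::Get.new(path)
  # Assumption: a form content type is what makes Rack try to parse the body.
  req["Content-Type"] = "application/x-www-form-urlencoded"
  req.body = payload
  req
end

# A stray "%" that is not followed by two hex digits stands in for the
# binary certificate data seen in the error email.
req = build_get_with_body("/items/1", "junk%zzjunk")

# To send it to a local development server (host and port are assumptions):
# Net::HTTP.start("localhost", 3000) { |http| puts http.request(req).code }
```

If the assumptions hold, sending this to a development server should produce the same kind of 500 error, which gives a repeatable test case for any filter.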

paul · July 20, 2014, 4:44pm

On Sat, Jul 19, 2014 at 7:11 PM, Paul [email protected] wrote:

I notice, in the error message I receive, there is a bunch of non-ascii
text, and embedded in it is “Network Solutions Certificate Authority”.

There is no indication that I can see of how that info is being sent. Is
that in a cookie? Is there any other mechanism that a client can use to
send info to the server?

I’ve been seeing a lot of these lately, all from this user-agent:
Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)
from the following IP: 183.60.214.126 (China Telecom block)

The problem is that it’s a GET request with a content body, which the
HTTP RFCs do not strictly prohibit but for which they define no
semantics either.

If your exception notifier provides it, look at the value of
‘rack.request.form_vars’, where you’ll see what appears to be the
contents of a binary cert file.
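
The likely link between that body and the “invalid %-encoding” message can be shown with Ruby's standard library alone. As a hedged sketch: Rack versions of that era appear to decode form data through `URI.decode_www_form_component`, which raises ArgumentError when a “%” is not followed by two hex digits, something arbitrary binary data will contain sooner or later. The sample strings and the `decode_error` helper below are illustrative, not taken from the bot's request.

```ruby
require "uri"

# Returns the error message raised while decoding, or nil if decoding works.
def decode_error(raw)
  URI.decode_www_form_component(raw)
  nil
rescue ArgumentError => e
  e.message
end

puts decode_error("name%20ok").inspect  # valid escape, so no error is raised
puts decode_error("junk%zzjunk")        # "%" not followed by two hex digits
```

This would also explain why the URL itself looks clean: the failure comes from parsing the body, not the query string.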

Regardless, it seems like this spider is either seriously broken or
actively hostile. I’m thinking about a Rack filter to drop any GET
request with a content-length header or a non-empty body, but the
quickest fix is to use iptables to block this thing altogether.
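
A minimal sketch of such a Rack filter might look like the following. The class name `RejectGetWithBody` is hypothetical, and the rule is an assumption about what counts as "having a body": a positive Content-Length or any Transfer-Encoding header on a GET. It answers with a 400 instead of passing the request to the application, so the body is never parsed.

```ruby
# Hypothetical middleware; not part of Rack or Rails.
class RejectGetWithBody
  def initialize(app)
    @app = app
  end

  def call(env)
    if env["REQUEST_METHOD"] == "GET" && body_announced?(env)
      # Short-circuit before the app (and its parameter parsing) runs.
      return [400, { "content-type" => "text/plain" }, ["Bad Request\n"]]
    end
    @app.call(env)
  end

  private

  # Assumption: a body is signalled either by a positive Content-Length
  # or by a Transfer-Encoding header (for example, chunked).
  def body_announced?(env)
    env["CONTENT_LENGTH"].to_i > 0 || !env["HTTP_TRANSFER_ENCODING"].nil?
  end
end

# Stand-in application used only to exercise the middleware here.
app = ->(_env) { [200, { "content-type" => "text/plain" }, ["ok"]] }
filter = RejectGetWithBody.new(app)

plain_get = { "REQUEST_METHOD" => "GET" }
get_with_body = { "REQUEST_METHOD" => "GET", "CONTENT_LENGTH" => "1024" }
puts filter.call(plain_get).first
puts filter.call(get_with_body).first
```

In a Rails app, something like `config.middleware.insert_before 0, RejectGetWithBody` in config/application.rb should place it ahead of the rest of the stack, though the exact placement is a judgment call.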

HTH,

Hassan S. ------------------------ [email protected]

twitter: @hassan