I’m getting a 500 error on my website that obviously comes from a bot.
I’d like to duplicate that error so that I can try to suppress the
email message that gets sent to me.
The error contains:
(ArgumentError) "invalid %-encoding
It’s in a “show” action, so it’s a GET request. I can see the URL, and
that URL doesn’t contain any strange characters. When I put that URL in
a browser everything works.
I notice, in the error message I receive, there is a bunch of non-ascii
text, and embedded in it is “Network Solutions Certificate Authority”.
There is no indication that I can see of how that info is being sent. Is
that in a cookie? Is there any other mechanism that a client can use to
send info to the server?
NOTE: This is NOT an https site.
As a last resort, I could suppress all “invalid %-encoding” errors, but
I would like to see that error if it really came from a real person.
I guess another approach would be to suppress all errors from
non-humans, but I’m not sure how to do that.
And ultimately, I’m curious about exactly what is being sent to the
server. I want to understand that.
Thanks. I see that the sender’s IP always starts with 183.60, with the
third number between 213 and 216.
I could just block those addresses and kick the can down the road.
If I could duplicate what the bot is sending then I could take a stab at
the rack filter. It seems like I should be able to do that with curl.
I’ll post if my experiments look useful, but if anyone has already
figured it out, please post.
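
For anyone who would rather script the experiment than use curl, here is
a minimal Ruby sketch using the standard library's Net::HTTP. It assumes
the bot's request is a GET that carries a form-encoded body, which is a
guess until confirmed from the error report. The path, payload,
Content-Type header, and the helper names build_get_with_body and replay
are all made up for illustration.

```ruby
require "net/http"

# Build a GET request that carries a body. Net::HTTP does not forbid
# this; it sends the body if one has been assigned.
# Assumption: the bot labels its body as form data, which is what would
# make the server try to parse it as parameters.
def build_get_with_body(path, payload)
  req = Net::HTTP::Get.new(path)
  req["Content-Type"] = "application/x-www-form-urlencoded"
  req.body = payload
  req
end

# Send the request to a server you control and return the response.
def replay(host, port, path, payload)
  req = build_get_with_body(path, payload)
  Net::HTTP.start(host, port) { |http| http.request(req) }
end

# Example call against a local development server (hypothetical values):
#   replay("localhost", 3000, "/widgets/1", "junk=100%zz").code
```

The payload in the example contains a "%" that is not followed by two
hex digits, which is the kind of input that should provoke the
"invalid %-encoding" error if the server parses the body.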
I’ve been seeing a lot of these lately, all from this user-agent:
Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)
from the following IP: 183.60.214.126 (China Telecom block)
The problem is that it’s a GET request with a content body, which is not
strictly prohibited by the RFCs, but has no defined meaning either.
If your exception notifier provides it, look at the value of
‘rack.request.form_vars’, where you’ll see what appears to be the
contents of a binary cert file.
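
To see why binary junk in the form vars ends in that particular error,
here is a small sketch using Ruby's standard URI module, which as far as
I know is what Rack relies on to unescape form components. The helper
name decode_error is made up for illustration.

```ruby
require "uri"

# Return the error message raised while decoding a form component,
# or nil if the component decodes cleanly.
def decode_error(component)
  URI.decode_www_form_component(component)
  nil
rescue ArgumentError => e
  e.message
end

# A "%" that is not followed by two hex digits is rejected. Binary data
# is very likely to contain such a sequence somewhere.
puts decode_error("abc%zz").inspect

# A well-formed component decodes without error.
puts decode_error("a%20b").inspect
```

So the URL can be perfectly clean while the body still trips the
parameter parser.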
Regardless, it seems like this spider is either seriously broken or
actively hostile. I’m thinking about a Rack filter to drop any GET
request with a content-length header or a non-empty body, but the
quickest fix is to use iptables to block this thing altogether.
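
A rough sketch of what that Rack filter might look like, in case it
helps. The class name RejectGetWithBody is made up, and the choice to
answer with a 400 and to let "Content-Length: 0" through are my own
assumptions, not anything Rack prescribes.

```ruby
# Hypothetical Rack middleware: refuse GET/HEAD requests that announce a
# body, before the app gets a chance to parse it.
class RejectGetWithBody
  BODYLESS_METHODS = %w[GET HEAD].freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    if BODYLESS_METHODS.include?(env["REQUEST_METHOD"]) && body_announced?(env)
      return [400, { "content-type" => "text/plain" }, ["Bad Request\n"]]
    end
    @app.call(env)
  end

  private

  # A body is announced by a positive Content-Length or by any
  # Transfer-Encoding header. A Content-Length of 0 is allowed through
  # (assumption: some harmless clients send it on GET).
  def body_announced?(env)
    env["CONTENT_LENGTH"].to_i > 0 || env.key?("HTTP_TRANSFER_ENCODING")
  end
end
```

In a Rails app this could be registered near the top of the stack with
something like config.middleware.insert_before 0, RejectGetWithBody, so
that it runs before the parameters are parsed. Real users should never
hit it, so their “invalid %-encoding” errors would still get through to
the notifier.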