Commentable and Spam

smc7000 · September 6, 2006, 7:14pm

It’s probably too early for many of you who are using the Comments
behaviors to get blog-spam, but I wanted to address the issue before it
becomes a problem.

One user, “Nolan”, commented that CAPTCHA images are inaccessible (to
vision-impaired users) and should not be considered an option. As an
alternative, he suggested the use of spam-filtering services like
Akismet. While this seems a very promising option, that would be a whole
lot of extra code (and perhaps another library, even a gem!) and
probably outside the scope of my little behaviors.

Instead I’d like to get your thoughts on another method, although there
is ample room for many methods down the road. The method I’m considering
is a challenge-response where the challenge is in human-readable text (a
method suggested on some of the CAPTCHA pages I read). I haven’t worked
out the particulars, but it would go something like “Subtract
twenty-five from the sum of eighty and thirty-seven and enter the result
as a number.” The real ones would probably be easier than that, and they
could be generated automatically. Another option is to say “Enter the
subject of the sentence ‘Jane walked to school.’” or something of that
sort.
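
A minimal sketch of how such a textual challenge could be generated in
pure Ruby. Nothing here comes from the comments behaviors: the
TextChallenge module and its words, generate, and correct? methods are
hypothetical names, and the question is limited to a simple addition of
two numbers between one and twenty.

```ruby
# Hypothetical sketch: none of these names exist in the comments behaviors.
module TextChallenge
  ONES = %w[zero one two three four five six seven eight nine ten eleven
            twelve thirteen fourteen fifteen sixteen seventeen eighteen
            nineteen].freeze
  TENS = %w[_ _ twenty thirty forty fifty sixty seventy eighty ninety].freeze

  # Spell out an integer from 0 to 99 in English words.
  def self.words(n)
    raise ArgumentError, "expected 0..99" unless (0..99).include?(n)
    return ONES[n] if n < 20
    tens, ones = n.divmod(10)
    ones.zero? ? TENS[tens] : "#{TENS[tens]}-#{ONES[ones]}"
  end

  # Build a challenge; returns [question, expected_answer].
  def self.generate(rng = Random.new)
    a = rng.rand(1..20)
    b = rng.rand(1..20)
    ["Add #{words(a)} to #{words(b)} and enter the result as a number.", a + b]
  end

  # Compare the visitor's answer with the expected one, ignoring whitespace.
  def self.correct?(expected, submitted)
    submitted.to_s.strip == expected.to_s
  end
end

question, answer = TextChallenge.generate
puts question
puts TextChallenge.correct?(answer, " #{answer} ")
```

The expected answer would have to be kept on the server side (in the
session, for instance) between rendering the form and receiving the
post; how the behaviors would do that is left open here.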

Please give me your thoughts on this…

Cheers,

Sean C.
seancribbs.com

P.S. updated comments behaviors at
http://seancribbs.com/svn/rails/plugins/comments_behaviors

smc7000 · September 6, 2006, 7:34pm

On Sep 6, 2006, at 12:12 PM, Sean C. wrote:

One user, “Nolan”, commented that CAPTCHA images are inaccessible (to
vision-impaired users) and should not be considered an option. As an
alternative, he suggested the use of spam-filtering services like
Akismet.

It isn’t that I think it shouldn’t be an option, but that it
shouldn’t be the only and default option, because if it is then
it gets enabled without users thinking through the implications. And,
admittedly, the implications aren’t major for the average user, but I
promise you that anyone in my physical proximity when I can’t leave a
comment or join a forum isn’t quick to forget them when they’ve heard
me curse up a storm because I can’t. I worry that if CAPTCHA is
the only and default scheme, or even if it is the first scheme that
takes hold for any early adopters, that Radiant will become
exclusionary to those of us who want to comment but can’t.

Another option might be to fix the captcha gem to include the textual
challenge/response discussed below, or maybe to pipe its text/numbers
to festival. It’d be clunky, but CAPTCHA itself is a rather clunky
solution IMHO. I’m even pondering writing a web service that accepts
CAPTCHA images and farms decoding them out to volunteers via RSS, and
not being particular about whether the one requesting the text is an
innocent user or a spammer, I’m just that annoyed with the tactic
and its use by companies like Yahoo who, when I last checked, insist
that VI users provide contact information not requested from sighted
users. Sorry, I don’t want to be contacted by someone for permission
to join a group or del.icio.us or whatever.

Ahem, sorry, I’m ranting. I’m just eager not to see any more web
developers go down this road.

smc7000 · September 6, 2006, 7:38pm

On 9/6/06, Sean C. wrote:

One user, “Nolan”, commented that CAPTCHA images are inaccessible (to
vision-impaired users) and should not be considered an option. As an
alternative, he suggested the use of spam-filtering services like Akismet.
While this seems a very promising option, that would be a whole lot of extra
code (and perhaps another library, even a gem!) and probably outside the
scope of my little behaviors.

If you write directly to the service by doing your own POST and parsing
the results back, it should be doable in 200 lines or fewer. But you
would introduce dependencies on rexml and open-uri. I interfaced with
Flickr recently by doing the following.

My model objects are based on

class Base
  # Build an object by calling the setter for each key in the options hash.
  def self.create(options)
    object = self.new
    options.each_pair { |k, v| object.send("#{k}=", v) }
    object
  end
end

To make them more ActiveRecord-like (I am not using a database in this
particular app).

Call the REST api by constructing a URL, passing that to open-uri,
with a block that both selects what you want, and parses the response.

build the URL

# Needs: require 'digest/md5'. The api, url_escape, and logger helpers
# are not shown here.
def self.flickr(method, token, options,
                rest = "http://www.flickr.com/services/rest/?")
  raise FlickrError.new("hash expected") unless options.is_a?(Hash)
  raise FlickrError.new("url expected") unless rest && rest.is_a?(String)
  options.update('api_key' => @@api_id)
  options.update('method' => api(method)) if method
  options.update('auth_token' => token) if token
  # The signature is the MD5 of the shared secret followed by the
  # sorted key/value pairs.
  signature = options.keys.sort.map { |k| "#{k}#{options[k]}" }.join('')
  parameters = options.keys.sort.map { |k|
    "#{k}=#{url_escape(options[k])}" }.join('&')
  url = rest + parameters +
        "&api_sig=#{Digest::MD5.hexdigest(@@shared_secret + signature)}"
  logger.debug "url: #{url}"
  url
end

Now fetch and parse it

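# Needs: require 'open-uri' and require 'rexml/document'.
def self.fetch(url, xpath, &block)
  open(url) do |response|
    xml = response.read
    logger.debug "xml: #{xml}"
    doc = REXML::Document.new xml
    # Yield every element matching the XPath to the caller's block.
    doc.elements.each(xpath) { |element| yield element if block_given? }
  end
end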

And here’s how I use it

def self.find(token, method, options)
  photos = []
  fetch(flickr(method, token, options), 'rsp/photos/photo') do |element|
    photos << Photo.create(
      :id => element.attributes['id'],
      :secret => element.attributes['secret'],
      :title => element.attributes['title'],
      :server => element.attributes['server'],
      :ispublic => element.attributes['ispublic'] == '1',
      :isfriend => element.attributes['isfriend'] == '1',
      :isfamily => element.attributes['isfamily'] == '1',
      :owner => element.attributes['owner'],
      :longitude => element.attributes['longitude'].to_f,
      :latitude => element.attributes['latitude'].to_f,
      :geotagged => (element.attributes['longitude'].to_f != 0.0 ||
                     element.attributes['latitude'].to_f != 0.0)
    ) if element.attributes['id']
  end
  photos
end

Simple and can be kept in only one extra file. If that file defines an
interface, then others can be written to handle different spam
services.
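
One way the interface idea above might look, as a rough sketch only.
SpamFilter, KeywordFilter, and FilterChain are hypothetical names, and
the keyword check merely stands in for a backend that would POST the
comment to a remote service such as Akismet and parse the reply.

```ruby
# Hypothetical interface: class and method names are illustrative only.
class SpamFilter
  # Subclasses return true when the comment looks like spam.
  def spam?(comment)
    raise NotImplementedError, "#{self.class} must implement spam?"
  end
end

# A stand-in backend that checks for listed words; a real backend would
# send the comment to a remote service and interpret the response.
class KeywordFilter < SpamFilter
  def initialize(words)
    @words = words.map(&:downcase)
  end

  def spam?(comment)
    text = comment.fetch(:content, "").downcase
    @words.any? { |word| text.include?(word) }
  end
end

# Runs a comment past every configured backend; any hit marks it as spam.
class FilterChain
  def initialize(*filters)
    @filters = filters
  end

  def spam?(comment)
    @filters.any? { |filter| filter.spam?(comment) }
  end
end

chain = FilterChain.new(KeywordFilter.new(%w[viagra casino]))
puts chain.spam?(:content => "Cheap VIAGRA here")
puts chain.spam?(:content => "Nice post, thanks")
```

Because every backend answers the same spam? call, swapping one service
for another would only mean adding a subclass in that one extra file.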

Instead I’d like to get your thoughts on another method, although there is
ample room for many methods down the road. The method I’m considering is a
challenge-response where the challenge is in human-readable text (a method
suggested on some of the CAPTCHA pages I read). I haven’t worked out the
particulars, but it would go something like “Subtract twenty-five from the
sum of eighty and thirty-seven and enter the result as a number.” The real
ones would probably be easier than that, of course. This could be
automatically generated of course. Another option is to say “Enter the
subject of the sentence ‘Jane walked to school.’” or something of that sort.

This would also be nice, though I prefer the automated, invisible
approach to one which gets in the way. Which one is more brittle is
also a concern: will spammers write a reverser for your generator
first, or will Akismet fall over under the load, or will someone
discover a big hole in Akismet’s algorithms? I think centralized spam
detection will do better, but that’s just a hunch.

– G.

smc7000 · September 6, 2006, 7:40pm

Nolan,

I totally understand where you’re coming from. Personally, I’m also not
eager to create something that would potentially cause problems on the
server or require any library that the user may or may not have control
over (i.e. RMagick – took 4+ weeks to get an update on Textdrive, and
it’s still a version behind because of the FreeBSD port taking too
long). This is why I’m leaning toward something that can be done purely
in Ruby and be relatively lightweight.

Sean C.
seancribbs.com

smc7000 · September 6, 2006, 7:44pm

Is it the spammers write a reverser for your
generator, or akismet falls over under the load or they discover a big
hole in akismet’s algorithms. I think centralized spam detection will
do better but that’s just a hunch.

– G.

Let’s make no mistake, fighting spam is a constant battle. However, as
Nolan suggests, I’d like to provide a default implementation that will
be “good enough”. No CAPTCHA (visual or textual) will completely prevent
spam, but at least it can filter out the automated spam-bots.

PDI

Cheers,

Sean C.
seancribbs.com

smc7000 · September 6, 2006, 7:48pm

On Sep 6, 2006, at 12:39 PM, Sean C. wrote:

Nolan,

I totally understand where you’re coming from. Personally, I’m
also not
eager to create something that would potentially cause problems on the
server or require any library that the user may or may not have
control over

No worries, I completely understand, and am not annoyed at you or
your efforts. It’s the tactic itself that annoys me, and I try
making developers aware of the frustrations that I and other blind
users feel when it is used.

Unfortunately, I don’t have an easy and quick recommendation for
another solution, so I feel bad for complaining about something
that’s reasonably easy to implement.

Thanks for understanding and being responsive. I might take a look at
the captcha gem, see if the challenge aspect can be abstracted into a
module that can be replaced dynamically based on the needs of the end
user.

smc7000 · September 6, 2006, 10:02pm

Hi all, my first post.

Have you seen this site? It may have what you are looking for. Typo
uses it; I think it’s built from the sanitize plugin for Movable Type.

Kind Regards
Martin

smc7000 · September 6, 2006, 10:29pm

On Sep 6, 2006, at 3:16 PM, Sean C. wrote:

In the meantime, I’d appreciate if someone would look into a
textual challenge-response as well.

I’m going to stop getting annoyed and actually do something about
it. I’m looking into rewriting the captcha gem to abstract it into
ChallengeResponse mechanisms (e.g. ChallengeResponse::RandomText,
ChallengeResponse::GrammarNazi, etc.) which contain Challenges that
can be represented in different Formats (e.g. :audio, :text, :image)
produced by Producers (e.g. RMagick, Festival, WaveConcatenator,
etc.). Needless to say, that’s a bit more complicated than just a
textual implementation, but I think it’s a bit more free-form and
open-ended, with the possibility to add new ChallengeResponse
schemes/formats/producers.

But I think I’m straying a bit off-topic. In summary, I’m looking
into it.
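
To make the layering concrete, here is a rough sketch of mechanisms,
challenges, formats, and producers. It is not the captcha gem’s API:
only the names ChallengeResponse and RandomText come from the post, and
everything else (Challenge, PRODUCERS, render) is hypothetical. Only a
:text producer is included, which is trivially readable by a bot and is
there purely to show where an image or audio producer would plug in.

```ruby
# Hypothetical sketch; the captcha gem's real API is not shown here.
module ChallengeResponse
  # A challenge pairs the prompt material with the expected answer.
  Challenge = Struct.new(:prompt, :answer) do
    def correct?(response)
      response.to_s.strip.downcase == answer.downcase
    end
  end

  # Mechanism: decides what is asked (here, a run of random letters).
  class RandomText
    LETTERS = ("a".."z").to_a.freeze

    def initialize(length = 5, rng = Random.new)
      @length = length
      @rng = rng
    end

    def generate
      text = Array.new(@length) { LETTERS[@rng.rand(LETTERS.size)] }.join
      Challenge.new(text, text)
    end
  end

  # Producers: decide how a challenge is presented, keyed by format.
  PRODUCERS = {
    :text => lambda { |challenge|
      "Type the letters: #{challenge.prompt.chars.join(' ')}"
    }
  }

  # Render a challenge in the requested format, if a producer exists.
  def self.render(challenge, format)
    producer = PRODUCERS.fetch(format) do
      raise ArgumentError, "no producer for #{format}"
    end
    producer.call(challenge)
  end
end

challenge = ChallengeResponse::RandomText.new.generate
puts ChallengeResponse.render(challenge, :text)
```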

smc7000 · September 6, 2006, 10:17pm

Thank you all for your thoughts. I’ll be looking into Akismet; Rick
said the code in Mephisto was free for the taking (he lifted it from
somewhere too). In the meantime, I’d appreciate it if someone would look
into a textual challenge-response as well.

I also wanted to probe John about the subject of where comments are
stored. I know there had been previous discussions of how to implement
comments on the list, and they largely shaped my chosen implementation.
Do you think it would still be relevant to have a separate Comments
model with its own admin-UI, or is the stored-as-pages method ok? (i.e.
is the method I’m using good enough to fulfill the goal of “Radiant has
comments”)

Sean C.
seancribbs.com

smc7000 · September 6, 2006, 11:13pm

Looking forward to it, Nolan. Let me know how I can help!

Sean C.
seancribbs.com

smc7000 · September 7, 2006, 1:11am

On 07/09/06, Nolan D. wrote:

It isn’t that I think it shouldn’t be an option, but that it
shouldn’t be the only and default option, because if it is then
it gets enabled without users thinking through the implications.

I’m utterly with Nolan on this. I can’t stress enough that any sort of
CAPTCHA needs to be an option. If you feel strongly enough about spam
that you feel it should be the default option, that’s fine and I’m all
for it, but a site user must be able to turn it off without resorting
to a developer.

I haven’t had a chance this week to check out your code, Sean (planning
to this weekend), but I’m hoping that comments offer the same
functionality in general? i.e. you can turn comments on and off for
particular posts?

I’m big on user choice…

Lachlan Hardy

smc7000 · September 7, 2006, 2:44pm

Thanks for your thoughts, John. I guess a major reason why I did it
that way – besides expediency – is that I think all content, whether or
not it was created by a logged-in user, should be editable/removable
via the admin interface. I also didn’t want to spend a lot of time
increasing the complexity of the admin interface, considering how clean
and elegant it is. In some ways, the discussion of page metadata as
page-parts influenced this decision.

The method I used could be used to model any type of structured, textual
data, of course. That is, separating the structural pieces (fields) into
page-parts. “Page as Hash”, just as DHH considers DBMSs to be Hashes.
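
A small illustration of the “Page as Hash” idea, under loud
assumptions: the Page and PagePart structs below are simplified
stand-ins, not Radiant’s actual models, and comment_page and fields are
hypothetical helpers.

```ruby
# Simplified stand-ins for a page and its parts (not Radiant's models).
PagePart = Struct.new(:name, :content)

Page = Struct.new(:title, :parts) do
  # Read the page's parts back as a field-name => value Hash.
  def fields
    parts.each_with_object({}) { |part, acc| acc[part.name] = part.content }
  end
end

# Store each field of a comment as one part of a page.
def comment_page(attrs)
  parts = attrs.map { |name, value| PagePart.new(name.to_s, value) }
  Page.new("Comment by #{attrs[:author]}", parts)
end

page = comment_page(:author => "Jane", :body => "Nice post!")
puts page.title
puts page.fields.inspect
```

Any structured record could be mapped the same way: one page per
record, one part per field.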

However, the code would be a lot cleaner if there were another model
instead. Maybe this could be done in a future iteration. I really love
the way Radiant works, looks, and feels and hope to continue working on
behaviors and such for a while!

Sean C.
seancribbs.com

smc7000 · September 7, 2006, 2:11am

Sean C. wrote:

I also wanted to probe John about the subject of where comments are stored.
I know there had been previous discussions of how to implement comments on
the list, and they largely shaped my chosen implementation. Do you
think it
would still be relevant to have a separate Comments model with its own
admin-UI, or is the stored-as-pages method ok? (i.e. is the method I’m
using
good enough to fulfill the goal of “Radiant has comments”)

Sean, your method is pretty creative considering the fact that Radiant
doesn’t presently give you a way to add stuff to the admin interface. I
still favor making comments a separate model object though, and here’s
why: Pages have URLs, layouts, etc. Comments do not have their own URL
(nor do they need a layout). Comments are data that is associated with
pages, therefore they should not be implemented as pages themselves.

Your method is interesting though, and would prove very useful in
certain circumstances.

–
John L.
http://wiseheartdesign.com

smc7000 · September 7, 2006, 2:46pm

I haven’t had a chance this week to check out your code, Sean, (planning
to this weekend) but I’m hoping that comments offer the same functionality
in general? ie you can turn comments on and off for particular posts

I’m big on user choice…

Lachlan Hardy

Lachlan,

Currently, if the Commentable behavior isn’t assigned to the page, it
doesn’t receive comments. However, once the page has comments and the
behavior, you could easily turn them off by removing the submission form
from the page, and then setting the status config option to 1 so any
hack-posted ones don’t get published.

Sean C.
seancribbs.com