Curse words

Hello, I just had a problem with someone cussing on my Rails app. Is
there something like RedCloth that I can use to filter out ‘dirty words’?

Mohammad wrote:

Hello, I just had a problem with someone cussing on my Rails app. Is
there something like RedCloth that I can use to filter out ‘dirty words’?

naughty_words = ['poo','darn','sugar','heffalumps']
naughty_words.each do |cuss|
  comment.gsub!(/\b#{cuss}\b/i, 'FLUFFY BUNNIES')
end

Needn’t be much more complicated than that…
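
For anyone trying the loop above, here is a minimal sketch of what the \b word boundaries and the /i flag buy you. The censor method name and the sample sentence are made up for illustration, and Regexp.escape is only a precaution in case a list entry ever contains regex metacharacters.

```ruby
# Hypothetical wrapper around the gsub! loop; works on a copy of the comment.
def censor(comment, words)
  cleaned = comment.dup
  words.each do |cuss|
    # \b keeps the match to whole words; /i ignores case.
    cleaned.gsub!(/\b#{Regexp.escape(cuss)}\b/i, 'FLUFFY BUNNIES')
  end
  cleaned
end

naughty_words = ['poo', 'darn', 'sugar', 'heffalumps']
puts censor('Poo! The spoon fell in the pool.', naughty_words)
```

Only the standalone word is replaced; "spoon" and "pool" are left alone because the pattern has to start and end on a word boundary.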

Mohammad wrote:

Hello, I just had a problem with someone cussing on my Rails app. Is
there something like RedCloth that I can use to filter out ‘dirty words’?

Is this for a forum? You could probably use a bayesian classifier to try
to flag messages that contain words that aren’t already on your
blacklist. There are at least two ruby projects out there.

http://bishop.rubyforge.org/

http://rubyforge.org/projects/classifier

alex
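
To give a feel for the classifier idea, here is a toy naive Bayes sketch in plain Ruby. TinyBayes is a hypothetical class written for this post, not the API of either project linked above, and the training sentences are invented. A real forum would need far more training data.

```ruby
# Toy naive Bayes text classifier (hypothetical; for illustration only).
class TinyBayes
  def initialize(*labels)
    @word_counts = {}
    @doc_counts = Hash.new(0)
    labels.each { |label| @word_counts[label] = Hash.new(0) }
  end

  # Record the words of one example message under the given label.
  def train(label, text)
    @doc_counts[label] += 1
    tokenize(text).each { |word| @word_counts[label][word] += 1 }
  end

  # Return the label with the highest log-probability for the text.
  def classify(text)
    total_docs = @doc_counts.values.sum.to_f
    vocab_size = @word_counts.values.flat_map(&:keys).uniq.size
    scores = @word_counts.map do |label, counts|
      total_words = counts.values.sum
      score = Math.log(@doc_counts[label] / total_docs)
      tokenize(text).each do |word|
        # Add-one smoothing so unseen words do not zero out a label.
        score += Math.log((counts[word] + 1.0) / (total_words + vocab_size))
      end
      [label, score]
    end
    scores.max_by { |_, score| score }.first
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z']+/)
  end
end

bayes = TinyBayes.new(:ok, :abusive)
bayes.train(:ok, 'thanks for the helpful reply')
bayes.train(:ok, 'see you at the meeting tomorrow')
bayes.train(:abusive, 'you are a darn heffalump')
bayes.train(:abusive, 'what a load of poo you darn fool')
puts bayes.classify('darn fool')
```

Messages classified as abusive could then be flagged for a moderator rather than censored automatically.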

Mohammad wrote:

Hmm. This is what I wrote
def show
  @pm = Pm.find(params[:id])
  if @pm.to_id != @session[:user].id
    render :text => "Dont try to cheat the system."
  end
  @body2 = filter(@pm.body)
end

def filter(text)
  @naughty_words = ['fuck','ass','bastered']
  @replace_with = ['f***','a**','b*****']
  @count = 0
  @naughty_words.each do |cuss|
    text.gsub!(/\b#{cuss}\b/i, @replace_with[@count])
    @count += 1
  end
end

and it’s just displaying the ones that are in the @pm.body. Got any idea
why? Did the gsub mess up somewhere? (Not good with gsub, sorry.)

You’re not returning the text, that’s why.

def filter(text)
  @naughty_words = ['fuck','ass','bastered']
  @replace_with = ['f***','a**','b*****']
  @naughty_words.each_with_index do |cuss,count|
    text.gsub!(/\b#{cuss}\b/i, @replace_with[count])
  end
  text
end

FLUFFY BUNNIES i can’t FLUFFY BUNNIES stand FLUFFY BUNNIES censors, the
FLUFFY BUNNIES!

Mohammad wrote:

Needn’t be much more complicated than that…

and it’s just displaying the ones that are in the @pm.body. Got any idea
why? Did the gsub mess up somewhere? (Not good with gsub, sorry.)

You’re using gsub!, which modifies the string in place. However, from
the looks of things, you want to return the corrected string for display
(I’m assuming that’s what the @body2 variable is for). The return value
of your filter() method will be the return value of the
@naughty_words.each() call, though, not the corrected string. I’ve
been caught out by this before - the return value of an each() call is
the original list, not anything that you did to it.
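
A quick sketch of that point, using two made-up method names (broken_filter and working_filter) and a stand-in word list:

```ruby
words = ['poo', 'darn']

# each returns its receiver, whatever the block does.
result = words.each { |word| word.upcase }
puts result.equal?(words)

# A method returns the value of its last expression, so this one hands back
# the word list rather than the filtered text.
def broken_filter(text, words)
  words.each { |cuss| text.gsub!(/\b#{cuss}\b/i, '****') }
end

# Naming the string on the last line makes it the return value.
def working_filter(text, words)
  words.each { |cuss| text.gsub!(/\b#{cuss}\b/i, '****') }
  text
end

puts broken_filter('darn it'.dup, words).inspect
puts working_filter('darn it'.dup, words).inspect
```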

Also, the way you’re picking the replacement is fragile. A hand-incremented
@count only pairs each word with the right replacement while the two arrays
stay the same length and in the same order, and it is easy to get out of step.

You want something like this:

def filter(text)
  @naughty_words = ['poo','darn','sugar','heffalumps']
  @replace_with = ['p**','d***','s****','h*********']
  @naughty_words.each do |cuss|
    text.gsub!( /\b#{cuss}\b/i, @replace_with[@naughty_words.index(cuss)])
  end
  return text
end

Better would be to use a hash:

def filter(text)
  @naughty_words = {'poo' => 'p**',
                    'darn' => 'd***',
                    'sugar' => 's****',
                    'heffalumps' => 'h*********'}

  @naughty_words.each_pair do |cuss, replacement|
    text.gsub!( /\b#{cuss}\b/i, replacement )
  end
end

Note that in the second case I’m not returning text. That’s because
gsub! acts in place, so in your original show method you don’t need to
assign to @body2 - you can just use the @pm.body value directly.
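
One caveat if you lean on the in-place behaviour: gsub! returns nil when nothing was replaced, so it is safer to keep using the string itself than the return value. A small sketch, with censor! as a hypothetical one-word filter:

```ruby
# Hypothetical in-place filter for a single word.
def censor!(text)
  text.gsub!(/\bdarn\b/i, 'd***')
end

body = 'This darn thing'.dup
same_object = body

# The change is visible through every reference to the string.
censor!(body)
puts same_object

# No match: the string is untouched and the return value is nil.
clean = 'Nothing to see here'.dup
puts censor!(clean).inspect
puts clean
```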

Hope that makes things a little clearer…

First, gsub with an exclamation mark on the end modifies the string
in-place, so the filter function will modify @pm.body. Also, your
filter function doesn’t return a string, it returns the
@naughty_words array.

Before you read any further, please note that some of my suggestions
may be slightly ridiculous.

Perhaps a good idea would be to just have a naughty_words array
rather than both naughty_words and replace_with. Like this:

NAUGHTY_WORDS = %w(roses kittens)

def filter(text)
  text_to_filter = text.dup
  NAUGHTY_WORDS.each do |word|
    text_to_filter.gsub!(/\b#{word}\b/i, word[0,1] + ("*" * (word.size-1)))
  end
  text_to_filter
end

puts filter("Raindrops on roses and whiskers on kittens.")

This outputs “Raindrops on r**** and whiskers on k******.” Assuming
you want to preserve the first letter of naughty words…

First, it duplicates the string passed to it, so it doesn’t operate
on the actual body of the message you’d like to send. From the way
you call the function, that seems to be what you’re expecting to
happen. Then, it iterates through the naughty words, nothing new with
that. For each naughty word, it calls gsub! on the duplicated string,
looking for the word and replacing it with the word’s first letter
and asterisks for the rest of the word’s letters. After iterating
through all the naughty words, it returns the cleaned text.

Oh, and I’ve made the naughty words a constant. Maybe a good
idea, maybe not? I’m not sure. You could also have a naughty_words
method that just returns the array of naughty words:

def naughty_words
  %w(roses kittens)
end

Then later if you want to keep the list of naughty words somewhere
else, like in your database or in a file, the naughty_words method
could take care of reading the words from that other place and just
return an array of words.
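
As a rough sketch of the file-backed variant, assuming a plain-text file with one word per line (the naughty_words.txt default and the temp file below are made up for the example):

```ruby
require 'tempfile'

# Hypothetical loader: one word per line; blank lines are skipped.
def naughty_words(path = 'naughty_words.txt')
  File.readlines(path).map(&:strip).reject(&:empty?)
end

# Demonstrate with a throwaway file instead of a real word list.
list = Tempfile.new('naughty_words')
list.write("roses\nkittens\n\n")
list.close
puts naughty_words(list.path).inspect
list.unlink
```

The filter method does not need to change: it still calls naughty_words and gets an array back.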

You might also want to put your filtering in a helper method, and
call it from a view, instead of setting the @body2 instance variable.

Also maybe just replacing naughty words with four asterisks would be
best, then they’re obfuscated more. You could do this:

def filter(text)
  text_to_filter = text.dup
  naughty_words.each do |word|
    text_to_filter.gsub!(/\b#{word}\b/i, "****")
  end
  text_to_filter
end

Or in one line:

def filter(text)
  text.gsub(/\b(#{naughty_words.join('|')})\b/i, "****")
end

That one doesn’t use the in-place gsub (it returns a copy of the
string) and builds up a regular expression instead of iterating
through each word.
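
If the word list might ever contain regex metacharacters, one option is to let Regexp.union do the escaping when building the alternation. This is only a sketch: naughty_pattern is a made-up helper, and the list is passed in as an argument here rather than read from naughty_words.

```ruby
# Build one case-insensitive, whole-word pattern from the list.
# Regexp.union escapes each string; .source drops the union's own flags
# so that the outer /i applies to the alternatives.
def naughty_pattern(words)
  /\b(?:#{Regexp.union(words).source})\b/i
end

def filter(text, words)
  text.gsub(naughty_pattern(words), '****')
end

puts filter('Roses and ROSES, kittens; roseswood', %w(roses kittens))
```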

Replacing the naughty words with random characters would be cute. You
could do this…

CLEAN_CHARS = '!@$%*'

def naughty_words
  %w(roses kittens)
end

def clean_word_for(word)
  Array.new(word.size).fill { CLEAN_CHARS.slice(rand(CLEAN_CHARS.size), 1) }.join
end

def filter(text)
  text_to_filter = text.dup
  naughty_words.each do |word|
    text_to_filter.gsub!(/\b#{word}\b/i, clean_word_for(word))
  end
  text_to_filter
end

puts filter("Raindrops on roses and whiskers on kittens. Roses and roses.")

Which outputs something like “Raindrops on *!@%% and whiskers on %!%$$!@.
*!@%% and *!@%%.” And you could shorten the filter method as
before:

def filter(text)
  text.gsub(/\b(#{naughty_words.join('|')})\b/i) { clean_word_for($1) }
end

This uses the block form of gsub. See the documentation here:
http://ruby-doc.org/core/classes/String.html#M001889
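
Because the block runs once per match and receives the matched text, it can also build the replacement from what was actually typed. A deterministic sketch that keeps the first letter (and its original case); the two-argument filter signature is an assumption made to keep the example self-contained:

```ruby
def filter(text, words)
  pattern = /\b(?:#{words.map { |word| Regexp.escape(word) }.join('|')})\b/i
  # found is the text that matched, so its case and length are preserved.
  text.gsub(pattern) { |found| found[0, 1] + '*' * (found.size - 1) }
end

puts filter('Roses and kittens.', %w(roses kittens))
```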

– Michael D.
http://www.mdaines.com

Randy W. Sims wrote:

Won’t that look through comment once for each word? That seems expensive.

Measure it… You might be surprised. What seems like it should be
expensive often isn’t, and vice versa…

match = words.keys.join('|')

# look through text once making substitutions
text.gsub!(/\b(#{match})\b/) { |match| words[match] }

For large blocks of text, that’ll turn out to be very expensive. Two
things in regular expressions are slow: keeping back-references, and
backtracking. The alternation you’ve got there implies both.

The relative speed you’ll end up with will depend on the number of words
to be substituted, their density in the text, their similarity (I think;
I’m not entirely certain how much Ruby’s regex engine optimises
alternations), and the statistics of the rest of the text. For an
extremely naive test (attached), I’ve measured an each() loop being 3
times faster than an alternation on longish (16KB-160KB) text blocks,
but that’s with a very high match rate.

I’m sure Zed will pick holes in my methods (and they’re huge :) ) but
the principle stands…
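
For anyone who wants to measure it themselves, here is a rough sketch using the standard Benchmark module. It is not the attached test; the method names, sample text and repeat counts are made up, and the timings will depend on your Ruby version, word list and text.

```ruby
require 'benchmark'

WORDS = { 'poo' => '#@%', 'darn' => '$@%!' }.freeze

# One pass over the text per word.
def filter_each(text, words)
  result = text.dup
  words.each_pair { |cuss, replacement| result.gsub!(/\b#{cuss}\b/, replacement) }
  result
end

# A single pass with an alternation and a hash lookup per match.
def filter_alternation(text, words)
  pattern = /\b(#{words.keys.join('|')})\b/
  text.gsub(pattern) { |found| words[found] }
end

# Roughly 66KB of text with a very high match rate.
sample = 'This darn thing smells like poo. ' * 2_000

each_time = Benchmark.realtime { 20.times { filter_each(sample, WORDS) } }
alternation_time = Benchmark.realtime { 20.times { filter_alternation(sample, WORDS) } }

puts format('each loop:   %.4fs', each_time)
puts format('alternation: %.4fs', alternation_time)
```

Try it with text that rarely matches as well, since the ratio may change.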

Alex Y. wrote:

Needn’t be much more complicated than that…

Won’t that look through comment once for each word? That seems expensive.
Maybe an alternation would be more efficient for large blocks of text:

text = 'This darn thing smells like poo.'

words = {
  'poo' => '#@%',
  'darn' => '$@%!',
}

# create alternation
match = words.keys.join('|')

# look through text once making substitutions
text.gsub!(/\b(#{match})\b/) { |match| words[match] }

Randy.
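
One thing to watch if the /i flag is added to the hash version: the matched text may be capitalised while the hash keys are not, so the lookup comes back nil and the word is replaced with nothing. A sketch of one way around that, assuming all keys are stored in lower case (the filter method and its arguments are made up for the example):

```ruby
def filter(text, words)
  pattern = /\b(#{words.keys.join('|')})\b/i
  # Normalise the match before the lookup; keys are assumed to be lower case.
  text.gsub(pattern) { |found| words[found.downcase] }
end

words = { 'poo' => '#@%', 'darn' => '$@%!' }
puts filter('This Darn thing smells like POO.', words)
```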

I’d like to be a bit more helpful on this topic but honestly I
couldn’t give a FLUFFY BUNNIES.


Giles B.
http://www.gilesgoatboy.org