Remove all illegal chars from string

Hi,

Is there a simple way to remove all but the legal chars from a string,
where the legal chars are, for example, a-z A-Z 0-9?
So everything should be removed from the string except these characters.
[email protected] Le3|§” --> “exampLe”

I don’t know very much about regular expressions, so I don’t know if
it’s possible with sub or gsub. My first idea was to loop over the
string and check every character, but I wondered if there is something
simpler or better.

Thanks
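For comparison, here is a minimal sketch of the character-by-character loop described in the question, assuming the legal set is exactly ASCII a-z, A-Z and 0-9. The constant `LEGAL_CHARS`, the method `keep_legal_chars` and the sample string are hypothetical, made up for illustration.

```ruby
# Hypothetical legal set: ASCII a-z, A-Z and 0-9 only.
LEGAL_CHARS = [*"a".."z", *"A".."Z", *"0".."9"].freeze

# Walk the string one character at a time and keep only the legal ones.
def keep_legal_chars(str)
  str.each_char.select { |ch| LEGAL_CHARS.include?(ch) }.join
end

puts keep_legal_chars("ab@c De3|§")  # prints abcDe3
```

It works, but the replies below do the same job with a single call to tr, delete or gsub.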

thomas coopman wrote:

> Hi,
>
> Is there a simple way to remove all but the legal chars from a string.
> where the legal chars are for example: a-z A-Z 0-9
> So everything should be removed from the string but these characters.
> [email protected] Le3|§” --> “exampLe”
>
> I don’t know very much about regular expressions, so I don’t know if
> it’s possible with sub or gsub. My first Idea was to loop over the
> string and check every character but I wondered if there is something
> more simple or better.
>
> Thanks

irb(main):006:0> s = "abcd??ABCD!!0123"
=> "abcd??ABCD!!0123"
irb(main):007:0> s
=> "abcd??ABCD!!0123"
irb(main):008:0> s.tr('^a-zA-Z0-9','')
=> "abcdABCD0123"

thomas coopman wrote:

> Hi,
>
> Is there a simple way to remove all but the legal chars from a string. where the legal chars are for example: a-z A-Z 0-9
> So everything should be removed from the string but these characters.
> [email protected] Le3|§” --> “exampLe”
>
> I don’t know very much about regular expressions, so I don’t know if it’s possible with sub or gsub. My first Idea was to loop over the string and check every character but I wondered if there is something more simple or better.
>
> Thanks

str.gsub(/[^a-zA-Z]/, '') should do it.

On Jun 28, 2006, at 8:09 AM, thomas coopman wrote:

> Hi,
>
> Is there a simple way to remove all but the legal chars from a
> string. where the legal chars are for example: a-z A-Z 0-9
> So everything should be removed from the string but these characters.
> [email protected] Le3|§” --> “exampLe”

string.delete("^a-zA-Z0-9")

Hope that helps.

James Edward G. II

“t” == thomas coopman [email protected] writes:

t> Is there a simple way to remove all but the legal chars from a
t> string.
t> where the legal chars are for example: a-z A-Z 0-9
t> So everything should be removed from the string but these characters.
t> “[email protected] Le3|§” --> “exampLe”

Well you can try with String#tr

moulon% ruby -e 'p "[email protected] Le3|§".tr("^a-zA-Z0-9", "")'
"exampLe3"
moulon%

which means replace all characters, except a-z A-Z 0-9, with "", i.e. delete them
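With an empty second argument, String#tr deletes the listed characters rather than replacing them, so the tr, delete and gsub answers in this thread should agree. A small check, using a made-up sample string:

```ruby
s = "ab@c De3|§"   # made-up sample input

via_tr     = s.tr("^a-zA-Z0-9", "")      # empty replacement: characters are deleted
via_delete = s.delete("^a-zA-Z0-9")      # leading ^ negates the list
via_gsub   = s.gsub(/[^a-zA-Z0-9]/, "")  # negated character class in a regex

p [via_tr, via_delete, via_gsub]  # => ["abcDe3", "abcDe3", "abcDe3"]
```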

Guy Decoux

sender: “thomas coopman” date: “Wed, Jun 28, 2006 at 10:09:35PM +0900” <<<EOQ
> Hi,
>
> Is there a simple way to remove all but the legal chars from a string.
> where the legal chars are for example: a-z A-Z 0-9
> So everything should be removed from the string but these characters.
> [email protected] Le3|§” --> “exampLe”
Taking your example, if legal chars are a-z A-Z 0-9 then
the output of:
[email protected] Le3|§”
should be:
“exampLe3” and not “exampLe”…

> I don’t know very much about regular expressions, so I don’t know if
> it’s possible with sub or gsub. My first Idea was to loop over the
> string and check every character but I wondered if there is something
> more simple or better.
Yes, this is why regexps were invented :)

irb

irb(main):001:0> "[email protected] Le3|§".gsub(/[^a-zA-Z0-9]/,'')
=> "exampLe3"

> Thanks
You’re welcome,
Alex

On 6/28/06, thomas coopman [email protected] wrote:

> Hi,
>
> Is there a simple way to remove all but the legal chars from a string. where the legal chars are for example: a-z A-Z 0-9
> So everything should be removed from the string but these characters.
> [email protected] Le3|§” --> “exampLe”
>
> I don’t know very much about regular expressions, so I don’t know if it’s possible with sub or gsub. My first Idea was to loop over the string and check every character but I wondered if there is something more simple or better.

It’s easy with gsub and a regex:

stringToCheck.gsub!(/[^a-zA-Z0-9]/, "")

The '^' at the start of the brackets inverts (negates) the list of characters.
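One thing to watch with the bang version: gsub! modifies the string in place and returns nil when nothing was replaced, so its return value should not be used as if it were always the cleaned string. A short illustration with made-up strings:

```ruby
dirty = "ab@c De3|§".dup   # .dup gives a mutable copy of the literal
clean = "abcDe3".dup

# Something matched: gsub! edits the receiver and returns it.
p dirty.gsub!(/[^a-zA-Z0-9]/, "")  # => "abcDe3"

# Nothing matched: the receiver is unchanged and gsub! returns nil.
p clean.gsub!(/[^a-zA-Z0-9]/, "")  # => nil
p clean                            # => "abcDe3"
```

If you need the cleaned string as a value, the non-bang gsub always returns one.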

Les

On Jun 28, 2006, at 9:09 AM, thomas coopman wrote:

> something more simple or better.
>
> Thanks

The regexp for such a thing would be:

"[email protected] Le3|§".gsub(/[^a-zA-Z0-9]/, "")
=> "exampLe3" (you listed 3 as a legal character in the above
email; /[^a-zA-Z]/ would remove numbers as well)

Probably about time to learn some regular expressions. Have a look
at http://www.regular-expressions.info/tutorial.html
You’ll really start to like them once you learn even just the basic
matching ideas.
-Mat

Also, if you’re willing to accept underscores in the accepted character
list, you could just use the \W character class, which is equal to
[^A-Za-z0-9_].
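To see that difference, compare \W with the explicit class on a made-up string that contains an underscore:

```ruby
s = "ab_c De3|§"   # made-up sample with an underscore

p s.gsub(/\W/, "")            # => "ab_cDe3"  (\W never matches "_")
p s.gsub(/[^A-Za-z0-9]/, "")  # => "abcDe3"
```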

This is definitely regex territory. And gsub() is the thing:

ex = "[email protected] Le3|§"
puts ex.gsub( /[^A-Za-z0-9]/, '' )

exampLe3

Or

puts "[email protected] Le3|§".gsub( /[^A-Za-z0-9]/, '' )

exampLe3

I assume you wanted the 3 in there, since you asked for numbers in your
range of characters. When doing something like this, in my opinion, it’s
best not to try to roll your own.

On 28/06/06, thomas coopman [email protected] wrote:

> Is there a simple way to remove all but the legal chars from a string. where the legal chars are for example: a-z A-Z 0-9
> So everything should be removed from the string but these characters.
> [email protected] Le3|§” --> “exampLe”

Yes, there is:

str = "[email protected] Le3|§"
str.gsub(/[^a-z0-9]/i, '') # => "exampLe3"

Paul.

On Jun 28, 2006, at 14:09, thomas coopman wrote:

> something more simple or better.
>
> Thanks

It’s dead easy:

test = "[email protected] Le3|§"
test.gsub(/[^A-Za-z0-9]/, '')
=> "exampLe3"

Quick explanation:

[] defines a character class: a set of characters, any one of which
matches a single character. If you only wanted vowels, for instance, you’d use [aeiou].

^ as the first character in a group means the opposite of that
group. Everything that isn’t a vowel would be like this: [^aeiou]
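The two vowel classes above, applied with gsub to a sample string chosen for illustration, show a class and its negation side by side:

```ruby
text = "regular expressions"

# [aeiou] matches any single vowel, so removing the matches strips the vowels.
p text.gsub(/[aeiou]/, "")   # => "rglr xprssns"

# [^aeiou] matches any single character that is NOT a vowel (spaces included).
p text.gsub(/[^aeiou]/, "")  # => "euaeeio"
```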

A-Z is a range of characters, and you can do smaller ranges like c-q
or whatever; the ordering used to determine the range is the
character encoding. You might be tempted to write A-z in place of
A-Za-z, but in ASCII and compatible encodings that range also covers
the six characters between Z and a ([ \ ] ^ _ `), so the two are not
equivalent. a-Z is invalid outright, since a > Z in ASCII.
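A quick way to confirm that A-z is not the same as A-Za-z is to look at the six ASCII code points between "Z" (90) and "a" (97); the sample string "a_b^c" is made up for illustration:

```ruby
# The characters with codes 91..96: [ \ ] ^ _ `
gap = (91..96).map(&:chr).join

p gap.scan(/[A-z]/).length     # => 6  all of them fall inside A-z
p gap.scan(/[A-Za-z]/).length  # => 0  none of them are letters

# The practical consequence when cleaning a string:
p "a_b^c".gsub(/[^A-z]/, "")     # => "a_b^c"
p "a_b^c".gsub(/[^A-Za-z]/, "")  # => "abc"
```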

There are also shortcuts for some classes of characters: \d is
equivalent to [0-9], and \w is equivalent to [A-Za-z0-9_], i.e. it
also includes the underscore character '_'.
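The shortcut classes on a made-up sample string, assuming plain ASCII input:

```ruby
s = "Item_42: 7 apples!"

p s.scan(/\d/).join            # => "427"            \d is [0-9]
p s.gsub(/\W/, "")             # => "Item_427apples" \W leaves "_" in place
p s.gsub(/[^A-Za-z0-9_]/, "")  # => "Item_427apples" same result, spelled out
```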

So the regular expression says ‘match any single character that is
not in the ranges A-Z, a-z, or 0-9’. #gsub takes everything matched
by the regular expression, and replaces it with nothing.

matthew smillie.

> Is there a simple way to remove all but the legal chars from
> a string. where the legal chars are for example: a-z A-Z 0-9
> So everything should be removed from the string but these
> characters. “[email protected] Le3|§” --> “exampLe”

p "[email protected] Le3|§".gsub(/[^a-zA-Z0-9]/, "")

gegroet,
Erik V. - http://www.erikveen.dds.nl/

> Is there a simple way to remove all but the legal chars from a string. where the legal chars are for example: a-z A-Z 0-9
> So everything should be removed from the string but these characters.
> [email protected] Le3|§” --> “exampLe”
>
> I don’t know very much about regular expressions, so I don’t know if it’s possible with sub or gsub. My first Idea was to loop over the string and check every character but I wondered if there is something more simple or better.

str="[email protected]  L_e3|§"
puts str.gsub(/[^a-zA-Z0-9]/, "")
# yields:
#	exampLe3

a bit shorter would be

puts str.gsub(/\W/, "")
# but the "word" characters matched by \w include '_', so \W
# does not remove it, and this would yield:
#	exampL_e3

Benedikt


On Wed, 2006-06-28 at 22:09 +0900, thomas coopman wrote:

> Is there a simple way to remove all but the legal chars from a string.
> where the legal chars are for example: a-z A-Z 0-9
> So everything should be removed from the string but these characters.
> [email protected] Le3|§” --> “exampLe”
>
> I don’t know very much about regular expressions, so I don’t know if
> it’s possible with sub or gsub. My first Idea was to loop over the
> string and check every character but I wondered if there is something
> more simple or better.

An excerpt from my upcoming book, Ruby Phrasebook:

"""
new_password = gets.chomp
if new_password.count('^A-Za-z._') != 0 then
  puts "Bad Password"
else
  # do something
end

This works by using a special syntax that’s shared by .count, .tr,
.delete, and .squeeze. A parameter beginning with ^ negates the list; the
list consists of any valid characters in the active character set and
may contain ranges formed with -. If more than one parameter list is
given to .count, .delete, or .squeeze, the lists of characters are
intersected using set logic; that is, only characters in both lists are
used for filtering.
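The shared list syntax, including the intersection of several lists, can be tried on a sample string; the strings and lists here are made up for illustration:

```ruby
s = "hello world"

# One negated list: count everything that is not a lowercase letter.
p s.count("^a-z")            # => 1  (the space)

# Two lists are intersected: lowercase letters AND not vowels, i.e. consonants.
p s.count("a-z", "^aeiou")   # => 7
p s.delete("a-z", "^aeiou")  # => "eo o"

# squeeze collapses runs of repeated characters, here only for letters other than "s".
p "mississippi".squeeze("a-z", "^s")  # => "mississipi"
```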

You might also want to simply replace all “evil” characters with _ (such
as perhaps from a CGI form post):

evil_input = '`cat /etc/passwd`'

evil_input.tr('./`', '_')

#=> "_cat _etc_passwd_"
"""

For your specific question, you will want to use .delete:

'[email protected] Le3|§'.delete '^A-Za-z'
#=> "exampLe"

On Wednesday 28 June 2006 14:09, thomas coopman wrote:

> Is there a simple way to remove all but the legal chars from a string.
> where the legal chars are for example: a-z A-Z 0-9 So everything should be
> removed from the string but these characters. “[email protected] Le3|§” --> “exampLe”

"[email protected] Le3|§".gsub(/\W/, '')

will return

"exampLe3"

I strongly suggest you learn about regular expressions. You can start
here, http://en.wikipedia.org/wiki/Regular_expression ; there are many
links to many tutorials. Here, http://www.rubycentral.com/book/tut_stdtypes.html ,
you can find info on Ruby regular expressions, though it might be best to
get yourself a Ruby book.

Anselm

On 6/28/06, thomas coopman [email protected] wrote:

You’ve hit the nail on the head. Use gsub on the string.

"[email protected] Le3|§".gsub(/\W/, '') # --> exampLe3

The \W in the regular expression matches every character that is not a
valid word character: i.e. [^a-zA-Z0-9_]

Blessings,
TwP

Thomas,

s = "[email protected] Le3|§"
s.gsub(/[^a-zA-Z0-9]/, '') # => "exampLe3"

Thanks,

David

> So everything should be removed from the string but these characters.
> [email protected] Le3|§” --> “exampLe”
>
> I don’t know very much about regular expressions, so I don’t
> know if it’s possible with sub or gsub.

Character sets with ranges: [a-z]
Negated sets: [^a-z]

"[email protected] Le3|§".gsub(/[^a-zA-Z0-9]/,'') => "exampLe3"

ben

thomas coopman wrote:

> Hi,
>
> Is there a simple way to remove all but the legal chars from a string. where the legal chars are for example: a-z A-Z 0-9
> So everything should be removed from the string but these characters.
> [email protected] Le3|§” --> “exampLe”

The easiest way to find stuff is to search comp.lang.ruby through
Google groups:

http://groups.google.com/group/comp.lang.ruby/browse_frm/thread/c9b63420fe8f66a9?q=remove+non-ASCII&

ruby-talk-google was set up in late April and doesn’t have much
searchable history.
