Hi,
Is there a simple way to remove all but the legal chars from a string,
where the legal chars are, for example, a-z A-Z 0-9?
So everything except these characters should be removed from the string.
“exam@p Le3|§” --> “exampLe”
I don’t know very much about regular expressions, so I don’t know if
it’s possible with sub or gsub. My first idea was to loop over the
string and check every character, but I wondered if there is something
simpler or better.
Thanks
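For comparison, here is a minimal sketch of the loop-over-every-character idea next to a single gsub call. It assumes "legal" means ASCII letters and digits only; the names LEGAL, keep_legal_loop and keep_legal_gsub are made up for illustration.

```ruby
# Assumption: "legal" means ASCII letters and digits only.
LEGAL = ('a'..'z').to_a + ('A'..'Z').to_a + ('0'..'9').to_a

# The loop approach: look at each character and keep only the legal ones.
def keep_legal_loop(str)
  kept = []
  str.each_char do |ch|
    kept << ch if LEGAL.include?(ch)
  end
  kept.join
end

# The same filter as one regexp substitution with a negated character class.
def keep_legal_gsub(str)
  str.gsub(/[^a-zA-Z0-9]/, '')
end

input = "exam@p Le3|§"
puts keep_legal_loop(input)  # => exampLe3
puts keep_legal_gsub(input)  # => exampLe3
```

Both give the same result; the gsub version is shorter and lets the regexp engine do the per-character work.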
thomas coopman wrote:
Hi,
Is there a simple way to remove all but the legal chars from a string.
where the legal chars are for example: a-z A-Z 0-9
So everything should be removed from the string but these characters.
“exam@p Le3|§” --> “exampLe”
I don’t know very much about regular expressions, so I don’t know if
it’s possible with sub or gsub. My first Idea was to loop over the
string and check every character but I wondered if there is something
more simple or better.
Thanks
irb(main):006:0> s = "abcd??ABCD!!0123"
=> "abcd??ABCD!!0123"
irb(main):007:0> s
=> "abcd??ABCD!!0123"
irb(main):008:0> s.tr('^a-zA-Z0-9','')
=> "abcdABCD0123"
thomas coopman wrote:
Hi,
Is there a simple way to remove all but the legal chars from a string. where the legal chars are for example: a-z A-Z 0-9
So everything should be removed from the string but these characters.
“exam@p Le3|§” --> “exampLe”
I don’t know very much about regular expressions, so I don’t know if it’s possible with sub or gsub. My first Idea was to loop over the string and check every character but I wondered if there is something more simple or better.
Thanks
str.gsub(/[^a-zA-Z]/, '') should do it.
On Jun 28, 2006, at 8:09 AM, thomas coopman wrote:
Hi,
Is there a simple way to remove all but the legal chars from a
string. where the legal chars are for example: a-z A-Z 0-9
So everything should be removed from the string but these characters.
“exam@p Le3|§” --> “exampLe”
string.delete("^a-zA-Z0-9")
Hope that helps.
James Edward G. II
“t” == thomas coopman writes:
t> Is there a simple way to remove all but the legal chars from a string.
t> where the legal chars are for example: a-z A-Z 0-9
t> So everything should be removed from the string but these characters.
t> “exam@p Le3|§” → “exampLe”
Well you can try with String#tr
moulon% ruby -e 'p "exam@p Le3|§".tr("^a-zA-Z0-9", "")'
"exampLe3"
moulon%
which means replace all characters, except a-z A-Z 0-9, with "" (i.e. delete them)
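A small sketch of that point: with an empty replacement list, tr deletes the matched characters, so it behaves like delete with the same list, and like gsub with a negated character class.

```ruby
s = "exam@p Le3|§"

# tr with an empty replacement list deletes every matched character.
via_tr     = s.tr("^a-zA-Z0-9", "")
# delete takes the same character-list syntax directly.
via_delete = s.delete("^a-zA-Z0-9")
# gsub with a negated character class does the same job with a regexp.
via_gsub   = s.gsub(/[^a-zA-Z0-9]/, "")

p via_tr      # => "exampLe3"
p via_delete  # => "exampLe3"
p via_gsub    # => "exampLe3"
```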
Guy Decoux
sender: “thomas coopman” date: “Wed, Jun 28, 2006 at 10:09:35PM +0900” <<<EOQ
Hi,
Is there a simple way to remove all but the legal chars from a string.
where the legal chars are for example: a-z A-Z 0-9
So everything should be removed from the string but these characters.
“exam@p Le3|§” --> “exampLe”
Taking your example, if legal chars are a-z A-Z 0-9 then
the output of:
“exam@p Le3|§”
should be:
“exampLe3” and not “exampLe”…
I don’t know very much about regular expressions, so I don’t know if
it’s possible with sub or gsub. My first Idea was to loop over the
string and check every character but I wondered if there is something
more simple or better.
Yes, this is why regexps were invented
irb
irb(main):001:0> "exam@p Le3|§".gsub(/[^a-zA-Z0-9]/,'')
=> "exampLe3"
Thanks
You’re welcome,
Alex
On 6/28/06, thomas coopman wrote:
Hi,
Is there a simple way to remove all but the legal chars from a string. where the legal chars are for example: a-z A-Z 0-9
So everything should be removed from the string but these characters.
“exam@p Le3|§” → “exampLe”
I don’t know very much about regular expressions, so I don’t know if it’s possible with sub or gsub. My first Idea was to loop over the string and check every character but I wondered if there is something more simple or better.
It’s easy with gsub and a regex:
stringToCheck.gsub!(/[^a-zA-Z0-9]/, "")
The ‘^’ at the start of the brackets inverts the list of characters.
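One thing to watch with the bang version: gsub! changes the string in place and returns nil when nothing was replaced, so its return value should not be chained or relied on. A minimal sketch (variable names are illustrative):

```ruby
# .dup gives a mutable copy even if frozen string literals are enabled.
dirty = "exam@p Le3|§".dup
clean = "exampLe3".dup

# gsub! edits the receiver and returns it when something was replaced...
p dirty.gsub!(/[^a-zA-Z0-9]/, "")  # => "exampLe3"
p dirty                            # => "exampLe3"
# ...but returns nil when there was nothing to replace.
p clean.gsub!(/[^a-zA-Z0-9]/, "")  # => nil

# gsub (no bang) always returns a new string, so it is safe to chain.
p clean.gsub(/[^a-zA-Z0-9]/, "").downcase  # => "example3"
```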
Les
On Jun 28, 2006, at 9:09 AM, thomas coopman wrote:
something more simple or better.
Thanks
The regexp for such a thing would be:
"exam@p Le3|§".gsub(/[^a-zA-Z0-9]/, "")
=> "exampLe3" (you listed 3 as a legal character in the above
email. /[^a-zA-Z]/ would remove numbers as well)
Probably about time to learn some regular expressions. Have a look
at Regular Expression Tutorial - Learn How to Use Regular Expressions
You’ll really start to like them once you learn even just the basic
matching ideas.
-Mat
Also, if you’re willing to accept underscores in the accepted character
list, you could just use the \W character class, which is equal to
[^A-Za-z0-9_].
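If underscores should be removed as well, \W can still be used by putting it in a character class together with the underscore. A short sketch, using an illustrative input that contains an underscore:

```ruby
s = "exam@p L_e3|§"

# \W matches anything that is not [A-Za-z0-9_], so the underscore survives.
p s.gsub(/\W/, "")     # => "exampL_e3"

# A class of "non-word characters or underscore" removes it too, which
# gives the same result as /[^A-Za-z0-9]/.
p s.gsub(/[\W_]/, "")  # => "exampLe3"
p s.gsub(/[^A-Za-z0-9]/, "") == s.gsub(/[\W_]/, "")  # => true
```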
This is definitely regex territory. And gsub() is the thing:
ex = "exam@p Le3|§"
puts ex.gsub( /[^A-Za-z0-9]/, '' )
exampLe3
Or
puts "exam@p Le3|§".gsub( /[^A-Za-z0-9]/, '' )
exampLe3
I assume you wanted the 3 in there, since you asked for numbers in your
range of characters. When doing something like this, in my opinion, it’s
best not to try to roll your own.
On 28/06/06, thomas coopman wrote:
Is there a simple way to remove all but the legal chars from a string. where the legal chars are for example: a-z A-Z 0-9
So everything should be removed from the string but these characters.
“exam@p Le3|§” → “exampLe”
Yes, there is:
str = "exam@p Le3|§"
str.gsub(/[^a-z0-9]/i, '') # => "exampLe3"
Paul.
On Jun 28, 2006, at 14:09, thomas coopman wrote:
something more simple or better.
Thanks
It’s dead easy:
test = "exam@p Le3|§"
test.gsub(/[^A-Za-z0-9]/, '')
=> “exampLe3”
Quick explanation:
[] defines a group you want to treat as one character. If you only
wanted vowels, for instance, you’d use [aeiou].
^ as the first character in a group means the opposite of that
group. Everything that isn’t a vowel would be like this: [^aeiou]
A-Z is a range of characters, and you can do smaller ranges like c-q
or whatever; the ordering used to determine the range is the
character encoding. This means A-z is not a shorthand for A-Za-z: in
ASCII (and compatible encodings) the characters [ \ ] ^ _ and ` sit
between Z and a, so A-z matches them too. Likewise a-Z is invalid,
because a > Z in ASCII.
There are also shortcuts for some classes of characters: \d is
equivalent to [0-9], and \w is equivalent to [A-Za-z0-9_], i.e. it also
includes the underscore character ‘_’.
So the regular expression says ‘match any single character that is
not in the ranges A-Z, a-z, or 0-9’. #gsub takes everything matched
by the regular expression, and replaces it with nothing.
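A few one-liners illustrating the points above; the sample strings are made up for the demonstration.

```ruby
# [aeiou] matches one vowel; [^aeiou] matches one non-vowel.
p "programming".gsub(/[aeiou]/, "")   # => "prgrmmng"
p "programming".gsub(/[^aeiou]/, "")  # => "oai"

# A-z is NOT the same as A-Za-z: the ASCII range Z..a also holds [ \ ] ^ _ `
sample = "Ab_^[9"
p sample.gsub(/[^A-Za-z]/, "")  # => "Ab"
p sample.gsub(/[^A-z]/, "")     # => "Ab_^["

# \d is [0-9]; \w is [A-Za-z0-9_].
p "a_1-B".scan(/\d/)  # => ["1"]
p "a_1-B".scan(/\w/)  # => ["a", "_", "1", "B"]

# A backwards range such as a-Z is rejected when the regexp is compiled.
begin
  Regexp.new("[a-Z]")
rescue RegexpError => e
  puts "RegexpError: #{e.message}"
end
```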
matthew smillie.
Is there a simple way to remove all but the legal chars from
a string. where the legal chars are for example: a-z A-Z 0-9
So everything should be removed from the string but these
characters. “exam@p Le3|§” → “exampLe”
p "exam@p Le3|§".gsub(/[^a-zA-Z0-9]/, "")
gegroet,
Erik V. - http://www.erikveen.dds.nl/
Is there a simple way to remove all but the legal chars from a string. where the legal chars are for example: a-z A-Z 0-9
So everything should be removed from the string but these characters.
“exam@p Le3|§” --> “exampLe”
I don’t know very much about regular expressions, so I don’t know if it’s possible with sub or gsub. My first Idea was to loop over the string and check every character but I wondered if there is something more simple or better.
str="exam@p L_e3|§"
puts str.gsub(/[^a-zA-Z0-9]/, "")
# yields:
# exampLe3
a bit shorter would be
puts str.gsub(/\W/, "")
# but "word"-characters (\w) and "non-word"-characters (\W) also
# contain the '_', so this would yield:
# exampL_e3
Benedikt
ALLIANCE, n. In international politics, the union of two thieves who
have their hands so deeply inserted in each other’s pockets that
they cannot separately plunder a third.
(Ambrose Bierce, The Devil’s Dictionary)
On Wed, 2006-06-28 at 22:09 +0900, thomas coopman wrote:
Is there a simple way to remove all but the legal chars from a string.
where the legal chars are for example: a-z A-Z 0-9
So everything should be removed from the string but these characters.
“exam@p Le3|§” --> “exampLe”
I don’t know very much about regular expressions, so I don’t know if
it’s possible with sub or gsub. My first Idea was to loop over the
string and check every character but I wondered if there is something
more simple or better.
An excerpt from my upcoming book, Ruby Phrasebook:
"""
new_password = gets.chomp  # drop the trailing newline, which is not an allowed character
if new_password.count('^A-Za-z._') != 0 then
  puts "Bad Password"
else
  # do something
end
This works by using a special syntax that’s shared by .count, .tr,
.delete, and .squeeze. A parameter beginning with ^ negates the list; the
list consists of any valid characters in the active character set and
may contain ranges formed with -. If more than one parameter list is
given to .count, .delete or .squeeze, the lists of characters are
intersected using set logic; that is, only characters that appear in
every list are used for filtering.
You might also want to simply replace all “evil” characters with _ (such
as perhaps from a CGI form post):
evil_input = '`cat /etc/passwd`'
evil_input.tr('./`', '_')
#=> "_cat _etc_passwd_"
"""
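The multi-list intersection described in the excerpt can be seen with count, delete and squeeze; a short sketch using made-up sample strings:

```ruby
s = "hello world"

# One list: every 'l' and every 'o'.
p s.count("lo")           # => 5
# Two lists are intersected: only 'o' is in both.
p s.count("lo", "o")      # => 2
# ^ negates a list: the letters of "hello" except 'l' (h, e, o).
p s.count("hello", "^l")  # => 4

# delete and squeeze take the same kind of lists.
p "hello".delete("l", "lo")         # => "heo"
p "aaabbbccc".squeeze("a-z", "^b")  # => "abbbc"
```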
In your specific question, you will want to use .delete:
'exam@p Le3|§'.delete '^A-Za-z'
#=> "exampLe"
On Wednesday 28 June 2006 14:09, thomas coopman wrote:
Is there a simple way to remove all but the legal chars from a string.
where the legal chars are for example: a-z A-Z 0-9 So everything should be
removed from the string but these characters. “exam@p Le3|§” → “exampLe”
"exam@p Le3|§".gsub(/\W/, '')
will return
"exampLe3"
I strongly suggest you learn about regular expressions. You can start
here: Regular expression - Wikipedia; there are many links to many
tutorials there. Here, http://www.rubycentral.com/book/tut_stdtypes.html,
you can find info on Ruby regular expressions, though it might be best
to get yourself a Ruby book.
Anselm
On 6/28/06, thomas coopman wrote:
You’ve hit the nail on the head. Use gsub on the string.
"exam@p Le3|§".gsub(/\W/, '') # => "exampLe3"
The \W in the regular expression matches every character that is not a
valid word character: i.e. [^a-zA-Z0-9_]
Blessings,
TwP
Thomas,
s = "exam@p Le3|§"
s.gsub(/[^a-zA-Z0-9]/, '') # => "exampLe3"
Thanks,
David
So everything should be removed from the string but these characters.
“exam@p Le3|§” --> “exampLe”
I don’t know very much about regular expressions, so I don’t
know if it’s possible with sub or gsub.
Character sets with ranges: [a-z]
Negated sets: [^a-z]
"exam@p Le3|§".gsub(/[^a-zA-Z0-9]/,'') => "exampLe3"
ben
thomas coopman wrote:
Hi,
Is there a simple way to remove all but the legal chars from a string. where the legal chars are for example: a-z A-Z 0-9
So everything should be removed from the string but these characters.
“exam@p Le3|§” → “exampLe”
The easiest way to find stuff is to search comp.lang.ruby through
Google groups:
http://groups.google.com/group/comp.lang.ruby/browse_frm/thread/c9b63420fe8f66a9?q=remove+non-ASCII&
ruby-talk-google was set up in late April and doesn’t have much
searchable history