Remove all illegal chars from string

thomas_coopman · June 28, 2006, 8:09pm

On Wednesday 28 June 2006 14:09, thomas coopman wrote:

Is there a simple way to remove all but the legal chars from a string,
where the legal chars are for example: a-z A-Z 0-9? So everything should be
removed from the string but these characters. "exam@p Le3|§" --> "exampLe"

You were right to think along the lines of gsub. A possible approach is:
"exam@p Le3|§".gsub(/[^[:alnum:]]/, '') --> "exampLe3"
This replaces every character that is not in the :alnum: POSIX character
class with an empty string.

If you just want alphabetic characters (not 0-9), use [[:alpha:]] in place
of [[:alnum:]]. Your example output dropped the 3 even though you listed 0-9
as legal, so it contradicted what you stated you were looking for.
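
To make the difference between the two classes concrete, here is a minimal sketch using the string from the question. The variable names are only for illustration, and the section sign is written as an escape so the snippet does not depend on the source file's encoding.

```ruby
# The sample string from the question; \u00a7 is the section sign.
input = "exam@p Le3|\u00a7"

# Keep letters and digits: remove everything outside [:alnum:].
letters_and_digits = input.gsub(/[^[:alnum:]]/, '')

# Keep letters only: remove everything outside [:alpha:].
letters_only = input.gsub(/[^[:alpha:]]/, '')

puts letters_and_digits  # exampLe3
puts letters_only        # exampLe
```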

The only thing to be wary of here is that Regexps such as /[[:alpha:]]/
may act differently depending on locale. I have no idea whether that has any
effect in non-Oniguruma Ruby 1.8.x, but it certainly does on a Ruby compiled
with the Oniguruma regular expression engine (mine at least). With that in
mind, the best way to get what you want may be to spell the class out as
[a-zA-Z0-9], i.e. /[^a-zA-Z0-9]/ in the gsub above.
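
A small sketch of that caveat, assuming a Ruby 1.9 or later interpreter (which is newer than the versions discussed in this thread): there the POSIX classes follow the string's encoding, so a UTF-8 string keeps accented letters, while the explicit ASCII ranges do not. The sample word is made up for the illustration.

```ruby
# "café 42!" written with an escape so the string is UTF-8 regardless of
# the source file's encoding.
word = "caf\u00e9 42!"

# POSIX class: on Ruby 1.9+ with a UTF-8 string, the accented e counts as alnum.
posix_result = word.gsub(/[^[:alnum:]]/, '')

# Explicit ranges: only ASCII letters and digits survive.
explicit_result = word.gsub(/[^a-zA-Z0-9]/, '')

puts posix_result     # café42
puts explicit_result  # caf42
```

If "legal" really means the 62 ASCII characters, the explicit ranges are the safer spelling.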

As a side note, I didn’t know that you could also do /[[:^alnum:]]/.

Hope this helps,
Alex

thomas_coopman · June 28, 2006, 8:05pm

2006/6/28, thomas coopman [email protected]:

Hi,

Is there a simple way to remove all but the legal chars from a string, where the legal chars are for example: a-z A-Z 0-9?
So everything should be removed from the string but these characters.
"exam@p Le3|§" → "exampLe"

Some options:

s.gsub /[^a-zA-Z0-9]+/, ''
s.gsub /\W+/, ''

(or use gsub! if you want to change in place)
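
The two options are not quite equivalent: \w includes the underscore, so \W leaves it in place. A short sketch with a made-up sample string, which also shows that gsub! returns nil when it changes nothing:

```ruby
sample = "snake_case-name 42"

explicit = sample.gsub(/[^a-zA-Z0-9]+/, '')  # drops the underscore too
non_word = sample.gsub(/\W+/, '')            # "_" is a word character, so it stays

puts explicit  # snakecasename42
puts non_word  # snake_casename42

# gsub! modifies the receiver and returns nil if no substitution was made,
# so do not chain further calls on its return value.
already_clean = "abc123".dup
result = already_clean.gsub!(/[^a-zA-Z0-9]+/, '')
puts result.inspect  # nil
```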

I don’t know very much about regular expressions, so I don’t know if it’s possible with sub or gsub. My first Idea was to loop over the string and check every character but I wondered if there is something more simple or better.

Definitely.

Kind regards

robert

thomas_coopman · June 28, 2006, 8:23pm

On 6/28/06, James Edward G. II [email protected] wrote:

string.delete("^a-zA-Z0-9")

delete() - nice. As a Perler just coming to Ruby, it’s hard not to fall
back on old habits (regex with gsub, for instance).

Troy
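
For reference, a minimal sketch of the delete approach applied to the string from the original question. The variable names are illustrative only.

```ruby
original = "exam@p Le3|\u00a7"  # \u00a7 is the section sign

# A leading ^ negates the set: delete everything that is NOT a-z, A-Z or 0-9.
cleaned = original.delete("^a-zA-Z0-9")
puts cleaned  # exampLe3

# delete! does the same in place (and returns nil if nothing was removed).
buffer = original.dup
buffer.delete!("^a-zA-Z0-9")
puts buffer   # exampLe3
```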

thomas_coopman · June 28, 2006, 9:58pm

Troy D. wrote:

On 6/28/06, James Edward G. II [email protected] wrote:

string.delete("^a-zA-Z0-9")

delete() - nice. As a Perler just coming to Ruby, it’s hard not to fall
back on old habits (regex with gsub, for instance).

Troy

just too bad the rdoc is inaccurate and says that delete takes a string
as argument and not a regexp

thomas_coopman · June 28, 2006, 10:12pm

On Wednesday 28 June 2006 20:56, Pete wrote:

just too bad the rdoc is inaccurate and says that delete takes a string
as argument and not a regexp

The rdoc is actually correct: delete takes a string. At first glance I
thought it just constructed a regex from the given string (due to its
understanding of character ranges). Instead, both String#count and
String#delete take this kind of pseudo-regex.
Is there a reason it wouldn’t make more sense for these methods to take a
regexp?

Alex
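
A short sketch of what that "pseudo-regex" does and does not understand, using made-up sample strings: ranges and a leading ^ work, several arguments are intersected, but regexp metacharacters such as . and * are taken literally.

```ruby
text = "hello world"

puts text.count("lo")          # 5  (three l's and two o's)
puts text.count("a-z")         # 10 (everything except the space)
puts text.count("^a-z")        # 1  (only the space)
puts text.count("a-z", "^l")   # 7  (intersection: lowercase letters that are not l)
puts text.delete("l-o")        # he wrd  (removes l, m, n and o)

# Not a regexp: "." and "*" match only themselves.
puts "a.b*c".delete(".*")      # abc
```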

thomas_coopman · June 29, 2006, 2:32am

On Thu, 2006-06-29 at 05:10 +0900, A. S. Bradbury wrote:

Is there a reason it wouldn’t make more sense for these methods to take a
regexp?

Performance is just slightly better with character lists instead of
regular expressions.
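
If you want to check this on your own machine, here is a rough timing sketch. The input size and the elapsed helper are assumptions made up for the illustration, and the numbers will vary with the Ruby version and the data, so no timings are claimed here.

```ruby
# Hypothetical input: about 120,000 characters mixing legal and illegal ones.
data = "exam@p Le3| " * 10_000

# Hypothetical helper: wall-clock seconds taken by the given block.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

via_delete = nil
via_gsub   = nil

t_delete = elapsed { via_delete = data.delete("^a-zA-Z0-9") }
t_gsub   = elapsed { via_gsub   = data.gsub(/[^a-zA-Z0-9]+/, '') }

puts format("delete: %.4fs", t_delete)
puts format("gsub:   %.4fs", t_gsub)
puts via_delete == via_gsub  # true: both approaches produce the same string
```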

thomas_coopman · June 29, 2006, 12:55am

On Jun 28, 2006, at 1:10 PM, A. S. Bradbury wrote:

Is there a reason it wouldn’t make more sense for these methods to take a
regexp?

#delete and #count are restricted to character lists; a full regular
expression is too much.

--
Eric H. - http://blog.segment7.net

thomas_coopman · June 30, 2006, 12:01am

Eric H. [email protected] writes:

Instead, both String#count and String#delete take this kind of
pseudo-regex.
Is there a reason it wouldn’t make more sense for these methods to
take a
regexp?

#delete and #count are restricted to character lists, a full regular
expression is too much.

+1 for adding these to a future Ruby.

thomas_coopman · June 30, 2006, 12:10am

On Jun 29, 2006, at 4:57 PM, Christian N. wrote:

+1 for adding these to a future Ruby.

We have those now. They are called gsub() and scan().

James Edward G. II
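
A minimal sketch of that point, with illustrative variable names: scan gives a regexp-based counterpart to String#count, and gsub a regexp-based counterpart to String#delete.

```ruby
source = "exam@p Le3|\u00a7"  # \u00a7 is the section sign
legal  = /[a-zA-Z0-9]/

kept    = source.scan(legal)               # array of the matching characters
count   = kept.length                      # regexp-based stand-in for String#count
cleaned = source.gsub(/[^a-zA-Z0-9]/, '')  # regexp-based stand-in for String#delete

puts count      # 8
puts kept.join  # exampLe3
puts cleaned    # exampLe3
```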