Removing Non Alpha & Numeric Characters From String

Hello.

We have book titles as a column in our database, which I would like to
use in our URLs for SEO purposes. Since these are titles, they contain
characters other than letters and digits (e.g. punctuation, spaces, and
in some cases foreign characters).

What’s the easiest way to turn these titles into URL-friendly strings?
Here is an example:

Original string:

  On One Flower: Butterflies, Ticks and a Few More Icks

What I would like to see:

  on-one-flower-butterflies-ticks-and-a-few-more-icks

I’m currently doing something like this; is there a better way?

  title.squeeze.downcase.tr("(),? !':.[]", "-").gsub('--', '-')

Thanks in advance.
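
A note on the snippet above: squeeze with no arguments collapses every run of repeated characters, not only spaces, so doubled letters inside words are lost. The sample title happens to contain no doubled letters, which may be why the result looks right. A minimal sketch of the difference, using a made-up title (the title and variable names are illustrative, not from the post):

```ruby
# Hypothetical title chosen because it contains doubled letters.
title = "Good Book: A Happy Ending"

# The chain from the post: squeeze with no arguments also collapses "oo" and "pp".
current = title.squeeze.downcase.tr("(),? !':.[]", "-").gsub('--', '-')

# Passing " " restricts squeeze to runs of spaces.
spaces_only = title.squeeze(" ").downcase.tr("(),? !':.[]", "-").gsub('--', '-')

puts current      # god-bok-a-hapy-ending
puts spaces_only  # good-book-a-happy-ending
```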

This is arguably a tad bit prettier:

  title.downcase.gsub(/[^a-z ]/, '').gsub(/ /, '-')

Not sure if it’s that much better…

Roy P. wrote:

This is arguably a tad bit prettier:

  title.downcase.gsub(/[^a-z ]/, '').gsub(/ /, '-')

Not sure if it’s that much better…

That works great (and looks prettier :-)! Thanks.
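
One thing to watch with this version: the character class [^a-z ] also removes digits, and a space left at the end of the string turns into a trailing dash. A small sketch, assuming digits should be kept as the thread title suggests (the sample title is made up for illustration):

```ruby
title = "Fahrenheit 451"   # hypothetical title containing digits

# Suggested version: everything except a-z and space is dropped, digits included.
letters_only = title.downcase.gsub(/[^a-z ]/, '').gsub(/ /, '-')

# Variant that adds 0-9 to the allowed set.
with_digits = title.downcase.gsub(/[^a-z0-9 ]/, '').gsub(/ /, '-')

puts letters_only  # fahrenheit-
puts with_digits   # fahrenheit-451
```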

Well, Hassan makes a good point that this will eat any non-ASCII
characters. Consider whether you want that. If you don’t, you’ll
likely have to URL-encode the result (I don’t think accented
characters, for example, are usable in URLs, are they?).
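
For what it’s worth, a non-ASCII character can be carried in a URL once it is percent-encoded. A minimal sketch using URI.encode_www_form_component from Ruby’s standard library, assuming the slug is a UTF-8 string that kept its accented letter (the sample slug is made up for illustration):

```ruby
require 'uri'

# Hypothetical slug that kept its accented letter; \u00E9 is "é".
slug = "chr\u00E9tien-de-troyes"

# Each byte of the UTF-8 encoding of "é" (0xC3 0xA9) becomes a %XX escape;
# ASCII letters and dashes pass through unchanged.
encoded = URI.encode_www_form_component(slug)

puts encoded  # chr%C3%A9tien-de-troyes
```

Note that this method turns spaces into "+", so it is best applied after spaces have already been replaced with dashes.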

On Oct 2, 2008, at 10:12 PM, Hassan S. wrote:

Firefox 2 turns this into: http://localhost/sample/Chrétien.txt
while Safari requests http://localhost/sample/Chrétien.txt

Even though Safari does indeed display the accented characters in
its UI, it does encode the URL properly when sending the HTTP request
to the server… take a look at your log…

But the main thing is that, regardless, the non-US-ASCII name is used
to match the resource in the file system.

Well, yes… once it has been decoded from the HTTP request back to
its original form…

Cheers,


PA.
http://alt.textdrive.com/nanoki/
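
The round trip described above can be sketched in a few lines: the client percent-encodes the name for the request, and the server decodes it before matching the resource. This is only an illustration with Ruby’s standard uri library, assuming UTF-8 throughout; it is not what either browser runs internally.

```ruby
require 'uri'

original = "Chr\u00E9tien.txt"   # "Chrétien.txt", written with an escape

# Roughly what would show up in a request log.
on_the_wire = URI.encode_www_form_component(original)

# What the server works with after decoding the request.
decoded = URI.decode_www_form_component(on_the_wire)

puts on_the_wire          # Chr%C3%A9tien.txt
puts decoded == original  # true
```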

Hi, guys. I ended up doing this for now and it works for us (for now):

  title.downcase.gsub(/[(,?!'":.)]/, '').gsub(' ', '-').gsub(/-$/, '')

Also, since this is for SEO purposes only, I basically don’t use the
parameter in my code. For example, I have a /mycontroller/:id/:dummy/
in my routes.rb (dummy being the above book title).

Thanks.
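
To illustrate a slug that is present in the URL but ignored on lookup, here is a plain-Ruby sketch. The helper names book_path and extract_id are hypothetical and only mimic what a /mycontroller/:id/:dummy/ route does; this is not Rails routing code, and it assumes numeric ids.

```ruby
# Build a path whose last segment is purely decorative (hypothetical helper).
def book_path(id, slug)
  "/mycontroller/#{id}/#{slug}/"
end

# Recover only the id; the slug segment is matched but never used (hypothetical helper).
def extract_id(path)
  match = path.match(%r{\A/mycontroller/(\d+)/[^/]*/?\z})
  match && match[1].to_i
end

path = book_path(42, "on-one-flower-butterflies-ticks-and-a-few-more-icks")
puts path              # /mycontroller/42/on-one-flower-butterflies-ticks-and-a-few-more-icks/
puts extract_id(path)  # 42
```

A side effect of this design is that any slug resolves to the same record, so a link with a stale or misspelled title still works.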

On Oct 3, 2008, at 4:54 PM, Ben K. wrote:

Hi, guys. I ended up doing this for now and it works for us (for
now):

  title.downcase.gsub(/[(,?!'":.)]/, '').gsub(' ', '-').gsub(/-$/, '')

What about multiple dashes in the middle of the title?

For example, given:

Primetime Emmy Award for Outstanding Lead Actress - Miniseries or a
Movie

One would expect:

primetime-emmy-award-for-outstanding-lead-actress-miniseries-or-a-movie

Note the transition between ‘Actress’ and ‘Miniseries’.

Cheers,


PA.
http://alt.textdrive.com/nanoki/
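
One way to handle the case raised above is to replace every run of characters outside a-z and 0-9 with a single dash, then trim dashes from both ends. This is a sketch rather than a drop-in answer: slugify is a hypothetical helper name, and like the earlier versions it does not preserve non-ASCII letters.

```ruby
# Hypothetical helper: collapse each run of non-alphanumerics into one dash.
def slugify(title)
  title.downcase
       .gsub(/[^a-z0-9]+/, '-')   # " - ", ": " and ", " each become a single dash
       .gsub(/\A-+|-+\z/, '')     # drop leading and trailing dashes
end

puts slugify("Primetime Emmy Award for Outstanding Lead Actress - Miniseries or a Movie")
# primetime-emmy-award-for-outstanding-lead-actress-miniseries-or-a-movie
puts slugify("On One Flower: Butterflies, Ticks and a Few More Icks")
# on-one-flower-butterflies-ticks-and-a-few-more-icks
```

Because whole runs are replaced at once, there is no need for a separate pass to clean up double dashes.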