Reg Expression Help Please?

Hi all

I’m having a bit of trouble trying to achieve something; maybe someone
can help?

I have a model Article, which has an attribute ‘body’

The body is a text column in which people can add text and HTML.

I’d like to edit certain properties of some of the HTML tags, for
example, converting all spaces inside <code> tags to &nbsp;

I presume a regular expression is the best way to achieve this. I’m
just not sure how to write an expression that scans only the characters
within the HTML tags.

Any ideas?

Ta

Gavin

You don’t want to fall into the rat hole of parsing HTML with
regexes. You need a parsing library like hpricot or similar.

http://wiki.github.com/why/hpricot

good luck!
Tim

Use a CSS rule to style it?

code { white-space: pre; }

or:

code { white-space: nowrap; }

You’re just asking for a headache if you try to match paired and
possibly nested HTML tags with a Regexp. (Not that it can’t be done,
but it gets ugly fast and you need a very capable regular expression
engine like Oniguruma from Ruby 1.9.)
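
For the simple, non-nested case, a rough sketch might look like the following. It assumes the spaces to convert sit inside plain <code>...</code> pairs with no attributes and no nesting; the method name nbsp_in_code and the sample text are made up for illustration, and anything more complicated is where a real parser earns its keep.

```ruby
# Replace each space inside <code>...</code> with &nbsp;.
# Assumption: tags are well-formed, have no attributes, and are never nested.
def nbsp_in_code(html)
  html.gsub(%r{<code>(.*?)</code>}m) do
    inner = Regexp.last_match(1)
    "<code>#{inner.gsub(' ', '&nbsp;')}</code>"
  end
end

# Hypothetical sample body, not taken from a real article.
body = "Some text\n<code>some code here</code>\nMore text"
puts nbsp_in_code(body)
```

Text outside the <code> pair is left untouched; only the captured inner text is rewritten.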

-Rob

http://www.w3.org/TR/CSS2/text.html#white-space-prop

On Apr 29, 2009, at 2:09 PM, Gavin wrote:

I’d like to edit certain properties of some of the HTML tags, for
Gavin

Rob B. http://agileconsultingllc.com

The body is a text column in which people can add text and HTML.

If this is the use case, then you want to protect what the user
enters.

Using RedCloth or other markup library might do a better job of what
you are trying to achieve.

I’d like to edit certain properties of some of the HTML tags, for
example, converting all spaces inside <code> tags to &nbsp;

Take a look at CGI#escape and CGI#escapeElement
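
As a rough illustration of the difference, here is a small sketch using Ruby's standard cgi library. CGI.escapeHTML escapes all markup, while CGI.escapeElement escapes only the elements you name (CGI.escape itself is for URL-encoding). The sample string and the choice of script as the element to neutralise are assumptions for the example, not something from the original question.

```ruby
require 'cgi'

# Hypothetical user input containing one harmless and one unwanted tag.
body = '<p>Hello</p><script>alert("x")</script>'

# Escape every tag, so the browser shows the markup as literal text.
escaped_all = CGI.escapeHTML(body)

# Escape only the named elements and leave the rest of the HTML alone.
escaped_some = CGI.escapeElement(body, 'script')

puts escaped_all
puts escaped_some
```

Note that escapeElement neutralises the named tags rather than removing them, so the escaped text still appears in the page.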

I was planning on editing the text blob before it’s saved to the
database.

So a blob like:
“This is a big blob of text
<code>This is the code part</code>
This is another line”

would be converted to:
“This is a big blob of text
<code>This&nbsp;is&nbsp;the&nbsp;code&nbsp;part</code>
This is another line”

And then called back as <%= @article.body %> in the view.

I thought that was a simple option.

Would hpricot be appropriate here?

I should also add that I plan on having a safe-list of tags, so any
potentially harmful tags would be removed.

Does that clarify at all?

Thanks

I had planned on formatting the code tags with CSS as you suggested,
Rob, but I also need to wrap specific words in spans to specify their
colour.

Actually…

Just found this => http://coderay.rubychan.de/

looks perfect for my needs

Thanks for your suggestions guys

Hi all

Managed to solve this issue with CodeRay

The site is now up and running. Here is a quick tutorial on how I did
it, in case anybody else wants to do the same:

:)