Using gsub to remove embedded newlines in HTML file

I have an HTML file that is in a string.

I want to use gsub! to recursively remove any embedded newlines and
whitespace within two known delimiters.

Given a string that includes this kind of string:

~^LNK:http://slashdot.org/login.pl?op=newuserform~
Create a new account
^~

I want to replace the above with:

~^LNK:http://slashdot.org/login.pl?op=newuserform~Create a new account^~

(stripping out the newlines and whitespace)

Having trouble writing the regex for this.

I think I want something like:

/~\^LNK:.*?([\s\r\n])+.*?\^~/

that I could use in:

str.gsub!(/~\^LNK:.*?([\s\r\n])+.*?\^~/, '')

to replace all of the whitespace and any newline characters with
empty strings.

But I don’t think this will work because I really need to loop within
each substring of my large HTML string. The thing about gsub is that it
will substitute the entire matched string.

Do I need to scan out the ~^LNK.*?^~, operate on those and then put them
back into the larger string?

I’m not sure I’m asking this very well, so I apologize if that’s the
case.

Thanks,
Wes

Something like:

@html.scan(/~\^LNK:.*?\^~/mi).each do |link_line|
  new_link_line = link_line.gsub(/[\s\r\n]/, '')
  @html.gsub!(/#{link_line}/mi, new_link_line)
end

This seems to work well:

@html.scan(/~\^LNK:.*?\^~/mi).each do |link_line|
  new_link_line = link_line.gsub(/[\t\r\n]/, '')
  @html.gsub!(/#{Regexp.escape(link_line)}/mi, new_link_line) if link_line != new_link_line
end

I wonder if I could have done this with one @html.gsub!() command, but
this is much more understandable to me anyway so I'll stick with this.

Thanks,
Wes

You can use a block with gsub:
@html.gsub!(/~\^LNK:.*?\^~/mi) { |s| s.gsub(/[\t\r\n]/, '') }

or something like that.
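
To make the block form concrete, here is a minimal self-contained sketch run against the sample from the original question. The method name collapse_link_markers is made up for illustration, and the pattern assumes the closing delimiter is always the literal ^~ and that only tabs, carriage returns and newlines should go (so the spaces inside the link text survive).

```ruby
# Hypothetical helper: the name is an assumption, the delimiters come
# from the sample in the original question.
def collapse_link_markers(html)
  # /m lets "." match newlines, so one match spans the whole marker.
  # The block receives each complete ~^LNK: ... ^~ match and returns
  # its replacement; text outside the markers is never touched.
  html.gsub(/~\^LNK:.*?\^~/m) { |link| link.gsub(/[\t\r\n]/, '') }
end

sample = "<p>Intro</p>\n" \
         "~^LNK:http://slashdot.org/login.pl?op=newuserform~\n" \
         "Create a new account\n" \
         "^~\n" \
         "<p>Outro</p>\n"

puts collapse_link_markers(sample)
```

Because the block only ever sees the matched substring, there is no need to scan first and substitute back afterwards, and no need for Regexp.escape.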

Good luck.

Thanks. That is the Ruby way to do it, and that’s what I wanted to
know :).

I’ve used blocks with gsub but I keep forgetting that I can put anything
in there - so far I’ve only used backrefs to pull out pieces of the
matching regex to rearrange things.

Wes
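
For what it's worth, backrefs and a block can be combined. The sketch below is only one possible variation, not something posted in the thread: it captures the URL and the link text separately and tidies just the text. The name tidy_link_markers is hypothetical, and it assumes the URL contains no whitespace and no ~ character.

```ruby
# Hypothetical helper: name and capture layout are assumptions.
def tidy_link_markers(html)
  html.gsub(/~\^LNK:(\S+?)~(.*?)\^~/m) do
    # Capture 1 is the URL, capture 2 is everything up to the closing ^~.
    url  = Regexp.last_match(1)
    # Trim the ends and squeeze internal runs of whitespace to one space.
    text = Regexp.last_match(2).strip.gsub(/\s+/, ' ')
    "~^LNK:#{url}~#{text}^~"
  end
end

puts tidy_link_markers("~^LNK:http://slashdot.org/login.pl?op=newuserform~\nCreate a new account\n^~")
```

Unlike stripping only [\t\r\n], this version also removes indentation around the link text, which may or may not be what you want.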
