rpheath wrote:
Thanks for the reply. I'm relatively new to regular expressions, and I
misinterpreted the ^s and $s. I thought each one applied only to its own
check, so the pattern meant either the first string or ("|") the second
string.
Here’s sample text that would be passed into it.
This is the first sentence. Now I'll post a code snippet:
<pre>
def strip_blocks(text)
  text.gsub([regex],'')
end
</pre>
This is another sentence before the block quote.
<blockquote>
This is a quote
</blockquote>
This is one more sentence
----------------------
What I would like to have left is this:
This is the first sentence. Now I'll post a code snippet:
This is another sentence before the block quote.
This is one more sentence
----------------------
Hopefully that helps. Sorry the question is a bit disorganized and
basic, but I'm new to this. Thanks again for any help.
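Try this. It uses the non-greedy quantifier '*?' together with the 'i'
(case-insensitive) and 'm' options; in Ruby, 'm' makes '.' match newlines
as well, so a block can span several lines. Without the non-greedy '?',
'.*' would gobble up everything from the first opening tag to the last
closing tag of that name, including any ordinary text between two
separate blocks. This is probably not what you would want.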
# Remove every <tag>...</tag> block (lazy match, case-insensitive, spans newlines).
def remove_tag_block(tag, text)
  text.gsub(/<#{tag}>.*?<\/#{tag}>/im, '')
end
irb(main):054:0> text
=> "This is the first sentence. Now I'll post a code snippet:\n\n<pre>\ndef strip_blocks(text)\ntext.gsub([regex],'')\nend\n</pre>\n\nThis is another sentence before the block quote.\n\n<blockquote>\n This is a quote\n</blockquote>\n\nThis is one more sentence"
irb(main):055:0> t=remove_tag_block("pre", text)
=> "This is the first sentence. Now I'll post a code snippet:\n\n\n\nThis is another sentence before the block quote.\n\n<blockquote>\n This is a quote\n</blockquote>\n\nThis is one more sentence"
irb(main):056:0> remove_tag_block("blockquote", t)
=> "This is the first sentence. Now I'll post a code snippet:\n\n\n\nThis is another sentence before the block quote.\n\n\n\nThis is one more sentence"
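To see what the non-greedy '?' in remove_tag_block buys you, here is a small sketch that runs a greedy and a non-greedy version of the same pattern over a made-up string containing two separate <pre> blocks (the sample string is invented for illustration):

```ruby
# Made-up input with two separate <pre> blocks and text between them.
html = "<p>one</p><pre>a</pre><p>two</p><pre>b</pre><p>three</p>"

# Greedy: .* runs on to the LAST </pre>, so "<p>two</p>" is lost too.
greedy = html.gsub(/<pre>.*<\/pre>/im, '')

# Non-greedy: .*? stops at the FIRST </pre> after each <pre>.
non_greedy = html.gsub(/<pre>.*?<\/pre>/im, '')

puts greedy      # prints <p>one</p><p>three</p>
puts non_greedy  # prints <p>one</p><p>two</p><p>three</p>
```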
The problem is that this won’t work with nested tags, e.g.
irb(main):065:0>
x="
"
=> "
"
irb(main):066:0> remove_tag_block("table", x)
=> ""
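To make the nested-tag problem concrete, here is a sketch with made-up nested table markup run through the same remove_tag_block function. The lazy '.*?' stops at the first '</table>', which closes the inner table, so the outer table's closing tags are left behind:

```ruby
# Same function as above.
def remove_tag_block(tag, text)
  text.gsub(/<#{tag}>.*?<\/#{tag}>/im, '')
end

# Hypothetical nested markup: an inner <table> inside an outer one.
nested = "<table><tr><td><table><tr><td>x</td></tr></table></td></tr></table>after"

# The match runs from the outer <table> to the INNER </table> only.
result = remove_tag_block("table", nested)
puts result  # prints </td></tr></table>after
```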
This is because plain regular expressions can't match arbitrarily deep
nested pairs, such as "((()(())()))". I think I read somewhere that
regexps can't count. You need recursive regular expressions, which Perl
and PCRE (Perl Compatible Regular Expressions) support, but AFAIK the
current Ruby regexp engine does not. Maybe Oniguruma has them; I dunno.
I saw a PCRE extension for Ruby somewhere, but I don't know anything
about it.
The Perl RE for matching nested parentheses is apparently as follows
(from "The Joy of Regular Expressions" [1] on SitePoint):
\(((?>[^()]+)|(?R))*\)
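For comparison, Ruby 1.9 and later (whose regexp engine is Oniguruma, and Onigmo from Ruby 2.0) support subexpression calls written \g<name>, which allow the same kind of recursion as Perl's (?R). A minimal sketch of the parenthesis matcher in that syntax, assuming Ruby 1.9+; BALANCED_PARENS and balanced_paren_groups are names made up for this example:

```ruby
# Requires Ruby 1.9+ (Oniguruma/Onigmo engine).
# (?<paren>...) names the group and \g<paren> calls it again from inside
# itself, so the pattern can follow arbitrarily deep nesting.
# (?>[^()]+) is an atomic run of non-parenthesis characters, as in the Perl version.
BALANCED_PARENS = /(?<paren>\((?:(?>[^()]+)|\g<paren>)*\))/

# Returns each top-level balanced "(...)" group found in str.
def balanced_paren_groups(str)
  # With one capture group, scan yields [[match], [match], ...]; flatten it.
  str.scan(BALANCED_PARENS).flatten
end

p balanced_paren_groups("a((()(())()))b (x) (y")
# prints ["((()(())()))", "(x)"]  (the unbalanced "(y" is skipped)
```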
I believe that to do this correctly without PCRE, you have to resort to
some text parsing or use a SAX parser or similar. Maybe some Ruby guru
(i.e. not me) will be able to pull out an RE or some easy way to do
this.
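One way to do the "text parsing" route without any recursive regexp is to scan for the opening and closing tags and keep a nesting-depth counter, which is exactly the counting a plain regexp can't do. This is only a sketch using Ruby's standard strscan library; remove_nested_tag_block is a hypothetical helper, and it assumes bare tags like <table> and </table> (no attributes), keeps stray closing tags, and drops everything after an opening tag that is never closed:

```ruby
require 'strscan'

# Hypothetical helper: removes <tag>...</tag> blocks, including nested
# blocks of the same tag, by counting depth. Assumes bare tags only.
def remove_nested_tag_block(tag, text)
  open_re  = /<#{Regexp.escape(tag)}>/i
  close_re = /<\/#{Regexp.escape(tag)}>/i
  either   = Regexp.union(open_re, close_re)

  scanner = StringScanner.new(text)
  parts   = []   # pieces of text that survive
  depth   = 0    # how many blocks we are currently inside

  until scanner.eos?
    # Everything up to and including the next open or close tag.
    chunk = scanner.scan_until(either)
    if chunk.nil?
      # No more tags: keep the tail only if we are outside every block.
      parts << scanner.rest if depth.zero?
      break
    end

    tag_text = scanner.matched
    parts << chunk[0, chunk.length - tag_text.length] if depth.zero?

    if tag_text =~ open_re
      depth += 1
    elsif depth > 0
      depth -= 1
    else
      parts << tag_text # stray closing tag outside any block: keep it
    end
  end

  parts.join
end

puts remove_nested_tag_block("table", "before<table><tr><td><table>x</table></td></tr></table>after")
# prints beforeafter
```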