On Fri, Jul 15, 2011 at 07:41:17PM +0900, Rousan M. wrote:
to
<message value="Validating element:"
I tried using gsub() with regex but so far haven’t been successful.
It seems to me you should make use of the non-greedy modifier, which is
?, placed after .* (so .*?) to indicate you want to match as few characters
as possible up to a particular matching string in this case. Why don’t you
share what you have for an attempt at a useful regex, then we can offer
modifications to yours rather than just providing a complete solution from
scratch?
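To make the greedy vs. non-greedy difference concrete, here is a minimal sketch. The input line is made up for illustration (the original poster's file is not shown), and the extra level attribute is only there so that the two patterns give different results.

```ruby
# Hypothetical input: two attributes on one line, so greedy and
# non-greedy matching give visibly different results.
line = '<message value="Validating element: foo" level="info"/>'

# Greedy: .* runs on to the LAST double quote on the line,
# so the level attribute is swallowed as well.
greedy = line.gsub(/value=".*"/, 'value="Validating element:"')

# Non-greedy: .*? stops at the FIRST double quote after value=".
lazy = line.gsub(/value=".*?"/, 'value="Validating element:"')

puts greedy
puts lazy
```

Only the second pattern leaves the rest of the tag alone, which is probably what is wanted here.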
This page offers some information about special characters in regexen:
an example of what you’ve tried would make things much better and
easier (no one wants to do your work for you!). i had a similar issue
dealing with parsing html pages, and i wound up writing a method for the
String class which replaces the text between one marker within a string
and another. it takes 3 or 4 arguments: 1st is a sub-string that is
the starting point marker, 2nd a sub-string that is the end point
marker, 3rd the new text to go between the markers, and an optional 4th
which makes the method global. it happens to work with your
example.
some things to maybe think about:
convert the first two arguments to Regexps. the =~ operator will
give you the index of the first match of your Regexp within the main
string… this can be very useful. i make a range between the index of the
first marker and the second (actually the index of the end of the first
and the beginning of the second, but you get the idea), and iterate
through each index of the string between them to create a new string to
be replaced, and then use #sub! (or #gsub! if global is true) to replace
it with the 3rd argument of the method.
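For anyone who wants something to start from, here is a rough sketch of a helper along those lines. It is not the method described above: the name replace_between and its signature are made up, it is a plain method rather than a String extension, and it uses String#index with a range assignment instead of =~ and #sub!. The markers are treated as literal sub-strings.

```ruby
# Hypothetical helper: replaces the text between start_marker and
# end_marker with replacement. Only the first pair of markers is handled
# unless global is true. Returns a new string; the input is not modified.
def replace_between(text, start_marker, end_marker, replacement, global = false)
  result = text.dup
  pos = 0
  loop do
    start_at = result.index(start_marker, pos)
    break unless start_at
    from = start_at + start_marker.length   # first index after the start marker
    to = result.index(end_marker, from)     # index where the end marker begins
    break unless to
    result[from...to] = replacement         # swap out only the text in between
    # Continue searching after the end marker that was just handled.
    pos = from + replacement.length + end_marker.length
    break unless global
  end
  result
end

sample = '<message>first</message><message>second</message>'
puts replace_between(sample, '<message>', '</message>', '')
puts replace_between(sample, '<message>', '</message>', '', true)
```

The first call only empties the first message element, the second empties both.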
i’m sure that other folks have come up with better ways to do this as
well… show us what you’re working with!
Thank you all for the response.
I will try to elaborate on this. All I want is to parse an xml file
which contains special characters. But my parser fails because it cannot
open the file correctly (because of the special characters). So I
decided to write the file contents to a new file by removing the special
characters and then parse it. My sample input file (input.xml) with
special characters is as follows:
But the above code replaces the “…” content altogether
and my output.xml file is:
My problem is solved as I am not using the message tag in my parser. But
ideally I want to remove only the content between the message tags
without removing the tags altogether. If anyone knows how to do it
(preferably in a single line) please share it with me.
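One possible single-line approach, as a sketch only: the sample input was not preserved in this thread, so the XML below is a made-up stand-in, and the pattern assumes the text sits between an opening <message ...> tag and a closing </message> tag. It will not cope with nested or self-closing message elements.

```ruby
# Hypothetical stand-in for input.xml; the real file is not shown in the thread.
xml = "<log>\n<message>Validating element: \u00A7\u00B6</message>\n<status>ok</status>\n</log>\n"

# Capture the opening and closing tags and put them back, dropping whatever
# sits between them. The /m flag lets . match newlines, in case a message
# spans several lines.
cleaned = xml.gsub(%r{(<message[^>]*>).*?(</message>)}m, '\1\2')

puts cleaned
```

The tags themselves survive because they are captured and written back via \1 and \2, and only the non-greedy middle part is discarded.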
This sample is not like your original example, which wasn’t even
valid XML. However, if you’re working with XML you shouldn’t be
wasting time with any regex-based approach. Use nokogiri, which
can parse the above example just fine, and with which you can
easily accomplish your goal.