Hi All,
I am trying to extract a list of elements that match a given regular
expression from a set of XML files. There is probably a way to do this
with an XML parsing library, but I thought it might be just as easy
with regular expressions.
My thought was to do the following:
Iterate through a set of files in a directory.
Search each file for a set of lines which match a given regular
expression.
Add the capture group in each match to an array.
Sort the array and remove any duplicate values.
Print the results.
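The steps above can be sketched roughly as follows. This is only a sketch under a few assumptions: the method name `font_families_in` and the `*.xml` glob pattern are hypothetical, and the files are assumed to be readable as plain ASCII or UTF-8 text. It uses `String#scan` so that every match in a file is collected, not just the first one.

```ruby
# Pattern for the element of interest; the capture group is the font name.
FONT_FAMILY = /<Font-family codeSet="\w*" fontId="\d*">(\w*)<\/Font-family>/

# Hypothetical helper: returns the sorted, de-duplicated font names found
# in all *.xml files of the given directory.
def font_families_in(dir)
  names = []
  # Iterate through the files in the directory.
  Dir.glob(File.join(dir, "*.xml")).each do |path|
    # scan returns one array of captures per match; flatten them into names.
    names.concat(File.read(path).scan(FONT_FAMILY).flatten)
  end
  # Sort the array and remove duplicate values.
  names.sort.uniq
end

# Print the results (the directory below is a placeholder):
# puts font_families_in("/path/to/xml/files")
```

Reading each file whole rather than line by line is a design choice here; it also lets a match span a line break if the pattern is later changed to allow that.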
Here are the steps I have tried in building my script:
First, I tested to make sure my regular expression actually matched
against the pattern I was seeking. This seemed to work as expected.
regexp = Regexp.new(/<Font-family codeSet="\w*" fontId="\d*">(\w*)<\/Font-family>/m)
string = %q(Helvetica</Font-family>)
if string =~ regexp
  puts "yes, there is a match. #{$1}"
end
Returns >> yes, there is a match. Helvetica
Then, I tested a different method which would add the matches to an
array. This also seemed to work as expected.
regexp = Regexp.new(/<Font-family codeSet="\w*" fontId="\d*">(\w*)<\/Font-family>/m)
string = %q(Helvetica</Font-family>)
a = regexp.match(string)
puts a[1]
Returns >> Helvetica
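Note that `Regexp#match` returns only the first match. As a small sketch of collecting every capture into an array, `String#scan` could be used instead. The sample string and its attribute values below are invented for illustration.

```ruby
regexp = /<Font-family codeSet="\w*" fontId="\d*">(\w*)<\/Font-family>/

# Made-up sample input: three elements, one of them repeated.
sample = [
  '<Font-family codeSet="Roman" fontId="0">Helvetica</Font-family>',
  '<Font-family codeSet="Roman" fontId="1">Arial</Font-family>',
  '<Font-family codeSet="Roman" fontId="0">Helvetica</Font-family>'
].join("\n")

# match stops at the first hit.
first = regexp.match(sample)[1]

# scan returns one array of captures per match; flatten gives a flat list.
all_names = sample.scan(regexp).flatten
unique_names = all_names.sort.uniq

puts first                # prints: Helvetica
puts unique_names.inspect # prints: ["Arial", "Helvetica"]
```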
Next, I tested opening a file and returning all lines. This seemed to
work as well.
file = File.new('/Users/donlevan/Desktop/DDRs/Apple Dealer Price List.xml')
file.each do |line|
  puts line
end
Returns >> <?xml version="1.0" encoding="UTF-16"?>
… end of file
Where I am getting stuck is in the next code fragment, in which I
test each line to see if there is a match. There should be one, since
the string I used above for testing was pulled directly from one line
of the file. Unfortunately, I get a warning and no matches.
regexp = Regexp.new(/<Font-family codeSet="\w*" fontId="\d*">(\w*)<\/Font-family>/m)
file = File.new('/Users/donlevan/Desktop/DDRs/Apple Dealer Price List.xml')
file.each do |string|
  if string =~ regexp
    puts "yes, there is a match. #{$1}"
  end
end
Returns >>
RubyMate r6354 running Ruby r1.8.6 (/usr/local/bin/ruby)
untitled
/Users/donlevan/Library/Application Support/TextMate/Support/lib/scriptmate.rb:29: warning: Insecure world writable dir /Users/donlevan/Library/Application Support in PATH, mode 040706
Program exited.
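One thing that may be worth checking, though it is not confirmed here, is the encoding. The first line of the file declares encoding="UTF-16", and a pattern written in ASCII will not match raw UTF-16 bytes, where every ASCII character is accompanied by a zero byte. Below is a minimal sketch for a newer Ruby (2.0 or later, so not the 1.8.6 shown above), assuming the file really is UTF-16. The helper names `read_utf16_as_utf8` and `each_font_family` are hypothetical.

```ruby
regexp = /<Font-family codeSet="\w*" fontId="\d*">(\w*)<\/Font-family>/

# Hypothetical helper: reads a UTF-16 file and returns its text as UTF-8.
# Uses the byte order mark when present; otherwise assumes little-endian.
def read_utf16_as_utf8(path)
  bytes = File.binread(path)
  if bytes.start_with?("\xFE\xFF".b)
    source = "UTF-16BE"
    bytes = bytes.byteslice(2..-1)
  elsif bytes.start_with?("\xFF\xFE".b)
    source = "UTF-16LE"
    bytes = bytes.byteslice(2..-1)
  else
    source = "UTF-16LE" # assumption when no byte order mark is found
  end
  bytes.force_encoding(source).encode("UTF-8")
end

# Hypothetical helper: yields the captured name for each matching line.
def each_font_family(path, regexp)
  read_utf16_as_utf8(path).each_line do |line|
    match = regexp.match(line)
    yield match[1] if match
  end
end

# Usage (the path is a placeholder):
# each_font_family("/path/to/file.xml", regexp) do |name|
#   puts "yes, there is a match. #{name}"
# end
```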
I would be grateful for any assistance. Thanks so much.
Don L.
Brooklyn, New York