gsub stretching over several lines

jandot · August 5, 2007, 5:23am

Hi,

I am trying to replace a long (multiple-line) string:

string = string.gsub(/<h3
class="field-label>audience</h3>
<div class="field-items>
<div class="field-item>/, '')

It somehow doesn’t seem to work.

I would like to know how to use a wildcard like ‘*’, ‘%’ or ‘…’ like
below:

string = string.gsub(/<h … tem>/, '')

Thanks!
Ask

jandot · August 5, 2007, 9:41am

string = string.gsub(/<h.*tem>/, '')

.*

The . regexp wildcard means any single character (space, symbol, letter); by default it does not match a newline.

The * regexp operator says: whatever comes right before me can appear 0 or more times.

I.e., you get whatever character you want, however many times you want it, between the ‘<h’ string and the ‘tem>’ string.
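
A minimal sketch of the two pieces described above, on a made-up one-line string. The constant SINGLE_LINE and the helper strip_span are illustrative names only, not anything from the original post:

```ruby
# Hypothetical sample text; the names here are for illustration only.
SINGLE_LINE = 'keep <h3 class="x">item> keep'

# "." matches any single character except a newline (by default),
# and "*" repeats the preceding element zero or more times.
def strip_span(text)
  text.gsub(/<h.*tem>/, '')
end

puts strip_span(SINGLE_LINE)     # everything from "<h" to "tem>" is removed
puts strip_span("a <h\ntem> b")  # unchanged: "." does not cross the newline
```

The second call shows why the pattern on its own is not enough once the text spans several lines.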
hth

happy sunday btw

jandot · August 5, 2007, 10:53am

On 5 Aug 2007, at 16:41, Shai R. wrote:

string = string.gsub(/<h.*tem>/, '')

.* won’t match over multiple lines without the m modifier on the
RegExp, which I think is the OP’s problem:

irb(main):021:0> string = "Hi\nJan\nAsk"
=> "Hi\nJan\nAsk"
irb(main):022:0> string.gsub(/Hi.*Ask/,'Hi Jane Ask')
=> "Hi\nJan\nAsk"
irb(main):023:0> string.gsub(/Hi.*Ask/m,'Hi Jane Ask')
=> "Hi Jane Ask"
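
One thing to watch once the m modifier is on: .* is greedy, so a single match can run from the first opening tag to the very last closing tag and take the wanted values with it. A rough sketch of the difference, using a made-up input (LABEL_SAMPLE, strip_greedy and strip_lazy are hypothetical names):

```ruby
# Hypothetical input: two label blocks, each followed by a value to keep.
LABEL_SAMPLE = <<~TEXT
  <h3
  class="field-label">audience</h3>
  <div class="field-item">Public</div>
  <h3
  class="field-label">creator</h3>
  <div class="field-item">Tom Jones</div>
TEXT

# Greedy: one match from the first "<h3" to the last "</h3>".
def strip_greedy(text)
  text.gsub(/<h3.*<\/h3>\n/m, '')
end

# Non-greedy (.*?): each "<h3 ... </h3>" block is matched on its own.
def strip_lazy(text)
  text.gsub(/<h3.*?<\/h3>\n/m, '')
end

puts strip_greedy(LABEL_SAMPLE)  # only the "Tom Jones" line survives
puts strip_lazy(LABEL_SAMPLE)    # both field-item lines survive
```

So for stripping repeated blocks, .*? together with /m is usually the safer combination.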

Alex G.

Bioinformatics Center
Kyoto University

jandot · August 5, 2007, 11:08am

Alex G. wrote:

.* won’t match over multiple lines without the m modifier on the
RegExp, which I think is the OP’s problem:

That can’t be the OP’s problem since the OP doesn’t actually use .* in his regexp (or any other kind of wildcard). He was asking how to use wildcards so he could simplify his regexp (and make it work).
To know why his original regexp didn’t work, we’d have to see the string it’s supposed to match, I suppose.

jandot · August 6, 2007, 3:12am

Alex & Sebastian,

Thanks for taking the time to reply. The string.gsub(/start.*end/m,
'some_value') did indeed help, but I am afraid my problem is a bit more
complicated.

I am basically trying to clean up a long xml file. A typical part of the
string looks like this:

<div class="field field-type-text field-field-audience">

<h3 class="field-label">audience</h3>

      &lt;div class=&quot;field-item&quot;&gt;Public&lt;/div&gt;

  &lt;/div&gt;

</div>

<div class="field field-type-text field-field-creator">
<h3 class="field-label">creator</h3>
<div class="field-items">
<div class="field-item">Tom Jones</div>
</div>
</div>

I am trying to format it like this:
<audience>Public</audience>
<creator>Tom Jones</creator>

So the problem is that the values in the xml change throughout the
string, so I cannot do a pattern match for them directly. Any ideas
would be hugely appreciated!

Jan

jandot · August 6, 2007, 4:20am

Alex G. wrote:

Without knowing the whole problem it is difficult to say what the
best solution is, but for the string you post above, I would clean it
up and parse with something like Hpricot:

Thanks, I will have a try.

By the way, I see you are in Kyoto. I am studying at Tsukuba University
(about an hour from Tokyo), so if you come to the big city, I owe you a
beer!

jandot · August 6, 2007, 4:09am

Without knowing the whole problem it is difficult to say what the
best solution is, but for the string you post above, I would clean it
up and parse with something like Hpricot:

require 'rubygems'
require 'hpricot'

string = DATA.read # Read in the string that follows __END__

string.gsub!(/&lt;/, '<')   # Convert lt and gt entities to real <>
string.gsub!(/&gt;/, '>')
string.gsub!(/&quot;/, '"') # Put in quotes

doc = Hpricot(string) # Parse with Hpricot

fields = ['audience', 'creator'] # Create array of 'fields' to extract

fields.each do |f| # For each field...
  # ...find appropriate divs
  el = doc.search("//div[@class='field field-type-text field-field-#{f}']")
  el.each do |e| # For each field div...
    # ...print data
    puts "<#{f}>" + e.at("//div[@class='field-item']").inner_html + "</#{f}>"
  end
end

__END__
<div class="field field-type-text field-field-audience">

<h3 class="field-label">audience</h3>

       &lt;div class=&quot;field-item&quot;&gt;Public&lt;/div&gt;

   &lt;/div&gt;

</div>

<div class="field field-type-text field-field-creator">
<h3 class="field-label">creator</h3>
<div class="field-items">
<div class="field-item">Tom Jones</div>
</div>
</div>
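
If installing a parser is not an option, something along the same lines can be sketched with plain gsub and scan. This is only a rough alternative, assuming the markup always has the h3 label followed by a field-item div as in the sample above. The names ENTITY_MAP, extract_fields and SAMPLE_MARKUP are made up for the example, and a regexp like this will break on markup that is nested differently:

```ruby
# Hypothetical entity table covering only the three entities seen in the sample.
ENTITY_MAP = { '&lt;' => '<', '&gt;' => '>', '&quot;' => '"' }.freeze

# Hypothetical helper: returns [label, value] pairs found in the markup.
def extract_fields(markup)
  clean = markup.gsub(/&(?:lt|gt|quot);/, ENTITY_MAP) # undo the escaping
  # Label inside the h3, then (lazily, across lines) the next field-item div.
  pattern = /<h3 class="field-label">(.*?)<\/h3>.*?<div class="field-item">(.*?)<\/div>/m
  clean.scan(pattern)
end

SAMPLE_MARKUP = <<~'TEXT'
  <div class="field field-type-text field-field-audience">
  <h3 class="field-label">audience</h3>
        &lt;div class=&quot;field-item&quot;&gt;Public&lt;/div&gt;
    &lt;/div&gt;
  </div>
  <div class="field field-type-text field-field-creator">
  <h3 class="field-label">creator</h3>
  <div class="field-items">
  <div class="field-item">Tom Jones</div>
  </div>
  </div>
TEXT

extract_fields(SAMPLE_MARKUP).each do |field, value|
  puts "<#{field}>#{value}</#{field}>"
end
```

The field names come from the h3 labels here rather than from a fixed list, so new fields are picked up without editing the script.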

Alex G.

Bioinformatics Center
Kyoto University