Hi folks,
I’m trying to write a Ruby script that selects the content of an HTML
table in an HTML page.
I used Rubular to test my regexp, which is
/ <td class="TabIntCenContenuto"[^>]*>(.*) /
With Rubular, the result of my expression is:
Result 1
- 12345678
Result 2
- SAN FRANCESCO DA PAOLA
Result 3
- Via San Francesco Da Paola, 10
Result 4
- 10123
Result 5
- TORINO
etc…
But with my script:
File.open('D:/testt/1.txt', 'r') do |filein|
  while line = filein.gets
    p line if line =~ /<td class="TabIntCenContenuto"[^>]*>/ ... line =~ //A /
  end
  fileout.puts p
end
end
I got this result:
"<td class="TabIntCenContenuto">12345678 \n"
"<td class="TabIntCenContenuto">SAN FRANCESCO DA PAOLA </td>\n"
"<td class="TabIntCenContenuto">Via San Francesco Da Paola, 10 \n"
"<td class="TabIntCenContenuto">10123 \n"
"<td class="TabIntCenContenuto" align="left">TORINO \n"
I thought the ... between the two "line =~" tests worked like (...) in
Rubular, which captures the content?
Moreover, I would like to transform this HTML into XML, but I can’t
figure out how to turn these HTML lines into XML.
12345678
But the cells have no 'name' attribute or anything similar, so doing a
match/replace would be difficult?
…
So, if someone can help me I would be very grateful.
Nice day
Any specific reason you can’t use hpricot or other HTML parsers?
Jayanth
I’ll second that. Hpricot is really quite remarkable. It’ll almost
certainly save you days and days of pain. Unless you are doing this
for fun / learning, of course.
On Fri, Jul 25, 2008 at 8:44 AM, Srijayanth S.
[email protected] wrote:
On Fri, Jul 25, 2008 at 8:44 AM, Srijayanth S.
[email protected] wrote:
Any specific reason you can’t use hpricot or other HTML parsers?
I didn’t know about this tool for Ruby. I once used a parser named tidy,
but that’s all. I’ll try it now and let you know.
On Jul 25, 9:58 am, Sebastian H. [email protected]
wrote:
So as a summary: It doesn’t do what you thought it did. As a matter of fact it
doesn’t do anything sane. So just keep as far away from it as possible.
So I was wrong. Thank you for explaining what was wrong with my use
of the loop.
[email protected] wrote:
I thought the ... between the two "line =~" tests worked like (...) in
Rubular, which captures the content?
Generally in Ruby, ... denotes a range, like starting_value...end_value.
In this case, though, it denotes a flip-flop, which is evil and should never
ever be used because it makes my head hurt. Here’s what it does, though:
some_loop {
  do_something if foo...bar
}
This will do nothing until foo is true. When foo is true it will do_something.
It will then keep doing_something in every iteration of the loop until bar
becomes true. After bar has become true it will stop doing_something until
foo is true again.
So as a summary: It doesn’t do what you thought it did. As a matter of fact it
doesn’t do anything sane. So just keep as far away from it as possible.
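To make that behaviour concrete, here is a minimal sketch with made-up input. The START and END markers and the variable names are hypothetical and not taken from the original script; the point is only to show which iterations the flip-flop lets through.

```ruby
# Hypothetical input: the flip-flop switches on at "START" and off at "END".
lines = ["a", "START", "b", "c", "END", "d"]

selected = []
lines.each do |line|
  # In a condition, `x...y` is a flip-flop, not a Range:
  # false until x is true, then true up to and including the
  # iteration where y becomes true.
  selected << line if (line == "START")...(line == "END")
end

p selected  # => ["START", "b", "c", "END"]
```

Nothing is captured here: the operator only decides whether the statement runs on a given pass through the loop.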
HTH,
Sebastian
From: [email protected] [mailto:[email protected]]
I’m trying to write a Ruby script that selects the content of an HTML
table in an HTML page.
I used Rubular to test my regexp, which is
/ <td class="TabIntCenContenuto"[^>]*>(.*) /
the re is fine, you can use that
With Rubular, the result of my expression is:
Result 1
1. 12345678
Result 2
1. SAN FRANCESCO DA PAOLA
Result 3
1. Via San Francesco Da Paola, 10
Result 4
1. 10123
Result 5
1. TORINO
etc…
But with my script:
File.open('D:/testt/1.txt', 'r') do |filein|
  while line = filein.gets
    p line if line =~ /<td class="TabIntCenContenuto"[^>]*>/ ... line =~ //A /
  end
  fileout.puts p
end
end
I got this result:
"<td class="TabIntCenContenuto">12345678 \n"
"<td class="TabIntCenContenuto">SAN FRANCESCO DA PAOLA </td>\n"
"<td class="TabIntCenContenuto">Via San Francesco Da Paola, 10 \n"
"<td class="TabIntCenContenuto">10123 \n"
"<td class="TabIntCenContenuto" align="left">TORINO \n"
you already got the match, but you did not capture anything.
sample code & run:
botp@botp-desktop:~$ cat test.rb
File.open('test.txt') do |f|
  while line = f.gets
    # $1 holds whatever the first (...) group matched
    if line =~ /<td class="TabIntCenContenuto"[^>]*>(.*) /
      p $1
    end
  end
end
botp@botp-desktop:~$ ruby test.rb
"12345678"
"SAN FRANCESCO DA PAOLA"
"Via San Francesco Da Paola,10"
"10123"
"TORINO"
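The same idea as a self-contained sketch that runs without test.txt. The sample rows are made up, and the closing `</td>` plus the `\s*` in front of it are assumptions about how the real page ends each cell, so adjust the pattern to the actual HTML.

```ruby
# Hypothetical sample rows standing in for the real HTML file.
html = <<~HTML
  <td class="TabIntCenContenuto">12345678 </td>
  <td class="TabIntCenContenuto">SAN FRANCESCO DA PAOLA </td>
  <td class="TabIntCenContenuto" align="left">TORINO </td>
HTML

# Assumption: every cell opens and closes on the same line.
CELL = /<td class="TabIntCenContenuto"[^>]*>(.*?)\s*<\/td>/

# String#[] with a regexp and a group number returns that capture, or nil.
cells = html.each_line.map { |line| line[CELL, 1] }.compact

p cells  # => ["12345678", "SAN FRANCESCO DA PAOLA", "TORINO"]
```

If cells can span several lines, a line-by-line regexp like this stops working, which is where an HTML parser earns its keep.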
I thought the ... between the two "line =~" tests worked like (...) in
Rubular, which captures the content?
you are making it harder. keep it simple.
Moreover, I would like to transform this HTML into XML, but I can’t
figure out how to turn these HTML lines into XML.
12345678
But the cells have no 'name' attribute or anything similar, so doing a
match/replace would be difficult?
if the html is nicely formatted, you can loop through the table.
if you want to be sure, try outputting all the data you can capture
first. Then output that again with xml tags inserted.
do not worry. xml, like html, is just text with tags. Manipulating text is
a good learning exercise for ruby.
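a rough sketch of that second step, assuming the captured values arrive five at a time in a fixed order. The tag names (record, code, name, address, zip, city) are invented for illustration, since the source HTML gives the cells no names.

```ruby
# Hypothetical tag names, one per captured cell, in order of appearance.
FIELDS = %w[code name address zip city]

# Escape the characters that are not allowed as-is in XML text.
def xml_escape(text)
  text.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

# Wrap each value in its tag and the whole row in a <record> element.
def to_xml(values)
  body = FIELDS.zip(values).map do |tag, value|
    "  <#{tag}>#{xml_escape(value)}</#{tag}>"
  end
  ["<record>", *body, "</record>"].join("\n")
end

row = ["12345678", "SAN FRANCESCO DA PAOLA",
       "Via San Francesco Da Paola, 10", "10123", "TORINO"]
puts to_xml(row)
```

this relies purely on the order of the cells, so it only holds up if every row of the table has the same columns.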
kind regards -botp