Hi Carlos,
Carlos wrote:
/(?:(1).*(3)|(1)).*(5)/
(The ‘1’ will come either on the first or third array position, you’ll have
to take care of that.)
Yes, you understand exactly what my problem is.
Actually, I had guessed as much, even though I couldn’t explain it as well
as you did.
The solution I found was using 2 regexes.
First, I try to find a match assuming “three” is there.
If it fails, I try to find a match without “three”.
This solved my problem.
But I wanted to know whether there’s a one-shot solution.
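For reference, here is a minimal sketch of two possible one-shot forms on a toy string. The sample strings, the variable names and the `captures` helper are made up for illustration. The first pattern is the alternation Carlos suggested; the second moves the lazy skip inside an optional group, which seems to avoid the second pass as long as “three” always comes before “five”.

```ruby
# Hypothetical sample strings: one with the optional "three", one without.
with_three    = "one two three four five"
without_three = "one two four five"

# Alternation: the branch containing "three" is tried first, then the fallback.
# "one" lands in group 1 or group 3, so the nils are dropped with compact.
alternation = /(?:(one).*(three)|(one)).*(five)/

# Optional group: the lazy skip lives inside the optional group, so at each
# position the engine tries "three" before it tries "five".
optional = /(one).*?(?:(three).*?)?(five)/

# Small helper (hypothetical): return the capture array, or nil if no match.
def captures(regex, text)
  m = regex.match(text)
  m && m.captures
end

p captures(alternation, with_three).compact
p captures(alternation, without_three).compact
p captures(optional, with_three)
p captures(optional, without_three)
```

With the optional-group form the captures keep fixed positions, and the missing item simply shows up as nil.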
This is the actual problem, just in case someone wants to know.
html = <<END
<h5>2004 Used</h5>
<a name="210819526" href="210819526.html">BMW 325Ci
Coupe</a><br />
</h5></td>
<td class="mileage">
<span class="body20">38,604<br /></span><span
class="body30">Mileage</span>
$24,995
<br />
</span>
<span class="body30">Price</span>
</td>
<td class="distanceFromZip">
<div class="zip">
<span class="body20">0 mi<br /></span><span
class="body30">from ZIP
<td class="productTileCell" rowspan="2" valign="top">
<div class="srlProductContainer">
</div>
</td>
<a href=210819526.html><img
src="http://images.autotrader.com/images/2006/10/16/210/819/1092478286.210819526.IM1.MAIN.60x45_A.60x45.jpg"
border="0" bordercolor="#000000" width="60" height="45">
<div class="body40" style="padding-bottom:3px">
<img
src="http://www.autotrader.com/img/fyc/icn_camera_17x17.gif"
alt="Actual Photo Available" width="17" height="17" border="0"
/> 9 Photos
<img src="http://www.autotrader.com/img/blank_dot.gif"
width="60" height="1" />
<p class="color body20">Color - Mystic Blue
Metallic</p>
<p class="description">Dark Blue/Beige, Premium Pkg,
Xenon Light, Single Compact Disc, Dual Power Seats, Memory Seat, Still
under Free BMW Maintenance and 4yr/50k Factory…
<p class="vin">VIN WBABV13454JT20104</p>
<div class="body40" style="padding-top:5px;"><a
name="210819526" href="210819526.html">View Car Details
</div></td>
<td> </td>
<td valign="top" class="right body30">
<p class="dealername">
<a name="210819526" href="210819526.html">null</a>
<br />
</p>
<br />
</td>
END
def parse_row(row)
  # Part of the pattern shared by both attempts: year, link, title, mileage, price.
  head = %r{
    .+?(\d{4})\sUsed</h5>.+?
    <a\sname="\d+"\s+href="(\d+\.html)">(.+?)</a><br\s/>.+?</h5>.+?
    <span\sclass="body20">([0-9,]+)<br\s/></span>
    <span\s+class="body30">Mileage</span>.+?(\$[0-9,]+)
  }mx
  image = %r{(http://[^"]+?\.jpg)}
  tail  = %r{.+?Color\s-\s(.+?)</p>}m

  # First try: assume the image URL is there.
  m = row.scan(/#{head}.+?#{image}#{tail}/m)
  # If that fails, try again with the image group made optional.
  m = row.scan(/#{head}.+?#{image}?#{tail}/m) if m[0].nil?
  m[0]
end
p parse_row(html)
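A single-pass version of the price / image / color part might look like the sketch below. It is only a sketch: the two rows are hypothetical, trimmed-down stand-ins for the real page, `example.com` is a placeholder URL, and `one_shot` is a made-up name. The idea is to put the lazy skip that follows the image URL inside the optional group, so the URL is captured when present and left as nil otherwise.

```ruby
# Hypothetical, trimmed-down rows standing in for the real page.
row_with_image = [
  '<span class="body30">Mileage</span> $24,995 <br />',
  '<img src="http://example.com/car.jpg" border="0">',
  '<p class="color body20">Color - Mystic Blue Metallic</p>'
].join("\n")

row_without_image = [
  '<span class="body30">Mileage</span> $19,500 <br />',
  '<p class="color body20">Color - Silver</p>'
].join("\n")

# The lazy skip after the image URL sits inside the optional group, so at each
# position the engine tries the URL before it tries "Color".
one_shot = %r{(\$[0-9,]+).+?(?:(http://[^"]+?\.jpg).+?)?Color\s-\s(.+?)</p>}m

p row_with_image.scan(one_shot)[0]
p row_without_image.scan(one_shot)[0]
```

This relies on the image URL, when there is one, always appearing between the price and the color.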
Sorry about the messy code.
Thanks.
Sam