[ask] How to remove the HTML part of a text

inittial_d · December 25, 2009, 2:57pm

I’m using Hpricot to parse HTML, and my output is:

GrÃ¶ÃŸe: 37 - 54
Maßtabelle

Does anyone know how to get the string “GrÃ¶ÃŸe: 37 - 54” and remove the
rest of that string?

Thanks

inittial_d · December 25, 2009, 4:13pm

Sorry for the triple post… Problem solved: I used Nokogiri instead of
Hpricot.

inittial_d · December 25, 2009, 3:40pm

Sorry for double posting; it seems there is no edit-post feature…

I have an HTML parsing problem. I’ll try to explain it as clearly
as I can, and I hope someone can help me with this.

I’ve been given the task of fetching specific data from an HTML page. I’m
planning to use the Hpricot plugin to do this.
It’s an online shop page, and I have to fetch clothing size information.

The product information part of the page can be in either of these 2
formats:

... Some information ...

Available in:

... (The data I want to fetch) ...

OR

... Some information ...

... Content ...	Available in:
... Content ...	... (The data I want to fetch) ...

The clue is: the row whose data I want to fetch is always preceded by
a row containing the string “Available in”.
And I want to fetch NOT the content of the whole row, BUT the content of the
last cell (<td>) contained inside that row.

It’s complex, and I have no idea what to do here. Can someone help me
with this?
Thanks for your concern.

PS: The table snippet I posted above may be contained inside another
table.
Apparently, the online shop uses tables for page layout…
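
One way to read the requirement above (find the row that contains “Available in”, then take the last cell of the row right after it) is sketched below. This is only a sketch under assumptions: it uses Ruby's bundled REXML rather than Hpricot, so it needs well-formed markup, and the sample table, the size values, and the method name `sizes_after_label` are all hypothetical, since the real page markup is not shown here.

```ruby
require 'rexml/document'

# Hypothetical, well-formed stand-in for the shop page; the real markup is
# not shown in the thread. The inner table sits inside an outer layout table.
html = <<~HTML
  <table>
    <tr>
      <td>
        <table>
          <tr><td>Some information</td></tr>
          <tr><td>Content</td><td>Available in:</td></tr>
          <tr><td>Content</td><td>S, M, L, XL</td></tr>
        </table>
      </td>
    </tr>
  </table>
HTML

# Returns the text of the last <td> in the row that follows the label row,
# or nil when no such row exists.
def sizes_after_label(markup, label = 'Available in')
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, '//tr').each do |row|
    # Inspect only the row's own cells, so an outer layout row that merely
    # wraps the inner table is not mistaken for the label row.
    cells = row.elements.to_a('td')
    next unless cells.any? { |td| td.text.to_s.include?(label) }

    # The data row is the next sibling element of the label row.
    data_row = row.next_element
    next unless data_row && data_row.name == 'tr'

    last_cell = data_row.elements.to_a('td').last
    return last_cell.text.to_s.strip if last_cell
  end
  nil
end

puts sizes_after_label(html)
```

Real shop pages are rarely well-formed, so a tolerant parser such as Hpricot or Nokogiri would be the more realistic choice; the same "label row, next row, last cell" logic should carry over.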

inittial_d · December 26, 2009, 4:25am

Darmanto Lie wrote:

thanks
"Gradhbbee: 37 - 54
"[ /^(.*?)</, 1 ]
==> "Gradhbbee: 37 - 54"
"Gradhbbee: 37 - 54
"[ /^(.*?)(?=<)/ ]
==> "Gradhbbee: 37 - 54"
"Gradhbbee: 37 - 54
"[ /^[^<]*/ ]
==> "Gradhbbee: 37 - 54"
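
The three patterns can be compared side by side in a small script. This is a minimal sketch: the `<br />` and link markup after the size text is a hypothetical stand-in, because the tags that followed it in the real output are not shown, and the variable names are illustrative only.

```ruby
# Hypothetical input: the markup after the size text is an assumption,
# standing in for whatever tags followed it on the real page.
raw = "Gradhbbee: 37 - 54<br />\n<a href=\"#\">Maßtabelle</a>"

# 1. Capture group: lazily take everything up to the first "<" and
#    return capture 1, which excludes the "<" itself.
by_capture = raw[/^(.*?)</, 1]

# 2. Lookahead: the "<" is asserted but not consumed, so the whole
#    match is already the wanted text.
by_lookahead = raw[/^(.*?)(?=<)/]

# 3. Negated character class: the leading run of non-"<" characters.
by_negation = raw[/^[^<]*/]

puts by_capture, by_lookahead, by_negation
```

The patterns differ when the string contains no `<` at all: the first two return `nil`, while the third returns the whole string. Appending `.strip` would also remove any whitespace left in front of the first tag.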