Hi,
I am doing a string search in an HTML file using Ruby.
If the search string contains HTML entities, I should not match that word.
How can I do it? Please, can anyone help me?
Regards,
S.Sangeetha.
On 7/23/07, geetha [email protected] wrote:
Hi,
I am doing a string search in an HTML file using Ruby.
If the search string contains HTML entities, I should not match that word.
How can I do it? Please, can anyone help me?
Regards,
S.Sangeetha.
We might be able to help you better if you post the data and what you
expect to get out from it exactly.
Robert
On 7/23/07, Robert D. [email protected] wrote:
We might be able to help you better if you post the data and what you
expect to get out from it exactly.
Robert
–
Robert:
If the search string has HTML entities, then do not proceed with the search.
Well, it's very hard to define if a query string has HTML entities or
not. For example, do you consider the following string to have HTML entities?
b = "hello world and so what; and < and there we go >"
Dunno, yes and no, but if your answer is yes, and string b HAS HTML
entities, then:
require 'cgi'
escaped_html = CGI::escapeHTML(b)
if escaped_html != b
  # string contains html entities
end
If you want a strict validation of HTML tags, and of whether the query is
valid HTML, then Hpricot may help.
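For reference, the comparison above can be written out as a small runnable sketch. Note that it only reports whether escaping would change the string (that is, whether it contains characters such as & or <), which is not quite the same thing as containing an entity reference. The helper name needs_escaping? is made up for illustration.

```ruby
require 'cgi'

# Hypothetical helper (not part of CGI): true when CGI.escapeHTML would
# change the string, i.e. it contains at least one character that
# escapeHTML rewrites, such as & or <.
def needs_escaping?(text)
  CGI.escapeHTML(text) != text
end

b = "hello world and so what; and < and there we go >"

puts needs_escaping?(b)                  # b contains < and >
puts needs_escaping?("plain text")       # nothing to escape
puts needs_escaping?("this &amp; that")  # the & itself gets escaped again
```

So a string that already holds an entity and a string that merely holds a bare < are both flagged, which may or may not be what the original question wants.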
–
Let them talk of their oriental summer climes of everlasting
conservatories; give me the privilege of making my own summer with my
own coals.
On 7/23/07, hemant [email protected] wrote:
Some tips on asking questions:
On 7/23/07, Alex Y. [email protected] wrote:
Well, it's very hard to define if a query string has HTML entities or
not?
No it’s not…
Honestly, it's up to the user, unless we are talking about valid XHTML,
which is definitely defined.
irb(main):002:0> re = REXML::Text::REFERENCE
I thought about this when I was posting that response, but somehow I
felt the user is not looking for valid HTML, just for whether the string
contains HTML entities or not.
Hpricot is certainly one tool you should consider, and also REXML and Scrubyt.
Scrubyt is more for web scraping, but if you can scrape it, you can
remove it too.
hemant wrote:
Well, it's very hard to define if a query string has HTML entities or
not?
No it’s not…
For example, do you consider following string has HTML entities?
b = "hello world and so what; and < and there we go >"
dunno yes and no, but if your answer is yes,
Then you’d be wrong.
irb(main):001:0> require 'rexml/text'
=> true
irb(main):002:0> re = REXML::Text::REFERENCE
=> /(?:&([\w:][-\w\d.:]*);|&#\d+;|&#x[0-9a-fA-F]+;)/
irb(main):003:0> "this &amp; that" =~ re
=> 5
irb(main):004:0> "hello world and so what; and < and there we go >" =~ re
=> nil
Admittedly I’m not scanning for all defined HTML entities, just for
valid XML entities, but given that one’s a superset of the other, and
undefined entity references probably shouldn’t occur within a valid HTML
document anyway, it’s good enough for most purposes…
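Putting this together with the original question, here is a minimal sketch of a search that is skipped when the query contains an entity reference. To keep it self-contained it uses a hand-written pattern modelled on the one printed in the irb session rather than REXML's constant, and the names ENTITY_REFERENCE, contains_entity? and search_html are invented for illustration. Treat it as one possible reading of the requirement, not a definitive answer.

```ruby
# Pattern modelled on the one shown in the irb session: a named reference
# (&name;), a decimal reference (&#123;) or a hex reference (&#x7B;).
# This is an assumption for the sketch, not REXML's own constant.
ENTITY_REFERENCE = /&(?:[\w:][-\w.:]*|#\d+|#x[0-9a-fA-F]+);/

# Hypothetical helper: true when the text contains an entity reference.
def contains_entity?(text)
  !(text =~ ENTITY_REFERENCE).nil?
end

# Hypothetical helper: returns the index of query in html, or nil when the
# query is not found or when it contains an entity and the search is skipped.
def search_html(html, query)
  return nil if contains_entity?(query)
  html.index(query)
end

html = "<p>this &amp; that</p>"

p contains_entity?("this &amp; that")
p contains_entity?("hello world and so what; and < and there we go >")
p search_html(html, "that")
p search_html(html, "&amp;")
```

A nil result here is ambiguous between "not found" and "search skipped"; if the caller needs to tell them apart, returning a distinct symbol for the skipped case would be one option.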