Hello,
I have a problem while parsing an RSS file. I open a URL via
open-uri and it usually works fine, but with the RSS URLs from ccMixter
I get a parse error. It's a bit strange, because if I download the file
and open it, it works fine.
I tried:
rss = RSS::Parser.parse("ccMixter (remix,editorial_pick)", false)
And got:
RSS::NotWellFormedError: This is not well formed XML
Missing end tag for 'html' (got "head")
Line:
Position:
Last 80 unconsumed characters:
from /usr/lib/ruby/1.8/rss/rexmlparser.rb:24:in `_parse'
from /usr/lib/ruby/1.8/rss/parser.rb:163:in `parse'
from /usr/lib/ruby/1.8/rss/parser.rb:78:in `parse'
from (irb):43
If I save the file and open it, it works fine:
rss = RSS::Parser.parse("query", false)
IMHO there should be no difference between opening a local file and a URL.
Thanks for all the help I've gotten from this list over the last few days,
Patrick
Hi,
In [email protected]
“REXML/RSS parse error” on Thu, 7 Dec 2006 19:45:53 +0900,
Patrick P. [email protected] wrote:
RSS::NotWellFormedError: This is not well formed XML
Missing end tag for 'html' (got "head")
I got some garbage after the RSS 2.0 document:
% ruby -r open-uri -e 'puts open("ccMixter (remix,editorial_pick)").read' | tail -n 25
"/web/ccmixter/www/cclib/cc-util.php"(205): Cannot modify header
information - headers already sent by (output started at
/web/ccmixter/www/cclib/cc-feed.php:432) [2006-12-07 07:10
am][138.243.129.4][/media/api/query?score=400&sinceu=1157536651&limit=25&tags=remix+editorial_pick&rand=1&format=rss]
body {
font-size: 11px;
font-family: Verdana, sans-serif;
background-color: #F99;
margin: 4%;
text-align: center;
}
wups, ccMixter is experiencing technical difficulties...
If you were in the middle of an upload or posting a message it
probably worked OK
but you should click here to get back to the
site's home page or
use your browser's BACK button to return to the site and make
sure.
The admins have been notified of the problem and will look into
it very shortly.
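The same check can be done from Ruby instead of the shell. A minimal sketch, assuming Ruby 2.5 or later, where open-uri provides `URI.open` (the Ruby 1.8 in this thread patched `Kernel#open` instead). `URI.open` fetches `http://`/`https://` strings and falls back to opening plain paths as local files, so the same helper works on the live feed and on a saved copy. The helper names `tail_lines` and `trailing_garbage` are hypothetical.

```ruby
require 'open-uri'

# Hypothetical helper: read a feed from a URL or a local path and return its
# last +n+ lines, like piping the response through `tail -n 25`.
# URI.open (Ruby 2.5+) uses open-uri for http(s):// strings and falls back to
# Kernel#open, i.e. a local file, for plain paths.
def tail_lines(source, n = 25)
  body = URI.open(source, &:read)
  body.lines.last(n).join
end

# Hypothetical helper: return whatever follows the first closing </rss> tag,
# or nil if there is no </rss> at all. For a clean feed this is empty or
# whitespace; for the broken ccMixter response it would be the PHP warning
# and the HTML error page shown above.
def trailing_garbage(body)
  close_tag = '</rss>'
  index = body.index(close_tag)
  return nil if index.nil?
  body[(index + close_tag.length)..-1]
end
```

Something like `p trailing_garbage(URI.open(feed_url, &:read))`, with `feed_url` standing in for the feed address, would show anything the server appends after the feed; that appended HTML is what the parser trips over here.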
Thanks,
Kouhei S. wrote:
I got some garbage after the RSS 2.0 document:
Thank you, I hadn't seen that. I've written an e-mail to them, but
most RSS readers are able to parse this malformed file. Is there any
way to force the parser to read it? For RSS it would be OK to
stop parsing after the closing </rss> tag.
Thanks,
Patrick
Hi,
In [email protected]
“Re: REXML/RSS parse error” on Thu, 7 Dec 2006 23:56:38 +0900,
Patrick P. [email protected] wrote:
most RSS readers are able to parse this malformed file. Is there any
way to force the parser to read it? For RSS it would be OK to
stop parsing after the closing </rss> tag.
What about gsub(/<\/rss>.*\z/m, '')?
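The suggestion amounts to cutting the response off at the closing `</rss>` tag before parsing. A minimal sketch of that workaround as a standalone helper; the name `strip_after_rss` is hypothetical. The one-liner above removes the tag together with the garbage; this variant writes the tag back so that a strict XML parser still sees a closed root element.

```ruby
# Hypothetical helper: cut a feed body off right after the first closing
# </rss> tag. %r{...} avoids escaping the slash in </rss>; the m flag lets
# . match newlines, so .* runs through the trailing garbage up to the end of
# the string (\z). The replacement writes the closing tag back, so the root
# element is still closed when the result is parsed.
def strip_after_rss(body)
  body.sub(%r{</rss>.*\z}m, '</rss>')
end
```

The cleaned body could then go to the parser as in the thread, e.g. `RSS::Parser.parse(strip_after_rss(body), false)`, where `body` is the raw text fetched with open-uri. A document without `</rss>` (an Atom feed, say) comes back unchanged, because `sub` leaves the string alone when nothing matches.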
Thanks,
Kouhei S. wrote:
What about gsub(/<\/rss>.*\z/m, '')?
Yes, that works very well :-). I'm very happy now <g>. Thank you very
much,
Patrick