Please enjoy a succulent, new Hpricot. A bit faster, some Ruby 1.9 support, and assorted fixes.

  gem install hpricot --source http://code.whytheluckystiff.net

It should show up at Rubyforge in a bit.

I'm sure you're wondering what's the reason for Hpricot updates, in the face of heated competition from the Nokogiri and LibXML libraries. Remember that Hpricot has no dependencies and is smaller than either of those libs. Hpricot uses its own Ragel-based parser, so you have the freedom to hack the parser itself, the code is dwarven by comparison.

Best of all, Hpricot has run on JRuby in the past. And I am in the process of merging some IronRuby code[1] and porting 0.7 to JRuby. This means your code will run on a variety of Ruby platforms without alteration. That alone makes it worthwhile, wouldn't you agree?

Clearly, the benchmarks you see on Ruby Inside are skewed to favor Nokogiri. They parse XML through Hpricot without using Hpricot.XML(), which is not only wrong, but puts XML through needless HTML cleanup operations. I am sure that Hpricot 0.7 still fares slower on large documents. However, for instance, try testing a large amount of small documents (a much more common scenario) with this latest version.

You have to question a benchmark that is entirely based on two XML documents. What about HTML fix ups? What about various platforms and CPUs? Why not treat Hpricot fairly and use it properly in the benchmarks? It reeks of something.

_why

[1] http://github.com/nrk/ironruby-hpricot/tree/master
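For anyone who wants to try the "many small documents" case suggested above, a timing harness could look roughly like this. It is only a sketch under stated assumptions: COUNT_TAGS is a made-up stand-in parser so the script runs without any gem installed, and time_parse, SMALL_DOCS and LARGE_DOCS are hypothetical names. For a real comparison, pass a lambda that calls the library under test (for XML input, Hpricot.XML as the post asks).

```ruby
require 'benchmark'

# Hypothetical stand-in parser: anything that responds to #call(String).
# For a real run, swap in e.g. ->(doc) { Hpricot.XML(doc) } after require 'hpricot'.
COUNT_TAGS = ->(doc) { doc.scan(/<\w+/).size }

# Time how long it takes to push every document through the parser once.
def time_parse(parser, docs)
  Benchmark.realtime { docs.each { |doc| parser.call(doc) } }
end

# One thousand tiny documents, and the same content glued into one big document.
SMALL_DOCS = Array.new(1_000) { |i| "<item id=\"#{i}\"><name>n#{i}</name></item>" }
LARGE_DOCS = ["<items>#{SMALL_DOCS.join}</items>"]

RESULTS = {
  many_small: time_parse(COUNT_TAGS, SMALL_DOCS),
  one_large:  time_parse(COUNT_TAGS, LARGE_DOCS)
}

RESULTS.each { |label, secs| puts format('%-10s %.6fs', label, secs) }
```

Per-call setup cost dominates in the first case and raw scanning speed in the second, which is why the two numbers can rank parsers differently.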
on 2009-03-17 19:12
on 2009-03-17 19:53
_why <why@ruby-lang.org> wrote:
> I'm sure you're wondering what's the reason for Hpricot updates, in
> the face of heated competition from the Nokogiri and LibXML
> libraries. Remember that Hpricot has no dependencies and is smaller
> than either of those libs. Hpricot uses its own Ragel-based
> parser, so you have the freedom to hack the parser itself, the code
> is dwarven by comparison.

Also, isn't Hpricot more accepting of skanky HTML?

m.
on 2009-03-17 19:57
Firstly, major props, and keep up the good work...

_why wrote:
> You have to question a benchmark that is entirely based on two XML
> documents. What about HTML fix ups? What about various platforms
> and CPUs? Why not treat Hpricot fairly and use it properly in the
> benchmarks? It reeks of something.

Here's what I use N for:

  //form[
    ./descendant::fieldset[
      ./descendant::legend and
      ./descendant::li[
        ./descendant::label and
        ./descendant::input
      ]
    ]
  ]

I generate that from some N::HTML::Builder code, form{ fieldset { etc } }, which turns into a DOM containing <form><fieldset> etc </fieldset></form>.

The goal is an assertion like this:

  assert_xhtml do
    h2 'Sales'
    select! :size => SaleController::LIST_SIZE do
      option names[1]
      option names[0]
    end
  end

The point is to match an example HTML to a target HTML.

I first tried it by walking that object model myself, recursing thru all DOM children to find the ones that match. However, as the recursion got more complex, I was "adding epicycles" to the code.

I backed off and rewrote, by first converting all the example HTML into one jiy-normous XPath, shown above. I have to do it like this because the example HTML could contain _anything_, and I need the query to run fast and absolutely stable. My assert_xhtml should not fail if the target code has the correct HTML subset - or vice versa.

I can't do that anywhere except LibXML, and I need to keep that easy to install.

And, in the grand scheme of things, I don't think _you_ have room to complain about your libraries' adoption rates!
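The conversion described above, from an example element tree to one nested XPath, might look roughly like this. It is a minimal sketch, not the poster's actual code: the [name, children] array representation and the helper names node_to_xpath and example_to_xpath are made up for illustration, and attributes and text nodes are ignored.

```ruby
# Hypothetical example tree: each node is [tag_name, [child_nodes...]].
EXAMPLE = ['form', [
  ['fieldset', [
    ['legend', []],
    ['li', [
      ['label', []],
      ['input', []]
    ]]
  ]]
]]

# Turn one node into an XPath step. Each child becomes a predicate,
# and sibling predicates are joined with "and".
def node_to_xpath(node, axis = './descendant::')
  name, children = node
  return "#{axis}#{name}" if children.empty?

  predicates = children.map { |child| node_to_xpath(child) }.join(' and ')
  "#{axis}#{name}[ #{predicates} ]"
end

# The root is searched for anywhere in the target document.
def example_to_xpath(root)
  node_to_xpath(root, '//')
end

XPATH = example_to_xpath(EXAMPLE)
puts XPATH
```

Using the descendant axis in every predicate is what lets the target contain extra wrapper elements without the match failing, at the cost of not checking element order.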
on 2009-03-17 20:00
matt neuburg wrote:
> Also, isn't Hpricot more accepting of skanky HTML? m.
Yeah, and
A> that can sometimes slow it down!
B> we don't have any in my shop...
tidy -asxhtml -i -wrap 130 -m file.html
on 2009-03-17 20:05
On Mar 17, 2009, at 11:47 , matt neuburg wrote:
> _why <why@ruby-lang.org> wrote:
>
>> I'm sure you're wondering what's the reason for Hpricot updates, in
>> the face of heated competition from the Nokogiri and LibXML
>> libraries. Remember that Hpricot has no dependencies and is smaller
>> than either of those libs. Hpricot uses its own Ragel-based
>> parser, so you have the freedom to hack the parser itself, the code
>> is dwarven by comparison.
>
> Also, isn't Hpricot more accepting of skanky HTML?

no. I've had a bug open for years on hpricot because it couldn't deal with the relatively simple forms on the trackers on rubyforge.org. nokogiri dealt with it perfectly, and since mechanize migrated from hpricot to nokogiri I've had fewer issues overall.

I should reemphasize... YEARS. Even the bug tracker has since disappeared. This is where nokogiri really shines IMBO(*).

*) in my _biased_ opinion. I work/hang out with aaron patterson on a regular basis. That said, he fixes bugs I (and others--I watch) report in a timely manner.
on 2009-03-17 20:11
On Mar 17, 2009, at 11:08 , _why wrote:
> You have to question a benchmark that is entirely based on two XML
> documents. What about HTML fix ups? What about various platforms
> and CPUs? Why not treat Hpricot fairly and use it properly in the
> benchmarks? It reeks of something.

You _do_ have to question it (as you should question all benchmarks, really)... But that question should come in the form of a bug report, or a patch. To do otherwise... reeks of something.
on 2009-03-17 21:42
On Tue, Mar 17, 2009 at 19:08, _why <why@ruby-lang.org> wrote:
> Best of all, Hpricot has run on JRuby in the past. And I am in the
> process of merging some IronRuby code[1] and porting 0.7 to

It seems my port of Hpricot to IronRuby did not go unnoticed, despite my having kept quiet about it so far :-)

By the way, porting 0.7 to IronRuby is on my radar. I am not sure how long it will take (I have been pretty busy lately), but staying up to date with the latest version of Hpricot is indeed something I want to achieve.

PS: thanks for this new release of Hpricot.
on 2009-03-17 21:50
On Mar 17, 2009, at 11:08 AM, _why wrote:
> You have to question a benchmark that is entirely based on two XML
> documents. What about HTML fix ups? What about various platforms
> and CPUs? Why not treat Hpricot fairly and use it properly in the
> benchmarks? It reeks of something.

Don't be an ass. Code (and benchmark results) speak much louder than snark.

Aaron has put the current benchmarks up on GitHub[1], and I'm sure he'll welcome any patches, additions, or corrections.

~ j.

[1] http://github.com/tenderlove/xml_truth
on 2009-03-17 22:59
On Wed, Mar 18, 2009 at 03:08:39AM +0900, _why wrote:
> than either of those libs. Hpricot uses its own Ragel-based
> Nokogiri. They parse XML through Hpricot without using Hpricot.XML(),
> which is not only wrong, but puts XML through needless HTML cleanup
> operations. I am sure that Hpricot 0.7 still fares slower on large
> documents. However, for instance, try testing a large amount of
> small documents (a much more common scenario) with this latest
> version.

Thank you for pointing out my mistakes. The repository[1] is public in order to keep myself honest. Patches are welcome.

> You have to question a benchmark that is entirely based on two XML
> documents. What about HTML fix ups? What about various platforms
> and CPUs? Why not treat Hpricot fairly and use it properly in the
> benchmarks? It reeks of something.

HTML fix ups will be tested as well. So will CSS searches, XPath searches, memory usage, and many other things. As I said[2], these benchmarks are not complete. If you're worried about being treated fairly, fork my repository and write tests.

[1] https://github.com/tenderlove/xml_truth/tree
[2] http://www.rubyinside.com/ruby-xml-performance-ben...
on 2009-03-18 00:30
On Wed, Mar 18, 2009 at 06:56:19AM +0900, Aaron Patterson wrote:
> HTML fix ups will be tested as well. So will CSS searches, XPath
> searches, memory usage, and many other things. As I said[2], these benchmarks
> are not complete. If you're worried about being treated fairly, fork my
> repository and write tests.

No no, don't be silly, I'd much rather complain and be a sore loser. I insist.

Look, I think I'd just rather see the benchmarks kept up by a third party who has nothing to gain and can show a more nuanced view of the scene. I really wish I could drop Hpricot (as RubyfulSoup did,) but I think it has its strengths.

Let me ask you this. You're neck and neck with libxml-ruby. The bulk of your time is spent in the exact same HTML parser as libxml-ruby. Why the hyperfocus on benchmarks and declaring yourselves winners? You're never going to be too far off from their speed. So, I mean, it strikes me as adversarial and needless, if your library quality and bug fixing are of the sort that Ryan David has just touted.

_why
on 2009-03-18 00:52
On Wed, Mar 18, 2009 at 08:26:38AM +0900, _why wrote:
> third party who has nothing to gain and can show a more nuanced
> view of the scene. I really wish I could drop Hpricot (as
> RubyfulSoup did,) but I think it has its strengths.

I agree, but who will write them? So far, we only have either poorly written ones, or speculation. Neither are good. I figured if I wrote these, put it on github, I could get other people to do the work.

> Let me ask you this. You're neck and neck with libxml-ruby. The
> bulk of your time is spent in the exact same HTML parser as
> libxml-ruby. Why the hyperfocus on benchmarks and declaring
> yourselves winners? You're never going to be too far off from
> their speed. So, I mean, it strikes me as adversarial and needless,
> if your library quality and bug fixing are of the sort that Ryan
> David has just touted.

I'm not sure that 10-20% difference is neck and neck. I actually found this result to be a surprise. I thought nokogiri would be slower. In fact, I am sure I will find parts that are slower. Once I do, I know where I can improve.

I don't understand why you would think this is adversarial. As I said, these benchmarks are not finished. I am merely trying to collect data, and I want no emotions involved. I made my tests public so that hopefully I can remain unbiased.

If it seems unfair or incorrect, tell me so. It won't hurt my feelings. My goal is to learn, and to write better software.
on 2009-03-18 03:25
_why wrote:
> and CPUs? Why not treat Hpricot fairly and use it properly in the
> benchmarks? It reeks of something.

Welcome to my personal hell.

- Charlie
on 2009-03-18 04:06
[snip the yak] We're missing you man. Forget the fruit. Just hang out with us mortals here a little. :) All the best, Sean
on 2009-03-18 14:04
On Tue, Mar 17, 2009 at 10:21 PM, Charles Oliver Nutter <charles.nutter@sun.com> wrote:
>> You have to question a benchmark that is entirely based on two XML
>> documents. What about HTML fix ups? What about various platforms
>> and CPUs? Why not treat Hpricot fairly and use it properly in the
>> benchmarks? It reeks of something.
>
> Welcome to my personal hell.

Ironic that Peter just posted a positive note about new JRuby benchmarks ;-)

http://rubyflow.com/items/1913
on 2009-03-19 06:57
John Wells wrote:
> On Tue, Mar 17, 2009 at 10:21 PM, Charles Oliver Nutter
>> Welcome to my personal hell.
>
> Ironic that Peter just posted a positive note about new JRuby benchmarks ;-)
>
> http://rubyflow.com/items/1913

I don't mind benchmarks as much as the constant cat-and-mouse game we have to play. Ultimately most of the microbenchmarks published are meaningless, but we have to spend a lot of time flexing that muscle to remain a contender. It's tiring :(

- Charlie
on 2009-03-19 21:58
_why wrote:
> Please enjoy a succulent, new Hpricot. A bit faster, some Ruby 1.9
> support, and assorted fixes.
>
> gem install hpricot --source http://code.whytheluckystiff.net
>
> It should show up at Rubyforge in a bit.
.....

I am trying to install this gem:

powerbook-g4-15-de-villa:/opt/local/bin villa$ sudo gem install hpricot --source http://code.whytheluckystiff.net
Building native extensions. This could take a while...
Successfully installed hpricot-0.7
1 gem installed
Installing ri documentation for hpricot-0.7...
Installing RDoc documentation for hpricot-0.7...
powerbook-g4-15-de-villa:/opt/local/bin villa$ irb
irb(main):001:0> require 'rubygems'
=> true
irb(main):002:0> require 'hpricot'
LoadError: Failed to load /usr/local/lib/ruby/gems/1.8/gems/hpricot-0.7/lib/hpricot_scan.bundle
        from /usr/local/lib/ruby/gems/1.8/gems/hpricot-0.7/lib/hpricot_scan.bundle
        from /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in `require'
        from /usr/local/lib/ruby/gems/1.8/gems/hpricot-0.7/lib/hpricot.rb:20
        from /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:32:in `gem_original_require'
        from /usr/local/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:32:in `require'
        from (irb):2

Does anybody know where the error is? This is a PowerBook G4 running Tiger.

thanks
on 2009-03-20 10:48
> and since mechanize migrated from hpricot
> to nokogiri I've had fewer issues overall.

I have had issues after mechanize migrated to nokogiri. In fact I am using the older mechanize without the dependency on nokogiri, until I am able to install nokogiri without a problem.

(PS: For the record, I never use rubygems and never will for various reasons, most importantly because I do not need and do not want automatic dependency handling without me controlling it, so a part of this issue is surely my own doing. But the fact remains that the older mechanize at the moment works like a charm for me, whereas the newer mechanize does not work because I cannot install nokogiri easily.)

Just for the record, the error with nokogiri "rake" is:

"3) Failure:
test_exslt(TestXsltTransforms) [./test/test_xslt_transforms.rb:76]:
<"2009-03-20"> expected to be =~
</\d{4}-\d\d-\d\d[-|+]\d\d:\d\d/>.

348 tests, 939 assertions, 3 failures, 0 errors
rake aborted!"

Trying to use setup.rb on mechanize and nokogiri installs it of course, but as expected a later error emerges:

"lib/ruby/site_ruby/1.8/nokogiri.rb:6:in `require': no such file to load -- nokogiri/native (LoadError)"

So for me the situation is reversed - with hpricot right now I have fewer problems than with nokogiri/mechanize.
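The test_exslt failure above comes down to the pattern requiring a UTC offset after the date, while the value produced on this machine is a bare date. The snippet below reproduces just that mismatch. It is a sketch only: STRICT is copied from the failure message, whereas LENIENT and the sample values are assumptions made up here, and nothing in it says how the nokogiri test should actually be fixed.

```ruby
# Pattern copied from the failing assertion: a date followed by a UTC offset.
STRICT = /\d{4}-\d\d-\d\d[-|+]\d\d:\d\d/

# Hypothetical looser variant: the offset (or a literal "Z") is optional.
LENIENT = /\A\d{4}-\d\d-\d\d(?:Z|[-+]\d\d:\d\d)?\z/

# The first value is the one from the failure; the others are invented samples.
SAMPLES = ['2009-03-20', '2009-03-20-07:00', '2009-03-20+09:00']

SAMPLES.each do |value|
  puts format('%-18s strict=%-5s lenient=%s',
              value, STRICT.match?(value), LENIENT.match?(value))
end
```

Only the bare date fails the strict pattern, which suggests the result depends on whether a timezone offset is included in the output on a given system.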
on 2009-03-20 11:57
Since Mechanize can use either Nokogiri or Hpricot as a backend, it seems like a good idea if neither were an actual dependency. Either that or fork the project; how about Wechanize ;-) But the first option seems the better course. I imagine other backends could be added eventually too, e.g. libxml-ruby.

T.
on 2009-03-20 14:11
trans wrote:
> Since Mechanize can use either Nokogiri or Hpricot as a backend, it
> seems like a good idea if neither were an actual dependency.

Actually, IMO they should both be alternative dependencies. Which, of course, RubyGems doesn't support. But since Marc doesn't use RubyGems, it should work fine.

jwm
on 2009-03-20 15:15
Marc Heiler wrote:
> "3) Failure:
> test_exslt(TestXsltTransforms) [./test/test_xslt_transforms.rb:76]:
> <"2009-03-20"> expected to be =~
> </\d{4}-\d\d-\d\d[-|+]\d\d:\d\d/>.

Add it to the do-list!:

http://nokogiri.lighthouseapp.com/projects/19607-n...
on 2009-03-20 20:09
On Mar 20, 2009, at 02:45, Marc Heiler wrote:
> (PS: For the record, I never use rubygems and never will for various
> reason, most importantly because I do not need and do not want
> automatic
> dependency handling without me controlling it

$ gem help install
Usage: gem install GEMNAME [GEMNAME ...] [options] -- --build-flags [options]
[...]
  Install/Update Options:
[...]
        --ignore-dependencies        Do not install any required dependent gems
on 2009-03-21 21:40
I get an error when I try to install hpricot:

sh-3.2# gem19 install hpricot --source http://code.whytheluckystiff.net
Building native extensions. This could take a while...
ERROR: Error installing hpricot:
ERROR: Failed to build gem native extension.

/usr/local/bin/ruby19 extconf.rb install hpricot --source http://code.whytheluckystiff.net
checking for main() in -lc... yes
creating Makefile

make
gcc -I. -I/usr/local/include/ruby19-1.9.1/i386-darwin9.6.0 -I/usr/local/include/ruby19-1.9.1/ruby/backward -I/usr/local/include/ruby19-1.9.1 -I. -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE -fno-common -O2 -g -Wall -Wno-parentheses -fno-common -pipe -fno-common -o hpricot_css.o -c hpricot_css.c
hpricot_css.c: In function ‘hpricot_css’:
hpricot_css.c:3399: warning: comparison between pointer and integer
hpricot_css.c:3399: warning: ‘eof’ is used uninitialized in this function
hpricot_css.rl:92: warning: ‘aps’ may be used uninitialized in this function
gcc -I. -I/usr/local/include/ruby19-1.9.1/i386-darwin9.6.0 -I/usr/local/include/ruby19-1.9.1/ruby/backward -I/usr/local/include/ruby19-1.9.1 -I. -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE -fno-common -O2 -g -Wall -Wno-parentheses -fno-common -pipe -fno-common -o hpricot_scan.o -c hpricot_scan.c
hpricot_scan.rl: In function ‘our_rb_hash_lookup’:
hpricot_scan.rl:162: error: ‘struct RHash’ has no member named ‘tbl’
make: *** [hpricot_scan.o] Error 1

Gem files will remain installed in /usr/local/lib/ruby/gems/1.9.1/gems/hpricot-0.7 for inspection.
Results logged to /usr/local/lib/ruby/gems/1.9.1/gems/hpricot-0.7/ext/hpricot_scan/gem_make.out

Any help?

Thanks
on 2009-03-21 22:46
On 21/03/2009 21:39, Hiro wrote:
> creating Makefile
>
>

Have you checked whether the latest hpricot version is 1.9 compatible? Maybe it is not...
on 2009-03-22 04:55
Paganoni wrote:
>> sh-3.2# gem19 install hpricot --source http://code.whytheluckystiff.net
>> Building native extensions. This could take a while...
>> ERROR: Error installing hpricot:
> Have you checked if latest hpricot version is 1.9 compatible ? May be it
> is not...

I got the exact same result, with Ubuntu's out-of-the-box Ruby 1.9.0...