URI::merge bug?

Hi all,

I’m not sure whether this is a bug:

irb(main):013:0> a = URI.parse(“http://www.example.com/foo/bar?a=b”)
=> #<URI::HTTP:0xfdbb6d160 URL:http://www.example.com/foo/bar?a=b>
irb(main):014:0> b = URI.parse(“?a=c”)
=> #<URI::Generic:0xfdbb6b770 URL:?a=c>
irb(main):015:0> puts a.merge(b).to_s
http://www.example.com/foo/?a=c

I’d expect the last line to be “http://www.example.com/foo/bar?a=c” - it
seems to lose the last element of the path. Firefox seems to think do
what I’m expecting, anyway… Can someone more au fait with URI RFC’s
comment?

On 5/18/07, Alex Y. [email protected] wrote:

I’d expect the last line to be “http://www.example.com/foo/bar?a=c” - it
seems to lose the last element of the path. Firefox seems to think do
what I’m expecting, anyway… Can someone more au fait with URI RFC’s
comment?

Hi,

have alook at what “?a=c” parses to, i.e. if it is interpreted as
“/?a=c” that would explain the behaviour.

J.

Alex Y. [email protected] writes:

I’m not sure whether this is a bug:

irb(main):013:0> a = URI.parse(“http://www.example.com/foo/bar?a=b”)
=> #<URI::HTTP:0xfdbb6d160 URL:http://www.example.com/foo/bar?a=b>
irb(main):014:0> b = URI.parse(“?a=c”)
=> #<URI::Generic:0xfdbb6b770 URL:?a=c>
irb(main):015:0> puts a.merge(b).to_s
http://www.example.com/foo/?a=c

I’ll note that although firefox agrees with your expectations, lynx
agrees with the behavior of the uri module.

To understand what the uri module is doing, look at this:

irb(main):001:0> require ‘uri’
=> true
irb(main):002:0> b = URI.parse(“?a=c”)
=> #<URI::Generic:0xfdbccb1d0 URL:?a=c>
irb(main):003:0> b.scheme
=> nil
irb(main):004:0> b.userinfo || b.host || b.port
=> nil
irb(main):005:0> b.path
=> “”
irb(main):006:0> b.query
=> “a=c”
irb(main):007:0> b.fragment
=> nil

That is, the scheme and authority portions of the uri are nil, but
the path is present, as the empty string. When merging an empty path
with the path “/foo/bar” , the uri module comes up with “/foo/”. Not
a totally unreasonable choice.

In fact, this is a bug, but not the one you think. “?a=b” is a
malformed relative URI. You should get a parse error trying to create
that.

According to RFC2396, a relative URI consists of (section 5, near the
bottom of pg. 17):

  relativeURI   = ( net_path | abs_path | rel_path ) [ "?" query ]

  rel_path      = rel_segment [ abs_path ]

  rel_segment   = 1*( unreserved | escaped |
                      ";" | "@" | "&" | "=" | "+" | "$" | "," )

See the 1* part? That means that a relative uri path segment must
consist of at least one character. An empty path segment is illegal.
(Note that uri references that begin with ‘#’ are covered in section 4
of the RFC, and match the rule “URI-reference” rather than the rule
“relativeURI”)

Now, given that the URI module does indeed accept relative URIs like
this, perhaps we should redefine URI merging for these pathological
cases so that the URI module behaves as some particular well-known
browser does:

module URI
class Generic
def merge_like(browser, other)
if !other.absolute? and other.path and other.path.empty? and
not (other.userinfo || other.host || other.port) then
case browser
when :firefox, :netscape
other = other.dup
other.path = self.path
when :ie, :microsoft, :links
other = other.dup
if other.query || other.fragment
other.path = self.path
else
other.path = ‘.’
end
when :lynx
# we’re good already, so we don’t need to do
# this, but let’s pass the real merge function
# valid relative uris anyway, okay?
other = other.dup
if other.query
other.path = ‘.’
else
other.path = self.path
end
else
# Could someone test how opera handles the three links on
# http://snowplow.org/martin/relative_uri_test.html ?
raise “Unhandled browser type #{browser}”
end
end
return merge(other)
end
end
end

Daniel M. [email protected] writes:

In fact, this is a bug, but not the one you think. “?a=b” is a
malformed relative URI. You should get a parse error trying to create
that.

According to RFC2396, a relative URI consists of (section 5, near the
bottom of pg. 17):

You know what? I’m using an old RFC. Everything I said applies to
RFC 2396, but that’s not the current URI RFC. The current one is
RFC 3986, and by that one firefox is doing exactly the right thing.
(So, by the way, is Opera)

So it is a bug in URI, and the bug is “written to an old RFC”.

I don’t know how to explain Internet Explorer’s odd behavior with
regard to a relative uri of “”, but that’s Microsoft for you.

Daniel M. wrote:

RFC 2396, but that’s not the current URI RFC. The current one is
RFC 3986, and by that one firefox is doing exactly the right thing.
(So, by the way, is Opera)

So it is a bug in URI, and the bug is “written to an old RFC”.

I don’t know how to explain Internet Explorer’s odd behavior with
regard to a relative uri of “”, but that’s Microsoft for you.

Thanks - that pretty much confirms what I thought. Would a patch to
bring URI up to RFC 3986 compliance be a huge undertaking?

Nobuyoshi N. [email protected] writes:

Then, the test is wrong too?

Yes, it is. That test case is straight out of RFC 2396, and it is
clearly incorrect given the pseudocode in section 5.2.2 (pages 31-32)
of RFC 3986.

RFC 3986 updates the test cases, including them in section 5.4 (Not
in an appendix as RFC 2396 did) That test case is mentioned, and the
behavior is indeed what it looks like from the pseudocode.

We should probably check that ruby’s uri module passes all the tests
now in RFC 3986. If I have a chance I’ll go do that.

Hi,

At Sat, 19 May 2007 00:52:37 +0900,
Daniel M. wrote in [ruby-talk:252116]:

You know what? I’m using an old RFC. Everything I said applies to
RFC 2396, but that’s not the current URI RFC. The current one is
RFC 3986, and by that one firefox is doing exactly the right thing.
(So, by the way, is Opera)

So it is a bug in URI, and the bug is “written to an old RFC”.

Then, the test is wrong too?

Index: lib/uri/generic.rb

— lib/uri/generic.rb (revision 12297)
+++ lib/uri/generic.rb (working copy)
@@ -632,9 +632,4 @@ module URI
base_path.slice!(i - 1, 2)
end

  •    if base_path.empty?
    
  •      base_path = [''] # keep '/' for root directory
    
  •    else
    
  • base_path.pop
  •    end
    
       # RFC2396, Section 5.2, 6), c)
    

@@ -654,5 +649,10 @@ module URI
end

  •    add_trailer_slash = true
    
  •    add_trailer_slash = !tmp.empty?
    
  •    if base_path.empty?
    
  •      base_path = [''] # keep '/' for root directory
    
  •    elsif add_trailer_slash
    
  • base_path.pop
  •    end
       while x = tmp.shift
         if x == '..' && base_path.size > 1
    

Index: test/uri/test_generic.rb

— test/uri/test_generic.rb (revision 12297)
+++ test/uri/test_generic.rb (working copy)
@@ -297,9 +297,9 @@ class TestGeneric < Test::Unit::TestCase

http://a/b/c/d;p?q

-# ?y = http://a/b/c/?y
+# ?y = http://a/b/c/d;p?y
url = @base_url.merge(‘?y’)
assert_kind_of(URI::HTTP, url)

Daniel M. [email protected] writes:

RFC 3986 updates the test cases, including them in section 5.4 (Not
in an appendix as RFC 2396 did) That test case is mentioned, and the
behavior is indeed what it looks like from the pseudocode.

We should probably check that ruby’s uri module passes all the tests
now in RFC 3986. If I have a chance I’ll go do that.

I’ve converted all the test examples in that section of RFC 3986 into
test code; see below.

And even with Nakada-san’s patch, there’s still a problem or two with
our resolver relative to the RFC 3986 test cases. However, with that
patch it passes everything labeled in the RFC as “Normal Cases”
(section 5.4.1). It still has issues with the abnormal cases of
section 5.4.2

#! /usr/bin/env ruby

This is test only of the merge method based on the cases in the text

of RFC 3986, section 5.4. The test cases were cut-and-pasted

directly to reduce the chance of transcription error.

require ‘test/unit’
require ‘uri’
require ‘enumerator’

module URI
class TestGenericMerge < Test::Unit::TestCase
def setup
@url = ‘http://a/b/c/d;p?q
@base_url = URI.parse(@url)
end
def test_merge
testcases = %w[
“g:h” = “g:h”
“g” = “http://a/b/c/g
“./g” = “http://a/b/c/g
“g/” = “http://a/b/c/g/
“/g” = “http://a/g
“//g” = “http://g
“?y” = “http://a/b/c/d;p?y
“g?y” = “http://a/b/c/g?y
#s” = “http://a/b/c/d;p?q#s
“g#s” = “http://a/b/c/g#s
“g?y#s” = “http://a/b/c/g?y#s
“;x” = “http://a/b/c/;x
“g;x” = “http://a/b/c/g;x
“g;x?y#s” = “http://a/b/c/g;x?y#s
“” = “http://a/b/c/d;p?q
“.” = “http://a/b/c/
“./” = “http://a/b/c/
“…” = “http://a/b/
“…/” = “http://a/b/
“…/g” = “http://a/b/g
“…/…” = “http://a/
“…/…/” = “http://a/
“…/…/g” = “http://a/g

  "../../../g"    =  "http://a/g"
  "../../../../g" =  "http://a/g"

  "/./g"          =  "http://a/g"
  "/../g"         =  "http://a/g"
  "g."            =  "http://a/b/c/g."
  ".g"            =  "http://a/b/c/.g"
  "g.."           =  "http://a/b/c/g.."
  "..g"           =  "http://a/b/c/..g"

  "./../g"        =  "http://a/b/g"
  "./g/."         =  "http://a/b/c/g/"
  "g/./h"         =  "http://a/b/c/g/h"
  "g/../h"        =  "http://a/b/c/h"
  "g;x=1/./y"     =  "http://a/b/c/g;x=1/y"
  "g;x=1/../y"    =  "http://a/b/c/y"

  "g?y/./x"       =  "http://a/b/c/g?y/./x"
  "g?y/../x"      =  "http://a/b/c/g?y/../x"
  "g#s/./x"       =  "http://a/b/c/g#s/./x"
  "g#s/../x"      =  "http://a/b/c/g#s/../x"

  "http:g"        =  "http:g"
        ]
        testcases.each_slice(3) { |rel, eq, expected|
            rel.gsub!(/"/,'')
            expected.gsub!(/"/,'')
            assert_equal(expected, @base_url.merge(rel).to_s, rel)
        }
    end
end

end

END

Hi,

At Sat, 19 May 2007 04:03:13 +0900,
Daniel M. wrote in [ruby-talk:252151]:

I’ve converted all the test examples in that section of RFC 3986 into
test code; see below.

Thank you and sorry to be late.

And even with Nakada-san’s patch, there’s still a problem or two with
our resolver relative to the RFC 3986 test cases. However, with that
patch it passes everything labeled in the RFC as “Normal Cases”
(section 5.4.1). It still has issues with the abnormal cases of
section 5.4.2

URI.parse(“ftp://example.com/pub”).path returns “/pub” in 1.8
while “pub” in 1.9. Is 1.9 correct?

Index: lib/uri/generic.rb

— lib/uri/generic.rb (revision 12858)
+++ lib/uri/generic.rb (working copy)
@@ -617,10 +617,6 @@ module URI

 def merge_path(base, rel)
  •  # RFC2396, Section 5.2, 5)
    
  •  if rel[0] == ?/ #/
    
  •    # RFC2396, Section 5.2, 5)
    
  •    return rel
    
  •  else
    
  •  # RFC2396, Section 5.2, 5)
       # RFC2396, Section 5.2, 6)
       base_path = split_path(base)
    

@@ -632,8 +628,8 @@ module URI
base_path.slice!(i - 1, 2)
end

  •    if base_path.empty?
    
  •      base_path = [''] # keep '/' for root directory
    
  •    else
    
  • base_path.pop
  •  if (first = rel_path.first) and first.empty?
    
  •    base_path.clear
    
  •    rel_path.shift
       end
    

@@ -654,10 +650,15 @@ module URI
end

  •    add_trailer_slash = true
    
  •  add_trailer_slash = !tmp.empty?
    
  •  if base_path.empty?
    
  •    base_path = [''] # keep '/' for root directory
    
  •  elsif add_trailer_slash
    
  •    base_path.pop
    
  •  end
       while x = tmp.shift
    
  •      if x == '..' && base_path.size > 1
    
  •    if x == '..'
           # RFC2396, Section 4
           # a .. or . in an absolute path has no special meaning
    
  •        base_path.pop
    
  •      base_path.pop if base_path.size > 1
         else
           # if x == '..'
    

@@ -676,5 +677,4 @@ module URI
return base_path.join(‘/’)
end

  • end
    private :merge_path

Index: test/uri/test_generic.rb

— test/uri/test_generic.rb (revision 12858)
+++ test/uri/test_generic.rb (working copy)
@@ -1,9 +1,7 @@
require ‘test/unit’
require ‘uri’
+require ‘enumerator’

-module URI

-class TestGeneric < Test::Unit::TestCase
+class URI::TestGeneric < Test::Unit::TestCase
def setup
@url = ‘http://a/b/c/d;p?q
@@ -297,9 +295,9 @@ class TestGeneric < Test::Unit::TestCase

http://a/b/c/d;p?q

-# ?y = http://a/b/c/?y
+# ?y = http://a/b/c/d;p?y
url = @base_url.merge(‘?y’)
assert_kind_of(URI::HTTP, url)

  • assert_equal(‘http://a/b/c/d;p?y’, url.to_s)
  • url = @base_url.route_to(‘http://a/b/c/d;p?y’)
    assert_kind_of(URI::Generic, url)
    assert_equal(‘?y’, url.to_s)
    @@ -453,8 +451,8 @@ class TestGeneric < Test::Unit::TestCase

http://a/b/c/d;p?q

-# /./g = http://a/./g
+# /./g = http://a/g
url = @base_url.merge(‘/./g’)
assert_kind_of(URI::HTTP, url)

  • assert_equal(‘http://a/g’, url.to_s)
    url = @base_url.route_to(‘http://a/./g’)
    assert_kind_of(URI::Generic, url)
    @@ -465,5 +463,5 @@ class TestGeneric < Test::Unit::TestCase
    url = @base_url.merge(‘/…/g’)
    assert_kind_of(URI::HTTP, url)
  • assert_equal(‘http://a/g’, url.to_s)
    url = @base_url.route_to(‘http://a/../g’)
    assert_kind_of(URI::Generic, url)
    @@ -507,8 +505,8 @@ class TestGeneric < Test::Unit::TestCase

http://a/b/c/d;p?q

-# …/…/…/g = http://a/../g
+# …/…/…/g = http://a/g
url = @base_url.merge(‘…/…/…/g’)
assert_kind_of(URI::HTTP, url)

  • assert_equal(‘http://a/g’, url.to_s)
    url = @base_url.route_to(‘http://a/../g’)
    assert_kind_of(URI::Generic, url)
    @@ -520,5 +518,5 @@ class TestGeneric < Test::Unit::TestCase
    url = @base_url.merge(‘…/…/…/…/g’)
    assert_kind_of(URI::HTTP, url)

Hi Nakada,

I am a beginner of Ruby, I also meet this bug which you fixed.
Now my question is how I can get the file you patched?
I have visited the http://svn.ruby-lang.org/repos/ruby/trunk/lib/uri/
But I found the file is not patched.

So could you help me?

Many Thanks.

Nan

2007/7/31, Nobuyoshi N. [email protected]:

Hi,

At Tue, 31 Jul 2007 12:21:33 +0900,
nan wu wrote in [ruby-talk:262574]:

Now my question is how I can get the file you patched?
I have visited the http://svn.ruby-lang.org/repos/ruby/trunk/lib/uri/
But I found the file is not patched.

You can patch with “patch” command, or:
http://www.rubyist.net/~nobu/ruby/uri/generic.rb

Hi

Thanks a lot, it works well. ^O^

Nan

2007/7/31, Nobuyoshi N. [email protected]:

e$B$J$+$@$G$9!#e(B

At Tue, 31 Jul 2007 04:45:12 +0900,
Nobuyoshi N. wrote in [ruby-talk:262512]:

At Sat, 19 May 2007 04:03:13 +0900,
Daniel M. wrote in [ruby-talk:252151]:

I’ve converted all the test examples in that section of RFC 3986 into
test code; see below.

e$B8=:_$Ne(Buri.rbe$B$,e(BRFC 2396e$B%Y!<%9$Ge(BRFC
3986e$B$K=`5r$7$F$$$J$$$H$$$&7oe(B
e$B$G$9$,!"$I$&$7$^$7$g$&$+!#e(B

And even with Nakada-san’s patch, there’s still a problem or two with
our resolver relative to the RFC 3986 test cases. However, with that
patch it passes everything labeled in the RFC as “Normal Cases”
(section 5.4.1). It still has issues with the abnormal cases of
section 5.4.2

URI.parse(“ftp://example.com/pub”).path returns “/pub” in 1.8
while “pub” in 1.9. Is 1.9 correct?

e$B$"$H$3$N7o$b!#e(B

Nobuyoshi N. e$B$5$s$O=q$-$^$7$?e(B:

e$B8=:_$Ne(Buri.rbe$B$,e(BRFC 2396e$B%Y!<%9$Ge(BRFC 3986e$B$K=`5r$7$F$$$J$$$H$$$&7oe(B
e$B$G$9$,!"$I$&$7$^$7$g$&$+!#e(B

e$B$3$N=$@5$re(Br12860e$B$G$7$F$$$?$@$$$F$$$k$H;W$$$^$9!#e(B
ruby_1_8e$B$K$b<h$j9~$s$G$b$i$C$?$[$&$,$h$$$+$J$H;W$&$N$G$9$1$Ie(B
knue$B$5$s!"$I$s$J$b$N$G$7$g$&e(B?

At Thu, 25 Oct 2007 10:00:10 +0900,
akira yamada / やまだあきら wrote:

Nobuyoshi N. さんは書きました:

現在のuri.rbがRFC 2396ベースでRFC 3986ã«æº–æ‹ ã—ã¦ã„ãªã„ã¨ã„ã†ä»¶
ですが、どうしましょうか。

この修正をr12860でしていただいていると思います。
ruby_1_8にも取り込んでもらったほうがよいかなと思うのですけど
knuさん、どんなものでしょう?

 非互換性はあるものの、Firefox 2ã‚„IE 7ã‚‚RFC 3986ã«æº–æ‹ ã—ã¦ã„ã‚‹
ようなので追随しましょう。

 NEWS の Library updates にRFC 3986æº–æ‹ ã®é …ã‚’ã€ Compatibility
issues ã«ä»£è¡¨çš„ãªæŒ™å‹•å¤‰åŒ–ã®ä¾‹ã‚’è¨˜ã—ãŸé …ã‚’è¿½åŠ ã—ã¦ãã ã•ã„ã€‚


/
/__ __ Akinori.org / MUSHA.org
/ ) ) ) ) / FreeBSD.org / Ruby-lang.org
Akinori MUSHA aka / (_ / ( (__( @ iDaemons.org / and.or.jp

“Different eyes see different things,
Different hearts beat on different strings –
But there are times for you and me when all such things agree”

Hi,

In message “Re: URI::merge bug?”
on Tue, 31 Jul 2007 04:45:12 +0900, Nobuyoshi N.
[email protected] writes:

|Index: lib/uri/generic.rb
|===================================================================
|— lib/uri/generic.rb (revision 12858)
|+++ lib/uri/generic.rb (working copy)

Could you commit?

          matz.

RFC 3986 e$B$X$N99?7$K$OBgJQ;?@.$G$9!#D9$$4V$N:n6H$G$7$?$N$G!“e(B
e$B:Y$+$$JQ99$O3P$($F$$$^$;$s$,!”%I%a%$%sL>ItJ,$K$be(B %HH e$B$,e(B
e$B;H$($k$H!"AjBPE*$Je(B URI e$B$N@dBP2=$N>:Y$N$H$3$m$@$H;W$$$^$9!#e(B
e$B%F%9%H$K$O@'Hse(B URI Reference Resolution Examples Testing
e$B$r;H$C$F$/$@$5$$!#e(B

e$B9q:]2=$N$H$3$m$,$b$&>/$7?J$s$@$i@'Hse(B RFC 3987 (IRIs) e$B$K$be(B
e$BD)@o$7$F$$$?$@$-$?$$$H;W$$$^$9!#2?$+$4<ALd!“5?Ld$,$”$j$^$7$?$ie(B
e$B1sN8$J$/$*J9$-$/$@$5$$!#e(B

e$B59$7$/$*4j$$$7$^$9!#e(B Martin.

At 12:59 07/10/25, Akinori MUSHA wrote:

Akinori MUSHA aka / (_ / ( (__( @ iDaemons.org / and.or.jp

“Different eyes see different things,
Different hearts beat on different strings –
But there are times for you and me when all such things agree”

#-#-# Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-# http://www.sw.it.aoyama.ac.jp mailto:[email protected]