Forum: Ruby-core URI.(un)escape deprecated?

Marc-Andre Lafortune (Guest)
on 2010-04-07 08:45
(Received via mailing list)
Hi.

Can someone point me to the rationale behind deprecating URI.escape
and URI.unescape?

What are we supposed to use instead?

Thanks,

Marc-André
Roger Pack (Guest)
on 2010-04-08 22:53
(Received via mailing list)
> Can someone point me to the rationale behind deprecating URI.escape
> and URI.unescape?
>
> What are we supposed to use instead?

CGI.escape maybe?
Marc-Andre Lafortune (Guest)
on 2010-04-08 23:13
(Received via mailing list)
Hi,

On Thu, Apr 8, 2010 at 4:52 PM, Roger Pack <rogerdpack2@gmail.com>
wrote:
> CGI.escape maybe?

They might be close, but they are not the same

$ ruby -r cgi -r uri -e 'p URI.escape("the same?") == CGI.escape("the same?")'
false
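
A minimal sketch of why the comparison above prints false. It assumes a current Ruby, where URI.escape is no longer available (it was removed in Ruby 3.0), so its result is only described in a comment. The variable names are illustrative.

```ruby
require "cgi"
require "erb"

input = "the same?"

# CGI.escape targets form data: the space becomes "+" and "?" is percent-encoded.
cgi_escaped = CGI.escape(input)

# ERB::Util.url_encode percent-encodes the space as "%20" instead.
erb_escaped = ERB::Util.url_encode(input)

# URI.escape left reserved characters such as "?" alone and only turned the
# space into "%20", so its result matched neither of the two strings above.
puts cgi_escaped
puts erb_escaped
```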

So, Yui, could you please tell us what motivated this change and what
we are supposed to use instead? No reference is given in r24773

Thanks,

Marc-André
Marcus Rueckert (Guest)
on 2010-04-08 23:14
(Received via mailing list)
On 2010-04-09 05:52:42 +0900, Roger Pack wrote:
> > Can someone point me to the rationale behind deprecating URI.escape
> > and URI.unescape?
> >
> > What are we supposed to use instead?
>
> CGI.escape maybe?

I only see CGI.escapeHTML ... I doubt that's what you want.

    darix
Tanaka Akira (Guest)
on 2010-04-08 23:23
(Received via mailing list)
2010/4/7 Marc-Andre Lafortune <ruby-core-mailing-list@marc-andre.ca>:
>
> Can someone point me to the rationale behind deprecating URI.escape
> and URI.unescape?

I think their concept is just wrong.
Austin Ziegler (austin)
on 2010-04-08 23:47
(Received via mailing list)
On Thu, Apr 8, 2010 at 5:22 PM, Tanaka Akira <akr@fsij.org> wrote:
> 2010/4/7 Marc-Andre Lafortune <ruby-core-mailing-list@marc-andre.ca>:
>> Can someone point me to the rationale behind deprecating URI.escape
>> and URI.unescape?
> I think their concept is just wrong.

The concepts may be wrong, but there is a difference between URI
escaping and CGI/HTML escaping. I think that some similar
functionality is still needed.

-austin
NARUSE, Yui (Guest)
on 2010-04-09 03:50
(Received via mailing list)
> So, Yui, could you please tell us what motivated this change and what
> we are supposed to use instead? No reference is given in r24773

This was introduced by the thread starting at [ruby-dev:38005]. (Sorry for
the missing reference.)
http://blade.nagaokaut.ac.jp/cgi-bin/vframe.rb/rub...

In that thread we considered what the use case of URI.encode is,
and we concluded that there is none.

That is because URI.encode escapes every character other than the
reserved and unreserved ones:
 mark          = "-" | "_" | "." | "!" | "~" | "*" | "'" |
                 "(" | ")"
 unreserved    = alphanum | mark
 reserved      = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
                 "$" | "," | "[" | "]"
This means URI.encode only escapes characters which shouldn't appear in a URI at all.
But then what is the expected argument?

The RDoc of URI.encode shows the following example:
   enc_uri = URI.escape("http://example.com/?a=\11\15")
   p enc_uri
   # => "http://example.com/?a=%09%0D"

But real code would look more like this:
   enc_uri = URI.escape("http://example.com/?a=#{argument}")
   p enc_uri
   # => "http://example.com/?a=%09%0D"

As you can see, this code has a vulnerability: argument can be
"foo&extra=some_command", and the "&" and "=" pass through unescaped.

So we concluded that URI.encode is wrong and deprecated it.
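
A minimal sketch of that injection, assuming the value is placed into the query by string interpolation. The variable names are illustrative, and URI.encode_www_form_component stands in as the component-level escaper.

```ruby
require "uri"

argument = "foo&extra=some_command"

# Interpolating the raw value lets its "&" and "=" act as query delimiters.
unsafe_url = "http://example.com/?a=#{argument}"

# Escaping the value as a single component keeps it inside the "a" parameter.
safe_url = "http://example.com/?a=#{URI.encode_www_form_component(argument)}"

unsafe_params = URI.decode_www_form(URI(unsafe_url).query)
safe_params   = URI.decode_www_form(URI(safe_url).query)

p unsafe_params  # two parameters: "a" and an injected "extra"
p safe_params    # one parameter: "a" holding the whole string
```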

> What are we supposed to use instead?

The following are candidates:
* CGI.escape
* URI.www_form_encode
* URI.www_form_encode_component

If they are not what you want, please tell me.
Tanaka Akira (Guest)
on 2010-04-09 04:10
(Received via mailing list)
2010/4/9 Austin Ziegler <halostatue@gmail.com>:

> The concepts may be wrong, but there is a difference between URI
> escaping and CGI/HTML escaping. I think that some similar
> functionality is still needed.

As far as I know, the following methods have been available since Ruby 1.8:

* CGI.escape
* ERB::Util.url_encode
* WEBrick::HTTPUtils.escape
* WEBrick::HTTPUtils.escape_form
Tanaka Akira (Guest)
on 2010-04-09 04:31
(Received via mailing list)
2010/4/9 NARUSE, Yui <naruse@airemix.jp>:

> * URI.www_form_encode_component

It should be URI.encode_www_form_component.

% ./ruby -rcgi -ruri -rwebrick/httputils -rerb -e '
table = []
(0..255).each {|c|
  s = [c].pack("C")
  e = [
    ERB::Util.url_encode(s),
    CGI.escape(s),
    URI.encode_www_form_component(s),
    WEBrick::HTTPUtils.escape_form(s),
    WEBrick::HTTPUtils.escape(s),
    URI.escape(s),
  ]
  next if e.uniq.length == 1
  p e
  table << e
}
'

["%20", "+",   "+",   "+",   "%20", "%20"]
["%21", "%21", "%21", "!",   "!",   "!"]
["%24", "%24", "%24", "%24", "$",   "$"]
["%26", "%26", "%26", "%26", "&",   "&"]
["%27", "%27", "%27", "'",   "'",   "'"]
["%28", "%28", "%28", "(",   "(",   "("]
["%29", "%29", "%29", ")",   ")",   ")"]
["%2A", "%2A", "*",   "*",   "*",   "*"]
["%2B", "%2B", "%2B", "%2B", "+",   "+"]
["%2C", "%2C", "%2C", "%2C", ",",   ","]
["%2F", "%2F", "%2F", "%2F", "/",   "/"]
["%3A", "%3A", "%3A", "%3A", ":",   ":"]
["%3B", "%3B", "%3B", "%3B", ";",   ";"]
["%3D", "%3D", "%3D", "%3D", "=",   "="]
["%3F", "%3F", "%3F", "%3F", "?",   "?"]
["%40", "%40", "%40", "%40", "@",   "@"]
["%5B", "%5B", "%5B", "%5B", "%5B", "["]
["%5D", "%5D", "%5D", "%5D", "%5D", "]"]
["%7E", "%7E", "%7E", "~",   "~",   "~"]
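
A reduced sketch of the same comparison for a few probe characters. It assumes a current Ruby and therefore leaves out URI.escape and the WEBrick helpers, which may not be available there. The probe list is illustrative.

```ruby
require "cgi"
require "erb"
require "uri"

probes = [" ", "*", "&", "/"]

# One row per character, in the order:
# ERB::Util.url_encode, CGI.escape, URI.encode_www_form_component
rows = probes.map do |ch|
  [ERB::Util.url_encode(ch), CGI.escape(ch), URI.encode_www_form_component(ch)]
end

rows.each { |row| p row }
```

For these four characters the rows agree with the first three columns of the table above: only the space and "*" are treated differently by the three methods.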
Austin Ziegler (austin)
on 2010-04-09 06:59
(Received via mailing list)
On Thu, Apr 8, 2010 at 10:31 PM, Tanaka Akira <akr@fsij.org> wrote:
>> * URI.www_form_encode_component
> It should be URI.encode_www_form_component.

That's a little long, IMO, but your test results indicate that it is
the right choice. How new is this method?

I'd much rather see URI.escape/URI.unescape be replaced with methods
that work, backwards compatibility with broken behaviour be damned.

-austin
NARUSE, Yui (Guest)
on 2010-04-09 09:09
(Received via mailing list)
2010/4/9 Austin Ziegler <halostatue@gmail.com>:
> On Thu, Apr 8, 2010 at 10:31 PM, Tanaka Akira <akr@fsij.org> wrote:
>>> * URI.www_form_encode_component
>> It should be URI.encode_www_form_component.
>
> That's a little long, IMO, but your test results indicate that it is
> the right choice. How new is this method?

Yeah, but there are several escape encodings for a URI and its parts,
so an explicit name is needed in the URI context.

As akr showed in [ruby-core:29373], the following are for the same use case:
escaping the data of web forms for a query.
* CGI.escape(s)
* URI.encode_www_form_component(s)
* WEBrick::HTTPUtils.escape_form(s)

They differ in some details; the significant difference from the other
methods is that they escape " " as "+".
* CGI.escape escapes many symbols
* WEBrick::HTTPUtils.escape_form keeps symbols
* URI.encode_www_form_component follows HTML5's description

> I'd much rather see URI.escape/URI.unescape be replaced with methods
> that work, backwards compatibility with broken behaviour be damned.

We don't have any consensus about what behavior is correct
for the name URI.escape.
Marc-Andre Lafortune (Guest)
on 2010-04-09 23:54
(Received via mailing list)
Hi,

Imagine I want to display a url to the user in a human readable format
(say like the status bar of the browser when a user hovers over a
link).

Ruby should have a nice "unescape" for that, and the reverse "escape"
too.

These methods should convert
"http://example.com/?name=Marc%20Andr%C3%A9+42" to
"http://example.com/?name=Marc André+42" and back (exactly what Chrome
does)

As the code below shows, the only correct escaping is URI.escape.
The only correct unescaping is WEBrick::HTTPUtils.unescape and
URI.unescape. Note that WEBrick currently raises an error when
escaping...

So unless there is another sensible way to go from URL to human
readable form and back, I recommend not deprecating URI.{un}escape

rubydev -rcgi -ruri -rwebrick/httputils -rerb -e '
# encoding: utf-8
s="http://example.com/?name=Marc André+42"
puts [ "escape:",
ERB::Util.url_encode(s),
CGI.escape(s),
URI.encode_www_form_component(s),
(WEBrick::HTTPUtils.escape_form(s) rescue "fails"),
(WEBrick::HTTPUtils.escape(s) rescue "fails"),
URI.escape(s),
]

s="http://example.com/?name=Marc%20Andr%C3%A9+42"
puts [ "unescape:",
# no equivalent in ERB??,
CGI.unescape(s),
URI.decode_www_form_component(s),
WEBrick::HTTPUtils.unescape_form(s),
WEBrick::HTTPUtils.unescape(s),
URI.unescape(s),
]
'
NARUSE, Yui (Guest)
on 2010-04-10 10:31
(Received via mailing list)
(2010/04/10 6:54), Marc-Andre Lafortune wrote:
> Imagine I want to display a url to the user in a human readable format
> (say like the status bar of the browser when a user hovers over a
> link).

If that is the use case of URI.unescape, its algorithm should follow what
web browsers do, shouldn't it? Try making a link to the following URL and
hovering over it; you'll find that some of the characters aren't decoded.
http://www.google.com/search?hl=en&q=%2F%3F%23%25%2B%26
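
A minimal sketch of why decoding everything is lossy, using the query string of the URL above. The variable names are illustrative, and URI.decode_www_form_component stands in for a decode-everything routine.

```ruby
require "uri"

query = "hl=en&q=%2F%3F%23%25%2B%26"

# Structured decoding: split on "&" and "=" first, then decode each part.
pairs = URI.decode_www_form(query)

# Decoding the whole string at once mixes the decoded data with the
# delimiters, so the result can no longer be split back unambiguously.
flattened = URI.decode_www_form_component(query)

p pairs
p flattened
```

After the flat decoding, the "&" that was part of the search term cannot be told apart from the "&" that separated the two parameters.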

Decoding a URI is a difficult problem, and URI.unescape alone is not
enough.
The HTML5/IRI-bis people seem to be discussing related issues, for example in
http://lists.w3.org/Archives/Public/public-iri/201...
But if we followed them, it would break compatibility.

So deprecating URI.{,un}escape is needed.

see also:
https://bugzilla.mozilla.org/show_bug.cgi?id=105909
https://bugzilla.mozilla.org/show_bug.cgi?id=497476

> escaping...
If the use case of URI.unescape is to show a human readable format,
why is escaping needed?
Marc-Andre Lafortune (Guest)
on 2010-04-10 17:03
(Received via mailing list)
Hi,

On Sat, Apr 10, 2010 at 4:30 AM, NARUSE, Yui <naruse@airemix.jp> wrote:
> If the use case of URI.unescape is to show a human readable format,
> why escaping is needed?

To do the reverse: to allow a user to enter a human readable format
for a URL and convert it to the format the internet requires.

In most if not all browsers, if I type in
"http://example.com/?name=Marc André+42", this string will be
URI.escaped and used for http request.

So any application that requests a URL from its user needs URI.escape.

> If it is the use case of URI.unescape, the algorithm of it should follow
> web browsers, doesn't it? Try make a link and mouse over following url;
> You'll find some of them aren't decoded.
> http://www.google.com/search?hl=en&q=%2F%3F%23%25%2B%26

Indeed, Safari and Chrome do not. It can definitely be argued that
Firefox is mistaken, because the shown string doesn't round trip. If
you type it in the URL bar, you will not get to the same URL.

> Decoding URI is difficult problem and at least URI.unescape is not enough.
> HTML5/IRI-bis people seem discussing about related this, like
> http://lists.w3.org/Archives/Public/public-iri/201...
> But if we follow, it must cause breaking compatibility.

I will not argue that URI.{un}escape could be improved/modified.

I will argue that we must find a better solution than deprecating a
useful method before any replacement is available.

We don't even know what the future api will be, right?

What do you think of the following:
URI.{un}escape are renamed URI.{un}escape_chars (or percent_{de/en}code)
URI.{un}escape become aliases
URI.{un}escape_uri will become available as soon as possible, with the
desired treatment (will it translate "http://www.xn--e5h.com/" <=>
"http://www.♀.com/" ?)

This keeps full compatibility, makes the distinction clearer,
continues to provide a useful feature, and gives us time to spec and
implement the new feature.

---
Marc-André
Tanaka Akira (Guest)
on 2010-04-10 17:47
(Received via mailing list)
2010/4/11 Marc-Andre Lafortune <ruby-core-mailing-list@marc-andre.ca>:
>
> In most if not all browsers, if I type in
> "http://example.com/?name=Marc André+42", this string will be
> URI.escaped and used for http request.
>
> So any application that requests a URL from its user needs URI.escape.

% ruby -ruri -e 'p URI.escape("http://example.com/?name=Marc André+42")'
"http://example.com/?name=Marc%20Andr%C3%A9+42"

Assuming that the web application interprets the query string according to
application/x-www-form-urlencoded, the value is "Marc André 42".
The plus symbol is replaced by a space.

Is that the desired behavior?
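
A minimal sketch of that interpretation, with URI.decode_www_form standing in for the web application's form parser. The variable names are illustrative, and "\u00E9" is the letter é.

```ruby
require "uri"

# The query part of the escaped URL shown above.
query = "name=Marc%20Andr%C3%A9+42"

# A form parser turns the literal "+" into a space.
received = URI.decode_www_form(query)

# Encoding the value as form data escapes the "+" as "%2B",
# so it survives the round trip.
encoded    = URI.encode_www_form("name" => "Marc Andr\u00E9+42")
round_trip = URI.decode_www_form(encoded)

p received
p encoded
p round_trip
```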

% ruby -ruri -e 'p URI.escape("http://example.com/#foo")'
"http://example.com/%23foo"

I think the user intends #foo to be a fragment, but
URI.escape turns it into part of a path segment.

% ruby -ruri -e 'p URI.escape("http://example.com/%7Eakr/")'
"http://example.com/%257Eakr/"

I think the user means http://example.com/~akr/, but
URI.escape turns it into a different resource.

% ruby -ruri -e 'p URI.unescape("http://example.com/%23foo")'
"http://example.com/#foo"

The actual URI contains the path segment %23foo, but
URI.unescape turns it into a fragment.

URI.escape and URI.unescape are not what you want.
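
A minimal sketch of why these cases matter, using the URI parser to show that "#" and "%23" name different things. URI.escape is not called here since it is unavailable in current Ruby; the variable names are illustrative, and URI.encode_www_form_component is used only to show the double escaping.

```ruby
require "uri"

with_fragment = URI("http://example.com/#foo")
with_segment  = URI("http://example.com/%23foo")

# "#foo" is a fragment and is not part of the path.
fragment_parts = [with_fragment.path, with_fragment.fragment]

# "%23foo" is data inside the path and there is no fragment.
segment_parts = [with_segment.path, with_segment.fragment]

# Escaping text that is already escaped encodes the "%" a second time.
double_escaped = URI.encode_www_form_component("%7Eakr")

p fragment_parts
p segment_parts
p double_escaped
```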

> I will argue that we must find a better solution that deprecating a
> useful method before any replacement is available.

The methods are not useful for your purpose.

It may be good to have such methods in URI.
But URI.escape and URI.unescape are too ambiguous as names for that.

People tend to think URI.escape can be used for building URI components,
which is different from its purpose.
Try a code search:
http://www.google.com/codesearch?q=URI.escape+lang%3Aruby
Marc-Andre Lafortune (Guest)
on 2010-04-10 19:15
(Received via mailing list)
Hi,

On Sat, Apr 10, 2010 at 11:47 AM, Tanaka Akira <akr@fsij.org> wrote:
> The methods are not useful for your purpose.

Given your examples, I agree.

> It may be good to have such methods in URI.
> But URI.escape and URI.unescape is too ambiguos name for that.

It would probably be wise to revise the current encoding options and
see what is missing.

Thanks to you and Yui, for answering my questions.