Forum: Ruby on Rails javascript encodeURIComponent equal code

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
Nanyang Z. (Guest)
on 2009-03-31 17:06
JavaScript's encodeURIComponent works differently from CGI.escape or
ERB::Util.u.
For example:
encodeURIComponent('中文') = '%D6%D0%CE%C4'
but
>> CGI.escape("中文")
=> "%E4%B8%AD%E6%96%87"
>> ERB::Util.u("中文")
=> "%E4%B8%AD%E6%96%87"

Is there any way to get the same encoded result with Ruby code?
Frederick C. (Guest)
on 2009-03-31 17:41
(Received via mailing list)
On Mar 31, 2:06 pm, Nanyang Z. <removed_email_address@domain.invalid>
wrote:
> JavaScript's encodeURIComponent works differently from CGI.escape or
> ERB::Util.u.

Well, the difference is that the JavaScript output is produced as UTF-16
and the Ruby output as UTF-8 (although the documentation I can find
suggests that the JavaScript should also be producing UTF-8).

> for example:
> encodeURIComponent('中文') = '%D6%D0%CE%C4'
> but
> >> CGI.escape("中文")
> => "%E4%B8%AD%E6%96%87"
> >> ERB::Util.u("中文")
> => "%E4%B8%AD%E6%96%87"
>
> Is there any way to get the same encoded result with ruby code?

There are various libraries for messing around with string encodings,
including iconv; pack/unpack have some specifiers that are relevant for
Unicode, and Rails itself also has various Unicode utilities in it.

Fred
Nanyang Z. (Guest)
on 2009-03-31 19:27
Frederick C. wrote:
> Well the difference is that the javascript stuff is produced UTF16 and
> the ruby UTF8 (although the documentation I can find suggests that the
> javascript should also be producing utf8).

Thank you for your reply. Maybe that is true. But how can the UTF-16
encodeURIComponent result be the shorter one?

> There are various libraries for messing around with string encodings,
> including iconv, and pack/unpack have some specifiers that are
> relevant for unicode stuff, and rails itself also has various unicode
> utilities in it.

I tried converting the string to UTF-16 before passing it to
CGI.escape(), but I had no luck producing the same result as
encodeURIComponent did. (I got "%FE%FFN-e%87" from "中文".)
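
A short sketch of where "%FE%FFN-e%87" comes from, assuming a modern Ruby where String#encode stands in for iconv (the variable names are only for illustration). Encoding to "UTF-16" prepends a byte order mark (FE FF), and three of the remaining bytes happen to be the ASCII characters "N", "-" and "e", which CGI.escape leaves untouched:

```ruby
require "cgi"

# "中文" as UTF-16 with a byte order mark, then viewed as raw bytes.
utf16 = "中文".encode("UTF-16").force_encoding("BINARY")

# Hex dump of the bytes: FE FF is the byte order mark,
# 4E 2D is the first character and 65 87 the second.
utf16_bytes = utf16.bytes.map { |b| format("%02X", b) }

# CGI.escape only percent-encodes bytes outside its safe set, so
# 0x4E ("N"), 0x2D ("-") and 0x65 ("e") pass through unescaped.
escaped = CGI.escape(utf16)

puts utf16_bytes.join(" ")
puts escaped
```

So the odd-looking result is the UTF-16 bytes after all, just partly displayed as literal ASCII characters.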


I found a Perl and a Python way to do encodeURIComponent on the net;
they are here:
http://d.hatena.ne.jp/ruby-U/20081110/1226313786

Unfortunately I know neither Perl nor Python. Can anyone work out
the equivalent Ruby code for me from them?
Frederick C. (Guest)
on 2009-03-31 19:35
(Received via mailing list)
On Mar 31, 4:27 pm, Nanyang Z. <removed_email_address@domain.invalid>
wrote:
> Frederick C. wrote:
> > Well the difference is that the javascript stuff is produced UTF16 and
> > the ruby UTF8 (although the documentation I can find suggests that the
> > javascript should also be producing utf8).
>
> Thank you for your reply. Maybe that is true. But how can the UTF-16
> encodeURIComponent result be the shorter one?

Because for characters like these, UTF-16 is shorter than UTF-8: two
bytes per character instead of three.
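
To make the size comparison concrete, here is a small sketch (assuming Ruby 1.9 or later for String#encode; the variable names are illustrative) that counts the bytes of "中文" in both encodings:

```ruby
text = "中文"

# Each of these characters takes three bytes in UTF-8 ...
utf8_size = text.encode("UTF-8").bytesize

# ... and two bytes in UTF-16 (big-endian, no byte order mark).
utf16_size = text.encode("UTF-16BE").bytesize

puts "UTF-8: #{utf8_size} bytes, UTF-16: #{utf16_size} bytes"
```

A four-byte result such as '%D6%D0%CE%C4' is therefore the right length for UTF-16, but the length alone does not prove which encoding produced it.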

> I find a perl and a python way to do encodeURIComponent on the net, and
> their are here:
> http://d.hatena.ne.jp/ruby-U/20081110/1226313786
>
> It is a pity that I don't know perl nor python. Can anyone figure out
> the ruby code for me from them?
>
Those don't do anything with encodings, which is apparently the issue
here. Why does it matter anyway?

Fred
Nanyang Z. (Guest)
on 2009-03-31 19:44
Frederick C. wrote:

> Those aren't playing with encodings which is apparently the issue
> here. Why does it matter anyway?

OK.


Here is the source code of the ERB::Util.url_encode(s) method.
# File erb.rb, line 801
    def url_encode(s)
      s.to_s.gsub(/[^a-zA-Z0-9_\-.]/n){ sprintf("%%%02X", $&.unpack("C")[0]) }
    end


Now it works like this:
>> ERB::Util.url_encode("中文")
=> "%E4%B8%AD%E6%96%87"

Can you help me change the url_encode code a bit so that it returns the
UTF-16 result? ('%D6%D0%CE%C4' is the one I want.)
Frederick C. (Guest)
on 2009-03-31 19:54
(Received via mailing list)
On Mar 31, 4:44 pm, Nanyang Z. <removed_email_address@domain.invalid>
wrote:
> $&.unpack("C")[0]) }
>     end
>
> now it works like this:
>
> > ERB::Util.url_encode("中文")
>
> > => "%E4%B8%AD%E6%96%87"
>
> Can you help me change the url_encode code a bit so that it returns the
> UTF-16 result? ('%D6%D0%CE%C4' is the one I want.)

Well, s.unpack("U*") will turn a string into an array of integers
(Unicode code points), which should then be easy to split into bytes.
I'd start from scratch rather than using url_encode though.
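
A rough sketch of that idea, assuming UTF-8 input and only characters in the Basic Multilingual Plane (the helper name percent_encode_utf16 is made up for illustration): unpack the code points, split each into a high and a low byte, and percent-encode every byte.

```ruby
# Hypothetical helper: percent-encode the UTF-16 (big-endian) bytes of a string.
def percent_encode_utf16(s)
  bytes = s.unpack("U*").flat_map do |code_point|
    # Code points above 0xFFFF need surrogate pairs, which this sketch skips.
    raise ArgumentError, "code point outside the BMP" if code_point > 0xFFFF
    [code_point >> 8, code_point & 0xFF] # high byte, then low byte
  end
  bytes.map { |byte| format("%%%02X", byte) }.join
end

puts percent_encode_utf16("中文")
```

Unlike url_encode, this encodes every byte, including ones that happen to be ASCII letters. For "中文" it gives "%4E%2D%65%87", which is not the '%D6%D0%CE%C4' being asked for.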


Fred
Nanyang Z. (Guest)
on 2009-03-31 20:04
Frederick C. wrote:
> Well, s.unpack("U*") will turn a string into an array of integers
> (Unicode code points), which should then be easy to split into bytes.
> I'd start from scratch rather than using url_encode though.

Thanks! Fred.

>> "中文".unpack("C*")
=> [228, 184, 173, 230, 150, 135]
>> ERB::Util.url_encode("中文")
=> "%E4%B8%AD%E6%96%87"

For the first time, I have some idea of what url_encode is doing.

And:
>> "中文".unpack("U*")
=> [20013, 25991]

So what I need is a way to turn [20013, 25991] into '%D6%D0%CE%C4', right?
Frederick C. (Guest)
on 2009-03-31 20:45
(Received via mailing list)
On Mar 31, 5:04 pm, Nanyang Z. <removed_email_address@domain.invalid>
wrote:
>
> And:
> >> "中文".unpack("U*")
> => [20013, 25991]
>
> So what I need is a way to turn [20013, 25991] into '%D6%D0%CE%C4', right?
>
Well, 20013 is 0x4E2D, which is the UTF-16 code unit for the first of
your characters. Looking back at what you wrote, I'd no idea where D6D0
is coming from: that's a completely different character according to the
Unicode character palette I have. Not sure what your JavaScript has
been doing.

Fred
Nanyang Z. (Guest)
on 2009-03-31 20:56
Frederick C. wrote:
> I'd no idea where D6D0 is
> coming from

OK, problem solved. Thank you, Fred. I might never have got it done
without your help.

It turns out %D6%D0%CE%C4 is not a UTF-16 result at all, but the GB2312
encoding of the string.

I converted the string from UTF-8 to GB2312 with iconv, and then
url_encode produces the string I need.

Thank you again.
Qi X. (Guest)
on 2009-05-20 10:58
Nanyang Z. wrote:
> Frederick C. wrote:
>> I'd no idea where D6D0 is
>> coming from
>
> OK, problem solved. Thank you, Fred. I might never have got it done
> without your help.
>
> It turns out %D6%D0%CE%C4 is not a UTF-16 result at all, but the GB2312
> encoding of the string.
>
> I converted the string from UTF-8 to GB2312 with iconv, and then
> url_encode produces the string I need.
>
> Thank you again.

Could you share the code you used to solve the problem?
Thanks a lot.
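
The original poster's code was never shared, but the approach described above (convert from UTF-8 to GB2312, then percent-encode the bytes) might look roughly like this. It is a sketch under assumptions: it uses String#encode from Ruby 1.9 or later in place of iconv, the helper name url_encode_gb2312 is invented for illustration, and the byte-escaping line is modelled on the erb.rb source quoted earlier in the thread.

```ruby
# Hypothetical helper: percent-encode a UTF-8 string as GB2312 bytes.
def url_encode_gb2312(s)
  # Transcode to GB2312, then treat the result as raw bytes (String#b).
  # Characters that GB2312 cannot represent raise
  # Encoding::UndefinedConversionError.
  gb_bytes = s.to_s.encode("GB2312").b

  # Same idea as ERB::Util.url_encode: escape every byte that is not
  # an ASCII letter, digit, underscore, hyphen or dot.
  gb_bytes.gsub(/[^a-zA-Z0-9_\-.]/n) { |m| format("%%%02X", m.unpack1("C")) }
end

puts url_encode_gb2312("中文")
```

For "中文" this should print %D6%D0%CE%C4, the value asked for at the top of the thread. Note that a standards-conforming encodeURIComponent percent-encodes UTF-8, so a helper like this is only useful when the receiving side really expects GB2312-encoded URLs.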