JavaScript encodeURIComponent equivalent code in Ruby


#1

JavaScript’s encodeURIComponent works differently from CGI.escape or
ERB::Util.u.
For example:
encodeURIComponent('中文') = '%D6%D0%CE%C4'
but

CGI.escape("中文")
=> "%E4%B8%AD%E6%96%87"

ERB::Util.u("中文")
=> "%E4%B8%AD%E6%96%87"

Is there any way to get the same encoded result with Ruby code?


#2

On Mar 31, 2:06 pm, Nanyang Z. wrote:

JavaScript’s encodeURIComponent works differently from CGI.escape or
ERB::Util.u.

Well, the difference is that the JavaScript output appears to be UTF-16
and the Ruby output UTF-8 (although the documentation I can find suggests
that JavaScript should also be producing UTF-8).

For example:
encodeURIComponent('中文') = '%D6%D0%CE%C4'
but

CGI.escape("中文")
=> "%E4%B8%AD%E6%96%87"

ERB::Util.u("中文")
=> "%E4%B8%AD%E6%96%87"

Is there any way to get the same encoded result with Ruby code?

There are various libraries for messing around with string encodings,
including iconv. pack/unpack have some specifiers that are relevant
for Unicode, and Rails itself also has various Unicode utilities
in it.

Fred


#3

Frederick C. wrote:

Well, the difference is that the JavaScript output appears to be UTF-16
and the Ruby output UTF-8 (although the documentation I can find suggests
that JavaScript should also be producing UTF-8).

Thank you for your reply. Maybe that is true. But how can the UTF-16
encodeURIComponent result be the shorter one?

There are various libraries for messing around with string encodings,
including iconv. pack/unpack have some specifiers that are relevant
for Unicode, and Rails itself also has various Unicode utilities
in it.

I tried converting the string to UTF-16 before passing it to
CGI.escape(), but I had no luck producing the same result as
encodeURIComponent. (I got "%FE%FFN-e%87" from "中文".)

I found a Perl and a Python way to do encodeURIComponent on the net;
they are here:
http://d.hatena.ne.jp/ruby-U/20081110/1226313786

Unfortunately I know neither Perl nor Python. Can anyone work out
the equivalent Ruby code from them?
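The "%FE%FFN-e%87" result is easier to read once the bytes are laid out: FE FF is a UTF-16 byte order mark, and the remaining bytes 4E 2D 65 87 are the UTF-16 code units for the two characters, of which 0x4E, 0x2D and 0x65 happen to be the ASCII characters "N", "-" and "e" and so are left unescaped. The following is only a sketch, assuming a modern Ruby with String#encode; `percent_encode_bytes` is a hypothetical helper written for illustration, not a library method.

```ruby
# Hypothetical helper (not a library API): percent-encode raw bytes,
# leaving unreserved ASCII characters (letters, digits, _ . - ~) as they are.
def percent_encode_bytes(str)
  str.b.gsub(/[^a-zA-Z0-9_.\-~]/n) { |byte| format("%%%02X", byte.ord) }
end

# Big-endian UTF-16 with an explicit byte order mark (FE FF) in front.
utf16_with_bom = "\xFE\xFF".b + "中文".encode("UTF-16BE").b

p utf16_with_bom.bytes.map { |b| format("%02X", b) }
# => ["FE", "FF", "4E", "2D", "65", "87"]

puts percent_encode_bytes(utf16_with_bom)
# prints %FE%FFN-e%87
```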


#4

On Mar 31, 4:27 pm, Nanyang Z. wrote:

Frederick C. wrote:

Well, the difference is that the JavaScript output appears to be UTF-16
and the Ruby output UTF-8 (although the documentation I can find suggests
that JavaScript should also be producing UTF-8).

Thank you for your reply. Maybe that is true. But how can the UTF-16
encodeURIComponent result be the shorter one?

Because for characters like these, UTF-16 is shorter than UTF-8: two
bytes per character rather than three.
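The size difference can be checked directly. This is a small sketch, assuming Ruby 1.9 or later so that String#encode is available; the variable names are only illustrative.

```ruby
text = "中文"

# Each of these characters takes three bytes in UTF-8 ...
utf8_bytes = text.encode("UTF-8").bytes
# ... but only two bytes (one 16-bit code unit) in UTF-16.
utf16_bytes = text.encode("UTF-16BE").bytes

puts utf8_bytes.length   # prints 6
puts utf16_bytes.length  # prints 4
```

Since every escaped byte becomes three characters (%XX), a fully percent-encoded UTF-8 form of this string is 18 characters long, against 12 for UTF-16.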

I found a Perl and a Python way to do encodeURIComponent on the net;
they are here: http://d.hatena.ne.jp/ruby-U/20081110/1226313786

Unfortunately I know neither Perl nor Python. Can anyone work out
the equivalent Ruby code from them?

Those aren’t playing with encodings, which is apparently the issue
here. Why does it matter anyway?

Fred


#5

On Mar 31, 4:44 pm, Nanyang Z. wrote:

$&.unpack("C")[0]) }
  end

Now it works like this:

ERB::Util.url_encode("中文")

=> "%E4%B8%AD%E6%96%87"

Can you help me change the url_encode code a bit, so that it returns
the UTF-16 result? ('%D6%D0%CE%C4' is the one I want.)

Well, s.unpack("U*") will turn a string into an array of integers
(Unicode code points), which should then be easy to split into bytes.
I’d start from scratch rather than using url_encode, though.

Fred


#6

Frederick C. wrote:

Those aren’t playing with encodings which is apparently the issue
here. Why does it matter anyway?

ok.

Here is the source code of the ERB::Util.url_encode(s) method.

# File erb.rb, line 801
def url_encode(s)
  s.to_s.gsub(/[^a-zA-Z0-9_\-.]/n){ sprintf("%%%02X", $&.unpack("C")[0]) }
end

Now it works like this:

ERB::Util.url_encode("中文")

=> "%E4%B8%AD%E6%96%87"

Can you help me change the url_encode code a bit, so that it returns
the UTF-16 result? ('%D6%D0%CE%C4' is the one I want.)


#7

On Mar 31, 5:04 pm, Nanyang Z. wrote:

when:

"中文".unpack("U*")
=> [20013, 25991]

So it is a matter of turning [20013, 25991] into '%D6%D0%CE%C4', right?

Well, 20013 is 0x4E2D, which is the UTF-16 for the first of your
characters. Looking back at what you wrote, I’ve no idea where D6D0 is
coming from; that’s a completely different character according to the
Unicode character palette I have. I’m not sure what your JavaScript has
been doing.
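The arithmetic is easy to verify. A short sketch, assuming any reasonably recent Ruby; the range below is the assigned Hangul Syllables range of Unicode, which is where U+D6D0 falls.

```ruby
# Code points of the two characters, printed as hexadecimal.
code_points = "中文".unpack("U*")
hex = code_points.map { |cp| format("%04X", cp) }
p hex  # => ["4E2D", "6587"]

# U+D6D0, read as a code point, is a Hangul syllable, not a Chinese character.
hangul_syllables = 0xAC00..0xD7A3
p hangul_syllables.cover?(0xD6D0)  # => true
```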

Fred


#8

Frederick C. wrote:

I’d no idea where D6D0 is
coming from

OK, problem solved. Thank you, Fred. I might never have got it done
without your help.

It turns out %D6%D0%CE%C4 is not a UTF-16 result at all, but the
product of GB2312 encoding.

I converted the string from UTF-8 to GB2312 with iconv, and then
url_encode produces the string I need.
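The exact code was not posted. What follows is a minimal sketch of the same idea, assuming Ruby 1.9 or later, where String#encode can do the conversion (the iconv library was dropped from the standard library in Ruby 2.0). `percent_encode_bytes` is a hypothetical helper modelled on the url_encode source quoted in this thread, not a library method.

```ruby
# Hypothetical helper modelled on ERB::Util.url_encode: percent-encode
# every byte that is not an unreserved ASCII character.
def percent_encode_bytes(str)
  str.b.gsub(/[^a-zA-Z0-9_.\-~]/n) { |byte| format("%%%02X", byte.ord) }
end

# Transcode from UTF-8 (the source encoding here) to GB2312 first,
# then escape the resulting bytes.
gb2312 = "中文".encode("GB2312")

puts percent_encode_bytes(gb2312)  # prints %D6%D0%CE%C4
```

Characters that GB2312 cannot represent would make encode raise an error, so real input may need a wider encoding such as GBK or explicit error handling.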

Thank you again.


#9

Frederick C. wrote:

Well, s.unpack("U*") will turn a string into an array of integers
(Unicode code points), which should then be easy to split into bytes.
I’d start from scratch rather than using url_encode, though.

Thanks, Fred!

"中文".unpack("C*")
=> [228, 184, 173, 230, 150, 135]
ERB::Util.url_encode("中文")
=> "%E4%B8%AD%E6%96%87"

For the first time, I have some idea of what url_encode is doing.

when:

"中文".unpack("U*")
=> [20013, 25991]

So it is a matter of turning [20013, 25991] into '%D6%D0%CE%C4', right?


#10

Nanyang Z. wrote:

Frederick C. wrote:

I’d no idea where D6D0 is
coming from

OK, problem solved. Thank you, Fred. I might never have got it done
without your help.

It turns out %D6%D0%CE%C4 is not a UTF-16 result at all, but the
product of GB2312 encoding.

I converted the string from UTF-8 to GB2312 with iconv, and then
url_encode produces the string I need.

Thank you again.

Could you share the code you used to solve the problem?
Thanks a lot.