Forum: Ruby Improving hexadecimal escaping performance

Iñaki Baz Castillo (Guest)
on 2009-02-23 01:07
(Received via mailing list)
Hi, I have a module with two methods (thanks Jeff):
- hex_unescape(string)
- hex_escape(string)
as follows:

  def self::hex_unescape(str)
    str.gsub(/%([0-9a-fA-F]{2})/) { $1.to_i(16).chr }
  end

  def self::hex_escape(str)
    str.gsub(/[^a-zA-Z0-9_\-.]/n) { sprintf("%%%02X", $&.unpack("C")[0]) }
  end

The "hex_escape" method is copied from the CGI lib, and honestly I don't
much like its approach using "sprintf". Is there a more elegant way?
(Performance is the most important requirement anyway.)

Thanks a lot.
7stud -- (7stud)
on 2009-02-23 07:23
Iñaki Baz Castillo wrote:
> I don't much like its approach using "sprintf". Is there a more elegant way?
> (Performance is the most important requirement anyway.)
>

pickaxe2, p. 23:
------
Another output method we use a lot is printf....
------

pickaxe2, p. 526:
--------
printf

Equivalent to io.write sprintf(...)
--------

The Ruby Way (2nd), p. 72:
----------
2.9 Formatting a String

This is done in Ruby as it is in C, with the sprintf method.
---------


> Is there a more elegant way?

def hex_escape(str)
  str.gsub(/[^a-zA-Z0-9_\-.]/n) do |match|
    # On Ruby 1.8, match[0] is the byte value (an Integer).
    "%%%02X" % match[0]
  end
end

s = "?<>é"
puts hex_escape(s)

--output:--
%3F%3C%3E%C3%A9
Robert Klemme (Guest)
on 2009-02-23 10:48
(Received via mailing list)
2009/2/23 Iñaki Baz Castillo <ibc@aliax.net>:
>    str.gsub(/[^a-zA-Z0-9_\-.]/n) { sprintf("%%%02X", $&.unpack("C")[0]) }
>  end
>
> The "hex_escape" method is copied from the CGI lib, and honestly I don't
> much like its approach using "sprintf". Is there a more elegant way?
> (Performance is the most important requirement anyway.)

Then I am sure you _measured_ it and came to the conclusion that it is
too slow, didn't you?  What are your results and what are your
performance requirements?

Cheers

robert
Iñaki Baz Castillo (Guest)
on 2009-02-23 11:29
(Received via mailing list)
2009/2/23 Robert Klemme <shortcutter@googlemail.com>:
>>  def self::hex_escape(str)
>>    str.gsub(/[^a-zA-Z0-9_\-.]/n) { sprintf("%%%02X", $&.unpack("C")[0]) }
>>  end
>>
>> The "hex_escape" method is copied from the CGI lib, and honestly I don't
>> much like its approach using "sprintf". Is there a more elegant way?
>> (Performance is the most important requirement anyway.)
>
> Then I am sure you _measured_ it and came to the conclusion that it is
> too slow, didn't you?  What are your results and what are your
> performance requirements?

I ran Benchmark.realtime on the hex_unescape and hex_escape methods:
hex_unescape takes ~2.5*10^(-5) seconds while hex_escape takes
~4*10^(-5) seconds.

Anyway, I've just realized that "sprintf" is implemented directly
in C, so it can't get much faster.

Thanks.
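
For anyone who wants to repeat the measurement, here is a minimal sketch using Benchmark.realtime. The sample string and the ITERATIONS count are made up for illustration, and hex_escape reads the byte with String#getbyte so that the sketch runs on Ruby 1.9 and later. Absolute numbers will differ from machine to machine.

```ruby
require "benchmark"

def hex_unescape(str)
  str.gsub(/%([0-9a-fA-F]{2})/) { $1.to_i(16).chr }
end

def hex_escape(str)
  str.gsub(/[^a-zA-Z0-9_\-.]/n) { |m| sprintf("%%%02X", m.getbyte(0)) }
end

# Assumed values, chosen only for illustration.
ITERATIONS = 10_000
plain      = "user name/with?odd=chars&more"
escaped    = hex_escape(plain)

# Time many calls and divide, since a single call is too short to time reliably.
escape_time   = Benchmark.realtime { ITERATIONS.times { hex_escape(plain) } }
unescape_time = Benchmark.realtime { ITERATIONS.times { hex_unescape(escaped) } }

printf("hex_escape:   %.2f us per call\n", escape_time / ITERATIONS * 1e6)
printf("hex_unescape: %.2f us per call\n", unescape_time / ITERATIONS * 1e6)
```

Averaging over many iterations smooths out timer resolution and GC noise that dominate a single 40-microsecond call.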
Robert Klemme (Guest)
on 2009-02-23 12:56
(Received via mailing list)
2009/2/23 Iñaki Baz Castillo <ibc@aliax.net>:
> I did a Benchmark.realtime comparing hex_unescape and hex_escape
> methods. hex_unescape takes ~2.5*10^(-5) while hex_escape takes
> ~4*10^(-5).
>
> Anyway I've realized right now that "sprintf" is directly implemented
> as C code so it can't be faster.

Well, you can at least do this in 1.8:

def self::hex_escape(str)
   str.gsub(/[^a-zA-Z0-9_\-.]/n) {|m| sprintf("%%%02X", m[0]) }
end

And this in 1.9:

def self::hex_escape(str)
   str.gsub(/[^a-zA-Z0-9_\-.]/n) {|m| sprintf("%%%02X", m.getbyte(0)) }
end

Cheers

robert
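
One caveat for Ruby 1.9 and later: on a UTF-8 string the regexp generally matches whole characters, so m.getbyte(0) sees only the first byte of a multibyte character such as "é". A possible sketch, not taken from the thread, is to work on a binary copy of the string (String#b, assuming Ruby 2.0 or later) so that each match is exactly one byte. The names hex_escape_bytes and hex_unescape_bytes are hypothetical.

```ruby
# Hypothetical byte-wise variants; assumes Ruby >= 2.0 for String#b.
def hex_escape_bytes(str)
  # str.b is a binary (ASCII-8BIT) copy, so every match is a single byte.
  escaped = str.b.gsub(/[^a-zA-Z0-9_\-.]/) { |m| format("%%%02X", m.getbyte(0)) }
  # The result contains only ASCII characters.
  escaped.force_encoding(Encoding::US_ASCII)
end

def hex_unescape_bytes(str)
  # Decode each %XX into one raw byte, then restore the caller's encoding.
  str.b.gsub(/%([0-9a-fA-F]{2})/) { [$1].pack("H2") }.force_encoding(str.encoding)
end

s = "?<>\u00E9"   # "?<>é"; "é" is two bytes (C3 A9) in UTF-8
puts hex_escape_bytes(s)
puts hex_unescape_bytes(hex_escape_bytes(s)) == s
```

The second method assumes the escaped input carries the encoding you want back (UTF-8 here); if the decoded bytes are not valid in that encoding, the result will report valid_encoding? as false.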
Iñaki Baz Castillo (Guest)
on 2009-02-23 14:56
(Received via mailing list)
2009/2/23 Robert Klemme <shortcutter@googlemail.com>:
> end
Thanks. Do you mean that "m[0]" behaves differently in Ruby 1.9
than in 1.8? Maybe in 1.9 "m[0]" returns the first character (even if
it's more than two bytes, as with "ñ", "€"...) while in 1.8 it returns just
the first two bytes?

PS: I have Ruby 1.9 (2007-12-25 revision 14709) and String has no
"getbyte()" method there.

Thanks a lot.
Robert Klemme (Guest)
on 2009-02-23 15:17
(Received via mailing list)
2009/2/23 Iñaki Baz Castillo <ibc@aliax.net>
> > def self::hex_escape(str)
> "getbyte()" method for String.
15:15:25 ~$ ruby -ve 'p "foo"[0]'
ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
102
15:15:31 ~$ ruby19 -ve 'p "foo"[0]'
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
"f"
15:15:34 ~$ ruby19 -ve 'p "foo".getbyte(0)'
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
102
15:15:57 ~$


robert
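
The same difference shows up with multibyte characters, which is what the question about "ñ" and "€" was getting at. A small sketch, assuming Ruby 1.9 or later and UTF-8 strings (the variable names are only for illustration):

```ruby
n_tilde = "\u00F1"   # "ñ", 2 bytes in UTF-8
euro    = "\u20AC"   # "€", 3 bytes in UTF-8

p n_tilde[0] == n_tilde   # => true   ([] returns the whole character)
p n_tilde.getbyte(0)      # => 195    (only the first byte, 0xC3)
p n_tilde.bytes           # => [195, 177]
p euro[0] == euro         # => true
p euro.bytes              # => [226, 130, 172]
```

So in 1.9 String#[] indexes characters and String#getbyte indexes bytes, whatever the width of the character.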
Iñaki Baz Castillo (Guest)
on 2009-02-23 15:35
(Received via mailing list)
2009/2/23 Robert Klemme <shortcutter@googlemail.com>:
> 15:15:25 ~$ ruby -ve 'p "foo"[0]'
> ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
> 102
> 15:15:31 ~$ ruby19 -ve 'p "foo"[0]'
> ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
> "f"
> 15:15:34 ~$ ruby19 -ve 'p "foo".getbyte(0)'
> ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
> 102
> 15:15:57 ~$

Clear now, thanks :)
Simon Krahnke (Guest)
on 2009-02-23 16:35
(Received via mailing list)
* Iñaki Baz Castillo <ibc@aliax.net> (11:28) wrote:

> I did a Benchmark.realtime comparing hex_unescape and hex_escape
> methods. hex_unescape takes ~2.5*10^(-5) while hex_escape takes
> ~4*10^(-5).

For what exactly is 40 microseconds too slow?

Regards,                     simon .... l
Iñaki Baz Castillo (Guest)
on 2009-02-23 17:16
(Received via mailing list)
2009/2/23 Simon Krahnke <overlord@gmx.li>:
> * Iñaki Baz Castillo <ibc@aliax.net> (11:28) wrote:
>
>> I did a Benchmark.realtime comparing hex_unescape and hex_escape
>> methods. hex_unescape takes ~2.5*10^(-5) while hex_escape takes
>> ~4*10^(-5).
>
> For what exactly is 40 microseconds too slow?

I don't mean that, but it's strange that the inverse method takes
double the time, isn't it?
Simon Krahnke (Guest)
on 2009-02-24 03:35
(Received via mailing list)
* Iñaki Baz Castillo <ibc@aliax.net> (17:14) wrote:

> double time, isn't it?
How would you implement these at the core level?

Regards,                  simon .... l
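
If the per-match sprintf call is the concern, one option that is not discussed in the thread is to precompute the escape for every possible byte and look it up inside the block. This is only a sketch: ESCAPE_TABLE and hex_escape_table are hypothetical names, String#b assumes Ruby 2.0 or later, and whether it is actually faster would have to be measured.

```ruby
# Hypothetical lookup table: index is the byte value, entry is its "%XX" form.
ESCAPE_TABLE = Array.new(256) { |byte| format("%%%02X", byte).freeze }.freeze

def hex_escape_table(str)
  # Match on a binary copy so each match is one byte, then look up its escape.
  str.b.gsub(/[^a-zA-Z0-9_\-.]/) { |m| ESCAPE_TABLE[m.getbyte(0)] }
end

puts hex_escape_table("a b/c")   # prints a%20b%2Fc
```

The formatting cost is paid once for 256 entries at load time; each match then costs an array index instead of a format call.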