Forum: Ruby Improving hexadecimal escaping performance

Iñaki Baz C. (Guest)
on 2009-02-23 02:07
(Received via mailing list)
Hi, I have a module with two methods (thanks Jeff):
- hex_unescape(string)
- hex_escape(string)
as follows:

  def self::hex_unescape(str)
    str.gsub(/%([0-9a-fA-F]{2})/) { $1.to_i(16).chr }
  end

  def self::hex_escape(str)
    str.gsub(/[^a-zA-Z0-9_\-.]/n) { sprintf("%%%02X", $&.unpack("C")[0]) }
  end

The "hex_escape" method is copied from the CGI lib, and sincerely I don't
much like its approach using "sprintf". Is there a more elegant way?
(Performance is the most important requirement anyway.)

Thanks a lot.
7stud -. (Guest)
on 2009-02-23 08:23
Iñaki Baz C. wrote:
> I don't much like its approach using "sprintf". Is there a more
> elegant way?
> (Performance is the most important requirement anyway.)
>

pickaxe2, p. 23:
------
Another output method we use a lot is printf....
------

pickaxe2, p. 526:
--------
printf

Equivalent to io.write sprintf(...)
--------

The Ruby Way (2nd), p. 72:
----------
2.9 Formatting a String

This is done in Ruby as it is in C, with the sprintf method.
---------


> Is there a more elegant way?

def hex_escape(str)
  str.gsub(/[^a-zA-Z0-9_\-.]/n) do |match|
    "%%%02X" % match[0]
  end
end

s = "?<>é"
puts hex_escape(s)

--output:--
%3F%3C%3E%C3%A9
Robert K. (Guest)
on 2009-02-23 11:48
(Received via mailing list)
2009/2/23 Iñaki Baz C. <removed_email_address@domain.invalid>:
>    str.gsub(/[^a-zA-Z0-9_\-.]/n) { sprintf("%%%02X", $&.unpack("C")[0]) }
>  end
>
> The "hex_escape" method is copied from the CGI lib, and sincerely I don't
> much like its approach using "sprintf". Is there a more elegant way?
> (Performance is the most important requirement anyway.)

Then I am sure you _measured_ it and came to the conclusion that it is
too slow, did you?  What are your results and what are your
performance requirements?

Cheers

robert
Iñaki Baz C. (Guest)
on 2009-02-23 12:29
(Received via mailing list)
2009/2/23 Robert K. <removed_email_address@domain.invalid>:
>>  def self::hex_escape(str)
>>    str.gsub(/[^a-zA-Z0-9_\-.]/n) { sprintf("%%%02X", $&.unpack("C")[0]) }
>>  end
>>
>> The "hex_escape" method is copied from the CGI lib, and sincerely I don't
>> much like its approach using "sprintf". Is there a more elegant way?
>> (Performance is the most important requirement anyway.)
>
> Then I am sure you _measured_ it and came to the conclusion that it is
> too slow, did you?  What are your results and what are your
> performance requirements?

I ran Benchmark.realtime on both the hex_unescape and hex_escape
methods. hex_unescape takes ~2.5*10^(-5) s while hex_escape takes
~4*10^(-5) s.

Anyway, I've just realized that "sprintf" is implemented directly
in C, so it can't be made faster.

Thanks.
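
For anyone who wants to repeat the measurement, here is a minimal sketch using Benchmark.realtime. The module name HexCodec, the sample input and the run count are assumptions made up for illustration; the thread gives none of them. It also uses String#b and String#getbyte, so it needs Ruby 2.0 or later rather than the 1.8 discussed here.

```ruby
require "benchmark"

# HexCodec is a hypothetical module name; the thread never names the module.
module HexCodec
  # Decode each %XX sequence back into the byte it encodes.
  def self.hex_unescape(str)
    str.gsub(/%([0-9a-fA-F]{2})/) { $1.to_i(16).chr }
  end

  # Encode every byte outside [a-zA-Z0-9_.-] as %XX.
  # String#b returns a binary copy, so each match is exactly one byte.
  def self.hex_escape(str)
    str.b.gsub(/[^a-zA-Z0-9_\-.]/n) { |m| sprintf("%%%02X", m.getbyte(0)) }
  end
end

sample  = "name=John Doe&city=New York?" * 10   # made-up ASCII input
escaped = HexCodec.hex_escape(sample)
runs    = 1_000

# Time many calls and average, since a single call is too short to measure well.
t_escape   = Benchmark.realtime { runs.times { HexCodec.hex_escape(sample) } }
t_unescape = Benchmark.realtime { runs.times { HexCodec.hex_unescape(escaped) } }

printf("hex_escape:   %.2e s per call\n", t_escape / runs)
printf("hex_unescape: %.2e s per call\n", t_unescape / runs)
```

The absolute numbers depend on the machine, the Ruby version and the input, so only the ratio between the two methods is worth comparing.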
Robert K. (Guest)
on 2009-02-23 13:56
(Received via mailing list)
2009/2/23 Iñaki Baz C. <removed_email_address@domain.invalid>:
>>>
>> performance requirements?
>
> I did a Benchmark.realtime comparing hex_unescape and hex_escape
> methods. hex_unescape takes ~2.5*10^(-5) while hex_escape takes
> ~4*10^(-5).
>
> Anyway I've realized right now that "sprintf" is directly implemented
> as C code so it can't be faster.

Well, you can at least do this in 1.8:

def self::hex_escape(str)
   str.gsub(/[^a-zA-Z0-9_\-.]/n) {|m| sprintf("%%%02X", m[0]) }
end

And this in 1.9:

def self::hex_escape(str)
   str.gsub(/[^a-zA-Z0-9_\-.]/n) {|m| sprintf("%%%02X", m.getbyte(0)) }
end
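
One caveat for 1.9 and later, sketched under an assumption the thread does not make: if the pattern is applied without the /n flag to a UTF-8 string, each match can be a whole multi-byte character, and getbyte(0) alone would drop the trailing bytes. A variant that escapes every byte of the match could look like this:

```ruby
# Sketch only: escapes all bytes of each matched character.
def hex_escape(str)
  str.gsub(/[^a-zA-Z0-9_\-.]/) do |m|
    # m may be several bytes long (e.g. "é" is C3 A9 in UTF-8).
    m.each_byte.map { |b| format("%%%02X", b) }.join
  end
end

puts hex_escape("?<>\u00E9")   # prints %3F%3C%3E%C3%A9
```

The input is the same string as in the earlier example, with "é" written as "\u00E9" so the sketch does not depend on the source file's encoding.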

Cheers

robert
Iñaki Baz C. (Guest)
on 2009-02-23 15:56
(Received via mailing list)
2009/2/23 Robert K. <removed_email_address@domain.invalid>:
> end
Thanks. Do you mean that "m[0]" behaves differently in Ruby 1.9 than
in 1.8? Maybe in 1.9 "m[0]" returns the first character (even if it's
more than two bytes, as with "ñ", "€"...) while in 1.8 it returns just
the first two bytes?

PS: I have Ruby 1.9 (2007-12-25 revision 14709) and it doesn't have a
"getbyte()" method for String.

Thanks a lot.
Robert K. (Guest)
on 2009-02-23 16:17
(Received via mailing list)
2009/2/23 Iñaki Baz C. <removed_email_address@domain.invalid>
> > def self::hex_escape(str)
> "getbyte()" method for String.
15:15:25 ~$ ruby -ve 'p "foo"[0]'
ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
102
15:15:31 ~$ ruby19 -ve 'p "foo"[0]'
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
"f"
15:15:34 ~$ ruby19 -ve 'p "foo".getbyte(0)'
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
102
15:15:57 ~$


robert
Iñaki Baz C. (Guest)
on 2009-02-23 16:35
(Received via mailing list)
2009/2/23 Robert K. <removed_email_address@domain.invalid>:
> 15:15:25 ~$ ruby -ve 'p "foo"[0]'
> ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
> 102
> 15:15:31 ~$ ruby19 -ve 'p "foo"[0]'
> ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
> "f"
> 15:15:34 ~$ ruby19 -ve 'p "foo".getbyte(0)'
> ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
> 102
> 15:15:57 ~$

Clear now, thanks :)
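
To answer the multi-byte part of the question as well, here is a small sketch of how the same calls behave on Ruby 1.9 and later with a non-ASCII character. The variable names are made up for illustration, and the results shown assume a UTF-8 string.

```ruby
ascii = "foo"
p ascii[0]            # => "f"   (a one-character String, not an Integer)
p ascii.getbyte(0)    # => 102   (the first byte as an Integer)
p ascii.unpack("C")   # => [102] (what the CGI-style code relies on)

enye = "\u00F1"       # "ñ", which is two bytes in UTF-8
p enye[0] == enye     # => true  ([] returns the whole character)
p enye.getbyte(0)     # => 195   (only the first of its two bytes)
p enye.bytes          # => [195, 177]
p enye.ord            # => 241   (the code point, not a byte)
```

So in 1.9+, "m[0]" returns a whole character however many bytes it occupies, while getbyte and unpack("C") work on single bytes.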
Simon K. (Guest)
on 2009-02-23 17:35
(Received via mailing list)
* Iñaki Baz C. <removed_email_address@domain.invalid> (11:28) wrote:

> I did a Benchmark.realtime comparing hex_unescape and hex_escape
> methods. hex_unescape takes ~2.5*10^(-5) while hex_escape takes
> ~4*10^(-5).

For what exactly is 40 microseconds too slow?

Regards,                     simon .... l
Iñaki Baz C. (Guest)
on 2009-02-23 18:16
(Received via mailing list)
2009/2/23 Simon K. <removed_email_address@domain.invalid>:
> * Iñaki Baz C. <removed_email_address@domain.invalid> (11:28) wrote:
>
>> I did a Benchmark.realtime comparing hex_unescape and hex_escape
>> methods. hex_unescape takes ~2.5*10^(-5) while hex_escape takes
>> ~4*10^(-5).
>
> For what exactly is 40 microseconds too slow?

I don't mean that, but it's strange that the inverse method takes
almost twice as long, isn't it?
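
One way to take sprintf out of the per-match path is to precompute all 256 escape strings once and let gsub look them up in a Hash (gsub accepts a Hash as replacement since 1.9). This is only a sketch: ESCAPE_PATTERN, ESCAPE_TABLE and hex_escape_table are hypothetical names, and whether it is actually faster would have to be measured.

```ruby
# Bytes that must be escaped: everything outside [a-zA-Z0-9_.-].
ESCAPE_PATTERN = /[^a-zA-Z0-9_\-.]/n

# Hypothetical lookup table: single-byte binary string => its %XX form.
ESCAPE_TABLE = (0..255).each_with_object({}) do |byte, table|
  table[[byte].pack("C")] = format("%%%02X", byte)
end.freeze

def hex_escape_table(str)
  # String#b (Ruby 2.0+) gives a binary copy, so every match is one byte
  # and is guaranteed to be a key of ESCAPE_TABLE.
  str.b.gsub(ESCAPE_PATTERN, ESCAPE_TABLE)
end

puts hex_escape_table("?<>\u00E9")   # prints %3F%3C%3E%C3%A9
```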
Simon K. (Guest)
on 2009-02-24 04:35
(Received via mailing list)
* Iñaki Baz C. <removed_email_address@domain.invalid> (17:14) wrote:

> double time, isn't it?
How would you implement these at the core level?

Regards,                  simon .... l