Improving hexadecimal escaping performance


#1

Hi, I have a module with two methods (thanks, Jeff):

  • hex_unescape(string)

  • hex_escape(string)

defined as follows:

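    # Decode every %XX sequence back into the byte it encodes.
    def self::hex_unescape(str)
      str.gsub(/%([0-9a-fA-F]{2})/) { $1.to_i(16).chr }
    end

    # Encode each byte outside [a-zA-Z0-9_.-] as %XX (byte-wise under Ruby 1.8).
    def self::hex_escape(str)
      str.gsub(/[^a-zA-Z0-9_.-]/n) { sprintf("%%%02X", $&.unpack("C")[0]) }
    end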
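These methods date from Ruby 1.8, where a String is just a sequence of bytes. On Ruby 1.9 and later, `Integer#chr` returns a binary (ASCII-8BIT) string for values above 127, so the decoded bytes have to be re-tagged with their real encoding. Below is a minimal sketch of `hex_unescape` for Ruby 2.0+ (it relies on `String#b`), assuming the escaped text was UTF-8. The `encoding` parameter is a hypothetical addition, not part of the original method:

```ruby
# Sketch for Ruby 2.0+; assumes the escaped bytes are UTF-8 by default.
def hex_unescape(str, encoding = Encoding::UTF_8)
  # Decode on a binary copy so the result can hold any byte value.
  bytes = str.b.gsub(/%([0-9a-fA-F]{2})/) { $1.to_i(16).chr }
  # Re-tag the raw bytes with the encoding the caller expects.
  bytes.force_encoding(encoding)
end
```

For example, `hex_unescape("%3F%3C%3E%C3%A9")` gives back `"?<>é"` as a UTF-8 string.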

The “hex_escape” method is copied from the CGI lib, and honestly I don’t much
like its approach of using “sprintf”. Is there a more elegant way?
(Performance is the most important requirement anyway.)
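If sprintf really turns out to be the bottleneck, one common alternative is to format all 256 possible escapes once and look them up by byte value. This is a sketch for Ruby 2.0+ (it uses `String#b`); the table name `HEX_ESCAPES` is made up for illustration, and whether it actually beats sprintf should be checked with Benchmark on the Ruby version in use:

```ruby
# Hypothetical lookup table: the "%XX" form of every possible byte,
# built once when the file is loaded.
HEX_ESCAPES = Array.new(256) { |byte| format("%%%02X", byte) }.freeze

def hex_escape(str)
  # String#b gives a binary copy, so every match is exactly one byte
  # and String#ord returns that byte's value (0..255).
  str.b.gsub(/[^a-zA-Z0-9_.-]/) { |byte| HEX_ESCAPES[byte.ord] }
end
```

The formatting work moves out of the per-match block; each match then costs only an Array index.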

Thanks a lot.


#2

Iñaki Baz C. wrote:

I don’t much like its approach of using “sprintf”. Is there a more elegant way?
(Performance is the most important requirement anyway.)

pickaxe2, p. 23:

Another output method we use a lot is printf…

pickaxe2, p. 526:

printf

Equivalent to io.write sprintf(…)

The Ruby Way (2nd), p. 72:

2.9 Formatting a String

This is done in Ruby as it is in C, with the sprintf method.

Is there a more elegant way?

def hex_escape(str)
  str.gsub(/[^a-zA-Z0-9_.-]/n) do |match|
    # Ruby 1.8: match[0] is the byte's value as an Integer
    "%%%02X" % match[0]
  end
end

s = "?<>é"
puts hex_escape(s)

--output:--
%3F%3C%3E%C3%A9
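The change above replaces `sprintf` with `String#%`, which applies the same format rules as `Kernel#sprintf` (also available as `Kernel#format`), so it is presumably a matter of style rather than speed. In `"%%%02X"`, `%%` produces a literal percent sign and `%02X` a zero-padded, two-digit, upper-case hex number. The reply's `match[0]` is an Integer only on Ruby 1.8; on 1.9 and later it is a one-character String, so this small check takes the byte value with `String#ord` instead:

```ruby
byte = "?".ord  # 63, i.e. 0x3F

a = sprintf("%%%02X", byte)  # Kernel#sprintf
b = format("%%%02X", byte)   # Kernel#format, an alias of sprintf
c = "%%%02X" % byte          # String#%, same format rules

puts a, b, c  # prints %3F on each line
```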


#3

2009/2/23 Iñaki Baz C. wrote:

str.gsub(/[^a-zA-Z0-9_.-]/n) { sprintf("%%%02X", $&.unpack("C")[0]) }
end

The “hex_escape” method is copied from the CGI lib, and honestly I don’t much
like its approach of using “sprintf”. Is there a more elegant way?
(Performance is the most important requirement anyway.)

Then I am sure you measured it and came to the conclusion that it is
too slow, did you? What are your results and what are your
performance requirements?
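Such a measurement can be taken with the standard benchmark library. Below is a sketch, assuming hypothetical byte-oriented versions of the two methods written for Ruby 2.0+ (they use `String#b`); the sample string and the iteration count are arbitrary. Timing many calls and dividing gives steadier per-call figures than one `Benchmark.realtime` around a single call:

```ruby
require "benchmark"

# Hypothetical byte-oriented versions of the two methods, for timing only.
def hex_escape(str)
  str.b.gsub(/[^a-zA-Z0-9_.-]/) { |m| "%%%02X" % m.ord }
end

def hex_unescape(str)
  str.b.gsub(/%[0-9a-fA-F]{2}/) { |m| m[1, 2].to_i(16).chr }
end

sample  = "user name <alice@example.com>; caf\u00e9"  # arbitrary input
escaped = hex_escape(sample)
n = 10_000  # arbitrary; enough calls to swamp timer resolution

# A single call takes microseconds, so time many calls and average.
escape_total   = Benchmark.realtime { n.times { hex_escape(sample) } }
unescape_total = Benchmark.realtime { n.times { hex_unescape(escaped) } }

printf("hex_escape:   %.2e s per call\n", escape_total / n)
printf("hex_unescape: %.2e s per call\n", unescape_total / n)
```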

Cheers

robert


#4

2009/2/23 Iñaki Baz C. wrote:

performance requirements?

I did a Benchmark.realtime comparing hex_unescape and hex_escape
methods. hex_unescape takes ~2.5×10^(-5) s while hex_escape takes
~4×10^(-5) s.

Anyway, I’ve just realized that “sprintf” is implemented directly in C,
so it can’t get any faster.

Well, you can at least do this in 1.8:

def self::hex_escape(str)
  str.gsub(/[^a-zA-Z0-9_.-]/n) {|m| sprintf("%%%02X", m[0]) }
end

And this in 1.9:

def self::hex_escape(str)
  str.gsub(/[^a-zA-Z0-9_.-]/n) {|m| sprintf("%%%02X", m.getbyte(0)) }
end
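On Ruby 1.9 and later, a match against a UTF-8 string can be a whole multibyte character ("é" is the two bytes C3 A9), and `m.getbyte(0)` then escapes only the first byte. This sketch escapes every byte of each match with `String#unpack("C*")`, a close relative of the `unpack("C")` call in the original code. The `/n` flag is dropped here because on 1.9+ it does not make the match byte-wise for a UTF-8 string (and can trigger a warning):

```ruby
# Sketch for Ruby 1.9+: match whole characters, then escape each byte.
def hex_escape(str)
  str.gsub(/[^a-zA-Z0-9_.-]/) do |m|
    # unpack("C*") returns every byte of the match as an Integer,
    # so a multibyte character becomes one %XX per byte.
    m.unpack("C*").map { |byte| format("%%%02X", byte) }.join
  end
end
```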

Cheers

robert


#5

2009/2/23 Robert K. wrote:

def self::hex_escape(str)
  str.gsub(/[^a-zA-Z0-9_.-]/n) { sprintf("%%%02X", $&.unpack("C")[0]) }
end

The “hex_escape” method is copied from the CGI lib, and honestly I don’t much
like its approach of using “sprintf”. Is there a more elegant way?
(Performance is the most important requirement anyway.)

Then I am sure you measured it and came to the conclusion that it is
too slow, did you? What are your results and what are your
performance requirements?

I did a Benchmark.realtime comparing hex_unescape and hex_escape
methods. hex_unescape takes ~2.5×10^(-5) s while hex_escape takes
~4×10^(-5) s.

Anyway, I’ve just realized that “sprintf” is implemented directly in C,
so it can’t get any faster.

Thanks.


#6

2009/2/23 Robert K. wrote:

Thanks. Do you mean that “m[0]” behaves differently in Ruby 1.9 than in
1.8? Maybe in 1.9 “m[0]” returns the first character (even if it is
more than one byte, like “ñ” or “€”), while in 1.8 it returns just
the first byte?

PS: I have Ruby 1.9 (2007-12-25 revision 14709), and its String class
has no “getbyte()” method.

Thanks a lot.


#7

2009/2/23 Iñaki Baz C. wrote:

“getbyte()” method for String.
15:15:25 ~$ ruby -ve 'p "foo"[0]'
ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
102
15:15:31 ~$ ruby19 -ve 'p "foo"[0]'
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
"f"
15:15:34 ~$ ruby19 -ve 'p "foo".getbyte(0)'
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-cygwin]
102
15:15:57 ~$
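To make the difference concrete for a multibyte string, here is a short Ruby 1.9+ sketch (the sample word is arbitrary). Indexing returns a character, `String#getbyte` and `unpack("C")` return a single byte, and `String#ord` returns the code point. `unpack("C")`, as used in the original hex_escape, also exists on 1.8:

```ruby
s = "\u00f1and\u00fa"            # "ñandú" in UTF-8

first_char = s[0]               # "ñ": a one-character String on 1.9+
first_byte = s.getbyte(0)       # 195 (0xC3), the first byte of "ñ"
via_unpack = s.unpack("C")[0]   # 195 as well
char_bytes = s[0].bytes         # [195, 177]: both bytes of "ñ"
code_point = s[0].ord           # 241 (U+00F1), a code point, not a byte

p first_char, first_byte, via_unpack, char_bytes, code_point
```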

robert


#8

2009/2/23 Robert K. wrote:

Clear now, thanks :)


#9

I did a Benchmark.realtime comparing hex_unescape and hex_escape
methods. hex_unescape takes ~2.5×10^(-5) s while hex_escape takes
~4×10^(-5) s.

For what exactly is 40 microseconds too slow?

mfg, simon … l


#10

2009/2/23 Simon K. wrote:

I did a Benchmark.realtime comparing hex_unescape and hex_escape
methods. hex_unescape takes ~2.5×10^(-5) s while hex_escape takes
~4×10^(-5) s.

For what exactly is 40 microseconds too slow?

I don’t mean that, but it’s strange that the inverse method takes
almost twice as long, isn’t it?


#11

almost twice as long, isn’t it?
How would you implement these at the core level?

mfg, simon … l