Struct is slow

I have a script in which I was using a 2-element array where a struct
would be used in another language, so I decided to give the Ruby class
Struct a try.

My script went from taking 2 seconds to taking 27 seconds from this
simple change! The use of this 2-element array is only one small part
of the script, so it was very surprising to me that it could cause the
script to take even twice as long, no matter how inefficient it might be.
That the entire script took more than 10 times longer to execute was a
shocker.

I will probably shy away from the use of the Struct class after this
brief experience and use arrays with constants for the indexes. I’m
wondering if this is an erroneous conclusion and perhaps I am missing
something.

What are the advantages/disadvantages of using the Struct class in Ruby?
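For concreteness, here is a minimal sketch of the two approaches being compared (the `Typedef` name and field names are illustrative, not from the actual script):

```ruby
# Struct builds a small class with named accessors:
Typedef = Struct.new(:typedef_type, :typedef_name)

rec = Typedef.new("float", "UBT_Feet_Type")
rec.typedef_type        # named accessor
rec[:typedef_name]      # indexing by symbol also works

# The array-with-constants alternative:
TYPEDEF_TYPE = 0
TYPEDEF_NAME = 1

arr = ["float", "UBT_Feet_Type"]
arr[TYPEDEF_TYPE]
```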

On 10/16/07, Wayne M. [email protected] wrote:

I have a script in which I was using a 2-element array where a struct
would be used in another language, so I decided to give the Ruby class
Struct a try.

My script went from taking 2 seconds to taking 27 seconds from this
simple change! The use of this 2-element array is only one small part
of the script, so it was very surprising to me that it could cause the
script to take even twice as long, no matter how inefficient it might be.
That the entire script took more than 10 times longer to execute was a
shocker.
Hmm, some code to look at would be nice; this seems difficult to believe
indeed.

Robert

Wayne M. wrote:

I have a script in which I was using a 2-element array where a struct
would be used in another language, so I decided to give the Ruby class
Struct a try.

My script went from taking 2 seconds to taking 27 seconds from this
simple change!

This doesn’t sound likely, even if your script did nothing else but use
the Struct class. I’d look elsewhere for the source of slowness (try
running your script with -rprofile).

A quick benchmark suggests that Struct is somewhat slower to instantiate
(+100%), marginally slower to set values in (+33%), and the same speed
to fetch values from as an Array:

BENCHMARK:

require 'benchmark'

GC.disable
Foo = Struct.new(:foo, :bar)

repetitions = 500_000

puts 'struct-init', Benchmark::measure {
  repetitions.times { f = Foo.new('x', 666) }
}

puts 'array-init', Benchmark::measure {
  repetitions.times { f = ['x', 666] }
}

puts 'struct-get', Benchmark::measure {
  f = Foo.new('x', 666)
  repetitions.times { g = f.foo }
}

puts 'array-get', Benchmark::measure {
  f = ['x', 666]
  repetitions.times { g = f[0] }
}

puts 'struct-set', Benchmark::measure {
  f = Foo.new('x', 666)
  repetitions.times { f.foo = 'y' }
}

puts 'array-set', Benchmark::measure {
  f = ['x', 666]
  repetitions.times { f[0] = 'y' }
}

END

RESULTS (ruby 1.8.4 (2005-12-24) [i386-mswin32])

struct-init
1.359000 0.031000 1.390000 ( 1.390000)
array-init
0.563000 0.016000 0.579000 ( 0.579000)
struct-get
0.281000 0.000000 0.281000 ( 0.281000)
array-get
0.297000 0.000000 0.297000 ( 0.297000)
struct-set
0.422000 0.000000 0.422000 ( 0.422000)
array-set
0.281000 0.000000 0.281000 ( 0.281000)

alex

I understand your skepticism; it seems impossible, and I totally agree
it DOES seem impossible, yet that is what I am seeing.

I thought maybe I had done something wrong, so I recreated my experiment
(I had already deleted the file). I was running it from Eclipse, so I
opened a DOS window and ran it directly and got the same result.

The script is too large to post, but what it does is read C header files
and generate an array of these “structs”, which hold two strings: the
type and the name of a typedef. It then goes through the list,
converting them to somewhat “primary” types for another script to
process. Here are some examples of the output:

float UBT_Feet_Type
uint_8 UBT_Turn_Directions_Type
struct NCBYTERD_PositionConvRecType
uint_32 NCTPLYGN_PolygonIterationsType
struct[4] NCTPLYGN_QuadralateralPolygonVerticesType
struct NCTPLYGN_QuadralateralClassType
struct[8] NCTPLYGN_QuadralateralPolygonSetType
uint_8 NCPRMYTP_DataStatusType
double NCPRMYTP_DegreesType
enum NCPRMYTP_PeriodsTypeEnum

One thing I just noticed is that the script with the Struct class
doesn’t work correctly. It doesn’t make the types “primary” and it
doesn’t generate as many lines. Obviously, there’s something I don’t
understand about how to use Struct.

Here is the difference between the 2 scripts (the array version is on
the right):

FILE COMPARISON
Produced: 10/16/2007 1:27:12 PM

Mode: Just Differences

Left file: C:\workspace\CreateSfct\h_file_types_diff.txt Right file:
C:\workspace\CreateSfct\h_file_types (before struct mod).rb
L19 $typedef_rec = Struct.new("Typedef", :typedef_type,
:typedef_name)
R19 TYPEDEF_TYPE = 0
TYPEDEF_NAME = 1


L74 typedef_type = type_arr[j][:typedef_type]
typedef_name = type_arr[j][:typedef_name]
R75 typedef_type = type_arr[j][TYPEDEF_TYPE]
typedef_name = type_arr[j][TYPEDEF_NAME]


L88 type_arr[j][:typedef_type].sub!(/\w+/, primary_t)
R89 type_arr[j][TYPEDEF_TYPE].sub!(/\w+/, primary_t)


L91 if type_arr[j][:typedef_type] =~ /\[([A-Z]\w*)\]/ then
type_arr[j][:typedef_type].sub!($1, define_lookup($1, define_hash))
R92 if type_arr[j][TYPEDEF_TYPE] =~ /\[([A-Z]\w*)\]/ then
type_arr[j][TYPEDEF_TYPE].sub!($1, define_lookup($1, define_hash))


L186 return $typedef_rec.new("enum", enum_name)
R187 return ["enum", enum_name]


L202 return $typedef_rec.new(typedef_type, typedef_name)
R203 return [typedef_type, typedef_name]


L546 type_arr.each do |x|
print(x[:typedef_type], "\t\t", x[:typedef_name], "\n")
R547 type_arr.each do |typedef_type, typedef_name|
print(typedef_type, "\t\t", typedef_name, "\n")

If type_arr[j] is a struct, then try type_arr[j].typedef_type
rather than type_arr[j][:typedef_type], and see if that makes
a difference.

-mental

Mental G. wrote:

If type_arr[j] is a struct, then try type_arr[j].typedef_type
rather than type_arr[j][:typedef_type], and see if that makes
a difference.

Well, I tried it and, as expected, the syntax change made no difference.
The one with Structs still takes 27 seconds and doesn’t work correctly.
It’s not clear to me why it should behave differently from just using a
2-element array (shrug).

On 16.10.2007 19:05, Alex F. wrote:

running your script with -rprofile).
Foo = Struct.new(:foo, :bar)

puts 'struct-set', Benchmark::measure {
END
0.297000 0.000000 0.297000 ( 0.297000)
struct-set
0.422000 0.000000 0.422000 ( 0.422000)
array-set
0.281000 0.000000 0.281000 ( 0.281000)

I’m not sure whether you are familiar with Benchmark#bmbm, which does a
rehearsal - personally I’d rather not switch off GC, since in realistic
situations GC time belongs in the mix. But results are rather
similar:

Robert@Babelfish2 /cygdrive/c/TEMP
$ ruby array-struct.rb
Rehearsal --------------------------------------------------
struct init 3.812000 0.000000 3.812000 ( 3.923000)
array init 1.672000 0.000000 1.672000 ( 1.709000)
struct get 0.437000 0.000000 0.437000 ( 0.440000)
array get 0.485000 0.000000 0.485000 ( 0.485000)
struct set 0.718000 0.000000 0.718000 ( 0.716000)
array set 0.500000 0.000000 0.500000 ( 0.510000)
----------------------------------------- total: 7.624000sec

                  user     system      total        real

struct init 3.891000 0.000000 3.891000 ( 3.984000)
array init 1.640000 0.000000 1.640000 ( 1.690000)
struct get 0.438000 0.000000 0.438000 ( 0.450000)
array get 0.469000 0.000000 0.469000 ( 0.469000)
struct set 0.703000 0.000000 0.703000 ( 0.715000)
array set 0.484000 0.000000 0.484000 ( 0.504000)

Robert@Babelfish2 /cygdrive/c/TEMP
$ cat array-struct.rb
require 'benchmark'

REP = 500_000

Foo = Struct.new :foo, :bar

data = "foo"
c1 = Foo.new data, data
c2 = [data, data]

Benchmark.bmbm 15 do |x|
  x.report 'struct init' do
    REP.times { Foo.new data, data }
  end

  x.report 'array init' do
    REP.times { [data, data] }
  end

  x.report 'struct get' do
    REP.times { c1.bar }
  end

  x.report 'array get' do
    REP.times { c2[1] }
  end

  x.report 'struct set' do
    REP.times { c1.bar = data }
  end

  x.report 'array set' do
    REP.times { c2[1] = data }
  end
end

Kind regards

robert

On 16.10.2007 22:00, Wayne M. wrote:

Mental G. wrote:

If type_arr[j] is a struct, then try type_arr[j].typedef_type
rather than type_arr[j][:typedef_type], and see if that makes
a difference.

Well, I tried it and, as expected, the syntax change made no difference.
The one with Struct’s still takes 27 seconds and doesn’t work correctly.
It’s not clear to me why it should be different than just using a
2-element array (shrug).

Did you actually also use the same block call semantics?

L546 type_arr.each do |x|
print(x[:typedef_type], "\t\t", x[:typedef_name], "\n")
R547 type_arr.each do |typedef_type, typedef_name|
print(typedef_type, "\t\t", typedef_name, "\n")

Make the latter

R547 type_arr.each do |x|
print(x.typedef_type, "\t\t", x.typedef_name, "\n")

Where x is your struct type.

robert

I’m not sure whether you are familiar with Benchmark#bmbm which does a
rehearsal - personally I’d rather not switch off GC since in realistic
situations GC time belongs in the mix. But results are rather
similar:

Sure, GC does belong in the mix. Now, if GC is enabled you are not able
to compare anything. Let’s assume that GC runs during ‘array init’; you
will say ‘hey, struct init is faster’. Now, if GC runs during ‘struct
init’ the result may change …

Keeping GC enabled is meaningful when benchmarking a whole application.
In microbenchmarks like these, it is simply noise.

Sylvain J. wrote:

microbenchmarks like these, it is simply noise.
For fairness, you could do it this way (assuming REP is large enough):


Benchmark.bmbm 15 do |x|
  GC.start
  x.report 'struct init' do
    REP.times { Foo.new data, data }
    GC.start
  end

  GC.start
  x.report 'array init' do
    REP.times { [data, data] }
    GC.start
  end

2007/10/17, Joel VanderWerf [email protected]:

Sylvain J. wrote:

I’m not sure whether you are familiar with Benchmark#bmbm which does a
rehearsal - personally I’d rather not switch off GC since in realistic
situations GC time belongs in the mix. But results are rather
similar:

Sure, GC does belong in the mix. Now, if GC is enabled you are not able to
compare anything. Let’s assume that GC runs during ‘array init’; you will
say ‘hey, struct init is faster’. Now, if GC runs during ‘struct init’ the
result may change …

But you have a lot of invocations and it’s highly unlikely that GC runs
during every init. On the other hand, if you allocate a lot of memory
from the OS, that may slow things down. And this will happen without GC.

Also, IMHO the cost of GC overhead is also part of the runtime
performance. If you compare two approaches and one allocates a lot
more memory than the other, then GC times belong in the measurement,
because in a real application that approach will also lead to
increased GC overhead.

 GC.start

end

GC.start
x.report ‘array init’ do
REP.times { [data, data] }
GC.start
end

GC.start is not guaranteed to actually run GC AFAIK. I’d probably
rather do

Benchmark.bmbm do
  GC.disable

  tests

  GC.start
end

There is a chance that GC.start will clean up all the memory allocated
during the first run.

Kind regards

robert
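For what it’s worth, that idea can be spelled out as a runnable sketch (assuming the intent is to disable collection around the timed section and force a sweep afterwards; the timed work here is just a stand-in):

```ruby
require 'benchmark'

GC.disable                       # keep collection out of the timed section
tms = Benchmark.measure { 200_000.times { [1, 2] } }
GC.enable
GC.start                         # sweep what the timed section allocated

puts tms
```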

Robert K. wrote:

Make the latter

R547 type_arr.each do |x|
print(x.typedef_type, “\t\t”, x.typedef_name, “\n”)

Where x is your struct type.

Yes, that’s exactly what I did, which was no different. You wouldn’t
expect it to be different, would you? My understanding is that the two
are exactly the same, just different syntax, right?

Anyway, the clue to the problem is that it doesn’t work the same way.
I’m on a trip in another state right now and can’t work on it, but in a
few weeks if I have time I’ll run it with a debugger and maybe I’ll
understand why using Struct would be different from using a 2-element
array.

The difference file shows exactly what I did. It’s correct Ruby, isn’t
it? If there’s something I did wrong, I don’t see it.

Just wanted to follow-up so people don’t think there might be problems
with the Struct class. It isn’t actually slow at all. The problem was
just that an inner loop was not converted from an array and so didn’t
work and used a lot of CPU resources. Once fixed, I can’t tell any
difference between the two versions.

It was this:

    type_arr.each do |td_type, td_name|
      if t == td_name then
        primary_t = td_type
        break
      end
    end

needed to be converted to this:

    type_arr.each do |x|
      if t == x.typedef_name then
        primary_t = x.typedef_type
        break
      end
    end

I used a different naming scheme in that loop (td_name instead of
typedef_name) which caused me not to find it when doing a search for all
places that needed to be converted to a struct.
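That conversion miss is easy to reproduce. A small demonstration (with hypothetical names) of why the old block silently stopped working: Ruby destructures an Array element into multiple block parameters, but a Struct is not destructured (it does not implement to_ary), so the second parameter stays nil and the comparison can never match:

```ruby
Typedef = Struct.new(:typedef_type, :typedef_name)

array_result = struct_result = nil

[["float", "Feet"]].each do |type, name|
  # an Array element is split across the block parameters
  array_result = [type, name]
end

[Typedef.new("float", "Feet")].each do |type, name|
  # the whole Struct lands in `type` and `name` stays nil,
  # so a lookup like `t == name` silently never succeeds
  struct_result = [type, name]
end

p array_result
p struct_result
```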

This is one example where strong typing could have made a difference.
Had the array been declared as an array of arrays or array of structs,
perhaps another language could have detected the error. I’m not
advocating for strongly typed languages, I’m just saying that some kinds
of errors could be harder to find in Ruby. What do you folks think?

On 10/17/07, Robert K. [email protected] wrote:

But you have a lot of invocations and it’s highly unlikely that GC runs
during every init. On the other hand, if you allocate a lot of memory
from the OS, that may slow things down. And this will happen without GC.

Also, IMHO the cost of GC overhead is also part of the runtime
performance. If you compare two approaches and one allocates a lot
more memory than the other, then GC times belong in the measurement,
because in a real application that approach will also lead to
increased GC overhead.

Spot on, IMHO


Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

On 10/24/07, Wayne M. [email protected] wrote:

Just wanted to follow-up so people don’t think there might be problems
with the Struct class. It isn’t actually slow at all.
Actually I did not think so :wink: but others might, kudos for telling us!!
Cheers
Robert

On 10/24/07, Rick DeNatale [email protected] wrote:

This would allow you to first concentrate on making the code do the
right thing before trying to optimize it.

I usually find that it really doesn’t matter how fast things run if
they produce the wrong results.

Funny you say that Rick, because I was thinking about the role testing
should have in this.
I kind of ponder writing two test suites for my projects: one which
clearly is BDD (shall be), and one that kind of replaces a very
defensive programming style full of assertions.
Now this is indeed a dangerous road, because one might fall
into the trap of implementing a straitjacket, killing the duck :wink:
etc.

I still believe, however, that such “microtests” can be a valuable tool
and might be more useful for finding some errors than debuggers,
depending on the type of error, of course.
Any thoughts, experiences about/with that approach?

Cheers
Robert

On 10/24/07, Wayne M. [email protected] wrote:

        primary_t = td_type
      end

of errors could be harder to find in Ruby. What do you folks think?
What would have made a bigger difference would be to develop that code
in the context of specifications written as tests.

This would allow you to first concentrate on making the code do the
right thing before trying to optimize it.

I usually find that it really doesn’t matter how fast things run if
they produce the wrong results.


Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

On Oct 24, 2007, at 3:26 PM, Rick DeNatale wrote:

  1. I’m not sure what you mean by the second type of test suite. I
    don’t think you’re talking about putting test assertions into the
    code. If you’re not I’m not sure I see the real distinction

I interpreted the two types of testing as follows.

Assume you have a method that expects a single non-negative integer
argument. You’ll probably want to test that it behaves correctly
for the following:
foo(0) # edge case
foo(1) # typical call
foo(36) # random case from all values
foo(100000000) # large magnitude
foo(1023) # testing near powers of two
foo(1024)
foo(1025)

These tests will be designed to make sure that foo works correctly
on expected input.

Then there is the question about whether you should test:
foo(-1)
foo(nil)
foo(“bogus”)
foo(“too”, “many”, :arguments)
foo()

and so on. These tests are throwing unexpected input at the
method. Bertrand Meyer in his writings about design by contract
programming would say that foo isn’t obligated to do anything at
all when called with arguments that are outside its ‘contract’.
If foo blows up on that input it isn’t the failure of ‘foo’ but
instead is the failure of the caller to adhere to the contract.

So the philosophical question is should you extend foo’s contract
to say that it will raise a particular exception (ArgumentError),
add the code to foo, and keep the tests or instead should you
decide that the tests are superfluous and instead focus on
better tests for the code that eventually calls foo (i.e. make
sure your code never calls foo incorrectly).
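The first option can be sketched concretely. Here foo is a hypothetical method whose contract is “a single non-negative Integer”, extended as described above to raise ArgumentError on out-of-contract input (the body is a placeholder):

```ruby
# Widen the contract: foo itself rejects out-of-contract input.
def foo(n)
  unless n.is_a?(Integer) && n >= 0
    raise ArgumentError, "expected a non-negative Integer, got #{n.inspect}"
  end
  n + 1   # placeholder behavior
end

foo(36)           # within the contract
begin
  foo(-1)         # the caller broke the contract
rescue ArgumentError => e
  puts e.message
end
```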

Gary W.

On 10/24/07, Robert D. [email protected] wrote:

On 10/24/07, Rick DeNatale [email protected] wrote:
Paraphrasing: you should test before you optimize, and use TDD/BDD.

I still believe however that such “microtests” can be a valuable tool
and might be
more useful to find some errors than debuggers, depending on the type
of error of course.
Any thoughts, experiences about/with that approach?

I should, and probably will eventually, answer this more fully on my
blog, but off the top of my head

  1. I’m not sure what you mean by the second type of test suite. I
    don’t think you’re talking about putting test assertions into the
    code. If you’re not I’m not sure I see the real distinction

  2. That said, I firmly believe that tests belong outside of the code
    under test. We talked about this some time ago when Trans asked for
    opinions of the in-place tests in Facets. I distinguish between
    instrumentation and test equipment. Instrumentation is writing code
    to be testable, but the ‘bulky’ test equipment doesn’t weigh down the
    F1 car being tested.

  3. The chicken typing I recently wrote about (and I like the term the
    more I use it) is just that: weighing down the car unnecessarily. If
    the code will fail, it’s going to fail with or without the chicken
    tests, and usually in a way that the chicken tests fail to anticipate.
    Furthermore, introducing the chicken tests introduces additional ways
    for the code to fail unnecessarily, when the chicken tests report false
    positives.

  4. Getting back to the distinction between two types of test suite.
    Personally, the attitude and approach to testing is much more
    important than Sapir-Whorf considerations, syntax, and whether you
    call it TDD or BDD or call them tests or specs. IMHO BDD is a sect of
    the main test-first religion, and things like RSpec are kind of like
    fancy exercise equipment. If it helps you acquire the proper attitude
    towards test-first design, that’s great. To my mind, when I’ve looked
    at the BDD tools, although they have some nice ancillary features, I’ve
    sensed that they are a little too intrusive on the code under test for
    my taste, so I’d rather just use ‘free weights’ such as Test::Unit and
    discipline.

Just like exercise equipment, neither does much good if you just use
it for a clothes rack!


Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

On 10/24/07, Rick DeNatale [email protected] wrote:

clearly is BDD (shall be) and one that kind of replaces very defensive
I should, and probably will eventually, answer this more fully on my
blog,
looking forward to it
but off the top of my head

  1. I’m not sure what you mean by the second type of test suite. I
    don’t think you’re talking about putting test assertions into the
    code. If you’re not I’m not sure I see the real distinction
    Well, better give an example. Let us assume I want to write a package
    that does, err, what, no idea, just translation; then I would do
    TDD/BDD somehow like follows - sorry, I am not fluent in RSpec but I
    will just use something similar.

"Vive la France".shall.translate.to "Viva Italia"
"Je ne suis pas disponible".shall.translate.to "Gone fishing"

Now that is normally considered enough. But for my personal comfort
and anticipating problems and bugs inside my great plan of
implementation - experience speaking ;). I will add some tests very
closely related to the implementation. You see, something totally
opposed to the BDD thing.
E.g.

assert_kind_of Enumerable, my_thingy.instance_variable_get("@words")
but better would be duck type tests:
assert_respond_to my_thing…, :each
assert [:each, :map, :size] <= my_thingy.ivarget("@words").methods

The trick would be to identify a few strategic test points that raise
an alarm about any misconception or coding error.
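As a concrete sketch of such a “strategic point” check (Translator and @words are hypothetical stand-ins), a duck-type microtest asserts on behavior rather than on the exact class of an internal collaborator:

```ruby
class Translator
  def initialize
    @words = %w[bonjour monde]   # internal collaborator under test
  end
end

words = Translator.new.instance_variable_get(:@words)

# the strategic check: @words must quack like an Enumerable
missing = [:each, :map, :size].reject { |m| words.respond_to?(m) }
raise "duck-type check failed: #{missing.inspect}" unless missing.empty?
```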

  2. That said, I firmly believe that tests belong outside of the code
    under test.
    I believe it too, but less firmly.
    We talked about this some time ago when Trans asked for
    opinions of the in-place tests in Facets. I distinguish between
    instrumentation and test equipment. Instrumentation is writing code
    to be testable, but the ‘bulky’ test equipment doesn’t weigh down the
    F1 car being tested.
    True, but I find Trans’ tests rather instructive, and he does not
    really want an F1 car; generally, though, I agree with you. Libraries
    like Facets and Labrador are somehow different because they are so
    reflective, and their code pretty well describes what it does.

  3. The chicken typing I recently wrote about (and I like the term the
    more I use it) is just that: weighing down the car unnecessarily. If
    the code will fail, it’s going to fail with or without the chicken
    tests, and usually in a way that the chicken tests fail to anticipate.
    Here again I am with you; even instrumentation is a pain, I guess, and
    that is why these microtests would be almost an exercise in
    metaprogramming. I guess you would know some Smalltalk stories about
    that.
    The idea would be to identify strategic points where being chicken
    makes the duck more confident elsewhere.
    But I am not sure at all whether that might work!
    Furthermore, introducing the chicken tests introduces additional ways
    for the code to fail unnecessarily, when the chicken tests report false
    positives.
    Here I am losing you; should we not test because of the fear of false
    positives?

  4. Getting back to the distinction between two types of test suite.
    Personally, the attitude and approach to testing is much more
    important than Sapir-Whorf considerations, syntax, and whether you
    call it TDD or BDD or call them tests or specs. IMHO BDD is a sect of
    the main test-first religion, and things like RSpec are kind of like
    fancy exercise equipment. If it helps you acquire the proper attitude
    towards test-first design, that’s great.
    I guess it does for me.
    To my mind, when I’ve looked
    at the BDD tools, although they have some nice ancillary features, I’ve
    sensed that they are a little too intrusive
    that was true when I tested RSpec about 1.5 years ago; is it still?
    on the code under test for
    my taste, so I’d rather just use ‘free weights’ such as Test::Unit and
    discipline.

Just like exercise equipment, neither does much good if you just use
it for a clothes rack!
Err how did you know? :wink:
R.