Forum: Ruby building ruby for speed: wise or otherwise?

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
hgs (Guest)
on 2005-11-28 17:06
(Received via mailing list)
My active record based script is taking longer than I'd like.
While I wait for approval to get a faster machine :-) I'm wondering
about rebuilding ruby 1.8.2 (which I have now) and changing the
CFLAGS from the default
CFLAGS=-g -O2
to
CFLAGS=-O3
or something of the sort.  I'm presently using gcc-3.4.3 on
Solaris9.  Has anyone done this and if so is there anything I should
watch out for?  ISTR reported problems when building other packages
with high -O values in the past.
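As a quick sanity check before rebuilding, the flags a given interpreter was actually compiled with can be read back at runtime (Ruby 1.8 spelled the constant `Config::CONFIG`; `RbConfig` is the later name for the same table):

```ruby
# Print the compiler and CFLAGS the running interpreter was built with.
# (Ruby 1.8 exposes this as Config::CONFIG; RbConfig is the newer name.)
require "rbconfig"

config = defined?(RbConfig) ? RbConfig::CONFIG : Config::CONFIG
puts "compiler: #{config['CC']}"
puts "CFLAGS:   #{config['CFLAGS']}"
```

Comparing this output before and after a rebuild confirms the new flags actually took effect.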

Would the answer be different for gcc-4.0.2?

        Thank you,
        Hugh
vjoel (Guest)
on 2005-11-28 17:14
(Received via mailing list)
Hugh Sasse wrote:
> with high -O values in the past.
>
> Would the answer be different for gcc-4.0.2?

If gcc-4.x is an option, try it. Anecdotally, it's substantially faster
than 3.x. In fact, it's one less reason for me to use msvc on windows:
code compiled with gcc-4.0 (with -O2) turns out to be faster than msvc
for some numerically intensive simulation code running as a ruby
extension, whereas msvc was faster than gcc-3.x output code. YMMV.
Daniel.Berger (Guest)
on 2005-11-28 17:18
(Received via mailing list)
Joel VanderWerf wrote:
>>Solaris9.  Has anyone done this and if so is there anything I should
> extension, whereas msvc was faster than gcc-3.x output code. YMMV.
>

Just out of curiosity, what options did you pass to cl when using MSVC?

- Dan
hgs (Guest)
on 2005-11-28 17:18
(Received via mailing list)
On Tue, 29 Nov 2005, Joel VanderWerf wrote:

> > watch out for?  ISTR reported problems when building other packages
> > with high -O values in the past.
> >
> > Would the answer be different for gcc-4.0.2?
>
> If gcc-4.x is an option, try it. Anecdotally, it's substantially faster

Yes, I'm unpacking the tarball now, as x.0.2 is sufficiently tried
and tested to be worth looking at, for my situation.

> than 3.x. In fact, it's one less reason for me to use msvc on windows:
> code compiled with gcc-4.0 (with -O2) turns out to be faster than msvc
> for some numerically intensive simulation code running as a ruby
> extension, whereas msvc was faster than gcc-3.x output code. YMMV.

thanks for that.  Still curious about dropping -g and bumping up to
-O3 though... :-)
>
        Hugh
bob.news (Guest)
on 2005-11-28 17:26
(Received via mailing list)
Joel VanderWerf wrote:
>> watch out for?  ISTR reported problems when building other packages
>> with high -O values in the past.
>>
>> Would the answer be different for gcc-4.0.2?
>
> If gcc-4.x is an option, try it. Anecdotally, it's substantially
> faster than 3.x. In fact, it's one less reason for me to use msvc on
> windows: code compiled with gcc-4.0 (with -O2) turns out to be faster
> than msvc for some numerically intensive simulation code running as a
> ruby extension, whereas msvc was faster than gcc-3.x output code.
> YMMV.

If Hugh is using ActiveRecord intensively with a database, then it's
most likely that he'll see no positive performance effect from compiling
it with more aggressive optimization.

In fact it's likely that careful optimization on the database side will
yield better results.  This can be as easy as creating some indexes, but
it might be much more complicated, depending on the bottleneck.  (Often
it's IO, and this might have several reasons, from suboptimal execution
plans to slow disks / controllers.)
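As a sketch of the easy case Robert mentions: in MySQL an index on a frequently searched column is a single statement. The table and column names below are made up for illustration; with ActiveRecord the statement would be sent via `ActiveRecord::Base.connection.execute`:

```ruby
# Build a CREATE INDEX statement for a column that queries filter or
# join on; "students" and "pnumber" are hypothetical names.
def create_index_sql(table, column)
  "CREATE INDEX index_#{table}_on_#{column} ON #{table} (#{column})"
end

puts create_index_sql("students", "pnumber")
# => CREATE INDEX index_students_on_pnumber ON students (pnumber)
```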

Kind regards

    robert
vjoel (Guest)
on 2005-11-28 17:34
(Received via mailing list)
Daniel Berger wrote:
> Joel VanderWerf wrote:
...
>> If gcc-4.x is an option, try it. Anecdotally, it's substantially faster
>> than 3.x. In fact, it's one less reason for me to use msvc on windows:
>> code compiled with gcc-4.0 (with -O2) turns out to be faster than msvc
>> for some numerically intensive simulation code running as a ruby
>> extension, whereas msvc was faster than gcc-3.x output code. YMMV.
>>
>
> Just out of curiosity, what options did you pass to cl when using MSVC?

The default flags generated by mkmf.rb:

CC = cl -nologo
CFLAGS   =  -MD -Zi -O2b2xg- -G6
CPPFLAGS = -I. -I$(topdir) -I$(hdrdir) -I$(srcdir)  -I. -I./..
-I./../missing

.c.obj:
	$(CC) $(CFLAGS) $(CPPFLAGS) -c -Tc$(<:\=/)

I've never played around with the optimization flags for msvc (partly
because msvc always seemed so much faster than gcc).
hgs (Guest)
on 2005-11-28 17:43
(Received via mailing list)
On Tue, 29 Nov 2005, Robert Klemme wrote:

> If Hugh is using ActiveRecord intensively with a database then it's most
> likely that he'll see no positive performance effect from compiling it with
> more aggressive optimization.
>
> In fact it's likely that careful optimization on the database side will
> yield better results.  This can be as easy as creating some indexes - but
> might be much more complicated - depending on the bottleneck.  (Often it's
> IO and this might have several reasons, from sub optimal execution plans
> to slow disks / controllers.)
>
At the moment my script to populate the tables is taking about an
hour.  Anyway it's mostly ruby I think, because it spends most of
the time setting up the arrays before it populates the db with them.

Besides that, I'm fairly new to database work, so I'm trying to
optimize what I know about before I start fiddling with the db.

Slow disks/controllers (+ lots of users) could be a factor, the
machine is 5.5 years old.

But those are good points. Thank you.

> Kind regards
>
>     robert
>
>
        Hugh
decoux (Guest)
on 2005-11-28 17:59
(Received via mailing list)
>>>>> "H" == Hugh Sasse <hgs@dmu.ac.uk> writes:

H> thanks for that.  Still curious about dropping -g and bumping up to
H> -O3 though... :-)

 You can try it, but don't forget this

moulon% CC="gcc -fomit-frame-pointer" ./configure  > /dev/null 2>&1
moulon% make > /dev/null
re.c: In function 'rb_memsearch':
re.c:121: warning: pointer targets in passing argument 1 of
'rb_memcicmp' differ in signedness
re.c:121: warning: pointer targets in passing argument 2 of
'rb_memcicmp' differ in signedness
re.c:129: warning: pointer targets in passing argument 1 of
'rb_memcicmp' differ in signedness
re.c:129: warning: pointer targets in passing argument 2 of
'rb_memcicmp' differ in signedness
regex.c: In function 'calculate_must_string':
regex.c:1014: warning: pointer targets in initialization differ in
signedness
regex.c:1015: warning: pointer targets in initialization differ in
signedness
regex.c:1029: warning: pointer targets in assignment differ in
signedness
regex.c: In function 'ruby_re_search':
regex.c:3222: warning: pointer targets in passing argument 1 of
'slow_search' differ in signedness
regex.c:3222: warning: pointer targets in passing argument 3 of
'slow_search' differ in signedness
regex.c:3222: warning: pointer targets in passing argument 5 of
'slow_search' differ in signedness
regex.c:2689: warning: pointer targets in passing argument 5 of
'slow_match' differ in signedness
regex.c:3227: warning: pointer targets in passing argument 1 of
'bm_search' differ in signedness
regex.c:3227: warning: pointer targets in passing argument 3 of
'bm_search' differ in signedness
string.c: In function 'rb_str_index_m':
string.c:1133: warning: pointer targets in initialization differ in
signedness
string.c: In function 'rb_str_rindex_m':
string.c:1255: warning: pointer targets in initialization differ in
signedness
string.c:1256: warning: pointer targets in initialization differ in
signedness
./lib/fileutils.rb:1257: [BUG] Segmentation fault
ruby 1.8.4 (2005-10-29) [i686-linux]

make: *** [.rbconfig.time] Aborted
moulon%




Guy Decoux
hgs (Guest)
on 2005-11-28 18:07
(Received via mailing list)
On Tue, 29 Nov 2005, ts wrote:

> re.c:121: warning: pointer targets in passing argument 1 of 'rb_memcicmp' differ in signedness
[ half a kilo of warnings :-) ]
> string.c:1256: warning: pointer targets in initialization differ in signedness
> ./lib/fileutils.rb:1257: [BUG] Segmentation fault
> ruby 1.8.4 (2005-10-29) [i686-linux]
>
> make: *** [.rbconfig.time] Aborted
> moulon%
>

Ah, not a good idea.  But dropping -g ought to speed things up a
little, I'd hope.
>
> Guy Decoux
>
>
        Thank you,
        Hugh
nightphotos (Guest)
on 2005-11-28 18:07
(Received via mailing list)
Hi Joel,

> code compiled with gcc-4.0 (with -O2) turns out to be faster than msvc
> for some numerically intensive simulation code running as a ruby
> extension, whereas msvc was faster than gcc-3.x output code. YMMV.

Was this with MSVC 7.1 or 8.0?

Thanks,

Wayne Vucenic
No Bugs Software
"Ruby and C++ Agile Contract Programming in Silicon Valley"
decoux (Guest)
on 2005-11-28 18:07
(Received via mailing list)
>>>>> "H" == Hugh Sasse <hgs@dmu.ac.uk> writes:

H>         [ half a kilo of warnings :-) ]

 gcc 4.0.2, the warnings were for matz :-)

H> Ah, not a good idea.  But dropping -g ought to speed things up a
H> little, I'd hope.

 If you drop -g, it can add -fomit-frame-pointer with -O2 :-)

Guy Decoux
hgs (Guest)
on 2005-11-28 18:11
(Received via mailing list)
On Tue, 29 Nov 2005, ts wrote:

> >>>>> "H" == Hugh Sasse <hgs@dmu.ac.uk> writes:
>
> H>         [ half a kilo of warnings :-) ]
>
>  gcc 4.0.2, the warnings were for matz :-)
>
> H> Ah, not a good idea.  But dropping -g ought to speed things up a
> H> little, I'd hope.
>
>  If you drop -g, it can add -fomit-frame-pointer with -O2 :-)

Good job I asked! :-) Thank you.  I'll carry on with my build of new
[binutils, bison, gcc] then.
>
> Guy Decoux
>
        Hugh
chneukirchen (Guest)
on 2005-11-28 19:00
(Received via mailing list)
Hugh Sasse <hgs@dmu.ac.uk> writes:

>> to slow disks / controllers.)
>>
> At the moment my script to populate the tables is taking about an
> hour.  Anyway it's mostly ruby I think, because it spends most of
> the time setting up the arrays before it populates the db with them.

Do you use transactions correctly?
hgs (Guest)
on 2005-11-28 19:08
(Received via mailing list)
On Tue, 29 Nov 2005, Christian Neukirchen wrote:

> Hugh Sasse <hgs@dmu.ac.uk> writes:
>
> > On Tue, 29 Nov 2005, Robert Klemme wrote:
> >
        [...]
> >> In fact it's likely that careful optimization on the database side will
> >> yield better results.  This can be as easy as creating some indexes - but
        [...]
> > At the moment my script to populate the tables is taking about an
> > hour.  Anyway it's mostly ruby I think, because it spends most of
> > the time setting up the arrays before it populates the db with them.
>
> Do you use transactions correctly?

I've only got one process accessing the db at the moment.  If you've
got pointers on common errors, misconceptions, etc I'd be glad to
learn about them.   So I suppose that answer is likely to be "no"
:-)
>
> > Besides that, I'm fairly new to database work, so I'm trying to
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > optimize what I know about before I start fiddling with the db.
> >

        Thank you,
        Hugh
chneukirchen (Guest)
on 2005-11-28 20:57
(Received via mailing list)
Hugh Sasse <hgs@dmu.ac.uk> writes:

> learn about them.   So I suppose that answer is likely to be "no"
> :-)

If you need to INSERT bigger chunks of data, put it in a transaction
so it will write the data to disk only once.  If you need to insert
even bigger amounts of data, using COPY (usage depends on your
database) can speed things up a lot too.  It may be helpful to add
indexes after importing the data.
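A sketch of the idiom Christian describes, assuming an ActiveRecord model named `Student` (the model name is an assumption). The ActiveRecord calls need a live database, so they appear only as comments; the batching itself is plain Ruby:

```ruby
# Group rows into fixed-size batches; each batch would then be written
# inside one transaction, so the database syncs to disk once per batch
# instead of once per row.  With ActiveRecord (needs a database):
#
#   in_batches(rows).each do |batch|
#     Student.transaction do
#       batch.each { |attrs| Student.create(attrs) }
#     end
#   end
def in_batches(rows, batch_size = 500)
  rows.each_slice(batch_size).to_a
end

p in_batches((1..5).to_a, 2)  # => [[1, 2], [3, 4], [5]]
```

Batching also keeps any single transaction from growing unboundedly large when the import runs to millions of rows.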
hgs (Guest)
on 2005-11-28 21:29
(Received via mailing list)
On Tue, 29 Nov 2005, Christian Neukirchen wrote:

> > I've only got one process accessing the db at the moment.  If you've
> > got pointers on common errors, misconceptions, etc I'd be glad to
> > learn about them.   So I suppose that answer is likely to be "no"
> > :-)
>
> If you need to INSERT bigger chunks of data, put it in a transaction
> so it will write the data to disk only once.  If you need to insert
> even bigger amounts of data, using COPY (usage depends on your
> database) can speed things up a lot too.  It may be helpful to add
> indexes after importing the data.

OK, thanks.
>
        Hugh
Jonathan <zjll9@imail.etsu.edu> (Guest)
on 2005-11-28 22:13
hgs wrote:
> On Tue, 29 Nov 2005, Christian Neukirchen wrote:
>
>> > I've only got one process accessing the db at the moment.  If you've
>> > got pointers on common errors, misconceptions, etc I'd be glad to
>> > learn about them.   So I suppose that answer is likely to be "no"
>> > :-)
>>
>> If you need to INSERT bigger chunks of data, put it in a transaction
>> so it will write the data to disk only once.  If you need to insert
>> even bigger amounts of data, using COPY (usage depends on your
>> database) can speed things up a lot too.  It may be helpful to add
>> indexes after importing the data.
>
> OK, thanks.
>>
>         Hugh

Temporarily disabling constraints (UNIQUE and FOREIGN_KEY) will improve
performance too if you are sure the data is safe.
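A hedged sketch of Jonathan's suggestion for MySQL (the table name is made up; each statement would be sent with `ActiveRecord::Base.connection.execute`). `ALTER TABLE ... DISABLE KEYS` defers non-unique index maintenance on MyISAM tables, and `foreign_key_checks` / `unique_checks` are MySQL session variables:

```ruby
# Statements to bracket a bulk load: relax checks before, restore after.
# Only safe when you already trust the incoming data.
def bulk_load_bracket(table)
  before = [
    "SET unique_checks = 0",
    "SET foreign_key_checks = 0",
    "ALTER TABLE #{table} DISABLE KEYS"
  ]
  after = [
    "ALTER TABLE #{table} ENABLE KEYS",
    "SET foreign_key_checks = 1",
    "SET unique_checks = 1"
  ]
  [before, after]
end

before, after = bulk_load_bracket("students")
puts before
```

Restoring the settings in the reverse order of disabling them keeps the bracket symmetric if one of the statements fails partway.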
vjoel (Guest)
on 2005-11-29 05:42
(Received via mailing list)
Wayne Vucenic wrote:
> Hi Joel,
>
>
>>code compiled with gcc-4.0 (with -O2) turns out to be faster than msvc
>>for some numerically intensive simulation code running as a ruby
>>extension, whereas msvc was faster than gcc-3.x output code. YMMV.
>
>
> Was this with MSVC 7.1 or 8.0?

I'm embarrassed to say it was 6.0. I have 8.0 (express) but can't get
past the "MSVCR80.DLL missing" problem, at least with the
mkmf.rb-generated Makefile. (For regular projects in MSVC 8.0, you can
get around this problem by deleting the foobar.exe.embed.manifest.res
file from the Debug dir of project foobar.) Anyone have any ideas?

I'm using the single-click installer ruby, which IIRC is compiled with
7.1. Maybe it's not a fair comparison with gcc-built ruby, since that
will take advantage of i686 vs. i386. So, not a very scientific
comparison at all; it would be best to use the latest MS compiler, build
ruby from scratch, and make sure to use the same arch settings as for
gcc.

I'm just glad to see that gcc is so much better than it was.
bob.news (Guest)
on 2005-11-29 09:24
(Received via mailing list)
Hugh Sasse wrote:
>> from sub optimal execution plans to slow disks / controllers.)
>>
> At the moment my script to populate the tables is taking about an
> hour.  Anyway it's mostly ruby I think, because it spends most of
> the time setting up the arrays before it populates the db with them.

How did you measure that?

> Besides that, I'm fairly new to database work, so I'm trying to
> optimize what I know about before I start fiddling with the db.

Um, although I can understand your wariness with regard to the unknown,
you may completely waste your time.  IMHO you should first determine the
cause of the slowness and then find a solution.  If you optimize
something that just takes 10% of the whole running time, you'll never
see an improvement of more than 10%...

Another option to get masses of data into a database is to use some
form of bulk insert / bulk load.  Depending on your database there are
probably several options.

> Slow disks/controllers (+ lots of users) could be a factor, the
> machine is 5.5 years old.

Could be.  If possible, give it at least some more memory.

Kind regards

    robert
hgs (Guest)
on 2005-11-29 11:13
(Received via mailing list)
On Tue, 29 Nov 2005, Robert Klemme wrote:

> >> bottleneck.  (Often it's IO and this might have several reasons,
> >> from sub optimal execution plans to slow disks / controllers.)
> >>
> > At the moment my script to populate the tables is taking about an
> > hour.  Anyway it's mostly ruby I think, because it spends most of
> > the time setting up the arrays before it populates the db with them.
>
> How did you measure that?

By eye! :-)  The code doesn't access the database at all until the last
part, and it doesn't get there till about 45 mins.  But to be
honest, this is so slow it isn't worth benchmarking to get the
milliseconds.
     555    1676   17179 /home/hgs/csestore_meta/populate_tables2.rb
I could post the script if you like. I've not profiled it to find
out where the slow bits are because it would take about 5 hours
going by previous slowdowns when profiling.
>
> > Besides that, I'm fairly new to database work, so I'm trying to
> > optimize what I know about before I start fiddling with the db.
>
> Um, although I can understand your wariness with regard to the unknown -
> you may completely waste your time.  IMHO you should first determine the
> cause of the slowness and then find a solution.  If you optimize something
> that just takes 10% of the whole running time you'll never see an
> improvement of more than 10%...

This is true.  This script isn't run very often though, so if it takes
an hour I can live with it.  Taking 5 hours to profile it once would be
too long.
>
> Another option to get masses of data into a database is to use some form
> of bulk insert / bulk load.  Depending on your database there are probably
> several options.

MySQL.  Part of the problem is that this script is also for
updating, based on new data.  If the db is empty it just inserts,
else it updates.  Easy enough in ActiveRecord.
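The insert-or-update pattern Hugh mentions might look like this in the ActiveRecord of the day (model and column names are assumptions, and the dynamic finder needs a database, so it is shown as a comment). The runnable part below models the same upsert shape with a plain Hash standing in for the table:

```ruby
# With ActiveRecord (hypothetical model/column, needs a database):
#
#   student = Student.find_by_pnumber(pnumber) ||
#             Student.new(:pnumber => pnumber)
#   student.attributes = attrs
#   student.save
#
# The same insert-or-update shape against an in-memory Hash:
def upsert(table, key, attrs)
  table[key] = (table[key] || {}).merge(attrs)
end

db = {}
upsert(db, "P1234567", :forename => "Hugh")   # insert
upsert(db, "P1234567", :surname  => "Sasse")  # update, keeps :forename
p db["P1234567"]
```

Each `find` before an `insert`/`update` is an extra round trip per row, which is part of why this style is slow for bulk loads.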
>
> > Slow disks/controllers (+ lots of users) could be a factor, the
> > machine is 5.5 years old.
>
> Could be.  If possible, give it at least some more memory.

Thanks.  I'm trying to get something done about that.
>
> Kind regards
>
>     robert
>
        Thank you,
        Hugh
bob.news (Guest)
on 2005-11-29 14:16
(Received via mailing list)
Hugh Sasse wrote:
>>>> will yield better results.  This can be as easy as creating some
> By eye! :-)  The code doesn't access the database at all until the
> last part, and it doesn't get there till about 45 mins.  But to be
> honest, this is so slow it isn't worth benchmarking to get the
> milliseconds.

Wow!  In that case it certainly seems to make sense to optimize that.
Did you keep an eye on memory consumption and disk IO?  Could well be
that the sheer amount of data (and thus memory) slows your script down.

>      555    1676   17179 /home/hgs/csestore_meta/populate_tables2.rb
> I could post the script if you like. I've not profiled it to find
> out where the slow bits are because it would take about 5 hours
> going by previous slowdowns when profiling.

Unfortunately we're close to release and I don't really have much time
to look into this deeper.  If anyone else volunteers...

> MySQL.  Part of the problem is that this script is also for
> updating, based on new data.  If the db is empty it just inserts,
> else it updates.  Easy enough in ActiveRecord.

Ok, bad for bulk loading.

Kind regards

    robert
hgs (Guest)
on 2005-11-29 14:24
(Received via mailing list)
On Tue, 29 Nov 2005, Robert Klemme wrote:

> > By eye! :-)  The code doesn't access the database at all until the
> > last part, and it doesn't get there till about 45 mins.  But to be
> > honest, this is so slow it isn't worth benchmarking to get the
> > milliseconds.
>
> Wow!  In that case it certainly seems to make sense to optimize that.  Did
> you keep an eye on memory consumption and disk IO?  Could well be that the

Not really.  The machine is 5.5 years old.  Rough Moore's law calc
shows it would take about 5 mins on a modern machine.

> sheer amount of data (and thus memory) slows your script down.
>
> >      555    1676   17179 /home/hgs/csestore_meta/populate_tables2.rb
> > I could post the script if you like. I've not profiled it to find
> > out where the slow bits are because it would take about 5 hours
> > going by previous slowdowns when profiling.
>
> Unfortunately we're close to release and I don't really have much time to
> look into this deeper.  If anyone else volunteers...

I'm profiling it now, while I wait for GCC-4.0.2 to cook.  Already
been snookered by needing gmp and mpfr for the fortran, then I found
I need autogen to build without the fortran, which needs guile, so
I've got those last two.  [cf "The Gas Man Cometh"]
>
> > MySQL.  Part of the problem is that this script is also for
> > updating, based on new data.  If the db is empty it just inserts,
> > else it updates.  Easy enough in ActiveRecord.
>
> Ok, bad for bulk loading.

I think so.
>
> Kind regards
>
>     robert

        Thank you,
        Hugh
zdennis (Guest)
on 2005-11-29 20:08
(Received via mailing list)
Hugh Sasse wrote:
>>IO and this might have several reasons, from sub optimal execution plans
>>to slow disks / controllers.)
>>
>
> At the moment my script to populate the tables is taking about an
> hour.  Anyway it's mostly ruby I think, because it spends most of
> the time setting up the arrays before it populates the db with them.
>

I had similar problems with ActiveRecord and large datasets. It's slow.
I wrote an ActiveRecord extension (I haven't released it yet; I am
trying to figure out how best to release it... as a plugin, or as a
patch to the rubyonrails dev team). It makes large dataset entry 10
times faster. If you like, I can email you the code privately and see
if it helps you.

Zach
hgs (Guest)
on 2005-11-29 20:16
(Received via mailing list)
On Wed, 30 Nov 2005, zdennis wrote:

> privately the code and see if it helps you.
How big is it?  I'm wondering how much there is to learn, since I'm
still getting to grips with all this Rails stuff :-).  I am
interested though.  Thank you.
>
> Zach
>
        Hugh
Eric Christensen (Guest)
on 2005-11-30 05:54
vjoel wrote:
>
> I'm embarrassed to say it was 6.0. I have 8.0 (express) but can't get
> past the "MSVCR80.DLL missing" problem, at least with the
> mkmf.rb-generated Makefile. (For regular projects in MSVC 8.0, you can
> get around this problem by deleting the foobar.exe.embed.manifest.res
> file from the Debug dir of project foobar.) Anyone have any ideas?
>
> I'm using the single-click installer ruby, which IIRC is compiled with
> 7.1. Maybe it's not a fair comparison with gcc-built ruby, since that
> will take advantage of i686 vs. i386. So, not a very scientific
> comparison at all--it would best to use the latest MS compiler, build
> ruby from scratch, and make sure to use the same arch settings as for
> gcc.
>
> I'm just glad to see that gcc is so much better than it was.

I'd be very interested in seeing it compiled with the just-released VC
using LTCG (link-time code generation). It can make inter-module
optimizations and adjust calling conventions on a case-by-case basis.
vjoel (Guest)
on 2005-11-30 06:14
(Received via mailing list)
Eric Christensen wrote:
>>will take advantage of i686 vs. i386. So, not a very scientific
>
I'd love to try it. Any idea how to hack the Makefiles generated by
mkmf.rb to fix the "MSVCR80.DLL missing" problem? Can you compile any
ruby extension successfully with 8.0?

Do you know whether LTCG is included in the Express version? ISTR that
some optimization features (maybe profile-guided optimization) were not
in the express edition. However, the Property Page for
Linker/Optimization seems to allow setting LTCG and PGO.
hgs (Guest)
on 2005-11-30 17:31
(Received via mailing list)
On Tue, 29 Nov 2005, Robert Klemme wrote:

> Hugh Sasse wrote:
> >      555    1676   17179 /home/hgs/csestore_meta/populate_tables2.rb
> > I could post the script if you like. I've not profiled it to find
> > out where the slow bits are because it would take about 5 hours
> > going by previous slowdowns when profiling.

Well that was an interesting estimate. So far it has been 29 hours
to profile it...

        Hugh
Eric Christensen (Guest)
on 2005-11-30 20:33
vjoel wrote:
> I'd love to try it. Any idea how to hack the Makefiles generated by
> mkmf.rb to fix the "MSVCR80.DLL missing" problem? Can you compile any
> ruby extension successfully with 8.0?
>
> Do you know that LTCG is included in the Express version? ISTR that some
> optimization features (maybe profile guided optimization) were not in
> the express edition. However, the Property Page for Linker/Optimization
> seems to allow setting LTCG and PGO.

I'm *very* new to Ruby, so I haven't really tried anything yet. But I
just got a copy of VS 2005 Pro, so I'm itchin' to try. Now if I could
just find some spare time...

Do you know if there is a benchmark that would give an idea of the
relative speed of the interpreter over a representative workload, or
could perhaps be used as the scenario for the profile-guided
optimization?

One more question: is there a 64-bit Ruby?
eule (Guest)
on 2005-12-02 12:11
(Received via mailing list)
(In response to news:3v329qF1337dkU1@individual.net by Robert Klemme)

> Unfortunately we're close to release and I don't really have much time to
> look into this deeper.  If anyone else volunteers...

Please, Hugh, post the script or at least parts of it. I have been doing
lots of database filling recently and might be able to give you a few
pointers. Here are the general ones:

Profiling is often no use; benchmarking might just be enough. Roughly
knowing where the time goes is a big help in guiding optimization.
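For the coarse, stage-level timing Kaspar has in mind, the standard `benchmark` library is enough. The stages below are stand-ins for the real parse/build phases of Hugh's script:

```ruby
require "benchmark"

# Time each coarse stage separately instead of profiling every call.
# The workloads here are placeholders for the real parse/build stages.
parse_time = Benchmark.realtime do
  100_000.times { "a|b|c|d".split("|") }
end
build_time = Benchmark.realtime do
  100_000.times { Array.new(4) { |i| i * i } }
end

printf "parse: %.3fs  build: %.3fs\n", parse_time, build_time
```

Two or three such numbers usually point at the dominant stage immediately, without the heavy slowdown of `ruby -rprofile`.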

Be sure to try running it with a small data set, to speed up the test-
change-test cycle (or whatever the cycle is called in languages that you
don't compile). Maybe even profile that way.

Do you have a generation stage followed by a fill stage?  Or is the
computation intermingled with database accesses?

hope to be of help
k
hgs (Guest)
on 2005-12-02 13:36
(Received via mailing list)
On Fri, 2 Dec 2005, Kaspar Schiess wrote:

> (In response to news:3v329qF1337dkU1@individual.net by Robert Klemme)
>
> > Unfortunately we're close to release and I don't really have much time to
> > look into this deeper.  If anyone else volunteers...
>
> Please Hugh, post the script or at least parts of it. I have been doing
> lots of database filling recently and might be able to give you a few
> pointers. Here are the general ones:

I don't think it is particularly pretty.  TableMaker used to
generate SQL directly; now it uses AR instead, so the output file is
unused.  I've tried to clear out the unused stuff that is of no use
to you but that I may still need.

        Thank you
        Hugh

#!/usr/local/bin/ruby -w

$: << '/home/hgs/aeg_intranet/csestore/app/models'

# require 'csv'
require 'set'
require 'open-uri'
require 'net/http'
require 'date'
require 'md5'
# require 'hashattr'
require 'fasthashequals'

require "rubygems"
require_gem "activerecord"  # for the ORM.

# Makes no sense to include these before active_record
# These are just (almost empty) models from rails.  There are some
# relationship definitions (has_a, etc) but that's about it.
require 'student'
require 'cse_module'
require 'device'

$debug = false


# Class for creating the database tables from the supplied input
class TableMaker
  attr_accessor :students, :cse_modules
  INPUT = "hugh.csv"
  OUTPUT = "populate_tables.sql"

  ACCEPTED_MODULES =
/^\"TECH(100[1-7]|200\d|201[01]|300\d|301[0-2])|MUST100[28]/

  STRFTIME_FORMAT = "%a, %d %b %Y %H:%M:%S GMT"

  PATH_TO_IMAGES = 'Z:\\new\\jpegs\\'

  # Read in the database and populate the tables.
  def initialize(input=INPUT, output=OUTPUT)
    begin
      puts "TableMaker.initialize(input=#{input.inspect}, output=#{output.inspect})"
      # check these agree
      # Struct.new( "Student", :forename, :surname, :birth_dt,
      #             :picture, :coll_status)
      # Struct.new("Ident", :student, :pnumber)

      # Struct.new("CourseModule", :aos_code, :dept_code,
      #            :aos_type, :full_desc)

      # Struct.new("StudentModule", :student_id, :course_module)

      @students = Set.new()
      @cse_modules = Set.new()
      @student_modules = Hash.new { |hash, key| hash[key] = Set.new } # store each new Set so later #add calls persist
      # Most images will be written in bulk so cache them
      @web_timestamps = Hash.new()
      # Initialize variables
      forename, surname, birth_dt, pnumber, aos_code,
        acad_period, stage_ind, dept_code, stage_code, aos_type,
        picture, coll_status, full_desc = [nil] * 13

      student, cse_module, ident = nil, nil, nil
      record = nil

      last_pnumber, last_aos_code = nil, nil
      last_student, last_cse_module = nil, nil

      open(input, 'r') do |infp|
        while record = infp.gets
          # record.strip!
          puts "record is #{record}" if $debug
          # Don't split off the rest till we need it.
          # Hopefully splitting on strings is faster.
          forename, surname, birth_dt,
            pnumber, aos_code, the_rest =  record.split(/\s*\|\s*/,6)

          next unless aos_code =~ ACCEPTED_MODULES

          forename, surname, birth_dt, pnumber, aos_code,
            acad_period, stage_ind, dept_code, stage_code, aos_type,
            picture, coll_status, full_desc = record.split(/\s*\|\s*/)


          puts "from record, picture is [#{picture.inspect}]." if $debug

          if pnumber == last_pnumber
            student = last_student
            puts "pnumber set to last_pnumber" if $debug
          else
            # Structures for student
            student = Student.new(
                                  :forename => forename,
                                  :surname => surname,
                                  :birth_dt => birth_dt,
                                  :pnumber => pnumber,
                                  :picture => picture,
                                  :coll_status => coll_status
                                 )

            # Avoid duplicates
            # unless @students.include? student
            @students.add student
            # else
            #   puts "Already seen #{student}" if $debug
            # end
            last_pnumber = pnumber
            last_student = student
          end


          # Structures for module data.
          if aos_code == last_aos_code
            this_cse_module = last_cse_module
          else
            this_cse_module = CseModule.new(
                                            :aos_code => aos_code,
                                            :dept_code => dept_code,
                                            :aos_type => aos_type,
                                            :full_desc => full_desc
                                           )
          end

          # Avoid duplicates
          @cse_modules.add this_cse_module
          last_cse_module = this_cse_module
          @student_modules[student].add this_cse_module

          puts "cse_module is #{this_cse_module}" if $debug

        end
      end
    rescue
      puts "\n"
      puts $!
      puts $!.backtrace.join("\n")
    end
  end

  def has_student?(given_student)
    result =  @students.member?(given_student)
    puts "has_student?: @students.size is #{@students.size}, result is #{result}"
    return result
  end

  def diff_students(other_table)
    diff_students = @students - other_table.students
    return Set.new(diff_students)
  end

  # The pnumber is a barcode that uniquely identifies a student.
  def has_pnumber?(apnumber)
    # @students holds Student objects, so compare each student's pnumber.
    return @students.any? { |student| student.pnumber == apnumber }
  end

  def new_pnumber(old_table)
    new_pnumbers = @pnumbers.reject do |pn|
      old_table.has_pnumber?(pn)
    end
    return Set.new(new_pnumbers)
  end

  # Convert the picture to a URI and get it, if necessary.
  # moved out of make_cards to shorten that function.
  def get_picture(pic_name)
    pic =  "#{pic_name}"
    pic.gsub!(/\"/,'')
    pic.gsub!(/ /, "%20")
    url = pic.dup
    puts "pic is #{pic.inspect}\nurl is #{url.inspect}" # if $debug
    pic.sub!(/^.*\//,'')
    puts "pic is now #{pic.inspect}" # if $debug
    if pic.empty?
      puts "No such picture " if $debug
    elsif pic =~ /^Z:\\/i
      puts "Already got this " if $debug
    else
      Dir.chdir("./images") do
        begin
          grab = true
          url =~ /^http:\/\/([^:\/]+):?([^\/]*?)(.*)/
          host, port, path = $1, $2, $3
          port = 80 if port.nil? or port.empty?
          puts "pic #{pic}:- host #{host} port #{port} path #{path} " # if $debug
          Net::HTTP.start(host, port) do |http|
            header = http.head(path)
            lastmod = header['last-modified']
            # timestamp = DateTime.strptime(lastmod, STRFTIME_FORMAT)
            # timestamp = Time.new(DateTime.strptime(lastmod, STRFTIME_FORMAT))
            lastmod ||= Time.now.to_s
            timestamp = (@web_timestamps[lastmod] ||= Time.parse(lastmod))

            if File.exist?(pic)
              mtime = File.mtime(pic)
              puts "mtime #{mtime} timestamp #{timestamp}" if $debug
              if mtime > timestamp
                puts "file is newer, skip." if $debug
                grab = false
              end
            end
            if grab
              open(pic, "wb") do |image|
                image.print http.get(path).body
              end
            end
          end
        rescue => e
          puts e.inspect
          puts "\n"
          puts "#{$!}, #{e}"
          puts $!.backtrace().join("\n")
        end
      end
    end
    return  PATH_TO_IMAGES + pic + "\r\n"
  end


  # Output all the data necessary to create the id cards.
  def make_cards(output,the_students = @students)
    personal_fields = [:forename, :surname, :birth_dt, :pnumber]
    open(output, "w") do |outf|
      the_students.each do |student|
        puts "student:- #{student} :" if $debug
        outstring = personal_fields.collect do |message|
          # Remove unwanted quotation marks
          "#{student.send(message)}, ".gsub(/"/,'')
        end.join('')
        # We need to iterate in case a student has two ids
        # Not any more -- we know they will look like two students.
        # It doesn't matter.

        outstring += get_picture(student.picture)
        outf.print outstring
      end
    end
  end

  # Cannot update the database til the comparison is complete, so
  # this code must be moved into here
  def update_database
    @students.each do |student|
      puts "update_database(): pnumber is #{student.pnumber}"
      begin
        orig_student = Student.find(:first, :conditions => ["pnumber = ?", student.pnumber])
        puts "update_database(): orig_student.pnumber is #{orig_student.pnumber}"
      rescue Exception => e
        puts "update_database(): exception is #{e}"
        puts "\n"
        puts $!
        puts $!.backtrace.join("\n")
        puts "\n"
        orig_student = nil
      end
      if orig_student.nil? # i.e. nothing found
        student.save!
      else
        orig_student.update_attributes(
                                       :surname => student.surname,
                                       :birth_dt => student.birth_dt,
                                       :picture => student.picture,
                                       :coll_status => student.coll_status
                                      )
      end
    end
    @cse_modules.each do |cse_module|
      orig_cse_module = CseModule.find(:first, :conditions => ['aos_code = ?', cse_module.aos_code]) rescue nil
      if orig_cse_module.nil?
        cse_module.save!
      else
        orig_cse_module.update_attributes(
                                          :dept_code => cse_module.dept_code,
                                          :aos_type => cse_module.aos_type,
                                          :full_desc => cse_module.full_desc
                                         )
      end
    end
    # This next line should sort out the join table.
    @student_modules.each do |student, modules|
      the_student = Student.find(:first, :conditions => ['pnumber = ?', student.pnumber])
      modules.each do |cse_module|
        # Note: the original had a typo here (:conditons), which made
        # find ignore the condition entirely.
        the_cse_module = CseModule.find(:first, :conditions => ['aos_code = ?', cse_module.aos_code])
        puts "update_database(): updating #{the_cse_module} with #{the_student}"
        the_cse_module.students << the_student
      end
    end
  end
end

class KitTableMaker

  def initialize(input)
    # create outside the block for speed.
    name, serialno, barcode = [nil]*3
    @kit = Set.new()
    barcodes = Set.new()

    open(input, 'r') do |infp|
      while record = infp.gets
        name, serialno, barcode = record.split(/\s*,\s*/,3)
        if barcodes.member?(barcode)
          puts "Duplicate barcode #{barcode}"
        else
          device = Device.new(:description => name,
                              :serialno => serialno,
                              :barcode => barcode)
          barcodes.add(barcode)
          @kit.add device
        end
      end
    end
  end


  def update_database
    @kit.each do |device|
      begin
        orig_kit = Device.find(:first, :conditions => ["barcode = ?", device.barcode])
      rescue Exception => e
        puts "Device::update_database: exception is #{e}"
        puts "\n", $!, $!.backtrace.join("\n"), "\n"
      end
      if orig_kit.nil?
        device.save!
      else
        begin
          # Device has a :description attribute, not :name.
          orig_kit.update_attributes(:description => device.description,
                                     :serialno => device.serialno,
                                     :barcode => device.barcode)
        rescue Exception => e
          puts "Device::update_database: exception is #{e}"
          puts "\n", $!, $!.backtrace.join("\n"), "\n"
        end
      end
    end
  end
end


if __FILE__  == $0
  begin
    ActiveRecord::Base.establish_connection(
                                            :adapter => 'mysql',
                                            :host => 'localhost',
                                            :port => 3608,
                                            :database => 'csestore_development',
                                            :username => 'hgs',
                                            :password => 'post-it-to-ruby-talk?'
                                           )
    new_table = TableMaker.new("hugh.csv", "update_tables.sql")
    new_table.update_database()
    old_table = TableMaker.new("hugh.csv.old")

    new_table.make_cards("cards.out")
    new_table.make_cards("new_cards.out", new_table.diff_students(old_table))
  rescue Exception => e
    puts "\n"
    puts "#{$!}, #{e}"
    puts $!.backtrace().join("\n")
  end

  device_table = KitTableMaker.new("stock1.csv")
  device_table.update_database()
end
6e32b16ec35070a346dd4e08799589e5?d=identicon&s=25 eule (Guest)
on 2005-12-04 15:45
(Received via mailing list)
Hello Hugh,

I'd propose modifying your main logic as follows:

    require 'benchmark'
    include Benchmark

    puts measure { new_table = TableMaker.new("hugh.csv", "update_tables.sql") }
    puts measure { new_table.update_database() }
    puts measure { old_table = TableMaker.new("hugh.csv.old") }

    puts measure { new_table.make_cards("cards.out") }
    puts measure { new_table.make_cards("new_cards.out", new_table.diff_students(old_table)) }

and then running it with a reduced test set. That should give you a hint
as to where time is spent. I have read the code you posted, but cannot
find a performance hog in it. Perhaps you meant to say 'huge.csv'
instead of 'hugh.csv'? How many students are there? How many courses?
How many courses per student, on average?

Also, I assume you know that fetching the image files from http can
potentially be very slow. To speed that up, you could parallelize the
process by using a queue, a few workers and a stub image that you can
return.
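
The queue-and-workers idea can be sketched like this. It is a minimal
illustration, not code from Hugh's script: the block passed to
`fetch_all` stands in for the Net::HTTP download in get_picture, and
the method name, URLs, and worker count are all hypothetical.

```ruby
require 'thread'

# Sketch: drain a shared Queue of URLs with a small pool of worker
# threads, so one slow download no longer blocks the rest.
def fetch_all(urls, worker_count = 4)
  queue = Queue.new
  urls.each { |u| queue << u }

  results = {}
  mutex   = Mutex.new

  workers = Array.new(worker_count) do
    Thread.new do
      begin
        # Non-blocking pop raises ThreadError once the queue is empty.
        while url = queue.pop(true)
          body = yield(url)                       # the slow fetch
          mutex.synchronize { results[url] = body }
        end
      rescue ThreadError
        # queue drained; this worker is done
      end
    end
  end
  workers.each { |t| t.join }
  results
end

# Usage with a stub fetcher in place of the real HTTP request:
pics = fetch_all(["a.jpg", "b.jpg", "c.jpg"], 2) { |u| "data-for-#{u}" }
```

In the real script the block would contain the Net::HTTP.start call,
and the stub image you mention could be returned from the rescue clause
when a download fails.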

Or you can of course just wait for the machine ;) .. Too bad Moore's law
doesn't say that you actually get a new machine every 18 months, only
that it is available.

best greetings,
kaspar
3fd17cac06aeb4d0b087b1b0c7a94c73?d=identicon&s=25 Eric Christensen (Guest)
on 2005-12-10 00:49
Eric Christensen wrote:
>
> I'm *very* new to Ruby, so I haven't really tried anything yet. But I
> just got a copy of VS 2005 Pro, so I'm itchin' to try. Now if I could
> just find some spare time...
>
I finally got miniruby built with VC 2005 & /GL: it seems to run about
10% faster than the 1.8.3 Windows drop.
47b1910084592eb77a032bc7d8d1a84e?d=identicon&s=25 vjoel (Guest)
on 2005-12-11 04:05
(Received via mailing list)
Eric Christensen wrote:
> Eric Christensen wrote:
>
>>I'm *very* new to Ruby, so I haven't really tried anything yet. But I
>>just got a copy of VS 2005 Pro, so I'm itchin' to try. Now if I could
>>just find some spare time...
>>
>
> I finally got miniruby built with VC 2005 & /GL: it seems to run about
> 10% faster than the 1.8.3 Windows drop.
>

What did you do to get miniruby to build?

What's preventing a full ruby build?