I am thinking of doing a ‘side by side’ distro of Ruby that includes the latest SVN updates, as well as some ‘fringe’ best practices, like a tweaked GC.
It would have the ability to do force_recycle on arbitrary objects (at your own risk), getters and setters for the GC variables (like how often to collect, how close you are to the next collection, how big the heap blocks are, etc.), and a GC that is copy-on-write friendly (it takes barely any longer, but doesn’t dirty memory).
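To make that concrete, the knobs I have in mind might read something like this (a sketch of the proposed interface only; none of these Ruby-level methods exist in stock MRI today, and the names are made up):

  # Hypothetical tuning API -- a sketch of the proposal, not working code.
  GC.malloc_limit = 16_000_000       # bytes malloc'd before a collection is forced
  GC.heap_slots_increment = 10_000   # grow the heap in fixed-size chunks
  puts GC.slots_until_next_gc        # how close we are to the next collection

  tmp = SomeBigIntermediate.new      # made-up class standing in for scratch data
  # ... use tmp ...
  GC.force_recycle(tmp)              # hand the slot straight back, at your own risk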
And any other personal tweaks that people contribute. Kind of a
bleeding edge Ruby.
Would that be useful to anyone? Would anyone use it?
Thanks and take care.
-Roger
On 09.11.2007 21:28, Roger P. wrote:
And any other personal tweaks that people contribute. Kind of a
bleeding edge Ruby.
Would that be useful to anyone? Would anyone use it?
Thanks and take care.
Personally, if I had the resources to invest into this I’d rather spend
them on JRuby. You get a GC with many tweaking options etc. plus native
threads.
Kind regards
robert
Roger P. wrote the following on 09.11.2007 21:28 :
And any other personal tweaks that people contribute. Kind of a
bleeding edge Ruby.
Would that be useful to anyone? Would anyone use it?
Useful? Yes, it would: I have some Rails actions that eat memory quite happily (associations with hundreds of thousands of objects, which themselves have associations to work on…). It would help if the Ruby processes would let go of the memory, or at least let it live in swap undisturbed, once the action is done.
Would I use it? Probably. I’ll have to find time to both review your patch myself and stress-test it (I prefer to test it and understand it first-hand, because I suppose it won’t be used by many). Would the patch be easy to understand for someone familiar with Ruby, GC techniques, and C? Or is prior knowledge of Ruby’s internals a must?
Regards,
Lionel
Robert K. wrote the following on 09.11.2007 22:05 :
Personally, if I had the resources to invest into this I’d rather
spend them on JRuby. You get a GC with many tweaking options etc.
plus native threads.
Please don’t forget that many gems still don’t work or don’t have replacements in JRuby. JRuby is the solution for people needing easy Ruby <-> Java integration, but Ruby with strong Unix ties has its benefits too.
I think I’d have to spend quite some time migrating applications from MRI to JRuby: I heavily use ruby-gettext, hpricot, memcache, and ruby-opengl, and I believe most of these use C for library interfaces or performance… Some utilities like rcov probably don’t work with JRuby either, because they rely on the same C interface.
So as much as I’d like JRuby to succeed even if I don’t use it myself (currently), people willing to work on MRI (or YARV and Rubinius for that matter) are most welcome to do so too.
But maybe there is an efficient way to use JNI to trivially port most of
these to JRuby. This could motivate my toying with JRuby…
Regards,
Lionel
Roger P. wrote:
And any other personal tweaks that people contribute. Kind of a
bleeding edge Ruby.
Would that be useful to anyone? Would anyone use it?
Thanks and take care.
-Roger
What would be more useful to me, and in fact where I’m headed, is a Ruby
that’s tunable to your hardware. Just make a source distribution and
force people to recompile it. Right now, my tweaks are all at the GCC
level, and that’s the way it’s going to be for a while. I don’t believe
I’ve exhausted all of the goodies that GCC has to offer, especially GCC
4.2.
Another thing that would be more useful is a comprehensive enough test
and benchmark suite that a user could tell what the payoffs were from
the tweaks and whether the language syntax and semantics remained intact
after the tweaks.
I’m in the process of re-factoring the Rakefile from my profiling
efforts. I’d be happy to profile your source as part of that. By the
way, are you starting with 1.9 or 1.8? I’m still profiling 1.8 only, but
I expect to have 1.9 profiling working within a week or so.
M. Edward (Ed) Borasky wrote the following on 10.11.2007 05:58 :
Lionel B. wrote:
Useful? Yes, it would: I have some Rails actions that eat memory quite happily (associations with hundreds of thousands of objects, which themselves have associations to work on…). It would help if the Ruby processes would let go of the memory, or at least let it live in swap undisturbed, once the action is done.
Sounds to me like you’re building a data structure in RAM to avoid
making your RDBMS earn its keep.
In some cases, yes, because the code is easier to maintain that way. I usually take the time to switch to SQL when it becomes a problem and pure SQL is powerful enough for the task, though (I’ve done that several times last month).
But my current problem is that simply iterating over large associations (to create a new object for each and every object on the other end of a has_many association, with complex business rules SQL can’t handle, for example) is enough to use 100-300 MB with hundreds of thousands of objects. Usually I can split the task by paginating through the whole set, but in some cases that isn’t possible: if inserts or deletes happen concurrently you can miss some objects or try to process some twice (I’m actually considering fetching all the primary keys in a first pass and then paginating through windows in that set, which comes with other problems, though manageable ones in my case)…
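In code, the window idea would look roughly like this (a sketch only: Item and apply_business_rules! are made-up stand-ins, and I’m using the old-style ActiveRecord finder API):

  require 'enumerator'  # for each_slice on Ruby 1.8

  # Take a stable snapshot of the primary keys first, then work through it
  # in fixed-size windows, so concurrent inserts/deletes can't make us skip
  # rows or process them twice.
  ids = Item.connection.select_values("SELECT id FROM items ORDER BY id")

  ids.each_slice(500) do |window|
    Item.find(:all, :conditions => ["id IN (?)", window]).each do |item|
      item.apply_business_rules!   # stand-in for the real per-object work
    end
    GC.start   # give the interpreter a chance to reclaim the last window
  end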
A temporary 100-300 MB spike isn’t a problem. What is a problem is that:
1/ the memory isn’t freed after completion of the task, and
2/ it’s kept dirty by the GC,
-> so there’s no way the OS can reuse this memory for another spike happening in another process; only the original process can reuse it.
This is not a major problem: I can always move all this huge processing into short-lived dedicated processes, but it’s kind of a downer when the language keeps out of your way most of the time and then shows a limitation like this.
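For reference, the ‘short-lived dedicated process’ workaround is simple enough (plain Process.fork; do_heavy_processing is a made-up stand-in, and with ActiveRecord the child should re-establish its own database connection):

  # Run the memory-hungry job in a child process: when the child exits,
  # the OS gets every page back, whatever the GC did or didn't do with it.
  pid = Process.fork do
    do_heavy_processing
  end
  Process.wait(pid)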
But seriously, unless you can restructure your application so it
doesn’t keep a lot of stuff in RAM, you’re probably doomed to throw
hardware at it.
Yes, I’ve done some simple code tuning that helps memory usage, but it only helps while waiting for a bigger server.
In other words, hard drives are where data that must live for extended
periods (or forever) belong, explicitly filed there by your
application code, not implicitly filed there by the swapper. RAM is
for volatile information that is being used and re-used frequently.
As I understand it, the problem is that MRI keeps some unused memory allocated and then the GC marks it dirty… So technically there is information being used and re-used frequently, but only by the GC.
Lionel.
Lionel B. wrote:
But my current problem is that simply iterating over large associations (to create a new object for each and every object on the other end of a has_many association, with complex business rules SQL can’t handle, for example) is enough to use 100-300 MB with hundreds of thousands of objects.
Ah … complex business rules. That’s the big problem with programming
languages – they make it possible to have complex business rules.
Before computers were invented, we had to make do with “buy low, sell
high, collect early, pay late” and double-entry bookkeeping.
A temporary 100-300 MB spike isn’t a problem. What is a problem is that:
1/ the memory isn’t freed after completion of the task, and
2/ it’s kept dirty by the GC,
-> so there’s no way the OS can reuse this memory for another spike happening in another process; only the original process can reuse it.
This is not a major problem: I can always move all this huge processing into short-lived dedicated processes, but it’s kind of a downer when the language keeps out of your way most of the time and then shows a limitation like this.
[snip]
As I understand it, the problem is that MRI keeps some unused memory allocated and then the GC marks it dirty… So technically there is information being used and re-used frequently, but only by the GC.
Well … that sounds like an actual bug rather than a design issue in
MRI. Is it that the GC can’t tell it’s unused?
Roger P. wrote:
[snip]
Maybe an optimized GC might not be such a bad idea after all
I haven’t been following 1.9 closely enough to know what it does about
garbage collection. But yes, it does look like the MRI GC could stand
some optimization. Given the trade-offs and use cases, I’d optimize for
Rails. And I’m guessing that on Linux/GCC, a nice tight stop-and-copy GC
might well outperform what’s there, and a generational GC would be
better than what’s there but not worth the coding effort. I can’t help
you on Windows or MacOS … the memory management there is a black box
to me.
Which brings up an interesting question. While it seems more Ruby
developers work with Macs than with Windows or Linux, where are most of
the Ruby server applications (Rails and otherwise) deployed? I want to
guess Linux, but I don’t actually know for a fact that is the case.
As a previous email suggested, there are a couple of use cases for a
garbage collector, only one of them being long-running server
applications. But if the overwhelming majority of Ruby server
applications are Rails on Linux, it would surely be worthwhile tuning
the GC to that. Stop-and-copy integrated with the Linux memory manager
(assume RHEL 5/CentOS 5 64-bit) sounds like a winner off the top of my
head.
Hmmm … maybe I should dual-boot my workstation with CentOS 5 and fool
around with this.
As I understand it, the problem is that MRI keeps some unused memory allocated and then the GC marks it dirty… So technically there is information being used and re-used frequently, but only by the GC.
Well … that sounds like an actual bug rather than a design issue in
MRI. Is it that the GC can’t tell it’s unused?
The GC’s mark and sweep ‘recreates’ its freelist every time it runs a collection, so if you have a lot of free objects (believe it or not), it will re-mark them all, possibly in about the same order as the previous pass. A design thing.
So this interesting point of yours may have two implications: a Ruby process that retains lots of ‘free’ memory will have a longer sweep time (and having lots of free slots is quite common with standard MRI, since it allocates exponentially larger and larger heaps, so with a large process you’re almost guaranteed that the last-allocated heap will be half used), and, as you noted, the entire thing is constantly re-marked on every GC (all used slots marked as ‘valid’, all free slots re-linked onto the freelist).
The way to avoid this would be to only ever ‘add’ to the freelist as you free objects. Then you’d avoid re-marking the already-free objects. You could still free whole heaps the same way. If you did that you’d still be traversing them on every GC (to look for allocated objects that are no longer accessible, i.e. unmarked objects), but you wouldn’t be marking them dirty.
The drawback might be a freelist that isn’t ‘optimized in order’ or something (probably not much of a drawback).
Another way to combat this somewhat is to use a smaller ‘heap chunk’ size (instead of Ruby’s exponentially growing one), as this allows chunks to be freed more frequently, which means they aren’t traversed (basically you don’t have as much free memory kicking around, so you don’t traverse it as much). It still leaves all the remaining free memory to traverse, however.
If you wanted to avoid ever touching freed objects at all, you’d need to create an ‘allocated’ list as well, so you could traverse just the allocated list and add to the freelist those objects that had become garbage. That’s roughly a 20% per-object size increase. Maybe a good trade-off? Tough to tell. If I had to guess I’d say the trade-off is… worth it for large, long-standing processes. It would use more RAM and be faster.
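A toy model of the difference, in Ruby rather than MRI’s C, just to make the bookkeeping concrete (none of this is real interpreter code; ‘slots’ are just indices and ‘marked’ is whatever the mark phase found reachable):

  # Current style: rebuild the freelist from scratch, touching every slot,
  # including ones that were already free, so those pages stay dirty.
  def sweep_rebuild(all_slots, marked)
    all_slots.reject { |i| marked.include?(i) }
  end

  # 'Only add' style: walk everything but push only the newly dead slots;
  # slots already on the freelist are read, never written.
  def sweep_append(freelist, previously_live, marked)
    freelist + (previously_live - marked)
  end

  # Allocated-list style: keep a list of live slots and walk only that,
  # so free slots aren't even read -- at the cost of the extra list.
  def sweep_with_allocated_list(freelist, allocated, marked)
    dead = allocated - marked
    [freelist + dead, allocated - dead]
  end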
Maybe an optimized GC might not be such a bad idea after all
-Roger
Lionel B. wrote:
Useful? Yes, it would: I have some Rails actions that eat memory quite happily (associations with hundreds of thousands of objects, which themselves have associations to work on…). It would help if the Ruby processes would let go of the memory, or at least let it live in swap undisturbed, once the action is done.
Sounds to me like you’re building a data structure in RAM to avoid
making your RDBMS earn its keep. But seriously, unless you can
restructure your application so it doesn’t keep a lot of stuff in RAM,
you’re probably doomed to throw hardware at it. In other words, hard
drives are where data that must live for extended periods (or forever)
belong, explicitly filed there by your application code, not
implicitly filed there by the swapper. RAM is for volatile information
that is being used and re-used frequently.
M. Edward (Ed) Borasky wrote:
I’m in the process of re-factoring the Rakefile from my profiling
efforts. I’d be happy to profile your source as part of that. By the
way, are you starting with 1.9 or 1.8? I’m still profiling 1.8 only, but
I expect to have 1.9 profiling working within a week or so.
What profiler are you using, Ed? I ask because I wrote a profiler a decade ago that used gcc’s -pg (gprof profiling) with a custom prof library that used tail patching and the TSC register to get real-time, nanosecond-resolution figures for profiled functions, including parent/child rollups.
The use of real time in profiling is fantastic, because it tells you where, for example, I/O or excessive page-faulting is hurting some function. I also hooked the memory allocation functions to gather memory info, including both growth and flow (in/out). Carefully selecting the functions to profile, rather than profiling everything, also helps a lot. Finally, the TSC register counts processor cycles, so you get nanosecond resolution.
A lot of Unix-based systems are designed using flawed performance data from kernel (statistical) profiling, which simply doesn’t see I/O or VM time.
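You can see a crude version of the same effect from within Ruby, by the way: compare CPU time with wall-clock time around a suspect region, and the gap is the blocked time (I/O, paging) that a CPU-only profile never shows. A rough sketch:

  # Wall-clock vs CPU timing: the difference is time spent off-CPU
  # (I/O, page faults, swapping) that statistical profiles can't see.
  def time_block(label)
    wall0 = Time.now
    cpu0  = Process.times
    yield
    cpu1  = Process.times
    cpu   = (cpu1.utime - cpu0.utime) + (cpu1.stime - cpu0.stime)
    wall  = Time.now - wall0
    printf("%s: wall %.3fs, cpu %.3fs, off-cpu %.3fs\n", label, wall, cpu, wall - cpu)
  end

  time_block('read file') { File.read('/etc/passwd') }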
I’ve no idea whether the hacks I used to do the tail-patching still
work with a current gcc, but it would be good to reactivate tprof if
possible. It’d be great to have a Ruby that can do such profiling.
I still have the code somewhere…
Clifford H…
Roger P. wrote:
If you wanted to avoid ever touching freed objects at all, you’d need to create an ‘allocated’ list as well, so you could traverse just the allocated list and add to the freelist those objects that had become garbage. That’s roughly a 20% per-object size increase. Maybe a good trade-off? Tough to tell. If I had to guess I’d say the trade-off is… worth it for large, long-standing processes. It would use more RAM and be faster.
Actually, although it might use more virtual address space, if done right it might consume less RAM (meaning physical memory, the working set), simply by not touching pages that aren’t in use.
Knowledge and awareness of the VM state seems to often get neglected in
these discussions of GC, even though it’s quite easy to compare the VM
effects of the various types.
Clifford H…
Clifford H. wrote:
Actually, although it might use more virtual address space, if done right it might consume less RAM (meaning physical memory, the working set), simply by not touching pages that aren’t in use.
Knowledge and awareness of the VM state seems to often get neglected in
these discussions of GC, even though it’s quite easy to compare the VM
effects of the various types.
platform.each do |p|
  p.vm.research
  p.phd.hire unless p.metrics.useful?
  p.tools.write unless p.tools.useful?
end
Charles Oliver N. wrote:
…a 1.9 implementation (YARV being most complete and furthest along…but we have some 1.9 features in JRuby too).
- As far as I know, only MRI is “100 percent MRI compatible”. The
other implementations are “extended subsets”. JRuby is for the moment
the most complete subset and has more extensions, i.e., Java libraries,
an AOT compiler and all of the performance tuning that the JRuby team
has done. I haven’t heard much from the Parrot/Cardinal project
recently, but I’m guessing we’ll see IronRuby at close to the level of
JRuby by early next year, and Rubinius some time in the spring.
- I don’t think MRI is a dead end at all, considering the discussions
I’ve seen on this list just since I got back from RubyConf. I see people
seriously proposing re-doing the garbage collector, for example, and I
see other people investing a lot of effort in tweaking Rails to use Ruby
and the underlying OS more efficiently.
- As far as I know, YARV/KRI is the only serious 1.9 implementation.
I do think that there is probably more excitement and interesting work
on YARV/KRI/1.9 than there is on MRI, or for that matter any of the MRI
extended subsets. But MRI is hardly a dead end IMHO.
Charles Oliver N. wrote:
Lionel B. wrote:
Robert K. wrote the following on 09.11.2007 22:05 :
Personally, if I had the resources to invest into this I’d rather
spend them on JRuby. You get a GC with many tweaking options etc.
plus native threads.
Does the JRuby GC need much help? I was under the assumption it was ‘good nuf’ or something.
I think MRI is mostly a dead end at this point, unlikely to see any
major perf/scaling improvements anymore. If you’re going to focus a lot
of time on tweaking and improving an implementation, I’d recommend
helping out one of the really active 1.8-compatible implementations
(JRuby being the most complete and furthest along) or a 1.9
implementation (YARV being most complete and furthest along…but we
have some 1.9 features in JRuby too).
I assume by your comments you mean ‘work on the 1.9 MRI or on JRuby’.
True. Matz has said that he isn’t looking to integrate drastic changes (like changes to the GC) into the 1.8.6 trunk anytime soon, to keep it stable. Bug fixes, sure, but other things, no.
So why, then, you ask, would people waste time trying to optimize it? I guess I figured that the 1.9 code (like that for the GC) was about the same, so patches to 1.8.6 that proved useful would be good fodder for 1.9. I think this is the case, too, as Matz mentioned being interested in benchmarks for any tweaked GCs in 1.9 (i.e. ‘go ahead and tweak away; it will get implemented then’). I could be wrong about the usefulness of working on 1.8.6, though. Hmm.
I think I’m just afraid of working on 1.9, since bugs seem to still be rolling in. I like stability in other people’s code, so that if a bug exists it’s my own fault and I know where to fix it.
Now to find the time to write a real GC … (i.e. re-write every
useful extension that has a gc_mark function…sigh).
My latest thought would be to rewrite the object allocator to use just malloc/free and see if it is faster. In my heart of hearts I almost think it could be. Sometimes we optimize ourselves to death.
void *new_obj() { return malloc(sizeof(RANY)); }
void recycle(void *obj) { free(obj); }
Oh wait, traversing the stack and looking for heap pointers is problematic with my solution. Maybe I could overcome it by growing each Ruby object by 4 bytes and putting the chars ‘rbrx’ at its front, so I know whether something is a Ruby heap object, and hoping few people use that byte sequence in strings or something. Oh wait, that would introduce more bugs. One more idea shot down.
Have a good night!
-Roger
A temporary 100-300 MB spike isn’t a problem. What is a problem is that:
1/ the memory isn’t freed after completion of the task, and
2/ it’s kept dirty by the GC,
-> so there’s no way the OS can reuse this memory for another spike happening in another process; only the original process can reuse it.
Ahh, so you want the process to actually free the memory once it is done with it (and the GC doesn’t do that, because if heap ‘chunks’ have at least one Ruby object still live in them, they are re-used). Yeah, you could maybe fix this by using fixed-size heap ‘chunks’ (Ruby’s currently grow exponentially).
Maybe an optimized Ruby would be useful.
On 11/11/2007, M. Edward (Ed) Borasky [email protected] wrote:
- As far as I know, only MRI is “100 percent MRI compatible”.
It’s only like 99% compatible anyway; there are changes from time to time.
see other people investing a lot of effort in tweaking Rails to use Ruby
and the underlying OS more efficiently.
No, it’s not a dead end. However, I would expect its remaining lifetime to be something like 1-2 years. So small tweaks that bring immediate benefit are worth it; rewriting the GC probably is not. Even if you managed to do it before 1.8 became obsolete, it would get intensive use for a few months at best.
If 2.0 succeeds (and I believe it will) there will be little incentive to use 1.8 anymore. 2.0 will be the current, actively developed interpreter, and implementing the GC there makes more sense.
Thanks
Michal
Lionel B. wrote:
Robert K. wrote the following on 09.11.2007 22:05 :
Personally, if I had the resources to invest into this I’d rather
spend them on JRuby. You get a GC with many tweaking options etc.
plus native threads.
Please don’t forget that many gems still don’t work or don’t have replacements in JRuby. JRuby is the solution for people needing easy Ruby <-> Java integration, but Ruby with strong Unix ties has its benefits too.
The lack of native gems in JRuby has so far not amounted to much of an obstacle. Since in normal Ruby code you can go after the equivalent Java libraries with ease, there’s been very little demand to port over native extensions.
I think I’d have to spend quite some time migrating applications from MRI to JRuby: I heavily use ruby-gettext, hpricot, memcache, and ruby-opengl, and I believe most of these use C for library interfaces or performance… Some utilities like rcov probably don’t work with JRuby either, because they rely on the same C interface.
I don’t think memcache uses C. Hpricot does, but there’s a JRuby port
since it’s a Ragel-generated state machine. I don’t know about gettext.
GL could easily be replaced with one of several Java 3D/GL binding
libraries.
So as much as I’d like JRuby to succeed even if I don’t use it myself (currently), people willing to work on MRI (or YARV and Rubinius for that matter) are most welcome to do so too.
I think MRI is mostly a dead end at this point, unlikely to see any
major perf/scaling improvements anymore. If you’re going to focus a lot
of time on tweaking and improving an implementation, I’d recommend
helping out one of the really active 1.8-compatible implementations
(JRuby being the most complete and furthest along) or a 1.9
implementation (YARV being most complete and furthest along…but we
have some 1.9 features in JRuby too).
But maybe there is an efficient way to use JNI to trivially port most of
these to JRuby. This could motivate my toying with JRuby…
JNI to native extensions is a band-aid at best. The better option is to
rewrite the libraries in terms of what’s readily available on the JVM.
On 10.11.2007 23:32, M. Edward (Ed) Borasky wrote:
As a previous email suggested, there are a couple of use cases for a
garbage collector, only one of them being long-running server
applications.
That’s why I say: rather use a highly optimized GC that exists today (meaning the JVM’s GC) than invest loads of development and research time in recreating something similar. I guess several tens or hundreds of man-years have gone into the JVM’s GC; that’s difficult to top.
Kind regards
robert
On Nov 10, 2007 8:02 PM, Charles Oliver N. [email protected]
wrote:
JNI to native extensions is a band-aid at best. The better option is to
rewrite the libraries in terms of what’s readily available on the JVM.
I don’t know why this gives me a sense of deja vu.
I don’t know how many here remember VisualAge/Java. We (IBM/OTI)
built the first IBM Java IDE on top of what we called the UVM (or
universal virtual machine). This was the IBM Smalltalk VM extended to
execute Java bytecodes as well as Smalltalk bytecodes.
In this implementation, Java primitives were written in Smalltalk. IIRC this was either before JNI existed, or JNI evolved to make this impractical, and IBM moved to a Java-only VM.
I know that it’s difficult, and probably premature, to define a standard extension interface that would work across the various emerging Ruby implementations. But without that, I’m afraid the promise of having multiple implementations is somewhat muted.
Rick DeNatale
My blog on Ruby
http://talklikeaduck.denhaven2.com/