Ruby wish-list

rogerdpack · October 15, 2007, 4:23pm

My personal ruby wish-list (for any feedback):

1) the ability to rescue arrays (or some way to rescue multiple classes
without pain), like this:

all_socket_interrupts_array = [SocketError, Errno::EHOSTUNREACH,
                               Errno::ENETUNREACH]

begin
  stuff
rescue all_socket_interrupts_array # non ugly, yet precise!
end

2) a GC that is ‘user-definable’ (run after this definable threshold,
this often), and (asidedbly), a GC that can run in its own (native)
thread so it doesn’t pause execution of normal threads.

3) an ensure block that’s uninterruptible, a la:

begin
  do stuff
rescue
  rescue stuff
ensure_uninterruptible # (or call it ensure_critical)
  do stuff which is guaranteed to get run, and not interrupted.
end

4) the optional ability to have it display the whole backtrace on
uncaught exceptions (and also for all existing threads).

Guess that’s it

Any thoughts?
Thanks.
-Roger

rogerdpack · October 15, 2007, 4:32pm

Roger P. wrote:

stuff

rescue all_socket_interrupts # non ugly, yet precise!

rescue *all_socket_interrupts_array => e

Should work, I think. At least, it appears to under my brief tests…
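
For reference, the splat form suggested above can be sketched as a small runnable example. The constant SOCKET_INTERRUPTS and the helper rescued_class are illustrative names that do not appear in the thread.

```ruby
require 'socket' # defines SocketError

# Exception classes grouped in one array (illustrative name).
SOCKET_INTERRUPTS = [SocketError, Errno::EHOSTUNREACH, Errno::ENETUNREACH]

# Hypothetical helper: runs the block and returns the class of the
# exception that was rescued, or nil when nothing was raised.
def rescued_class
  yield
  nil
rescue *SOCKET_INTERRUPTS => e
  # The splat expands the array into the usual comma-separated list.
  e.class
end
```

Exceptions that are not in the array, such as ArgumentError, still propagate to the caller as usual.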

rogerdpack · October 15, 2007, 5:19pm

Hi,

In message “Re: ruby wish-list”
on Mon, 15 Oct 2007 23:23:06 +0900, Roger P.
writes:

Now you know you can.

|2) a GC that is ‘user-definable’ (run after this definable threshold,
|this often), and (asidedbly), a GC that can run in its own (native)
|thread so it doesn’t pause execution of normal threads.

I’d rather have a smarter collector, but it’s possible.

GC on its own thread is a different story. Interaction between
collector and mutator may hinder the performance.

It’s not as simple as you expect. First we need to define how an
“uninterruptible” section would work.

|4) the optional ability to have it display the whole backtrace on
|uncaught exceptions (and also for all existing threads).

Simple code like:

begin
…
rescue => e
puts e.backtrace
end

would do.

          matz.
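
Extending that idea to uncaught exceptions and to all threads might look like the sketch below. It assumes Ruby 1.9 or later, where Thread#backtrace exists (it is not available on the 1.8.6 discussed in this thread), and the helper name all_thread_backtraces is made up for illustration.

```ruby
# Hypothetical helper: one text report with the current stack of every
# live thread.
def all_thread_backtraces
  reports = Thread.list.map do |thread|
    frames = thread.backtrace || [] # guard in case no backtrace is available
    ([thread.inspect] + frames.map { |frame| "    #{frame}" }).join("\n")
  end
  reports.join("\n\n")
end

# When the program dies from an uncaught exception, print its complete
# backtrace followed by the stack of every thread that is still alive.
at_exit do
  error = $!
  unless error.nil? || error.is_a?(SystemExit)
    warn "#{error.class}: #{error.message}"
    warn((error.backtrace || []).join("\n"))
    warn all_thread_backtraces
  end
end
```

The interpreter still prints its own report for the exception afterwards, so the message appears twice; the hook only adds the untruncated frames and the per-thread stacks.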

rogerdpack · October 15, 2007, 5:33pm

Works great thanks Alex!

rogerdpack · October 15, 2007, 5:24pm

Thanks Matz.
Comments on comments:

|2) a GC that is ‘user-definable’ (run after this definable threshold,
|this often), and (asidedbly), a GC that can run in its own (native)
|thread so it doesn’t pause execution of normal threads.

I’d rather have a smarter collector, but it’s possible.

Yeah a smarter collector would be even nicer.

|ensure_uninterruptible # (or call it ensure_critical)

It’s not as simple as you expect. First we need to define how an
“uninterruptible” section would work.

I agree. One definition would be to set Thread.critical, run the
block, then unset it. I would use it.

|4) the optional ability to have it display the whole backtrace on
|uncaught exceptions (and also for all existing threads).

Simple code like:

begin
…
rescue => e
puts e.backtrace
end

would do.
          matz.

Good point. My suggestions are thinning down quickly.
Thanks.
-Roger

rogerdpack · October 15, 2007, 6:52pm

On 15.10.2007 17:24, Roger P. wrote:

|ensure_uninterruptible # (or call it ensure_critical)

It’s not as simple as you expect. First we need to define how an
“uninterruptible” section would work.

I agree. One definition would be to set Thread.critical, run the
block, then unset it. I would use it.

Bad idea in light of native threads IMHO. Every construct that halts
all threads should be avoided. If you need exclusive access to
resources you need to properly synchronize on them.
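
As a rough illustration of synchronizing on the resource itself rather than halting every thread, here is a minimal sketch using Mutex. The Counter class is a made-up example, and Mutex is assumed to be available without a require, as it is on Ruby 1.9 and later.

```ruby
# Illustrative shared resource: every access goes through its own lock,
# so only threads touching this counter ever wait on each other.
class Counter
  def initialize
    @lock = Mutex.new
    @value = 0
  end

  def increment
    @lock.synchronize { @value += 1 }
  end

  def value
    @lock.synchronize { @value }
  end
end

counter = Counter.new
threads = Array.new(4) { Thread.new { 1000.times { counter.increment } } }
threads.each(&:join)
```

Threads that never touch the counter are not blocked by it, which is the difference from a global switch like Thread.critical.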

Good point. My suggestions are thinning down quickly.

Cheers

robert

rogerdpack · October 15, 2007, 7:34pm

I don’t really see the reason why the GC would need or want a specific
thread to itself - for a start, such a design makes the system slower on
low end systems. There may also be cases where it is possible to choose
‘optimal’ times to run the GC within a single thread context.

One thing regarding the GC I am unsure about is the conditions under
which the GC is actually run. One not uncommon problem: when external
libraries (the classic and common example is RMagick) do not malloc
using the correct API, Ruby often fails to call the GC at all.

A call to GC.start under these conditions can prevent an OOME, as
calling GC.start does in fact cause RMagick to free memory, but Ruby
doesn’t know about this.
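
A workaround along those lines could be sketched as follows: force a collection every so many items while processing a batch. The helper name each_with_periodic_gc and the default interval of 100 are assumptions for illustration, not a recommendation taken from RMagick or Ruby.

```ruby
# Hypothetical helper: yields each item and calls GC.start after every
# `every` items. Returns how many collections were forced.
def each_with_periodic_gc(items, every = 100)
  forced_runs = 0
  items.each_with_index do |item, index|
    yield item
    next unless (index + 1) % every == 0
    GC.start
    forced_runs += 1
  end
  forced_runs
end
```

In a real script the block would do the extension-heavy work (for example, resizing one image per item); the right interval depends on how much memory each item holds outside Ruby's view.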

The simplest solution to this issue I can see is to ensure that the GC
is run when an OOME occurs, or more particularly, all loaded extensions
are told to free when an OOME occurs (this does not seem to happen under
these conditions). Whilst I know this is not really the responsibility
of Ruby, this simple addition could solve problems for quite a number of
scripts, thus removing a FAQ.

More regular GC runs may actually be sensible, depending on the real
performance issues that might arise with longer running applications and
fragmentation. A documented example of such a problem, and a solution is
here:

rogerdpack · October 15, 2007, 6:53pm

On 15.10.2007 18:45, Robert K. wrote:

all threads should be avoided. If you need exclusive access to
resources you need to properly synchronize on them.

I meant: in the light of the fact that native threads will come at a
certain point in time. Your suggested feature would make it
unnecessarily complex and slow to use native threads (there would have
to be regular synchronization on some hidden global lock, which defies
the whole purpose of using (native) threads).

robert

rogerdpack · October 15, 2007, 7:51pm

I agree. Thinking out loud…with a true ‘native’ threaded model I
don’t know if it would be a spectacular idea to be able to block all
threads. I’ve often wondered how Ruby 1.9 will implement
Thread.critical, at all. If it does attempt to, then maybe this
suggestion (though aimed mostly at 1.8.6) might still be useful (if you
don’t mind the possible slowdown). If not then yeah–probably not worth
the hassle

Other suggestions of how ensure_uninterruptible might work (like ‘this
thread doesn’t accept interruptions [thread_name.raise’s] for a while’)
seem like even worse ideas.

The benefit of having such a feature in the first place would be that
you can ‘nest’ timeouts and other code that executes
other_thread_name.raise, without some dangerous issues cropping up when
two raises occur very close to the same time. Or basically that you can
execute other_thread_name.raise on more complex code without the
drawbacks that might occur.

An example of this is if you nest two timeouts one within another, and
one happens to expire when the other is not finished processing its
ensure block. This will possibly cause a ‘random’ exception to be
raised on the origin thread later. I guess that, currently, the use
of other_thread_name.raise is dangerous, and this would help with that.
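
For readers on later Rubies: Thread.handle_interrupt, added in Ruby 2.0 (well after this discussion), covers much of this wish. The sketch below is one possible arrangement, not the design proposed in this thread; the queue names and the events array are illustrative.

```ruby
events     = []
in_cleanup = Queue.new # worker signals that it has reached the ensure block
resume     = Queue.new # main thread lets the cleanup finish

worker = Thread.new do
  begin
    # Defer RuntimeErrors raised from other threads by default...
    Thread.handle_interrupt(RuntimeError => :never) do
      begin
        # ...but allow them during the interruptible part of the work.
        Thread.handle_interrupt(RuntimeError => :immediate) do
          sleep
        end
      rescue RuntimeError => e
        events << "rescued #{e.message}"
      ensure
        in_cleanup << true
        resume.pop # a raise arriving here is held back
        events << "cleanup finished"
      end
    end
  rescue RuntimeError => e
    # The deferred exception is delivered once the :never block ends.
    events << "deferred #{e.message}"
  end
end

sleep 0.01 until worker.status == "sleep" # wait for the worker to block
worker.raise("first")  # interrupts the work
in_cleanup.pop
worker.raise("second") # arrives in the middle of the cleanup
resume << true
worker.join
```

The second raise is queued rather than lost: the cleanup should run to completion first, and the exception surfaces afterwards at a point the code has chosen.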

Just my $.02
Thoughts welcome.
-Roger

rogerdpack · October 15, 2007, 8:20pm

So you’d prefer a few tweaks:

I don’t really see the reason why the GC would need or want a specific
thread to itself - for a start, such a design makes the system slower on
low end systems. There may also be cases where it is possible to choose
‘optimal’ times to run the GC within a single thread context.

So if it were someday created to run as a separate thread, you’d like to
still be able to have a call ‘GC.start.join’ or what not, to let it
finish during an ‘optimal’ time?

…A call to GC.start under these conditions can prevent an OOME, or more
particularly, that all loaded extensions
are told to free when an OOME occurs

And you’d prefer a small change to the GC such that it also starts on
OOME’s, correct?

Wow I hope I never run into any memory issues like that!

Yeah those also sound reasonable
Wish lists have no bias
Take care.
-Roger

rogerdpack · October 17, 2007, 3:45pm

Ari B. wrote:

Hey!

Not quite on topic of garbage collection… But how hard would
it be to create maybe a style of method creation that doesn’t use
the . to represent Object.behavior ?
You could define your code in the global namespace. For example,

def a
end

is a function a ‘in the global namespace’, with no dots.
Or maybe hard-code it in the parent class?
The way Rails does it: say you call
my_special_class_function_one
and it doesn’t exist. Rails catches the error thrown for the missing
method, looks up how to handle it, then defines the method and returns
it (it redefines things in-line).
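
The define-on-first-use trick described above can be sketched with method_missing. This is not Rails's actual implementation; the class LazyFinder, the find_by_ prefix handling and the returned string are all made up for illustration.

```ruby
class LazyFinder
  # Called by Ruby when no method with the given name exists.
  def method_missing(name, *args)
    if name.to_s.start_with?("find_by_")
      attribute = name.to_s.sub("find_by_", "")
      # Define a real method so later calls skip method_missing entirely.
      self.class.send(:define_method, name) do |value|
        "looking up #{attribute} = #{value}"
      end
      send(name, *args)
    else
      super # keep the normal NoMethodError for everything else
    end
  end

  # Keeps respond_to? consistent with what method_missing accepts.
  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("find_by_") || super
  end
end
```

After the first call to, say, find_by_name, the method exists on the class and is dispatched normally.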

In retrospect, it seems like definitely a core language feature that
may or may not be impossible to get at… But I figured I’d ask

Also, Is there an easy/hard way to define new %{} style methods? Like
for a Rope object, maybe %m{} or something.

Not sure.
GL!

Just a newbie’s musings.

Thanks,
Ari
--------------------------------------------|
If you’re not living on the edge,
then you’re just wasting space.

rogerdpack · October 16, 2007, 2:09am

Hey!

Not quite on topic of garbage collection… But how hard would
it be to create maybe a style of method creation that doesn’t use
the . to represent Object.behavior ?

In retrospect, it seems like definitely a core language feature that
may or may not be impossible to get at… But I figured I’d ask

Also, Is there an easy/hard way to define new %{} style methods? Like
for a Rope object, maybe %m{} or something.

Just a newbie’s musings.

Thanks,
Ari
--------------------------------------------|
If you’re not living on the edge,
then you’re just wasting space.

rogerdpack · October 17, 2007, 9:08pm

On 10/15/07, Ari B. wrote:

Also, Is there an easy/hard way to define new %{} style methods? Like
for a Rope object, maybe %m{} or something.

This has come up before - I can’t find the thread, but I believe the
answer was that it would hinder the ability to add new ones to the
core (since that could potentially break code that had already defined
it).

martin

rogerdpack · October 20, 2007, 12:09am

Roger P. wrote:

GC wish list:

Another wish for Ruby would be the power to define one’s own unary or
binary operators.

Like I wish I could define ‘is_within?’ for arrays, like

if element_x is_within? array_y
  do stuff
end

That would be sweet. If such a thing were possible, then one could
really create code that looks like complete sentences.
My $.02 for the day.

-Roger

rogerdpack · October 18, 2007, 8:06pm

GC wish list:
Isn’t it possible to create your own reference-counting style object?
Would something like this be possible, for example?

class Object

  # only used if you call new_with_timely_death_wrapper for your new
  # call (see below)
  attr_reader :internal_object

  def new_with_timely_death_wrapper class_to_use, *args
    # since this object is ‘only internal’, deleting it later will be OK
    @internal_object = class_to_use.new(*args)
  end

  def assign_to_me this_wrapped_object
    dec
    @internal_object = this_wrapped_object.internal_object
    inc
  end

  def do_method name, *args
    # send avoids the eval; maybe object.internal_object.do_whatever args
    @internal_object.send(name, *args)
  end

  # count, recycle and force_recycle are hypothetical: Ruby has no such
  # methods today
  def dec
    @internal_object.count -= 1
    recycle_current_object if @internal_object.count == 0
  end

  def inc
    @internal_object.count += 1
  end

  def recycle_current_object
    # traverse internal members of @internal_object and force_recycle them,
    # unless they’re wrappers, then just dec them.
    @internal_object.recycle
  end

  def go_out_of_scope
    dec
    self.force_recycle # we are toast – this is scary and might not be right
  end

end

Then the example:

a = Array.new_with_timely_death_wrapper(0,0)
b = Array.new_with_timely_death_wrapper(0,0) # only time you should use assign is on start
b.assign_to_me a # recycles b’s object, assigns internal_object to a’s internal_object, sets count to 2
a.go_out_of_scope # count set to 1
b.go_out_of_scope # count set to 0 – recycled.

?
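
A version of the same idea that runs on stock Ruby might look like the sketch below. It cannot force the interpreter to reclaim memory; it only counts references by hand and runs a release callback when the count reaches zero. The class RefCounted and its method names are invented for this example.

```ruby
# Hand-rolled reference counting around a resource. The block passed to
# new is called once, when the last reference is released.
class RefCounted
  attr_reader :count

  def initialize(resource, &on_release)
    @resource = resource
    @on_release = on_release
    @count = 0
  end

  # Take one more reference.
  def retain
    @count += 1
    self
  end

  # Drop one reference; clean up when none are left.
  def release
    raise "released more often than retained" if @count.zero?
    @count -= 1
    if @count.zero?
      @on_release.call(@resource) if @on_release
      @resource = nil # drop our reference so the GC can collect it later
    end
    self
  end
end
```

This gives timely cleanup of things like file handles or native buffers, but every retain and release has to be written out by hand, which is exactly the bookkeeping a garbage collector exists to avoid.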

rogerdpack · October 20, 2007, 6:06am

On Oct 19, 2007, at 5:09 PM, Roger P. wrote:

Roger P. wrote:

Like I wish I could define ‘is_within?’ for arrays, like

if element_x is_within? array_y
  do stuff
end

That can’t be done now?!?

arr = ['a', 'b']

if arr.include? 'a'
  puts "array arr includes the letter 'a'"
end

should be easy to do it the other way as well.
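
One way to get "the other way" is to reopen Object and delegate to include?. This is only a sketch: is_within? is the name wished for earlier in the thread, and patching Object affects every object in the program.

```ruby
class Object
  # True when the receiver is an element of the given collection.
  def is_within?(collection)
    collection.include?(self)
  end
end
```

With that in place, `element_x.is_within? array_y` reads close to the wished-for sentence, with the dot still required.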

rogerdpack · October 20, 2007, 1:20am

Oh, wishing star! Here are my wishes:

1. Use “new” as the name of the constructor instead of “initialize”.
You can keep “initialize” around for legacy support; just “alias
initialize new” for the future.

This makes it easy to connect the dots: “SomeClass.new” actually calls
the “new” instance method of a newly instantiated object of class
SomeClass.

2. Fix the Ruby parser to treat // (literal regexp) just like the
%r{…} (also literal regexp) construct so that you aren’t forced to use
parentheses in method calls:

$ ruby -v
ruby 1.8.6 (2007-03-13 patchlevel 0) [i686-linux]

$ ruby -w -e '"foo".sub /o$/, "x"'
-e:1: warning: ambiguous first argument; put parentheses or even spaces

$ ruby -w -e '"foo".sub %r/o$/, "x"'
[observe lack of warning]

This problem becomes especially apparent when you use gsub! or sub!:

"foo".gsub! /o$/, "x" # IMHO, sweet!

"foo".gsub! %r/o$/, "x" # IMHO, so-so!

"foo".gsub!(/o$/, "x") # IMHO, ugly!

Thanks for your consideration.

rogerdpack · October 20, 2007, 12:30pm

Hi,

In message “Re: ruby wish-list”
on Sat, 20 Oct 2007 08:20:33 +0900, Suraj K.
writes:

|1. Use “new” as the name of the constructor instead of “initialize”.
|You can keep “initialize” around for legacy support; just “alias
|initialize new” for the future.

Besides the fact that renaming a hook like “initialize” is far more
difficult than you expect, I don’t think “new” is a sufficient name for
an instance initialization hook.
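
To make the two roles concrete, here is a rough sketch of what a class-level constructor does: allocate an empty instance, then call the instance-level initialize hook. The Point class and the build method are made up for illustration; the real Class#new is implemented inside the interpreter.

```ruby
class Point
  attr_reader :x, :y

  # Instance-level hook: fills in an object that already exists.
  def initialize(x, y)
    @x = x
    @y = y
  end

  # Hand-written stand-in for what Point.new does.
  def self.build(*args, &block)
    instance = allocate                        # create the empty object
    instance.send(:initialize, *args, &block)  # initialize is private
    instance
  end
end
```

Seen this way, the class method creates and the instance method initializes, so reusing the name “new” for the second step would blur two different jobs.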

|2. Fix the Ruby parser to treat // (literal regexp) just like the
|%r{…} (also literal regexp) construct so that you aren’t forced to use
|parentheses in method calls:
|
| $ ruby -v
| ruby 1.8.6 (2007-03-13 patchlevel 0) [i686-linux]
|
| $ ruby -w -e '"foo".sub /o$/, "x"'
| -e:1: warning: ambiguous first argument; put parentheses or even spaces

No matter how much you hate this warning, the ambiguity would not go
away. If you don’t put parentheses around arguments, the parser confuses
the division operator (/) and a regular expression at the first argument.

          matz.
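
A small sketch of that ambiguity: the same `name /.../ other` shape is read as division when the leading name is a local variable. The variable names below are arbitrary.

```ruby
# With explicit parentheses, both literal forms are plainly regexps.
with_slashes = "foo".sub(/o$/, "x")
with_percent = "foo".sub(%r/o$/, "x")

# Here the leading name is a local variable, so each slash is a division:
# (total / y) / i
total = 8
y = 2
i = 1
quotient = total /y/ i
```

When the leading name is a method called without parentheses, the parser has to guess between the two readings, hence the warning.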

rogerdpack · October 31, 2007, 8:16pm

cross-post from core:
After examining how the 1.8.6 gc works, I had a few thoughts:

Background:

It seems that on a ‘cpu intensive’ program (one that generates a lot of
discardable objects–quite common), there is a competition between two
aspects of the gc to call garbage_collect first. They are:

1. If you run out of available heap slots, ruby calls garbage_collect, and
if “FREE_MIN” slots now exist (by default 4096) then it returns and leaves
the heap the same size. It also resets the current ‘malloc’ed bytes’
counter to 0, since it called garbage_collect.
2. If you reach GC_MALLOC_LIMIT of malloc’ed bytes, then it calls
garbage_collect, resets it to 0.

Anyway so what happens in today’s implementations is that number 1 is
called often (I believe) preventing number 2 from ever even springing,
as it resets the current count of malloc’ed bytes. It’s like
garbage_collect is trying to serve 2 masters, and ends up serving just
the one. I see this as curious as it basically disallows GC_MALLOC_LIMIT
from being reached, which is not what you would expect.

Thoughts?

On another point, I have a question on this line of code, run at the end
of garbage collection:
if (malloc_increase > malloc_limit) {
    malloc_limit += (malloc_increase - malloc_limit) * (double)live /
                    (live + freed); // this line
    if (malloc_limit < GC_MALLOC_LIMIT) malloc_limit = GC_MALLOC_LIMIT;
}

I haven’t checked this, but it seems to me that
(malloc_increase - malloc_limit) will always be a very small number (?),
which may not be what was expected. I could be wrong.
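
To see the arithmetic, the update rule quoted above can be mirrored in Ruby. The 8,000,000 value for GC_MALLOC_LIMIT is an assumption about the 1.8.6 default and should be checked against gc.c; the function name and the sample numbers are made up.

```ruby
# Assumed default; check GC_MALLOC_LIMIT in your interpreter's gc.c.
GC_MALLOC_LIMIT = 8_000_000

# Mirrors the quoted C snippet: grow the limit by the overshoot, scaled
# by the fraction of objects that survived, and never drop below the floor.
def next_malloc_limit(malloc_increase, malloc_limit, live, freed)
  return malloc_limit unless malloc_increase > malloc_limit
  grown = malloc_limit + (malloc_increase - malloc_limit) * live.to_f / (live + freed)
  [grown.to_i, GC_MALLOC_LIMIT].max # to_i truncates like the C integer assignment
end

# 100,000 bytes over the limit, with 3 of every 4 objects surviving.
example = next_malloc_limit(8_100_000, 8_000_000, 3000, 1000)
```

In this example the limit grows by 75,000 bytes. The growth can never exceed the overshoot itself, so if a collection is triggered as soon as the counter crosses the limit (an assumption not verified here), the adjustment would stay small, which matches the suspicion above.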

So my question is “what should the GC do, and when?”
Any thoughts?
In my opinion, if it runs out of heap slots available, it should call
garbage_collect AND increase the heap size (so that next time it won’t
run out, and will have enough to hopefully reach GC_MALLOC_LIMIT).

I think when it does reach GC_MALLOC_LIMIT malloc’ed bytes, it should
set it

new_malloc_limit = GC_MALLOC_LIMIT *
(1 - percent_of_recent_allocated_memory_that_was_freed)

to allow the malloc_limit to change dynamically, maybe with a fixed max
size.

So my question is what should best happen?
Ruby rox.

rogerdpack · November 5, 2007, 11:27am

Another wish for Ruby would be the power to define one’s own unary or
binary operators.

Like I wish I could define ‘is_within?’ for arrays, like

if element_x is_within? array_y
  do stuff
end

I guess it is possible:

if element_x.is_within? array_y
  do stuff
end

Ruby is flexible enough again!

Another wish list item would be being able to use “a {var_name}” instead
of “a #{var_name}”. I hate that extra pound sign, as it reminds me of Perl.

Take care.
-Roger