Matz says namespaces are too hard to implement - why?

apeiros · December 22, 2007, 4:18am

Short primer: What are namespaces?
Matz talked (in 2005, I think) about a concept he called ‘namespaces’.
They would allow you to monkey patch e.g. Array in namespace :foo
without the patch being visible in namespace :bar, essentially to make
namespacing safe. He didn’t implement it because it’s too hard.

I wonder why it is too hard, given that Class#clone exists and shadowing
constants within a module works fine. I wrote up a small proof of
concept which still has problems and could be extended further. It’s
pure Ruby; the problems should be solvable at the C level.
Probably the ugliest part is that, due to constant lookup rules in
blocks, I had to use string eval.


== Example code
require 'namespaces'
namespace :foo, %{
  class Array
    def bar
      "bar"
    end
  end
  p Array.new.bar
}

namespace :bar, %{
  p Array.new.bar
}

== Example output
"bar"
testnamespaces.rb:xx:in `create': undefined method `bar' for []:Array (NoMethodError)
        from testnamespaces.rb:xx:in `namespace'
        from testnamespaces.rb:xx

== namespaces.rb
module Kernel
  # Evaluate code inside the namespace, creating the namespace on first use.
  def namespace(name, code)
    Namespace.exist?(name) ? Namespace[name].module_eval(code) :
                             Namespace.create(name, code)
  end
end

module Namespace
  @space = {}

  class <<self
    # Namespaces are stored under their upcased name, so normalize here too.
    def exist?(name)
      @space.has_key?(name.to_s.upcase)
    end

    def [](name)
      @space[name.to_s.upcase]
    end

    def create(name, code)
      name  = name.to_s.upcase
      raise "Namespace '#{name}' exists already" if @space.has_key?(name)
      space = const_set(name, Module.new)
      @space[name] = space
      # Shadow every top-level constant with a clone inside the namespace.
      Object.constants.each { |c|
        begin
          space.const_set(c, Object.const_get(c).clone)
        rescue; end
      }
      space.module_eval(code)
      space
    end
  end
end

Problems that a pure ruby solution suffers and things that I didn’t yet
add:
-Concurrency
-Literals
-Importing from other namespaces
-Recursively clone constants
-Blocks have lookup rules tied to definition context even with
module_eval, hence string evals
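
A minimal sketch of the cloning idea and of the "Literals" item from the list above, written without the namespaces.rb helper. The names Sandboxed and shout are made up for illustration; the point is only that a patch on a cloned class stays on the clone, while array literals still build the original ::Array.

```ruby
# Clone the class, then patch only the clone (roughly what the proof of
# concept does for each top-level constant).
module Sandboxed
  Array = ::Array.clone            # shadows ::Array inside this module only

  class Array                      # reopens Sandboxed::Array, not ::Array
    def shout
      map { |item| item.to_s.upcase }
    end
  end

  # Going through the constant reaches the patched clone.
  VIA_CONSTANT = Array.new(2, "hi").shout

  # A literal is built by the interpreter as a real ::Array, so it misses the patch.
  LITERAL_SEES_PATCH = ["hi"].respond_to?(:shout)
end

# The original class is untouched outside the module.
global_sees_patch = ::Array.new.respond_to?(:shout)

p Sandboxed::VIA_CONSTANT
p Sandboxed::LITERAL_SEES_PATCH
p global_sees_patch
```

Fixing the literal case is not possible from pure Ruby, which is presumably why it would need interpreter support.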

So now I wonder, am I missing something? Is the devil in the details, or
is one of the incomplete things in my proof of concept the show stopper?
I see that it would be some work, but I don’t see how it would be hard.
Please enlighten me

Regards
Stefan

apeiros · December 22, 2007, 4:21am

Stefan R. wrote:

essentially to make namespacing safe.

Should of course have been: “essentially to make monkey patching safe.”
Sorry, it’s late here :-/

Regards
Stefan

apeiros · December 22, 2007, 5:24am

Stefan R. wrote:

Stefan R. wrote:

essentially to make namespacing safe.

Should of course have been: “essentially to make monkey patching safe.”

Monkey patching?

–
James B.

“The greatest obstacle to discovery is not ignorance, but the illusion
of knowledge.”

D. Boorstin

apeiros · December 22, 2007, 6:30am

James B. wrote:

Stefan R. wrote:

Stefan R. wrote:

essentially to make namespacing safe.

Should of course have been: “essentially to make monkey patching safe.”

Monkey patching?

–
James B.

Somebody get me a band-aid. lol.

apeiros · December 22, 2007, 12:35pm

On 22.12.2007 04:18, Stefan R. wrote:

[...]

Problems that a pure ruby solution suffers and things that I didn’t yet
add:
-Concurrency
-Literals
-Importing from other namespaces
-Recursively clone constants
-Blocks have lookup rules tied to definition context even with
module_eval, hence string evals

So now I wonder, am I missing something? Is the devil in the details, or
is one of the incomplete things in my proof of concept the show stopper?
I see that it would be some work, but I don’t see how it would be hard.
Please enlighten me

Ultimately Matz will be the one who is able to answer this properly. My
gut guess is that - besides the issues you mentioned - performance is a
critical issue. Method lookups are so ubiquitous that everything that
slows them down should be limited as far as possible. And method lookup
will certainly suffer because the set of allowed methods is no longer
determined by a class but also by a location in the code where a method
is invoked. Especially once these selector namespaces are nested,
lookups will become complex and potentially slow.

Kind regards

robert

apeiros · December 22, 2007, 1:21pm

Robert K. wrote:

Ultimately Matz will be the one who is able to answer this properly. My
gut guess is that - besides the issues you mentioned - performance is a
critical issue. Method lookups are so ubiquitous that everything that
slows them down should be limited as far as possible. And method lookup
will certainly suffer because the set of allowed methods is no longer
determined by a class but also by a location in the code where a method
is invoked. Especially in the light of nesting these selector
namespaces lookups will become complex and potentially slow.

Or perhaps, the various implementers will be able to answer this
properly as well

So here’s a long answer:

Selector namespaces would be difficult to implement largely because of
how method dispatch works. Currently, the metaclass of the target object
is 100% in control. It decides which method will be called in all cases,
and the caller is not able to influence that selection process in any
way. So we have to monkey patch the actual metaclass to have new
behavior show up for all future calls, with the down side of course
being that new behavior shows up for all future calls…you can’t choose
that some paths see new behavior and some see old.
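
A small sketch of that all-or-nothing behavior. Greeter and library_call are made-up names; library_call stands in for third-party code that never asked for the patch.

```ruby
class Greeter
  def greet
    "hello"
  end
end

# Stands in for library code that knows nothing about our patch.
def library_call(greeter)
  greeter.greet
end

before = library_call(Greeter.new)

# Reopening the class changes the method for every caller from here on,
# because only the receiver's class decides what gets called.
class Greeter
  def greet
    "HELLO"
  end
end

after = library_call(Greeter.new)

p before
p after
```

The caller has no way to say "give me the old greet"; that choice is exactly what selector namespaces would add.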

Groovy is an example of a modern language that supports selector
namespaces, which they call Categories. Categories allow you to apply a
set of metaclass changes to one (or more?) class on a specific thread
within a specific scope. So you can add behavior to String for a block
of code and all code it calls, and when the block finishes the behavior
disappears.

And it’s dog slow, even compared to normal non-Category Groovy calls
which aren’t particularly fast to begin with.

The way it’s implemented in Groovy leverages the fact that all
invocations in Groovy go first to an intermediary that decides whether
the target object’s metaclass should have the last say or not. If a
Category is in play, all such intermediaries will allow the Category to
have first grab at method invocation, at which time they can pretend the
target metaclass has the new behavior. Otherwise, calls pass straight
through to the metaclass.

The problem with this is that when you’re not using a Category, you
still have to constantly check for each invocation whether a Category
has been installed. Check, check, check, check, check, check…wasting
cycles. It also adds multiple additional layers into method invocation
when you are in a Category, since it stacks all the Category logic on
top of the already heavy method invocation logic.

If not for Categories, no intermediary would be needed, and no checks
would be needed.

I’d expect similar semantics in Ruby, since non-thread-local namespaces
would be almost worthless (threads would see behavior on types change
forward and back at arbitrary times), and as a result similar
performance implications. Even worse, it would require shoehorning an
intermediary into the Ruby call path, further slowing it down and
complicating optimization by VMs like the JVM and CLR. And it would add
in all the same checks, since every call would have to check whether a
namespace has been installed before proceeding.
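
To make the cost concrete, here is a rough pure-Ruby sketch of such an intermediary. It is not how Groovy or any Ruby implementation does it; Categories, use, dispatch and library_code are hypothetical names, and the overrides are plain procs for simplicity.

```ruby
# Hypothetical sketch of a Category-like mechanism: overrides are active only
# on the installing thread, and every call goes through one intermediary.
module Categories
  # Per-thread stack of [klass, overrides] pairs.
  def self.stack
    thread = Thread.current
    thread.thread_variable_set(:categories, []) unless thread.thread_variable?(:categories)
    thread.thread_variable_get(:categories)
  end

  # Activate overrides (a Hash of method name => Proc) for instances of klass
  # while the block runs, including everything the block calls.
  def self.use(klass, overrides)
    stack.push([klass, overrides])
    yield
  ensure
    stack.pop
  end

  # The intermediary. The stack is consulted on every call, even when empty.
  def self.dispatch(receiver, name, *args)
    stack.reverse_each do |klass, overrides|
      body = overrides[name]
      return receiver.instance_exec(*args, &body) if body && receiver.is_a?(klass)
    end
    receiver.public_send(name, *args)
  end
end

# Stands in for code further down the call chain.
def library_code(str)
  Categories.dispatch(str, :shout)
end

inside, other_thread = Categories.use(String, { shout: proc { upcase + "!" } }) do
  [library_code("hi"),
   Thread.new { library_code("hi") rescue :no_such_method }.value]
end
afterwards = (library_code("hi") rescue :no_such_method)

p inside
p other_thread
p afterwards
```

Called code on the same thread sees the override, another thread does not, and the behavior disappears when the block ends. The price is the stack check in dispatch on every single call.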

Charlie

apeiros · December 22, 2007, 2:53pm

Doesn’t _why have an interesting approach to this problem? I forget
what it is called. I think he basically “objectified” the whole of
Ruby so it was reusable.

Also, I recently posted this to core (doubt it would ease the
implementation issues however):

Would it be possible to do selector namespaces on a file basis? That
is to say, load a library such that it would only apply to the
immediate file and no other? For example, let’s say I have:

# round.rb
class Float
  def round
    0
  end
end

# foo.rb
n = 1.23
puts n.round

# boo.rb
selector_require "round" # hypothetical
require "foo"
n = 1.234
puts n.to_f

running boo.rb, we’d get:

1
0

The 1 comes from foo.rb, but the 0 from boo.rb because we “selectively
loaded” round.rb.

I’ve never been satisfied with the block-oriented approaches often cited.
Is this perhaps a more useful approach? Or does this have problems of
its own?

T.

apeiros · December 22, 2007, 3:21pm

I’d expect similar semantics in Ruby, since non-thread-local namespaces
would be almost worthless (threads would see behavior on types change
forward and back at arbitrary times), and as a result similar
performance implications. Even worse, it would require shoehorning an
intermediary into the Ruby call path, further slowing it down and
complicating optimization by VMs like the JVM and CLR. And it would add
in all the same checks, since every call would have to check whether a
namespace has been installed before proceeding.

Charlie

I fail to see why namespaces would have to be per Thread. I understand
my code and what methods are available as a lexical issue. Just as the
classes provided by a file I require are available in every thread, I’d
expect a change in a namespace to be local to the lexical scope of the
namespace.

Is there a good example of where and why per-thread namespaces would
be as much more useful as you imply, so I can follow your argument that
thread-unaware namespaces are worthless?

Also I fail to see how lookup would become more complicated. Take a look
at the approach I took. I copy the classes (on interpreter level one
could use COW techniques to reduce memory usage) into the new namespace.
The number of levels needed to look up a method, and the way it is
looked up, do not change at all. Or am I missing something there?

Regards
Stefan

apeiros · December 22, 2007, 3:23pm

Trans wrote:

Doesn’t _why have an interesting approach of this problem? I forget
what it is called. I think he basically “objectified” the whole of
Ruby so it was reusable.

Yes, sandbox. There are also discussions starting now about multi-VM
support in a future Ruby 1.9/2.0 version. Both allow you to isolate a
“sub-ruby” from changing things in the “super-ruby” but they’re far more
isolated than a selector namespace would be. Basically, in sandbox (and
in the JRuby equivalents) you have to marshal data between the two
“rubies” as though they were separate processes. Hardly a seamless
integration for namespacing, but useful for other domains (_why
demonstrated multiple Rails apps in the same process, for example).

Also, I recently posted this to core (doubt it would ease the
implementation issues however):

Would it be possible to do selector namespaces on a file basis? That
is to say, load a library such that it would only apply to the
immediate file and no other? For example lets say I have:

I’d say yes…but with a caveat: the namespace would only apply to
invocations within that file. I think in general it’s expected that a
selector namespace would affect called code as well in the same thread.
But perhaps that wasn’t intended by Matz’s initial proposals of selector
namespace behavior?

Maybe we need to get our heads around what selector namespaces should
actually be first…

The 1 comes from foo.rb, but the 0 from boo.rb because we “selectively
loaded” round.rb.

This is an example of something I’d expect to not work, because
“round” is not called after the namespace is installed. Did you mean to
call n.round at the bottom? That I would expect to work…but no calls
to round outside this file would see the namespace.

This version would also suffer from constantly checking if a namespace
has been installed, since the calls after are independent and
selector_require would presumably be just another method call. However a
file pragma that says “this file operates under a given selector”
would probably work well…since all calls in that file could be
decorated with namespace checks during parse.

I’ve never been satisfied with block-oriented approaches often cited.
Is this perhaps a more useful approach? Or does this have problems of
it own?

Blocks at least allow the interpreter to say “within this context, use
this namespace” and provide an “off” point where the namespace goes
away. I’m not sure either approach is better than the other, but the
pragma is probably the least impact to performance (and maybe the least
useful).

In the end the biggest problem that namespaces introduce is that dynamic
invocation becomes…even more dynamic, since every call can suddenly
take a path completely unrelated to the object being invoked if the
namespace decides it should do so. This is the crux of the issue in
Groovy, and until there’s a way to make namespaces have zero perf impact
on non-namespaced code I’d vote to keep them out.

But I still think it’s worth discussing exactly what they should do and
how they should work.

Charlie

apeiros · December 22, 2007, 4:01pm

On Dec 22, 8:52 am, Trans wrote:

# boo.rb
selector_require "round" # hypothetical
require "foo"
n = 1.234
puts n.to_f

of course, “n.to_f” should be “n.round”.

T.

apeiros · December 22, 2007, 3:51pm

Stefan R. wrote:

I fail to see why namespaces would have to be per Thread. I understand
my code and what methods are available as a lexical issue. Similar as
when I require a file, the provided classes will be available in every
thread, I’d expect a change in a namespace to be local to the lexical
scope of the namespace.

Is there a good example of where and why per-thread namespaces would
make it that much more useful you imply, so I can follow your argument
of thread unaware namespaces being worthless?

They wouldn’t “have to be” but that would probably be the most useful.
If I have a namespace that changes String#to_s and I call a library that
calls String#to_s, don’t I want that library to see my change?

Your version and Trans’s example would work fine for very localized
namespacing, which would have much lower implementation impact (and may
also be useful).

Also I fail to see how lookup would become more complicated. Take a look
at the approach I took. I copy the classes (on interpreter level one
could use COW techniques to reduce memory usage) into the new namespace.
The numbers of levels to look a method up or the way to look it up does
in no way change. Or do I miss something there?

The complication is that under normal circumstances a method has no
knowledge of whether it’s being called inside a namespace or not, since
the installation of the namespace itself is just another method call. So
every method in the system would have to check whether they are being
called under a namespace.

Your code gets around that by essentially delaying the parse until a
namespace is already installed. While this works, and allows namespacing
within that subcontext, you lose all the benefits of having code only
get parsed once. eval is very expensive, even more expensive than
installing per-call namespace checks throughout the system.

Charlie

apeiros · December 22, 2007, 4:23pm

Stefan R. wrote:

Hm, my knowledge about AOP is very limited, but this sounds more like
AOP to me than namespacing.
I’d expect (and want) a namespace only to be effective for libs that
know about the changes and request them (by importing the namespace
where the change is made e.g.)

AOP is largely method-bounded (or cutpoint-bounded) and not applicable
to things outside that method or cutpoint. It doesn’t really apply here.

What about this example, based on your code:

require 'namespaces'
namespace :foo, %{
  class Array
    def each
      # do something new
    end
  end
  p Array.new.collect
}

The default implementation of Enumerable#collect calls each. Would you
expect collect to see the original each method or the one you’ve
provided in the namespace?
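
A sketch of what the clone-based proof of concept would do with this question. It uses a made-up Bag class rather than Array, since in MRI Array defines its own collect and would not go through Enumerable#collect; NamespacedBag is likewise just an illustrative name.

```ruby
# A small collection whose collect really is Enumerable#collect.
class Bag
  include Enumerable

  def initialize(*items)
    @items = items
  end

  def each(&block)
    @items.each(&block)
  end
end

# "Namespaced" copy in the style of the proof of concept: clone, then patch.
NamespacedBag = Bag.clone
NamespacedBag.class_eval do
  def each
    @items.each { |item| yield item * 10 }
  end
end

# Enumerable#collect calls each on the receiver, so it follows the
# receiver's class: original for Bag, patched for the clone.
original = Bag.new(1, 2, 3).collect { |x| x + 1 }
patched  = NamespacedBag.new(1, 2, 3).collect { |x| x + 1 }

p original
p patched
```

With cloned classes, collect sees the new each only for objects created from the clone. Objects created elsewhere and passed in keep the original behavior, which may or may not be what one expects.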

Your version and Trans’s example would work fine for very localized
namespacing, which would have much lower implementation impact (and may
also be useful).

I see the use of namespaces mostly in safe monkey patching. I.e. that I
can change the behaviour of default .each for a namespace without
breaking a library that depends on the default way .each works.

See the above example; if you only want namespaces so that within a
given block of code method calls go where you want them to, that’s
simpler to implement. But it breaks some amount of consistency you might
expect. It seems like if selector namespacing is useful (which I’m
unsure of) it would only be generally useful if it could also affect
calls further down the chain. Maybe I’m wrong?

Does that apply to block-eval too?

No, it doesn’t apply to block eval (as much), but you don’t get to delay
the parse in that case.

At some point you have to be able to say “this code is being
namespaced”. If you want to do that at runtime, then either you need to
modify already-parsed code (which won’t work across libraries or calls)
or you need every invocation to check for namespaces. If you want to do
that at parse/compile time you need a pragma or keyword, and you still
can’t do it across libraries or calls without installing a namespace
check for every invocation.

Also isn’t a required file essentially evaled too? Or are there
differences I’m unaware of?

There’s not much difference, except that require does a one-time eval of
an entire file, parsing all code, classes, methods, blocks in one go and
saving the results. eval calls parse every time you hit them.
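
A rough way to see the difference, assuming nothing beyond the standard benchmark library. SOURCE and the iteration count are arbitrary, and the timings will vary by machine and Ruby version, so no numbers are claimed here.

```ruby
require 'benchmark'

SOURCE = "[1, 2, 3].map { |x| x * 2 }"

# Parse a single time and keep the resulting proc around,
# roughly what happens to code loaded via require.
parsed_once = eval("proc { #{SOURCE} }")

n = 20_000
# A string eval has to parse SOURCE again on every iteration.
string_time = Benchmark.realtime { n.times { eval(SOURCE) } }
proc_time   = Benchmark.realtime { n.times { parsed_once.call } }

puts "string eval: %.3fs, parsed once: %.3fs" % [string_time, proc_time]
```

Both variants compute the same value; only the repeated parsing differs.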

Charlie

apeiros · December 22, 2007, 4:02pm

Charles Oliver N. wrote:

Is there a good example of where and why per-thread namespaces would
make it that much more useful you imply, so I can follow your argument
of thread unaware namespaces being worthless?

They wouldn’t “have to be” but that would probably be the most useful.
If I have a namespace that changes String#to_s and I call a library that
calls String#to_s, don’t I want that library to see my change?

Hm, my knowledge about AOP is very limited, but this sounds more like
AOP to me than namespacing.
I’d expect (and want) a namespace only to be effective for libs that
know about the changes and request them (by importing the namespace
where the change is made e.g.)

Your version and Trans’s example would work fine for very localized
namespacing, which would have much lower implementation impact (and may
also be useful).

I see the use of namespaces mostly in safe monkey patching. I.e. that I
can change the behaviour of default .each for a namespace without
breaking a library that depends on the default way .each works.

Also I fail to see how lookup would become more complicated. Take a look
at the approach I took. I copy the classes (on interpreter level one
could use COW techniques to reduce memory usage) into the new namespace.
The numbers of levels to look a method up or the way to look it up does
in no way change. Or do I miss something there?

The complication is that under normal circumstances a method has no
knowledge of whether it’s being called inside a namespace or not, since
the installation of the namespace itself is just another method call. So
every method in the system would have to check whether they are being
called under a namespace.

Your code gets around that by essentially delaying the parse until a
namespace is already installed. While this works, and allows namespacing
within that subcontext, you lose all the benefits of having code only
get parsed once. eval is very expensive, even more expensive than
installing per-call namespace checks throughout the system.

Does that apply to block-eval too? That I currently have to resort to
string-eval is only due to constant lookup rules, which with access to
the interpreter could be changed. Or as somebody on irc mentioned,
ruby2ruby could also be used to search constant lookup nodes and “bend”
them.
Also isn’t a required file essentially evaled too? Or are there
differences I’m unaware of?

Regards
Stefan

apeiros · December 22, 2007, 4:42pm

Charles Oliver N. wrote:

require 'namespaces'
namespace :foo, %{
  class Array
    def each
      # do something new
    end
  end
  p Array.new.collect
}

The default implementation of Enumerable#collect calls each. Would you
expect collect to see the original each method or the one you’ve
provided in the namespace?

I’d expect Enumerable to use the Array#each as defined in that
namespace, the same way it would in current ruby if I changed
Array#each. But I think I begin to see a thought mistake. I guess I have
to think about it a bit :-/

See the above example; if you only want namespaces so that within a
given block of code method calls go where you want them to, that’s
simpler to implement. But it breaks some amount of consistency you might
expect. It seems like if selector namespacing is useful (which I’m
unsure of) it would only be generally useful if it could also affect
calls further down the chain. Maybe I’m wrong?

I think I’d have to actually play with it and see the problems. Maybe
you’re right.
But from what I think at the moment, consistency expectations would be a
problem with any namespace implementation. But maybe I’m wrong?

Regards
Stefan

apeiros · December 22, 2007, 5:55pm

2007/12/22, Stefan R.:

But I think I begin to see a thought mistake. I guess I have
to think about it a bit :-/

Actually that fits well with what I was going to suggest: it seemed to
me that you had not yet fully made up your mind about what you want.
I doubt though that a lexical solution would be as useful as a
call stack scoped one.

Also, Charles, maybe implementation implications for lexical scoped
changes might be less dramatic but wouldn’t they still impose a
performance penalty at runtime? I mean, the issue of checking would
not change whether it’s dynamically or lexically scoped. And there
must be a penalty because of Ruby’s dynamic nature, i.e. since it’s
not compiled you cannot decide at compile time which version of a
method needs to be invoked. When I think about it it may even make
implementation more difficult, because then the set of current methods
changes even more (i.e. when you redefine a method in a namespace and
invoke that method from another method invoked in that namespace the
old definition would apply again; this also raises the interesting
question of whether recursion with redefined methods is still possible… :-))
A lot of interesting problems to solve.
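
The recursion case can be tried out today with refinements, a lexically scoped patching feature that later Ruby versions (2.0 and up) gained. This is only a stand-in for the namespaces discussed here, and Counter, Traced and Scope are made-up names.

```ruby
class Counter
  # Recursive method defined outside of any refined scope.
  def down(n, log = [])
    log << "old(#{n})"
    n.zero? ? log : down(n - 1, log)
  end
end

module Traced
  refine Counter do
    def down(n, log = [])
      log << "new(#{n})"
      super   # hand over to the original definition
    end
  end
end

module Scope
  using Traced
  # This call is written inside the refined scope, so it hits the new method.
  # The recursive calls are written inside Counter, so they hit the old one.
  LOG = Counter.new.down(2)
end

p Scope::LOG
```

Only the first invocation goes to the redefined method; every recursive step falls back to the old definition, which is the spooky effect described above.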

Kind regards

robert

apeiros · December 22, 2007, 7:56pm

Stefan R. wrote:

But from what I think at the moment, consistency expectations would be a
problem with any namespace implementation. But maybe I’m wrong?

I don’t think so, if the namespacing applies to a thread and all called
code. The idea is that you’d use such namespacing to add your own
behavior to some class, and expect that everyone calling that class
while the namespace is active would see that behavior.

Charlie

apeiros · December 22, 2007, 11:18pm

On 22/12/2007, Charles Oliver N. wrote:

(assume “namespace” is a keyword)

[...]

namespace String => StringDecorate {
  "".foo # => "woohoo!"
}

So this is a syntax sugar for something like

# string_decorate.rb
module StringDecorate
  def do_foo_with_string(str)
    "woohoo!"
  end
end

# foo.rb
require 'string_decorate'

include StringDecorate
extend StringDecorate

do_foo_with_string ""

Now I do not say it’s useless to add methods to a class w/o the fear
you would stomp on somebody else’s added methods. But it only makes
the code look slightly nicer at the cost of some obfuscation of the
origin of the methods and some performance penalty.
Or is the foo also supposed to be visible in methods called from the
block?

The stack scoped namespace then would be roughly equivalent to

require 'delegate'

class AddBar < SimpleDelegator
  def bar
    "foobar!"
  end
  def inspect
    "I won't tell you :p"
  end
end

do_something_with(AddBar.new(foo))

It’s dreadfully slow. I tried it when I wanted to add methods to a
Fixnum and found out it does not have a class. Of course, if the
wrapped object is stored somewhere the method does not go away when
do_something_with ends.
On the other hand, it is somewhat consistent. If the method overrides
were on the stack you could get different views of the same object
from different places.

Basically all these things require storing a hash of overrides, indexed
by class and method, somewhere that is searched before the standard
class hierarchy. That means doing the search multiple times, but it
should be like doing the standard lookup 2x or 3x, not the horrendous
slowdown seen with the pure ruby delegator.
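
A rough sketch of such an override table, assuming overrides are stored as procs keyed by class and method name. OverrideTable, define and call are hypothetical names; a real implementation would live inside the interpreter's method lookup rather than in a wrapper like this.

```ruby
class OverrideTable
  def initialize
    @overrides = {}   # { [klass, method_name] => Proc }
  end

  def define(klass, name, &body)
    @overrides[[klass, name]] = body
  end

  # Search the overrides along the receiver's ancestors first,
  # then fall back to the standard lookup.
  def call(receiver, name, *args)
    receiver.class.ancestors.each do |klass|
      body = @overrides[[klass, name]]
      return receiver.instance_exec(*args, &body) if body
    end
    receiver.public_send(name, *args)
  end
end

table = OverrideTable.new
table.define(Integer, :bar) { "foobar! from #{self}" }
table.define(Comparable, :clamp_to_ten) { clamp(0, 10) }

direct    = table.call(5, :bar)            # override on the class itself
inherited = table.call(42, :clamp_to_ten)  # override found via an ancestor
fallback  = table.call(5, :succ)           # no override, normal dispatch

p direct, inherited, fallback
```

This works for integers too, since nothing is attached to the object itself. The extra pass over the ancestors is the "lookup 2x" cost mentioned above.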

Thanks

Michal

apeiros · December 22, 2007, 8:15pm

Robert K. wrote:

2007/12/22, Stefan R.:

But I think I begin to see a thought mistake. I guess I have
to think about it a bit :-/

Actually that fits well with what I was going to suggest: it seemed to
me that you did not have yet fully made up your mind what you want.
I doubt though that a lexical solution would be as useful as a
call stack scoped one.

Ditto, though I’m not a fan of selector namespacing yet anyway.

old definition would apply again; this also imposes the interesting
question if recursion with redefined methods is still possible… :-))
A lot of interesting problems to solve.

Yes, lexically scoped namespaces would have the same performance
implications as call-stack scoped namespaces iff they were applied
dynamically at runtime. If they were applied statically at parse time,
via a keyword or other syntax, the overhead of checking for a namespace
would be limited to specific chunks of code:

(assume “namespace” is a keyword)

my_string.rb:
class StringDecorate
  def foo
    "woohoo!"
  end
end

foo.rb:
namespace String => StringDecorate {
  "".foo # => "woohoo!"
}
"".foo # => error

So the idea here is that since namespace is a keyword, at parse or
compile time everything inside the block would be decorated with
namespace-checking logic. And more importantly, everything outside the
block would remain blissfully unaware of namespace checking at all.

But this is still predicated on the idea that lexically-scoped
namespacing is actually useful.
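
The namespace keyword above is hypothetical and will not run. The closest runnable analog is probably refinements, which later Ruby versions (2.0 and up) added; the sketch below uses them as a stand-in, with Inside and downstream as made-up names.

```ruby
# A lexically scoped patch: active only where `using` appears.
module StringDecorate
  refine String do
    def foo
      "woohoo!"
    end
  end
end

# Defined outside the refined scope, so calls written here never see String#foo.
def downstream(str)
  str.foo
rescue NoMethodError
  :not_visible
end

module Inside
  using StringDecorate
  DIRECT   = "".foo          # call written inside the scope: refined
  INDIRECT = downstream("")  # call made by code outside the scope
end

# After the module body ends, the refinement is no longer active.
outside = ("".foo rescue :not_visible)

p Inside::DIRECT
p Inside::INDIRECT
p outside
```

Code outside the scope pays nothing and sees nothing, which is the cheap variant described above. It also shows the limitation: downstream calls do not pick up the change.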

Charlie

apeiros · December 23, 2007, 1:36am

Michal S. wrote:

Now I do not say it’s useless to add methods to a class w/o the fear
you would stomp on somebody else’s added methods. But it only makes
the code look slightly nicer at the cost of some obfuscation of the
origin of the methods and some performance penalty.
Or is the foo also supposed to be visible in methods called from the block?

My version just shows changing it within the calls appearing in that
block, but I think for it to be generally useful in the typical sense of
selector namespaces it would have to affect all downstream calls in the
thread as well.

standard class hierarchy. It requires to do the search multiple times
but is should be like doing the standard lookup 2x or 3x, not the
horrendous slowdown as with the pure ruby delegator.

So you’re saying the stdlib delegator is slow? Do you have some
benchmarks for this? I’d like to examine it and see if I can improve it
or speed it up in JRuby. I believe there are parts of Rails that use
delegator too (or perhaps only used to), so it would be nice to see if
it’s a real bottleneck.

Charlie

apeiros · December 23, 2007, 1:40am

Robert K. wrote:

Yeah, and then there is still the question of what happens to recursion.
Since the checks are lexical, you could override a recursive method
with another one and the second invocation ends up calling the old
version. Spooky, although I guess you could do some cute tricks with
this.

One way or the other I don’t claim to know where selector namespaces
are actually useful, so I can’t say which way would be better to go. But
I’ll gladly illustrate how to implement them either way.

Charlie