The finer points of postfix conditionals

dubstep · January 24, 2011, 3:22pm

Hi,

So it turns out the following raises a NameError (undefined local
variable or method ‘foo’):

foo if foo = 1

I.e. the assignment to foo in the conditional is not in scope when the
‘then’ part is evaluated.

Obviously the following does not raise a NameError:

if foo = 1
  foo
end

I would expect them to work the same way. Does anybody know the reason
for this? How exactly is the first example parsed?

Secondly, suppose foo has not yet been assigned. The following results
in foo being assigned to nil:

foo = 1 if false

In fact, so does:

if false
  foo = 1
end

and even:

if false
  foo = 1
  bar = 2
end

(results in both foo and bar being assigned nil)

Also consider:

foo = 1
foo = 2 if false

This results in foo keeping its value of 1, not being assigned to nil.
So it is not simply the case that the second statement is parsed as:

foo = (2 if false)

Is anyone able to elaborate on what exactly the interpreter is doing in
these cases?

Thanks

turnip · January 24, 2011, 3:43pm

On Mon, Jan 24, 2011 at 2:22 PM, Jon L. wrote:

Also consider:

foo = 1
foo = 2 if false

This results in foo keeping its value of 1, not being assigned to nil.
So it is not simply the case that the second statement is parsed as:

Not being assigned to nil here makes sense, even without knowing why it
gets assigned to nil in your other examples; the interpreter wouldn’t
want to overwrite an existing value of foo.

I can’t answer the other questions as I don’t know enough about how the
interpreter is handling the source code; however, I would guess that the
existence of an assignment in the source code causes the “foo” local
variable to be registered as existing within its scope, even though an
assignment has not actually occurred.

I can see this causing problems in certain cases, perhaps if, somewhere
in the code, there was something like:

if foo
  … # (1)
end

which was somehow after an assignment like

… if foo = some_value # (2)

If (2) were removed, (1) would cause an error. But I can’t see this
being likely at all.

I’d be interested to know the answer to your question. I suspect
parse_tree will be useful here.

Related to this:

x # NameError
x = x # => nil
x # => nil

turnip · January 24, 2011, 4:08pm

On Mon, Jan 24, 2011 at 3:22 PM, Jon L. wrote:

foo
end

I would expect them to work the same way. Does anybody know the reason
for this? How exactly is the first example parsed?

This is not an issue of parsing as you can easily check:

15:56:20 ~$ ruby19 -ce 'foo if foo = 1'
-e:1: warning: found = in conditional, should be ==
Syntax OK

(Ignore the warning for the moment.)

This has to do with the way Ruby deals with local variables, and
especially with the ambiguity between local variables and method calls.
In short, a local variable is known from the point in the code where it
first shows up on the left side of an assignment. With postfix “if”, the
assignment comes lexically after the “body” of the “if”, while in the
case of the “if” statement the assignment comes lexically before the
access.

See also

http://www.ruby-doc.org/docs/ProgrammingRuby/html/language.html#UO
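
A minimal sketch of the rule described above. Each form is wrapped in its own method so the two snippets do not share a local-variable scope; the method names postfix_form and block_form are made up for this illustration and are not from the original post.

```ruby
# Hypothetical method names, used only to keep the two scopes separate.

def postfix_form
  # The parser reads the leading `foo` before it has seen `foo = 1`,
  # so that `foo` is compiled as a method call, not a variable read.
  foo if foo = 1
rescue NameError => e
  e.class
end

def block_form
  # Here `foo = 1` comes lexically first, so the later `foo` is a local.
  if foo = 1
    foo
  end
end

p postfix_form # NameError
p block_form   # 1
```

Ruby also prints the "found = in conditional" warning for both methods; that is expected and unrelated to the difference in behavior.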

and even:

if false
  foo = 1
  bar = 2
end

(results in both foo and bar being assigned nil)

This is not exactly true: foo is not “assigned nil”; rather, foo is
initialized as a local variable, and a variable initially refers to nil.
There is no assignment, but because you have an assignment in the code
(even though it’s not executed) the local variable comes into existence
(see above).

Also consider:

foo = 1
foo = 2 if false

This results in foo keeping its value of 1, not being assigned to nil.
So it is not simply the case that the second statement is parsed as:

foo = (2 if false)

Your assessment is correct. It is parsed as

(foo = 2) if false

Or, more generally

expr /if/ condition
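
To make the two groupings concrete, here is a small sketch; the method names are hypothetical and exist only so each case has its own scope.

```ruby
# Hypothetical helper methods, one per grouping.

def modifier_on_statement
  foo = 1
  foo = 2 if false      # grouped as (foo = 2) if false: nothing runs
  foo
end

def modifier_inside_rhs
  foo = 1
  foo = (2 if false)    # the modifier-if evaluates to nil, which IS assigned
  foo
end

p modifier_on_statement # 1
p modifier_inside_rhs   # nil
```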

Is anyone able to elaborate on what exactly the interpreter is doing in
these cases?

I hope so. Please note also that the code “expr1 if var = expr2” can
always be replaced by the much more readable

var = expr2

expr1 if var

or

var = expr2

if var
  expr1
end

or

var = expr2 and expr1 # mind operator precedence!
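
The precedence remark can be illustrated with a short sketch. The methods lookup and bla below are hypothetical stand-ins for expr2 and expr1; only the difference between "and" and "&&" is the point.

```ruby
# `lookup` and `bla` are made-up stand-ins for expr2 and expr1.
def lookup
  "value"
end

def bla(arg)
  "bla saw #{arg.inspect}"
end

def with_and
  # `and` binds more loosely than `=`: (f = lookup) and bla(f)
  f = lookup and bla(f)
end

def with_double_ampersand
  # `&&` binds more tightly than `=`: f = (lookup && bla(f)),
  # so f is still nil when bla(f) is evaluated.
  f = lookup && bla(f)
end

puts with_and              # bla saw "value"
puts with_double_ampersand # bla saw nil
```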

Kind regards

robert

turnip · January 24, 2011, 9:26pm

On Jan 24, 2011, at 1:18 PM, Jon L. wrote:

Thanks for the explanation. FWIW I think it is a shame that postfix
conditionals are semantically different to normal ones, but it’s good to
know why the difference occurs.

I think you are reversing cause and effect. The postfix condition
is not the source of the semantic surprise.

The surprise comes from the way Ruby disambiguates local variables
from argument-less method calls at parse time. You can contrive
other examples of this surprise:

x rescue puts("x isn't a method here")
x = true
puts "x is now a local variable" if x
x rescue puts("this won't fail now because parser thinks x is a local variable")
x() rescue puts("x still isn't a method")

Gary W.

turnip · January 24, 2011, 7:18pm

Thanks for the explanation. FWIW I think it is a shame that postfix
conditionals are semantically different to normal ones, but it’s good to
know why the difference occurs.

I started to wonder about this when I was using the following code:

bla(foo) if foo = self.foo

In other words, I was trying to put the value of the method foo into the
local variable foo in order to avoid calling the method twice. To me the
above is quite an elegant way of doing that, so it’s a shame that it
does not actually have the intended effect [it all came crashing down
when I renamed the foo variable]

Cheers

turnip · January 25, 2011, 9:56am

On Mon, Jan 24, 2011 at 7:18 PM, Jon L. wrote:

Thanks for the explanation. FWIW I think it is a shame that postfix
conditionals are semantically different to normal ones, but it’s good to
know why the difference occurs.

See Gary’s reply.

I started to wonder about this when I was using the following code:

bla(foo) if foo = self.foo

In other words, I was trying to put the value of the method foo into the
local variable foo in order to avoid calling the method twice. To me the
above is quite an elegant way of doing that, so it’s a shame that it
does not actually have the intended effect [it all came crashing down
when I renamed the foo variable]

Frankly, I don’t find this elegant at all. There is only one
situation where it is reasonable and elegant to place an assignment in
a conditional expression: in case of loops where the expression
changes but the result needs to be reused after the condition, e.g.

while (now = Time.now) < target_time
  puts "We have now #{now}"
  do_more_work
  log.debug "Work for timestamp : #{now}"
end

For all other situations, i.e. if it is a non-loop condition, the
expression does not change, or the result is not used inside the body or
afterwards, there is no point at all in assigning in the conditional
expression. Your aim of avoiding calling self.foo twice is easily
reached without assignment inside the condition:

# traditional, easy to understand

f = foo
bla(f) if f

or

# slightly more involved but elegant

f = foo and bla(f)

The latter makes for quite an elegant solution, which also has the
advantage that it is executed in the same order as it is read by a
human being (at least in many cultures); even for others the execution
order is the same as for other sequences of statements.

Kind regards

robert

turnip · January 25, 2011, 11:53pm

Robert K. wrote in post #977145:

On Mon, Jan 24, 2011 at 3:22 PM, Jon L. wrote:
…

and even:

if false
  foo = 1
  bar = 2
end

(results in both foo and bar being assigned nil)

This is not exactly true: foo is not “assigned nil”; rather, foo is
initialized as a local variable, and a variable initially refers to nil.
There is no assignment, but because you have an assignment in the code
(even though it’s not executed) the local variable comes into existence
(see above).

To understand this better, I used the method ‘defined?’.
Would ‘defined?’ be a correct way to determine if a name was
already ‘initialized as a local variable’ in this case?

$ rvm use 1.9.2 # (same behavior in 1.8.7)
Using /home/peterv/.rvm/gems/ruby-1.9.2-p136
$ irb
001:0> defined?(x) #=> nil
002:0> if false
003:1> x = 10
004:1> end #=> nil
005:0> defined?(x) #=> "local-variable"
006:0> x #=> nil

Thanks,

Peter

turnip · January 25, 2011, 7:09pm

Robert K. wrote in post #977336:

On Mon, Jan 24, 2011 at 7:18 PM, Jon L. wrote:

Thanks for the explanation. FWIW I think it is a shame that postfix
conditionals are semantically different to normal ones, but it’s good to
know why the difference occurs.

See Gary’s reply.

Yes, I agree that it’s surprising this is done at parse time. But
regardless of the cause and effect it is still surprising (to me) that
the two conditional syntaxes have different semantics

I started to wonder about this when I was using the following code:

bla(foo) if foo = self.foo

In other words, I was trying to put the value of the method foo into the
local variable foo in order to avoid calling the method twice. To me the
above is quite an elegant way of doing that, so it’s a shame that it
does not actually have the intended effect [it all came crashing down
when I renamed the foo variable]

Frankly, I don’t find this elegant at all.

It’s a matter of taste. I agree with you that in almost all situations
assignment during a conditional test is not nice.

However, I liked this solution precisely because the local var was named
exactly the same as the method. So the way I read it in my head was
“call bla with foo, and oh btw, don’t call the foo method twice”. So in
this situation the ‘if foo = self.foo’ was purely a performance tweak
and nothing else. A side note, if you will - it didn’t change the
semantics of the statement.

But as I say, it’s a matter of taste. And it doesn’t work anyway

Jon

turnip · January 26, 2011, 9:15am

On Tue, Jan 25, 2011 at 11:53 PM, Peter V. wrote:

already ‘initialized as a local variable’ in this case?

You can certainly do that for learning purposes but I have not yet had
the need to use defined? in a real program.

$ rvm use 1.9.2 # (same behavior in 1.8.7)
Using /home/peterv/.rvm/gems/ruby-1.9.2-p136
$ irb
001:0> defined?(x) #=> nil
002:0> if false
003:1> x = 10
004:1> end #=> nil
005:0> defined?(x) #=> "local-variable"
006:0> x #=> nil

IRB cannot be trusted on things like this as it has different behavior
for local variables than the Ruby interpreter. Better do something
like this:

09:10:10 ~$ ruby19 <<CODE

def f
  p defined?(x)
  if false
    x=10
  end
  p defined?(x)
end
f
CODE
nil
"local-variable"
09:11:06 ~$

Kind regards

robert

turnip · January 26, 2011, 10:37am

Peter V. wrote in post #977488:

To understand this better, I used the method ‘defined?’.
Would ‘defined?’ be a correct way to determine if a name was
already ‘initialized as a local variable’ in this case?

$ rvm use 1.9.2 # (same behavior in 1.8.7)
Using /home/peterv/.rvm/gems/ruby-1.9.2-p136
$ irb
001:0> defined?(x) #=> nil
002:0> if false
003:1> x = 10
004:1> end #=> nil
005:0> defined?(x) #=> "local-variable"
006:0> x #=> nil

Yes, but you should rarely if ever need this in practice.

Ruby is in general a highly dynamic language, but one thing which is
static is the decision as to whether a bare name is a local variable or
a method call. This is done at parse time, before code is even
executed (whereas definition of classes and methods is done at run
time, by executing the code containing ‘class’ and ‘def’ statements).

Hence even if you dynamically create a local variable using ‘eval’, a
subsequent non-eval reference to that variable won’t pick it up. Run the
following as a .rb script (not in irb):

def x
  "yay"
end
eval "x=1"
puts x # yay
puts defined?(x) # method
puts eval("x") # 1

You can see that x was chosen to be a method call, because no assignment
to x was seen previously (the eval isn’t executed until runtime). This
is even clearer if you use something like ParseTree to show the parsed
code.
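
The same point can be sketched without string eval, assuming a Ruby recent enough to have Binding#local_variable_set and Binding#local_variable_get (2.1 or later, which postdates this thread). The class name Demo is made up for the illustration.

```ruby
# `Demo` is a hypothetical class used only for this sketch.
class Demo
  def x
    "yay"
  end

  def run
    b = binding
    # Creates a local named x at run time, but only inside the binding b.
    b.local_variable_set(:x, 1)
    # The bare `x` below was already compiled as a method call, because
    # no assignment to x appears in the source of this method.
    [x, defined?(x), b.local_variable_get(:x)]
  end
end

p Demo.new.run # ["yay", "method", 1]
```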

You can, however, force a bare name to be a method call instead of a
local variable - again, decided at parse time:

def x
  "yay"
end
x = 1
puts x # 1
puts x() # yay
puts self.x # private method 'x' called

puts y # undefined local variable or method 'y'
puts y() # undefined method 'y'

I think the reasons for this approach are:

- Language design: you don’t need to declare variables, and you don’t
  need to use () after every method call
- Decidability: you can tell just by scanning source code whether ‘x’
  is a local variable or a method call. (You only need to scan from the
  previous ‘def’ or ‘class’ statement, since these start a new scope)
- Efficiency: you don’t want to have to search at run time to
  determine whether ‘x’ is a local variable or a method call at that
  point in the code every time it executes

turnip · January 26, 2011, 1:59pm

Robert K. wrote in post #977599:

On Tue, Jan 25, 2011 at 7:09 PM, Jon L. wrote:

regardless of the cause and effect it is still surprising (to me) that
the two conditional syntaxes have different semantics

Why do you find that surprising?

I guess I expected that the two different concrete syntaxes would be
transformed into the same AST node.

I wouldn’t have thought this would necessarily affect the ability to do
optimisations at parse time, but my experience of language
implementation is limited so I could well be wrong about that.

Jon

turnip · January 26, 2011, 3:07pm

On Wed, Jan 26, 2011 at 1:59 PM, Jon L. wrote:

Robert K. wrote in post #977599:

On Tue, Jan 25, 2011 at 7:09 PM, Jon L. wrote:

regardless of the cause and effect it is still surprising (to me) that
the two conditional syntaxes have different semantics

Why do you find that surprising?

I guess I expected that the two different concrete syntaxes would be
transformed into the same AST node.

But that would lead to a situation where a local variable declared in
an if or else branch of a regular “if” would be detected as defining
the variable before the condition. Then one would consequently have
to recognize all local variables in scope because otherwise the
language rules would be too complex to understand and thus error
prone. But if you do that you get tree traversal again vs. the simple
start from “def” and record all local variables found so far.

Kind regards

robert

turnip · January 26, 2011, 11:49am

On Tue, Jan 25, 2011 at 7:09 PM, Jon L. wrote:

regardless of the cause and effect it is still surprising (to me) that
the two conditional syntaxes have different semantics

Why do you find that surprising? Parse time is exactly the time when
things like the following are done:

- determining which code sequence is a method call
- determining which code sequence is a class definition
- …
- and also determining which code sequence denotes a variable name, and
  the kind of variable (instance, local, class)

Ruby is dynamic but this does not mean that source code has arbitrary
semantics. I guess the reason that a following assignment to a local
variable has no effect on prior usage of the identifier is compiler
efficiency. If the assignment can be anywhere in local scope you need
more complex lookups or a second pass whereas with the current rule
you simply collect local variables you have seen so far and check each
identifier occurrence against the current set of known local
variables.
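
The "collect what you have seen so far" rule can be sketched as a toy single-pass classifier. This is not how the real Ruby parser is implemented: a flat list of tokens stands in for source code, and the names KEYWORDS and classify are invented for the illustration.

```ruby
# Toy model only; KEYWORDS and classify are hypothetical names.
KEYWORDS = %w[if end].freeze

def classify(tokens)
  known = {}   # local variables seen so far, in source order
  result = []
  tokens.each_with_index do |tok, i|
    # Skip keywords and anything that is not a lowercase identifier.
    next if KEYWORDS.include?(tok) || tok !~ /\A[a-z_]\w*\z/
    if tokens[i + 1] == "="
      known[tok] = true                  # assignment registers the name
      result << [tok, :assignment]
    elsif known[tok]
      result << [tok, :local_variable]   # seen on a left-hand side earlier
    else
      result << [tok, :method_call]      # not seen yet: treated as a call
    end
  end
  result
end

p classify(%w[foo if foo = 1])
# [["foo", :method_call], ["foo", :assignment]]
p classify(%w[if foo = 1 foo end])
# [["foo", :assignment], ["foo", :local_variable]]
```

One left-to-right pass with a single set of known names is enough here; letting a later assignment affect an earlier identifier would need a second pass or a lookup over the whole scope.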

You might want to argue that one could make an exception for postfix
statements but that exception would only be efficient for the simple
case we discussed and be much harder for more complex expressions
before the postfix “if” (these expressions can even span multiple
lines). Also, the rule would be more complicated which is always bad
because that would hinder learning the language (and maybe also bug
hunting).

Frankly, I don’t find this elegant at all.

It’s a matter of taste. I agree with you that in almost all situations
assignment during a conditional test is not nice.

I’d say it’s not only taste: you should be aware of the implications.
The warning about the assignment is there for a good reason, namely
because it’s a typical typing error to type “=” instead of “==”. This
will also make the code harder to read IMHO and might leave the reader
wondering whether it was intentional or caused by said typing error.

However, I liked this solution precisely because the local var was named
exactly the same as the method. So the way I read it in my head was
“call bla with foo, and oh btw, don’t call the foo method twice”. So in
this situation the ‘if foo = self.foo’ was purely a performance tweak
and nothing else. A side note, if you will - it didn’t change the
semantics of the statement.

But as I say, it’s a matter of taste. And it doesn’t work anyway

Well, at least you can do

foo = foo() and bla(foo)
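
A small sketch of that last form in context. The Widget class, its call counter and the bodies of foo and bla are all made up here; the original only names the methods foo and bla.

```ruby
# `Widget` and the method bodies are hypothetical; only the
# `foo = foo() and bla(foo)` line comes from the discussion above.
class Widget
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def foo
    @calls += 1
    "payload"
  end

  def bla(arg)
    "bla(#{arg})"
  end

  def run
    # foo() with parentheses is always a method call; after the
    # assignment, the bare foo passed to bla is the local variable.
    foo = foo() and bla(foo)
  end
end

w = Widget.new
puts w.run   # bla(payload)
puts w.calls # 1
```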

Kind regards

robert