Ruby Whitespace Semantics

Almann_G · February 27, 2006, 8:45am

Can someone please explain the semantics behind the following:

irb(main):001:0> a = ( 4 + 5 )
=> 9
irb(main):002:0> a = ( 4
irb(main):003:1> + 5 )
=> 5
irb(main):004:0> a = ( 4 +
irb(main):005:1* 5 )
=> 9

The first and last statements make sense to me, but why is the second
one returning 5?

I find semantics like this troubling, and no documentation sheds light
as to what would cause this behavior.

Thanks,
Almann

Almann_G · February 27, 2006, 8:55am

Almann G. wrote:

The first and last statements make sense to me, but why is the second one
returning 5?

I find semantics like this troubling, and no documentation sheds light as
to what would cause this behavior.

I understand your concern. Let me try to clarify.

Expressions in Ruby can be like standalone statements. Statements are
terminated with an optional semicolon or with a newline. If a statement
is incomplete, it is understood to go on to the next line; if it is
complete, it is just as if terminated with a semicolon.

Therefore:

a = (4
+5)

is the same as

a = (4;
+5)

or even

a = (4; +5)

That is, it evaluates a “4” and then evaluates a “+5” (which then is the
resultant value, as it was the last evaluated).

But with

a = (4+
5)

the parser is able to see that the expression is not complete, and is
apparently continued on the next line.
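
As a minimal sketch of the three cases above (the variable names a, b,
and c are only for illustration), the following should behave the same
in a script as it does in irb. Running it with warnings enabled may also
print a note about a useless literal, which hints that the 4 is being
discarded:

```ruby
# "4" is already a complete expression, so the newline ends that statement.
# "+ 5" is then parsed as a separate statement (unary plus applied to 5),
# and the parentheses take the value of their last statement.
a = (4
  + 5)

# The same thing with the statement separator written explicitly.
b = (4; +5)

# A trailing binary operator leaves the expression unfinished,
# so the parser carries on to the next line.
c = (4 +
  5)

puts a  # 5
puts b  # 5
puts c  # 9
```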

Almann_G · February 27, 2006, 8:55am

Almann G. wrote:

I’m no expert, but I think it has to do with both first and third having
the + operator on the first line and the second one having the +
operator on the second line - note also that irb understands that the
third case is a continuation of the previous line (*), but the second
case is treated almost as 2 separate statements - no (*).

I’m sure someone else will have a better idea
Kev

Almann_G · February 27, 2006, 10:26am

=> 9

The first and last statements make sense to me, but why is the second one
returning 5?

I find semantics like this troubling, and no documentation sheds light as
to what would cause this behavior.

This behaviour is documented, at least here:
http://www.rubycentral.com/book/language.html

and probably in other places too (but I wouldn’t know about them, as I
have been learning Ruby for just a little more than a week).

Hope it helps,
Alex

Almann_G · February 27, 2006, 9:19am

Thanks, the semantic is clearer now even though I think it is very
clumsy.

As an aside, it is interesting that Ruby allows suites of statements
to be used in a grouping context (the parentheses). Most other
whitespace-sensitive languages don’t allow this, since it gets
confusing with expressions (as my example shows).

-Almann

Almann_G · February 27, 2006, 5:42pm

Quoting Almann G.:

The first and last statements make sense to me, but why is the
second one returning 5?

a = ( 4 + 5 )
a = ( 4 ; + 5 )
a = ( 4 + 5 )

A line break begins a new statement unless there is a dangling
binary operator or line continuation.

-mental

Almann_G · February 27, 2006, 4:25pm

This behavior actually isn’t well documented: neither that
documentation nor the Ruby Manual makes clear what happens when the
grouping operator is used in this manner.

In a nutshell, newline delimits statements except when it doesn’t
(trail with a binary operator for instance)… not a really good
semantic but I’ll add that to the list of gotchas I need to deal with
in Ruby.

The 1.4 English Ruby Manual says:
Each expression are delimited by semicolons(;) or newlines.

The current Japanese Ruby Manual essentially says the same thing (my
Japanese isn’t as sharp as it used to be):
式と式の間はセミコロン(;)または改行で区切ります
(shiki to shiki no aida wa semikoron(;) mata wa kaigyou de kugirimasu)

-Almann

Almann_G · February 27, 2006, 6:16pm

It is generally good coding style in any programming language, when
continuing an expression across more than one line, to split the lines
after an operator or other punctuation symbol which indicates that the
expression is unfinished. I.e.

somefunction( ...very long argument list,
                     more arguments)

or

(4 +
5)

This gives the reader a visual hint that the expression is incomplete,
and continues onto the next line. Ruby just uses the same heuristic
that a human reader would use.
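
A rough sketch of that style in Ruby, assuming a made-up three-argument
method (some_function and its parameters are hypothetical, standing in
for the "very long argument list" above):

```ruby
# Hypothetical helper, only here so the call below has something to call.
def some_function(first, second, third)
  first + second + third
end

# Breaking after a comma: the argument list is visibly unfinished,
# so both the reader and the parser keep going.
sum = some_function(1, 2,
                    3)

# Breaking after a binary operator works the same way.
nine = (4 +
        5)

puts sum   # 6
puts nine  # 9
```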

~Avdi

Almann_G · February 27, 2006, 8:04pm

On 2/27/06, Mark W. wrote:

is the same as

a = (4;
+5)

But (4 is clearly an incomplete statement.

Actually, notice the semicolon. What Hal is demonstrating is that a
parenthetical group can contain multiple expressions. The newline
terminates the expression ‘4’, which is a complete expression, while
the statement (which happens to include that expression) is continued
on the next line.

Jacob F.

Almann_G · February 27, 2006, 8:01pm

“Hal F.” wrote:

is the same as

a = (4;
+5)

But (4 is clearly an incomplete statement.

Almann_G · February 27, 2006, 8:50pm

You’re not alone. I’ve adopted this practice in C programs ever since I
read about it in a book called Human Factors and Typography for More
Readable Programs, by Ronald M. Baecker and Aaron Marcus, ACM Press,
1990. The authors recommend breaking long lines at operators “of
relatively low precedence” and placing the operator at the beginning of
the second line because “it emphasizes at the beginning of the
continuation that the second line is a continuation.”

Almann_G · February 27, 2006, 8:56pm

Quoting Anthony DeRobertis:

Well, when I work in languages other than Ruby (not in ruby, of
course, because it’s a syntax error), I’ve always found:

foo = a + b + c …
+ z

more readable myself, but I’m probably just a weirdo.

It works fine in Ruby if you use a backslash to continue the
statement:

foo = a + b + c \
+ z
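
To make the difference concrete, here is a small sketch with made-up
values for a, b, c, and z (they are assumptions for illustration only),
comparing the leading-operator layout with and without the backslash:

```ruby
a, b, c, z = 1, 2, 3, 10

# No backslash: the first line is a complete statement, so foo gets a + b + c.
# The second line is a separate statement (unary plus on z) whose value is dropped.
foo = a + b + c
+ z

# Trailing backslash: the newline is ignored and "+ z" joins the same expression.
bar = a + b + c \
  + z

puts foo  # 6
puts bar  # 16
```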

-mental

Almann_G · February 27, 2006, 10:50pm

Ah. I thought a semicolon terminated a statement.

Almann_G · February 27, 2006, 8:22pm

Avdi G. wrote:

(4 +
5)

This gives the reader a visual hint that the expression is incomplete,
and continues onto the next line. Ruby just uses the same heuristic
that a human reader would use.

Well, when I work in languages other than Ruby (not in ruby, of course,
because it’s a syntax error), I’ve always found:

foo = a + b + c …
+ z

more readable myself, but I’m probably just a weirdo.

Almann_G · February 27, 2006, 11:14pm

It does. 4 is a valid statement in Ruby. Ruby doesn’t have the same
sharp division as many other languages. 4 is a statement, 4 + 5 is a
statement, and +5 is a statement. The return value of a series of
statements is the value of the final statement. As a result, the value
of ‘4; +5’ is 5.
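
The same "last statement wins" rule can be seen outside parentheses as
well. A short sketch (the names from_parens, from_block, and last_value
are invented for this example):

```ruby
# Parenthesised statement list: value of the last statement.
from_parens = (4; +5)

# A begin/end block behaves the same way.
from_block = begin
  4
  +5
end

# So does a method body: the last statement is the return value.
def last_value
  4
  +5
end

puts from_parens  # 5
puts from_block   # 5
puts last_value   # 5
```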

Almann_G · February 27, 2006, 11:02pm

Quoting [email protected]:

Ah. I thought a semicolon terminated a statement.

It does; it’s just that the current Ruby grammar permits multiple
statements within the same set of parentheses (separated by
semicolons or newlines).

-mental

Almann_G · February 27, 2006, 11:33pm

On Tue, Feb 28, 2006 at 04:21:02AM +0900, Anthony DeRobertis wrote:

(…)
Well, when I work in languages other than Ruby (not in ruby, of course,
because it’s a syntax error), I’ve always found:

foo = a + b + c …
+ z

If this is M(atlab) then the three dots serve to signal
that the line is continued, i.e. serving the same purpose
as does \<nl> in ruby. I.e. you’re explicitly stating already
that the expression continues on next line (just as you would
by adding a <nl> or ).

-m

Almann_G · February 27, 2006, 11:45pm

Quoting Timothy G.:

It does. 4 is a valid statement in Ruby. Ruby doesn’t have the
same sharp division as many other languages.

Well… not really. There’s the same distinction in Ruby too (see
expr versus compstmt in parse.y).

It’s more that Ruby’s grammar permits statement lists in places where
many other languages only permit expressions (like in between
parentheses).

-mental

Almann_G · February 28, 2006, 4:20pm

Martin S. Weber wrote:

On Tue, Feb 28, 2006 at 04:21:02AM +0900, Anthony DeRobertis wrote:

foo = a + b + c …
+ z

If this is M(atlab) then the three dots serve to signal
that the line is continued, i.e. serving the same purpose
as does <nl> in ruby.

No, that’s pseudo-code and the three dots are an ellipsis indicating
elided material. Yeah, sure, it should have been U+2026 (…), but I’m
lazy.

Almann_G · February 28, 2006, 5:07pm

On Wed, Mar 01, 2006 at 12:18:20AM +0900, Anthony DeRobertis wrote:

No, that’s pseudo-code and the three dots are an ellipsis indicating
elided material. Yeah, sure, it should have been U+2026 (…), but I’m
lazy.

snore Sorry

-Martin