Pythonic indentation (or: beating a dead horse)


#1

Greetings, folks. First time poster, so if I breach
any etiquette I sincerely apologize. I’m a bit of a Ruby N. who’s
been
bouncing around between Python and Ruby, not entirely satisfied with
either
and wishing both were better. Two years ago I had no familiarity with
either
language, then I quit working for Microsoft and learned the true joy
of
programming in dynamic languages.

I am not a zealot and have little tolerance for zealotry, and I have no
desire to get involved in holy wars. I’m a little apprehensive that I’m
about to step into one, but here goes anyway. In general, I prefer Ruby’s
computational model to Python’s. I think code blocks are cool, and I love
Ruby’s very flexible expressiveness. I dig the way every statement is an
expression, and being able to return a value simply by stating it rather
than using the ‘return’ keyword. I hate Python’s reliance on global
methods like len() and filter() and map() (rather than making these
methods members of the classes to which they apply) and I absolutely
loathe its reliance on magic method names. Ruby’s ability to reopen and
modify any class kicks ass, and any Python fan who wants to deride
“monkeypatching” can shove it. It rocks.

That being said, I think monkeypatching could use some syntactic sugar
to
provide a cleaner way of referencing overridden methods, so instead
of:

module Kernel
  alias oldprint print
  def print(*args)
    do_something
    oldprint *(args + [" :-)"])
  end
end

…maybe something like this:

module Kernel
  override print(*args)
    do_something
    overridden *(args + [" :-)"])
  end
end
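
For comparison, Ruby versions from 2.0 onward offer Module#prepend, which gets close to the "override/overridden" idea without any new syntax: a prepended module's method runs first, and `super` reaches the method it shadows. The sketch below is only an illustration; the `Greeter` class and `Smiley` module are made-up names, not anything from the post.

```ruby
# Hypothetical class standing in for whatever is being patched.
class Greeter
  def greet(name)
    "Hello, #{name}"
  end
end

# Hypothetical patch module: `super` plays the role of `overridden`.
module Smiley
  def greet(name)
    super(name) + " :-)"
  end
end

# Put Smiley in front of Greeter in the method lookup chain.
Greeter.prepend(Smiley)

puts Greeter.new.greet("Matz")
```

No alias is needed, and the original method keeps its name.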

But I digress… the purpose of this post is to talk about one of the
relatively few areas where I think Python beats Ruby, and that’s
syntactically significant indentation.

Before I get into it, let me say to those of you whose eyes are
rolling way
back in your skulls that I have a proposal that will allow you to keep
your
precious end keyword if you insist, and will be 100% backward
compatible with
your existing code. Skip down to “My proposal is” if you want to cut
to the
chase.

When I encounter engineers who don’t know Python, I sometimes ask them
if they’ve heard anything about the language, and more often than not,
they
answer, “Whitespace is significant.” And more often than not, they
think that’s
about the dumbest idea ever ever. I used to think the same. Then I
learned
Python, and now I think that using indentation to define scope is
awesome.
I started looking over my code in C++ and realized that if some
nefarious
person took all of my code and stripped out the braces, I could easily
write a simple script in either Python or Ruby :wink: to restore them,
because
their locations would be completely unambiguous: open braces go right
before
the indentation level increases, close braces go right before it
decreases. And
having gotten used to this beautiful way of making code cleaner, I
hate that
Ruby doesn’t have it.
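
A minimal sketch of the brace-restoring script described above might look like the following. It is not the script from the post (none was shown); `restore_braces` is a made-up name, and the sketch assumes spaces-only, consistent indentation.

```ruby
# Re-insert braces into brace-stripped code, using indentation alone.
# Assumes spaces-only, consistent indentation.
def restore_braces(source)
  stack = [0]   # indentation widths of the currently open scopes
  out = []
  source.each_line do |raw|
    line = raw.chomp
    if line.strip.empty?
      out << line
      next
    end
    indent = line[/\A */].size
    if indent > stack.last
      out << (" " * stack.last) + "{"   # open brace right before the indent increases
      stack.push(indent)
    end
    while indent < stack.last
      stack.pop
      out << (" " * stack.last) + "}"   # close brace right before it decreases
    end
    out << line
  end
  while stack.size > 1                  # close whatever is still open at end of input
    stack.pop
    out << (" " * stack.last) + "}"
  end
  out.join("\n")
end

stripped = <<~CODE
  int main()
      while (x)
          x--;
      return 0;
CODE

puts restore_braces(stripped)
```

The stack of indentation widths is what makes the brace positions unambiguous: every deeper line opens exactly one scope, and every shallower line closes scopes until the widths match again.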

I’ve read the two-year-old thread at
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/252034
(well, most of it, anyway) and I’ll answer some of the objections
raised
in it, but first let me illustrate the scope of the problem with the
output
of a quick and dirty script I wrote:

2: 4082
1: 16505

My friends, when ONE OUT OF EVERY SIX of your code lines consists of
just the
word “end”, you have a problem with conciseness. I recognize that
syntactically-
significant indentation is not perfect, and it would bring a few pain
points
with it. But let me say that again: ONE OUT OF EVERY SIX LINES, for
crying out
loud! This should be intolerable to engineers who value elegance.
“Streaks”
means what you’d expect: there are four places in the scanned files
that look
like this:

            end
          end
        end
      end
    end
  end
end

This is not DRY. Or anything remotely resembling it. This is an ugly
blemish on a language that otherwise is very beautiful. The problem of
endless ends is exacerbated by Ruby’s expressiveness, which lends itself
to very short methods, which can make defs and ends take up a large
amount of space relative to lines of code that actually do something.

Even if you can find some ways in which the explicit “end” keyword is
preferable
to letting indentation do the talking… one out of every six lines.
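
For anyone who wants to run a similar count on their own code, here is a rough sketch. It is not the script used for the numbers above; `end_stats` is a made-up name, and it assumes a "streak" is a run of consecutive lines containing nothing but `end`, with blank lines breaking the run.

```ruby
# Count lines that consist solely of `end`, and runs ("streaks") of them.
# Returns [total_lines, end_lines, streaks]; streaks maps run length => count.
def end_stats(source)
  total = 0
  ends = 0
  streaks = Hash.new(0)
  run = 0
  source.each_line do |line|
    total += 1
    if line.strip == "end"
      ends += 1
      run += 1
    else
      streaks[run] += 1 if run > 0
      run = 0
    end
  end
  streaks[run] += 1 if run > 0   # a streak may reach the end of the file
  [total, ends, streaks]
end

sample = <<~RUBY
  class Foo
    def bar
      if baz
        qux
      end
    end
  end
RUBY

total, ends, streaks = end_stats(sample)
puts "lines: #{total}, end lines: #{ends}"
streaks.sort.each { |length, count| puts "#{length}: #{count}" }
```

Feeding it the contents of each file in a source tree and summing the results would give figures comparable in spirit to the ones quoted here.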

Matz’s objections in the cited thread were:

  • tab/space mixture

Well, tough. Programmers shouldn’t be using freakin’ tabs anyway, and
if they
are, they definitely shouldn’t be mixing them with spaces. I don’t
think
it’s worthwhile to inflate the code base by a staggering 20% to
accommodate
people who want to write ugly code, mixing tabs and spaces when
there’s no
reason to. And if for some reason it’s really, really critical to
support this
use case, there could be some kernel-level method for specifying how
many
spaces a tab equates to, so the interpreter can figure out just how
far indented
that line with the tabs is.

  • templates, e.g. eRuby

Not having used eRuby and therefore not being very familiar with it, I
don’t
want to comment on specifics other than to note that plenty of Python-
based
template systems manage to get by.

  • expression with code chunk, e.g lambdas and blocks

I don’t really see the problem. My blocks are generally indented
relative to
the context to which they’re being passed, isn’t that standard?

My proposal is to, first, not change a thing with respect to existing
syntax. Second, steal the : from Python and use it to signify a scope
that’s marked by indentation:

while some_condition
  # this scope will terminate with an ‘end’ statement
  do_something
end

while some_condition:
  # this scope will terminate when the indentation level decreases to the
  # level before it was entered
  do_something

%w{foo bar baz}.each do |val|
  print val
end

%w{foo bar baz}.each do |val|:
  print val

A valid objection that was raised in the earlier thread was regarding
a
quick and easy and common debugging technique: throwing in print
statements

def do_something(a, b, c)
print a, b, c # for debugging purposes
  a + b + c
end

def do_something(a, b, c):
print a, b, c # error! unexpected indentation level
  a + b + c
end

We can get around this by saying that braces, wherever they may
appear,
always define a new scope nested within the current scope, regardless
of
indentation.

def do_something(a, b, c):
{ print a, b, c } # this works
  a + b + c

Alternatively, perhaps a character that’s not normally valid at the
start of a
line (maybe !) could tell the interpreter “treat this line as though
it were
indented to the level of the current scope”:

def do_something(a, b, c):
!print a, b, c
  a + b + c

Well, I think that’s probably enough for my first post. Thank you for
your
time, and Matz, thanks for the language. Thoughts, anyone?

–J


#2

J Haas wrote:

My friends, when ONE OUT OF EVERY SIX of your code lines consists of
just the

John McCain, is that you? :wink:

word “end”, you have a problem with conciseness. I recognize that
syntactically-

A line consisting of just /\s+end/ is of very low complexity, however.

            end
          end
        end
      end
    end
  end
end

This is not DRY. Or anything remotely resembling it. This is an
ugly blemish on a language that otherwise is very beautiful.

It’s a blemish all right, but not on the language.


#3

On Tue, May 19, 2009 at 17:40, J Haas removed_email_address@domain.invalid wrote:

Well, tough. Programmers shouldn’t be using freakin’ tabs anyway, and
if they are, they definitely shouldn’t be mixing them with spaces. I don’t
think it’s worthwhile to inflate the code base by a staggering 20% to
accommodate people who want to write ugly code, mixing tabs and spaces when
there’s no reason to. And if for some reason it’s really, really critical to
support this use case, there could be some kernel-level method for specifying how
many spaces a tab equates to, so the interpreter can figure out just how
far indented that line with the tabs is.

Well, tough?

I like to use tabs to indent, and spaces to align. (a la
http://www.emacswiki.org/emacs/IntelligentTabs )

Don’t like it? Well, tough.

All kidding aside, who cares? Write a preprocessor.

Benjamin K.


#4

Well, I found one of those breaches of etiquette I was worried
about… apparently wrapping my lines at 80 characters was not a good
idea. Sigh.

On May 19, 2:59 pm, Joel VanderWerf removed_email_address@domain.invalid wrote:

John McCain, is that you? :wink:

C’mon, you know perfectly well that John McCain can’t use a
computer. :stuck_out_tongue:

word “end”, you have a problem with conciseness. I recognize that
syntactically-

A line consisting of just /\s+end/ is of very low complexity, however.

Takes up just as much vertical space on the screen as the most complex
line you’ll ever see. And even so… very low complexity or not, it’s
unnecessary, which means any degree of complexity above zero is bad.

This is not DRY. Or anything remotely resembling it. This is an
ugly blemish on a language that otherwise is very beautiful.

It’s a blemish all right, but not on the language.

If not the language, then where? In the library code? Maybe those four
places where “end” is repeated seven consecutive times are poorly
engineered and could be refactored, but how about the nearly thousand
times “end” is repeated three or more times? Is every one of those the
result of poor engineering on the part of the library programmers, or
were at least some of them forced on the programmers by the language?

One statistic that I didn’t print out from my script was that there
are an average of 135 lines of “end” per file. For a language that
prides itself on expressiveness and brevity, this is just plain silly.

–J


#5

Let me get right to the heart of the issue here. It really comes down
to
this:

On Tue, May 19, 2009 at 3:40 PM, J Haas removed_email_address@domain.invalid wrote:

I think code blocks are cool, and I love Ruby’s very flexible
expressiveness. I dig the way every statement is an expression

These are both incompatible with a Python-style indentation sensitive
syntax. You can have the Pythonic indent syntax or a purely expression
based grammar with multi-line blocks. You can’t have both.

In Python, all indent blocks are statements. This is why Python can’t
have
multi-line lambdas using Python’s indent rules: lambdas are only useful
as
expressions, but all indent blocks in Python are statements. The same
issue
carries over to blocks, as a good deal of the time you want a method
which
takes a block to return a value (e.g. map, inject, filter, sort, grep)

There’s quite an interesting interplay of design decisions to make
Python’s
indent-sensitive grammar work the way it does. Indent blocks in Python
have
no terminator token, whereas every expression in a Ruby-like grammar
must be
terminated with “;” or a newline. This works because Python’s
expressions
are a subset of its statements, so it can have different rules for
statements versus expressions.

Implicit line joining works in Python because the only syntactic
constructions which can exist surrounded in […] (…) {…} tokens are
expressions, so you can’t put an indent block inside of these. If you
have
an indent-sensitive Ruby with implicit line joining, you limit the
expressiveness of what you can do inside any syntactic constructs
enclosed
by these tokens.

If you want to have indent blocks in a purely expression-based grammar,
you
need to use a syntax more like Haskell. I’ve seen a somewhat
Python-looking
language called Logix which uses Haskell’s indent rules. It was created
by
Tom L., who has since gone on to author Hobo in Ruby, and for what
it’s
worth now says he prefers Ruby’s syntax. Go figure.

P.S. I tried to make a Ruby-like language with an indentation-sensitive
syntax. These are the lessons I learned. I gave up and added an “end”
keyword.


#6

On May 19, 3:20 pm, Benjamin K. removed_email_address@domain.invalid wrote:

Well, tough?

I could probably have found a more tactful way of putting this. Sorry.

I like to use tabs to indent, and spaces to align. (a la
http://www.emacswiki.org/emacs/IntelligentTabs )

This wouldn’t be a problem, at least it’s not a problem in Python and
needn’t be a problem in Ruby. Having an unclosed paren, bracket, or
brace results in automatic line continuation and you can put whatever
combination of spaces and tabs you’d like on the next line. It’ll be
logically considered part of the line before.

Also, I should add that mixing tabs and spaces would only be a problem
if you did something like this: (leading dots represent spaces)

...while some_condition:
\t\tdo_something # interpreter can’t tell indentation level here

You could freely mix tabs and spaces as long as they match up from the
start:

...while some_condition:
...\tdo_something # interpreter can tell that this indentation level is “one more” than previous
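
One way to express that rule without picking a tab width is to compare the raw leading whitespace of consecutive lines as strings. This is only a sketch of the idea; `compare_indent` is a made-up name, not part of Ruby or of any existing preprocessor.

```ruby
# Compare the leading whitespace of two consecutive lines without
# assuming any tab width. Returns :same, :deeper, :shallower, or
# :ambiguous when neither prefix extends the other.
def compare_indent(previous_line, current_line)
  prev = previous_line[/\A[ \t]*/]
  cur  = current_line[/\A[ \t]*/]
  if cur == prev
    :same
  elsif cur.start_with?(prev)
    :deeper
  elsif prev.start_with?(cur)
    :shallower
  else
    :ambiguous
  end
end

# Three spaces followed by two tabs on the next line: no way to rank them.
p compare_indent("   while some_condition:", "\t\tdo_something")    # => :ambiguous
# Three spaces, then the same three spaces plus a tab: clearly one level in.
p compare_indent("   while some_condition:", "   \tdo_something")  # => :deeper
```

Only the :ambiguous case would need to be an error; everything else is well defined regardless of how wide a tab is displayed.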

All kidding aside, who cares? Write a preprocessor.

Don’t need to; it’s already been done. But I’d rather see the language
improved.

–J


#7

On May 19, 3:23 pm, Tony A. removed_email_address@domain.invalid wrote:

On Tue, May 19, 2009 at 3:40 PM, J Haas removed_email_address@domain.invalid wrote:

I think code blocks are cool, and I love Ruby’s very flexible
expressiveness. I dig the way every statement is an expression

These are both incompatible with a Python-style indentation sensitive
syntax. You can have the Pythonic indent syntax or a purely expression
based grammar with multi-line blocks. You can’t have both.

I’m having a hard time following why. Can you provide an example of a
Ruby snippet that couldn’t be done with scoping defined by
indentation?

In Python, all indent blocks are statements. This is why Python can’t have
multi-line lambdas using Python’s indent rules: lambdas are only useful as
expressions, but all indent blocks in Python are statements.

This seems like a problem with Python, not a problem with indentation.

The same issue
carries over to blocks, as a good deal of the time you want a method which
takes a block to return a value (e.g. map, inject, filter, sort, grep)

Again, I would really like to see an example of the sort of thing
you’d want to do here that simply requires “end” to work.

There’s quite an interesting interplay of design decisions to make Python’s
indent-sensitive grammar work the way it does. Indent blocks in Python have
no terminator token, whereas every expression in a Ruby-like grammar must be
terminated with “;” or a newline.

Well, every expression in a Ruby-like grammar must be terminated by a
token. What that token must be depends on the grammar. Why not
something like this? (and please forgive the highly unorthodox
pseudocode syntax)

parse_line_indent:
  if indentation = previous_line_indentation: do_nothing
  if indentation > previous_line_indentation:
    push_indentation_to_indent_stack_and_enter_new_scope
  if indentation < previous_line_indentation:
    while indentation < top_of_indent_stack:
      insert_backtab_token # here’s your statement terminator
      pop_top_of_indent_stack
    if indentation != top_of_indent_stack: raise IndentationError

In other words, the parser treats an indentation level less than the
indentation level of the previous line as a statement-terminating
token.
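
The pseudocode above can be turned into a small runnable Ruby sketch. The names `indent_tokens` and `IndentationError` are made up for illustration (Ruby has no built-in IndentationError), and the sketch assumes spaces-only indentation and skips blank lines.

```ruby
# Hypothetical error class for a dedent that matches no enclosing level.
class IndentationError < StandardError; end

# Turn indentation changes into explicit :indent / :dedent tokens.
# Assumes spaces-only indentation; blank lines are skipped.
def indent_tokens(source)
  stack = [0]   # indentation widths of the enclosing scopes
  tokens = []
  source.each_line do |raw|
    text = raw.strip
    next if text.empty?
    indent = raw[/\A */].size
    if indent > stack.last
      stack.push(indent)
      tokens << [:indent]
    else
      while indent < stack.last
        stack.pop
        tokens << [:dedent]   # the "backtab" that terminates a scope
      end
      if indent != stack.last
        raise IndentationError, "no enclosing level at column #{indent}"
      end
    end
    tokens << [:line, text]
  end
  # Close any scopes still open at end of input.
  tokens.concat([[:dedent]] * (stack.size - 1))
end

source = <<~SRC
  while some_condition:
    do_something
    if other:
      more
  done
SRC

indent_tokens(source).each { |token| p token }
```

A parser sitting on top of this token stream would see each :dedent where the existing grammar sees `end`.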

Implicit line joining works in Python because the only syntactic
constructions which can exist surrounded in […] (…) {…} tokens are
expressions, so you can’t put an indent block inside of these. If you have
an indent-sensitive Ruby with implicit line joining, you limit the
expressiveness of what you can do inside any syntactic constructs enclosed
by these tokens.

This sorta makes sense but I’d really like to see a concrete example
of what you’re talking about. It doesn’t seem like this would be an
insurmountable difficulty but it’s hard to say without the example.

If you want to have indent blocks in a purely expression-based grammar, you
need to use a syntax more like Haskell.

Being completely unfamiliar with Haskell (functional programming’s
never been my strong suit) I can’t really comment.

P.S. I tried to make a Ruby-like language with an indentation-sensitive
syntax. These are the lessons I learned. I gave up and added an “end”
keyword.

I’ll be glad to take the benefit of your practical experience, but at
the risk of seriously violating DRY, some sort of demonstration of
something that you can do with “end” but couldn’t do with indentation
would be nice.

–J


#8

On Tue, May 19, 2009 at 4:50 PM, J Haas removed_email_address@domain.invalid wrote:

I’m having a hard time following why. Can you provide an example of a
Ruby snippet that couldn’t be done with scoping defined by
indentation?

A multi-line block returning a value, e.g.

foo = somemethod do |arg1, arg2, arg3|
  x = do_something arg1
  y = do_something_else x, arg2
  and_something_else_again y, arg3
end

Or for that matter, a multi-line lambda:

foo = lambda do |arg1, arg2, arg3|
  x = do_something arg1
  y = do_something_else x, arg2
  and_something_else_again y, arg3
end

I’m sure you’re aware the “multi-line lambda” problem is somewhat
infamous
in the Python world. Guido van Rossum himself has ruled it an
“unsolvable
problem” because of the statement-based nature of Python indent blocks.
Lambdas must be expressions or they are worthless, and there is no way
to
embed an indent block inside of a Python expression.

And a bit of supplemental information: I conducted a poll of what
Rubyists’
favorite features are in the language. Blocks were #1 by a wide margin.

This seems like a problem with Python, not a problem with indentation.

As I said, a Haskell-like syntax would facilitate including indent
blocks in
a purely expression-based grammar. It’s Python’s statement-structured
syntax that’s incompatible. However the sort of syntax you would get
from a
Haskell-like approach is going to be different than Python’s.

You can have a look at Logix, which is a purely expression based
language
which tries to mimic Python’s syntax while using Haskell-styled indent
rules. This is about the best you can do:

http://web.archive.org/web/20060517203300/www.livelogix.net/logix/tutorial/3-Introduction-For-Python-Folks.html

Well, every expression in a Ruby-like grammar must be terminated by a
token. [… snip …]

In other words, the parser treats an indentation level less than the
indentation level of the previous line as a statement-terminating
token.

Because there are statements which contain multiple indent blocks, such
as
if or try/catch. If you wanted to carry over Rubyisms, this would
include
the case statement, e.g.

case foo
when bar

when baz

Therefore you can’t just treat a “dedent” as a statement terminator,
because
a single statement may itself contain multiple “dedent” tokens.

The best solution I could think of for this was a syntactically relevant
blank line, which sucks. It also requires lexer logic more complex than
Python to handle the case of a syntactically relevant newline, which in
turn
pollutes the grammar.

of what you’re talking about. It doesn’t seem like this would be an
insurmountable difficulty but it’s hard to say without the example.

This is valid Ruby:

on_some_event(:something, :filter => proc do
  something_here
  another_thing_here
  etc
end)

Implicit line joining removes any newline tokens inside of (…) […]
{…}
type syntactic constructions. So it becomes impossible to embed
anything
with an indent block inside of expressions enclosed in any of these
tokens.

And now we’ve hit an entirely new can of worms: how do you make implicit
line joining work when parens are optional?
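
To make the line-joining point concrete, here is a deliberately naive sketch of Python-style implicit line joining written in Ruby. `logical_lines` is a made-up name; the sketch does not recognise brackets inside strings or comments and makes no attempt at the optional-parens question.

```ruby
# Join physical lines into logical lines while any (, [ or { is unclosed.
# A deliberately naive sketch: brackets inside strings or comments are
# not recognised, and optional parens are not handled at all.
def logical_lines(source)
  lines = []
  buffer = +""
  depth = 0
  source.each_line do |raw|
    line = raw.chomp
    depth += line.count("([{") - line.count(")]}")
    buffer << (buffer.empty? ? line : " " + line.strip)
    if depth <= 0
      lines << buffer
      buffer = +""
      depth = 0
    end
  end
  lines << buffer unless buffer.empty?
  lines
end

source = <<~SRC
  on_some_event(:something, :filter => proc do
    something_here
    another_thing_here
  end)
  next_statement
SRC

logical_lines(source).each { |line| puts line }
```

Once the call is collapsed into one logical line, the newlines and indentation that would have delimited the block body are gone, which is why an explicit `end` is still doing real work in that example.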


#9

On Tue, May 19, 2009 at 19:49, Eric H. removed_email_address@domain.invalid wrote:

Does anybody complain about terminating ‘}’ in C, C++ or Java?

Python programmers?

:slight_smile:

Does anybody
complain about terminating ‘.’ on sentences? (There’s a following capital
letter for disambiguation!)

I agree with your point, but I don’t think this argument helps - computer
languages and human languages are two fairly distinct classes, with
different origins, requirements, and, um…parsers.

In my opinion they aren’t always comparable.

Ben Kudria


#10

On May 19, 2009, at 15:25, J Haas wrote:

This is not DRY. Or anything remotely resembling it. This is an
ugly blemish on a language that otherwise is very beautiful.

It’s a blemish all right, but not on the language.

If not the language, then where? In the library code? Maybe those four
places where “end” is repeated seven consecutive times are poorly
engineered and could be refactored,

They almost certainly could be, this is a sign of strong code-smel

but how about the nearly thousand
times “end” is repeated three or more times? Is every one of those the
result of poor engineering on the part of the library programmers, or
were at least some of them forced on the programmers by the language?

One statistic that I didn’t print out from my script was that there
are an average of 135 lines of “end” per file. For a language that
prides itself on expressiveness and brevity, this is just plain silly.

Does anybody complain about terminating ‘}’ in C, C++ or Java? Does
anybody complain about terminating ‘.’ on sentences? (There’s a
folowing capital letter for disambiguation!) I think we need to
remove all useles constructs from all languages


#11

On May 19, 2009, at 16:58, Benjamin K. wrote:

capital
letter for disambiguation!)

I agree with your point, but I don’t think this argument helps - computer
languages and human languages are two fairly distinct classes, with
different origins, requirements, and, um…parsers.

In my opinion they aren’t always comparable.

I may have ben too sutle Maybe your email program spel-chex I
certainly didnt have useles double leters in my original


#12

On Tue, May 19, 2009 at 5:40 PM, J Haas removed_email_address@domain.invalid wrote:

Well, I think that’s probably enough for my first post. Thank you for
your
time, and Matz, thanks for the language. Thoughts, anyone?

Implement it, post it on github, then post back here and see if people
like it. These conversations in which people pretend to try to
convince one another just to assert their views are much less
productive than just solving whatever the original problem is.

-greg


#13

On Tue, May 19, 2009 at 20:11, Eric H. removed_email_address@domain.invalid wrote:

I may have ben too sutle  Maybe your email program spel-chex  I certainly
didnt have useles double leters in my original

Doh :slight_smile:

I have a compulsive habit of correcting (perceived!) typos in quotes.
I should probably stop.

Ben


#14

On Wed, May 20, 2009 at 08:58:44AM +0900, Benjamin K. wrote:

I agree with your point, but I don’t think this argument helps - computer
languages and human languages are two fairly distinct classes, with
different origins, requirements, and, um…parsers.

In my opinion they aren’t always comparable.

Did you include Comparable and implement <=> ?


#15

J Haas wrote:

I’m a bit of a Ruby N.

I have a proposal

Funny how suggestions for radical changes mainly come from people who,
by their own admission, have not used Ruby seriously in its current
form. But I certainly don’t hold this against anyone, because I was the
same myself at first.

Space-delimited syntax has its place: it works well for HAML, which I
love. But I’d hate it for Ruby. I want to be able to disable blocks of
code by wrapping them with

if false

end

and generally throw code around without having to re-indent it (even
though I do normally stick strongly to standard indentation). In
practice, it’s much less frequent that I comment out a block of HAML,
say.

There’s one case where the current behaviour does trip me up, and
that’s in DSLs. For example:

context "a test" do <<<
  setup do <<<
    @foo = Foo.new
  end
  should "be empty" do <<<
    assert @foo.empty?
  end
end

Miss one of the magic 'do’s and you get an error later (perhaps much,
much later, say at the end of the file). These can be hard to find;
sometimes I resort to a binary chop. However if I happen to have ruby1.9
lying around I can run it through that, and it gives me warnings about
where indentation is not as expected.

But even then, this does not bug me as much as having Python syntax
would.

Of course, syntax in itself adds nothing to the functionality of the
language, but people have extremely strong preferences. LISP programmers
are strongly wedded to its syntax; Python programmers are strongly
wedded to its syntax too. So if you like Python syntax (and that’s more
important to you than other language features), then program in Python.


#16

On May 19, 4:49 pm, Eric H. removed_email_address@domain.invalid wrote:

Does anybody complain about terminating ‘}’ in C, C++ or Java?

I do, now. Redundancy irritates me.

Does anybody complain about terminating ‘.’ on sentences?

If the period accounted for one-sixth of English text, perhaps they
would.


#17

On May 19, 2009, at 5:40 PM, J Haas wrote:

My friends, when ONE OUT OF EVERY SIX of your code lines consists of
just the word “end”, you have a problem with conciseness. I

Yes, it’s a problem. There is no point in pretending otherwise. The
language
would be even better if this issue were solved. It sounds like it
could be
adequately solved with some type of meaningful indentation.

It doesn’t seem like it has to be one way or another. You could have
meaningful indentation and still use end for ambiguous cases. That
would be
nice. This would also be consistent with Ruby’s optional parens for
function
use and declaration. Backward compatibility would also make it
possible to
make the transition.


#18

On May 20, 2009, at 10:10 AM, J Haas wrote:

Ugh, pass. I’ve wasted far too much of my life coding what I thought
were useful features for open-source projects only to find that the
committers didn’t share my opinion. I ain’t touching something like
this unless there’s at least some reasonable chance the patch might
actually get accepted.

One of the nice advantages of an open source project like ruby is that
you can fork it, and take it in directions not held by the original
developers. Become your own committer. That should be the least of
your worries. If you really do have a better mouse-trap you will have
no problems about finding people who will join you rather than the
other way around.

Cheers–

Charles

Charles J.
Advanced Computing Center for Research and Education
Vanderbilt University


#19

On May 19, 5:19 pm, Gregory B. removed_email_address@domain.invalid wrote:

Implement it, post it on github, then post back here and see if people
like it. These conversations in which people pretend to try to
convince one another just to assert their views are much less
productive than just solving whatever the original problem is.

Ugh, pass. I’ve wasted far too much of my life coding what I thought
were useful features for open-source projects only to find that the
committers didn’t share my opinion. I ain’t touching something like
this unless there’s at least some reasonable chance the patch might
actually get accepted.

If you want to try this out, as I said earlier in this thread,
preprocessors exist. And seem to work, which kind of belies the claim
that this change would be impossible.


#20

On Wed, May 20, 2009 at 11:05 AM, J Haas removed_email_address@domain.invalid wrote:

On May 19, 4:49 pm, Eric H. removed_email_address@domain.invalid wrote:

Does anybody complain about terminating ‘}’ in C, C++ or Java?

I do, now. Redundancy irritates me.

Does anybody complain about terminating ‘.’ on sentences?

If the period accounted for one-sixth of English text, perhaps they
would.

wellwhydontwegetridofallpunctuationthingslikespacescommassemicolonsquotesetcperiodcertainlytakeupmuchmoreofourprosethantheydeserveandcapitalizationeatsupvaluableverticalspaceandnoneedforparagraphseparationeither

Seriously, if you measure things by avoiding extra keystrokes, get a
better editor. I value readability over parsimony of lexical items.


Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale