Beautiful Code : Pity he didn't ask here

amin1426 · July 10, 2007, 2:16am

Just got the O’Reilly announcement of the book “Beautiful
Code”… here is the sample chapter by Tim B. on Ruby!

Pity he didn’t run his examples by this forum first, I’m sure we could
have made them more beautiful…

Well, maybe for the 2nd edition, heres my bash at cleaning it up…

For example scanning a log file…

EXAMPLE 4-2 . Printing article names
1 ARGF.each_line do |line|
2 if line =~ %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+)
}
3 puts $1
4 end
5 end

Since most Linux distro’s roll the log files to a reasonable size I
would have just gone with…

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+)
}) do |match|
puts $1
end

In Example 4.3 he has…
1 counts = {}
2 counts.default = 0

Personally I prefer…
counts = Hash.new(0)
or
counts = Hash.new{|hash,key| hash[key]=0}
buts that’s personal preference I guess.

In example 4.4 he rightly identifies line 10 as a little ugly and
bemoans the lack of sort_by_value in hash.
10 keys_by_count = counts.keys.sort { |a, b| counts[b] <=> counts[a]
}
it seems he doesn’t know about…

count.keys.sort_by{|key| count[key]}

Of course he could have done…

class Hash
def sort_keys_by_value
keys.sort_by{|key| fetch(key)}
end
end

Example 4.5 looks like a poster child for the Hash.new block approach…
4 @hash = {}
.
.

9 if @hash[s[0]]
10 @hash[s[0]] << [ s[1], article ]
11 else
12 @hash[s[0]] = [ s[1], article ]
13 end

Replace that with…

4 @hash = Hash.new{|hash,key| hash[key] = []}

10 @hash[s[0]] << [ s[1], article ]

That makes it real pretty code.

In fact, I can’t find any pretty reason to have the class BigHash at
all…looks like his Java roots are showing through.

Personally I’d just drop it.

Partly the point of the exercise here is it is slow because it is so
big, a couple of things I wonder about…

For example I would time how long it takes to just read the file and do
nothing…

He mentions the program grew to 1.56gb

Well, he is creating an Array per log entry.

Maybe if he created two hashes instead of an array per entry.

Incidently, I wouldn’t have chosen that chunk of Java for illustrating
the binary search…

glibc’s implementation is just so clean…

/* Perform a binary search for KEY in BASE which has NMEMB elements
of SIZE bytes each. The comparisons are done by (*COMPAR)(). */
void *
bsearch (const void *key, const void *base, size_t nmemb, size_t size,
int (*compar) (const void *, const void *))
{
size_t l, u, idx;
const void *p;
int comparison;

l = 0;
u = nmemb;
while (l < u)
{
idx = (l + u) / 2;
p = (void *) (((const char *) base) + (idx * size));
comparison = (*compar) (key, p);
if (comparison < 0)
u = idx;
else if (comparison > 0)
l = idx + 1;
else
return (void *) p;
}

return NULL;
}

On the other hand both have the classic bug that (high + low) or (l+u)
overflows on large arrays…

John C. Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : [email protected]
New Zealand

amin1426 · July 10, 2007, 2:54am

On Jul 9, 2007, at 7:15 PM, John C. wrote:

5 end

Since most Linux distro’s roll the log files to a reasonable size I
would have just gone with…

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/
[^ .]+) }) do |match|
puts $1
end

Hmm, I don’t think we should slurp when we don’t need too. There’s
no reason to be wasteful.

I really liked the rest of your fixes though.

James Edward G. II

amin1426 · July 10, 2007, 8:56am

On Jul 9, 2007, at 17:53 , James Edward G. II wrote:

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/
[^ .]+) }) do |match|
puts $1
end

Hmm, I don’t think we should slurp when we don’t need too. There’s
no reason to be wasteful.

Actually there is plenty of reason to be wasteful:

We can.
It leads to code that reads like the above.
We can!
It is only a short hop to:

puts ARGF.read.scan( %r%GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/
\d\d/[^ .]+) %).join("\n")
We can!!
It is our moral obligation to use our oft-idle and underused
computers to their fullest.

There was a 7th… but I forgot what it was.

amin1426 · July 10, 2007, 9:58am

On 7/10/07, John C. [email protected] wrote:

Since most Linux distro’s roll the log files to a reasonable size I would have just gone with…

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }) do |match|
puts $1
end

ARGF.read.scan( %r{GET /ongoing/When/\d{3}x/(\d#{4}/\d#{4}/[^ .]+) }) do
|match|
puts $1
end

martin

amin1426 · July 10, 2007, 11:49am

On 7/10/07, Martin DeMello [email protected] wrote:

On 7/10/07, John C. [email protected] wrote:

Since most Linux distro’s roll the log files to a reasonable size I would have just gone with…

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }) do |match|
puts $1
end

I am with James do not slurp in the whole string it just does not make
sense to me. I would feel very uneasy using regexps on large strings
even if right now there seems no backtracking issues to be present,
but code evolves sometimes

ARGF.read.scan( %r{GET /ongoing/When/\d{3}x/(\d#{4}/\d#{4}/[^ .]+) }) do |match|

Nope Martin read the original code again!! All the / and \ ;).
I was about to fall into the same trap.
As a matter of fact I miss all beauty in that code The regexp is
just ugly, some ideas for cosmetic surgery

def dig_dirs *args
Regexp.new args.map{|n| “\d” * n}.join(“/”)
end

puts ARGF.map { |line|
line =~ %r{ GET /ongoing/When/\d{3}x/(#{dig_dirs 4, 2, 2}/[^ .])}
$1
}.compact

But beauty lies to everybody (or was that “in the eyes of the beholder”?
).

Cheers
Robert

amin1426 · July 10, 2007, 2:14pm

John C. [email protected] writes:

Just got the O’Reilly announcement of the book “Beautiful
Code”… here is the sample chapter by Tim B. on Ruby!

…

For example scanning a log file…

Since most Linux distro’s roll the log files to a reasonable size I would have just gone with…

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }) do |match|
puts $1
end

As James said, I’m going to have to disagree on this, though I’d clean
it up to:

article_pattern =
%r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }
ARGF.each_line do |line|
line.scan(article_pattern) do |article,|
puts article
end
end

Look, no global variables!

I agree with your use of Hash.new(0), but disagree that it’s mere
personal preference - it’s much more common to use even the single
argument form in the constructor than to use default=.

It is odd that he seems unaware of Array.sort_by - anyone know when
this was added to Ruby core? The book could be rather out of date.

Example 4.5 looks like a poster child for the Hash.new block approach…

Agreed. I wonder if perhaps he’s spent more time in python or some
other dynamic language than Ruby.

In fact, I can’t find any pretty reason to have the class BigHash at
all…looks like his Java roots are showing through.

Yeah; in Java, you can’t have code existing outside of a class, and I
suspect that’s a big part of it.

On the other hand both have the classic bug that (high + low) or
(l+u) overflows on large arrays…

Actually, his code doesn’t: java arrays can only have 2**31 entries
(since their index is a java int), and java ints are always signed.
Therefore, any overflow in the addition in java yields a negative
integer, and >>> in java does right shift without sign extension.
(That is, with 0s shifted in at the left).

The glibc code does have that problem, but only on arrays that have
more than (maximum size_t)/2 elements.

amin1426 · July 10, 2007, 12:11pm

On 7/10/07, John C. [email protected] wrote:

Just got the O’Reilly announcement of the book "Beautiful

In Example 4.3 he has…
1 counts = {}
2 counts.default = 0

count.keys.sort_by{|key| count[key]}

Of course he could have done…

class Hash
def sort_keys_by_value
keys.sort_by{|key| fetch(key)}
end
end
or

tops = counts.sort_by{|*kv| kv.last}.reverse[0…9]
puts tops.map{ |k,v| “#{v}: #{k}”}

#each is just a primitive for Enumerables, right?

12 @hash[s[0]] = [ s[1], article ]
That makes it real pretty code.
Save for

File.open(some_name).each_line

Let the GC close the file!? It is not going to happen !!

Robert

amin1426 · July 10, 2007, 2:34pm

Hello.

On 7/10/07, John C. [email protected] wrote:

count.keys.sort_by{|key| count[key]}

IMHO, the most elegant way to do this is:

hash.sort_by { |k, v| v }.map { |k, v| k }

…where k = key and v = value.

Cheers,

amin1426 · July 10, 2007, 2:42pm

On Jul 10, 2007, at 08:27 , Christoffer S. wrote:

Cheers,

Ack, pet peeve.

hash.sort_by { | key, value | value } .map { | key, value | key }

If you have to specify “where k = key and v = value” then these
should have been used in your code.

Always favor readability at the expense of verbosity, both your
future self and whoever else maintains your code will thank you.

amin1426 · July 10, 2007, 3:15pm

On 7/10/07, Wayne E. Seguin [email protected] wrote:

…where k = key and v = value.
Always favor readability at the expense of verbosity, both your
future self and whoever else maintains your code will thank you.

I find myself always using |k,v|, when used on something hashlike I
don’t think readability suffers.

amin1426 · July 10, 2007, 3:24pm

On Jul 10, 2007, at 8:13 AM, Gregory B. wrote:

Always favor readability at the expense of verbosity, both your
future self and whoever else maintains your code will thank you.

I find myself always using |k,v|, when used on something hashlike I
don’t think readability suffers.

I’ve also recently adopted the trick of using _ as an unused
parameter name. I believe it was Ara that first suggested this and I
think it’s a great idea:

hash.sort_by { |key, _| … }…

James Edward G. II

amin1426 · July 10, 2007, 4:07pm

“Robert D.” [email protected] writes:

On 7/10/07, John C. [email protected] wrote:

Just got the O’Reilly announcement of the book "Beautiful

That makes it real pretty code.

Save for

File.open(some_name).each_line

Let the GC close the file!? It is not going to happen !!

Good point. In many ways, it feels that the author is not as
comfortable with Ruby blocks as one should be. A better version,
since we also don’t need a separate class:

def get_big_hash(filename)
hash = Hash.new {|h,k| h[k]=[]}
lines = 0
File.open(filename) do |file|
file.each_line do |line|
s = line.split
article = s[2].intern
hash[s[0]] << [ s[1], article ]
lines += 1
STDERR.puts “Line: #{lines}” if (lines % 100_000) == 0
end
end
return hash
end

I really like the block-using form of File.open. It makes me feel all
nice and safe like being wrapped in with-open-file or a carefully
constructed dynamic-wind.

amin1426 · July 10, 2007, 2:15pm

Hi –

On Tue, 10 Jul 2007, John C. wrote:

In Example 4.3 he has…
1 counts = {}
2 counts.default = 0

Personally I prefer…
counts = Hash.new(0)
or
counts = Hash.new{|hash,key| hash[key]=0}
buts that’s personal preference I guess.

Yes, but be careful: your two lines don’t do the same thing as each
other:

irb(main):007:0> counts = Hash.new(0)
=> {}
irb(main):008:0> counts[“hi”]
=> 0
irb(main):009:0> counts
=> {}
irb(main):010:0> counts = Hash.new{|hash,key| hash[key]=0}
=> {}
irb(main):011:0> counts[“hi”]
=> 0
irb(main):012:0> counts
=> {“hi”=>0}

def sort_keys_by_value
keys.sort_by{|key| fetch(key)}
end
end

It’s best not to get into changing the core language, though, if the
book isn’t going to discuss the benefits and pitfalls of doing so in
depth.

David

amin1426 · July 10, 2007, 4:26pm

On 10/07/07, James Edward G. II [email protected] wrote:

I’ve also recently adopted the trick of using _ as an unused
parameter name. I believe it was Ara that first suggested this and I
think it’s a great idea:

hash.sort_by { |key, _| â€¦ }â€¦

I thought it came from Haskell, where _ is a wildcard parameter:

Prelude> let power _ 0 = 1
Prelude> power 4 0
1
Prelude> power 100 0
1
Prelude> power 4 2
*** Exception: :1:4-16: Non-exhaustive patterns in function
power
Prelude> let power a b = a ** b
Prelude> power 4 2
16.0

… however, I wouldn’t be surprised to find that Haskell took it from
some other language.

Paul.

amin1426 · July 10, 2007, 4:28pm

On Jul 10, 2007, at 9:24 AM, Paul B. wrote:

On 10/07/07, James Edward G. II [email protected] wrote:

I’ve also recently adopted the trick of using _ as an unused
parameter name. I believe it was Ara that first suggested this and I
think it’s a great idea:

hash.sort_by { |key, _| … }…

I thought it came from Haskell, where _ is a wildcard parameter:

I meant that I believe it was Ara who recommended it as a good name
for an unused block parameter in Ruby.

James Edward G. II

amin1426 · July 10, 2007, 4:48pm

On Jul 9, 2007, at 5:15 PM, John C. wrote:

Just got the O’Reilly announcement of the book “Beautiful
Code”… here is the sample chapter by Tim B. on Ruby!

Beautiful Code [Book]

Pity he didn’t run his examples by this forum first, I’m sure we could
have made them more beautiful…

Yep, mostly because for some reason I didn’t know about sort_by. I
wonder how I missed that? Obviously, the main point of my article
was to encourage people to use the idiom of reading lines, picking
'em apart with a regex, and accumulating in a hash, invented in awk,
brought to the world in perl, perfected in Ruby.

sort_by aside, I’m not convinced that any of the alternatives to my
main loop are more beautiful or readable. I personally think that
the reason Ruby will enjoy an ever increasing market share in the
programming-languages space is because it’s more readable.

I’m looking at the following…

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/
[^ .]+) }) do |match|
puts $1
end

Really? Considering most of the readership is new to Ruby?

In Example 4.3 he has…
1 counts = {}
2 counts.default = 0

Personally I prefer…
counts = Hash.new(0)
or
counts = Hash.new{|hash,key| hash[key]=0}
buts that’s personal preference I guess.

Gosh, I think I totally disagree on readability grounds.

-Tim

amin1426 · July 10, 2007, 4:50pm

On Jul 10, 2007, at 5:11 AM, Daniel M. wrote:

Look, no global variables!

Yep, much better, I wish I’d done it that way.

I agree with your use of Hash.new(0), but disagree that it’s mere
personal preference - it’s much more common to use even the single
argument form in the constructor than to use default=.

Why? default= makes it more obvious what you’re doing. I think it’s
better practice.

Example 4.5 looks like a poster child for the Hash.new block
approach…

Agreed. I wonder if perhaps he’s spent more time in python or some
other dynamic language than Ruby.

Perl

-Tim

amin1426 · July 10, 2007, 4:55pm

On Jul 10, 2007, at 09:23 , James Edward G. II wrote:

I’ve also recently adopted the trick of using _ as an unused
parameter name. I believe it was Ara that first suggested this and
I think it’s a great idea:

hash.sort_by { |key, _| â€¦ }â€¦

James Edward G. II

That is a good idea, I also use that in several places within blocks.

amin1426 · July 10, 2007, 7:15pm

On Tue, Jul 10, 2007 at 11:47:26PM +0900, Tim B. wrote:

buts that’s personal preference I guess.

Gosh, I think I totally disagree on readability grounds.

Actually, I think this:

counts = Hash.new(0)

. . . is perfectly readable, and a bit more elegant than the two-line
version from “Example 4.3”. On the other hand, this bit looks pretty
ugly to me:

counts = Hash.new{|hash,key| hash[key]=0]}

. . . especially since it looks like someone is afraid he has a limited
number of space characters to use, and doesn’t want to run out. Even
with some spaces added to it, though, it’s not as pretty and readable.

amin1426 · July 10, 2007, 9:51pm

On Jul 10, 2007, at 05:41 , Wayne E. Seguin wrote:

Always favor readability at the expense of verbosity, both your
future self and whoever else maintains your code will thank you.

I won’t. k and v work Just Fine™ for me and whomever else I work
with. Verbosity for verbosity’s sake, esp when it pushes outside of
common and accepted idioms, is a form of mentarbation.