Beautiful Code : Pity he didn't ask here


#1

Just got the O’Reilly announcement of the book “Beautiful
Code”… here is the sample chapter by Tim B. on Ruby!

http://www.oreilly.com/catalog/9780596510046/chapter/ch04.pdf

Pity he didn’t run his examples by this forum first, I’m sure we could
have made them more beautiful…

Well, maybe for the 2nd edition, heres my bash at cleaning it up…

For example scanning a log file…

EXAMPLE 4-2 . Printing article names
1 ARGF.each_line do |line|
2 if line =~ %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+)
}
3 puts $1
4 end
5 end

Since most Linux distro’s roll the log files to a reasonable size I
would have just gone with…

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+)
}) do |match|
puts $1
end

In Example 4.3 he has…
1 counts = {}
2 counts.default = 0

Personally I prefer…
counts = Hash.new(0)
or
counts = Hash.new{|hash,key| hash[key]=0}
buts that’s personal preference I guess.

In example 4.4 he rightly identifies line 10 as a little ugly and
bemoans the lack of sort_by_value in hash.
10 keys_by_count = counts.keys.sort { |a, b| counts[b] <=> counts[a]
}
it seems he doesn’t know about…

count.keys.sort_by{|key| count[key]}

Of course he could have done…

class Hash
def sort_keys_by_value
keys.sort_by{|key| fetch(key)}
end
end

Example 4.5 looks like a poster child for the Hash.new block approach…
4 @hash = {}
.
.

9 if @hash[s[0]]
10 @hash[s[0]] << [ s[1], article ]
11 else
12 @hash[s[0]] = [ s[1], article ]
13 end

Replace that with…

4 @hash = Hash.new{|hash,key| hash[key] = []}

10 @hash[s[0]] << [ s[1], article ]

That makes it real pretty code.

In fact, I can’t find any pretty reason to have the class BigHash at
all…looks like his Java roots are showing through.

Personally I’d just drop it.

Partly the point of the exercise here is it is slow because it is so
big, a couple of things I wonder about…

For example I would time how long it takes to just read the file and do
nothing…

He mentions the program grew to 1.56gb

Well, he is creating an Array per log entry.

Maybe if he created two hashes instead of an array per entry.

Incidently, I wouldn’t have chosen that chunk of Java for illustrating
the binary search…

glibc’s implementation is just so clean…

/* Perform a binary search for KEY in BASE which has NMEMB elements
of SIZE bytes each. The comparisons are done by (*COMPAR)(). */
void *
bsearch (const void *key, const void *base, size_t nmemb, size_t size,
int (*compar) (const void *, const void *))
{
size_t l, u, idx;
const void *p;
int comparison;

l = 0;
u = nmemb;
while (l < u)
{
idx = (l + u) / 2;
p = (void *) (((const char *) base) + (idx * size));
comparison = (*compar) (key, p);
if (comparison < 0)
u = idx;
else if (comparison > 0)
l = idx + 1;
else
return (void *) p;
}

return NULL;
}

On the other hand both have the classic bug that (high + low) or (l+u)
overflows on large arrays…

John C. Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : removed_email_address@domain.invalid
New Zealand


#2

On Jul 9, 2007, at 7:15 PM, John C. wrote:

5 end

Since most Linux distro’s roll the log files to a reasonable size I
would have just gone with…

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/
[^ .]+) }) do |match|
puts $1
end

Hmm, I don’t think we should slurp when we don’t need too. There’s
no reason to be wasteful.

I really liked the rest of your fixes though.

James Edward G. II


#3

On Jul 9, 2007, at 17:53 , James Edward G. II wrote:

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/
[^ .]+) }) do |match|
puts $1
end

Hmm, I don’t think we should slurp when we don’t need too. There’s
no reason to be wasteful.

Actually there is plenty of reason to be wasteful:

  1. We can.

  2. It leads to code that reads like the above.

  3. We can!

  4. It is only a short hop to:

    puts ARGF.read.scan( %r%GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/
    \d\d/[^ .]+) %).join("\n")

  5. We can!!

  6. It is our moral obligation to use our oft-idle and underused
    computers to their fullest.

There was a 7th… but I forgot what it was.


#4

On 7/10/07, John C. removed_email_address@domain.invalid wrote:

Since most Linux distro’s roll the log files to a reasonable size I would have just gone with…

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }) do |match|
puts $1
end

ARGF.read.scan( %r{GET /ongoing/When/\d{3}x/(\d#{4}/\d#{4}/[^ .]+) }) do
|match|
puts $1
end

martin


#5

On 7/10/07, Martin DeMello removed_email_address@domain.invalid wrote:

On 7/10/07, John C. removed_email_address@domain.invalid wrote:

Since most Linux distro’s roll the log files to a reasonable size I would have just gone with…

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }) do |match|
puts $1
end

I am with James do not slurp in the whole string it just does not make
sense to me. I would feel very uneasy using regexps on large strings
even if right now there seems no backtracking issues to be present,
but code evolves sometimes :wink:

ARGF.read.scan( %r{GET /ongoing/When/\d{3}x/(\d#{4}/\d#{4}/[^ .]+) }) do |match|

Nope Martin read the original code again!! All the / and \ ;).
I was about to fall into the same trap.
As a matter of fact I miss all beauty in that code :frowning: The regexp is
just ugly, some ideas for cosmetic surgery :wink:

def dig_dirs *args
Regexp.new args.map{|n| “\d” * n}.join("/")
end

puts ARGF.map { |line|
line =~ %r{ GET /ongoing/When/\d{3}x/(#{dig_dirs 4, 2, 2}/[^ .])}
$1
}.compact

But beauty lies to everybody (or was that “in the eyes of the beholder”?
).

Cheers
Robert


#6

John C. removed_email_address@domain.invalid writes:

Just got the O’Reilly announcement of the book “Beautiful
Code”… here is the sample chapter by Tim B. on Ruby!

For example scanning a log file…

Since most Linux distro’s roll the log files to a reasonable size I would have just gone with…

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }) do |match|
puts $1
end

As James said, I’m going to have to disagree on this, though I’d clean
it up to:

article_pattern =
%r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) }
ARGF.each_line do |line|
line.scan(article_pattern) do |article,|
puts article
end
end

Look, no global variables!

I agree with your use of Hash.new(0), but disagree that it’s mere
personal preference - it’s much more common to use even the single
argument form in the constructor than to use default=.

It is odd that he seems unaware of Array.sort_by - anyone know when
this was added to Ruby core? The book could be rather out of date.

Example 4.5 looks like a poster child for the Hash.new block approach…

Agreed. I wonder if perhaps he’s spent more time in python or some
other dynamic language than Ruby.

In fact, I can’t find any pretty reason to have the class BigHash at
all…looks like his Java roots are showing through.

Yeah; in Java, you can’t have code existing outside of a class, and I
suspect that’s a big part of it.

On the other hand both have the classic bug that (high + low) or
(l+u) overflows on large arrays…

Actually, his code doesn’t: java arrays can only have 2**31 entries
(since their index is a java int), and java ints are always signed.
Therefore, any overflow in the addition in java yields a negative
integer, and >>> in java does right shift without sign extension.
(That is, with 0s shifted in at the left).

The glibc code does have that problem, but only on arrays that have
more than (maximum size_t)/2 elements.


#7

On 7/10/07, John C. removed_email_address@domain.invalid wrote:

Just got the O’Reilly announcement of the book "Beautiful

In Example 4.3 he has…
1 counts = {}
2 counts.default = 0

count.keys.sort_by{|key| count[key]}

Of course he could have done…

class Hash
def sort_keys_by_value
keys.sort_by{|key| fetch(key)}
end
end
or

tops = counts.sort_by{|*kv| kv.last}.reverse[0…9]
puts tops.map{ |k,v| “#{v}: #{k}”}

#each is just a primitive for Enumerables, right?

12 @hash[s[0]] = [ s[1], article ]
That makes it real pretty code.
Save for

File.open(some_name).each_line

Let the GC close the file!? It is not going to happen !!

Robert

#8

Hello.

On 7/10/07, John C. removed_email_address@domain.invalid wrote:

count.keys.sort_by{|key| count[key]}

IMHO, the most elegant way to do this is:

hash.sort_by { |k, v| v }.map { |k, v| k }

…where k = key and v = value.

Cheers,


#9

On Jul 10, 2007, at 08:27 , Christoffer S. wrote:

Cheers,

Ack, pet peeve.

hash.sort_by { | key, value | value } .map { | key, value | key }

If you have to specify “where k = key and v = value” then these
should have been used in your code.

Always favor readability at the expense of verbosity, both your
future self and whoever else maintains your code will thank you.


#10

On 7/10/07, Wayne E. Seguin removed_email_address@domain.invalid wrote:

…where k = key and v = value.
Always favor readability at the expense of verbosity, both your
future self and whoever else maintains your code will thank you.

I find myself always using |k,v|, when used on something hashlike I
don’t think readability suffers.


#11

On Jul 10, 2007, at 8:13 AM, Gregory B. wrote:

Always favor readability at the expense of verbosity, both your
future self and whoever else maintains your code will thank you.

I find myself always using |k,v|, when used on something hashlike I
don’t think readability suffers.

I’ve also recently adopted the trick of using _ as an unused
parameter name. I believe it was Ara that first suggested this and I
think it’s a great idea:

hash.sort_by { |key, _| … }…

James Edward G. II


#12

“Robert D.” removed_email_address@domain.invalid writes:

On 7/10/07, John C. removed_email_address@domain.invalid wrote:

Just got the O’Reilly announcement of the book "Beautiful

That makes it real pretty code.

Save for

File.open(some_name).each_line

Let the GC close the file!? It is not going to happen !!

Good point. In many ways, it feels that the author is not as
comfortable with Ruby blocks as one should be. A better version,
since we also don’t need a separate class:

def get_big_hash(filename)
hash = Hash.new {|h,k| h[k]=[]}
lines = 0
File.open(filename) do |file|
file.each_line do |line|
s = line.split
article = s[2].intern
hash[s[0]] << [ s[1], article ]
lines += 1
STDERR.puts “Line: #{lines}” if (lines % 100_000) == 0
end
end
return hash
end

I really like the block-using form of File.open. It makes me feel all
nice and safe like being wrapped in with-open-file or a carefully
constructed dynamic-wind.


#13

Hi –

On Tue, 10 Jul 2007, John C. wrote:

In Example 4.3 he has…
1 counts = {}
2 counts.default = 0

Personally I prefer…
counts = Hash.new(0)
or
counts = Hash.new{|hash,key| hash[key]=0}
buts that’s personal preference I guess.

Yes, but be careful: your two lines don’t do the same thing as each
other:

irb(main):007:0> counts = Hash.new(0)
=> {}
irb(main):008:0> counts[“hi”]
=> 0
irb(main):009:0> counts
=> {}
irb(main):010:0> counts = Hash.new{|hash,key| hash[key]=0}
=> {}
irb(main):011:0> counts[“hi”]
=> 0
irb(main):012:0> counts
=> {“hi”=>0}

def sort_keys_by_value
keys.sort_by{|key| fetch(key)}
end
end

It’s best not to get into changing the core language, though, if the
book isn’t going to discuss the benefits and pitfalls of doing so in
depth.

David


#14

On 10/07/07, James Edward G. II removed_email_address@domain.invalid wrote:

I’ve also recently adopted the trick of using _ as an unused
parameter name. I believe it was Ara that first suggested this and I
think it’s a great idea:

hash.sort_by { |key, _| … }…

I thought it came from Haskell, where _ is a wildcard parameter:

Prelude> let power _ 0 = 1
Prelude> power 4 0
1
Prelude> power 100 0
1
Prelude> power 4 2
*** Exception: :1:4-16: Non-exhaustive patterns in function
power
Prelude> let power a b = a ** b
Prelude> power 4 2
16.0

… however, I wouldn’t be surprised to find that Haskell took it from
some other language.

Paul.


#15

On Jul 10, 2007, at 9:24 AM, Paul B. wrote:

On 10/07/07, James Edward G. II removed_email_address@domain.invalid wrote:

I’ve also recently adopted the trick of using _ as an unused
parameter name. I believe it was Ara that first suggested this and I
think it’s a great idea:

hash.sort_by { |key, _| … }…

I thought it came from Haskell, where _ is a wildcard parameter:

I meant that I believe it was Ara who recommended it as a good name
for an unused block parameter in Ruby.

James Edward G. II


#16

On Jul 9, 2007, at 5:15 PM, John C. wrote:

Just got the O’Reilly announcement of the book “Beautiful
Code”… here is the sample chapter by Tim B. on Ruby!

http://www.oreilly.com/catalog/9780596510046/chapter/ch04.pdf

Pity he didn’t run his examples by this forum first, I’m sure we could
have made them more beautiful…

Yep, mostly because for some reason I didn’t know about sort_by. I
wonder how I missed that? Obviously, the main point of my article
was to encourage people to use the idiom of reading lines, picking
'em apart with a regex, and accumulating in a hash, invented in awk,
brought to the world in perl, perfected in Ruby.

sort_by aside, I’m not convinced that any of the alternatives to my
main loop are more beautiful or readable. I personally think that
the reason Ruby will enjoy an ever increasing market share in the
programming-languages space is because it’s more readable.

I’m looking at the following…

ARGF.read.scan( %r{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/
[^ .]+) }) do |match|
puts $1
end

Really? Considering most of the readership is new to Ruby?

In Example 4.3 he has…
1 counts = {}
2 counts.default = 0

Personally I prefer…
counts = Hash.new(0)
or
counts = Hash.new{|hash,key| hash[key]=0}
buts that’s personal preference I guess.

Gosh, I think I totally disagree on readability grounds.

-Tim


#17

On Jul 10, 2007, at 5:11 AM, Daniel M. wrote:

Look, no global variables!

Yep, much better, I wish I’d done it that way.

I agree with your use of Hash.new(0), but disagree that it’s mere
personal preference - it’s much more common to use even the single
argument form in the constructor than to use default=.

Why? default= makes it more obvious what you’re doing. I think it’s
better practice.

Example 4.5 looks like a poster child for the Hash.new block
approach…

Agreed. I wonder if perhaps he’s spent more time in python or some
other dynamic language than Ruby.

Perl :slight_smile:

-Tim


#18

On Jul 10, 2007, at 09:23 , James Edward G. II wrote:

I’ve also recently adopted the trick of using _ as an unused
parameter name. I believe it was Ara that first suggested this and
I think it’s a great idea:

hash.sort_by { |key, _| … }…

James Edward G. II

That is a good idea, I also use that in several places within blocks.


#19

On Tue, Jul 10, 2007 at 11:47:26PM +0900, Tim B. wrote:

buts that’s personal preference I guess.

Gosh, I think I totally disagree on readability grounds.

Actually, I think this:

counts = Hash.new(0)

. . . is perfectly readable, and a bit more elegant than the two-line
version from “Example 4.3”. On the other hand, this bit looks pretty
ugly to me:

counts = Hash.new{|hash,key| hash[key]=0]}

. . . especially since it looks like someone is afraid he has a limited
number of space characters to use, and doesn’t want to run out. Even
with some spaces added to it, though, it’s not as pretty and readable.


#20

On Jul 10, 2007, at 05:41 , Wayne E. Seguin wrote:

Always favor readability at the expense of verbosity, both your
future self and whoever else maintains your code will thank you.

I won’t. k and v work Just Fine™ for me and whomever else I work
with. Verbosity for verbosity’s sake, esp when it pushes outside of
common and accepted idioms, is a form of mentarbation.