RubyDictionary - First Try

maxmotor · March 31, 2010, 12:40am

Hello folks,

Last week I started learning Ruby by the book The Ruby P.ming
Language if it’s relevant. Today I decided to practise a little bit with
my new knowledge by trying to develop my first “serious” application in
Ruby. The idea was to create a small virtual dictionary based on CLI
with a few self-explaining functions like inserting/removing/search for
a data set or save/load it in a *.txt file.

Since this is my first attempt to a script language I wanted to know
what you think of my program. Are there any passages which could be
written more efficient in performance? In general, is the coding
style/commenting OK? What can I improve regarding OOP? Furthermore,
there are some commented in-code questions on line 26,93 and 99 - can
you answer them maybe?

Thanks for any help,

A Ruby novice

maxmotor · March 31, 2010, 9:48am

On Wed, Mar 31, 2010 at 12:40 AM, Max S.
[email protected] wrote:

Hello folks,

Last week I started learning Ruby by the book The Ruby P.ming
Language if it’s relevant. Today I decided to practise a little bit with
my new knowledge by trying to develop my first “serious” application in
Ruby.

Welcome to Ruby and very good job. You achieved very nice things.

The idea was to create a small virtual dictionary based on CLI
with a few self-explaining functions like inserting/removing/search for
a data set or save/load it in a *.txt file.

One question: when I think dictionary, I think this:
Associative array - Wikipedia, in which what is
unique is the keys, not the pairs. Your implementation allows to
insert two pairs with the same key:

irb(main):159:0* d = Dictionary.new
=> #<Dictionary:0xb7be8db0 @entries=[]>
irb(main):161:0> d.insert_words “aa”, “bb”
=> [#<DictionaryEntry:0xb7bdd8fc @word2=“bb”, @word1=“aa”>]
irb(main):162:0> d.insert_words “aa”, “cc”
=> [#<DictionaryEntry:0xb7bdd8fc @word2=“bb”, @word1=“aa”>,
#<DictionaryEntry:0xb7bd6494 @word2=“cc”, @word1=“aa”>]

and this looks a bit strange. But, anyway, I’ll continue with the
assumption that this is what you really want.
If this is not the case, and you want unique keys, you will have to
change many things.

Since this is my first attempt to a script language I wanted to know
what you think of my program. Are there any passages which could be
written more efficient in performance? In general, is the coding
style/commenting OK? What can I improve regarding OOP? Furthermore,
there are some commented in-code questions on line 26,93 and 99 - can
you answer them maybe?

Some comments, mostly idiomatic:

1.-
def initialize(entries=nil)
if entries==nil
@entries = Array.new
else
@entries = entries
end
end

this could be done like this:

def initialize(entries=nil)
@entries = entries || [] # or Array.new if you prefer
end

2.- def insert_words: I would call the method:

def []=(word1, word2) which allows you to say d[“word”] = “second word”
Nice syntactic sugar, although this goes more in line of what I said
about the dictionary being a key-value set.

3.-
def self.is_word? word
if (word.match(/[a-zA-Z]{2,15}/)[0].length == word.length) then true
else false
end
# (word.match(/[a-zA-Z]{2,15}/)[0].length == word.length)
# easier way?
end

Why don’t you allow one letter words? There’s a problem with this
implementation though: if the string doesn’t match you get a

irb(main):160:0> d.insert_words “a”, “b”
NoMethodError: undefined method []' for nil:NilClass from (irb):94:in is_word?’

because match will return nil, and you are calling [0] on that.

def self.is_word? word
word.match(/\A[:alpha:][:alpha:]+\z/) #you can remove the first
[:alpha:] to allow 1-letter words
end

4.- def insert_dict_entry(entry)
insert_words(entry.word1, entry.word2) # any better solution?
end

Another way, not necesarily better:

def insert_dict_entry(entry)
@entries << entry.dup
end

5.- def insert_array(words1, words2)
# arrays have to have same length
raise ArgumentError, “Invalid array arguments (not same length)”
unless words1.length==words2.length

# iterate on words1 and try to insert every pair
words1.each_index {|x| insert_words(words1[x], words2[x])}

end

I would do:

def insert_array(words1, words2)
words1.zip(words2).each {|first,second| insert_words(first,second)
if first && second}
end

this way, even if the arrays have different size, you store as much as
you can. If you still want the check you can add it.

6.- def remove(r_word1, r_word2)

def remove(word1, word2)
@entries.delete_if {|entry| entry == [word1,word2]} # if you keep
the enumerable equality check, see below
end

7.- def remove_at(index)

This one doesn’t make sense for the public interface of a dictionary.
If you implement remove as above, you don’t need it anymore.

8.- def search s_word

def search word
@entries.select {|entry| entry.word1.include?(word) ||
entry.word2.include?(word)}
end

9.- def has_word_pair?(word1, word2)

Why do you need the index as a return value?

def has_word_pair?(word1,word2)
@entries.find {|entry| entry == [word1,word2]}
end

10.- to_s

def to_s
s = “Current state of Dictionary (#{self.object_id})\n”
@entries.each_with_index {|entry, i| s << sprintf(“%3d %15s |
%15s\n”, i, entry.word1 , entry.word2)
s
end

DictionaryEntry

When you create a dictionary entry, you might want to dup the words to
avoid aliasing:

def initialize (word1, word2)
@word1 = word1.dup unless word1.frozen?
@word2 = word2.dup unless word2.frozen?
end

11.-
def ==(other)
if (other.is_a?(Enumerable) && @word1 == other[0] && @word2 ==
other[1])
true
elsif (other.is_a?(DictionaryEntry) && @word1 == other.word1 &&
@word2 == other.word2) # not tested
puts “dict entry”
true
else
false
end
end

Enumerable doesn’t provide the method [], so if you are allowing a
equality test against an Enumerable you should only use Enumerable
methods. For example:

def ==(other)
case other
when Enumerable
@word1 == other.to_a[0] && @word2 == other.to_a[1]
when DictionaryEntry
@word1 == other.word1 && @word2 == other.word2
else
false
end
end

I’m not sure you want to keep the equality comparison with an
Enumerable, because I think you added it more as a convenience than as
a real requirement. If you remove it, some of my suggestions above
should be reviewed.

I don’t have time now to comment on the CLI interface, but I think
that could be refactored a little bit too.

Jesus.

maxmotor · March 31, 2010, 11:55am

Hello and thank you for this fast widespread answer!

Your implementation allows to insert two pairs with the same key
This totally was my intention, because one word can be translated in
many different ways. Associative Arrays are called “Hashs” in Ruby,
aren’t they?

1 -

def initialize(entries=nil)
@entries = entries || [] # or Array.new if you prefer
end
Nice idea, I never would consider a boolean operator as a expression,
which returns anything else than “true” or “false”

2 -

def []=(word1, word2)
Don’t you think that this definition is rather confusing, if Dictionary
is NOT an associative array? I have to admit that I am still not
familiarized with these method-like operators…

3 -

Why don’t you allow one letter words?
Because strings with one letter aren’t any words, they are characters.
My aim was to develop a dictionary where you can organize “real-life”
words

if the string doesn’t match you get a

irb(main):160:0> d.insert_words “a”, “b”
NoMethodError: undefined method []' for nil:NilClass from (irb):94:inis_word?’

Ok, fixed it like this:

match = word.match(/[a-zA-Z]{2,15}/);
if (match!=nil && match[0].length == word.length) then true

def self.is_word? word
word.match(/\A[:alpha:][:alpha:]+\z/) #you can remove the first
[:alpha:] to allow 1-letter words
end

I wanted to limit the word’s length to 15 as you can see in my regular
expression. Are the “\A” (start position) and “\z” (end position)
necessary?

4 -

Another way, not necesarily better:

def insert_dict_entry(entry)
@entries << entry.dup
end

I should have written the question more accurate. The problem was to
check the entry for compability by invoking insert_words() before
appending to @entries

I considered making a special function for that, like “is_insertable?”,
but I was still unsure about that

5 -

this way, even if the arrays have different size, you store as much as
you can. If you still want the check you can add it.

Hmm, I want the length-check remain.

def insert_array(words1, words2)
words1.zip(words2).each {|first,second| insert_words(first,second)
if first && second}
end

Why do you prefer to a implementation which produces a temporaly array?

6 -

7 -

This one doesn’t make sense for the public interface of a dictionary.
If you implement remove as above, you don’t need it anymore.

This was meant to be a time-saver. You can output the Dictionary by
typing “o/output”, find the index of the entry you want to delete and
then apply it to the “r/remove” - command

8 -

Very cool!

9 -

Why do you need the index as a return value?
for “remove” - I first searched for the word pair and if found I removed
the pair by calling remove_at(found_index)

@entries.find {|entry| entry == [word1,word2]}
I slowly realize that I will have to clearly go through the Array
methods this afternoon.

10 -

@entries.each_with_index {|entry, i| s << sprintf("%3d %15s |
%15s\n", i, entry.word1 , entry.word2)

This means it is actually never recommended to use

@entries.length.times {|i| … }
to iterate through an array?

DictionaryEntry

11 -

you might want to dup the words to
avoid aliasing
Do you mean to avoid exceptions when the words are frozen? If yes, I
would throw an Exception if either of the two words is frozen.

12 -

Enumerable doesn’t provide the method []
Ok, I was convinced that Enumerables behave like Arrays

I’m not sure you want to keep the equality comparison with an
Enumerable
… then I should replace the Enumberable with i.e. an Array like this?

case other
when Array
@word1 == other[0] && @word2 == other[1]
when DictionaryEntry
@word1 == other.word1 && @word2 == other.word2
else
false
end

Again, thanks for your effort

maxmotor · March 31, 2010, 4:58pm

3 -

Why don’t you allow one letter words?
Because strings with one letter aren’t any words, they are characters.
My aim was to develop a dictionary where you can organize “real-life”
words

“a” is a word
This is a good argument, I was obviously too focused on my native
language

if (match!=nil && match[0].length == word.length) then true

def self.is_word? word
ï¿½word.match(/\A[:alpha:][:alpha:]+\z/) #you can remove the first
[:alpha:] to allow 1-letter words
end

I wanted to limit the word’s length to 15 as you can see in my regular
expression. Are the “\A” (start position) and “\z” (end position)
necessary?

It’s a way of saying that the full string should match your
restrictions. It avoids having to check that the match is equal to the
length of the original string:

def self.is_word? word
word.match(/\A[:alpha]{2,15}\z/)
end

should do what you want.
Oh, now I understand. Checking the length of the original string was a
work-a-round, because I did not know about this regexp feature

Do you mean to avoid exceptions when the words are frozen? If yes, I
would throw an Exception if either of the two words is frozen.

No, what I mean is that the client can modify the String object after
inserting it in the dictionary:
You surely mean the DictionaryEntry

@word1 = word1.dup unless word1.frozen?
@word2 = word2.dup unless word2.frozen?

The frozen check is just an optimization, because if the string is
frozen it’s not going to be modified by the client, and so you are
safe just using it.

If you assume that word1 is frozen, then @word1 would be assigned to
nil, wouldn’t it? Or is the unless refering to the - and exclusively -
to the “.dup”?

If it’s just for
convenience, I think you should drop it, because it’s giving those
objects a behaviour that they shouldn’t have.

Do you mean that the boolean equality operator (==) normally exclusively
compares with the class it is defined by?

Another idea: if want to see the dictionary as a set of things, a
common thing is to include Enumerable and provide an each method, so
that the clients of the dictionary can use the cool methods Enumerable
provides.
Nice idea, I implemented this.

Last question:
I posted this code on an other ruby forum and was criticized about my
coding style: Indents are always done with two whitespaces rather than
with tabs like I have. Have you heard anything about such a “unwritten
rule”?

Thanks

maxmotor · March 31, 2010, 12:12pm

On Wed, Mar 31, 2010 at 11:55 AM, Max S.
[email protected] wrote:

Hello and thank you for this fast widespread answer!

Your implementation allows to insert two pairs with the same key
This totally was my intention, because one word can be translated in
many different ways. Associative Arrays are called “Hashs” in Ruby,
aren’t they?

Yes, but the term Dictionary is a common technical term with that
meaning, that’s why I was asking…

Don’t you think that this definition is rather confusing, if Dictionary
is NOT an associative array? I have to admit that I am still not
familiarized with these method-like operators…

Yes, you are right. That syntactic sugar makes more sense when it’s an
associative array, not in your case.

3 -

Why don’t you allow one letter words?
Because strings with one letter aren’t any words, they are characters.
My aim was to develop a dictionary where you can organize “real-life”
words

“a” is a word

if (match!=nil && match[0].length == word.length) then true

def self.is_word? word
word.match(/\A[:alpha:][:alpha:]+\z/) #you can remove the first
[:alpha:] to allow 1-letter words
end

I wanted to limit the word’s length to 15 as you can see in my regular
expression. Are the “\A” (start position) and “\z” (end position)
necessary?

It’s a way of saying that the full string should match your
restrictions. It avoids having to check that the match is equal to the
length of the original string:

def self.is_word? word
word.match(/\A[:alpha]{2,15}\z/)
end

should do what you want.

end
This was meant to be a time-saver. You can output the Dictionary by
for “remove” - I first searched for the word pair and if found I removed

This means it is actually never recommended to use

@entries.length.times {|i| … }
to iterate through an array?

It’s usually not recommended. If you really, really need the index
each_with_index handles that.

DictionaryEntry

11 -

you might want to dup the words to
avoid aliasing
Do you mean to avoid exceptions when the words are frozen? If yes, I
would throw an Exception if either of the two words is frozen.

No, what I mean is that the client can modify the String object after
inserting it in the dictionary:

d = Dictionary.new
w1 = “hello”
w2 = “world”
d.insert_words(w1,w2)
w1 << “something crazy which maybe should be in the dictionary at this
point”

The string referenced by w1 is modified after being inserted. With
your solution, you store the same object, and so the dictionary is
modified by this external action.

The frozen check is just an optimization, because if the string is
frozen it’s not going to be modified by the client, and so you are
safe just using it.

12 -

Enumerable doesn’t provide the method []
Ok, I was convinced that Enumerables behave like Arrays

I’m not sure you want to keep the equality comparison with an
Enumerable
… then I should replace the Enumberable with i.e. an Array like this?

I don’t know, depends on what your use case is. If it’s just for
convenience, I think you should drop it, because it’s giving those
objects a behaviour that they shouldn’t have.

case other
when Array
@word1 == other[0] && @word2 == other[1]
when DictionaryEntry
@word1 == other.word1 && @word2 == other.word2
else
false
end

Another idea: if want to see the dictionary as a set of things, a
common thing is to include Enumerable and provide an each method, so
that the clients of the dictionary can use the cool methods Enumerable
provides. In your case:

class Dictionary
include Enumerable

    def each (&blk)
            @entries.each &blk
    end

end

that way people could do:

d = Dictionary.new
#populate it somehow
d.find {|entry| entry.word1 == “hello”}

and all the rest of the cool stuff Enumerable provides.

Jesus.

maxmotor · March 31, 2010, 5:08pm

On Wed, Mar 31, 2010 at 4:58 PM, Max S.
[email protected] wrote:

3 -

Why don’t you allow one letter words?
Because strings with one letter aren’t any words, they are characters.
My aim was to develop a dictionary where you can organize “real-life”
words

“a” is a word
This is a good argument, I was obviously too focused on my native
language

Sorry about that ! You know your requirements

The frozen check is just an optimization, because if the string is
frozen it’s not going to be modified by the client, and so you are
safe just using it.

If you assume that word1 is frozen, then @word1 would be assigned to
nil, wouldn’t it? Or is the unless refering to the - and exclusively -
to the “.dup”?

No, I screwed up and you are right. I’d do it something like:

@word1 = word1.frozen? ? word1 : word1.dup

If it’s just for
convenience, I think you should drop it, because it’s giving those
objects a behaviour that they shouldn’t have.

Do you mean that the boolean equality operator (==) normally exclusively
compares with the class it is defined by?

I think generally yes. I’m not sure if it’s a hard rule, though, and
in some cases it might make sense. If you do that, I would certainly
include Enumerable in your class, so that then both are Enumerable, so
you might argue that the comparison really makes sense

Last question:
I posted this code on an other ruby forum and was criticized about my
coding style: Indents are always done with two whitespaces rather than
with tabs like I have. Have you heard anything about such a “unwritten
rule”?

Yeah, the typical convention is to use two spaces instead of tabs. But
criticized? Come on, it’s a convention and you said you were new to
Ruby…
The reason I dislike tabs most is that I cannot copy and paste code
with tabs into irb, cause it shows me the current directory listing,
like a completion in the shell.

So, in my opinion, if you want to share code with the Ruby community,
I think you should try to adapt to the general conventions, it makes
it easier for everybody.

Jesus.

maxmotor · March 31, 2010, 5:13pm

On Wed, Mar 31, 2010 at 5:07 PM, Peter H.
[email protected] wrote:

On 31 March 2010 15:58, Max S. [email protected]wrote:

Last question:
I posted this code on an other ruby forum and was criticized about my
coding style: Indents are always done with two whitespaces rather than
with tabs like I have. Have you heard anything about such a “unwritten
rule”?

This is a religious issue, (flame)wars have started over less.

Ah, I thought it was pretty well-established…
Well, then, Max, what I said before may not hold true.
I still rather not have tabs because what I said about irb, but I’m
certainly not religious about it…

Jesus.

maxmotor · March 31, 2010, 5:08pm

On 31 March 2010 15:58, Max S.
[email protected]wrote:

Last question:
I posted this code on an other ruby forum and was criticized about my
coding style: Indents are always done with two whitespaces rather than
with tabs like I have. Have you heard anything about such a “unwritten
rule”?

This is a religious issue, (flame)wars have started over less.

maxmotor · March 31, 2010, 6:33pm

On Wed, Mar 31, 2010 at 6:23 PM, Josh C. [email protected]
wrote:

this could be done like this:
   def initialize(entries=nil)
      @entries = entries || []   # or Array.new if you prefer
   end
How about
def initialize( entries = Array.new )
@entries = entries
end

Yep, that’s cleaner.

BTW, I think it is better to use setters and getters than to directly access
the ivars, because if you restrict access to the method, then you only have
to go to one place if you change implementation.

Sounds good to me, enter the Self Encapsulate Field refactoring (yes,
I’m now reading Fowler’s book

attr_accessor :entries

def initialize(entries = Array.new)
self.entries = entries
end

Although maybe you want the accessor to be private at first, unless
your public interface calls for that.

Jesus.

maxmotor · March 31, 2010, 6:26pm

2010/3/31 JesÃºs Gabriel y GalÃ¡n [email protected]

   def initialize(entries=nil)
      @entries = entries || []   # or Array.new if you prefer
   end
How about
def initialize( entries = Array.new )
@entries = entries
end

BTW, I think it is better to use setters and getters than to directly
access
the ivars, because if you restrict access to the method, then you only
have
to go to one place if you change implementation.

maxmotor · April 1, 2010, 1:20am

Hello,

As Max explicitly asked for Cosmetics/Style, and this is one of the
biggest flaws of his program in this regard, so I had to mention it.

I promise to chance this bad style in my next ruby program.

Although maybe you want the accessor to be private at first, unless
your public interface calls for that.

No, @entries should be private (it already is), you are right.

Good night

maxmotor · March 31, 2010, 7:26pm

On Mar 31, 2010, at 5:07 PM, Peter H. wrote:

On 31 March 2010 15:58, Max S. [email protected]wrote:

Last question:
I posted this code on an other ruby forum and was criticized about my
coding style: Indents are always done with two whitespaces rather than
with tabs like I have. Have you heard anything about such a “unwritten
rule”?
This is a religious issue, (flame)wars have started over less.

I was the one who made that comment, because it was the first thing that
jumped right at me, especially when viewing the code in out code
highlighter (that uses 4 spaces per tab).

This may be a religious issue, but thats the case with every coding
standard. I never criticize people for their use of do … end, {} and
such, but “2 spaces, no tabs” is such widely used in the ruby world that
you can indeed call it a common standard and good practice.

It is also not an “unwritten rule”, but referred to in many guides, for
example:

http://www.caliban.org/ruby/rubyguide.shtml#indentation

github.com

leahneukirchen/styleguide/blob/master/RUBY-STYLE

= Leah Neukirchen's Ruby Style Guide

You may not like all rules presented here, but they work very well for
me and have helped producing high quality code.  Everyone is free to
code however they want, write and follow their own style guides, but
when you contribute to my code, please follow these rules:


== Formatting:

* Use ASCII (or UTF-8, if you have to).

* Use 2 space indent, no tabs.

* Use Unix-style line endings.  On the last line of the file, too.

* Use spaces around operators, after commas, colons and semicolons,
  around { and before }.

* No spaces after (, [ and before ], ).

This file has been truncated. show original

… as well as many open repositories to be found on github.

As Max explicitly asked for Cosmetics/Style, and this is one of the
biggest flaws of his program in this regard, so I had to mention it.

Regards,
Florian

–
Florian G.

smtp: [email protected]
jabber: [email protected]
gpg: 533148E2

maxmotor · April 1, 2010, 1:30am

Hello again,

Does it make sense to seperate the “case”-blocks of the main program in
different functions?