Equivalent of collections in Java

Hi list,

I am reading the “from concepts to code” book by Jacquie Barker, but
this book uses Java as the language to learn OO programming. This
doesn’t bother me at all although I’m learning Ruby, but sometimes I
don’t know whether a concept exists in Ruby and what it’s called there.
I am now reading about the concept of Collections, that is Collection
objects, in Java, but I can’t find anything about it in my Pickaxe
book. Does someone know the equivalent in Ruby?

Also another tiny question:

I read in the Pickaxe book: Integer(gets). How does this work? I
thought Integer was an object type and not a method that can be
called with an argument, like method(argument)?

Thanks again,
Krekna

On 5/25/06, Krekna M. wrote:

Hi list,

I am reading the “from concepts to code” book by Jacquie Barker, but
this book uses Java as the language to learn OO programming. This
doesn’t bother me at all although I’m learning Ruby, but sometimes I
don’t know whether a concept exists in Ruby and what it’s called there.
I am now reading about the concept of Collections, that is Collection
objects, in Java, but I can’t find anything about it in my Pickaxe
book. Does someone know the equivalent in Ruby?

At the most abstract level every programming language needs some form
of “collections” in the sense of data structures used to contain
information. In Ruby there are the Array and Hash classes, which can handle
most of what the core Java Collection classes can handle. The
statically typed nature of Java requires such a huge variety of
classes, whereas Ruby gets much of the same functionality with just
two classes.

One benefit of the Java libraries is you can get very specific about
what type of data structure you want to use, even taking into account
the internal representation. For example you can choose to use an
ArrayList or a LinkedList depending on which style of list would be
the most efficient for your needs. Some might argue this is against
the spirit of OO since it exposes the internal representation of the
data structure.

Either way in the case of Ruby core classes, you just have Array and
Hash.
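
As a rough illustration of that point, here is a minimal sketch of Array standing in for a Java-style list and Hash standing in for a map. The variable names are made up for this example; only core Ruby methods are used.

```ruby
# Array as a general-purpose ordered list (roughly the role of Java's ArrayList).
names = ["carol", "alice"]
names << "bob"                 # append at the end
sorted = names.sort            # returns a new, sorted Array
third  = names[2]              # indexed access

# Hash as a key/value map (roughly the role of Java's HashMap).
ages = { "alice" => 31 }
ages["bob"] = 27               # insert or overwrite
known   = ages.key?("carol")   # membership test on keys
bob_age = ages.fetch("bob")    # raises KeyError if the key is missing
```

Both classes grow on demand and accept elements of any type, which is part of why two classes cover so much ground.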

Also another tiny question:

I read in the Pickaxe book: Integer(gets). How does this work? I
thought Integer was an object type and not a method that can be
called with an argument, like method(argument)?

The above actually calls a method named Integer, which is defined in the Kernel module:

C:\P4WS>ri Kernel#Integer
--------------------------------------------------------- Kernel#Integer
Integer(arg) => integer

 Converts _arg_ to a +Fixnum+ or +Bignum+. Numeric types are
 converted directly (with floating point numbers being truncated).
 If _arg_ is a +String+, leading radix indicators (+0+, +0b+, and
 +0x+) are honored. Others are converted using +to_int+ and +to_i+.
 This behavior is different from that of +String#to_i+.

    Integer(123.999)    #=> 123
    Integer("0x1a")     #=> 26
    Integer(Time.new)   #=> 1049896590
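
A small sketch of the difference between the Kernel method and String#to_i, under the assumption that the input arrives as a line of text (as it would from gets). The helper parse_number is hypothetical and exists only for this example.

```ruby
# Integer(...) here is a call to the Kernel method, not a use of the Integer class.
strict  = Integer("42")        # plain decimal string
hex     = Integer("0x1a")      # radix prefix is honored
lenient = "0x1a".to_i          # String#to_i stops at the first non-digit character

# parse_number is a hypothetical helper for this example only.
def parse_number(text)
  Integer(text.strip)
rescue ArgumentError
  nil                          # Integer raises when the text is not a valid number
end

from_input = parse_number("17\n")   # the kind of string gets returns after typing 17
rejected   = parse_number("abc")
```

The strictness is the practical reason to prefer Integer(gets) over gets.to_i when bad input should be noticed rather than silently turned into 0.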

Ryan

I should have read a bit further, they just mean collections as I
already know them, like arrays (ordered lists) and hashes. Nothing new
under the sun, to my understanding.

So, assuming I am right, please forget this question :-)

Krekna

2006/5/25, Krekna M.:

2006/5/25, Ryan L.:

On 5/25/06, Krekna M. wrote:

At the most abstract level every programming language needs some form
of “collections” in the sense of data structures used to contain
information. In Ruby there are the Array and Hash classes, which can handle
most of what the core Java Collection classes can handle. The
statically typed nature of Java requires such a huge variety of
classes, whereas Ruby gets much of the same functionality with just
two classes.

The static typing of Java is not responsible for the comparatively
large number of collection classes. All Java collections are based on
Object as element type which is as generic as it can get without
generics.

Java collections indeed offer more functionality: these are the
collection types which do not have an equivalent in Ruby’s std lib:
TreeSet, TreeMap, LinkedList, IdentityHashMap, Stack and the
synchronized Collections Vector and Hashtable.

One benefit of the Java libraries is you can get very specific about
what type of data structure you want to use, even taking into account
the internal representation. For example you can choose to use an
ArrayList or a LinkedList depending on which style of list would be
the most efficient for your needs. Some might argue this is against
the spirit of OO since it exposes the internal representation of the
data structure.

That’s not against OO at all. All classes have certain
characteristics - algorithmic complexity being one of them. It’s
perfectly valid to have several implementations of the same concept
(aka interface) from which a developer can pick the most appropriate
for the solution he wants to build. In fact, OO usually makes
exchanging one implementation for another much easier through
inheritance (duck typing in Ruby).
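
To make the duck typing remark concrete, here is a minimal sketch in which the same code runs against two implementations with different algorithmic behavior. The method name remember is invented for this example; it only relies on the collection responding to include? and <<, which both Array and Set do.

```ruby
require "set"

# Works with any object that responds to include? and <<.
def remember(seen, item)
  return false if seen.include?(item)
  seen << item
  true
end

words = %w[a b a c b]

as_array = []        # include? scans the elements one by one
as_set   = Set.new   # include? is a hash lookup

new_in_array = words.count { |w| remember(as_array, w) }
new_in_set   = words.count { |w| remember(as_set, w) }
```

Swapping [] for Set.new changes the cost of the membership test without touching remember, which is the kind of exchange described above.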

Either way in the case of Ruby core classes, you just have Array and Hash.

You could add to that Set and Queue (thread safe) which come with the
std distribution.
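
A short sketch of those two classes in use. The variable names are illustrative only, and requiring "thread" is kept for older Ruby versions where Queue was not loaded by default.

```ruby
require "set"
require "thread"   # harmless on current Ruby, needed on older versions for Queue

tags = Set.new(["ruby", "java"])
tags << "ruby"     # already present, so the set does not grow
tags << "lisp"

jobs = Queue.new   # thread-safe first-in, first-out queue
producer = Thread.new { 3.times { |i| jobs << i } }
producer.join

drained = []
drained << jobs.pop until jobs.empty?
```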

Kind regards

robert

On 25-May-06, at 11:59 AM, Ryan L. wrote:

Java collections indeed offer more functionality: these are the
collection types which do not have an equivalent in Ruby’s std lib:
TreeSet, TreeMap, LinkedList, IdentityHashMap, Stack and the
synchronized Collections Vector and Hashtable.

I think having to guess which one of these you want to use based on
perceived need is a form of premature optimization, and this overly
complicates the development process. How many Ruby programs have truly
needed classes like the above? Also Ruby’s Array can act much like
Java’s Stack.

I think that if I were to come across a situation where I needed a
list data structure and my program was going to be making lots of
random accesses, picking a linked list would just be silly. So while
in some cases, I think you would be right with premature
optimization, you cannot just go making that sweeping claim responsibly.

Ryan


Jeremy T.
[email protected]

“The proof is the proof that the proof has been proven and that’s the
proof!” - Jean Chrétien

On 5/25/06, Robert K. wrote:

The static typing of Java is not responsible for the comparatively
large number of collection classes. All Java collections are based on
Object as element type which is as generic as it can get without
generics.

Let me clarify: I think the design of Java, including the static
typing, results in a large number of highly specified and frequently
not interchangeable Collection classes. I feel this is a negative.

Java collections indeed offer more functionality: these are the
collection types which do not have an equivalent in Ruby’s std lib:
TreeSet, TreeMap, LinkedList, IdentityHashMap, Stack and the
synchronized Collections Vector and Hashtable.

I think having to guess which one of these you want to use based on
perceived need is a form of premature optimization, and this overly
complicates the development process. How many Ruby programs have truly
needed classes like the above? Also Ruby’s Array can act much like
Java’s Stack.
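
For readers who have not seen it, a minimal sketch of Array used as a stack, and as a simple queue for comparison. Only core Array methods are used; the variable names are illustrative.

```ruby
# Last in, first out: push and pop work on the end of the Array.
stack = []
stack.push(1)
stack.push(2)
stack.push(3)
top    = stack.last   # peek without removing
popped = stack.pop    # removes and returns the last element

# First in, first out: push at the end, shift from the front.
queue = [:a, :b]
queue.push(:c)
head = queue.shift
```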

That’s not against OO at all. All classes have certain
characteristics - algorithmic complexity being one of them. It’s
perfectly valid to have several implementations of the same concept
(aka interface) from which a developer can pick the most appropriate
for the solution he wants to build. In fact, OO usually makes
exchanging one implementation for another much easier through
inheritance (duck typing in Ruby).

Java does not have duck typing and in fact most of the Collection
classes are not at all interchangeable, unless you stick to the
interfaces of their common abstract parent classes (which removes much
of the benefit of their specialization.)

I think in Ruby there are only a few specialized cases that would
require changing the algorithm behind a data structure, and if speed
is that important profiling will find the true culprits slowing things
down, which frequently will not be the data structures.

Ryan

On 5/25/06, Robert K. wrote:

Java collections indeed offer more functionality: these are the
collection types which do not have an equivalent in Ruby’s std lib:
TreeSet, TreeMap, LinkedList, IdentityHashMap, Stack and the
synchronized Collections Vector and Hashtable.

I found this to be a deficiency in ruby for what I was doing (a
generic parser). The collections framework I made for ruby is called
“cursor” and is on rubyforge. It combines concepts from Java
collections/streams, C++ collections/iterators, and I guess even
ruby’s Array/IO classes. One API for many different datastructures -
array/string, linked lists, file/io, circular structures, gap buffer,
multi-threaded queue, etc. Any data-structure that has a (or can have
a) linear order could fit into this framework.

2006/5/25, Ryan L.:

On 5/25/06, Robert K. wrote:

The static typing of Java is not responsible for the comparatively
large number of collection classes. All Java collections are based on
Object as element type which is as generic as it can get without
generics.

Let me clarify: I think the design of Java, including the static
typing, results in a large number of highly specified and frequently
not interchangeable Collection classes. I feel this is a negative.

Again, static typing is not responsible for the number of collection
classes because variation is on functionality, invariants, algorithmic
complexity - and not on type.

needed classes like the above? Also Ruby’s Array can act much like
Java’s Stack.

Yes, and so can Java’s LinkedList. This doesn’t make a Stack class
superfluous.

interfaces of their common abstract parent classes (which removes much
of the benefit of their specialization.)

I think in Ruby there are only a few specialized cases that would
require changing the algorithm behind a data structure, and if speed
is that important profiling will find the true culprits slowing things
down, which frequently will not be the data structures.

With all due respect, what you wrote shows that you lack some basic
understanding of software engineering. Choosing between a Set and a
List is by far not a premature optimization but a deliberate design
decision. Knowing algorithmic properties of these basic abstract data
types is one of the required skills of someone engaged in software
engineering - even if Ruby has only a few of them, and probably needs
fewer of them because of its design. I suggest you get yourself a book
on Data Structures and Algorithms (for example the excellent books of
Robert Sedgewick) and digest it.

Kind regards

robert

On Friday 26 May 2006 7:56 am, Ryan L. wrote:

work with. Do you consider this a flaw Robert? It seems you do. I
would argue adding too many data structures would overly and
needlessly complicate Ruby, at least where those data structures
provide duplicate functionality, but through different means (linked
list versus array list versus stack, etc.)

One of the pleasant things, IMHO, with Ruby is that the core data
structures are extremely flexible. So, for many things, they just work
and details about how they work can be ignored in favor of simply
getting things done.

  1. In most uses of Ruby, drilling down to determine the right data
    structure in the beginning of development is a waste of time and is
    certainly premature optimization. If you are writing a script to

I don’t think this is the right way to say it. One needs to think about
the data structure, but one needs to think about it in the context of
behavior, not implementation.

If Array or Hash has the behavior one needs, move on. Your data
structure due diligence is done for now. Obviously, if the needed
behavior isn’t present, then it’s time to start thinking about the data
structure that will deliver the needed behavior. Now, it would be handy
if one could just use a well built LinkedList class out of the core if
one needed it, but the lack of such a thing doesn’t strike me as a
problem, either, precisely because the lack of a dizzying array of
builtin data structures encourages one to think more about behavior
than implementation; it encourages what is, IMHO, the right approach
for the vast majority of cases.

And for those corner cases that really do need a LinkedList class, the
cost of not having one in the core still isn’t high because a) it’s
easy to write these things, and b) a lot of them can be found in a 3rd
party library like Facets (facets.rubyforge.org).
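
As an illustration of how little code such a class needs, here is a minimal singly linked list sketch. It is not taken from Facets or any other library; the class and method names are invented for this example, and including Enumerable supplies map, include?, to_a and friends once each is defined.

```ruby
class LinkedList
  include Enumerable

  Node = Struct.new(:value, :next_node)

  def initialize
    @head = nil
    @tail = nil
  end

  # Append at the end in constant time by keeping a tail reference.
  def push(value)
    node = Node.new(value, nil)
    if @tail
      @tail.next_node = node
    else
      @head = node
    end
    @tail = node
    self
  end
  alias_method :<<, :push

  # Prepend at the front in constant time.
  def unshift(value)
    @head = Node.new(value, @head)
    @tail ||= @head
    self
  end

  # Walk the nodes from head to tail.
  def each
    return to_enum(:each) unless block_given?
    node = @head
    while node
      yield node.value
      node = node.next_node
    end
  end
end

list = LinkedList.new
list << 2 << 3
list.unshift(1)
doubled = list.map { |x| x * 2 }
```

Random access is deliberately missing: reaching the n-th element means walking n nodes, which is the trade-off against Array discussed in this thread.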

Kirk H.

2006/5/26, Ryan L.:

Robert Sedgewick) and digest it.

  1. Ruby doesn’t really provide a wide variety of data structures to
    work with. Do you consider this a flaw Robert? It seems you do. I
    would argue adding too many data structures would overly and
    needlessly complicate Ruby, at least where those data structures
    provide duplicate functionality, but through different means (linked
    list versus array list versus stack, etc.)

I did not say that I consider Ruby’s design of collection classes
flawed. Some classes would come in quite handy once in a while (a
sorted set for example) but it’s nothing I’m seriously missing. And
it’s certainly not a handicap of Java to have so many of them. In
fact I like having such a rich standard lib - this is something
that sets Java apart from C++ where you usually need some external
libs to get the same functionality - even collection wise - which
often causes a lot of headache.

I am arguing against your statement that Java’s design is negative
because it contains so many collection classes. And I was pointing
out that Java’s multitude of collection classes is not caused by
static typing. I still don’t see that static typing causes this, maybe
you can shed some more light on this.

  1. In most uses of Ruby, drilling down to determine the right data
    structure in the beginning of development is a waste of time and is
    certainly premature optimization. If you are writing a script to
    process some text files I seriously doubt using the “proper” data
    structure will improve performance in any noticeable way. Even with
    bigger apps like Rails most of the overhead is in the database and
    network calls, so why would you waste time optimizing data structures
    which will not really impact real performance? That sir, is premature
    optimization.

Choosing data structures is not only about performance. It’s about
functionality and also documentation - if I need a set I choose Set,
if I need a map I choose Hash, if I need a stack or list I choose
Array… And it doesn’t take me much “drilldown” time to determine if
I use a set or a list.

Since we are recommending reading, I would suggest you read some
articles or books about agile development, and such concepts as You
Aren’t Gonna Need It (YAGNI.)

I don’t think agile methodologies make the distinction between sets
and maps or sets and arrays superfluous…

Regards

robert

On 5/26/06, Robert K. wrote:

With all due respect, what you wrote shows that you lack some basic
understanding of software engineering. Choosing between a Set and a
List is by far not a premature optimization but a deliberate design
decision. Knowing algorithmic properties of these basic abstract data
types is one of the required skills of someone engaged in software
engineering - even if Ruby has only a few of them, and probably needs
fewer of them because of its design. I suggest you get yourself a book
on Data Structures and Algorithms (for example the excellent books of
Robert Sedgewick) and digest it.

Believe it or not, I actually have a degree in Computer Engineering
and have been paid to develop software for about 10 years. I realize
in languages like Java and C++ you are going to consider the
application when choosing a data structure…that is drilled into you
in algorithms and data structures class. I get that.

My argument is that in my experience, this is not necessary in high
level languages like Ruby. There are two reasons I feel this:

  1. Ruby doesn’t really provide a wide variety of data structures to
    work with. Do you consider this a flaw Robert? It seems you do. I
    would argue adding too many data structures would overly and
    needlessly complicate Ruby, at least where those data structures
    provide duplicate functionality, but through different means (linked
    list versus array list versus stack, etc.)

  2. In most uses of Ruby, drilling down to determine the right data
    structure in the beginning of development is a waste of time and is
    certainly premature optimization. If you are writing a script to
    process some text files I seriously doubt using the “proper” data
    structure will improve performance in any noticeable way. Even with
    bigger apps like Rails most of the overhead is in the database and
    network calls, so why would you waste time optimizing data structures
    which will not really impact real performance? That sir, is premature
    optimization.

Since we are recommending reading, I would suggest you read some
articles or books about agile development, and such concepts as You
Aren’t Gonna Need It (YAGNI.)

Regards,
Ryan

On 5/26/06, Robert K. wrote:

I am arguing against your statement that Java’s design is negative
because it contains so many collection classes. And I was pointing
out that Java’s multitude of collection classes is not caused by
static typing. I still don’t see that static typing causes this, maybe
you can shed some more light on this.

My declaration that Java’s design is negative was made as compared to
Ruby. After 10 years of Java development and 5 years of Ruby
development here is what I feel in regards to data structures: in a
situation where one might choose to code something in Ruby or Java
(because their uses don’t always overlap), the simpler choices Ruby
provides in data structures will speed up development. There will not
be a need to think about exactly what data structure you need in Ruby
(beyond choosing Hash or Array) because most of the time it does not
matter. Maybe this only applies to small scripts where just getting it
working is the most important thing. In large scale development it may
be moot, and that is probably where Ruby’s simpler choice in data
structures might lack (but I’d argue that is not guaranteed to happen
all the time.)

For the sake of bringing this discussion to a close, I will retract my
statement about Java’s static typing causing so many data structures.
It seems it was really a design choice and the typing didn’t affect it
too much. Though I think the plethora of classes is a result of a
certain design philosophy that comes from static typing (this may seem
contradictory to the previous sentence but I think I’m just having a
hard time articulating my point.) As a counterexample I would be
curious to see a dynamically typed language which has such a huge
variety of collection classes built into the language or standard
library.

Choosing data structures is not only about performance. It’s about
functionality and also documentation - if I need a set I choose Set,
if I need a map I choose Hash, if I need a stack or list I choose
Array… And it doesn’t take me much “drilldown” time to determine if
I use a set or a list.

In Ruby, of course it doesn’t take long. And don’t expect me to argue
that you never need to choose between a Hash or Array or Set, that is
just silly. You have to at least choose that, of course.

But then in Java you need to decide if you are going to use a HashSet
or a TreeSet or a LinkedHashSet or maybe an ArrayList but then there
is LinkedList but of course also HashMap or TreeMap or LinkedHashMap,
plus PriorityQueue. Having a lot of choices will always slow a human
down. Maybe where you work speed of development does not matter, but
for me it is important.

I don’t think agile methodologies make the distinction between sets
and maps or sets and arrays superfluous…

I never said it did. I’m mostly speaking about the second level you
get in Java: ArrayList versus LinkedList, HashMap versus TreeMap, etc.
But when you are coding Java I guess you are just used to that. I’d
rather code Ruby.

Ryan

On 26-May-06, at 11:11 PM, Ryan L. wrote:

For the sake of bringing this discussion to a close, I will retract my
statement about Java’s static typing causing so many data structures.
It seems it was really a design choice and the typing didn’t affect it
too much. Though I think the plethora of classes is a result of a
certain design philosophy that comes from static typing (this may seem
contradictory to the previous sentence but I think I’m just having a
hard time articulating my point.) As a counterexample I would be
curious to see a dynamically typed language which has such a huge
variety of collection classes built into the language or standard
library.

See the various Smalltalks. Most are rich with a variety of data
structures.

I don’t think agile methodologies make the distinction between sets
and maps or sets and arrays superfluous…

I never said it did. I’m mostly speaking about the second level you
get in Java: ArrayList versus LinkedList, HashMap versus TreeMap, etc.
But when you are coding Java I guess you are just used to that. I’d
rather code Ruby.

As I mentioned before in a previous posting, the choice between an
ArrayList and a LinkedList is very simple. If you need fast random
access, use an ArrayList; if you don’t, a LinkedList is probably
best. It’s not exactly rocket science, and if it takes you longer
than a half a second to figure out what you need, perhaps you’re on
the wrong career path, or should get some rest.

Ryan


Jeremy T.

Not much to add to this informative discussion, except that the only
data structure I ever seem to miss in ruby is a reasonably efficient
sorted hash. It would be great to get rbtree into the stdlib. Or some
similar data structure (I don’t much care how it is implemented as long
as insert and search are sublinear).
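
For comparison, here is a sketch of the simplest pure-Ruby stand-in: a sorted Array of key/value pairs searched with binary search. It is not rbtree and does not fully meet the requirement above, because lookup is logarithmic but insertion still shifts elements and is linear in the worst case. The class name SortedMap is invented for this example, and Array#bsearch_index assumes Ruby 2.3 or later.

```ruby
class SortedMap
  include Enumerable

  def initialize
    @pairs = []   # [key, value] pairs, always kept sorted by key
  end

  # Insert or overwrite; binary search finds the position.
  def []=(key, value)
    i = @pairs.bsearch_index { |k, _| k >= key } || @pairs.size
    if i < @pairs.size && @pairs[i][0] == key
      @pairs[i][1] = value
    else
      @pairs.insert(i, [key, value])
    end
  end

  # Logarithmic lookup; returns nil when the key is absent.
  def [](key)
    pair = @pairs.bsearch { |k, _| k >= key }
    pair[1] if pair && pair[0] == key
  end

  # Iterates in key order.
  def each(&block)
    @pairs.each(&block)
  end
end

scores = SortedMap.new
scores["b"] = 2
scores["a"] = 1
scores["c"] = 3
scores["b"] = 20
ordered_keys = scores.map { |key, _value| key }
```

A balanced tree such as the one rbtree provides would make insertion logarithmic as well, which is what the request is really about.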

2006/5/27, Ryan L.:

matter. Maybe this only applies to small scripts where just getting it
working is the most important thing. In large scale development it may
be moot, and that is probably where Ruby’s simpler choice in data
structures might lack (but I’d argue that is not guaranteed to happen
all the time.)

Point taken. Maybe it’s also a topic where individual experience and
approach make a difference; I personally did not experience that the
number of classes to choose from in Java caused me a headache or
serious slowdown. In fact I’d rather have even more collection
classes. Maybe it’s also that thinking about data structures is one of
the first things I do when thinking about a problem to solve; I even
create some specific classes even for small scripts as it’s so easy in
Ruby. It helps me personally to think about the problem and also to
document what I did.

For the sake of bringing this discussion to a close, I will retract my
statement about Java’s static typing causing so many data structures.
It seems it was really a design choice and the typing didn’t affect it
too much. Though I think the plethora of classes is a result of a
certain design philosophy that comes from static typing (this may seem
contradictory to the previous sentence but I think I’m just having a
hard time articulating my point.)

I think I understand your point now. Maybe it’s not a direct causal
relationship but rather the “spirit” of following an engineering
approach to software engineering. I can imagine that both, Java’s type
system as well as the rich set of collection classes, were inspired
from a more formalized approach to software engineering (stress on
“engineering”).

As a counterexample I would be
curious to see a dynamically typed language which has such a huge
variety of collection classes built into the language or standard
library.

I’d have guessed that Smalltalk does this. Thanks, Jeremy for the
confirmation. Btw, just out of curiosity: what about common lisp?
Does CLOS also contain a rich set of abstract data types?

In Ruby, of course it doesn’t take long. And don’t expect me to argue
that you never need to choose between a Hash or Array or Set, that is
just silly. You have to at least choose that, of course.

:-))

But then in Java you need to decide if you are going to use a HashSet
or a TreeSet or a LinkedHashSet or maybe an ArrayList but then there
is LinkedList but of course also HashMap or TreeMap or LinkedHashMap,
plus PriorityQueue. Having a lot of choices will always slow a human
down. Maybe where you work speed of development does not matter, but
for me it is important.

Most of these classes solve different problems so once the problem is
understood the choice is usually fairly easy. I cannot remember
having a problem with this selection process but YMMV.

I don’t think agile methodologies make the distinction between sets
and maps or sets and arrays superfluous…

I never said it did. I’m mostly speaking about the second level you
get in Java: ArrayList versus LinkedList, HashMap versus TreeMap, etc.
But when you are coding Java I guess you are just used to that. I’d
rather code Ruby.

Both have their strengths and weaknesses. In fact I like both very
much. I like Java for its powerful IDEs (namely Eclipse), rich
standard class library, native threading support - and I like Ruby for
compact code, low overhead OO and the dynamic nature (including easy
meta programming).

Thanks for taking the time and sharing your thoughts!

Kind regards

robert

On Sat, May 27, 2006 at 07:54:47PM +0900, Robert K. wrote:

I’d have guessed that Smalltalk does this. Thanks, Jeremy for the
confirmation. Btw, just out of curiosity: what about common lisp?
Does CLOS also contain a rich set of abstract data types?

Foremost, lisp does have nested lists. If you look closer, you’ll find
lists are made up of cons-boxes, value.pointer, in a singly linked
list style. So (1 2 3) is really (1.(2.(3.nil))).
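
The cons-box structure can be sketched in Ruby with a two-field Struct. This is only an illustration of the idea, not how any Lisp implements it; Cons, list_from and to_array are names invented for this example.

```ruby
# A cons box: a value (car) plus a pointer to the rest of the list (cdr).
Cons = Struct.new(:car, :cdr)

# Build (1 2 3) as (1 . (2 . (3 . nil))) by consing from the right.
def list_from(*items)
  items.reverse.inject(nil) { |rest, item| Cons.new(item, rest) }
end

# Follow the cdr pointers until nil marks the end of the list.
def to_array(cell)
  out = []
  while cell
    out << cell.car
    cell = cell.cdr
  end
  out
end

numbers = list_from(1, 2, 3)
second  = numbers.cdr.car
```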

Common lisp also has a native array with efficient indexed access
operations, and native strings for collections of characters (or maybe
these are arrays too).

Often, simple data structures like sets are just made up with a few
functions that treat lists as sets. Or other functions that treat
“attribute lists” (symbol.value pairs in a list) as simple hashes. A
list is a natural stack too. That’s all quite akin to ruby’s duck
typing. For small amounts of data this just works. For larger amounts,
more elaborate functions that manage deeper nested lists and/or arrays
will do, in a similar way to how Ruby often nests Array and Hash.
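
Ruby's Array supports the same style of use. A small sketch with core Array methods only; the data is made up for the example.

```ruby
# Plain Arrays treated as sets.
a = [1, 2, 3]
b = [3, 4]
union  = a | b      # elements of either, duplicates removed
common = a & b      # elements present in both
only_a = a - b      # elements of a that are not in b

# An Array of pairs treated as a small lookup table,
# similar in spirit to a Lisp association list.
settings = [[:color, "red"], [:size, 10]]
pair = settings.assoc(:size)   # first pair whose first element matches
```

As with the Lisp versions, these operations scan the lists, so they suit small amounts of data.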

I don’t know what CLOS adds to this mix beside classes which are just
dumb structs with multiple inheritance. The limited amount of CLOS
code I’ve ever seen did not need to use additional collection classes,
but rather kept things simple and used lists and arrays.

To digress somewhat, CLOS’ power lies in generic functions, aka
methods, forming their own hierarchy with a powerful and customizable
generic function dispatch on all function arguments. For example, you
can specify for method error-p to be called for some object’s class
and all parent classes that implement it, and their results to be
or-ed, without having each of them call the equivalent of Ruby’s
super. Generic functions neither look nor behave syntactically different
from plain old Lisp functions. Compare to Ruby (and Smalltalk, Java,
C++), where method dispatch always depends on the first argument’s
class (which is implicitly self), and the first method implementation
found in a search up the class hierarchy will be executed.
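
For contrast, this is roughly what the Ruby side of that comparison looks like: each class has to chain to its parent explicitly with super to get the or-ed result. The class and attribute names are hypothetical, chosen only for this sketch.

```ruby
class Device
  def error?
    false
  end
end

class NetworkDevice < Device
  attr_accessor :link_down

  def error?
    link_down || super      # explicit call up the hierarchy
  end
end

class Router < NetworkDevice
  attr_accessor :table_corrupt

  def error?
    table_corrupt || super  # every level must remember to do this
  end
end

router  = Router.new
healthy = router.error?     # no flag set at any level
router.link_down = true
faulty  = router.error?     # the parent class reports the problem
```

Dispatch picks Router#error? based on the receiver alone; the combination of results exists only because each method writes it out by hand.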

Finely crafted CLOS code looks awesome in a rails-like way. But I am
not sure if I myself could ever produce such, or just a big mess. I
feel comfortable enough in Ruby’s slightly more restrictive OO
environment.

Jürgen

On 6/5/06, Juergen S. wrote:

For most problems, it is like catching a fly with a nuke, and I like
ruby because it is the exact opposite: easy stuff just works, and
complex problems are solvable.

Thanks Jürgen, I think you’ve nicely summarized what I was trying to
say in this whole thread but had a hard time expressing :-)

Regards,
Ryan

On Sat, May 27, 2006 at 07:25:26AM +0900, Robert K. wrote:

on Data Structures and Algorithms (for example the excellent books of
Robert Sedgewick) and digest it.

Believe it or not, I actually have a degree in Computer Engineering
and have been paid to develop software for about 10 years. I realize
in languages like Java and C++ you are going to consider the
application when choosing a data structure…that is drilled into you
in algorithms and data structures class. I get that.

As far as I remember, the STL (Standard Template Library) is part of
the C++ standard libraries. And it is the most complete and powerful
collection library I’ve ever seen, abstracting more than mere
collections. It also has an external iterator interface, I/O adaptors,
abstracts algorithms, and can be used in a functional programming
style. Thanks to heavy use of templates and preprocessor magic, it
will blow up your binary a lot. And just for reading the docs, I
recommend a CS or math degree.

For most problems, it is like catching a fly with a nuke, and I like
ruby because it is the exact opposite: easy stuff just works, and
complex problems are solvable.

Jürgen