Ensuring uniqueness of an object at creation time

dubstep · February 26, 2011, 6:45pm

Hi all,

I would like to ensure that some attributes of an object are unique
across all the instances of its class.
In other words, I would like to prevent the creation of an instance
whose attributes are “equal” to those of an already created instance.

Purely as an example, take the problem of creating an instance of a
Person class that has the same name as an existing instance.

My simple person class would be:

http://pastie.org/1610255

I found my way by overriding the Person.new method
(thanks to the book “The Ruby Programming Language”),
so that the new instance is not even allocated if there’s already one
with the same name.

http://pastie.org/1610260
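
The pastie contents are not reproduced in this thread, so the following is only a minimal sketch of what overriding Person.new could look like, not the original code. The @instances registry and the choice to return the existing instance (rather than nil or an error) are assumptions.

```ruby
class Person
  attr_reader :name

  # Hypothetical registry of instances, keyed by name.
  # This is a class-level instance variable of Person.
  @instances = {}

  class << self
    # Return the Person already registered under this name, or
    # allocate and register a new one. `super` is Class#new, which
    # calls allocate and then initialize, so nothing is allocated
    # when the name is already known.
    def new(name)
      @instances[name] ||= super
    end
  end

  def initialize(name)
    @name = name
  end
end

a = Person.new("Alice")
b = Person.new("Alice")
c = Person.new("Bob")
```

Here a and b refer to one object, while c is a separate one. Subclasses of Person would need their own registry for this sketch to work.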

QUESTIONS:

Could this be considered a poor design choice?
What other ways are there to accomplish this?

Any opinions, tips, or criticism are welcome.

Some more comments:
Doing it this way lets me treat object creation in a generic way.
The logic behind the uniqueness of the instances is held by the class
itself, not by the code that uses it.
This is desirable in my specific case because I’m parsing an XML
file, walking its tags recursively.
I have a Hash that maps tags to classes.
The uniqueness requirements of each class are different.
For example: Person is unique by name, Telephone is unique by number.
The parser only does the following:

1. Takes the tag.
2. Looks up the class that is mapped to that tag.
3. Creates an instance, passing the attributes and text as arguments
   to new.

The class itself is the “when” and “where” of the uniqueness
validation.
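
The parser steps above can be sketched as follows. This is a simplified illustration, not the poster's parser: the input is a plain list of [tag, text] pairs instead of real XML, and TAG_CLASSES, build_objects, and the stand-in classes are hypothetical names.

```ruby
# Stand-in classes; the real ones would carry their own uniqueness logic.
class Person
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

class Telephone
  attr_reader :number

  def initialize(number)
    @number = number
  end
end

# Hypothetical table mapping tag names to classes.
TAG_CLASSES = {
  "person"    => Person,
  "telephone" => Telephone
}.freeze

# The parser knows nothing about uniqueness: it looks up the class
# for each tag and hands the text to that class's constructor.
def build_objects(tagged_values)
  tagged_values.map do |tag, text|
    klass = TAG_CLASSES.fetch(tag) # raises KeyError for an unmapped tag
    klass.new(text)
  end
end

objects = build_objects([["person", "Alice"], ["telephone", "555-0100"]])
```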

Thanks in advance,
Abinoam Jr.

PS:
My real problem is a little different (not Person or Telephone).
I used the Person class just as an example.
If anyone is curious about the code, tell me and I will post all of it.

abinoampraxedes_m · February 27, 2011, 10:14am

Abinoam Jr. wrote in post #984123:

I found my way by overriding the Person.new method
(thanks to the book “The Ruby Programming Language”),
so that the new instance is not even allocated if there’s already one
with the same name.

There’s no need to do that. You could just raise an exception from
within the initialize method.

The logic behind the uniqueness of the instances is held by the class
itself, not by the code that uses it.
This is desirable in my specific case because I’m parsing an XML
file, walking its tags recursively.

I think it’s a poor design choice to enforce uniqueness within the
class, because it limits the usefulness of your Person class - you could
not have two XML parsers parsing two separate documents, for example, or
send and receive Person instances using DRb.

I think it would be better to have a ‘person collection’ object which
enforces the uniqueness. You create a new person, and get an error if
you try to add it into the collection where one already exists.

This is the same sort of model as you get with SQL uniqueness
constraints within a table, of course.
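
A minimal sketch of such a collection, assuming uniqueness by name. PersonCollection, DuplicateError, and the method names are hypothetical and only illustrate the idea.

```ruby
class Person
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# Hypothetical collection that owns the uniqueness rule, much like a
# unique constraint on a table. Person itself stays unrestricted.
class PersonCollection
  class DuplicateError < StandardError; end

  def initialize
    @by_name = {}
  end

  # Add a person, raising if one with the same name is already present.
  def add(person)
    if @by_name.key?(person.name)
      raise DuplicateError, "person named #{person.name.inspect} already exists"
    end
    @by_name[person.name] = person
  end

  def size
    @by_name.size
  end
end

people = PersonCollection.new
people.add(Person.new("Alice"))

begin
  people.add(Person.new("Alice"))
rescue PersonCollection::DuplicateError => e
  duplicate_message = e.message
end

# A second, independent collection is free to hold its own "Alice".
other_people = PersonCollection.new
other_people.add(Person.new("Alice"))
```

Because the rule lives in the collection, two parsers working on two documents can each keep their own collection without interfering with each other.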

Regards,

Brian.

abinoampraxedes_m · February 27, 2011, 9:03pm

Hi Brian,

Thank you very much for replying.

On Sun, Feb 27, 2011 at 5:14 AM, Brian C. wrote:

Abinoam Jr. wrote in post #984123:

I found my way by overriding the Person.new method
(thanks to the book “The Ruby Programming Language”),
so that the new instance is not even allocated if there’s already one
with the same name.

There’s no need to do that. You could just raise an exception from
within the initialize method.

Person#initialize is called by Person.new, and Person.allocate is
called before Person#initialize.

So, if I raise an exception inside initialize, the object has already
been allocated.
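
A small sketch of that ordering, assuming a hypothetical @names list on the class. It shows that an exception raised in initialize happens after allocation, and that allocate on its own yields an instance whose initialize never ran.

```ruby
class Person
  # Hypothetical list of names already taken.
  @names = []

  class << self
    attr_reader :names
  end

  def initialize(name)
    # By the time this runs, Class#new has already called allocate,
    # so `self` is a real (if half-built) object.
    raise ArgumentError, "duplicate name: #{name}" if Person.names.include?(name)

    Person.names << name
    @name = name
  end
end

Person.new("Alice")

begin
  Person.new("Alice") # allocated, then rejected inside initialize
rescue ArgumentError => e
  error_message = e.message
end

# Class#new behaves roughly like: obj = allocate, then initialize, then return obj.
bare = Person.allocate # an instance whose initialize never ran
```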

I think it would be better to have a ‘person collection’ object which
enforces the uniqueness. You create a new person, and get an error if
you try to add it into the collection where one already exists.

This is the same sort of model as you get with SQL uniqueness
constraints within a table, of course.

Regards,

Brian.

I think I was not clear enough. (I tried to simplify it and ended up
oversimplifying it.)

Look at this xml snippet.

<message>PREPARE_A</message>
<message>COMPLETE_A</message>

<message>PREPARE_B</message>
<message>COMPLETE_B</message>

<message>PREPARE_C</message>
<message>COMPLETE_C</message>

After declaring all those messages, I just want to use them in my
rule/action table.

active
<send_message>PREPARE_B</send_message>
completing

Look at the <send_message>PREPARE_B</send_message> element.
This “PREPARE_B” message is the same as the previously declared one;
I’m just “using” it.
In this specific case, uniqueness is based on the message text.
(There are other classes whose uniqueness is based on something
different.)
“If it smells like a dog, it should be a dog” (or THAT specific dog).
If it’s a message and has the same text, it should be the SAME message
(not a new one).

So, I don’t want to raise an exception, I just want to return the
existing object without even allocating a new one.

This kind of behavior lets me design my parser in a
“generic”/“agnostic” manner.
I just need a table mapping XML tags to classes.
The parser gets the tag, looks up which class should be instantiated,
calls .new, and moves on to the next tag.
All the logic for ensuring uniqueness is left to the class.

But I feel I’m forgetting something.

What do you think?

a = :prepare_b
b = :prepare_b

With Symbol, if it has the same value, it’s the SAME Symbol, not
different ones.

a = "prepare_b"
b = "prepare_b"

With String, even if they have the same value, they are different
objects.
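
The difference can be checked with equal?, which tests object identity. In this small sketch, String.new is used instead of bare literals so the result does not depend on the frozen_string_literal setting; the variable names are only illustrative.

```ruby
a = :prepare_b
b = :prepare_b
same_symbol = a.equal?(b) # equal? tests object identity

c = String.new("prepare_b")
d = String.new("prepare_b")
same_value  = (c == d)    # compares contents
same_string = c.equal?(d) # compares identity
```

Here same_symbol and same_value are true, while same_string is false.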

I would like to mimic this kind of behaviour in more general
objects.

Thank you again,
Abinoam Jr.

abinoampraxedes_m · February 27, 2011, 9:30pm

Hi Gary,

It’s pretty elegant using a name such as “find_or_create” for the
factory method. It really describes its behavior much better than
“new”.
And setting “new” as private, good point too.

In my specific piece of software I just have to give every class a
“find_or_create” method, even if it doesn’t have any uniqueness
constraints. This is because my parser is “agnostic”, so I don’t want
it to decide when to use “new” and when to use “find_or_create”. It
just has to use “find_or_create” for every object it tries to create.

Thank you for your comments,
Abinoam Jr.

abinoampraxedes_m · February 27, 2011, 9:14pm

On Feb 27, 2011, at 3:00 PM, Abinoam Jr. wrote:

a = :prepare_b
b = :prepare_b

With Symbol, if it has the same value, it’s the SAME Symbol, not different ones.

Don’t get hung up on using the default new/initialize framework. You
can define your own constructors that return existing instances instead
of allocating a new instance if you want to model ‘value’ semantics.

message = Message.find_or_create('message_a', other, args)

Then define find_or_create such that it manages a cached collection of
messages that it can dip into if it finds a match or it can create a new
message if necessary.

In general I think it is better to define your own constructors if the
‘normal’ semantics of new/initialize aren’t what you want instead of
redefining new. You can always make new private if you want to ‘force’
the use of your own constructors.
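
A minimal sketch of that approach, assuming messages are unique by their text. The @cache name and the single-argument signature are assumptions made for the example, and the lookup is not thread-safe.

```ruby
class Message
  attr_reader :text

  # Hypothetical cache of messages, keyed by text.
  @cache = {}

  # Return the cached message for this text, creating it on first use.
  def self.find_or_create(text)
    @cache[text] ||= new(text)
  end

  # Callers outside the class must go through find_or_create.
  private_class_method :new

  def initialize(text)
    @text = text
  end
end

first  = Message.find_or_create("PREPARE_B")
second = Message.find_or_create("PREPARE_B")

begin
  Message.new("PREPARE_B")
rescue NoMethodError
  new_is_private = true
end
```

Both calls to find_or_create return the same object, and calling Message.new from outside the class raises NoMethodError.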

If the only attribute of your messages is their unique name, then
symbols might be exactly what you need but be aware that symbols are
generally not garbage collected so that if you create them based on
external data you might be opening yourself to a memory exhaustion
attack (i.e. the attacker can cause your memory footprint to grow
without bounds). Whether this is a concern or not just depends on where
the external data is coming from.

Gary W.

abinoampraxedes_m · February 27, 2011, 10:52pm

Abinoam Jr. wrote in post #984287:

There’s no need to do that. You could just raise an exception from
within the initialize method.

Person#initialize is called by Person.new, and Person.allocate is
called before Person#initialize.

So, if I raise an exception inside initialize, the object has already
been allocated.

Yes, but it will be garbage-collected later.

So, I don’t want to raise an exception, I just want to return the
existing object without even allocating a new one.

Then you could make a class method:

class MyClass
  # Class-level cache of created objects, keyed by constructor argument.
  @all_objects = {}

  # non-threadsafe version
  def self.create(args)
    return @all_objects[args] if @all_objects[args]
    @all_objects[args] = new(args)
  end
end
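
For comparison, a thread-safe variant could guard the lookup and the insert with a Mutex. This is a sketch of one common way to do it, not something proposed in the thread; the @lock name and the initialize signature are assumptions.

```ruby
class MyClass
  attr_reader :args

  @all_objects = {}
  @lock = Mutex.new

  # Thread-safe variant: the lookup and the insert happen under one
  # lock, so two threads cannot both create an object for the same key.
  def self.create(args)
    @lock.synchronize do
      @all_objects[args] ||= new(args)
    end
  end

  def initialize(args)
    @args = args
  end
end

threads = 8.times.map { Thread.new { MyClass.create("PREPARE_B") } }
results = threads.map(&:value)
```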

a = :prepare_b
b = :prepare_b

With Symbol, if it has the same value, it’s the SAME Symbol, not
different ones.

a = "prepare_b"
b = "prepare_b"

With String, even if they have the same value, they are different
objects.

That’s correct. And normal objects are like String; Symbol is very much
a special case, baked into the language, to give efficient method
dispatch.

Why is it important in your application for your objects to have the
singleton behaviour like Symbol? What bad things would happen if there
were two objects representing the same message?

Regards,

Brian.

abinoampraxedes_m · February 27, 2011, 11:55pm

One option:

class Object
  def self.find_or_create(*args, &blk)
    new(*args, &blk)
  end
end

Then you override it in those classes where you need to.

Good! Thank you.

Abinoam Jr.

abinoampraxedes_m · February 27, 2011, 10:53pm

Abinoam Jr. wrote in post #984295:

In my specific piece of software I just have to give every class a
“find_or_create” method, even if it doesn’t have any uniqueness
constraints. This is because my parser is “agnostic”, so I don’t want
it to decide when to use “new” and when to use “find_or_create”. It
just has to use “find_or_create” for every object it tries to create.

One option:

class Object
  def self.find_or_create(*args, &blk)
    new(*args, &blk)
  end
end

Then you override it in those classes where you need to.
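
A sketch of how the default and an override could sit side by side. Note and Message are hypothetical classes, and the @cache name is an assumption.

```ruby
class Object
  # Default for every class: behave exactly like new.
  def self.find_or_create(*args, &blk)
    new(*args, &blk)
  end
end

# A class without uniqueness constraints inherits the default.
class Note
  def initialize(text)
    @text = text
  end
end

# A class with a constraint overrides it.
class Message
  # Hypothetical cache of messages, keyed by text.
  @cache = {}

  def self.find_or_create(text)
    @cache[text] ||= new(text)
  end

  def initialize(text)
    @text = text
  end
end

note_a = Note.find_or_create("hello")
note_b = Note.find_or_create("hello")
msg_a  = Message.find_or_create("PREPARE_B")
msg_b  = Message.find_or_create("PREPARE_B")
```

The two notes are distinct objects, while msg_a and msg_b are the same object, and the calling code is identical in both cases.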

abinoampraxedes_m · February 28, 2011, 12:03am

So, if I raise an exception inside initialize, the object has already
been allocated.

Yes, but it will be garbage-collected later.

You’re right. In my specific piece of software I think this will not
be a problem.
But in a huge program, the computational cost of allocating and
deallocating can be significant.

non-threadsafe version

Thank you for reminding me to always think in a “thread-safe” way.

Why is it important in your application for your objects to have the
singleton behaviour like Symbol? What bad things would happen if there
were two objects representing the same message?

Look, there are some problems I have worked around by defining an ==
method on the class.
The == method relies on comparing the instance variables
(@message, for example).
So even if there are two instances representing the same message, they
are considered equal to each other throughout the software.
But, again, I thought it could be slightly ugly code.
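
A minimal sketch of that kind of value equality, assuming a Message class with a single text attribute (the names are illustrative, not the poster's code). Defining eql? and hash alongside == keeps Hash lookups and Array#uniq consistent with it.

```ruby
class Message
  attr_reader :text

  def initialize(text)
    @text = text
  end

  # Two messages are equal when they are of the same class and
  # carry the same text.
  def ==(other)
    other.class == self.class && text == other.text
  end

  # eql? and hash are what Hash and Array#uniq rely on, so they are
  # kept consistent with ==.
  alias eql? ==

  def hash
    [self.class, text].hash
  end
end

a = Message.new("PREPARE_B")
b = Message.new("PREPARE_B")
```

With this in place a == b is true even though a and b are different objects.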

Abinoam Jr.

abinoampraxedes_m · February 28, 2011, 12:13am

On Sun, Feb 27, 2011 at 6:02 PM, Abinoam Jr. wrote:

You’re right. In my specific piece of software I think this will not
be a problem.
But in a huge program, the computational cost of allocating and
deallocating can be significant.

And you’ve already proven that allocation is a significant factor in
your program’s runtime efficiency using rigorous profiling… right?
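
A rough way to start answering that question is to time the allocations directly. The sketch below is only a crude timing, not rigorous profiling; the Message class and the COUNT value are hypothetical and should be replaced with something that reflects the real workload.

```ruby
class Message
  def initialize(text)
    @text = text
  end
end

# Hypothetical workload size; pick one that reflects the real input.
COUNT = 100_000

# Time how long allocating COUNT throwaway objects takes, using the
# monotonic clock so wall-clock adjustments do not affect the result.
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
COUNT.times { Message.new("PREPARE_B") }
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

puts format("%d allocations took %.4f seconds", COUNT, elapsed)
```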