How do I check for duplicate records before saving?

Hi, everyone,

I am a beginner in Ruby programming and I have a small question about
data checking; if someone could give me a suggestion, that would be
great.

The system needs to prevent duplicate records in the database. Each
record consists of two fields, column A and column B. Suppose a record
with A=1, B=1 has already been saved. Afterward, the same combination
cannot be accepted again. For example, the records A=1, B=2 and A=2,
B=1 are both acceptable, but a new record with A=1, B=1 cannot be
saved.

So how do I check for, and prevent, a new record whose columns have the
same values as an existing record from being saved to the database?

Frank

On Apr 11, 2008, at 3:36 PM, Frank T. wrote:

So how do I check for, and prevent, a new record whose columns have the
same values as an existing record from being saved to the database?

Try creating a unique database index that spans the two columns.

Regards

Dave


Frank T. wrote:
| Hi, everyone,
|
| I am a beginner in Ruby programming and I have a small question about
| data checking; if someone could give me a suggestion, that would be
| great.
|
| The system needs to prevent duplicate records in the database. Each
| record consists of two fields, column A and column B. Suppose a record
| with A=1, B=1 has already been saved. Afterward, the same combination
| cannot be accepted again. For example, the records A=1, B=2 and A=2,
| B=1 are both acceptable, but a new record with A=1, B=1 cannot be
| saved.
|
| So how do I check for, and prevent, a new record whose columns have the
| same values as an existing record from being saved to the database?

Most database systems can handle that, by defining a UNIQUE constraint
on the columns.

However, that means that A = 1 and B = 2 won’t work.

Different methods to access databases have different ways to handle
these kinds of collisions, or at least different ways to implement the
collision detection.

(For example, you can fetch all records and check whether the
combination already exists; that is rather bad for memory consumption
and speed, but it might be enough in your case.)
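The fetch-and-check approach can be sketched in plain Ruby; the record
list and Set here stand in for whatever your database access layer
returns (an illustrative sketch only, not tied to any particular
database API):

```ruby
require 'set'

# Records already fetched from the database, as [A, B] pairs
existing = [[1, 1], [1, 2], [2, 1]]
seen = Set.new(existing)

# Set#add? returns nil when the element is already present,
# so only genuinely new combinations pass the check
new_record = [1, 1]
if seen.add?(new_record)
  puts "saved #{new_record.inspect}"
else
  puts "rejected duplicate #{new_record.inspect}"
end
```

As Phillip notes, this trades memory and speed for simplicity, so it
only makes sense for small tables.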

So, if you can tell us how you access your database (and what it is), we
can help you out much better.


Phillip G.
Twitter: twitter.com/cynicalryan

~ - You know you’ve been hacking too long when…
…your brain keeps hallucinating random “system error: collision with
stack heap” or “Guru Meditation Mode # Three billion and fifty-two,
press left button to continue” or even “This is not a DOS disk.” error
messages.

On Apr 11, 2008, at 3:47 PM, Phillip G. wrote:

Most database systems can handle that, by defining a UNIQUE constraint
on the columns.

However, that means that A = 1 and B = 2 won’t work.

Sure it will - you define the unique index to span the two columns, and
then only non-unique combinations of those columns will fail.

Dave

[email protected] wrote:

If you are using Rails and migrations

add_index(:table, [:column1, :column2], :unique => true)

if not something like this sql

CREATE UNIQUE INDEX index_name ON table(column1,column2);

Becker

Thanks for your help, but I was worried that A=1, B=2 or A=2, B=1
would not be accepted if a unique constraint were used.

My point is that once a record (A=1, B=1) exists in the database, any
new record (A=1, B=1) must not be valid. Records such as (A=1, B=2),
(A=1, B=3) or (A=5, B=1) are fine.

Dave T. wrote:

On Apr 11, 2008, at 3:36 PM, Frank T. wrote:

So how do I check for, and prevent, a new record whose columns have the
same values as an existing record from being saved to the database?

Try creating a unique database index that spans the two columns.

Regards

Dave

Thanks for your help, but I was worried that A=1, B=2 or A=2, B=1
would not be accepted if a unique constraint were used.

My point is that once a record (A=1, B=1) exists in the database, any
new record (A=1, B=1) must not be valid. Records such as (A=1, B=2),
(A=1, B=3) or (A=5, B=1) are fine.

Thanks for your help. I am using Instant Rails 2.0 for development,
with Mongrel and SQLite.

If you are using Rails and migrations

add_index(:table, [:column1, :column2], :unique => true)

if not something like this sql

CREATE UNIQUE INDEX index_name ON table(column1,column2);

Becker


Frank T. wrote:
| Thanks for your help. I am using Instant Rails 2.0 for development,
| with Mongrel and SQLite.

As far as I can see, you have to iterate over all the existing records
to check for uniqueness of values, since SQLite3 doesn’t allow a
multi-column uniqueness constraint.

With MySQL and other RDBMSes, Dave T.’s idea should work (I don’t know
for certain, since I have never had to handle this situation myself).


Phillip G.
Twitter: twitter.com/cynicalryan

Rule of Open-Source Programming #13:

Your first release can always be improved upon.

On Fri, Apr 11, 2008 at 6:43 PM, Todd B. [email protected]
wrote:

You probably have a primary key constraint on one of the columns
(unless you are using kindergarten SQLite).

That probably sounded bad. Sorry.

good idea…

create table stuff (
  id int not null primary key,
  colA int not null,
  colB int not null,
  unique (colA, colB)
)

In rails, I should point out I have to use the integer ID for my
primary key, which irks me to no end because it goes against
everything I know about set theory. That whining aside, I might try
to fix it, but the underlying rails code is intimidating.

Todd

I like to use guids for ids.

create_table(:name, {:id => false}) do |t|
  t.column(:id, :string, :limit => 64)
end
execute(sql_statement_to_create_primary_key)

Becker
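Becker’s post doesn’t show how the GUIDs themselves are produced; one
plain-Ruby option (an illustrative sketch, not necessarily what Becker
uses) is the standard library’s SecureRandom:

```ruby
require 'securerandom'

# A random RFC 4122 version-4 UUID, e.g. "f81d4fae-7dec-4c7a-a765-00a0c91e6bf6";
# at 36 characters it fits comfortably in a 64-character string id column
id = SecureRandom.uuid
puts id
```

SecureRandom ships with the Ruby standard library (SecureRandom.uuid
from Ruby 1.9 onward); earlier Rails versions bundled an equivalent in
ActiveSupport.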

On Apr 12, 2008, at 10:34 AM, Todd B. wrote:

In rails, I should point out I have to use the integer ID for my
primary key, which irks me to no end because it goes against
everything I know about set theory. That whining aside, I might try
to fix it, but the underlying rails code is intimidating.

no db actually deals with sets - try to do a query and get the results
back in different order each time. in practice they are ordered
tuples of key value pairs and every db i’ve used is written with this
implicit understanding. having an id is no different than having an
‘updated_at’ field - it simply provides a total ordering which is
arbitrary. fighting the id is just swimming upstream to nowhere - i
know as i did it for a long time. :wink:

a @ http://codeforpeople.com/

On Fri, Apr 11, 2008 at 6:01 PM, Frank T. [email protected]
wrote:

Thanks for your help, but I was worried that A=1, B=2 or A=2, B=1
would not be accepted if a unique constraint were used.

My point is that once a record (A=1, B=1) exists in the database, any
new record (A=1, B=1) must not be valid. Records such as (A=1, B=2),
(A=1, B=3) or (A=5, B=1) are fine.

You probably have a primary key constraint on one of the columns
(unless you are using kindergarten SQLite). Example without correct
caps…

create table stuff (
  colA int not null,
  colB int not null,
  primary key (colA, colB)
)

…or if you succumb to the idea that integer IDs for the primary key are
a good idea…

create table stuff (
  id int not null primary key,
  colA int not null,
  colB int not null,
  unique (colA, colB)
)

Not tested.

Todd

On Sat, Apr 12, 2008 at 3:55 PM, ara.t.howard [email protected]
wrote:

no db actually deals with sets - try to do a query and get the results back
in different order each time. in practice they are ordered tuples of key
value pairs and every db i’ve used is written with this implicit
understanding. having an id is no different than having an ‘updated_at’
field - it simply provides a total ordering which is arbitrary. fighting
the id is just swimming upstream to nowhere - i know as i did it for a long
time. :wink:

a @ http://codeforpeople.com/

I understand that, for example, postgresql and mysql have labeled all
things with an integer. But, in the table itself, umm, doesn’t make
much sense…

create table my_set (
  id int not null primary key,
  a int,
  b int
);

insert into my_set values (1, 1, 1);

insert into my_set values (2, 1, 1);

It gets even worse with built in functions like auto_increment, where
you’ve tossed all identity of the tuple to the wind (think of moving
such a table to another db).

I’m still a firm believer in using logical primary keys instead of
arbitrary ones anyway, even if it takes a little more effort, but I’m
willing to listen to other ways :slight_smile:

Todd

Hi,

Excerpts from shuhao.tsao’s message of Fri Apr 11 16:02:10 -0700 2008:

Dave

Thanks for your help, but I was worried that A=1, B=2 or A=2, B=1
would not be accepted if a unique constraint were used.

My point is that once a record (A=1, B=1) exists in the database, any
new record (A=1, B=1) must not be valid. Records such as (A=1, B=2),
(A=1, B=3) or (A=5, B=1) are fine.

You could try creating a third column. Suppose you know that all A and
B values are 6 digits or less; create a unique string column that
contains the values of A and B sorted, zero-padded and joined.

u = [a, b].sort.map { |i| "%06d" % i }.join

So, the database would contain 3 columns: a, b, u with u being defined
as unique in the database.

So, with a = 1; b = 2 we get u = “000001000002”

or with a = 2; b = 1 we also get u = “000001000002”
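The suggestion above runs as-is in plain Ruby; wrapping it in a helper
method (a hypothetical name, just for illustration) makes the collision
behaviour easy to see:

```ruby
# Hypothetical helper: build the order-independent uniqueness key
# from the two column values, as suggested above
def uniqueness_key(a, b)
  [a, b].sort.map { |i| "%06d" % i }.join
end

puts uniqueness_key(1, 2)  # => "000001000002"
puts uniqueness_key(2, 1)  # => "000001000002" (same key, so a unique
                           #    constraint on u rejects the swapped pair)
puts uniqueness_key(1, 1)  # => "000001000001"
```

Note that because the values are sorted, this scheme treats (1, 2) and
(2, 1) as the same record, which is stricter than the ordered
uniqueness Frank described.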

On Sat, Apr 12, 2008 at 7:14 PM, Todd B. [email protected]
wrote:

I understand that, for example, postgresql and mysql have labeled all
things with an integer. …

insert into my_set values (1, 1, 1);

insert into my_set values (2, 1, 1);

It gets even worse with built in functions like auto_increment, where
you’ve tossed all identity of the tuple to the wind (think of moving
such a table to another db).

I’m still a firm believer in using logical primary keys instead of
arbitrary ones anyway, even if it takes a little more effort, but I’m
willing to listen to other ways :slight_smile:

I’ll clarify a little (I may have told this little story before). I
once worked for a company that had complete confidence in its ERP
database (to be nameless). Suddenly, one day, about half a million
dollars was missing, for no reason anyone could find beyond what the
reports said.
Having been embezzled before, they wanted to find the culprit. The
culprit turned out to be the database schema (and they were so ready
for a witch hunt).

Todd

On Sat, Apr 12, 2008 at 8:37 PM, Todd B. [email protected]
wrote:

Wow. That sounded incriminating, didn’t it! My apologies for the
noise, but it’s true. The company I worked for lost a bunch of money
because of a culmination of practices. They were paying for new part
revs and the db didn’t reflect it. “Bad” shipments were dropped/lost
in the shipping part of the database. Also, it was full of tables
called: 12400010, 12400011, etc. and the relations were not even
close to sane. The “help” was paid about $1000/hr to set it up.

Todd

On Apr 12, 2008, at 7:48 PM, Todd B. wrote:

The “help” was paid about $1000/hr to set it up.

guess we have something to learn from them! :wink:

a @ http://codeforpeople.com/

On Apr 12, 2008, at 6:14 PM, Todd B. wrote:

It gets even worse with built in functions like auto_increment, where
you’ve tossed all identity of the tuple to the wind (think of moving
such a table to another db).

yeah that’s valid. i have at least one db where every id is a
globally unique uuid - and by globally i mean according to rfc. still

olddb.id

is a valid concept…

a @ http://codeforpeople.com/

On Sat, Apr 12, 2008 at 9:02 PM, ara.t.howard [email protected]
wrote:

On Apr 12, 2008, at 7:48 PM, Todd B. wrote:

The “help” was paid about $1000/hr to set it up.

guess we have something to learn from them! :wink:

Yeah, it’s funny. That may sound like an exaggeration, but after you
add up the flight cost and motel stay, it probably was more like
$1200/hr. The guy worked about 4 hours per day for a total of 3 or 4
days.