How to make Ruby _THE_ scripting language of choice, fold in SQLite

amin1426 · September 26, 2008, 1:44am

Much has been written about the OOP / RDBMS impedance mismatch…

But when it comes down to it, SQL wins for certain tasks, and
ruby/perl/python wins for other tasks.

Sure we have adapters that provide access from Ruby to SQL databases,
but at some primitive level… the query is written as a string, the
string is fed to a SQL interpreter, the query is optimized by the SQL
engine and the results returned.

In some cases the results are returned as strings in other cases
various levels of conversion can be ommitted.

The fact is the function call notation is insufficiently rich to
specify an arbitary SQL statement without reducing to the (very)
lowest common denominator (string) and then invoking a (heavyweight)
SQL interpretor.

Suggestion to make Ruby the undisputed leader of the scripting
languages…

Extend Ruby syntax to accept a useful subset of SQLite SQL.
incorporate the appropriate bits of SQLITE into the Ruby
interpretor to evaluate it.
and perhaps optionally generate string queries for
(mysql/postgres/oracle…) backends.
provide a means of transparently mapping certain classes to
tables so that they are automagically persisted at commit points,
on garbage collection or program exit, and automagically
revivified on reference.
Hide the distinction between Object “handles” and foreign
keys. ie. If instance variable @c of an object of class A is
a reference to an instance of class / table C, then
following that reference may cause that row of table C to be
looked up and revivified as an instance of class C.

Attaches, unless told otherwise to sqlite db FILE.db

require ‘rubysql’

class Customer
attr_primary_key :name
attr_reader :street, :city

Map this class to a sql table called Customer with primary key :name

and fields street and city.
include persistent
end

class Invoice
attr_foreign :customer, Customer

end

It’s persisted, so even if this appears a useless and can be garbage

collected, it will at least be added to Customer table

Customer.new( “Joe Blogs”, “52 hell”, “Doggcity”)

jane = Customer.new( “Jane Blag”, “52 Sell”, “Dodgcity”)

a = Invoice.new

The following three lines should be equivalent…

a.customer = “Jane Blag”
a.customer = jane
a.customer = select * from Customer where name == “Jane Blag”

An both should transparently and automagically allow …

assert_equal( “Dodgcity”, a.customer.city)

and a.customer.city be the same as…
select city from Customer where name == a.customer;

amin1426 · September 26, 2008, 1:54am

On Sep 25, 2008, at 5:43 PM, John C. wrote:

In some cases the results are returned as strings in other cases

attr_primary_key :name
end
a.customer = “Jane Blag” a.customer = jane
a.customer = select * from Customer where name == “Jane Blag”

An both should transparently and automagically allow …

assert_equal( “Dodgcity”, a.customer.city)

and a.customer.city be the same as…
select city from Customer where name == a.customer;

i think that’s looking backwards. something like couchdb takes only a
few hundred lines of code to interface with and already does a ton of
that - the rest would be trivial. imho sql is a dead-end road,
nevertheless much of what you are describing is already done by sequel
(the ruby orm).

cheers.

a @ http://codeforpeople.com/

amin1426 · September 26, 2008, 3:33am

On Thursday 25 September 2008 18:43:31 John C. wrote:

But when it comes down to it, SQL wins for certain tasks, and
ruby/perl/python wins for other tasks.

The tasks SQL wins for, it wins mostly because it is executed inside the
database engine. I doubt very much that, given the choice between
writing SQL
and writing Ruby, I would rather write SQL.

CouchDB will let me write custom queries in JavaScript.

And there was a similar, Ruby-based project (rddb?), but I haven’t seen
any
activity there in awhile. Still, the idea is the same – I can write
queries
in pure ruby.

at some primitive level… the query is written as a string, the
string is fed to a SQL interpreter, the query is optimized by the SQL
engine and the results returned.

In other words, you’re wanting to fundamentally change the language –
syntax,
parser, interpreter, top to bottom – for a performance hack, to make it
perform better with SQL databases.

SQL databases aren’t the only kind of database, nor is a database the
only way
to store data. Nor is SQLite the only SQL database, or necessarily the
best
to optimize for.

Suggestion to make Ruby the undisputed leader of the scripting languages…

I don’t think any of the other scripting languages are suffering greatly
because of lack of SQL.

In fact, Ruby has more often been slower than other languages, in my
experience. Often several orders of magnitude slower than Python, for
instance – and Python does have Django.

Consider that – Ruby sees plenty of production use despite being
slow.

Extend Ruby syntax to accept a useful subset of SQLite SQL.

Sequel does this, somewhat. So does ActiveRecord, for that matter.

Neither of which is bound to one database engine.

Now, yes, they currently generate strings, at some point. However, since
you
don’t have to directly interact with the strings, your step 2 is nothing
more
than a performance hack, which only applies to SQLite.

provide a means of transparently mapping certain classes to
tables so that they are automagically persisted at commit points,
on garbage collection or program exit, and automagically
revivified on reference.

In other words, an ORM.

Like ActiveRecord, DataMapper, Og, Sequel, etc.

Only difference you’re proposing is to make the persistence transparent.
Programmers working with ActiveRecord, in particular, don’t seem to
think
about concurrency issues enough – how many of us actually use
transactions?
But you’re suggesting we bury that problem even further in abstractions.

What happens when two programs try to access the same object?

Hide the distinction between Object “handles” and foreign
keys.

Yeah, like ActiveRecord, DataMapper…

The foreign keys are there if you need them, but you don’t have to deal
with
them directly.

It’s persisted, so even if this appears a useless and can be garbage

collected, it will at least be added to Customer table

Customer.new( “Joe Blogs”, “52 hell”, “Doggcity”)

Great – what happens if, somewhere very soon after this point, an
exception
is raised? What about validations?

Never mind that it’s kind of ugly that you’re using positional arguments
there…

In a modern ORM, I’d do something like:

cust = Customer.new( :name => ‘Joe blogs’)
cust.save!

Maybe an address wasn’t required, and this will work. Maybe an address
was
required, in which case, an exception will be raised, and the record
won’t be
saved. It’s useful to separate this from creation for other reasons, too

suppose I did provide an address, and now want to remove it:

cust = Customer.find_by_name ‘Joe blogs’
cust.address = nil
cust.save!

In this case, again, if address was required, I’ll get a validation
error, and
this particular change to the record won’t be saved.

amin1426 · September 26, 2008, 4:51am

On Sep 25, 2008, at 8:19 PM, Clifford H. wrote:

If by “transactions” you mean ActiveRecord::Base.transaction,
possibly no-one.
Attempting rollback on exceptions from a user-mode process without
using
a two-phase locking protocol (as many/most of AR’s adapters do) is
utterly
flawed and not transactional at all.

using them may not even help

http://drawohara.com/post/12411960/sane-transactions-for-activerecord

though i have not tested this lately…

a @ http://codeforpeople.com/

amin1426 · September 26, 2008, 4:20am

David M. wrote:

SQL databases aren’t the only kind of database, nor is a database the only way
to store data. Nor is SQLite the only SQL database, or necessarily the best
to optimize for.

However, most of the non-SQL things that are included in such
definitions
of “database” are not actually transactional, and hence don’t qualify
as databases in any true sense of the word. The purpose of a database is
to do things that filesystems do not - ACID. Any “database” that doesn’t
provide strong guarantees of ACID behaviours doesn’t deserve the title.

SQL sucks, but it is strongly identified with true transactional
behaviour,
an identification that only the dills who wrote MyISAM seem to have been
able to ignore and break down.

Programmers working with ActiveRecord, in particular, don’t seem to think
about concurrency issues enough – how many of us actually use transactions?

If by “transactions” you mean ActiveRecord::Base.transaction, possibly
no-one.
Attempting rollback on exceptions from a user-mode process without using
a two-phase locking protocol (as many/most of AR’s adapters do) is
utterly
flawed and not transactional at all.

Transactional is when you can pull the power cord at any time and your
data
is guaranteed consistent, and nothing that you committed has been lost.

Sorry to butt in like this, and you (David) probably know it all anyhow,
but
I wanted to reinforce the true meaning of “database” and “transaction”,
since
they’re so widely abused, berated and misunderstood.

Clifford H., Data Constellation.
Agile information Management and Design.

amin1426 · September 26, 2008, 5:17am

On Fri, 2008-09-26 at 08:53 +0900, ara.t.howard wrote:

i think that’s looking backwards. something like couchdb takes only a
few hundred lines of code to interface with and already does a ton of
that - the rest would be trivial. imho sql is a dead-end road,
nevertheless much of what you are describing is already done by sequel
(the ruby orm).

Sequel is fundamentally different than a traditional ORM, and in my
opinion, much better. SQLAlchemy is a conceptually similar project for
python, and both are much more promising than any ORM.

As far as SQL being a “dead-end road”, it needs to have a replacement,
first. Sequel/SQLAlchemy are a start in that direction, but the SQL
DBMSs provide a lot of things aside from just language.

Regards,
Jeff D.

amin1426 · September 26, 2008, 5:09am

On Fri, 2008-09-26 at 08:43 +0900, John C. wrote:

The fact is the function call notation is insufficiently rich to
specify an arbitary SQL statement without reducing to the (very)
lowest common denominator (string) and then invoking a (heavyweight)
SQL interpretor.

I suggest you look at Sequel http://sequel.rubyforge.org/

This isn’t just an ORM or DBI style layer. This is using Ruby syntax to
form relational expressions with an SQL flavor.

The python equivalent is SQLAlchemy.

These are really using the right approach, they do not try to map some
not-very-well-defined object model onto SQL, they just offer a SQL-ish
language inside of Ruby that generates real SQL, and sends it off for
evaluation.

Suggestion to make Ruby the undisputed leader of the scripting languages…

Extend Ruby syntax to accept a useful subset of SQLite SQL.

SQLite is pretty far from the SQL standard, last time I checked. I don’t
consider it the pinnacle of database semantics.

incorporate the appropriate bits of SQLITE into the Ruby
interpretor to evaluate it.

You haven’t shown why any of this needs to be in the interpreter
itself. What are you trying to accomplish, and why can’t you accomplish
it through Ruby’s rich extensibility?

and perhaps optionally generate string queries for
(mysql/postgres/oracle…) backends.

What is the language you’re trying to use for this purpose, and how is
this simpler than current systems that can generate strings to send to
SQL DBMSs?

provide a means of transparently mapping certain classes to
tables so that they are automagically persisted at commit points,
on garbage collection or program exit, and automagically
revivified on reference.

The devil is in the details. “Automagically”? Is it more automagical
than current solutions like AR?

What about constraints and concurrency? These automagical persistence
engines tend to have difficulty in a centralized environment unless your
data is 100% semantically independent.

Hide the distinction between Object “handles” and foreign
keys. ie. If instance variable @c of an object of class A is
a reference to an instance of class / table C, then
following that reference may cause that row of table C to be
looked up and revivified as an instance of class C.

You’re not the first person to have thought of this. The problem is,
there is a distinction between those two concepts, and for good
reason.

What about a summary object, that represents the result of some GROUP BY
query? What is the object ID of that, and what is the database key?

Attaches, unless told otherwise to sqlite db FILE.db

What about ruby code that’s not stored in a file?

The code example you gave looks pretty much like every other ORM. How is
yours fundamentally different, aside from not being able to connect to a
remote server?

Regards,
Jeff D.

amin1426 · September 26, 2008, 6:12am

On Thursday 25 September 2008 21:19:37 Clifford H. wrote:

David M. wrote:

SQL databases aren’t the only kind of database, nor is a database the only
way

to store data. Nor is SQLite the only SQL database, or necessarily the
best

to optimize for.

However, most of the non-SQL things that are included in such definitions
of “database” are not actually transactional, and hence don’t qualify
as databases in any true sense of the word.

However, many things that I would consider are, in fact, quite
transactional.
Not as much as they could be, perhaps, but close enough.

First example: Filesystems. With ordered write mode, you can simply
write to a
temporary file, then rename said temporary file into place. With
unordered
writes, add an fsync between those two steps.

Any modern, journaled filesystem will make sure that the rename either
succeeds or doesn’t, and you’ve made sure all data is successfully on
disk
before you rename.

Implementation: Maildir.

Second example: Amazon’s Dynamo. This powers S3 and SimpleDB, among
other
things. They’ve got a paper on it. Some implementations can, indeed, be
considered purely-ACID.

What sets it apart from (most) SQL is, conflicts are expected as part of
normal operation, and are left to the application to sort out. That is:
Rather than preventing anyone else from modifying a record while I
update it,
simply allow two versions of a record to exist, and provide common
algorithms
for either choosing which version “wins”, or for combining the two
versions
into a third.

Thus, inconsistency is allowed, but only temporarily. It is never
exposed to
the end-user, or even to (most of) the application.

Third example: Distributed version control, like Git. I mention this
mostly
because it reflects the same philosophy as Dynamo above, albeit with
more
human intervention – but also because this is the one most developers
are
likely to be intimately familiar with (or should be).

If by “transactions” you mean ActiveRecord::Base.transaction, possibly
no-one.

Precisely so.

I remember that one of the first things I wrote, when developing a new
Rails
app, was this method:

def double_save
transaction do
save(false)
yield
save!
end
end

This to allow the creation of circular record structures – for example,
every
domain must have an SOA record, and every record (of any kind) must have
a
domain. Since I’d made these fields NOT NULL on the database, I would
often
do this hack with new records of this kind:

d = Domain.new(…)
d.soa_id = -1
d.double_save do
d.soa = Soa.create!(…)
end

That -1, of course, would’ve killed my validations.

Attempting rollback on exceptions from a user-mode process without using
a two-phase locking protocol (as many/most of AR’s adapters do) is utterly
flawed and not transactional at all.

I actually don’t see a lot of that – more simple assumptions that
nothing
will go wrong, or that dangling records aren’t a problem.

Quite a lot of defensive programming, too, based on the idea that the
data
WILL get corrupted somehow, someday, and being able to handle corrupt
data is
more important than avoiding the corruption in the first place.

amin1426 · September 26, 2008, 5:58am

On Sep 25, 2008, at 9:16 PM, Jeff D. wrote:

the SQL
DBMSs provide a lot of things aside from just language

indeed. it’s sql that’s the issue imho, not acid properties.

a @ http://codeforpeople.com/

amin1426 · September 26, 2008, 6:17am

On Sep 25, 2008, at 9:34 PM, David M. wrote:

not trying to be difficult but…

Any modern, journaled filesystem will make sure that the rename either
succeeds or doesn’t, and you’ve made sure all data is successfully
on disk
before you rename.

Implementation: Maildir.

rename is not atomic on NFS, nor many other network filesystems. on
NFS you have only softlink and rmdir as atomic operations.

a @ http://codeforpeople.com/

amin1426 · September 26, 2008, 7:10am

ara.t.howard wrote:

On Sep 25, 2008, at 9:34 PM, David M. wrote:

First example: Filesystems. With ordered write mode, you can simply
write to a temporary file, then rename said temporary file into place.
With unordered writes, add an fsync between those two steps.
rename is not atomic on NFS, nor many other network filesystems. on
NFS you have only softlink and rmdir as atomic operations.

Yes, the Walter Tichy method, first used (to my knowledge) by him in
RCS.

However, without the fsync, its still not transactional, as you can pull
the power cord after the operation and there’s no guarantee that the
disk
blocks that make up the file content are actually on disk. All the FS
journalling does is to ensure that the block allocation tables and
inodes
are clean.

Also I don’t know the defined POSIX semantics now, but the system-wide
“sync” call would return as soon as every dirty block was in the disk
I/O
queues, and before they were on disk. You had to call it twice before
being sure, as the scan for dirty blocks couldn’t complete until the
queues were empty. (I don’t believe fsync has this flaw).

Nevertheless, though this method can solve the (D)urability problem of
ACID, it offers no solution to A, C, or I, so is still not
transactional.

Neither is Amazon’s approach, and neither is DVCS, despite their success
in solving different problems.

Clifford H…

amin1426 · September 26, 2008, 3:06pm

David M. wrote:

Nevertheless, though this method can solve the (D)urability problem of
ACID, it offers no solution to A, C, or I, so is still not transactional.

I might be labouring a point here, and I really don’t mean to
give you a hard time. However, perhaps I can clarify further…

Most or all transactional systems rely on the ability to perform
some finite-sized atomic operation on persistent storage. By finite
size, I mean more than just flipping a single bit. Often it’s
expected that if a disk block is written, the whole block is written.
Some systems put a LSN on the start and end of the block, and if
they’re both intact on read, they assume that all the stuff in
between got written too. If you have the memory bandwidth for it,
you can also use “torn page detection”, where the contents is
checksummed and written in a header, which does check every bit
to whatever level of confidence your checksum warrants.

This ability to perform single operations with ACID semantics is
assumed - because without it we can’t do any more - and hence
isn’t interesting. If the chunk you choose is a whole file, it
still isn’t interesting, because it’s still all or nothing on a
single object. No-one rewrites an entire database to save one
record.

The interesting cases arise when you must coordinate updates to
more than one such object. It’s then that isolation, consistency
and atomicity actually seriously mean something.

The examples you gave do not provide a use case where all 4 ACID
properties are supported in such multi-atom situations - they only
support or require some subset. There are interesting applications
that can live with such less-than-transactional behaviour, but
don’t confuse those implementations with databases. They’re useful,
but they’re not transactional in any non-trivial sense.

Clifford H…

amin1426 · September 26, 2008, 8:21am

On Friday 26 September 2008 00:09:50 Clifford H. wrote:

However, without the fsync, its still not transactional, as you can pull
the power cord after the operation and there’s no guarantee that the disk
blocks that make up the file content are actually on disk. All the FS
journalling does is to ensure that the block allocation tables and inodes
are clean.

Unless you’ve got data journaling and ordered writes.
I mentioned the ordered writes; I forgot to mention the data journaling.

Nevertheless, though this method can solve the (D)urability problem of
ACID, it offers no solution to A, C, or I, so is still not transactional.

Definitions paraphrased from Wikipedia:

(A)tomicity: Either all of the tasks are performed or none of them are.

In this case, I’m assuming that the temp directory is regularly cleaned,
often
on boot. But I’m not sure that’s relevant – extra tempfiles waste
space, but
they don’t constitute a half-finished transaction, if the actual data
being
operated on is the target file/directory.

Counting a dangling tempfile as a point against atomicity is a bit like
counting an unfinished journal (which must be replayed and likely rolled
back) the same way.

©onsistency: the database remains in a consistent state after the
transaction is over (whether successful or not).

Since only completely-flushed (fsynced) files are renamed, assuming
rename is
atomic, the directory in question will always be consistent – you’ll
never
have half an email actually committed in maildir.

Again, I’m not counting a dangling tempfile towards consistency any more
than
I would count a dirty journal which must be played back / cleaned up
after.

(I)solation: other operations cannot see or access the data in an
intermediate
state during the transaction.

Granted, if you really wanted to, you could access anothers’ tempfile –
but
then, if you really wanted to, you could read the database/filesystem’s
journal. But this would have to be deliberate – it is not difficult to
generate unique tempfile names, without knowledge of what other
processes are
doing. (See Maildir.)

I realize it’s an ugly, brutal hack, and I realize it’s ancient. I
wanted to
demonstrate that relational databases do not have a monopoly on
transactional
properties – not only that, such transactions are not actually all that
difficult to get right.

Neither is Amazon’s approach, and neither is DVCS, despite their success
in solving different problems.

I’m up too late to try to figure out what you mean here.
It’s entirely possible I’m wrong.

I suspect a more rigorous definition is in order, though. After all,
from
Wikipedia, Durability implies:

“once the user has been notified of success, the transaction will
persist, and
not be undone.”

DVCS and Amazon both allow for such transactions to be “undone”, in a
sense,
by further changes overwriting the result of the transaction. But the
transaction itself is never forgotten.

amin1426 · September 27, 2008, 2:10am

On Friday 26 September 2008 07:34:41 Clifford H. wrote:

David M. wrote:

Nevertheless, though this method can solve the (D)urability problem of
ACID, it offers no solution to A, C, or I, so is still not transactional.

This ability to perform single operations with ACID semantics is
assumed - because without it we can’t do any more - and hence
isn’t interesting. If the chunk you choose is a whole file, it
still isn’t interesting, because it’s still all or nothing on a
single object. No-one rewrites an entire database to save one
record.

No, the point here is that a file is not a whole database, but is rather
one
record. In this way, Maildir is a step up from mbox, in that maildir
allows
single emails to be created, destroyed, or moved, without rewriting the
entire database.

The interesting cases arise when you must coordinate updates to
more than one such object.

That invalidates the filesystem example, and (to some extent) S3, but
not
Dynamo or DVCS, as I understand it.

The examples you gave do not provide a use case where all 4 ACID
properties are supported in such multi-atom situations - they only
support or require some subset.

Again, it’s difficult to argue against this when you haven’t told me
which
ones aren’t supported, and why.

I’ll go read the Dynamo paper again, then…

amin1426 · September 28, 2008, 3:35am

David M. wrote:

That invalidates the filesystem example, and (to some extent) S3, but not
Dynamo or DVCS, as I understand it.

The examples you gave do not provide a use case where all 4 ACID
properties are supported in such multi-atom situations - they only
support or require some subset.
Again, it’s difficult to argue against this when you haven’t told me which
ones aren’t supported, and why.

DVCS exists because isolation isn’t needed, since tools like patch
can often resolve conflicts and when they can’t, a human can.

I’ll go read the Dynamo paper again, then…

From http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html:
“To achieve this level of availability, Dynamo sacrifices consistency
under certain failure scenarios.”

Clifford H…

amin1426 · September 28, 2008, 6:11am

John C. wrote:

Sure we have adapters that provide access from Ruby to SQL databases,
but at some primitive level… the query is written as a string, the
string is fed to a SQL interpreter, the query is optimized by the SQL
engine and the results returned.

I think the fundamental problem with changing this (and this applies to
most of the arguments I’ve seen in this thread) is that if you put the
burden of getting the data to disk on ruby’s doorstep, you’re begging
for incompatibility with other implementations (not everyone is going
to, or should have to, use ruby for everything).

The alternative is that ruby generates some kind of intermediate
language that’s sent to the database… We already have this, SQL. SQL
has fundamental problems that are largely borne by the competition in
RDBMSs out there, but it works, it’s not that hard to generate and
it’s nice to ensure your database interface isn’t causing you trouble.
Anyone that’s attempted to write bindings against libdb[2-4]? will
understand that very well. Additionally, while many think of ORMs when
tackling this problem, there are plenty of (perhaps advanced) reasons to
skip the ORM and write the query manually. Not everything in a database
is or even can be represented by the relationships that most ORMs
provide, and “not everything” gets considerably larger in size when
performance is an issue.

I guess what I’m saying is, the problems really don’t surround the
relationship between ruby and sql, or even ruby and the database, as
much as they involve the inherent inability of SQL to handle new
features and remain compatible with the specification that defines it.
It’s probably better handled there.

amin1426 · September 28, 2008, 5:30am

On Saturday 27 September 2008 20:34:32 Clifford H. wrote:

David M. wrote:

Again, it’s difficult to argue against this when you haven’t told me which
ones aren’t supported, and why.

DVCS exists because isolation isn’t needed, since tools like patch
can often resolve conflicts and when they can’t, a human can.

DVCS does, in fact, provide isolation (most checkouts are not public
until
pushed to a public server).

And tools like patch are severely limited by not actually knowing much
about
the application at hand. Most newer VCSes are much smarter about
resolving
conflicts, but at a certain point, since the VCS doesn’t actually know
what a
certain line of code does, it can’t necessarily know which version of
that
line is the correct one – it can only guess from context.

Compare this to application-specific conflict resolution – it’s quite
possible that simply examining the two conflicting states is enough to
decide
which is correct.

I’ll go read the Dynamo paper again, then…

From http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html:
“To achieve this level of availability, Dynamo sacrifices consistency under
certain failure scenarios.”

From the same article:

“Dynamo is used to manage the state of services that have very high
reliability requirements and need tight control over the tradeoffs
between
availability, consistency, cost-effectiveness and performance.”

But you’re right here, and I withdraw any claim of solid ACID properties
of
Dynamo. I do agree with this, though:

“Experience at Amazon has shown that data stores that provide ACID
guarantees
tend to have poor availability. This has been widely acknowledged by
both the
industry and academia.”

amin1426 · September 28, 2008, 11:55am

David M. wrote:

On Saturday 27 September 2008 20:34:32 Clifford H. wrote:

David M. wrote:

Again, it’s difficult to argue against this when you haven’t told me which
ones aren’t supported, and why.
DVCS exists because isolation isn’t needed, since tools like patch
can often resolve conflicts and when they can’t, a human can.

DVCS does, in fact, provide isolation (most checkouts are not public until
pushed to a public server).

That’s not the definition of isolation. Isolation means that
two transactions that update the same piece of data will only
ever do so in a way that guarantees that the resultant state
of the whole store is valid, assuming that the two transactions
were valid when run independently. DVCS cannot do this, and most
importantly, do not even attempt to, though it’s remarkably
effective at getting close most of the time.

Go check Wikipedia or anywhere else to verify this definition
before trying to create your own personal definition.

“Experience at Amazon has shown that data stores that provide ACID guarantees
tend to have poor availability. This has been widely acknowledged by both the
industry and academia.”

This doesn’t bear on the ACID properties, but for the record, many
DBMSs sustain high transaction loads with astonishing availability.
Consider for example the availability of most major stock exchanges,
which run transaction loads in the many hundreds per second, every
trading day of the year, with almost no lack of availability. Many
other databases run high loads 24x7 for years without being restarted.
That’s the sort of market where Tandem made their cash.

There’s no chance in a million years that Amazon will ever achieve
the same availability measures, and their claim is either radically
out of context or just plain lies.

Clifford H…

amin1426 · September 29, 2008, 12:31am

Thanks for the many responses, all made Good and perfectly valid
points… alas, mostly in the sense of very astute folk standing a
little too close to an elephant…

So rather going into a point by point rebuttal (since, as I said,
everybodies responses were perfectly valid and hence I have no
objection) I rather highlight the response that glimpsed the full
Elephant.

On Fri, 26 Sep 2008, Jeff D. wrote:

As far as SQL being a “dead-end road”, it needs to have a replacement,
first. Sequel/SQLAlchemy are a start in that direction, but the SQL
DBMSs provide a lot of things aside from just language.

Let’s face it there are a lot more SQL users out there than Ruby
users…pity SQL is so ugly.

Big Money brooks no dead ends. SQL is a huge, very useful, successful
money spinner. Hence it will be and is being extended both on the
client and server side to do what scripting languages like Ruby does.

No sane language designer would choose such a path, the SQL extended
languages I have seen are as ugly as sin… but as I say, nothing
stops the money.

So tell me, assuming you, like millions of developers worldwide, have
to use some form of SQL database.

Which would you rather code in?

One of the SQLextended (distended)? abortions that comes with the DB?

Or Ruby?

Silly question (in this forum):-). The answer obviously is Ruby.

Is the integration between Ruby and SQL as slick as you could imagine?

The current adapters are damn Good, but not as slick as I could imagine.

Neither in terms of syntax, Object / Relational integration, exception
handling nor performance.

Can you use Ruby in your server side stored procedures? Probably not.

So let’s talk a bit about what SQL gives, that I’d like to see cleanly
and comfortably and slickly integrated in ruby…

Relational Data model. Yup. I’m aware of the perfectly valid
criticisms that SQL doesn’t go anywhere near close enough to Codd’s
Laws… but it’s one of the closest practical things that does. And
yes, I’d love it if Ruby had a true relational algebra embedded
within it.

Despite what various flavours of Object Oriented Analysis guru’s
will tell you…

A class is not a relation, an Object is not a row, a Hash map
from primary key to class instances is not a table. So ORM / Db
adapters try mash these notions together, but the semantics aren’t
quite the same.

I bet we could use Ruby Mixins to designate and extend certain
classes as tables.

I bet we could find a really Good Ruby way of defining the Data
Model that would be better than “create table”. Indeed some of the
ORM libraries mention are excellent examples of this.
Data Persistence. Yip, marshal and restore gives us that. But is
that enough?
- Indexes / updates and deletes?
- Concurrent access?
- Even if an application can persist it’s object model it’s
  incredibly useful to query, and access that same data model from
  a
  query tool, or (perhaps indirectly via view or a projection) by
  another (loosely related) application.
Data Model / Database Management. Whoops. Adding / deleting /
renaming columns to a data table created via marshal / restore is
painful.
Set operations (on things too big to fit in ram). We’re Ruby
programmers. We think procedurally. Can I recommend the book “The
Art of SQL”? The Art of SQL - Stephane Faroult, Peter Robson - Google Books

It is a Good Reminder that screens and screens of procedural code
can be
rewritten as set op one liners.
ACID - Well, actually that would be kind of handy in a number of
other areas not traditionally “Database”.
Query Language - selects, joins, group by’s, sorts. Rich and
expressive stuff indeed.
Client / Server / Concurrency - Actually Ruby is very Good at that
already, except there is no support for concurrency at the marshal
and restore level.

Does this involve a multiyear rewrite of the core of Ruby?

Nah.

It involves folding in preexisting public domain code. Sqlite. No,not
at the C API level…
C/C++ Interface For SQLite Version 3 (old)

but folding in the appropriate parts of (select,update,delete,insert) of
Sqlite’s parse.y
http://www.sqlite.org/br3317/artifact/405
into
http://svn.ruby-lang.org/cgi-bin/viewvc.cgi/trunk/parse.y?view=markup

John C. Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : [email protected]
New Zealand

amin1426 · October 1, 2008, 8:02am

On Sunday 28 September 2008 17:30:02 John C. wrote:

Big Money brooks no dead ends. SQL is a huge, very useful, successful
money spinner. Hence it will be and is being extended both on the
client and server side to do what scripting languages like Ruby does.

No sane language designer would choose such a path, the SQL extended
languages I have seen are as ugly as sin… but as I say, nothing
stops the money.

In other words: Much like COBOL.

COBOL is, in fact, very successful. There’s a huge amount of old COBOL
code
lying around, and there’s quite a lot of high-paying jobs for
maintaining it.
Because of the sheer size of these codebases, it’s been extended to do
all
kinds of things.

But it’s still COBOL, and still ugly as sin.

So tell me, assuming you, like millions of developers worldwide, have
to use some form of SQL database.

To continue the analogy, suppose I had to code in COBOL.

What are the reasons I would have to use COBOL?
Most likely, it’s because I’m supporting a legacy app.

When was the last time you had to maintain a legacy Ruby app?
Generally, if I’m being paid to write Ruby, it’s because I’m developing
a
relatively new app, and with a corporation that has the presence of mind
to
choose the best tool for the job – not the most popular tool for the
job.

So, if I’m being paid to write a brand-new Ruby app, don’t you think
there’d
be discussion of whether or not to use a SQL database? Of whether a
document-based database might be a better choice?

In my discussion with Clifford H. on this thread, I learned that
there are
applications which are suitable to a traditional transactional database
(probably SQL), and not to a document-based database. I would guess that
such
applications are still the vast minority.

Is the integration between Ruby and SQL as slick as you could imagine?

Define “slick”.

The current adapters are damn Good, but not as slick as I could imagine.

Neither in terms of syntax, Object / Relational integration, exception
handling nor performance.

Extraordinary claims require extraordinary proof.

So let’s talk a bit about what SQL gives, that I’d like to see cleanly
and comfortably and slickly integrated in ruby…

Relational Data model. Yup. I’m aware of the perfectly valid
criticisms that SQL doesn’t go anywhere near close enough to Codd’s
Laws… but it’s one of the closest practical things that does. And
yes, I’d love it if Ruby had a true relational algebra embedded within
it.

Google isn’t giving me a lot of help here. What’s this about, exactly?
What
would it look like in Ruby?

So ORM / Db
adapters try mash these notions together, but the semantics aren't
quite the same.

Which differences in semantics are bothering you?

I bet we could use Ruby Mixins to designate and extend certain
classes as tables.

Or maybe inheritance.

I know! We can make all classes that want to be tables inherit from a
common
base class! I think I’ll call it ActiveRecord::Base.

I bet we could find a really Good Ruby way of defining the Data
Model that would be better than "create table". Indeed some of the
ORM libraries mention are excellent examples of this.

In other words, an ORM library could do this – nothing needs to be
changed in
Ruby itself.

Data Persistence. Yip, marshal and restore gives us that.

Marshal and restore aren’t generally used for persistence. Generally,
the ORM
in question provides its own semantics for that – certain properties of
it
correspond directly to columns, and telling the object to persist
updates the
row with those values.

But, suppose we’re talking about something like CouchDB, or RDDB:

 - Indexes / updates and deletes?

Done, done, and done. Calling “save” on any model will update the record
with
the values which have changed, or insert a new record if one doesn’t
exist.
Calling “destroy” will delete it.

 - Concurrent access?

This is a more fundamental problem, and it’s not unique to Ruby. But
there is
native (and easy) support for optimistic and pessimistic locking.

 - Even if an application can persist it's object model it's
   incredibly useful to query, and access that same data model from a
   query tool, or (perhaps indirectly via view or a projection) by
   another (loosely related) application.

Or within the same. But yes, that’s kind of inherent in the fact that a
SQL
database is used here:

Data Persistence. Yip, marshal and restore gives us that. But is
that enough?

Suppose we split it into one record per file. I’m not advocating this –
I
would much rather

 - Indexes / updates and deletes?

Indexes: Yes, maintained separately.

Updates: Unmarshal the record you need, change it, then re-marshal and
save.

Deletes: Delete

 - Concurrent access?

Can be done on a filesystem. Maildir gives a very simple example of how
this
might be done.

For that matter, concurrent access in most databases cannot be done more
finely than by table or by row. A system like flock should be more than
adequate for this purpose.

 - Even if an application can persist it's object model it's
   incredibly useful to query, and access that same data model from a
   query tool, or (perhaps indirectly via view or a projection) by
   another (loosely related) application.

Define “loosely related” – why not simply load up the model in question
into
this other application?

Data Model / Database Management. Whoops. Adding / deleting /
renaming columns to a data table created via marshal / restore is
painful.

Not really. Maybe if you’re marshalling/restoring the objects
directly…

But consider: Adding a “column” is trivial; you need only provide a
default
value, and adjust the model class to know about this. You don’t actually
have
to change any data at all, until people start setting values for
that “column”. Existing document-based databases work exactly this way.

Deleting a “column” could be done the same way – let the model class
know it
should be deleted.

Renaming would be the most difficult. Here, you’d have to version your
data,
and give the model class a procedure to run when loading records older
than a
certain version.

ACID - Well, actually that would be kind of handy in a number of
other areas not traditionally “Database”.

Not all apps need or want ACID. In fact, it’s much easier to scale most
apps
if you don’t make them ACID-compliant.

Query Language - selects, joins, group by’s, sorts. Rich and
expressive stuff indeed.

And confusing, archaic, and fundamentally different than how we program
everything else.

For that matter, we do have this, somewhat – we have Enumerable#select
and
Enumerable#sort.

Client / Server / Concurrency - Actually Ruby is very Good at that
already, except there is no support for concurrency at the marshal
and restore level.

No, it’s done by serializing with something else – generally, an ORM
like
ActiveRecord. But as I’ve shown above, you can, in fact, have
concurrency
with marshal and restore.

Does this involve a multiyear rewrite of the core of Ruby?

Nah.

It involves folding in preexisting public domain code. Sqlite.

If it is so trivial, fork Ruby and do it yourself. Show us that it can
be
done, and that it is useful, and that it’s worth polluting Ruby’s core
namespace with SQL junk.

And don’t be surprised if, when you’re done, nobody’s interested, or
nobody’s
willing to take the risk of breaking old code. After all, Ruby 1.9
barely
changes syntax at all, and look how long it’s taken – it’s more than
just a
technical problem. It’s a design problem.

but folding in the appropriate parts of (select,update,delete,insert) of
Sqlite’s parse.y

Ruby is already a powerful and expressive language for creating new
domain
specific languages. The Sequel library already provides a Ruby DSL for
SQL
queries.

There are many domains which might benefit from customizing Ruby’s
parser. I
imagine that Rake wouldn’t mind being able to do this:

foo.o: foo.c
system ‘gcc foo.c -o foo.o’

Not the best Make example, but you get the idea. Instead, they’ve chosen
to
leave it as Ruby, which means it looks like this:

file ‘foo.c’ => ‘foo.o’ do
system ‘gcc foo.c -o foo.o’
end

Much cleaner, and actually valid Ruby.

And you still haven’t addressed perhaps the most disturbing part of your
proposal: Even if we agree with you that SQL is so important that it
needs to
be part of the language, you’re essentially dictating one true SQL
engine and
flavor of SQL. Why should Ruby embed SQLite, when almost no Ruby
programs use
SQLite in production? What does this mean for users of other databases?

And what does this mean for those who don’t want to use SQL at all?
When I call Array#select, will it do what I expect?

Oh, and for what it’s worth: There’s a language called Judo, which
includes
SQL. I’m curious what you think of it:

http://judoscript.org/

How to make Ruby _THE_ scripting language of choice, fold in SQLite

Attaches, unless told otherwise to sqlite db FILE.db

Map this class to a sql table called Customer with primary key :name

It’s persisted, so even if this appears a useless and can be garbage

collected, it will at least be added to Customer table

The following three lines should be equivalent…

An both should transparently and automagically allow …

An both should transparently and automagically allow …

It’s persisted, so even if this appears a useless and can be garbage

collected, it will at least be added to Customer table

Maybe an address wasn’t required, and this will work. Maybe an address was required, in which case, an exception will be raised, and the record won’t be saved. It’s useful to separate this from creation for other reasons, too

Attaches, unless told otherwise to sqlite db FILE.db

Maybe an address wasn’t required, and this will work. Maybe an address
was
required, in which case, an exception will be raised, and the record
won’t be
saved. It’s useful to separate this from creation for other reasons, too