Mongoose 0.2.5 - The "Two Steps Forward, One Step Back"

You can download it from: http://rubyforge.org/projects/mongoose/

What’s New

Well, there’s a lot of new stuff in this release, and some old stuff put
back in, as well. John L. pointed out that the use of #instance_eval
in the 0.2.0 release would be problematic because the query block would
no longer have access to instance variables from the calling object.
After looking at this all weekend and getting feedback from a number of
people, I have decided to go back to the previous query syntax whereby
you specify the table class as a block parameter and qualify each column
name with the table class name. So, it’s back to:

Dog.find { |dog| dog.breed == "German Shepard" }

instead of:

Dog.find { breed == "German Shepard" }
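The trade-off John L. pointed out can be reproduced in a few lines of plain Ruby. This is only a sketch and does not use Mongoose: `Dog` (a Struct here), `Kennel`, and both query helpers are made-up names for illustration.

```ruby
# Stand-in for a table row; NOT a Mongoose::Table.
Dog = Struct.new(:breed)

class Kennel
  def initialize(wanted)
    @wanted = wanted
  end

  # Block-parameter style: the block keeps the caller's self, so @wanted
  # is the Kennel's instance variable.
  def block_param_query(dogs)
    dogs.select { |dog| dog.breed == @wanted }
  end

  # instance_eval style: self becomes each dog while the block runs, so
  # @wanted is looked up on the dog and evaluates to nil.
  def instance_eval_query(dogs)
    query = proc { breed == @wanted }
    dogs.select { |dog| dog.instance_eval(&query) }
  end
end

dogs   = [Dog.new("Collie"), Dog.new("Beagle")]
kennel = Kennel.new("Collie")

p kennel.block_param_query(dogs).map(&:breed)   # finds the Collie
p kennel.instance_eval_query(dogs).map(&:breed) # finds nothing
```

The shorter syntax reads nicely, but the second query silently compares against nil, which is the kind of surprise the block-parameter form avoids.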

Besides this one step back, there have been a lot of steps forward. The
query engine code is totally re-written, thanks to input and code ideas
from Logan C… I have added a bunch of methods from ActiveRecord,
including dynamic finder methods like Table.find_by_user_name.
Additionally, I have added Table.import and Table.export methods that
allow you to get data in and out of a table via CSV. So, grab the
latest release and let me know what you think.
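For readers who have not seen dynamic finders before, here is a minimal sketch of how a call like find_by_user_name can be routed through method_missing. `TinyTable` and its hash rows are hypothetical; this is not Mongoose's implementation, just one common way to get the effect.

```ruby
# Hypothetical in-memory table; it only illustrates the dispatch idea.
class TinyTable
  def initialize(rows)
    @rows = rows # array of hashes, e.g. { :user_name => "jamey" }
  end

  # Turns find_by_<column>(value) into a lookup on that column.
  def method_missing(name, *args)
    column = name.to_s[/\Afind_by_(\w+)\z/, 1]
    return super unless column
    @rows.find { |row| row[column.to_sym] == args.first }
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("find_by_") || super
  end
end

users = TinyTable.new([{ :user_name => "jamey" }, { :user_name => "logan" }])
p users.find_by_user_name("logan")
```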

Documentation is still light, so the best way to learn is to look in the
“example” directory and at the unit tests.

What is Mongoose

Mongoose is a database management system written in Ruby. It has an
ActiveRecord-like interface, uses Skiplists for its indexing, and
Marshal for its data serialization. I named it Mongoose, because, like
Rudyard Kipling’s Rikki-Tikki-Tavi, my aim is for it to be small, quick,
and friendly.

You can find rudimentary documentation in the README file and some
sample scripts in the example directory.

Jamey C.
[email protected]

First off, great job on Mongoose, it looks very promising.

I had a quick/vague question, wondering if you could shed any light on
it. Off the top of your head, how do you feel Mongoose would perform
as the backend storage for an object persistence system, where the
objects in question are very large (i.e. marshalling to disk creates a
40MB file)?

Jamey C. wrote:

Is each object 40MB? (God, I hope not!) Or are you saying that, the
total size of all the objects saved to disk is 40MB?

Sure, I certainly appreciate your help. I have an array that is 40MB
when marshalled to disk. The array contains 200 objects of approx.
equal size. These objects could certainly be further broken up, but I
would prefer not to.

Ideally, the entire array would be marshalled as one piece, with a
transparent object-persistence mechanism. However, I’m open to
marshalling its elements individually, which would probably be the best
way to go, for a number of reasons.

I’m currently using a home-grown “persistent array,” where each array
element is marshalled to a separate file and stored that way.
Obviously, this is non-ideal, which is why I’m looking at other
options. Mongoose looks like a great compromise between the flexibility
of marshalling and storing in the file system and the helpfulness of
storing in a DB.
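For context, a home-grown "persistent array" of the kind described here might look roughly like the sketch below. `FileBackedArray` and its file naming are assumptions made up for illustration, not the poster's actual code.

```ruby
require "tmpdir"

# Hypothetical helper: one Marshal file per array element.
class FileBackedArray
  def initialize(dir)
    @dir = dir
  end

  # Serializes the element and writes it in binary mode.
  def []=(index, object)
    File.binwrite(path(index), Marshal.dump(object))
  end

  # Returns the stored element, or nil if nothing was saved at index.
  def [](index)
    file = path(index)
    File.exist?(file) ? Marshal.load(File.binread(file)) : nil
  end

  private

  def path(index)
    File.join(@dir, "#{index}.marshal")
  end
end

Dir.mktmpdir do |dir|
  store = FileBackedArray.new(dir)
  store[0] = { :name => "first", :payload => (1..5).to_a }
  p store[0]
end
```

Each element can be loaded without touching the others, but there is no index or query support, which is the gap a table-based store is meant to fill.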

I guess I don’t really have a question anymore, it all seems pretty
straightforward at this point. Thanks for your help, and thanks for
taking the time to write and release Mongoose.

[email protected] wrote:

I had a quick/vague question, wondering if you could shed any light on
it. Off the top of your head, how do you feel Mongoose would perform
as the backend storage for an object persistence system, where the
objects in question are very large (i.e. marshalling to disk creates a
40MB file)?

Is each object 40MB? (God, I hope not!) Or are you saying that, the
total size of all the objects saved to disk is 40MB?

If it’s the latter, Mongoose might work just fine. In a test run I had
an 8MB file that had about 80,000 records (obviously, lots of small
records). Mongoose did a #find for a particular record, grabbed it,
instantiated an object from the Marshaled data, and returned it to me,
in something like 0.003 seconds.

A little more info on your data would help me give you a more accurate
answer.

Jamey

Confidentiality Notice: This email message, including any attachments,
is for the sole use of the intended recipient(s) and may contain
confidential and/or privileged information. If you are not the intended
recipient(s), you are hereby notified that any dissemination,
unauthorized review, use, disclosure or distribution of this email and
any materials contained in any attachments is prohibited. If you receive
this message in error, or are not the intended recipient(s), please
immediately notify the sender by email and destroy all copies of the
original message, including attachments.

Got another question for you, if you don’t mind sparing a couple
minutes. Thanks in advance for taking the time.

What would be the best/easiest way to accomplish the intent of the
following code?

require 'mongoose'

# Create a class for your table.
class Thing < Mongoose::Table
end

# Create a database instance.
db = Mongoose::Database.new

# Create new table. Notice how you specify whether a column is indexed
# or not.
db.create_table(:thing) do |tbl|
  tbl.add_column(:foo, :dunno_what_to_put)
end

# Add a record. You can also use #create.
rec = Thing.new
rec.foo = (1..100).to_a
rec.save

puts Thing.find.first.foo.size # 100

# Close database. This will write the indexes out to disk so they can be
# initialized quickly next time.
db.close

On Jul 27, 2006, at 9:45 PM, [email protected] wrote:

tbl.add_column(:foo,:dunno_what_to_put)

It’s a (semi-)relational db. Store the array as a set of rows.

Something like:

class ArrayTable < Mongoose::Table
end

db.create_table(:array_table) do |tbl|
  tbl.add_column(:array_id, :array_position, :array_value)
end

array_id = 1

(1..100).each_with_index do |value, position|
  ArrayTable.create :array_id => array_id, :array_position => position,
                    :array_value => value
end

Yup, I understand I could relational-ize the array; it was a bad
example. Pretend I said an instance of some arbitrary Foo class
instead of an array. The point is I want to serialize arbitrary data,
taking advantage of marshaling.

I tried the :string method. I get a dump format error when trying to
read it back out of the found record.

I’m familiar with the serialize functionality in activerecord, that’s
the basic idea. I think the problem is basically that the proper
escaping for storing dump strings isn’t being done. I’ll try and see
if I can hack something in to make this work.

[email protected] wrote:

I tried the :string method. I get a dump format error when trying to
read it back out of the found record.

I’m familiar with the serialize functionality in activerecord, that’s
the basic idea. I think the problem is basically that the proper
escaping for storing dump strings isn’t being done. I’ll try and see
if I can hack something in to make this work.

The AR serialize method does some serializing via YAML. From the docs:

Specifies that the attribute by the name of attr_name should be
serialized before saving to the database and unserialized after loading
from the database. The serialization is done through YAML.

I don’t know if that would be my preferred way of handling it. I’m
using this functionality in an app presently and it works fine, but it
seems unnecessary to me (unless I’m missing something, which is
certainly possible).
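To make the YAML versus Marshal comparison concrete, here is a small sketch using only Ruby's standard library. The helper names are made up for illustration; neither ActiveRecord nor Mongoose is involved.

```ruby
require "yaml"

# Serialize to YAML text and back.
def yaml_round_trip(object)
  YAML.load(YAML.dump(object))
end

# Serialize to a binary Marshal string and back.
def marshal_round_trip(object)
  Marshal.load(Marshal.dump(object))
end

value = (1..9).to_a

puts YAML.dump(value)   # readable text, one element per line
p Marshal.dump(value)   # compact binary string

p yaml_round_trip(value) == value
p marshal_round_trip(value) == value
```

For simple values both round trips return an equal object. YAML stays readable in the data file, while Marshal output is binary and so depends on the file being read and written byte for byte.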

[email protected] wrote:

puts Thing.find.first.foo.size #100

Logan already gave you one way to do this, and, probably from a purely
relational perspective, his suggestion is the right way to do it.

Additionally, I know that ActiveRecord has the serialize class method
that you can use to have the object specified Marshaled into and out of
the db. I am thinking of supporting that method, just because I want to
stay close to ActiveRecord’s api. The funny thing is, Mongoose
already Marshals all of the data (that’s how it stores the table
records). Right now, I don’t think I am doing much data checking in the
#save method, so you could probably get away with declaring the data
type as :string and then just going ahead and saving the array to
rec.foo like you are doing in the example. It should work.

Of course, in an upcoming version, I will make all of this work the way
it should, i.e. do more data type checking in #save and also give you
the ability to specify which columns you want to serialize.

HTH,

Jamey

FYI, from what I can tell, at least for the array (1..9).to_a, writing
the marshal dump to mongoose and then reading it back makes 2 changes:

  1. adds a \r where the string contains a \n
  2. removes a trailing \016 (shift out)

This is what causes the dump format error
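The first of those two changes is enough on its own to break loading, which can be checked without Mongoose. In the sketch below, `text_mode_translate` is a made-up helper that imitates a newline-translating write by turning every LF byte into CR LF; the missing trailing byte is not modelled.

```ruby
# Imitates newline translation on write: LF becomes CR LF.
def text_mode_translate(binary)
  binary.gsub("\n".b, "\r\n".b)
end

# True if Marshal can load the string, false if it rejects the format.
def loadable?(blob)
  Marshal.load(blob)
  true
rescue ArgumentError, TypeError
  false
end

short = Marshal.dump([1, 2])
long  = Marshal.dump((1..9).to_a)

p long.bytes                            # includes 10 (LF), the encoding of 5
p loadable?(text_mode_translate(short)) # no LF byte, so nothing changes
p loadable?(text_mode_translate(long))  # extra CR shifts the stream
```

Marshal stores small integers as the value plus five, so the element 5 is written as byte 10, which is a line feed. A two-element array never produces that byte, which may be why short arrays appear to work.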

[email protected] wrote:

FYI, from what I can tell, at least for the array (1..9).to_a, writing
the marshal dump to mongoose and then reading it back makes 2 changes:

  1. adds a \r where the string contains a \n
  2. removes a trailing \016 (shift out)

This is what causes the dump format error

I’ll take a look at this over the weekend and see if I can get it
working.

Jamey

That would be awesome. I didn’t mean to be “demanding” anything, you’re
obviously doing this for free. I appreciate your help, and if I come up
with anything today, I’ll send off another e-mail.

Mike H. wrote:

That would be awesome. I didn’t mean to be “demanding” anything,
you’re obviously doing this for free. I appreciate your help, and if
I come up with anything today, I’ll send off another e-mail.

Hey, no problem. I appreciate you taking the time to try out Mongoose
and give me feedback.

By the way, I did glance at the code and I think I might have found the
problem. Each record is marshaled to disk as an array. When I read a
record back from disk, I do a #flatten on it before creating the
object. That’s probably where the problem is, since #flatten is
recursive, it’s going to flatten both the record’s array, and the array
you stored in one of the fields.

So, I will try changing this code to not use #flatten and see what
happens.
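The effect of a recursive flatten on a record that holds an array in one field can be seen in plain Ruby. The record layout below ([id, name, foo]) is an assumption for illustration and may not match Mongoose's real storage format.

```ruby
# Applies the same recursive flatten that is suspected of causing trouble.
def fields_after_flatten(record)
  record.flatten
end

record = [1, "rex", (1..9).to_a] # the third field is itself an array
fields = fields_after_flatten(record)

p fields.size # 11 values instead of 3 fields
p fields[2]   # 1, the first element of the nested array
```

After flattening, the third slot holds a single integer rather than the stored array, so reading that field back returns only its first element.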

Jamey

Jamey C. wrote:

By the way, I did glance at the code and I think I might have found the
problem.

I noticed that, and actually did a search for “flatten,” but I don’t think
that’s it. I was able to do [1,2] successfully, but not (1..9).to_a. I
think it’s when the array gets long enough to necessitate a carriage
return in the marshal, and when it gets read back there’s a CR and an
LF, instead of just a CR or LF (I forget which), in addition to the
missing \016 at the end. I’ve been poking it, ran a little “test” with
some output. This probably won’t help you but here it is. It’s storing
Marshal.dump((1..9).to_a) to the str column.

Orig Dump:      \4 \8 [ \14 i \6 i \7 i \8 i \9 i \10 i \11 i \12 i \n i \14
From DB File 1: \4 \8 [ \14 i \6 i \7 i \8 i \9 i \10 i \11 i \12 i \n i \14
From DB:        \4 \8 [ \14 i \6 i \7 i \8 i \9 i \n \10 i \11 i \12 i \n i

The numbers after \ are the decimal codes of the unprintable characters.
Orig Dump is Marshal.dump((1..9).to_a).
From DB File is a manual read from the database file, without using
Mongoose to read.
From DB is find.first.str.

I’ve got to get back to my “job,” but I’ll whip up some test cases
later.

Logan C. wrote:

I’m pretty sure this is a case of Jamey being lazy (no offense, Jamey).
I just did a quick grep for open in the source, he doesn’t tack
on ‘b’ to his flags (when he almost certainly should when storing
binary data like a Marshaled string.) Let me guess, you are on
Windows, right? Hence the magical \n -> \r\n transformation.

Oops! Guilty as charged. Ever since I switched to Ubuntu, the only
time I boot into Windows is to play games, so I obviously haven’t tested
this much under Windows!

Thanks for catching this Logan. I’ll make sure I add the ‘b’ flag for
the next release.
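For reference, the fix under discussion amounts to opening the file with "wb" and "rb". A minimal sketch, with `write_blob` and `read_blob` as made-up helper names rather than anything from the Mongoose source:

```ruby
require "tmpdir"

# "wb" writes the bytes exactly as given, with no newline translation.
def write_blob(path, object)
  File.open(path, "wb") { |f| f.write(Marshal.dump(object)) }
end

# "rb" reads them back the same way before handing them to Marshal.
def read_blob(path)
  File.open(path, "rb") { |f| Marshal.load(f.read) }
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "record.bin")
  write_blob(path, (1..9).to_a)
  p read_blob(path)
end
```

On Linux and other Unix-like systems the "b" flag makes no difference to the bytes, which is consistent with the problem only showing up on Windows.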

Jamey

On Jul 28, 2006, at 11:02 AM, Mike H. wrote:

This probably won’t help you but here it is. It’s storing
Marshal.dump((1..9).to_a) to the str column

I’m pretty sure this is a case of Jamey being lazy (no offense, Jamey).
I just did a quick grep for open in the source, he doesn’t
tack on ‘b’ to his flags (when he almost certainly should when
storing binary data like a Marshaled string.) Let me guess, you are
on Windows, right? Hence the magical \n -> \r\n transformation.

Yup, when you add the b flag, it works (mostly)

obj.foo = (1..9).to_a still doesn’t work. When you try to read it
back, you get 1 (a Fixnum).

obj.foo = Marshal.dump((1..9).to_a) does work. Obviously you have to
Marshal.load on retrieval, but it works.

Awesome.

I’m going to add an :object type and transparent-ize the marshalling.
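One way such transparent marshalling could work is sketched below. `SketchRecord`, `object_column`, and `ThingSketch` are hypothetical names invented for this example; it is not how Mongoose implements or will implement an :object type.

```ruby
# Hypothetical base class: columns declared with object_column are stored
# as Marshal strings and loaded again when read.
class SketchRecord
  def self.object_column(name)
    define_method("#{name}=") do |value|
      raw[name] = Marshal.dump(value)
    end

    define_method(name) do
      blob = raw[name]
      blob.nil? ? nil : Marshal.load(blob)
    end
  end

  # What would be handed to storage: column name => String.
  def raw
    @raw ||= {}
  end
end

class ThingSketch < SketchRecord
  object_column :foo
end

rec = ThingSketch.new
rec.foo = (1..9).to_a
p rec.foo             # the original array
p rec.raw[:foo].class # String
```

The caller assigns and reads ordinary Ruby objects, and only a flat string ever reaches the storage layer, so nothing downstream needs to know the field held an array.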

Jamey C. wrote:

Thanks for catching this Logan. I’ll make sure I add the ‘b’ flag for
the next release.

I wonder if we need to revisit the issue of making binmode the default.
I don’t see a downside, really. The only thing I can think of is that
some Win32 API functions used through win32api or win32ole might get
confused, but I doubt it.

Regards,

Dan

This communication is the property of Qwest and may contain confidential
or privileged information. Unauthorized use of this communication is
strictly prohibited and may be unlawful. If you have received this
communication in error, please immediately notify the sender by reply
e-mail and destroy all copies of the communication and any attachments.

Jamey C. wrote:

Awesome.

I’ll check out your e-mail, per our conversation off-list.

When you replace the flatten call with a flatten_top call, everything
works transparently, like you predicted before.

My “implementation” of flatten_top:

# Flattens one level only; assumes every element is itself an array.
class Array
  def flatten_top
    inject([]) { |s,i| s.concat(i) }
  end
end
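A slightly more forgiving variant is sketched below, since concat raises TypeError when an element is not an array. `flatten_one_level` is a made-up name, and the shape of `stored` (a record wrapped in an outer array) is an assumption about what gets read from disk.

```ruby
class Array
  # Like flatten_top, but elements that are not arrays are kept as-is.
  def flatten_one_level
    inject([]) { |acc, item| item.is_a?(Array) ? acc.concat(item) : acc << item }
  end
end

stored = [[1, "rex", (1..3).to_a]]

p stored.flatten            # => [1, "rex", 1, 2, 3]
p stored.flatten_one_level  # => [1, "rex", [1, 2, 3]]
p stored.flatten(1)         # built-in one-level flatten, same result
```

On Ruby versions where Array#flatten accepts a depth argument, flatten(1) gives the same result without reopening Array.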

Mike H. wrote:

Did you get the email I sent to you directly? I attached a test version
of Mongoose that added a new data type called :undefined (although I
like :object better).

Jamey
