Sparse tables seems to be an impossible term to search on google


#1

without bringing up 3000 articles on low languages and theory which i
am not looking for.

i just want some pointers to effectively storing variable column
sparse tables in ruby/rails preferably with a lean towards pulling
them all together in a cohesive table…

afaik i need a table for my rows
a table for my columns
a table for cells.
and a table of “tables”

that seems like a really long and stupid way to store variable sized
tables…

is there a more generic/simple way to do this?

i realize this may be an edge case but in this situation, im 98% sure
this is what i need… just cant figure out how to make it work
effectively.


#2

ive asked this question a half a dozen different ways and none have
ever gotten an answer.

is it really such a bad idea? some feedback would be welcome… or
maybe a point in the direction of the appropriate forum for these
kinds of questions.

  • FJM

On Feb 9, 10:34 am, “removed_email_address@domain.invalid” removed_email_address@domain.invalid


#3

On Tuesday 10 February 2009 12:05 am, removed_email_address@domain.invalid wrote:

ive asked this question a half a dozen different ways and none have
ever gotten an answer.

(Just an aside–is everybody switching to top posting–where’s the
context, what’s the question–must I read the rest of the post to find
the question? (Sorry, I’m in a funky mood, on another list I read,
some guy quoted a long post, and somewhere in the middle of it posted a
short phrase response (less than a line). What a pain.)

Anyway, after skimming the rest of the post, I’m not entirely sure of
your question:

  • do you want pointers on how to search for “sparse tables” on
    google, or
  • do you want help on storing “variable column sparse tables”–ahh, I
    didn’t see the variable column part the first time I skimmed

Anyway, before I saw that, I was going to say that a relational data
base (which is what Rails uses (iiuc, something like MySQL, Postgresql
(??), or whatever, is something like a sparse table because tables
don’t have to be “complete”–for example, if the key field is some
numerical index, you can have records 001, 006, and 046–you don’t have
to have a line for every record from 001 thru 046.

But I guess that’s not what you mean by “variable column sparse tables”.
So what do you mean? You mean where (starting with my sparse example
above ;-), some records (lines) do not have a value for every column?
Maybe something like this:

View in fixed font:
key lastname firstname height weight hatsize shoesize
001 Mattia Frank 5’11" 7 1/4 10
006 Doe John 5’6" 165
046 Jane 5’5"

If that’s what you’re looking for, as far as I know, tables like this
can exist in a relational data base and not really cause any problem.
A pedant would (I think) make some comment about the data not being
fully normalized, and maybe some slightly less efficiency that would be
experienced if you had many (i.e., thousands, millions) of such
records.

OTOH, normalizing the database (typically) takes effort both by the
programmer / database administrator and in the database itself (running
queries or whatever to actually accomplish the normalization, so for
small quantities of data (I would argue that) it is not efficient to
normalize the data.

afaik i need a table for my rows
a table for my columns
a table for cells.
and a table of “tables”

I haven’t thought through what the above is about, but at a quick skim
it sounds like the kind of tables and effort that would be needed to
normalize the database. Like I said, for some definitions of small,
this is counterproductive for small quantities of data (imho).

If your question is something else, maybe you can clarify.

Randy K.

I didn’t have time to write a short letter, so I created a video
instead.–with apologies to Cicero, et.al.


#4

Once again I hit send a little premature.

What I’m really looking for is a technical explaination of the correct/
incorrect way to acheive this… I’m sure it’s a problem that someone,
somewhere had to solve once before and I’ve been trying to reinvent
it, as i said - with separate tables for the columns, column groups,
rows, tables… but in the end - merging all the tables together and
iterating over everything just seems to take forever… not in the
least bit efficient or reliable.

On Feb 10, 8:28 am, “removed_email_address@domain.invalid” removed_email_address@domain.invalid


#5

ah, thank you. any response - even yours - feels good. i realize this
isnt the most thoroughly explained post so here goes.

i have a need allow my program to create and store variable column
data much like the sense of a spreadsheet where at any given time,
only a handful of columns will be accessed - but over the lifetime of
the “spreadsheet” (table seems too broad a term for me), all the
columns and all the rows will slowly become filled… if you would
imagine this scenario:

UserA defines/“creates” TableOne.
TableOne has ColumnOne, ColumnTwo, ColumnThree, ColumnFour.
UserB needs access to ColumnOne and ColumnTwo recognized as
ColumnGroupOne.
UserC needs access to ColumnThree and ColumnFour recognized as
ColumnGroupTwo.

now the row count is generally fixed (not truely fixed but not
sporatic) but isnt known in advance. rows are not identified by their
primary key but by another column called “ToolNumber” which is
specified in the ColumnGroup.

UserA and UserB may be modifying a record that names the same
ToolNumber and ultimately, upon the table/spreadsheets completion -
will all be consolidated into one master sheet which contains all Rows
and ColumnGroups merged into one table displayed for the user or print
or whatever.

Hopefully that gives you a better idea of what im asking for.

Sparse tables may be a meaningful term but I am most certainly
stretching the definition so that may be a point of confusion and I
understand that.

Many thanks,

  • FJM

#6

On Tuesday 10 February 2009 08:31 am, removed_email_address@domain.invalid wrote:

What I’m really looking for is a technical explaination of the
correct/
incorrect way to acheive this… I’m sure it’s a problem that someone,
somewhere had to solve once before and I’ve been trying to reinvent
it, as i said - with separate tables for the columns, column groups,
rows, tables… but in the end - merging all the tables together and
iterating over everything just seems to take forever… not in the
least bit efficient or reliable.

Well, I sort of stand by my original response then. I mean, when you
have an unnormalized relational database and responses are too slow,
the typical recommendation (I think) is to normalize the database. I
won’t try to explain that here, you need to look it up. (Maybe someone
can explain it (and how to do it) simply, but I can’t, at least not at
this time.)

Normalizing the database is not the only way forward however, and I’d
ask how much data you have. For my only (in progress) application, a
relational database is just not the right fit, and in general slows
everything down (in comparison to plain text files and “ordinary” (and
indexed) searches). (My application has plain text files with
(currently) up to 5000 variable length records per file, totalling on
the order of 10,000,000 characters per file. At them moment I have
about 12 such files, although only two are that big. I plan to scale
to files as big as 100,000,000 characters without switching to a
relational data base (which I’m sure would slow down my application).

How much data will you have in this(these) table(s)? For a low quantity
of data, maybe even a spreadsheet “technology” would do the job?

Other solutions, like a separate machine (server) to handle the database
could help as well. I guess someone would need more information,
specifically about the quantity of data involved (now and in the
future).

Randy K.

I didn’t have time to write a short letter, so I created a video
instead.–with apologies to Cicero, et.al.


#7

ive toyed around with just using txt files but my limited
understanding of “proper technique” in dealing with them makes them
just as cumbersome…

im very familiar with normalization and if it was practical (and the
cost didnt outweigh the benefit) id make sure everything was
absolutely 6NF and then some… but coulda, woulda, shoulda… its not
practical… the best im shooting for is 3NF or 4NF but its not a
stringent requirement…

i guess you could say i know my way around databases, im just lost
with trying to implement this in a ruby way. my database breakdown
will probably look as follows (i think, unless someone can point me in
a better direction)…

over time there may be 5000 sheets… each sheet may have up to 20
columns. each column will eventually belong to exactly one group. each
group may have up to 400 “rows”… … so if a sheet has 4 columns and
2 groups like my prev. example and is filled to capacity… theres
going to be 400 rows for each set of groups… 800 rows… they need
to then be translated into one cohesive unit for display. the final
display will have all 4 columns separated into groups and “merged” so
all the “toolnumbers” line up in rows… displaying only 400 rows.

i think i understand the database side… im lost on the ruby
implementation (or any implementation)… is there a “most effective”
way to construct my relationships?

Sheets

  • id (int)
  • name (string)

Columns

  • id (int)
  • sheet_id (int)
  • column_group_id (int)
  • name (string)

ColumnGroups

  • id (int)
  • name (string)

Data

  • id (int)
  • sheet_id (int)
  • column_id (int)
  • tool_number (string)
  • value (int)

then i’ll have a possible array as such for a query like:
select tool_number, value from data where sheet_id = x

whats an effective way to iterate over the returned dataset and sort
it out into its corresponding columns column groups and rows… im
seeing a join in my head but i dont know on what.

:frowning:

hopefully my problem is becoming a little more clear… but the deeper
i dig the more i suspect theres an elegent solution im not advanced
enough to see.


#8

my bottleneck at the moment is “how do i properly merge all this quasi-
normal data into an array FROM a rails find”… i can find all the
sheets… i can “find” all the sheets columns, i can find all the
columns data… but i cant merge them into one cohesive structure
which can then be easily iterated over ie:

please take this for the psudocode that it is.

–table–
–thead–
–tr–
columngroups.each do |columngroup|
–th colspan=columngroup.column_count–
columngroup.name
–/th–
end
–/tr–
–tr–
columns.each do |column|
–th–
column.name
–/th–
end
–/tr–
–/thead–
etc…

my main problem is that the find returns the columns, columngroups
etc… are all in random order… i think i need them returned as a
key/value so i can align everything properly… but i dont know.

a sheet is just a way of giving all the columns/rows something to say
“hey, this is my parent object”…


#9

I am not an expert by any means on this, I am just trying to offer a
suggestion, but have you thought about storing the data as XML? If you
do it right you should be able to extract the information you need
very easily.

Pepe

On Feb 10, 10:35 am, “removed_email_address@domain.invalid” removed_email_address@domain.invalid


#10

(top posting and not snipping very much so that somebody else can pick
up the thread)–I think I’m out of my depth trying to deal with the
best (i.e., fastest) way to deal with your data in Ruby/Rails. (If I
was to hazard a guess, I might try dealing with each table as an array,
just for the sake of speed.)

OTOH, if this is a rails application, presumably there is a client and a
server (and TCP/IP communication between them)–is the database
processing on the server really the bottleneck?

At least one more comment interspersed below.

On Tuesday 10 February 2009 09:37 am, removed_email_address@domain.invalid wrote:

i guess you could say i know my way around databases, im just lost
display will have all 4 columns separated into groups and “merged” so
all the “toolnumbers” line up in rows… displaying only 400 rows.

Without a lot more thinking, I don’t fully follow the above description.
I guess “sheet” is the first thing that puzzles me–is a sheet a table,
or is there a table containing up to 5000 sheets? (Maybe the “schema”
you list below would answer that and my other questions, but it would
probably take me a while to puzzle it out–more time than I have atm.)

If I really wanted to understand it, I’d ask for an example using “real”
data–sheets, columns, and rows just confuse me (would that be
metadata?).

  • sheet_id (int)
  • column_id (int)
    :frowning:

hopefully my problem is becoming a little more clear… but the deeper
i dig the more i suspect theres an elegent solution im not advanced
enough to see.

Good luck!
Randy K.

I didn’t have time to write a short letter, so I created a video
instead.–with apologies to Cicero, et.al.


#11

ive considered it but i still come back to the fundamental problem of
sucking out all the information contained in one “sheet” and
organizing it into a manageable hash or array or whatever would work
best. i think thats where my greatest problem lies… taking the info
and turning it into one meaningful structure i can loop through.
having everything “sparse” (im using the term to mean… not every
column of every row will be filled in) seems to make it hard for me to
just merge arrays…

i think…

its getting harder and harder to describe hehehe.


#12

To me, there are two possibilities: either switching to a less-
normalized design (serialized columns) or
a different database idea entirely (CouchDB, for instance).

I haven’t used CouchDB personally, but check it out:

http://couchdb.apache.org/ (the main project)
http://github.com/arunthampi/activecouch/tree/master (an ActiveRecord-
like wrapper)

It’s a “document database”, which can store arbitrary key-value pairs
for each “document”.
Sounds like it might be a better match for what you’re trying to do.

Staying within Rails, if you don’t have a need to search by specific
column values (which you haven’t
mentioned), you could just store the data in a single serialized
field:

Sheets

  • id (int)
  • name (string)
    Columns
  • id (int)
  • sheet_id (int)
  • column_group_id (int)
  • name (string)
    ColumnGroups
  • id (int)
  • name (string)
    Data
  • id (int)
  • sheet_id (int)
  • tool_number (string)
  • row_data (text, serialized)

Then keep the data in Data#row_data, as a Hash of (column_name =>
value) [or column_id => value] pairs.
That would keep all of the ToolNumber stuff grouped together.

BTW, if you need columns in a particular order, take a look at
acts_as_list - it should work great for what
you’re doing here.

In conclusion (this has gotten rather long), I’d recommend first that
you take a look at CouchDB. It seems
like a much better match to the idea you’re looking for. The
serialized-column approach is a little more of a
hack, but should work if nothing else is possible.

–Matt J.


#13

I suspect something like this could work–may be worth a shot.

Models

class Table < AR:Base
has_many :column_groups, :order => ‘cg_num’
has_many :rows, :order => ‘rownum’
def columns
self.column_groups.map {|cg| cg.columns}
end
end

class ColumnGroup < AR:Base
belongs_to :table
has_many :columns, :order => ‘col_num’
end

class Column < AR:Base
belongs_to :column_group
has_many :cells, :order => ‘col_num’
end

class Row < AR:Base
belongs_to :table
has_many :cells, :order = ‘col_num’

def get_cell(col)
# there’s almost certainly a faster
# way to do this, but this is all that
# comes to mind.
cells.each do |cell|
return cell if cell.column == col
end
return " "
end

end

class Cell < AR:Base
belongs_to :row
belongs_to :column
end

Controller

class TableController < ApplicationController
def show
# :include may be useful here if perf is an issue
@table = Table.find_by_name(params[:name])
end
end

View

<% @table.column_groups.each do |colgroup| %> <% end # colgroup %> <% @table.columns.each do |col| %> <% end # col %> <% @table.rows.each do |row| %> <% row.table.columns.each do |col| %> <% end # cell %> <% end # row %>
><%= h colgroup.name %>
<%= h col.name %>
<%= h row.get_cell(col) %>
  • Roy

#14

@Matt
Thank you - the “document database” paradigm is exactly what i was
looking for but didnt know what to call it… Thank you and thank you
again…

@Roy
Thank you for the code snippet you whipped up. I havent had a moment
to work it out and play with it but it most certainly seems to be a
help. Again thanks for the effort.

@Randy,
Thank you for the direction you gave me. I def appreciate that you
took time out to go back and forth with me enough to get my whole
thought out. It was a huge help.

Now i think im going to try my hand at making a stripped down doc-
database implementation in ruby… couchdb looks awesome but waaay too
heavy for my needs.

thanks for the pointers guys


#15

I’m looking for something similar. I did find
https://github.com/webnuts/post_json which looks almost what I need.

Did you end up finding a gem or project that does what you described?