Finding Child Records Efficiently


#1

I have a family-tree structure to a person model.

I want to be able to find people by specifying an array that
corresponds to the family tree. So if Abe is grandpa, homer is dad and
bart is the son, bart’s array would be [abe,homer,bart].

To find bart, I can’t just use find_by_name(“bart”) as there might be
another person called bart (with differnt parents and therefore a
different array). So, to find this particular bart, I want to find abe
first, then find homer, then bart.

How can I do an ititial database query of find_by_name(“abe”) that
will find the “abe” record as well as all his descendant records, so
that any subsequent filtering is efficient and doesn’t hit the
database?

Hope that makes sense, cheers

DAZ


#2

DAZ wrote:

How can I do an ititial database query of find_by_name(“abe”) that
will find the “abe” record as well as all his descendant records, so
that any subsequent filtering is efficient and doesn’t hit the
database?

Efficient retrieval of all descendants or all ancestors
of a record requires the addition of nested-set capability.

One alternative is to make the ancestor array a string key
to each record (“abe|homer|bart”), allowing instant retrieval.

Another would be to build the sql iteratively:

ancestors = %w(abe homer bart).reverse
conds = {“people.name” => ancestors.shift}
joins = nil
ancestors.each_with_index do |name, i|
joins = joins.nil? ? :parent : {:parent => joins}
conds.update("#parents_people#{’_’ + (i+1).to_s if i!=0}.name" =>
name)
end

bart = Person.find(:first, :conditions => conds, :joins => joins)


Rails Wheels - Find Plugins, List & Sell Plugins -
http://railswheels.com


#3

DAZ wrote:

I have a family-tree structure to a person model.

I want to be able to find people by specifying an array that
corresponds to the family tree. So if Abe is grandpa, homer is dad and
bart is the son, bart’s array would be [abe,homer,bart].

Your requirement probably exceeds ActiveRecord’s DSL.

Even if you did this…

class Person
has_many :children
has_many :grand_children, :through => :children

(with the :class_name => Person and whatnot)

…you cannot stitch them together in a find statement. You could of
course
:include => [:children, :grand_children], but then there’s no way to
distinguish
from three instances of the same table in your :conditions fields. You
must hit
either ‘name’ or ‘persons.name’, but then ‘persons’ is ambiguous.

And even if all that worked, you then can’t handle a variable length
array.

The best bet is to count the elements in the array, then build a :joins
string
that pulls in and numbers the persons table that number of times. Sketch
what
you need in raw SQL first, such as in a query editor. Then build a
:conditions
string (possibly using named replacements - :conditions => [‘yack yack’,
{:name1
=> name[1], :name2 => name[2], etc}]) that stamps in each name.


Phlip


#4

On Dec 28, 8:07 pm, Mark Reginald J. removed_email_address@domain.invalid wrote:

One alternative is to make the ancestor array a string key
to each record (“abe|homer|bart”), allowing instant retrieval.

This seems like a relatively good idea, could have a string-key called
family_tree or something and just do find_by_family_tree(“abe|homer|
bart”)
This doesn’t quite feel right - it seems like the only info you should
need to keep is a person’s parent (from which you can then find their
parent and so forth). It might also lead to some very long strings
eventually!

Another would be to build the sql iteratively:

This is what I’m doing at the moment - I’m using the “betternestedset”
plugin, so have access to a “children” method that returns an arrary
of children. Will this have all been pre-fetched efficiently? And if
so, is the iterative code the way to do it?

tree = [“abe”,“homer”,“bart”]
person = Person.find_by_name(tree[0])
if tree.size > 1
1.upto(tree.size - 1) do |i|
person = person.children.find { |child| child.name == tree
[i] }
end
end
@person = person

Thanks for the help,

DAZ


#5

DAZ wrote:

On Dec 28, 8:07�pm, Mark Reginald J. removed_email_address@domain.invalid wrote:

One alternative is to make the ancestor array a string key
to each record (“abe|homer|bart”), allowing instant retrieval.

This seems like a relatively good idea, could have a string-key called
family_tree or something and just do find_by_family_tree(“abe|homer|
bart”)
This doesn’t quite feel right - it seems like the only info you should
need to keep is a person’s parent (from which you can then find their
parent and so forth). It might also lead to some very long strings
eventually!

I tend to like this idea as well, and is probably what I would do. The
caching of the ancestry would seem to be the most efficient. However,
there is one change I would make. I would reverse the ancestry so it
would be (“bart|homer|abe”). Doing this would make string comparisons
more efficient because you are looking for “bart” so having “bart” at
the beginning of the string may slightly increase the string comparison
efficiency.

This doesn’t quite feel right - it seems like the only info you should
need to keep is a person’s parent (from which you can then find their
parent and so forth). It might also lead to some very long strings
eventually!

This, however, I don’t agree with. You need the full ancestry to
whatever the “root” objects is. The idea is to make one comparison with
no joins. The only way to do that I can think of is to have all the
information available at the row level.

If you end up with a very deep hierarchy, which would result in very
long strings, you could consider hashing the strings. This way you would
have a consistent length string to compare.

Example MD5 Hash:
bart|homer|abe = a92b11363ef020716f1ce3104e0cb0d8 (32 chars)
bart|homer|abe|john|william|ted|sam|joeseph|adam|bart|bill|jack =
ff2f4ad181a029935b89a36c7cd1dfbe (32 chars)

Obviously in some cases your losing rather than gaining, but at least
the string is consistent in length no matter how long the input string
becomes.


#6

Thanks or these replies Robert and Mark!

I guess that keeping the ancestors as a string in the database makes
more sense than I first thought. It is interesting that Robert refers
to this as ‘caching’ the family-tree, which obviously it is, but I’ve
not really thought of caching information like this (ie database level
relationships) before. Consequently, as Mark says it needs to be
updated whenever a name changes, which shouldn’t be too difficult to
do (cycle through all the descendants and update that row?).

This has the added advantage of acting as a sort of permalink row as
well. Thinking about pretty urls, it wouldn’t be too hard to use this
string to generate:

http://appname/people/abe/homer/bart

This would be the url that took you directly to Bart’s url. Could I
use the ‘/’ character as a separator at the DB level?

Robert - I really the idea of hashing the string to limit the length,
I’ve never thought of this as a use for hashing before, thanks!

Mark - I’m not sure what you mean by this:

you should put the name match into the
SQL conditions rather than in a Ruby loop.

I guess that you mean to use SQL rather than looping through ruby
arrays, but I’m afraid my SQL is very poor - I usually only use the
Rails helpers to do basic stuff. I’m struggling to follow your
example, but I guess that it corresponds to what Philip was alluding
to by saying to construct a joins string?

…And if I used this SQL method, which would be the most
efficient? I’m erring towards the ‘cached’ string method at the moment
as it easily allows uniqueness validations and as I mentioned could be
used to form pretty urls.

Thanks so much again to both of you - I’m really learning lots more
than I anticipated from this thread.

cheers,

DAZ


#7

In a tree-based model, you have some tradeoffs if you want to retrieve
multiple values efficiently. Usually this is an insert-time vs query-
time tradeoff. For example, if you wanted to keep track of the number
of descendants of any particular node, you would either have to
traverse the node’s subtree to count all the children every time you
wanted that information, or whenever you insert a child, you add one
to the parent’s descendant-count field, then the grandparent’s, and so
on, eventually bubbling this information up to the root. This way
every node is guaranteed to have an up-to-date count.

Your situation is similar. You can add a descendants field containing
an array of descendant IDs. Using the built-in activerecord
serialization mechanism:

class Person < ActiveRecord::Base
serialize :descendants

end

Every time you create a new Person, all you need to do is

descendants = new_person.parent.descendants
descendants << new_person.parent.id

Of course, this will be a bit more complex if you want to include
dependants from multiple parents, but the idea is the same. Now every
person includes a list of descendant IDs, and you’ll need only a
single query.

– Mark.


#8

DAZ wrote:

parent and so forth). It might also lead to some very long strings
eventually!

It does require maintenance: whenever a name changes you have to
update the string in all children and all ancestors.

  if tree.size > 1
    1.upto(tree.size - 1) do |i|
      person = person.children.find { |child| child.name == tree

[i] }
end
end
@person = person

First-generation children are efficiently retrieved in both
better_nested_set and acts_as_tree via the parent key.
better_nested_set also allows you to efficiently retrieve all
generations of children using the all_children method.

So your current method will be making one DB call per
generation. And you should put the name match into the
SQL conditions rather than in a Ruby loop.

But have a look at the code in my last post. It can find
the correct Bart using only one DB call by matching
up the unique ancestors chain.


Rails Wheels - Find Plugins, List & Sell Plugins -
http://railswheels.com


#9

DAZ wrote:

I guess that keeping the ancestors as a string in the database makes
more sense than I first thought. It is interesting that Robert refers
to this as ‘caching’ the family-tree, which obviously it is, but I’ve
not really thought of caching information like this (ie database level
relationships) before. Consequently, as Mark says it needs to be
updated whenever a name changes, which shouldn’t be too difficult to
do (cycle through all the descendants and update that row?).

Yes, you could use a recursive (tree) or batch (nested-set) traverse.
Or could actually do it in one statement, like

os = sanitize("/#{old_name}/")
ns = sanitize("/#{new_name}/")
Person.update_all “key = replace(key, #{os}, #{ns})”,
[‘key like ?’, “/path/to/#{old_name}/%”]

This has the added advantage of acting as a sort of permalink row as
well. Thinking about pretty urls, it wouldn’t be too hard to use this
string to generate:

http://appname/people/abe/homer/bart

This would be the url that took you directly to Bart’s url. Could I
use the ‘/’ character as a separator at the DB level?

Yes, good idea.

Mark - I’m not sure what you mean by this:

you should put the name match into the
SQL conditions rather than in a Ruby loop.

I guess that you mean to use SQL rather than looping through ruby
arrays, but I’m afraid my SQL is very poor - I usually only use the
Rails helpers to do basic stuff. I’m struggling to follow your
example, but I guess that it corresponds to what Philip was alluding
to by saying to construct a joins string?

person.children.find(:conditions => [‘name = ?’, name])


Rails Wheels - Find Plugins, List & Sell Plugins -
http://railswheels.com


#10

Mark Reginald J. wrote:

os = sanitize("/#{old_name}/")
ns = sanitize("/#{new_name}/")

Whoops, you’d need to add the prefix /path/to on these
to ensure you only replace the name in the correct context.

Person.update_all “key = replace(key, #{os}, #{ns})”,
[‘key like ?’, “/path/to/#{old_name}/%”]


Rails Wheels - Find Plugins, List & Sell Plugins -
http://railswheels.com


#11

I forgot you wanted actual names instead of IDs, but the same
principle applies. You can serialize strings in the same fashion.
Also, in my previous post I said ‘descendants’, when really
‘ancestors’ is more appropriate.

– Mark.