Performance of find_page_by_url

The performance of find_by_url degrades because children don’t
automatically get a reference to their parent, but they need that
parent reference to compute their url. With a structure of:

  • home
    • articles
      • (600 articles here)

Finding the nth article takes 2n db calls - up to 1200 calls in this
case - since each article makes a separate call to retrieve its
parent, then its parent's parent. Not to mention the fact that all 600
of those articles are going to be read out by the call to 'children'
anyway.
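To make the cost concrete, here is a minimal in-memory simulation (a
hypothetical SimPage class, not Radiant's actual Page model) that counts
one "query" per parent fetch while scanning 600 articles:

```ruby
class SimPage
  PAGES = {}          # stands in for the pages table, keyed by id
  @@queries = 0       # counts simulated DB round-trips

  attr_reader :slug

  def self.queries; @@queries; end

  def initialize(id, slug, parent_id)
    @id, @slug, @parent_id = id, slug, parent_id
    PAGES[id] = self
  end

  def parent
    @@queries += 1    # each parent fetch is one simulated DB call
    PAGES[@parent_id]
  end

  def url
    @parent_id ? parent.url + slug + "/" : "/"
  end
end

SimPage.new(1, "", nil)                           # home
SimPage.new(2, "articles", 1)                     # articles
600.times { |i| SimPage.new(3 + i, "article-#{i}", 2) }

# Scan every article and build its URL, as the children.each loop does:
(3...603).each { |id| SimPage::PAGES[id].url }
puts SimPage.queries   # 1200: two ancestor fetches per article
```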

This is the current code (in mental) for find_by_url:

def find_by_url(url, live = true, clean = true)
  url = clean_url(url) if clean
  if (self.url == url) && (not live or published?)
    self
  else
    children.each do |child|
      if (url =~ Regexp.compile('^' + Regexp.quote(child.url))) and
         (not child.virtual?)
        found = child.find_by_url(url, live, clean)
        return found if found
      end
    end
    children.find(:first, :conditions => "class_name = 'FileNotFoundPage'")
  end
end

This could be worked around by putting in:

  children.each do |child|
    child.parent = self
    … etc …

That is a bit hacky though, and it would prevent child classes from
overriding parent=().

Is there too much useless flexibility here? Do we need to be able to
have child pages that match urls not defined by their parent and slug
(things like archive_page are still just parent+slug)? Why can’t we use
something like:

def find_by_slug_path(slugs)
  if child = children.find_by_slug(slugs[0])
    if slugs.size == 1
      return child
    else
      return child.find_by_slug_path(slugs[1..-1])
    end
  end
end

and the root Page.find_by_url:

def self.find_by_url(url)
  root = find_by_parent_id(nil)
  slugs = url.split('/').select { |x| x.size > 0 }
  root.find_by_slug_path(slugs)
end

That would only need to make a single call at each level - each call
returning a single page.

I meant to post that to radiant-core… Sorry to the people who are
scared by such things.

Daniel S. wrote:

> parent, then its parent’s parent. Not to mention the fact that all 600
> of those articles are going to be read out by the call to ‘children’
> anyway.

Good point.

> something like:

Archive child pages use parent + date + slug, which is why we need the
functionality.

> That would only need to make a single call at each level - each call
> returning a single page.

What about just caching the full URL in the pages table? That way we
could do a single query to find the correct page. The disadvantage of
this approach is that whenever you save a page, you will need to
update all of its children if the slug or page_class has changed.
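Roughly, that caching idea might look like this in-memory sketch
(hypothetical names, not Radiant's actual schema or save callbacks): the
full path is stored per page, and a slug change rewrites the cached URL
of every descendant.

```ruby
class CachedPage
  ALL = []   # stands in for the pages table

  attr_reader :slug, :url, :parent, :children

  def initialize(slug, parent = nil)
    @slug, @parent = slug, parent
    @children = []
    parent.children << self if parent
    recompute_url
    ALL << self
  end

  # Stands in for a save hook: changing the slug rewrites the cached
  # URL of this page and all of its descendants.
  def slug=(new_slug)
    @slug = new_slug
    update_subtree
  end

  # Lookup is now a single query against the cached column.
  def self.find_by_url(url)
    ALL.find { |p| p.url == url }
  end

  def update_subtree
    recompute_url
    children.each(&:update_subtree)
  end

  def recompute_url
    @url = parent ? parent.url + slug + "/" : "/"
  end
end

home     = CachedPage.new("")
articles = CachedPage.new("articles", home)
post     = CachedPage.new("hello-world", articles)

puts CachedPage.find_by_url("/articles/hello-world/").slug  # "hello-world"

articles.slug = "news"   # one rename touches every descendant's cached URL
puts post.url            # "/news/hello-world/"
```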

> I meant to post that to radiant-core…

This kind of discussion is best on the core list.


John L.
http://wiseheartdesign.com