Facets branch extensibility extended

On 23/02/2007, at 1:51 PM, Daniel S. wrote:

a) check for the existence of /cache/.yml
b) check for the validity of /cache/.yml
if file exists and is valid:
c) send /cache/.data

else
d) call Page.find_by_url to find the page
e) call the page.headers method to set the page headers
f) call the page.render method to render the page content

Daniel,
What I was suggesting wasn't that Page.find_by_url should be called on
every page request… The original email was about changes to the caching
mechanism…
My suggestion, working from your outline:
a) check for the existence of /cache/.yml
b) check for the validity of /cache/.yml
if file exists and is valid:
c) call the page.headers method to set the page headers
d) send /cache/.data

else
e) call Page.find_by_url to find the page
f) call the page.headers method to set the page headers
g) call the page.render method to render the page content
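A rough sketch of that flow in Ruby (cache_dir, cache_entry_valid? and the
file layout are made-up names, not Radiant's actual cache API). Note that as
written, step (c) still needs the page object - and hence Page.find_by_url -
on every request, which is exactly the tension picked up later in this thread:

# Sketch only - hypothetical names, not the real SiteController
def show_page(url)
  page = Page.find_by_url(url)                          # needed for fresh headers
  page.headers.each { |k, v| response.headers[k] = v }  # headers computed on every hit
  if File.exist?("#{cache_dir}#{url}.yml") && cache_entry_valid?(url)
    send_file "#{cache_dir}#{url}.data"                 # body served from the cache
  else
    render :text => page.render                         # full render on a miss
  end
end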

I was saying that the headers could be sent on every request. This
allows ACLs to work for the main site pages… At the moment (as I
understand it) the headers - and so this includes session info - are
cached and only updated once every 5 minutes… This is unworkable if
you want conditional content, i.e.:

if user registered
  show registered_user_page
else
  show login_page
I certainly wasn't saying that page caching isn't needed, and I can
even see how powerful the Radiant caching is, since it essentially
turns the site into static content once cached. But caching headers
doesn't seem workable in the long term - or it will at least need to be
an option in the caching mechanism in order to implement more dynamic
behaviour. Of course this then opens the site up to all kinds of
dynamic content, and caching by page url would not work… since two
users could potentially have different pages with the same url… but
it at least allows session data to be used on a page-by-page basis…
Perhaps caching page parts would work?

Anyway this is all stuff that will have to be worked out if blogging
is going to be possible with Radiant…

Cheers,
Adam

OK. Point taken. I was really just putting my two cents in on a
subject that's important to me personally… i.e. user-variable content
and ACLs.

It would be nice to do something like (I haven’t actually looked at
the code, my bad):
page = Page.new
page.headers
page.content = Cache.get_content('/cache/' + cache_file)
page.render

Anyway I’d better have a look at the code myself before I make a/more
of a fool of myself.

Cheers,

I too have a need to implement a system that prevents displaying pages
unless the user is authenticated. I will need levels of authentication,
and I have more than just a handful of users.

Smells like sessions to me, but do I understand correctly that Radiant's
caching prevents this?

I'm not even smart enough to know what would be required to make a user
authentication system play nicely with Radiant. Thoughts/suggestions?

-Chris

OK. Point taken. I was really just putting my two cents in on a
subject that's important to me personally… i.e. user-variable content
and ACLs.

It would be nice to do something like (I haven’t actually looked
at the code, my bad):
page = Page.new
page.headers
page.content = Cache.get_content('/cache/' + cache_file)
page.render

Anyway I’d better have a look at the code myself before I make
a/more of a fool of myself.

I wasn't trying to make a fool out of you - I completely understand what
you're trying to achieve, but asking the pages to handle authentication
isn't the way to go about it unless you're prepared to totally kill the
caching performance.

The way to implement this would probably be to introduce a filter to the
site controller that does ACL checks based on paths - check the ACL for
the path before doing anything else that site_controller does. Introduce
a new LoginPage type that is not cached and is responsible for setting
the authentication cookies. Rails doesn't give you as much control over
session creation as I'm used to in Java servlets, so you'll have to take
over session control in your own code - you need to have your filter
check for the existence of a session, but not create one if it doesn't
exist.
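A minimal sketch of that filter shape (AccessControlList,
session_cookie_present? and the login path are all assumptions here, not
existing Radiant code):

class SiteController < ApplicationController
  before_filter :check_acl

  private

    # Returning false from a before_filter halts the chain, so protected
    # requests never reach the cache handler or the renderer.
    def check_acl
      acl = AccessControlList.find_by_path(request.path)  # hypothetical lookup
      return true if acl.nil?           # unprotected paths fall through untouched
      # Check for an existing session cookie without creating one - forcing a
      # session onto anonymous requests would defeat the page cache.
      if session_cookie_present? && acl.allows?(session[:user_id])
        true
      else
        redirect_to '/login'            # served by the uncached LoginPage type
        false
      end
    end
end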

Hmmmm… does Rails give you access to the session cookie before it gets
sent off in the response? Perhaps the site_controller could be changed
to clear out that cookie if the request is going to be cached - giving a
warning / raising an exception if the session is not empty. You'd still
need a separate layer of access control that slots in before the cache
handler, but if you could use Rails sessions instead of rolling your own,
that would be useful. I'll have a little play.

Dan.

Maybe I need to dump my assumptions for a moment. Knowing which page
we're about to render from the cache would be a useful thing. Why
can't we do it? Because the performance of Page.find_by_url sucks.

I talked a little bit about that suckage a while ago, and somebody
(Adam? Sean?) suggested having a table to directly look up a page
from its url in the database. I was originally dismissive, but the
idea is starting to grow on me.

  • Create a page_urls table
  • Whenever the slug or type of a page changes, that page and all of its
    children (recursively) would update the page_urls table with their
    current url(s).

If a page isn’t found using a direct lookup, a regular Page.find_by_url
call would be made, so that we can still maintain all the flexibility
that provides.
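Very roughly, the moving parts might look like this (the table layout and
callback wiring are a sketch of the idea, not code from the tree;
calculate_urls is discussed further down):

class CreatePageUrls < ActiveRecord::Migration
  def self.up
    create_table :page_urls do |t|
      t.column :page_id, :integer
      t.column :url,     :string
    end
    add_index :page_urls, :url    # the fast indexed lookup is the whole point
  end

  def self.down
    drop_table :page_urls
  end
end

class Page < ActiveRecord::Base
  after_save :refresh_url_cache   # fires more often than slug/type changes, but safe

  def refresh_url_cache
    PageUrl.delete_all(['page_id = ?', id])
    calculate_urls.each { |u| PageUrl.create(:page_id => id, :url => u) }
    children.each { |child| child.refresh_url_cache }   # walk the whole subtree
  end
end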

Advantages of this system are:

  • faster to look up pages by url (though I don’t know if it will be fast
    enough that I’d want it before the cache check)

  • all children of a page could be discovered recursively through:
    PageUrl.find(:all, :include => :page,
      :conditions => ['url LIKE ?', "#{parent_url}%"])

  • old urls could be left in the table - automatically giving 'cool urls'

  • unless another page comes to replace it or the page is no longer
    published, a page is always available through its old address. Old urls
    would probably get a marker and redirect to the real urls of a page (see
    the sketch after this list).
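A sketch of how those stale rows could behave, assuming a boolean current
column on page_urls; find_page is the existing SiteController helper the code
further down already uses, and the nil-return convention here is hand-waved:

def find_page(url)
  page_url = PageUrl.find_by_url(url)
  if page_url.nil?
    Page.find_by_url(url)           # flexible fallback (virtual pages etc.)
  elsif page_url.current?
    page_url.page                   # direct indexed hit, no tree walk
  else
    redirect_to page_url.page.url   # stale 'cool url': bounce to the live address
    nil                             # caller would treat nil as 'already handled'
  end
end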

Disadvantages:

  • Database isn't in third normal form (the url is a calculated field, not
    raw data), which makes my inner database purist feel queasy.

  • Long save times when slug/type changes

I should probably do some other things instead, but I might have a look
at this.

Dan.

Right. Caching the complete url in the database would help, but would
not work in all cases, i.e. virtual pages that represent multiple urls.
Perhaps we should try this and get some metrics?

Sean

Hmmmm… does Rails give you access to the session cookie before it
gets sent off in the response? Perhaps the site_controller could be
changed to clear out that cookie if the request is going to be cached -
giving a warning / raising an exception if the session is not empty.
You'd still need a separate layer of access control that slots in
before the cache handler, but if you could use Rails sessions instead
of rolling your own, that would be useful. I'll have a little play.

Phew. Just had a look at how Rails does sessions. Not very pretty -
session handling is actually done by CGI::Session.

It would probably be possible to get rid of session :off and get the
same behaviour with:

def show_uncached_page(url)
  @page = find_page(url)
  unless @page.nil?
    @page.process(request, response)
    if live? and @page.cache?
      @cache.cache_response(url, response)
    else
      response.instance_eval { @cgi.instance_eval { @output_cookies.clear } }
    end
    @performed_render = true
  else
    render :template => 'site/not_found', :status => 404
  end
rescue Page::MissingRootPageError
  redirect_to welcome_url
end

But that’s really messy. Don’t like it one bit.

I’d suggest instead:

class ProtectedSiteController < SiteController
  def show_page
    url = params[:url] = "/protected/#{params[:url]}"
    if is_allowed(url)
      super
    else
      # do something else
    end
  end

  private

    def is_allowed(url)
      # test something
    end
end

And in your extension:

define_routes do |map|
  map.connect '/protected/*url', :action => 'show_page',
    :controller => 'protected_site'
end
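With that in place, only requests under /protected pay the ACL cost: the
route hands them to ProtectedSiteController#show_page, which prefixes the
url, runs is_allowed, and then falls through to the stock SiteController
behaviour via super - so the rest of the site keeps the normal cached path.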

Give it a go.

Right. Caching the complete url in the database would help, but would
not work in all cases, i.e. virtual pages that represent multiple urls.
Perhaps we should try this and get some metrics?

I was thinking that there would be a Page#calculate_urls method that
would by default just return the result of Page#url, but could be
overridden to return multiple urls (in the case where a page knows all
its urls) - otherwise there's the fallback to doing the tree search.
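For instance (the ArchivePage override and its helper are invented purely to
illustrate the hook):

class Page < ActiveRecord::Base
  # Default: one page, one cached url
  def calculate_urls
    [url]
  end
end

class ArchivePage < Page
  # A virtual page that knows its complete url set can enumerate it for the
  # lookup table instead of falling back to the tree search
  def calculate_urls
    published_years.map { |year| "#{url}#{year}/" }   # hypothetical helper
  end
end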

I just did one quick metric - on my system, with ~900 pages, doing an
update of the url cache for all pages 50 times took ~19 seconds. That's
19s across 45,000 page updates, or ~0.4ms per page. Scaling up, that's
about 4 seconds if you modify a page with 10,000 children… I think that
would be a reasonable response time… so the disadvantage at that end
seems acceptable - just need to find out if the advantages are worth it.

Dan.