I’m kind of a newbie in RoR and I’m having a hard time figuring out how I should implement this. I’m writing an application to store and display information about insects and their distribution. Currently I have almost all functionality implemented, except for a very important one: the application must be capable of “crawling” itself and generating a zip archive for download. Actually, “crawling” isn’t quite accurate, since the views must be slightly different (e.g. don’t provide functionality that isn’t available without an Internet connection, indicate in the title that the page is an offline copy, etc.).
The question is: Do you have any suggestions as to how I should
implement this?
One approach I had in mind (although I don’t know how to program it) would be to call, from the controller that triggers the archive generation, every publicly accessible controller, with each of them providing a non-routable method that uses the offline templates. This method would be index-like, except that it would repeatedly use the offline view for #show, rendering each page to a string and storing it in a stream.
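Roughly what I have in mind, completely untested; Species and the offline template/layout names are just placeholders for my actual resources:

    # A non-routable method on one of the scaffolded controllers
    # (Species is a placeholder resource). It renders the offline
    # variant of #show for every record and returns the HTML strings
    # so whatever builds the archive can zip them.
    class SpeciesController < ApplicationController
      # ... normal scaffold actions ...

      # Not mentioned in routes.rb; only called by the archive generator.
      def offline_pages
        Species.all.map do |species|
          html = render_to_string(
            template: "species/offline_show",  # offline-only template
            layout:   "offline",               # layout without online-only links
            locals:   { species: species }
          )
          ["species/#{species.id}.html", html]
        end
      end
    end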
Another approach: let the zip generation controller access the views of all the resources and iterate over all model data itself, resulting in one really big, centralized controller.
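Something along these lines, I imagine, assuming the rubyzip gem (older versions use Zip::ZipOutputStream and require 'zip/zip') and, again, a placeholder Species model:

    # Rough sketch of a single "archive" controller that renders every
    # record's offline page and sends the result as one zip.
    require 'zip'

    class ArchivesController < ApplicationController
      def create
        zip_path = Rails.root.join("tmp", "offline_#{Time.now.to_i}.zip")

        Zip::OutputStream.open(zip_path) do |zos|
          Species.find_each do |species|
            zos.put_next_entry("species/#{species.id}.html")
            zos.write render_to_string(
              template: "species/offline_show",
              layout:   "offline",
              locals:   { species: species }
            )
          end
        end

        send_file zip_path, filename: "insects-offline.zip", type: "application/zip"
      end
    end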
And lastly, make all public controllers check for “/static/” in the request path to tell them to use the offline templates, and then do some self-crawling by iterating over all model data. (For some reason, self-crawling didn’t work for me in development, even when using Thin, which explicitly advertises “>> Maximum connections set to 1024” when it boots up. The problem would probably solve itself by using delayed_job, though; I haven’t tested that yet.)
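The “/static/” check could probably be a single filter in ApplicationController, something like this (untested; app/views_offline is a made-up location for the offline templates, and the /static/ prefix would still need a matching scope in routes.rb):

    # One before_filter switches every controller to the offline
    # templates when the request path starts with /static/.
    class ApplicationController < ActionController::Base
      before_filter :use_offline_templates   # before_action in Rails 4+

      private

      def use_offline_templates
        return unless request.path.start_with?("/static/")
        # Look in app/views_offline first, fall back to app/views.
        prepend_view_path Rails.root.join("app", "views_offline")
        @offline = true  # views/layout can check this to tweak titles, hide links
      end
    end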
All the controllers dealing with resources were created with “rails g
scaffold …”.
All the “solutions” above (except maybe the third one) are missing a procedure to include all assets in the zip archive properly (i.e. enumerate them all and store them under the correct file names).
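For the assets, the best I can think of is walking public/ and storing every file under its path relative to public/, something like this (zip being the open rubyzip output stream from the sketch above; public/assets would only exist after precompiling):

    # Walk public/ and add each file to the archive under its
    # relative path, e.g. "assets/application.css".
    require 'find'

    public_root = Rails.root.join("public")
    Find.find(public_root.to_s) do |path|
      next unless File.file?(path)
      relative = Pathname.new(path).relative_path_from(public_root).to_s
      zip.put_next_entry(relative)
      zip.write File.binread(path)
    end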
I’ll be extremely grateful for any suggestions about how I should tackle this problem!
A tip when searching for a RoR version of XYZ: Google something like “XYZ ruby rails github gem”. There is a gem to accomplish most complex/mundane tasks.
I would let something like Jekyll handle the static site creation. You could then have a Ruby script create a custom config file for each type of site, or actually create another gem that handles that.
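For example, something like this (rough and untested; the paths, config keys and title are only illustrative):

    # Write a per-build _config.yml from Ruby, then shell out to the
    # jekyll CLI to build the static site into output_dir.
    require 'yaml'

    def build_static_site(source_dir, output_dir, title)
      config_path = File.join(source_dir, "_config.generated.yml")
      File.write(config_path, {
        "title"       => "#{title} (offline copy)",
        "destination" => output_dir
      }.to_yaml)

      # jekyll must be installed; system returns false/nil on failure.
      system("jekyll", "build",
             "--source", source_dir,
             "--config", config_path) or raise "jekyll build failed"
    end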
Have everything run in the background, like so (a rough Ruby sketch of the middle steps follows the list):
1. A request comes in for a new static build.
2. A Ruby script initializes any custom variables.
3. The Ruby script creates a unique temp folder. Keep track of this folder so it can be logged and deleted.
4. Write a log entry to the database that the process has started.
5. The Ruby script runs the jekyll command to create the static site.
6. If there are no errors, write to the DB log that the process completed; otherwise log the error and notify an admin via email.
7. Zip the temp directory and note in the DB log that the process is complete and the zip is ready.
8. Notify the user via email or a prompt that the zip is ready.
9. Push the zip to the browser with a new pretty file name.
10. Delete the temp zip and directory. Log in the DB that the file was successfully delivered.
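A very rough sketch of steps 3 to 7, meant to run inside a background job (BuildLog and AdminMailer are hypothetical names, and the jekyll/zip commands will need adjusting to your setup):

    # Create a temp dir, build the static site, zip it with the OS
    # zip command, and record the outcome in a hypothetical BuildLog.
    require 'tmpdir'

    def build_offline_archive
      Dir.mktmpdir("static-build-") do |tmp|
        log = BuildLog.create!(status: "started", workdir: tmp)

        if system("jekyll", "build", "--destination", tmp)
          zip_path = "#{tmp}.zip"
          # Let the OS do the zipping; usually faster than pure Ruby.
          system("zip", "-r", zip_path, tmp) or raise "zip failed"
          log.update_attributes(status: "complete", zip: zip_path)
        else
          log.update_attributes(status: "failed")
          AdminMailer.build_failed(log).deliver  # hypothetical mailer
        end
      end  # mktmpdir removes the temp directory afterwards
    end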
This process can take a while under heavy server load. Use a background task gem that logs to a DB, or roll your own.
Utilize your Linux server via bash as much as possible. You can control
your server via Ruby. The system in many instances will run tasks like
zipping way faster than Ruby. Use Cron for scheduled tasks and the “God”
gem for crash detection.
Use good error handling as much as possible and log all “states” so you can debug where issues arise.
The question is: Do you have any suggestions as to how I should
implement this?
My suggestion is, I hope, simple: use wget to crawl/mirror the site, using a query string parameter to indicate that you want the “offline” views – you still need to implement them if they are different enough – by checking that the special parameter is set; you should be able to set it just once for the session and have wget use cookies to maintain the session info.
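Roughly like this, driving wget from Ruby (the offline=1 parameter and the cookie file name are made up; your controller would have to copy params[:offline] into the session so the flag sticks):

    # First request sets the "offline mode" flag in the Rails session
    # and saves the session cookie for wget to reuse.
    cookies = "wget-cookies.txt"
    system("wget", "--save-cookies", cookies, "--keep-session-cookies",
           "-O", "/dev/null", "http://localhost:3000/?offline=1")

    # Then mirror the whole site, reusing that session cookie.
    system("wget", "--mirror", "--page-requisites", "--convert-links",
           "--adjust-extension", "--load-cookies", cookies,
           "--directory-prefix", "tmp/offline-copy",
           "http://localhost:3000/")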
Another alternative, instead of the query string param, would be to use the user agent string wget sends, and always deliver the “offline” version to that UA string.
The mirroring will pull all the URLs that are included under the main one. If your assets are not under that main URL, this won’t work. You can tell wget to pull from elsewhere, but it can easily get out of hand.
It sounds like you might want to implement an offline app instead. Most browsers support AppCache to make all your assets available offline, and there is the JavaScript call navigator.onLine that will tell you whether you are offline or not. You’ll have to create and maintain an appcache manifest file. There is a gem that handles that, but I ended up just creating an appcache controller and serving it myself.
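A minimal sketch of what such an appcache controller could look like (not my exact code; the asset list and versioning are illustrative, and render text: became render plain: in Rails 5):

    # Serves the cache manifest with the correct MIME type. You also
    # need a route, e.g. get "/manifest.appcache" => "appcache#show",
    # and a manifest="/manifest.appcache" attribute on the <html> tag.
    class AppcacheController < ApplicationController
      def show
        assets = ["application.css", "application.js"].map do |a|
          view_context.asset_path(a)
        end
        manifest = ["CACHE MANIFEST",
                    "# rev #{Time.now.to_i}",  # change to force clients to re-download
                    *assets,
                    "NETWORK:",
                    "*"].join("\n")
        render text: manifest, content_type: "text/cache-manifest"
      end
    end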
This is quite a fundamental architecture change, though, so you might be too far along in development to do it. You could create a small sample app to prove the concept, then drag all your existing code into it.
EDIT: I just realised this is not exactly what you want, but I’m keeping my post here in case it brings you onto other ideas.
Just throwing in an idea, assuming you would want to download these for documentation/printing purposes:
Why not generate PDFs of all the entries on your website using a gem, save them to a specific folder, zip them up with another gem, and then force a download when a specific URL is entered / a specific action is triggered?
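For instance, a rough sketch using the wicked_pdf and rubyzip gems (any HTML-to-PDF gem would do; Species, the template and the "pdf" layout are placeholders):

    # Render each record to HTML, convert it to a PDF, zip the PDFs
    # and push the archive to the browser.
    require 'zip'
    require 'fileutils'

    class ExportsController < ApplicationController
      def create
        pdf_dir = Rails.root.join("tmp", "pdfs")
        FileUtils.mkdir_p(pdf_dir)

        Species.find_each do |species|
          html = render_to_string(template: "species/show", layout: "pdf",
                                  locals: { species: species })
          File.binwrite(pdf_dir.join("#{species.id}.pdf"),
                        WickedPdf.new.pdf_from_string(html))
        end

        zip_path = Rails.root.join("tmp", "species-pdfs.zip")
        Zip::File.open(zip_path, Zip::File::CREATE) do |zip|
          Dir[pdf_dir.join("*.pdf").to_s].each { |f| zip.add(File.basename(f), f) }
        end

        send_file zip_path, type: "application/zip", filename: "insects.zip"
      end
    end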
Here’s some information to get your adventure started: