Background process fork - generating zip files


#1

I am trying to generate a static archive (zip) as follows:

The model has the following method:

def generate_archive
  dir   = "/tmp/#{self.id}"
  title = self.title.gsub(/^\s+/, '').gsub(/\s+$/, '').gsub(/\s+/, '-')
  id    = self.id
  host  = "#{RAILS_ENV}_HOST".upcase.constantize
  url   = url_for :host => host, :controller => :topics, :action => :show, :id => id
  logger.info "Generating topic (#{title}) archive '#{dir}'."
  pid = fork do
    # Each backtick command runs in the child and finishes before the next one starts.
    `wget --page-requisites --html-extension --convert-links --no-directories --recursive --level=1 -np --domains=#{host} --directory-prefix=#{dir} #{url};`
    `mv #{dir}/#{id}.html #{dir}/index.html;`
    `zip -mj #{dir}/#{title}.zip #{dir}/*`
  end
  # Reap the child in a background thread; this call itself returns immediately.
  Process.detach(pid)
end

The controller calls this method:

@topic.generate_archive
send_file path, :type => 'application/zip'

#### ### ### ### ####

  • The problem is that the first time I call this method (by clicking),
    it fails with a missing-file error: the zip file is not ready at that
    point.
  • If I go back and click again, it works fine and delivers the zip file
    instantly.
  • Is there any way I can fix this? I tried adding a sleep call, but that
    doesn't work either. And what is causing this problem? I thought the
    method above wouldn't return until the background jobs were complete.

Any clues?

Thanks,
CS.


#2

Anyone?


#3

On 12 Apr 2009, at 05:26, Carlos S. wrote:

thought above method wouldn’t return until background jobs are
complete.

If you want to wait for a process to complete, you should be using
Process.wait. Though if you're blocking on it like that, what's the
point of using fork?
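To make this concrete, here is a small sketch (the helpers `start_fake_archive` and `compare_detach_and_wait` are hypothetical, and a child that sleeps and then writes a file stands in for the wget/mv/zip commands). With `Process.detach` the parent carries on while the child is still working, which matches the missing-file error on the first click; with `Process.wait` the parent blocks until the child exits, and `$?` then holds the child's exit status.

```ruby
require 'tmpdir'

# Hypothetical stand-in for the wget/mv/zip commands: the child sleeps,
# then writes the "archive". exit! skips at_exit handlers inherited from
# the parent, so the child does nothing else on the way out.
def start_fake_archive(path, delay = 1.0)
  fork do
    begin
      sleep delay
      File.write(path, "fake zip data")
      exit!(0)
    rescue StandardError
      exit!(1)
    end
  end
end

# Compare fork + Process.detach (returns at once) with
# fork + Process.wait (blocks until the child exits).
def compare_detach_and_wait
  Dir.mktmpdir do |dir|
    detached = File.join(dir, "detached.zip")
    waiter = Process.detach(start_fake_archive(detached))
    ready_after_detach = File.exist?(detached)  # checked while the child is still sleeping
    waiter.join                                 # reap it before the temp dir is deleted

    waited = File.join(dir, "waited.zip")
    Process.wait(start_fake_archive(waited))    # blocks until the child has exited
    { after_detach: ready_after_detach,
      after_wait: File.exist?(waited),
      exit_ok: $?.success? }                    # $? is the waited child's Process::Status
  end
end

p compare_detach_and_wait if __FILE__ == $0
```

`after_detach` should come out false and `after_wait` true. Note that `Process.detach` returns a thread whose `value` is the child's `Process::Status`, so joining it also blocks; either way the request ends up waiting, which is why fork buys little here.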

Fred


#4
  • How do I know when it is complete? Ideally, I would like to send the zip
    file once the process is complete rather than sleeping for an arbitrary
    time.

  • Is it possible to get an estimate of this?

Thanks,
CS.


#5

One way you could do it:

In your model method:

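def generate_archive
  # ... your existing archiving code (dir, title, fork, Process.detach) ...

  was_success = false

  fname = "#{dir}/#{title}.zip"

  prev_fsize = 0
  10.times do # or some reasonable(?) max num times.
    sleep 0.5
    # File.size? returns nil while the zip is missing or empty; treat that as 0.
    fsize = File.size?(fname) || 0
    # Two consecutive equal, non-zero sizes are taken to mean zip has finished.
    if prev_fsize > 0 and prev_fsize == fsize
      was_success = true
      break
    end
    prev_fsize = fsize
  end
  return was_success
end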

And in your controller method:

...
if @topic.generate_archive
  send_file path, :type => 'application/zip'
else
  # show some err msg ...
end
...

Note that the above assumes the archiving process is quick enough (and
the request volume low enough) for users to wait and for the app to
cope. If archiving takes too long, or the request volume is too high,
you probably won't want to keep polling for the archive before
responding. Instead, return/redirect to a page where the user (or the
app itself) checks whether the archiving process has finished.
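A variant of the loop above, as a sketch: instead of inferring completion from file sizes, poll the child process itself. This assumes `generate_archive` is changed to return the `pid` from `fork` and does not call `Process.detach` (the detach thread would reap the child first, and `waitpid` would then raise `Errno::ECHILD`); `wait_with_timeout` is a hypothetical helper name. `Process.waitpid2` with `Process::WNOHANG` returns `nil` while the child is still running, so each check is non-blocking and the total wait is capped.

```ruby
# Poll a forked child without blocking on it. Process.waitpid2 with
# WNOHANG returns nil while the child is still running, or [pid, status]
# once it has exited.
def wait_with_timeout(pid, timeout: 5.0, interval: 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    done_pid, status = Process.waitpid2(pid, Process::WNOHANG)
    return status.success? if done_pid   # exited: true only for exit status 0

    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      Process.detach(pid)                # stop waiting, but still reap it later
      return false
    end
    sleep interval
  end
end
```

The controller would then call something like `wait_with_timeout(pid, timeout: 5)` before `send_file`. The request still waits up to the timeout, so the same caveat about slow archives applies.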

Jeff

On Apr 13, 4:13 pm, Carlos S. wrote:


#6

Any help please?

Thanks,
CS.


#7

Are you seeing this behavior while running your Rails app under a
typical development setup, i.e. a single Mongrel instance running the
dev environment? If so, the blocking/hanging happens because your
archiving method makes an HTTP client call (via wget) back into your
single-threaded, single-process Rails app, which can only handle one
request at a time.

If this is the case, you'll need to set up and run a second instance of
your dev environment (or launch your archiving method via script/runner,
or run your dev environment under mod_rails/Passenger, or …)
specifically to handle your archiving HTTP client requests.
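A toy model of this blocking, as a sketch: a plain `TCPServer` stands in for a single Mongrel/Rails instance (this is not Mongrel itself), and `self_call_times_out?` is a hypothetical name. The server handles one connection at a time; while handling it, it makes a request back to itself the way wget does. The OS queues the inner connection, but nothing accepts it until the outer request finishes, so the inner call can only time out.

```ruby
require 'socket'

# Toy "single instance" server: it serves one connection at a time.
# Returns true if the handler's call back into itself gets no answer.
def self_call_times_out?(timeout = 1.0)
  server = TCPServer.new("127.0.0.1", 0)   # port 0: let the OS choose a free port
  port = server.addr[1]
  outcome = nil

  handler = Thread.new do
    outer = server.accept                      # the original browser request
    inner = TCPSocket.new("127.0.0.1", port)   # the "wget" call back into the app
    inner.write("GET /topics/1 HTTP/1.0\r\n\r\n")
    # Nobody calls server.accept again until this handler returns, so no reply arrives.
    ready = IO.select([inner], nil, nil, timeout)
    outcome = ready ? :answered : :timed_out
    inner.close
    outer.close
  end

  TCPSocket.new("127.0.0.1", port).close        # send the outer request
  handler.join
  server.close
  outcome == :timed_out
end
```

A second instance on another port (or Passenger, which runs several application processes) gives the inner call a process that is free to accept it.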

Jeff

On Apr 21, 5:29 am, Carlos S. wrote:


#8

Thanks for the insight.
Yes, it is a dev environment with a single Mongrel server (if that's
what you mean by a single instance).
But doesn't Mongrel handle HTTP requests in a multi-threaded manner?

I tried to move this code to my controllers and ran into another problem:
http://www.ruby-forum.com/topic/185134

Any clues as to how Rails handles model vs. controller code? Any
suggestions on where to place code to get the best performance from the
Rails framework, alongside the MVC architectural style?

CS.


#9

Thank you Jeff.

How can I do this without blocking any other requests? I tried doing it
with multi-threading, without success (I ended up in an infinite loop).

Also, I read that when running background system commands, it is good
practice to start a new fork rather than a new thread, since this gives
more control over the code.

Is there any way to run this background process without blocking other
requests, and also find out its completion status?
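One common way to track completion across requests, sketched under assumptions (the `ArchiveStatus` module, the `.part` suffix and the `ready?` check are hypothetical, not from this thread): the forked job writes the archive under a temporary name and renames it to the final name only when it is complete. `File.rename` within one filesystem is atomic on POSIX systems, so a later request (for example a status page the browser polls) can treat "the final file exists" as "the archive is done", without blocking while the job runs.

```ruby
require 'fileutils'

# Completion status that survives across requests: the job writes to a
# temporary name and renames it into place only when it is finished.
module ArchiveStatus
  def self.final_path(dir, title)
    File.join(dir, "#{title}.zip")
  end

  # Run inside the forked child. The block stands in for the real wget/zip
  # work and must write the complete archive to the path it is given.
  def self.build(dir, title)
    FileUtils.mkdir_p(dir)
    tmp = final_path(dir, title) + ".part"
    yield tmp
    File.rename(tmp, final_path(dir, title))  # atomic within one POSIX filesystem
  end

  # Cheap, non-blocking check for a later request (e.g. a status page).
  def self.ready?(dir, title)
    File.exist?(final_path(dir, title))
  end
end
```

The controller would start the job in a `fork` (plus `Process.detach`), redirect to a status page, and call `send_file` only once `ready?` returns true.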

Any help appreciated.

Thanks,
CS.


#10

If you have a single instance of mongrel running your rails app, then
it is essentially “single threaded” when it comes to handling requests
(see comments in
http://weblog.rubyonrails.org/2006/5/18/interview-with-mongrel-developer-zed-shaw
or http://mongrel.rubyforge.org/wiki/FAQ or …).

Like I mentioned before, you'll need to set up a second instance of
Mongrel specifically to handle those recursive HTTP client calls back
into your app under the dev environment. If you go this route, you'll
likely want to write a Ruby (or Capistrano) script to start/stop/
restart/check the status of the two dev Mongrel instances: one for your
main dev environment running on port 3000(?), and one for your internal
HTTP client calls running on port 3001(?). Then modify your internal
HTTP client call code to hit port 3001 instead of 3000.
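A sketch of the port change for the internal wget call, assuming the second instance listens on localhost port 3001 and that the topic page is served at `/topics/:id` (the default RESTful route for `TopicsController#show`). The `ARCHIVE_HOST`/`ARCHIVE_PORT` environment variables and the `internal_topic_url` helper are hypothetical. The resulting URL would replace the `url_for` result that is passed to wget.

```ruby
require 'uri'

# Build the URL wget should fetch, pointing at the second app instance
# that exists only to serve these internal requests.
def internal_topic_url(id, host: ENV.fetch("ARCHIVE_HOST", "localhost"),
                           port: Integer(ENV.fetch("ARCHIVE_PORT", "3001")))
  URI::HTTP.build(host: host, port: port, path: "/topics/#{id}").to_s
end
```

For example, `internal_topic_url(42, host: "localhost", port: 3001)` gives `http://localhost:3001/topics/42`.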

Another alternative would be to set up your dev environment to run via
mod_rails/Passenger, something like
http://accidentaltechnologist.com/ruby/replicating-rails-project-setup-on-development/
and the single-threaded blocking issue goes away.

Jeff

On Apr 23, 10:07 am, Carlos S. wrote:


#11

From that first link I provided: “… dizave: Are you guys seeing
performance problems with the fact that rails controllers are
essentially forced to single-threaded under mongrels? … Zed A.
Shaw: dizave, just wanted to clarify that Rails is always single
threaded, even under FastCGI, SCGI, and WEBrick. Under Mongrel it’s
just exposed to you and isn’t a dirty little secret. …”

Trust me, your wget call is being blocked because it's waiting for the
original request to generate_archive to finish (which itself is waiting
for your wget call to finish, which it won't until it times out) so
that it can be handled by your single-instance Mongrel/Rails setup.
This is why you need another instance of your Rails app running to get
around this blocking problem. So either fire up a second Mongrel dev
instance on a different port and use that for your internal wget calls,
or set up your dev environment to be served via Passenger, or …

As for the problems you’re having with your archiving process, …
If it were me, I'd pull that code out into a separate class with a
class method into which you pass the parameters needed to complete the
archiving process, something like:

class Archiver

def Archiver.gen_archive(....)
  ....
end

end

This way you can easily test your archiving process via unit tests, the
console, and/or runner ($ ./script/runner 'puts Archiver.gen_archive(…)'),
to ensure that it works properly. Once you know it's working correctly,
you can modify your controller code accordingly.
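A sketch of what such a class might look like (the parameter list `dir, title, url` and the helper names `slug` and `wget_args` are assumptions; the wget and zip flags are the ones from this thread). The pure helpers can be unit tested on their own, and wget and zip are run through `system` with separate arguments, so a title containing spaces or quotes is never interpreted by a shell. `gen_archive` runs synchronously, which suits tests, the console and script/runner; whether to fork it is left to the caller.

```ruby
# Sketch of a standalone Archiver: small pure helpers that are easy to
# unit test, plus one class method that shells out synchronously.
class Archiver
  # "  My   Topic " -> "My-Topic"; same result as the thread's three gsub
  # calls for a single-line title.
  def self.slug(title)
    title.strip.gsub(/\s+/, '-')
  end

  # wget flags copied from the thread, as an argument array (no shell).
  def self.wget_args(dir, url)
    ["wget", "--page-requisites", "--html-extension", "--convert-links",
     "--no-directories", "--recursive", "--level=1", "-np",
     "--directory-prefix=#{dir}", url]
  end

  # Returns the zip path on success, nil otherwise. The mv-to-index.html
  # step is left out, as in the later version of the code in this thread.
  def self.gen_archive(dir, title, url)
    # Exit status ignored: wget can exit non-zero if any single URL
    # (e.g. one image) fails, even when the page itself was saved.
    system(*wget_args(dir, url))
    zip_path = File.join(dir, "#{slug(title)}.zip")
    files = Dir.glob(File.join(dir, "*")) - [zip_path]
    return nil if files.empty?
    system("zip", "-mj", zip_path, *files) ? zip_path : nil
  end
end
```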

Jeff

On Apr 24, 5:48 pm, Carlos S. wrote:


#12

Carlos S. wrote:

However, it is blocking other HTTP requests (if I try to access my
application from a browser, it times out or waits forever).
So it is not processing other requests. This could be because the server
has reached its maximum number of connections (just one possibility).

Rails does not support multithreading and will only handle one request
at a time, in one thread.
Mongrel supports multithreading and will handle multiple requests in
multiple threads.

Background threads and forks do not depend on the web server and are
always supported.

What do you mean by "wget seems to make infinite calls to the server"?
Passenger should be no different from Mongrel.

Anyway, if you are going to do a lot of zipping and wgetting of large
websites, perhaps you should have a look at Starling and Workling.

Have a look at this Railscast, which explains how to set this up:
http://railscasts.com/episodes/128-starling-and-workling

This setup makes it easy to handle many large background tasks while
keeping the server from being overloaded: when there are many requests,
the tasks run from a queue instead of all at once.
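To illustrate the queueing idea only (this is not the Starling/Workling API), here is a sketch using Ruby's built-in `Queue`; `ArchiveQueue` is a hypothetical name. Requests just enqueue a job, and a single worker thread runs the jobs one at a time, so a burst of requests cannot start many wget/zip runs at once. Starling and Workling additionally move the queue and the worker out of the web process, which an in-process thread like this does not do.

```ruby
# In-process stand-in for a job queue: callers enqueue work and return
# immediately; one worker thread runs the jobs one at a time, in order.
class ArchiveQueue
  def initialize(&job)
    @jobs = Queue.new
    @worker = Thread.new do
      while (args = @jobs.pop) != :stop
        begin
          job.call(*args)
        rescue StandardError => e
          warn "archive job failed: #{e.message}"  # keep the worker alive
        end
      end
    end
  end

  # Called from the request: just queues the work.
  def enqueue(*args)
    @jobs << args
  end

  # Let queued jobs finish, then stop the worker thread.
  def shutdown
    @jobs << :stop
    @worker.join
  end
end
```

The block given to `new` would be the archiving work itself, and each `enqueue` call passes it the arguments for one archive.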

/Morgan


#14

I am really confused now.
The FAQ says Mongrel uses one thread per request,
so it can handle multiple requests (not surprising).

Are you suggesting that my archiving method is making another HTTP
request from within the same HTTP request?

I can see the wget requests in the Mongrel log (development.log). Clearly,
it's not blocking these requests.

However, it is blocking other HTTP requests (if I try to access my
application from a browser, it times out or waits forever).
So it is not processing other requests. This could be because the server
has reached its maximum number of connections (just one possibility).

I tried Passenger and it is really cool. However, there seems to be a
serious problem with my archiving code. When I run my app using
Passenger and try the archiving method, the system slows down and I had
to force a reboot.
wget seems to make infinite calls to the server.

Here is my archiving code for reference:

def generate_archive
  dir   = "/tmp/topicsys/#{self.id}"
  title = self.title.gsub(/^\s+/, '').gsub(/\s+$/, '').gsub(/\s+/, '-')
  id    = self.id
  host  = "#{RAILS_ENV}_HOST".upcase.constantize
  url   = url_for :host => host, :controller => :topics, :action => :show, :id => id
  logger.info "Generating topic - (#{title}) archive '#{dir}'."
  pid = fork do
    `wget --page-requisites --html-extension --convert-links --no-directories --recursive --level=1 -np --directory-prefix=#{dir} #{url};`
    #`mv #{dir}/#{id}.html #{dir}/index.html;`
    `zip -mj #{dir}/#{title}.zip #{dir}/*;`
  end
  Process.detach(pid)
end


Any clues?
