Forum: Ruby on Rails background process fork - generating zip files

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately no longer have the time to support and maintain the forum. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
Neubyr N. (Guest)
on 2009-04-12 08:26
I am trying to generate a static archive (zip) as follows:

#### Model has following method
def generate_archive
    dir = "/tmp/#{self.id}"
    title = self.title.gsub(/^\s+/, '').gsub(/\s+$/, '').gsub(/\s+/, '-')
    id = self.id
    host = "#{RAILS_ENV}_HOST".upcase.constantize
    url = url_for :host => host, :controller => :topics, :action => :show, :id => id
    logger.info "Generating topic (#{title}) archive '#{dir}'."
    pid = fork do
      `wget --page-requisites --html-extension --convert-links --no-directories --recursive --level=1 -np --domains=#{host} --directory-prefix=#{dir} #{url}`
      `mv #{dir}/#{id}.html #{dir}/index.html`
      `zip -mj #{dir}/#{title}.zip #{dir}/*`
    end
    Process.detach(pid)
  end

#### Controller is calling this method
@topic.generate_archive
send_file path, :type => 'application/zip'
###   ####   ###   ###   ###   ####   ###

 - The problem is that when I call this method for the first time,
it fails with a missing-file error. The zip file is not ready at
that point.
 - If I go back and click again, it works fine and instantly
delivers the zip file.
 - Is there any way I can fix this? I tried adding a sleep,
but that isn't working either. And what is causing this problem? I
thought the method above wouldn't return until the background job was complete.

Any clues?

-
Thanks,
CS.
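The short answer to the last question: `fork` returns in the parent immediately, and `Process.detach` only spawns a watcher thread to reap the child; neither waits for the child's work. A minimal demonstration of this (the 0.5s sleep stands in for the wget/zip work):

```ruby
# fork returns immediately in the parent; Process.detach does not wait.
start = Time.now

pid = fork do
  sleep 0.5   # stand-in for the wget/zip commands
end
Process.detach(pid)

elapsed = Time.now - start
# The parent reaches this point long before the child's 0.5s of work is done.
puts elapsed < 0.5   # => true
```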
Neubyr N. (Guest)
on 2009-04-12 20:31
Anyone?

Carlos S. wrote:
> I am trying to generate a static archive (zip) as follows:
> [...]
>  - The problem is when I click/call on this method for the first time
> then it does not work saying missing file. The zip file is not ready at
> that point.
> [...]
Frederick C. (Guest)
on 2009-04-12 20:41
(Received via mailing list)
On 12 Apr 2009, at 05:26, Carlos S. wrote:
> thought above method wouldn't return until background jobs are
> complete.
>
If you want to wait for a process to complete you should be using
Process.wait. Though if you're blocking on it like that, what's the
point of using fork?

Fred
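For reference, a sketch of the blocking variant Fred is pointing at, with a trivial child in place of the archiving commands:

```ruby
# Blocking variant: Process.wait suspends the parent until the child
# exits, and $? then holds the child's exit status.
pid = fork do
  # stand-in for the wget/mv/zip commands
  exit 0
end
Process.wait(pid)
success = $?.success?   # true when the child exited with status 0
puts success
```

As Fred says, though, if you block with Process.wait right after fork, the fork buys you nothing over running the commands inline.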
Neubyr N. (Guest)
on 2009-04-12 23:37
- How do I know when it is complete? Ideally, I would like to send the
zip file once the process finishes rather than using an arbitrary sleep
time.
- Is it possible to get an estimate of this?

-
Thanks,
CS.

Frederick C. wrote:
> On 12 Apr 2009, at 05:26, Carlos S. wrote:
>> thought above method wouldn't return until background jobs are
>> complete.
>>
> If you want to wait for a process to complete you should be using
> Process.wait. Though if you're blocking on it like that what's the
> point of using fork ?
>
> Fred
Neubyr N. (Guest)
on 2009-04-14 03:13
Any help please?

Thanks,
CS.

Carlos S. wrote:
> - How do I know when it is complete? Ideally, I would like to send zip
> file once the process is complete rather than giving arbitrary sleep
> time.
> [...]
Jeff B. (Guest)
on 2009-04-14 22:12
(Received via mailing list)
One way you could do it:

  # in your model meth:
  def generate_archive
    was_success = false
    ...
    fname = "#{dir}/#{title}.zip"
    ...
    prev_fsize = 0
    10.times do   # or some reasonable(?) max number of tries
      sleep 0.5
      fsize = File.size(fname) rescue 0
      if prev_fsize > 0 and prev_fsize == fsize
        # size stopped changing: assume the zip is finished
        was_success = true
        break
      end
      prev_fsize = fsize
    end
    return was_success
  end

  # and in your controller meth:
    ...
    if @topic.generate_archive
      send_file path, :type => 'application/zip'
    else
      # show some err msg ...
    end
    ...

Note that the above assumes the archiving time (and request volume) is
short enough from the user's wait-time perspective (and the app's
capacity). If the process takes too long (and/or request volume is too
high), you probably shouldn't keep the request waiting on the archive;
instead, return or redirect to a page that (either app-driven or
user-triggered) polls for the archiving process to complete.

Jeff

On Apr 13, 4:13 pm, Carlos S. <removed_email_address@domain.invalid>
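Jeff's size-polling loop can also be factored into a small predicate usable from either the model or a later status-check action. This is a hypothetical helper (the name and the stability window are assumptions, not from the thread):

```ruby
# Returns true once the file exists, is non-empty, and its size has
# stopped changing over a short window (i.e. zip has likely finished).
def archive_ready?(path, window = 0.2)
  return false unless File.exist?(path)
  size = File.size(path)
  return false if size.zero?
  sleep window
  File.size(path) == size
end
```

A status-check controller action could then call this and either send_file or redirect back to a "still working" page.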
Neubyr N. (Guest)
on 2009-04-21 16:29
Thank you Jeff.

How can I do it without blocking other requests? I tried to do this
with multi-threading, with no success (I got into an infinite-loop
condition).

Also, I read that when running background system commands it is good
practice to start a new fork rather than a new thread, since that gives
more control over the code.

Is there any way to run this background process without blocking other
requests and also learn its completion status?

Any help appreciated.

Thanks,
CS.

Jeff B.systems wrote:
> One way you could do it:
>
>   # in your model meth:
Jeff B. (Guest)
on 2009-04-21 20:34
(Received via mailing list)
Are you seeing this behavior while running your rails app under a
typical development environment setup, ie a single mongrel instance of
the dev env running?  If so, then the blocking/hanging is due to the
fact that your archiving meth is trying to make an httpclient call
(using wget) back into your single-thread/-process rails app which can
only handle one request at a time.

If this is the case, you'll need to setup/run a second instance of
your dev env (or launch your archiving meth via script/runner, or
setup your dev env to run under mod_rails/passenger, or ...)
specifically to handle your archiving httpclient requests.

Jeff

On Apr 21, 5:29 am, Carlos S. <removed_email_address@domain.invalid>
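Concretely, running a second instance only changes where the internal wget request is aimed. A hypothetical helper sketching that change (the method name, route shape, and port 3001 are assumptions):

```ruby
# Point internal httpclient (wget) calls at a second app instance on a
# different port, so they are not queued behind the very request that
# spawned them in a single-instance dev setup.
def internal_topic_url(host, id, port = 3001)
  "http://#{host}:#{port}/topics/#{id}"
end

internal_topic_url('localhost', 42)   # => "http://localhost:3001/topics/42"
```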
Neubyr N. (Guest)
on 2009-04-23 21:07
Thanks for the insight.
Yes, it is a dev environment with a single mongrel server (if that's
what you mean by a single instance).
But doesn't mongrel handle HTTP requests in a multi-threaded manner?

I tried to move this code into my controller and hit another problem:
http://www.ruby-forum.com/topic/185134

Any clues on how Rails handles model vs. controller code? Any
suggestions on where to place code to get optimum performance from the
Rails framework, in addition to following MVC style?

-
CS.


Jeff B.systems wrote:
> Are you seeing this behavior while running your rails app under a
> typical development environment setup, ie a single mongrel instance of
> the dev env running? [...]
>
> On Apr 21, 5:29 am, Carlos S. <removed_email_address@domain.invalid>
Jeff B. (Guest)
on 2009-04-25 01:00
(Received via mailing list)
If you have a single instance of mongrel running your rails app, then
it is essentially "single threaded" when it comes to handling requests
(see comments in
http://weblog.rubyonrails.org/2006/5/18/interview-...
or http://mongrel.rubyforge.org/wiki/FAQ or ....).

Like I mentioned before, you'll need to setup a second instance of
mongrel to specifically handle those recursive httpclient calls back
into your app under dev env.  If you decide to go this route, you'll
likely want to write a ruby (or Capistrano) script for start/stop/
restart/status of the two dev env mongrel instances: one for your main
dev env running on port 3000(?), and one for your internal httpclient
calls running on port 3001(?), and then mod your internal httpclient
call code to hit port 3001 instead of 3000.

Another alternative would be to setup your dev env to run via
mod_rails/passenger, something like
http://accidentaltechnologist.com/ruby/replicating...
and the single-threaded-blocking issue goes away.

Jeff

On Apr 23, 10:07 am, Carlos S. <removed_email_address@domain.invalid>
Neubyr N. (Guest)
on 2009-04-25 04:48
I am really confused now.
The FAQ says mongrel uses one thread per request,
so it can handle multiple requests (not surprising).

Are you suggesting that my archiving method is trying to make
another HTTP request within the same HTTP request?

I can see the wget requests in the mongrel development.log. Clearly,
it's not blocking those requests.

However, it is blocking other HTTP requests (if I try to access my
application from a browser, it times out or waits forever).
So it is not processing other requests. This could be because the server
has reached its max number of connections (just one possibility).

I tried Passenger and it is really cool. However, there seems to be some
serious problem with my archiving code. When I run my app under
Passenger and trigger the archiving method, the system slows down and I
had to reboot it forcefully.
wget seems to make endless calls to the server.

I am posting my archiving code for ref.:
----------------
  def generate_archive
    dir = "/tmp/topicsys/#{self.id}"
    title = self.title.gsub(/^\s+/, '').gsub(/\s+$/, '').gsub(/\s+/, '-')
    id = self.id
    host = "#{RAILS_ENV}_HOST".upcase.constantize
    url = url_for :host => host, :controller => :topics, :action => :show, :id => id
    logger.info "Generating topic - (#{title}) archive '#{dir}'."
    pid = fork do
      `wget --page-requisites --html-extension --convert-links --no-directories --recursive --level=1 -np --directory-prefix=#{dir} #{url}`
      #`mv #{dir}/#{id}.html #{dir}/index.html`
      `zip -mj #{dir}/#{title}.zip #{dir}/*`
    end
    Process.detach(pid)
  end

----------------

Any clues?




Jeff B.systems wrote:
> If you have a single instance of mongrel running your rails app, then
> it is essentially "single threaded" when it comes to handling requests
> (see comments in
> http://weblog.rubyonrails.org/2006/5/18/interview-...
> or http://mongrel.rubyforge.org/wiki/FAQ or ....).
>
>
> Another alternative would be to setup your dev env to run via
> mod_rails/passenger, something like
> http://accidentaltechnologist.com/ruby/replicating...
> and the single-threaded-blocking issue goes away.
>
> Jeff
>
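One thing worth noting about the code above: the earlier version passed `--domains=#{host}` to wget, and this one does not, so `--recursive --convert-links` is free to follow links off the original host, which could look like "infinite calls". A hypothetical command builder that keeps the crawl constrained (the exact flag set is a sketch, not the thread's final answer):

```ruby
# Build the wget command as an argument array; --domains and --no-parent
# keep the recursive crawl confined to the target host and path. Passing
# an array to system() also avoids shell interpolation of odd titles.
def wget_args(url, host, dir)
  ['wget', '--page-requisites', '--html-extension', '--convert-links',
   '--no-directories', '--recursive', '--level=1', '--no-parent',
   "--domains=#{host}", "--directory-prefix=#{dir}", url]
end
```

Usage would be `system(*wget_args(url, host, dir))` instead of backticks.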
Jeff B. (Guest)
on 2009-04-25 06:28
(Received via mailing list)
From that first link I provided:  "... dizave: Are you guys seeing
performance problems with the fact that rails controllers are
essentially forced to single-threaded under mongrels? ...  Zed A.
Shaw: dizave, just wanted to clarify that Rails is always single
threaded, even under FastCGI, SCGI, and WEBrick. Under Mongrel it’s
just exposed to you and isn’t a dirty little secret. ..."

Trust me, your wget call is being blocked, because it's waiting for the
orig request to generate_archive to finish (which itself is waiting
for your wget call to finish, which it won't until it times out) so
that it can be handled by your single instance mongrel/rails setup.
This is why you need another instance of your rails app running to get
around this blocking problem.  So either fire up a second mongrel dev
env instance on a diff port and use that for your internal wget calls,
or setup your dev env to be served via passenger, or ....

As for the problems you're having with your archiving process, ....
If it were me, I'd pull out that code into a separate class, that is
callable via a class method into which you pass the necessary params
to complete the archiving process, something like:

  class Archiver

    def Archiver.gen_archive(....)
      ....
    end
  end

This way you can easily test your archiving process via unit tests,
console, and/or runner ($ ./script/runner 'puts Archiver.gen_archive
(...)' ), to ensure that it works properly.  Once you know it's
working correctly, then you can mod your controller code accordingly.

Jeff

On Apr 24, 5:48 pm, Carlos S. <removed_email_address@domain.invalid>
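A sketch of the class-method layout Jeff describes, with the shell-out isolated so the command construction can be unit-tested without touching the network (the names and the simplified flag set are assumptions):

```ruby
class Archiver
  # Build the shell commands without running them, so tests and the
  # console can inspect exactly what would execute.
  def self.commands(dir, title, url)
    [
      "wget --page-requisites --convert-links --directory-prefix=#{dir} #{url}",
      "zip -mj #{dir}/#{title}.zip #{dir}/*"
    ]
  end

  # Run the archiving synchronously; suitable for script/runner, unit
  # tests, or a background worker. Returns true only if every step succeeds.
  def self.gen_archive(dir, title, url)
    commands(dir, title, url).all? { |cmd| system(cmd) }
  end
end
```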
Morgan C. (Guest)
on 2009-04-26 05:37
(Received via mailing list)
Carlos S. wrote:
> However, it is blocking other http requests (if I try to access my
> application from a browser, then it times out or waits forever).
> So it is not processing other requests. This could be because server has
> reached max. number of connections.. (just one possibility).
>
Rails does not support multithreading and will only handle one request in
one thread at a time.
Mongrel supports multithreading and will handle multiple requests in
multiple threads.

Background threads and forks are not dependent on the webserver and are
always supported.
> [...]
> Any clues?
>

What do you mean by wget seems to make infinite calls to the server?
Passenger should be no different than mongrel.

Anyway, if you are going to do a lot of zipping and wgetting of large
websites, perhaps you should have a look at Starling and Workling.

Have a look at this railscast that explains how to set this up:
http://railscasts.com/episodes/128-starling-and-workling

Using this setup makes it easy to handle many large background tasks
while preventing the server from being overloaded by running them from a
queue instead of simultaneously when there are many requests.

/Morgan
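The queue idea Morgan describes can be illustrated without Starling/Workling. A minimal in-process sketch using Ruby's thread-safe Queue (the real setup runs the worker in a separate process fed from a queue server, as in the Railscast):

```ruby
require 'thread'

jobs = Queue.new
results = []

# A single worker drains jobs one at a time, so heavy tasks like
# wget+zip never run simultaneously no matter how many are enqueued.
worker = Thread.new do
  while (job = jobs.pop)
    break if job == :stop
    results << job.call
  end
end

# Enqueue three stand-in archiving jobs, then a stop sentinel.
3.times { |i| jobs << lambda { "archive-#{i}" } }
jobs << :stop
worker.join

puts results.inspect   # => ["archive-0", "archive-1", "archive-2"]
```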