Asynchronous HTTP POST?

Hey everyone, I’m new to Ruby and to the mailing list, so go easy.
Basically, I have to POST to a certain URL, then I wait for a response.
The catch is that I have to do this to two URLs at once. Both of them
may respond to me almost instantly, or they may take up to 10 seconds to
respond. I need to have a post to both of these running at all times to
catch incoming events. I will also need to post to other URLs at the
same time that these are running. So, I need to find a way to run these
two posts in the background constantly. From what I’ve read, Ruby
threads will hang on a command like this, since the interpreter does not
have control. Can anyone help (or understand) me?

Thanks,
Ivan

Ezra Z. wrote:

Ivan-

This is a perfect job for eventmachine and em-http-request. You can run
as many async http requests as you want without blocking and handle the
results with callback blocks.

In small-scale cases (such as a simple client), is there any reason not
to use threads? EM just seems like overkill for a fairly simple client.

On Sep 9, 2009, at 10:15 PM, Ivan S. wrote:

So, I need to find a way to run these two posts in the background
constantly. From what I’ve read, Ruby threads will hang on a command
like this, since the interpreter does not have control. Can anyone help
(or understand) me?

Thanks,
Ivan


Ivan-

This is a perfect job for eventmachine and em-http-request. You can
run as many async http requests as you want without blocking and
handle the results with callback blocks.
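
Roughly like this for your two URLs (an untested sketch; assumes the
em-http-request gem, and the URLs are placeholders):

require 'eventmachine'
require 'em-http'   # from the em-http-request gem

EM.run do
  # fire both POSTs at once; neither one blocks the other
  ['http://site-one.example/', 'http://site-two.example/'].each do |url|
    http = EventMachine::HttpRequest.new(url).post :body => {'key' => 'value'}
    http.callback { puts "#{url} -> #{http.response_header.status}" }
    http.errback  { puts "#{url} failed" }
  end
end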

Cheers-

Ezra Z.
[email protected]

Joel VanderWerf wrote:

Ezra Z. wrote:

Ivan-

This is a perfect job for eventmachine and em-http-request. You can run
as many async http requests as you want without blocking and handle the
results with callback blocks.

In small-scale cases (such as a simple client), is there any reason not
to use threads? EM just seems like overkill for a fairly simple client.

Apparently, since control is not returned to the interpreter, when one
thread waits the other(s) will not continue. At least that’s my
understanding.

On Thursday 10 September 2009 15:48:56 Ivan S. wrote:

Apparently, since control is not returned to the interpreter, when one
thread waits the other(s) will not continue. At least that’s my
understanding.

A quick test seems to show that isn’t the case. I wrote a simple WEBrick
servlet that accepts a POST request and delays for a specified amount of
time (from the delay parameter to the post), and a client with 2 threads
that post to those URLs and keep track of when things start and end:

delay_servlet.rb:
require 'webrick'
require 'time'

class DelayServlet < WEBrick::HTTPServlet::AbstractServlet
  def do_POST(request, response)
    start_time = Time.now
    delay = 0
    if request.query["delay"]
      delay = request.query["delay"].to_i
    end

    sleep(delay)

    end_time = Time.now
    response.body = "delayed for #{delay}s, started at " +
      "#{start_time.iso8601}, ended at #{end_time.iso8601}\n"
  end
end

if __FILE__ == $0
  server = WEBrick::HTTPServer.new(:Port => 8000)
  server.mount("/", DelayServlet)

  trap("INT") { server.shutdown }
  server.start
end

delay_client.rb:
require 'net/http'
require 'time'

if __FILE__ == $0
  puts "Main thread start at #{Time.now.iso8601}"

  t1 = Thread.new do
    puts "Thread 1 start at #{Time.now.iso8601}"
    res = Net::HTTP.post_form(URI.parse('http://localhost:8000/'),
                              {'delay' => '5'})
    puts "Response: " + res.body
    puts "Thread 1 end at #{Time.now.iso8601}"
  end

  t2 = Thread.new do
    puts "Thread 2 start at #{Time.now.iso8601}"
    res = Net::HTTP.post_form(URI.parse('http://localhost:8000/'),
                              {'delay' => '7'})
    puts "Response: " + res.body
    puts "Thread 2 end at #{Time.now.iso8601}"
  end

  t1.join
  t2.join
  puts "Main thread end at #{Time.now.iso8601}"
end

Output:
Main thread start at 2009-09-10T16:46:17-04:00
Thread 1 start at 2009-09-10T16:46:17-04:00
Thread 2 start at 2009-09-10T16:46:17-04:00
Response: delayed for 5s, started at 2009-09-10T16:46:17-04:00, ended at
2009-09-10T16:46:22-04:00
Thread 1 end at 2009-09-10T16:46:22-04:00
Response: delayed for 7s, started at 2009-09-10T16:46:17-04:00, ended at
2009-09-10T16:46:24-04:00
Thread 2 end at 2009-09-10T16:46:24-04:00
Main thread end at 2009-09-10T16:46:24-04:00

So it sure looks like it isn’t blocking all threads when waiting for an
HTTP response.

Ben

Ben G. wrote:

A quick test seems to show that isn’t the case.

[snip test code and output]

So it sure looks like it isn’t blocking all threads when waiting for an
HTTP response.

Ben

Sure looks like you’re right. Here’s where I got that idea in my head:

http://www.rubycentral.com/pickaxe/tut_threads.html

"""
Multithreading

Often the simplest way to do two things at once is by using Ruby
threads. These are totally in-process, implemented within the Ruby
interpreter. That makes the Ruby threads completely portable—there is
no reliance on the operating system—but you don’t get certain benefits
from having native threads. You may experience thread starvation (that’s
where a low-priority thread doesn’t get a chance to run). If you manage
to get your threads deadlocked, the whole process may grind to a halt.
(!!!) And if some thread happens to make a call to the operating system
that takes a long time to complete, all threads will hang until the
interpreter gets control back. (!!!) However, don’t let these
potential problems put you off—Ruby threads are a lightweight and
efficient way to achieve parallelism in your code.
"""

(Sorry, I’m unsure if I’m allowed to use HTML tags or anything here, but
I think this will do. Looks like the FAQ link is broken.) Is this a
blatant lie? Maybe someone can explain to me what is actually being
referred to?

Thanks,
Ivan

Ezra Z. wrote:


Ivan-

This is a perfect job for eventmachine and em-http-request. You can
run as many async http requests as you want without blocking and
handle the results with callback blocks.

http://github.com/igrigorik/em-http-request

Cheers-

Ezra Z.
[email protected]

I couldn’t seem to get this running with threads, so I’m trying
eventmachine. I can get a single post to run fine with a callback, but
what do I have to do to get continuous posts running? I need to have a
post to the site going at all times, while handling the responses.
Documentation/examples seem very hard to find. A decent em-http-request
tutorial would be great.

Ivan S. wrote:

You may experience thread starvation (that’s where a low-priority thread
doesn’t get a chance to run). If you manage
to get your threads deadlocked, the whole process may grind to a halt.
(!!!) And if some thread happens to make a call to the operating system
that takes a long time to complete, all threads will hang until the
interpreter gets control back. (!!!) However, don’t let these
potential problems put you off—Ruby threads are a lightweight and
efficient way to achieve parallelism in your code.
"""

Here’s my relatively naive understanding of the situation (for MRI 1.8):

System calls will block all threads, except in a few cases. The
exceptions include:

  1. Waiting on IO. Ruby’s threads are really an abstraction over a single
     native thread calling select() on all the file descriptors that the
     Ruby threads are waiting on. When an fd is ready for reading, say, the
     native thread starts executing the Ruby thread that was waiting for
     that fd.

  2. Starting processes and waiting for them to finish. This is why

       Thread.new { system "long-running process" }

     is a useful idiom (and it even works on Windows).

Still, if you expect a lot of threads, EM will probably be much more
efficient.

But many other system calls (#flock without the nonblock flag, for
example) will block all Ruby threads.
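
A rough way to see the difference (an untested sketch, assuming MRI 1.8
and a second process already holding an exclusive lock on /tmp/lockfile,
which is just a placeholder path):

ticker = Thread.new do
  10.times { puts "tick #{Time.now}"; sleep 1 }
end

# sleep (and socket reads) hand control back to the thread scheduler,
# so the ticker keeps printing while this thread waits:
Thread.new { sleep 3 }.join

# flock without File::LOCK_NB is a plain blocking system call; while it
# waits for the lock, the ticker stops printing as well:
File.open("/tmp/lockfile", "w") { |f| f.flock(File::LOCK_EX) }

ticker.join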

Ivan S. wrote:

I couldn’t seem to get this running with threads, so I’m trying
eventmachine.

I think EM is overkill here. The following example uses PUT not POST,
but I’m sure you’ll be able to adapt it.

require 'net/http'
require 'uri'

# Configuration variables:
THREAD_COUNT = 10
REQUESTS_PER_THREAD = 10
FILENAME = 'file_to_put'
URL = 'http://localhost/DropBox/file_to_put'

# Put a data string to the specified url:
def urlput(url, data)
  begin
    uri = URI.parse(url)
    response = nil
    value = nil
    Net::HTTP.start(uri.host) { |http|
      response, value = http.put(uri.path, data, nil)
    }
    p response.message if (response.code.to_i >= 300)
  rescue => e
    p e
  end
  value
end

# Read the file to put:
data = File.new(FILENAME).read

start = Time.now
$threads = []
(1..THREAD_COUNT).each { |thread|
  $threads << Thread.new(thread) { |thread_no|
    (1..REQUESTS_PER_THREAD).each {
      urlput(URL, data)
    }
  }
}
$threads.each { |aThread| aThread.join }
puts "#{THREAD_COUNT*REQUESTS_PER_THREAD} requests completed in " +
     "#{Time.now - start} seconds"

Clifford H.

On Thu, Sep 10, 2009 at 1:48 PM, Ivan S. [email protected]
wrote:

Apparently, since control is not returned to the interpreter, when one
thread waits the other(s) will not continue. At least that’s my
understanding.

In MRI you can do multiplexed I/O across threads; however, the code that
implements this will make your eyes bleed (eval.c).

On Sep 12, 2009, at 9:15 PM, Clifford H. wrote:

Ivan S. wrote:

I couldn’t seem to get this running with threads, so I’m trying
eventmachine.

I think EM is overkill here.

I disagree that EM is overkill here. EM is not a heavyweight library,
and it does a much better job of this type of async HTTP work than
threads and net/http do, so it really should be the preferred way of
doing something like this.

require 'eventmachine'
require 'em-http'   # EventMachine::HttpRequest comes from the em-http-request gem

def make_request(site = 'http://www.website.com/', body = {})
  http = EventMachine::HttpRequest.new(site).post :body => body
  http.errback { p 'request failed' }
  http.callback {
    p http.response_header.status
    p http.response_header
    p http.response
  }
end

EM.run do
  # make a request every 1 second
  EM.add_periodic_timer(1) do
    make_request "http://foo.com/", :param => 'hi', :param2 => 'there'
  end
end

Look ma, no threads, but I still get full async concurrent network IO.
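
If you’d rather always have a request outstanding (long-polling style)
instead of firing on a fixed timer, you can re-issue the POST from the
callback itself. Untested sketch, same em-http assumption, placeholder
URLs:

def poll(site, body)
  http = EventMachine::HttpRequest.new(site).post :body => body
  http.callback {
    p http.response
    poll(site, body)                       # start the next request right away
  }
  http.errback {
    EM.add_timer(1) { poll(site, body) }   # back off a little on errors
  }
end

EM.run do
  poll('http://foo.com/', :param => 'hi')
  poll('http://bar.com/', :param => 'there')
end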

Cheers-

Ezra Z.
[email protected]

Eric, it’s great that you thought about this, as I’m currently stuck on
this.

However, your solution won’t work. The http.start call triggers
Net::HTTP#start, which can take quite a while to complete. In fact, it
will take much longer than the actual request in cases where the host
wasn’t queried for some time (and thus isn’t cached).

Ya, obviously this doesn’t parallelize the connect, just the request.
Unless you’re doing SSL, the only blocking thing Net::HTTP#connect does
is the underlying TCPSocket.open. If that is your bottleneck and you’ve
already set open_timeout as low as you can go, then you’d have to patch
deeper to get Net::HTTP to use connect_nonblock, as per
http://www.ruby-doc.org/core/classes/Socket.html#M002091, instead of
TCPSocket.open.
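
For reference, the connect_nonblock dance from that doc page looks
roughly like this (a standalone sketch with a placeholder host, not
wired into Net::HTTP):

require 'socket'

addr = Socket.pack_sockaddr_in(80, 'example.com')
sock = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)
begin
  sock.connect_nonblock(addr)
rescue Errno::EINPROGRESS
  # the connect is underway; wait for writability with a timeout
  if IO.select(nil, [sock], nil, 5)
    begin
      sock.connect_nonblock(addr)   # re-issue to check the result
    rescue Errno::EISCONN
      # connected
    end
  else
    sock.close
    raise 'connect timed out'
  end
end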

Jaap H. wrote:

Eric, it’s great that you thought about this, as I’m currently stuck on
this.

However, your solution won’t work. The http.start call triggers
Net::HTTP#start, which can take quite a while to complete. In fact, it
will take much longer than the actual request in cases where the host
wasn’t queried for some time (and thus isn’t cached).

I’m not sure why Ruby doesn’t provide the ability to send the request
without reading the response, but it’s fairly trivial to split the
Net::HTTP#request method into two halves to do so, as per below:

require 'net/http'

module Net
  class HTTP < Protocol
    # pasted first half of HTTP#request that writes the request to the
    # server, does not return an HTTPResponse and does not take a block
    def request_async(req, body = nil)
      if proxy_user()
        unless use_ssl?
          req.proxy_basic_auth proxy_user(), proxy_pass()
        end
      end

      req.set_body_internal body
      begin_transport req
      req.exec @socket, @curr_http_version, edit_path(req.path)
    end

    # second half of HTTP#request that yields or returns the response
    def read_response(req, body = nil, &block)  # :yield: +response+
      begin
        res = HTTPResponse.read_new(@socket)
      end while res.kind_of?(HTTPContinue)
      res.reading_body(@socket, req.response_body_permitted?) {
        yield res if block_given?
      }
      end_transport req, res

      res
    end
  end
end

Example usage for a non-blocking GET without following redirects:

http = Net::HTTP.new('www.google.com')
req = Net::HTTP::Get.new('/')
http.start
begin
  http.request_async(req)

  # do other stuff

  res = http.read_response(req)
ensure
  http.finish
end
res.value  # raise if error
p res.body
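
And a rough sketch (untested, placeholder hosts and paths) of using that
split to keep two POSTs in flight at the same time, which is Ivan’s
original case:

http1 = Net::HTTP.new('site-one.example.com')
http2 = Net::HTTP.new('site-two.example.com')

req1 = Net::HTTP::Post.new('/events')
req2 = Net::HTTP::Post.new('/events')
req1.set_form_data('key' => 'value')
req2.set_form_data('key' => 'value')

http1.start
http2.start

# both requests go out before either response is read
http1.request_async(req1)
http2.request_async(req2)

res1 = http1.read_response(req1)
res2 = http2.read_response(req2)
p res1.body
p res2.body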