Net::HTTP and caching files

I would like to open a webpage, only if the page is newer than what I
already have.

It looks like I have to get the whole page to get the last_modified
value. I can’t see any other way to get the value, for example from a
Net::HTTP::Head request.

I was hoping to save some bandwidth. Am I SOL?

~S

Shea M. wrote the following on 02.08.2007 17:24 :

You don’t use the last_modified value like that. You make a simple GET
but you pass headers to tell the server that you only want the whole
page if the content has been modified.

So you’ll have to
1/ store the ‘last-modified’ and ‘etag’ headers from the response (when
the page has been modified, on the first fetch, or once the server
starts sending them).
2/ put them in the headers of your GET request when you have them, like
this:

headers = {}
headers["If-Modified-Since"] = last_modified if last_modified
headers["If-None-Match"] = etag if etag

3/ check whether response.is_a?(Net::HTTPNotModified); if it is, your
stored copy is still current.
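
A minimal sketch of the three steps above, using Net::HTTP from the Ruby standard library. The helper names conditional_headers and fetch_if_modified, and the layout of the cache hash, are assumptions made up for illustration and are not part of any library:

```ruby
require 'net/http'
require 'uri'

# Step 2: build the conditional headers from whatever was stored earlier.
# (conditional_headers is a hypothetical helper name.)
def conditional_headers(last_modified, etag)
  headers = {}
  headers['If-Modified-Since'] = last_modified if last_modified
  headers['If-None-Match'] = etag if etag
  headers
end

# Fetch url, reusing the cached copy when the server answers 304.
# cache is a hypothetical hash with :body, :last_modified and :etag keys.
def fetch_if_modified(url, cache = {})
  uri = URI(url)
  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == 'https') do |http|
    http.get(uri.request_uri,
             conditional_headers(cache[:last_modified], cache[:etag]))
  end

  # Step 3: 304 Not Modified means the stored copy is still current.
  return cache if response.is_a?(Net::HTTPNotModified)

  response.value # raises unless the response is a 2xx success

  # Step 1: store the body together with the validators the server sent.
  { body: response.body,
    last_modified: response['last-modified'],
    etag: response['etag'] }
end
```

Calling it as cache = fetch_if_modified(url, cache) on every run should keep the stored body untouched whenever the server answers 304, so only the headers travel over the wire.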

Lionel

Shea M. wrote:

It looks like Net::HTTP::Options might be what I want, just trying to
decipher the docs for it now.

~S

Just read what ‘etag’ is. Do I actually need mtime, if I have etag?

Depends on the server on the other side. Both serve roughly the same
purpose (‘last_modified’ can’t be reliably parsed as an accurate date, as
there are servers with inaccurate clocks or bad timezone settings), but
either one can be used at the server’s discretion. If you don’t know
in advance which server you’ll fetch information from and which header
it will respond with, it is better to implement support for both.
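
A small sketch of what supporting both could look like. The helper names validators_from and request_headers_for are hypothetical; the only library calls used are the case-insensitive header accessors on Net::HTTPResponse. The ‘last-modified’ value is kept as an opaque string and sent back verbatim, so it never has to be parsed as a date:

```ruby
require 'net/http'

# Collect whichever validators the server chose to send.
# (validators_from is a hypothetical helper name.)
def validators_from(response)
  validators = {}
  validators[:etag] = response['etag'] if response['etag']
  if response['last-modified']
    validators[:last_modified] = response['last-modified']
  end
  validators
end

# Turn the stored validators into conditional request headers.
# (request_headers_for is a hypothetical helper name.)
def request_headers_for(validators)
  headers = {}
  headers['If-None-Match'] = validators[:etag] if validators[:etag]
  if validators[:last_modified]
    headers['If-Modified-Since'] = validators[:last_modified]
  end
  headers
end
```

A server that sends only one of the two headers then gets only the matching conditional header back, and a server that sends neither gets a plain GET.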

Lionel

I have just tried about 20 servers (random URLs), and have not seen an
etag or last_modified on any of them. Are there really so few
servers that support the two?

Am I doing something wrong?

I am on win32 if it matters.

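CODE:
require 'net/http'

# Conditional request header; the date is just an example value.
h = {}
h['If-Modified-Since'] = 'Thu, 09 Aug 2007 17:33:40 GMT'

http = Net::HTTP.new("www.google.com")
# Pass the headers along with the request, otherwise they are never sent.
resp = http.get("/index.html", h)
p "r is #{resp}"
p "code is #{resp.code}"
# Print every response header.
resp.each { |k, v| p "#{k} = #{v}" }

#open( "http://google.ca" ) do |f|
#  p f.last_modified
#end

exit 0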

~S

Shea M. wrote the following on 02.08.2007 20:35 :

Of course Google won’t send you last_modified or etag headers: they don’t
have documents to tag with an etag or to mark as generated at a given
time. If they want to optimize bandwidth they are far more likely
to use cache-control headers, which they do for their main page with:
“Cache-Control: private”

Look at RSS or Atom feeds: between 1/3 and 1/2 of them have
“last-modified” headers, and they are “modified” whenever a new article or
comment is posted…

Lionel.

Lionel B. wrote:

Just read what ‘etag’ is. Do I actually need mtime, if I have etag?

Thanks,

~S