Resolv class caching /etc/hosts entries

Hey guys,

I ran into a subtle bug recently: I did not realize that the Resolv
class caches the contents of the /etc/hosts file. This is
problematic, because if the resolution is done in a long-running
process, later changes to the file are not visible to the process
unless the class is reloaded or the process is restarted. I was
wondering if anyone knows the rationale for caching the contents,
and why there are no checks to see whether the file has been
modified? For now, I’ve worked around this by calling
Socket::getaddrinfo, as getaddrinfo does the right thing and picks up
changes to /etc/hosts.
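
A minimal Ruby sketch of that workaround, assuming a Unix-like system
whose C library consults /etc/hosts on every getaddrinfo call (glibc
does, unless a daemon such as nscd is caching host lookups).
`fresh_addresses` is a hypothetical helper name, not part of any
library:

```ruby
require "socket"

# Hypothetical helper: resolve a name through the C library's getaddrinfo
# instead of Ruby's Resolv, so the system resolver consults /etc/hosts on
# each call rather than Resolv answering from its in-process copy.
def fresh_addresses(name)
  # Each entry is [family, port, host, numeric_address, pf, socktype, protocol];
  # index 3 is the numeric address. getaddrinfo returns one entry per socket
  # type, so duplicates are removed.
  Socket.getaddrinfo(name, nil).map { |entry| entry[3] }.uniq
rescue SocketError
  # Raised when the name cannot be resolved at all.
  []
end
```

Called as `fresh_addresses("localhost")`, it returns whatever the
system resolver currently reports, as an array of address strings; an
unresolvable name yields an empty array.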

Hi Timur,

If you’re working with DNS you’re going to run into all sorts of caching
fun in general, so it’s a good idea to be prepared for it. Applications
(eg. browsers), operating systems, ISPs, and nameservers all do their
own caching of DNS results. Did you know, for example, that the data
could be over a week out-of-date because you are using an ISP that
disregards TTLs, and your browser has been caching previous results to
remain responsive?

The general rationale is that it is expensive to always have the latest
information available, whether in terms of the amount of data, the time
taken to parse /etc/hosts, or just the initial wait in making the
request. Resolver libraries are usually called very frequently with
exactly the same data. To keep them responsive, a resolver usually
makes the first request of a given kind and waits for the response;
subsequent similar requests are answered from the cache rather than
waiting for a fresh response each time.

I can’t speak for the Resolv class myself (I’ve never used it), but the
rationale is probably similar. If you’ve got a hard requirement such as
checking /etc/hosts for changes, you might want to consider a wrapper or
proxy class that watches /etc/hosts for changes in its timestamp, and
then does whatever you need: restarting OS-level resolvers, perhaps
dropping and recreating the Resolv object, or somehow forcing it to drop
its cache. Exactly what needs to be done will depend on the problem you
are trying to solve.
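
A minimal Ruby sketch of such a wrapper, assuming that a change in the
hosts file’s modification time or size is a good enough signal that it
was edited. The class name `HostsWatchingResolver` and its `use_dns:`
option are hypothetical; rather than reaching into Resolv’s internal
cache, it simply drops the old Resolv object and builds a new one:

```ruby
require "resolv"

# Hypothetical wrapper (not part of Ruby's standard library). It rebuilds its
# Resolv instance whenever the hosts file's modification time or size
# changes, so edits to the file become visible without a restart.
class HostsWatchingResolver
  def initialize(hosts_path = Resolv::Hosts::DefaultFileName, use_dns: true)
    @hosts_path = hosts_path
    @use_dns = use_dns
    @stamp = nil
    @resolver = nil
    @mutex = Mutex.new
  end

  def getaddress(name)
    current_resolver.getaddress(name)
  end

  def getaddresses(name)
    current_resolver.getaddresses(name)
  end

  private

  # Returns a Resolv built from the current hosts file, replacing the old
  # one (and with it, its cached copy of the file) only when the file changed.
  def current_resolver
    @mutex.synchronize do
      stamp = hosts_stamp
      if @resolver.nil? || stamp != @stamp
        resolvers = [Resolv::Hosts.new(@hosts_path)]
        resolvers << Resolv::DNS.new if @use_dns
        @resolver = Resolv.new(resolvers)
        @stamp = stamp
      end
      @resolver
    end
  end

  # Cheap change signal: one stat call, no parsing.
  def hosts_stamp
    stat = File.stat(@hosts_path)
    [stat.mtime, stat.size]
  rescue Errno::ENOENT
    nil
  end
end
```

Each lookup costs one `File.stat`; the hosts file is only reparsed
after it changes. An edit that leaves both the size and the timestamp
unchanged (possible on filesystems with coarse timestamps) would be
missed, so this is a heuristic rather than a guarantee.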

Hope this helps. :)

Cheers,
Garth

Hi Garth,

Thanks for taking the time to reply!

On Wed, Mar 6, 2013 at 6:39 PM, Garthy D wrote:

> Hi Timur,
>
> If you’re working with DNS you’re going to run into all sorts of caching fun
> in general, so it’s a good idea to be prepared for it. Applications (eg.
> browsers), operating systems, ISPs, and nameservers all do their own caching
> of DNS results. Did you know, for example, that the data could be over a
> week out-of-date because you are using an ISP that disregards TTLs, and your
> browser has been caching previous results to remain responsive?

Yes, I do realize that propagating updates can take a significant
amount of time. However, eventually, they should be propagated – the
behavior you’re describing sounds like a bug.

> The general rationale is that it is expensive to always have the latest
> information available, whether in terms of the amount of data, the time
> taken to parse /etc/hosts, or just the initial wait in making the
> request. Resolver libraries are usually called very frequently with
> exactly the same data. To keep them responsive, a resolver usually
> makes the first request of a given kind and waits for the response;
> subsequent similar requests are answered from the cache rather than
> waiting for a fresh response each time.

Agreed. However, the cache should eventually be invalidated when the
underlying data is updated, to ensure correctness.

> I can’t speak for the Resolv class myself (I’ve never used it), but the
> rationale is probably similar. If you’ve got a hard requirement such as
> checking /etc/hosts for changes, you might want to consider a wrapper or
> proxy class that watches /etc/hosts for changes in its timestamp, and
> then does whatever you need: restarting OS-level resolvers, perhaps
> dropping and recreating the Resolv object, or somehow forcing it to drop
> its cache. Exactly what needs to be done will depend on the problem you
> are trying to solve.

As I mentioned, one workaround is to call getaddrinfo, since glibc
does not cache getaddrinfo responses.

However, I was trying to point out the larger issue here: the Resolv
class (and the Dnsruby gem), which expose these operations through
class methods, would never recover if /etc/hosts is updated. This is
not a delay in propagation, but actually incorrect behavior: from
looking at the source, I didn’t see anything that would cause the
cache to be either invalidated or updated. That in itself looks like
a bug, so I wanted to see whether this was a conscious decision and
whether there are any plans to address it. Maybe “talk” is not the
best venue for it?
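
The behaviour can be reproduced with a short Ruby script. This sketch
points `Resolv::Hosts` at a temporary file instead of the real
/etc/hosts (an assumption made so it runs without root), and
`hosts_cache_demo` is a hypothetical name. On the Ruby versions I’m
aware of, a `Resolv::Hosts` instance parses its file once, on first
use, and never checks it again:

```ruby
require "resolv"
require "tempfile"

# Reproduces the caching: a Resolv::Hosts object keeps answering from the
# copy of the file it parsed on first use. A temporary file stands in for
# /etc/hosts so no root access is needed.
def hosts_cache_demo
  Tempfile.create("hosts") do |file|
    file.write("10.0.0.1 myhost.test\n")
    file.flush

    # Long-lived instance, comparable to the one behind Resolv's class methods.
    cached = Resolv::Hosts.new(file.path)
    before = cached.getaddress("myhost.test") # first lookup parses the file

    # Simulate an administrator editing the hosts file.
    File.write(file.path, "10.0.0.2 myhost.test\n")

    after_cached = cached.getaddress("myhost.test") # answered from the old copy
    after_fresh = Resolv::Hosts.new(file.path).getaddress("myhost.test") # re-parsed
    [before, after_cached, after_fresh]
  end
end

p hosts_cache_demo
```

The printed array should contain the old address twice and then the
new one: only the freshly created object sees the edit, which is why
recreating the object (rather than reloading the whole class) is
enough to pick up changes.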

You are correct, however, to point out that one can work around it by
reloading the class on every query, but that seems like overkill?

Hi Timur,

On 07/03/13 18:15, Timur Alperovich wrote:

> Thanks for taking the time to reply!

Not a problem.

> Yes, I do realize that propagating updates can take a significant
> amount of time. However, eventually, they should be propagated – the
> behavior you’re describing sounds like a bug.

The browser behaviour is an oddity: it’s what you actually want most of
the time, especially if the operating system lookup is sluggish (might
be true on Windows? I’m not sure), but if you’re doing something that
involves messing about with hosts, it’s positively painful to work
with.

For the ISP case, it’s more poor behaviour on the part of an ISP rather
than a bug. :} No ISP should ignore TTLs, because used properly they are
incredibly useful, especially when migrating hosts. But some do. It
drove me crazy when I had to deal with it: lazy behaviour on the part of
ISPs of which you aren’t even a customer can have nasty effects if your
clients are using that ISP.

> You are correct, however, to point out that one can work around it by
> reloading the class on every query, but that seems like overkill?

Oh yes, it’s definitely overkill. I was just saying how I’d work around
it if I was faced with a similar problem and needed a solution in the
face of such a shortcoming. :)

I’d definitely agree that if there is no way to clear the cache short of
dropping the whole object, then there is an issue/shortcoming/bug.

Another thing to bear in mind is that the fault might not be in Resolv
(it might be; I’m not familiar with it). On Linux, for example,
gethostbyname (apparently) goes through nscd, which provides its own
caching. There have been times in the past when I’ve disabled nscd
because its caching caused more problems than it solved. In such a
case, even being able to drop the Resolv cache wouldn’t be enough,
because it is layered on something else that is also caching. This is
all theoretical, but I’m just bringing up something that might be
happening.

Is the fault actually with Resolv? If not, you’re going to have to find
where the problem occurs and find a way around that issue. If yes, it
does seem like the lack of ability to drop the cache is either a
shortcoming or a bug. But in either case, what to do? Suppose no
immediate fix is available, or you need something that works generally.
You might need to add some logic to your app to handle the additional
hard requirement you have (immediate update if /etc/hosts changes), so
that no matter the details of the underlying implementation, your app
behaves as it should. The audience for Resolv might be more geared
toward simpler use cases employed by browsers and net apps, where
ongoing indefinite caching is sufficient.
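
One way to narrow down which layer is stale is to ask several lookup
paths the same question and compare the answers. A rough Ruby sketch,
assuming a Unix-like host; `compare_lookups` is a hypothetical helper,
and a difference only tells you where to look, not why:

```ruby
require "resolv"
require "socket"

# Hypothetical diagnostic helper: ask three different lookup paths about
# the same name so their answers can be compared side by side.
def compare_lookups(name, hosts_path = Resolv::Hosts::DefaultFileName)
  {
    # Brand-new Resolv::Hosts object: parses the hosts file right now
    # (hosts file only, no DNS).
    fresh_hosts_file: Resolv::Hosts.new(hosts_path).getaddresses(name),
    # Resolv's class methods: the long-lived default resolver (hosts, then DNS).
    resolv_default: Resolv.getaddresses(name),
    # The C library's resolver, which may itself go through nscd.
    getaddrinfo: begin
      Socket.getaddrinfo(name, nil).map { |entry| entry[3] }.uniq
    rescue SocketError
      []
    end
  }
end
```

For a name you have just changed in /etc/hosts: if `fresh_hosts_file`
shows the new address but `resolv_default` does not, the stale copy is
inside Resolv; if `getaddrinfo` is stale as well, something below Ruby
(such as nscd) is caching too.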

Having said that, I definitely don’t want to derail any intended
discussion on any shortcomings of the Resolv class. Please don’t take it
that way. :) I’m just running through some possible concerns and
solutions. I’m not suggesting that Resolv is completely fine and
shouldn’t be changed. From what you’ve described, it sounds like there
is an issue in there that needs to be addressed: being able to drop the
cache at a minimum, and detecting source changes (eg. /etc/hosts) at
best.
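
For code that relies on Resolv’s class methods (Resolv.getaddress and
friends), the closest thing to “dropping the cache” I know of is handing
`Resolv::DefaultResolver` fresh resolver objects via its
`replace_resolvers` singleton method. As far as I know, that method
exists to support resolv-replace rather than as a general cache-control
API, so treat this as a sketch under that assumption;
`reset_default_resolver!` is a hypothetical helper:

```ruby
require "resolv"

# Hypothetical helper: make Resolv's class methods forget what they have
# cached by giving the default resolver brand-new Hosts and DNS objects.
# The next lookup re-reads the hosts file (and resolv.conf) lazily.
# Relies on Resolv::DefaultResolver.replace_resolvers, which was added for
# resolv-replace rather than as a general cache-control API.
def reset_default_resolver!(hosts_path = Resolv::Hosts::DefaultFileName)
  unless Resolv::DefaultResolver.respond_to?(:replace_resolvers)
    raise NotImplementedError, "Resolv::DefaultResolver.replace_resolvers is unavailable"
  end

  Resolv::DefaultResolver.replace_resolvers(
    [Resolv::Hosts.new(hosts_path), Resolv::DNS.new]
  )
end
```

This only affects the shared default resolver; any `Resolv` instances
created elsewhere keep their own caches, and any custom resolvers
previously installed on the default resolver are discarded.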

Cheers,
Garth

Hi Garth,

On Mar 7, 2013 12:44 AM, “Garthy D” wrote:

> The audience for Resolv might be more geared toward simpler use cases
> employed by browsers and net apps, where ongoing indefinite caching is
> sufficient.

You may be right about the audience for the module being geared toward
shorter-lived applications. I did look through both the code and the
published API, as well as trying some tests of my own. I did not find a
solution to this caching issue, which is why I looked to the list for
insight. Should I ask the same question on the core Ruby list?

> From what you’ve described, it sounds like there is an issue in there
> that needs to be addressed: being able to drop the cache at a minimum,
> and detecting source changes (eg. /etc/hosts) at best.

Agreed. I appreciate you trying to help. At this point, however, I’d
like to find the best place to figure out what’s going on with that gem
(core mailing list?).

Hi Timur,

On 09/03/13 14:34, Timur Alperovich wrote:

> Agreed. I appreciate you trying to help. At this point, however, I’d
> like to find the best place to figure out what’s going on with that gem
> (core mailing list?).

I can’t say for sure. However, in your shoes I’d probably start here
(ie. ruby-talk). If I had no further luck, under the circumstances, I’d
consider whether filing a bug/feature request here would be appropriate:

I believe this would end up on ruby-core too.

Somebody else might have some better suggestions to add though?

Cheers,
Garth