Hey guys, I recently ran into a subtle bug: I had not realized that the Resolv class caches the contents of the /etc/hosts file. This is a problem in a long-running process, because any later changes to the file are not visible to the process unless the class is reloaded or the process is restarted. Does anyone know the rationale for caching the contents, and why there is no check for whether the file has been modified? For now, I've worked around this by calling Socket::getaddrinfo, as getaddrinfo does the right thing.
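A minimal sketch of the getaddrinfo workaround described above, in case it helps anyone reading along. The helper name `resolve_via_os` is made up for illustration; `Socket.getaddrinfo` itself is part of Ruby's standard socket library and hands the lookup to the operating system's resolver on every call. Whether the OS layer adds caching of its own depends on the system configuration, so treat this as an assumption to verify rather than a guarantee.

```ruby
require "socket"

# Hypothetical helper (not part of Ruby): asks the OS resolver for the
# addresses of +host+ on every call instead of going through Resolv.
# Raises SocketError if the name cannot be resolved.
def resolve_via_os(host)
  # Each entry looks like
  # [family, port, hostname, numeric_address, af, socktype, protocol];
  # index 3 holds the numeric address string.
  Socket.getaddrinfo(host, nil, nil, Socket::SOCK_STREAM)
        .map { |info| info[3] }
        .uniq
end
```

Calling `resolve_via_os("some-host")` again after editing /etc/hosts should reflect the edit, as long as nothing below Ruby is caching the answer.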
on 2013-03-06 23:42
on 2013-03-07 03:40
Hi Timur,

If you're working with DNS you're going to run into all sorts of caching fun in general, so it's a good idea to be prepared for it. Applications (e.g. browsers), operating systems, ISPs, and nameservers all do their own caching of DNS results. Did you know, for example, that the data could be over a week out of date because you are using an ISP that disregards TTLs, and your browser has been caching previous results to remain responsive?

The general rationale is that it is expensive to always have the latest information available, whether in terms of the amount of data, the time taken to parse /etc/hosts, or the initial wait in making the request. Resolver libraries are usually called very frequently with exactly the same data. To keep libraries responsive, usually the first request of a given kind is made and the resolver waits for a response. Subsequent similar requests use the cached result, rather than waiting for a response each time.

I can't speak for the Resolv class myself (I've never used it), but the rationale is probably similar. If you've got a hard requirement such as checking /etc/hosts for changes, you might want to consider a wrapper or proxy class that watches /etc/hosts for changes in its timestamp, and then does whatever you need: restarting OS-level resolvers, perhaps dropping and recreating the Resolv object, or somehow forcing it to drop its cache. Exactly what needs to be done will depend on the problem you are trying to solve.

Hope this helps. :)

Cheers,
Garth
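The wrapper idea suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: `ReloadingHosts` is a hypothetical class name, not something shipped with Ruby, and it covers only the /etc/hosts side by wrapping `Resolv::Hosts` (it does nothing about DNS or OS-level caches). It recreates the `Resolv::Hosts` object whenever the file's modification time changes.

```ruby
require "resolv"

# Hypothetical wrapper (not part of Ruby's standard library): rebuilds the
# underlying Resolv::Hosts object whenever the hosts file's mtime changes.
class ReloadingHosts
  def initialize(path = Resolv::Hosts::DefaultFileName)
    @path = path
    @mutex = Mutex.new
    @mtime = nil
    @hosts = nil
  end

  # First address for +name+; raises Resolv::ResolvError if there is none.
  def getaddress(name)
    current_hosts.getaddress(name)
  end

  # All addresses for +name+ (an empty array if there are none).
  def getaddresses(name)
    current_hosts.getaddresses(name)
  end

  private

  # Returns a Resolv::Hosts that matches the file as of its current mtime.
  def current_hosts
    @mutex.synchronize do
      mtime = File.mtime(@path)
      if @hosts.nil? || mtime != @mtime
        @hosts = Resolv::Hosts.new(@path)
        @mtime = mtime
      end
      @hosts
    end
  end
end
```

The cost is one `File.mtime` call per lookup, which is far cheaper than re-parsing the file each time. An edit that lands within the filesystem's timestamp granularity of the previous one could be missed, so comparing file size as well might be worth considering.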
on 2013-03-07 08:47
Hi Garth,

Thanks for taking the time to reply!

On Wed, Mar 6, 2013 at 6:39 PM, Garthy D <garthy_lmkltybr@entropicsoftware.com> wrote:
> If you're working with DNS you're going to run into all sorts of caching fun in general, so it's a good idea to be prepared for it. Applications (eg. browsers), operating systems, ISPs, and nameservers all do their own caching of DNS results. Did you know, for example, that the data could be over a week out-of-date because you are using an ISP that disregards TTLs, and your browser has been caching previous results to remain responsive?

Yes, I do realize that propagating updates can take a significant amount of time. However, eventually, they should be propagated -- the behavior you're describing sounds like a bug.

> The general rationale is that it is expensive to always have the latest information available- in terms of amount of data, or just the time taken to parse /etc/hosts, or just the initial wait in making the request. Resolver libraries are usually called very frequently with exactly the same data. To keep libraries responsive, usually the first request of a sort is made, and the resolver waits for a response. The next similar requests use cached results, rather than waiting for a response each time.

Agreed. However, the cache should eventually be invalidated when there is an update, to ensure correctness.

> I can't speak for the Resolv class myself- I've never used it- but the rationale is probably similar. If you've got a hard requirement such as checking /etc/hosts for changes, you might want to consider a wrapper or proxy class that watches /etc/hosts for changes in its timestamp, and then does whatever you need- restarting OS-level resolvers, perhaps dropping and recreating the Resolv object, or somehow forcing it to drop its cache. Exactly what needs to be done will depend on the problem you are trying to solve.

As I mentioned, one workaround is to call getaddrinfo, since glibc does not cache getaddrinfo responses. However, I was trying to point out the larger issue here: the Resolv class (and the Dnsruby gem), which expose these operations through class methods, would never pick up an update to /etc/hosts. This is not a delay in propagation, but actually incorrect behavior -- looking through the source, I didn't see anything that would cause the cache to be either invalidated or updated. That in itself looks like a bug, so I wanted to see if this was a conscious decision and whether there are any plans to address it. Maybe "talk" is not the best venue for it?

You are correct, however, to point out that one can work around it by reloading the class on every query, but that seems like overkill?
on 2013-03-07 09:44
Hi Timur,

On 07/03/13 18:15, Timur Alperovich wrote:
> Thanks for taking the time to reply!

Not a problem.

> Yes, I do realize that propagating updates can take a significant amount of time. However, eventually, they should be propagated -- the behavior you're describing sounds like a bug.

The browser behaviour is an oddity. It's what you actually want most of the time, especially if the operating system lookup is sluggish (might be true on Windows? I'm not sure), but if you're doing something that involves messing about with hosts, it's positively painful to work with.

For the ISP case, it's more poor behaviour on the part of an ISP than a bug. :} None *should* ignore TTL, because used properly it is incredibly useful, especially when migrating hosts. But some do. It drove me crazy when I had to deal with it: lazy behaviour on the part of ISPs of which you aren't even a customer can have nasty effects if your *clients* are using that ISP.

> As I mentioned, one workaround is to call getaddrinfo, since glibc does not cache getaddrinfo responses.
>
> You are correct, however, to point out that one can work around it by reloading the class on every query, but that seems like overkill?

Oh yes, it's definitely overkill. I was just saying how I'd work around it if I was faced with a similar problem and needed a solution in the face of such a shortcoming. :)

I'd definitely agree that if there is no way to clear the cache short of dropping the whole object, then there is an issue/shortcoming/bug.

Another thing to bear in mind is that the fault might not be in Resolv (it might be, I'm not familiar with it). On Linux, for example, gethostbyname (apparently) goes through nscd, which provides its own caching. I know there have been times in the past when I've disabled that, where the caching has caused more problems than it solved. In this case, even being able to drop the Resolv cache wouldn't be enough, because it is layered on something else that is providing caching. Now this is all theoretical, but I'm just bringing up something that might be happening. Is the fault actually with Resolv? If not, you'll need to find where the problem occurs, and find a way around that issue. If yes, it does seem like the lack of ability to drop the cache is either a shortcoming or a bug.

But in either case, what to do? Suppose no immediate fix is available, or you need something that works generally. You might need to add some logic to your app to handle the additional hard requirement you have (immediate update if /etc/hosts changes), to ensure that no matter the details of the underlying implementation, your app behaves as it should. The audience for Resolv might be more geared toward simpler use cases employed by browsers and net apps, where ongoing indefinite caching is sufficient.

Having said that, I definitely don't want to derail any intended discussion on any shortcomings of the Resolv class. Please don't take it that way. :) I'm just running through some possible concerns and solutions. I'm not suggesting that Resolv is completely fine and shouldn't be changed. From what you've described, it sounds like there is an issue in there that needs to be addressed: being able to drop the cache at a minimum, and detecting source changes (e.g. /etc/hosts) at best.

Cheers,
Garth
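For the question of whether the fault is actually with Resolv, one rough check is to compare a long-lived `Resolv::Hosts` object against a freshly constructed one that reads the same file. The helper below is a hypothetical sketch (the name `hosts_lookup_report` and the shape of the returned hash are made up for illustration). It relies on the assumption that `Resolv::Hosts` reads the hosts file directly from Ruby, so a disagreement between the two objects would point at the Ruby-level cache rather than at something like nscd.

```ruby
require "resolv"

# Hypothetical diagnostic helper, not part of the Resolv API.
# Compares what a long-lived Resolv::Hosts object returns for +name+ with
# what a brand-new object reading the same file returns.
def hosts_lookup_report(name, long_lived, path = Resolv::Hosts::DefaultFileName)
  cached = long_lived.getaddresses(name).sort
  fresh = Resolv::Hosts.new(path).getaddresses(name).sort
  # :stale is true only when the two objects disagree.
  { cached: cached, fresh: fresh, stale: cached != fresh }
end
```

Running this periodically inside the long-running process, with the resolver object the process already holds, would show whether that object has drifted from the file on disk. If `:stale` stays false while lookups are still wrong, the caching is more likely happening in a lower layer.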
on 2013-03-09 05:05
Hi Garth,

On Mar 7, 2013 12:44 AM, "Garthy D" <garthy_lmkltybr@entropicsoftware.com> wrote:
> The audience for Resolv might be more geared toward simpler use cases employed by browsers and net apps, where ongoing indefinite caching is sufficient.

You may be right about the audience for the module being geared toward shorter-lived applications. I did look through both the code and the published API, and also tried some tests of my own. I did not find a solution to this caching issue, which is why I looked to the list for insight. Should I ask the same question on the core ruby list?

> Having said that- I definitely don't want to derail any intended discussion on any shortcomings of the Resolv class. Please don't take it that way. :) I'm just running through some possible concerns and solutions. I'm not suggesting that Resolv is completely fine and shouldn't be changed. From what you've described, it sounds like there is an issue in there that needs to be addressed- being able to drop the cache at a minimum, and detecting source changes (eg. /etc/hosts) at best.

Agreed. I appreciate you trying to help. At this point, however, I'd like to figure out the best place to figure out what's going on with that gem (core mailing list?).
on 2013-03-10 01:52
Hi Timur,

On 09/03/13 14:34, Timur Alperovich wrote:
> Agreed. I appreciate you trying to help. At this point, however, I'd like to figure out the best place to figure out what's going on with that gem (core mailing list?).

I can't say for sure. However, in your shoes I'd probably start here (i.e. ruby-talk). If I had no further luck, under the circumstances, I'd consider whether filing a bug/feature request here would be appropriate:

https://bugs.ruby-lang.org/

I believe this would end up on ruby-core too.

Somebody else might have some better suggestions to add though?

Cheers,
Garth