From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michael Tremer
To: development@lists.ipfire.org
Subject: Re: [RFC] unbound: Increase timeout value for unknown dns-server
Date: Mon, 11 Jan 2021 11:10:39 +0000
Message-ID: <1468B7A9-ECA3-4B77-A4A1-30FBB114C6CB@ipfire.org>
In-Reply-To: <096e8184-7dd0-e081-8b5a-c1f7c8dff476@gmail.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============3893513123639227440=="
List-Id:

--===============3893513123639227440==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

> On 9 Jan 2021, at 18:57, Paul Simmons wrote:
>
> On 1/9/21 9:04 AM, Michael Tremer wrote:
>> Hi,
>>
>> In that case, I do not think that this change realistically changes anything for anyone.
>>
>> In Paul’s case, where the name servers are further away than the timeout, he would send another packet, but then receive the first reply (not regarding any actual packet loss here), and after that unbound will have learned that the name server is further away.
>>
>> He would have sent one extra packet. Potentially re-probing will cause the same effect, but usually unbound should be busy enough to have a rolling mean that is up to date at any time.
>>
>> Therefore this only matters in recursor mode where there are many servers being contacted instead of only a few forwarders. Again, there would be more overhead here, but there should not be any effect where names cannot be resolved.
>>
>> We can now increase the timeout, which will cause slower resolution for many users that are running in recursor mode, or we can just leave it and nothing would change.
>>
>> -Michael
>>
>>> On 8 Jan 2021, at 17:33, Jonatan Schlag wrote:
>>>
>>> Hi,
>>>
>>> I will try to provide some explanations to the questions.
>>>
>>>> On 06.01.2021 at 19:01, Michael Tremer wrote:
>>>>
>>>> Hello,
>>>>
>>>>> On 6 Jan 2021, at 16:19, Tapani Tarvainen wrote:
>>>>>
>>>>> On Wed, Jan 06, 2021 at 03:14:52PM +0000, Michael Tremer (michael.tremer(a)ipfire.org) wrote:
>>>>>
>>>>>>> On 6 Jan 2021, at 12:02, Paul Simmons wrote:
>>>>>>>
>>>>>>> On 1/6/21 4:17 AM, Jonatan Schlag wrote:
>>>>>>>> When unbound has no information about a DNS-server
>>>>>>>> a timeout of 376 msec is assumed. This works well in a lot of situations,
>>>>>>>> but they mention in their documentation that this could be way too low.
>>>>>>>> They recommend a timeout of 1126 msec for satellite connections
>>>>>>>> (https://nlnetlabs.nl/documentation/unbound/unbound.conf).
>>>>> A small nit, they actually suggest 1128 ... and that's indeed what
>>>>> the patch has:
>>>>>
>>>>>>>> + unknown-server-time-limit: 1128
>>>>> But that's trivial. The point:
>>>>>
>>>>>> I am not entirely sure what this is supposed to fix.
>>>>>> It is possible that a DNS response takes longer than 376ms, indeed.
>>>>>> Does it harm us if we send another packet? No.
>>>>> If you are behind a slow satellite link, it can take more than that
>>>>> *every time*.
>>> This should actually not be the case. There is no fixed timeout which can be set in unbound. They do something much more sophisticated here.
>>>
>>> https://nlnetlabs.nl/documentation/unbound/info-timeout/
>>>
>>> If I understand this document correctly, they keep something like a rolling mean. So if everybody executed 'unbound-control dump_infra', we would all get different timeout limits for every server and every site.
>>> The actual calculation seems to be much more complex (or their explanation of simple things is very complex, without any formulas); this is only a simple explanation, which seems to be necessary for my next paragraph.
>>>
>>> So the question is: when we have no information about a server (for example right after startup of unbound, or if the entry in the infra cache has expired (time limit 15 min)), which timeout should we assume? We currently assume a timeout of 376 msec. They state in their documentation that on slow links 1128 msec is more suitable.
>>>
>>> When we have information about a server (i.e. the RTT of previous requests), this value should not matter, if I understand this correctly.
>>>
>>>>> So you would always have sent another query before
>>>>> getting a response to the previous one.
>>>> True, but aren’t these extraordinary circumstances?
>>>>
>>>> On a regular network we want to keep eyeballs happy and when packets get lost or get sent to a slow server, we want to try again - sooner rather than later.
>>>>
>>>> If we set this to a worst-case setting (let’s say 10 seconds), then even for average users DNS resolution will become slower.
>>>>
>>>>> With TCP that would mean never getting a response, because you'd
>>>>> always terminate the connection too soon. With UDP, I'm not sure,
>>>>> depends on how unbound handles incoming responses to queries it's
>>>>> already deemed lost and sent again. Adjusting delay-close might help.
>>>>> But it may be it would not work at all when the limit is too small.
>>>>>
>>>>> That would mean that someone installing IPFire in some remote location
>>>>> with a slow link would conclude that it just doesn't work.
>>>>>
>>>>> The downside of increasing the limit is that sometimes replies will
>>>>> take longer when a packet is lost on the way because we'd wait longer
>>>>> before re-sending. So it should not be increased too much either.
>>> This should only happen the first time, when our own rolling mean is not yet adjusted to the needs of this site.
>>>>> I don't have data to judge what the limit should be, but I'd tend to
>>>>> trust NLnet Labs' recommendation here and go with the suggested 1128 ms.
>>>> Did anyone actually experience some problems here that this needs changing?
>>>>
>>>> @Jonatan: What is your motivation for this patch?
>>> Just opening the discussion. It seems that their handling of timeouts and the infra cache could have caused a lot of problems for some users, so I thought about bringing this up. Maybe it is a good idea that people like Paul test this before we think further about how this could be implemented. Adding this to the wiki, as a tweak that might improve DNS resolution, could also be a solution.
>>> But people should first check the current infra cache, as these values would determine whether this setting would help.
>>>
>>> I hope I could make some things a little bit more clear.
>>>
>>> Greetings Jonatan
>>>>> --
>>>>> Tapani Tarvainen
>
> Greetings, Michael and @list.
>
> I tested the ping (-c1) times for the first 27 IPv4 addresses in the DNS server list from the wiki. I can test more, if desired.
>
> The fastest return was 596ms, and the slowest was 857ms. At present, I'm using 9.9.9.10 (631ms ping) and 81.3.27.54 (752ms ping).
>
> My DNS protocol is "TLS", and QNAME Minimisation is "Standard". Prior to the release with TLS support, I was unable to resolve hosts at all. (Did I mention that I dislike HughesNot? I have no other option for 'net connectivity - boonie life is great for the nerves, but hell on talking to anyone.)

The good thing, though, is that we have a good test bed for this kind of connection :)

I know of some more people who use a satellite connection, but they are not very keen on testing things with it.

> I'm willing to test Tapani's "/etc/unbound/local.d" proposal(s), if it will clarify the situation. Also, I'm prepared to back up and edit any other files that might assist testing.
>
> I've noticed (from NTP logs) that name resolution usually stalls/fails after ~3 hours when my LAN is quiet. Could changes to cache timeout settings be beneficial?
>
> Please advise...
>
> Thank you (and, GREAT EFFORT, ALL!),
>
> Paul
>
> --
> It is better to have loved a short man than never to have loved a tall.
>

--===============3893513123639227440==--