From: Michael Tremer
To: development@lists.ipfire.org
Subject: Re: [RFC] unbound: Increase timeout value for unknown dns-server
Date: Sat, 09 Jan 2021 15:04:33 +0000
Message-ID: <4EEEF91B-540A-406B-B9C7-C3C8606026A0@ipfire.org>
In-Reply-To: <20E5B302-A896-4BD2-BAD1-9D6A50831514@ipfire.org>

Hi,

In that case, I do not think that this change realistically changes anything for anyone.

In Paul’s case, where the name servers are further away than the timeout, he would send another packet, but then receive the first reply (disregarding any actual packet loss here), and after that unbound will have learned that the name server is further away.

He would have sent one extra packet. Re-probing could potentially cause the same effect, but usually unbound should be busy enough to have a rolling mean that is up to date at all times.

Therefore this only matters in recursor mode, where many servers are contacted instead of only a few forwarders. Again, there would be more overhead here, but there should not be any effect where names cannot be resolved.

We can either increase the timeout, which will cause slower resolution for many users who are running in recursor mode, or we can just leave it as it is and nothing would change.

-Michael

> On 8 Jan 2021, at 17:33, Jonatan Schlag wrote:
>
> Hi,
>
> I will try to provide some explanations to the questions.
>
>> On 6 Jan 2021, at 19:01, Michael Tremer wrote:
>>
>> Hello,
>>
>>> On 6 Jan 2021, at 16:19, Tapani Tarvainen wrote:
>>>
>>> On Wed, Jan 06, 2021 at 03:14:52PM +0000, Michael Tremer (michael.tremer(a)ipfire.org) wrote:
>>>
>>>>> On 6 Jan 2021, at 12:02, Paul Simmons wrote:
>>>>>
>>>>> On 1/6/21 4:17 AM, Jonatan Schlag wrote:
>>>>>> When unbound has no information about a DNS server,
>>>>>> a timeout of 376 msec is assumed. This works well in a lot of situations,
>>>>>> but they mention in their documentation that this could be way too low.
>>>>>> They recommend a timeout of 1126 msec for satellite connections
>>>>>> (https://nlnetlabs.nl/documentation/unbound/unbound.conf).
>>>
>>> A small nit, they actually suggest 1128 ... and that's indeed what
>>> the patch has:
>>>
>>>>>> + unknown-server-time-limit: 1128
>>>
>>> But that's trivial. The point:
>>>
>>>> I am not entirely sure what this is supposed to fix.
>>>
>>>> It is possible that a DNS response takes longer than 376ms, indeed.
>>>> Does it harm us if we send another packet? No.
>>>
>>> If you are behind a slow satellite link, it can take more than that
>>> *every time*.
> This should actually not be the case. There is no fixed timeout that can be set in unbound. They do something much more sophisticated here.
>
> https://nlnetlabs.nl/documentation/unbound/info-timeout/
>
> If I understand this document correctly, they keep something like a rolling mean. So if everybody executed 'unbound-control dump_infra', we would all get different timeout limits for every server and every site.
> The actual calculation seems to be much more complex (or their explanation of simple things is very complex without any formulas); this is only a simplified explanation, which seems necessary for my next paragraph.
>
> So the question is: when we have no information about a server (for example right after unbound starts, or when its entry in the infra cache has expired after the 15 min time limit), which timeout should we assume? We currently assume a timeout of 376 msec. They state in their documentation that on slow links 1128 msec is more suitable.
>
> When we do have information about a server (i.e. the RTT of previous requests), this value should not matter, if I get this right.
>
>>> So you would always have sent another query before
>>> getting a response to the previous one.
>>
>> True, but aren’t these extraordinary circumstances?
>>
>> On a regular network we want to keep eyeballs happy, and when packets get lost or get sent to a slow server, we want to try again, sooner rather than later.
>>
>> If we set this to a worst-case value (let’s say 10 seconds), then DNS resolution will become slower even for average users.
>>
>>> With TCP that would mean never getting a response, because you'd
>>> always terminate the connection too soon. With UDP, I'm not sure; it
>>> depends on how unbound handles incoming responses to queries it has
>>> already deemed lost and sent again. Adjusting delay-close might help.
>>> But it may be that it would not work at all when the limit is too small.
>>>
>>> That would mean that someone installing IPFire in some remote location
>>> with a slow link would conclude that it just doesn't work.
>>>
>>> The downside of increasing the limit is that sometimes replies will
>>> take longer when a packet is lost on the way, because we'd wait longer
>>> before re-sending. So it should not be increased too much either.
> This should only happen at the beginning, while our own rolling mean has not yet adjusted to the conditions at this site.
>>>
>>> I don't have data to judge what the limit should be, but I'd tend to
>>> trust NLnet Labs' recommendation here and go with the suggested 1128 ms.
>>
>> Has anyone actually experienced problems here that would require this change?
>>
>> @Jonatan: What is your motivation for this patch?
>
> Just opening the discussion. It seems that their handling of timeouts and the infra cache could have caused a lot of problems for some users, so I thought about bringing this up. Maybe it is a good idea for people like Paul to test this before we think further about how this could be implemented. Adding this to the wiki as a tweak that might improve DNS resolution could also be a solution.
> But people should first check the current infra cache, as these values would determine whether this setting would help.
>
> I hope I could make some things a little clearer.
>
> Greetings, Jonatan
>>
>>>
>>> -- 
>>> Tapani Tarvainen
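To make the rolling-mean argument in this thread concrete, here is a minimal Python sketch of a per-server timeout estimator. It assumes a Jacobson/Karels-style smoothed RTT (the scheme described in RFC 6298, gains 1/8 and 1/4), which is our reading of the NLnet Labs timeout document and not a copy of unbound's code. The names InfraEntry and extra_packets, the starting state srtt = 0, rttvar = limit/4 (chosen because it reproduces the 376 ms default), and the 50 ms / 120 s clamps are all hypothetical. It also simplifies by ignoring late replies entirely and by assuming a fixed link RTT with no packet loss.

```python
from dataclasses import dataclass

# Assumption: illustrative clamps, not taken from unbound's source.
RTO_MIN_MS = 50.0
RTO_MAX_MS = 120_000.0


@dataclass
class InfraEntry:
    """Hypothetical per-server RTT state, loosely modelled on an infra-cache entry.

    Assumption: Jacobson/Karels-style estimator (RFC 6298 gains), our reading
    of how unbound's infra cache behaves, not its actual implementation.
    """
    srtt: float
    rttvar: float
    rto: float

    @classmethod
    def unknown(cls, limit_ms: float) -> "InfraEntry":
        # No samples yet: srtt = 0 and rttvar = limit / 4, so the first
        # timeout is 0 + 4 * (limit / 4) = limit (376 ms by default).
        entry = cls(srtt=0.0, rttvar=limit_ms / 4, rto=0.0)
        entry.rto = entry.calc_rto()
        return entry

    def calc_rto(self) -> float:
        # Timeout = smoothed mean + 4 * mean deviation, clamped.
        return min(RTO_MAX_MS, max(RTO_MIN_MS, self.srtt + 4 * self.rttvar))

    def update(self, rtt_ms: float) -> None:
        """Fold one measured round-trip time into the rolling estimate."""
        delta = rtt_ms - self.srtt
        self.srtt += delta / 8                         # smoothed mean, gain 1/8
        self.rttvar += (abs(delta) - self.rttvar) / 4  # mean deviation, gain 1/4
        self.rto = self.calc_rto()

    def lost(self) -> None:
        """Exponential backoff: double the timeout after a missed reply."""
        self.rto = min(RTO_MAX_MS, self.rto * 2)


def extra_packets(limit_ms: float, link_rtt_ms: float, queries: int) -> int:
    """Count retransmissions over `queries` lookups to one server.

    Simplification: the link RTT is constant, and a reply only counts if it
    arrives before the current timeout expires; otherwise the query is resent
    with a doubled timeout.
    """
    entry = InfraEntry.unknown(limit_ms)
    resent = 0
    for _ in range(queries):
        while link_rtt_ms > entry.rto:
            entry.lost()
            resent += 1
        entry.update(link_rtt_ms)
    return resent


if __name__ == "__main__":
    for limit in (376, 1128):
        for rtt in (300, 600, 1000):
            print(f"limit={limit} ms, link RTT={rtt} ms: "
                  f"{extra_packets(limit, rtt, queries=20)} extra packet(s)")
```

Under these assumptions, starting at 376 ms on a link with a steady 600 ms round trip costs one extra packet on the first lookup and none afterwards (two extra packets at 1000 ms), while starting at 1128 ms costs none. This is consistent with the point above that the initial value only matters until samples arrive. Real links with jitter, loss, and many distinct servers in recursor mode will behave less neatly.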