From mboxrd@z Thu Jan  1 00:00:00 1970
From: Michael Tremer <michael.tremer@ipfire.org>
To: development@lists.ipfire.org
Subject: Re: Long delays restarting unbound if Red is down
Date: Thu, 13 Dec 2018 16:41:59 +0000
Message-ID: <11684CBC-A4C2-4F5E-B8B8-27AA1D32EBFE@ipfire.org>
In-Reply-To: <94190F08-05FC-41C1-B68D-6D8CEE8195DC@rymes.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0902634219304061903=="
List-Id: <development.lists.ipfire.org>

--===============0902634219304061903==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable


> On 13 Dec 2018, at 16:36, Tom Rymes <trymes(a)rymes.com> wrote:
>
> That's precisely the problem, Michael. Everything works fine, until
> something (like an internet outage) prevents you from reaching the DNS
> servers. Then, when you reboot the router as part of a troubleshooting
> process, it takes forever while unbound flails about.

Yes. As you may have seen from my conversation with Erik on this list,
we are aware of plenty of issues with this script and want to fix them.

However, getting rid of the script might have implications that we do
not know about yet. It is actually quite risky.

Unbound simply does not cope well with DNS servers that do not conform
to (recent) RFC standards. I am not sure whether that has changed with
recent releases, now that more people have deployed unbound over the
last months and years. I hope it has.

> In this instance, we were replacing a failed router, and someone had
> misconfigured the red interface settings, so we were modifying those
> settings in setup. It took us a while to figure out what we were doing
> wrong, so we ended up having to wait over and over again as unbound
> tried to restart and failed. We were the problem, but unbound made it
> a real hassle while we worked to figure out what we were doing wrong.
>
> Shouldn't it have a reasonably short timeout for when the DNS servers
> are unavailable?

It does, but the total timeout is still quite long. Some DNS servers do
not respond when the advertised EDNS0 buffer size is too large. We start
with a large size, step down to smaller ones, and hope that the server
responds at some point (old Cisco equipment is to blame for this: it
assumed that a DNS packet would never be larger than the classic 512
bytes). Each try has a short timeout, but we test quite a few sizes
before we finally decide that the server does not respond at all.
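
To illustrate why the per-try timeouts add up, here is a minimal Python
sketch of such a step-down probe. It is not the actual IPFire script:
the candidate sizes, the two-second timeout, and the probe() callback
are all assumptions made for the example.

```python
from typing import Callable, Optional, Sequence

# Hypothetical candidate sizes and per-try timeout (not from the script).
EDNS_SIZES: Sequence[int] = (4096, 3072, 2048, 1536, 1280, 1232, 512)
TRY_TIMEOUT = 2.0  # seconds per probe (assumed)

# probe(server, bufsize, timeout) -> True if the server answered in time.
Probe = Callable[[str, int, float], bool]


def find_edns_size(server: str, probe: Probe,
                   sizes: Sequence[int] = EDNS_SIZES,
                   timeout: float = TRY_TIMEOUT) -> Optional[int]:
    """Return the largest size the server answers, or None if none work."""
    for size in sizes:  # largest first, stepping down
        if probe(server, size, timeout):
            return size
    return None


def worst_case_delay(num_servers: int,
                     sizes: Sequence[int] = EDNS_SIZES,
                     timeout: float = TRY_TIMEOUT) -> float:
    """Seconds spent waiting when no server answers any probe."""
    return num_servers * len(sizes) * timeout
```

With these assumed numbers, two unreachable name servers cost
worst_case_delay(2) seconds, that is 2 * 7 * 2.0 = 28 seconds, on every
restart.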

The algorithm could of course be tuned to figure that out earlier, but I
never expected many people to run into this problem. However, the same
delay also hits when the name servers are not reachable at all.
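
One possible tuning, sketched in Python under stated assumptions (this
is not what the script does today): probe the smallest size first and
skip the whole step-down if even that gets no answer. The function
name, the sizes, the 512-byte floor, and the probe() callback are
hypothetical.

```python
from typing import Callable, Optional, Sequence

# probe(server, bufsize, timeout) -> True if the server answered in time.
Probe = Callable[[str, int, float], bool]


def find_edns_size_fast(server: str, probe: Probe,
                        sizes: Sequence[int] = (4096, 2048, 1280),
                        floor: int = 512,
                        timeout: float = 2.0) -> Optional[int]:
    """Probe the smallest size first; skip the ladder if that fails."""
    if not probe(server, floor, timeout):
        return None  # unreachable: one timeout instead of many
    for size in sizes:  # largest first, stepping down
        if probe(server, size, timeout):
            return size
    return floor  # only the smallest size got an answer
```

An unreachable server then costs a single timeout, at the price of one
extra probe for every server that is reachable.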

I want to get rid of this test altogether, though. That should be the
solution, rather than making a shitty script a little bit better. It
would still be a bit shitty.

Best,
-Michael

>
> Tom
>
>> On Dec 13, 2018, at 11:19 AM, Michael Tremer
>> <michael.tremer(a)ipfire.org> wrote:
>>
>> Hi Tom,
>>
>> the script is usually quite fast when it can reach the DNS servers.
>>
>> If it can't it is waiting quite long and runs into a timeout.
>>
>> Is there a reason why it cannot reach the DNS servers? It should be
>> able to or you should not have functioning DNS.
>>
>> -Michael
>>
>>> On 12 Dec 2018, at 15:31, Tom Rymes <trymes(a)rymes.com> wrote:
>>>
>>> Last night we were fighting with a router that had a non-functioning
>>> red interface. It ended up being a misconfiguration on our part, but
>>> while we were troubleshooting, we routinely had to wait for very
>>> long periods as unbound failed to contact the defined name servers.
>>>
>>> Is there a way to make the startup script time out more quickly when
>>> things are not functioning?
>>>
>>> Tom
>>


--===============0902634219304061903==--