From mboxrd@z Thu Jan  1 00:00:00 1970
From: Michael Tremer <michael.tremer@ipfire.org>
To: development@lists.ipfire.org
Subject: Re: [PATCH] collectd: Do not sync
Date: Thu, 01 Feb 2024 17:32:39 +0100
Message-ID: <FA2FA297-0755-4BD6-8370-D52C5983F0AE@ipfire.org>
In-Reply-To: <f0d193df606966cb12fa12d98d80f8799f898532.camel@sicho.home>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============4839983290350591884=="
List-Id: <development.lists.ipfire.org>

--===============4839983290350591884==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Hello,

> On 1 Feb 2024, at 00:50, Robin Roevens <robin.roevens(a)disroot.org> wrote:
>=20
> Hi Michael
>=20
> Thanks for the clear explanation. And indeed I can imagine sync locking
> up on a stale NFS mount when it still needs data written to it..I
> didn't consider that as I don't have NFS mounts on my ipfire instance,
> but other users may indeed have. It does however fits in my "hardware
> problem" scenario, as sync won't get an ACK from the stale NFS mount..
> Only networking errors may be more frequent of a problem than actual
> hardware failure, increasing the chances that sync indeed locks up.

I am actually doing something a little bit more perverted on one system where=
 I am mounting a large file that is on an NFS share with an ext4 file system =
(because NFS lacks certain features). So hence the locking issues.

> And it is as you say, a journaling fs should not have a problem when no
> manual syncs are performed during 'crucial' operations.

Should, but if the FS cannot rely on the data actually being written when it =
gets that sort if feedback, then there is only hope left.

> The usb flash drive story is new for me, but now that I read your
> explanation, I now understand why I sometimes have trouble with corrupt
> usb sticks where I just wrote a boot image to and then restart the
> machine to boot from that stick to find out it doesn't want to boot due
> to a corrupt root partition. Now I know exactly why this sometimes
> happens :-)

We have (used to have?) a couple of hacks in there. But really bad SSDs have =
kind of gone now. Even the not so bad ones don=E2=80=99t cost much these days=
=E2=80=A6 So hopefully we won=E2=80=99t have to think about these things for =
that much longer.

-Michael

> Always happy to learn! Thanks
>=20
> Regards
> Robin
>=20
> Michael Tremer schreef op wo 31-01-2024 om 10:54 [+0000]:
>> Hello Robin,
>>=20
>>> On 30 Jan 2024, at 19:50, Robin Roevens <robin.roevens(a)disroot.org>
>>> wrote:
>>>=20
>>> Hi Michael, all
>>>=20
>>> First, this mail does not contain critique, I'm only trying to
>>> learn,
>>> if I'm wrong. But I think your remark about fixing hardware is not
>>> correct?
>>>=20
>>> In my understanding, sync synchronizes the not-yet-written data
>>> currently in kernel managed cache memory (hence in volatile system
>>> memory) to permanent storage.
>>=20
>> Yes.
>>=20
>>> Normally storage hardware also has its own cache, which I presume
>>> you
>>> are referring to, but I don't think sync cares or even knows about
>>> that
>>> and that is/should be handled within the hardware itself.=20
>>> As soon as storage hardware returns an ACK on a write operation,
>>> sync
>>> is happy. Not actually guaranteeing any physical write, only that
>>> all
>>> to-be-written data is actually delivered to the storage hardware.
>>=20
>> Well, this is all a little bit more complicated, because it massively
>> depends on the type of hardware we are talking about=E2=80=A6
>>=20
>> A classic example for a piece of hardware with a cache is a RAID
>> controller. When we call sync(), the kernel sends all data to that
>> RAID controller and as soon as the last operation has been confirmed,
>> sync() is considered done. That does not mean that any of that data
>> has actually been written to any physical storage device. RAID
>> controllers have some temporary caches that are persistent and so
>> that data will not be lost even if there was a power outage
>> immediately after the sync().
>>=20
>> Then there is plain old hard drives. They traditionally did not use
>> to be very smart. So if the kernel wanted to write something, it
>> would have been written straight to disk and once that was done,
>> sync() would have returned.
>>=20
>> Those two scenarios are not a problem at all.
>>=20
>> Then there is the more problematic type of hardware which is usually
>> (cheap?) flash storage. Those devices very often have a faster cache
>> and then very slow persistent storage. To pretend that a device is
>> much faster than it actually is, blocks are being written to a
>> volatile cache and later written to the persistent storage, but there
>> is no power source in case of a sudden loss of power.
>>=20
>> We discovered this with a regular reboot where we also call sync to
>> make sure that everything is properly written to disk and then we
>> make the system reboot. Those devices confirm that everything is
>> written when it isn=E2=80=99t but then the system cycles power and the last
>> blocks are gone. That will result in a corrupt file system and those
>> systems will perform a filesystem check (and usually repair too) at
>> the next boot.
>>=20
>> The visual way to see this phenomenon is a USB stick with a USB
>> light. Calling sync() will return but the light will continue
>> flashing because the device is actually busy, but has told the OS
>> that it is already done. And hence we have this problem...
>>=20
>>> I do agree that a (full) sync there is probably overkill. The
>>> content
>>> of the config will be available to the collectd daemon whether it
>>> is
>>> actually written on disk or still in cache, so no sync needed.=20
>>> If the server crashes/powercycles/... on that point, I think a
>>> possible
>>> commented line that should be uncommented in that config is then
>>> probably of the least concern.
>>=20
>> On top of all of this comes that the =E2=80=9Cflash image=E2=80=9D that we=
 ship used
>> to come with no journal. In a journaling file system, a loss of power
>> would either result in the old version of that file, or the new one -
>> depending on how far the writes have come. Neither of that is a
>> problem in our scenario. Without the journal, it is quite likely to
>> have a broken file.
>>=20
>>> On the other hand, a sync that blocks forever, is probably more of
>>> an
>>> indicator that the hardware should be fixed, as this would mean
>>> sync
>>> doesn't get all the ACK's it expects.
>>=20
>> It=E2=80=99s not hardware. The system I was dealing with had an NFS volume
>> mounted which was unavailable for a moment. That caused the entire
>> dial-in sequence to hang because collectd was waiting for a sync() to
>> finish just after reconnecting.
>>=20
>> So, calling sync() is generally a very bad idea. It works around bugs
>> that should not be there in the first place, but is just creating
>> some fancy race condition where you increase chances that no data
>> will be lost, but it will never be anywhere acceptable for me.
>>=20
>> Ergo, don=E2=80=99t buy cheap flash. Use a journaling file system. If your
>> flash dies because you are writing to it it was not fit for purpose.
>>=20
>> As you can see I am very annoyed about these things because over all
>> those years they have cost us so much time to debug and =E2=80=9Cfix=E2=80=
=9D and it
>> simply is not worth it=E2=80=A6
>>=20
>>> Please correct me if I'm wrong.
>>=20
>> I hope this clears it up :)
>>=20
>> -Michael
>>=20
>>>=20
>>>=20
>>> Regards
>>> Robin
>>>=20
>>> Michael Tremer schreef op di 30-01-2024 om 18:01 [+0000]:
>>>> Calling a global sync operation manually is generally a bad idea
>>>> as
>>>> it
>>>> can block for forever. If people have storage that does not
>>>> retain
>>>> anything that is being written to it, they need to fix their
>>>> hardware.
>>>>=20
>>>> Signed-off-by: Michael Tremer <michael.tremer(a)ipfire.org>
>>>> ---
>>>>  src/initscripts/system/collectd | 3 ---
>>>>  1 file changed, 3 deletions(-)
>>>>=20
>>>> diff --git a/src/initscripts/system/collectd
>>>> b/src/initscripts/system/collectd
>>>> index bb8a2f54f..56b799d56 100644
>>>> --- a/src/initscripts/system/collectd
>>>> +++ b/src/initscripts/system/collectd
>>>> @@ -146,9 +146,6 @@ case "$1" in
>>>>   sed -i -e "s|^#LoadPlugin swap|LoadPlugin
>>>> swap|g" /etc/collectd.conf
>>>>   fi
>>>> =20
>>>> - # sync after config update...
>>>> - sync
>>>> -
>>>>   if [ $(date +%Y) -gt 2011 ]; then
>>>>   boot_mesg "Starting Collection daemon..."
>>>>   /usr/sbin/collectd -C /etc/collectd.conf
>>>> --=20
>>>> 2.39.2
>>>>=20
>>>>=20
>>>=20
>>> --=20
>>> Dit bericht is gescanned op virussen en andere gevaarlijke
>>> inhoud door MailScanner en lijkt schoon te zijn.
>>>=20
>>=20
>>=20
>=20
> --=20
> Dit bericht is gescanned op virussen en andere gevaarlijke
> inhoud door MailScanner en lijkt schoon te zijn.
>=20


--===============4839983290350591884==--