From: Michael Tremer
To: development@lists.ipfire.org
Subject: Re: [PATCH] collectd: Do not sync
Date: Thu, 01 Feb 2024 17:32:39 +0100

Hello,

> On 1 Feb 2024, at 00:50, Robin Roevens wrote:
>
> Hi Michael
>
> Thanks for the clear explanation. And indeed I can imagine sync locking
> up on a stale NFS mount when it still needs data written to it. I
> didn't consider that, as I don't have NFS mounts on my IPFire instance,
> but other users may. It does, however, fit my "hardware problem"
> scenario, as sync won't get an ACK from the stale NFS mount. Networking
> errors may just be a more frequent problem than actual hardware
> failure, increasing the chances that sync indeed locks up.

I am actually doing something a little bit more perverted on one system, where I mount an ext4 file system from a large file that lives on an NFS share (because NFS lacks certain features). Hence the locking issues.

> And it is as you say: a journaling fs should not have a problem when no
> manual syncs are performed during 'crucial' operations.

It should, but if the FS cannot rely on the data actually being written when it gets that sort of feedback, then there is only hope left.

> The USB flash drive story is new to me, but now that I have read your
> explanation, I understand why I sometimes have trouble with corrupt
> USB sticks: I write a boot image to one, restart the machine to boot
> from it, and find that it doesn't boot due to a corrupt root
> partition. Now I know exactly why this sometimes happens :-)

We have (used to have?) a couple of hacks in there. But really bad SSDs have kind of gone away now.
Even the not-so-bad ones don’t cost much these days… So hopefully we won’t have to think about these things for much longer.

-Michael

> Always happy to learn! Thanks
>
> Regards
> Robin
>
> Michael Tremer wrote on Wed 31-01-2024 at 10:54 [+0000]:
>> Hello Robin,
>>
>>> On 30 Jan 2024, at 19:50, Robin Roevens
>>> wrote:
>>>
>>> Hi Michael, all
>>>
>>> First, this mail is not meant as criticism; I'm only trying to
>>> learn in case I'm wrong. But I think your remark about fixing
>>> hardware may not be correct.
>>>
>>> In my understanding, sync synchronizes the not-yet-written data
>>> currently in kernel-managed cache memory (hence in volatile system
>>> memory) to permanent storage.
>>
>> Yes.
>>
>>> Normally, storage hardware also has its own cache, which I presume
>>> you are referring to, but I don't think sync cares or even knows
>>> about that; it is (or should be) handled within the hardware itself.
>>> As soon as the storage hardware returns an ACK on a write operation,
>>> sync is happy. That does not guarantee any physical write, only that
>>> all to-be-written data has been delivered to the storage hardware.
>>
>> Well, this is all a little bit more complicated, because it massively
>> depends on the type of hardware we are talking about…
>>
>> A classic example of a piece of hardware with a cache is a RAID
>> controller. When we call sync(), the kernel sends all data to that
>> RAID controller, and as soon as the last operation has been
>> confirmed, sync() is considered done. That does not mean that any of
>> that data has actually been written to any physical storage device.
>> RAID controllers have caches that survive a loss of power, so that
>> data will not be lost even if there is a power outage immediately
>> after the sync().
>>
>> Then there are plain old hard drives. Traditionally, they were not
>> very smart.
>> So if the kernel wanted to write something, it was written straight
>> to disk, and once that was done, sync() returned.
>>
>> Those two scenarios are not a problem at all.
>>
>> Then there is the more problematic type of hardware, which is usually
>> (cheap?) flash storage. Those devices very often have a fast cache in
>> front of very slow persistent storage. To make the device appear much
>> faster than it actually is, blocks are written to a volatile cache
>> and only later to the persistent storage, but there is no backup
>> power source to finish the job in case of a sudden loss of power.
>>
>> We discovered this during a regular reboot, where we also call sync
>> to make sure that everything is properly written to disk before we
>> reboot the system. Those devices confirm that everything has been
>> written when it hasn't; then the system cycles power and the last
>> blocks are gone. That results in a corrupt file system, and those
>> systems will perform a file system check (and usually a repair too)
>> at the next boot.
>>
>> You can see this phenomenon on a USB stick with an activity light.
>> Calling sync() returns, but the light keeps flashing because the
>> device is actually still busy, even though it has told the OS that it
>> is already done. And hence we have this problem...
>>
>>> I do agree that a (full) sync there is probably overkill. The
>>> content of the config will be available to the collectd daemon
>>> whether it is actually written to disk or still in cache, so no
>>> sync is needed.
>>> If the server crashes/power-cycles/... at that point, a line left
>>> commented out that should have been uncommented in that config is
>>> probably the least of our concerns.
>>
>> On top of all of this, the “flash image” that we ship used to come
>> without a journal.
>> In a journaling file system, a loss of power would result in either
>> the old version of that file or the new one, depending on how far the
>> writes have come. Neither is a problem in our scenario. Without the
>> journal, a broken file is quite likely.
>>
>>> On the other hand, a sync that blocks forever is probably more of
>>> an indicator that the hardware should be fixed, as this would mean
>>> sync doesn't get all the ACKs it expects.
>>
>> It’s not hardware. The system I was dealing with had an NFS volume
>> mounted which was unavailable for a moment. That caused the entire
>> dial-in sequence to hang, because collectd was waiting for a sync()
>> to finish just after reconnecting.
>>
>> So, calling sync() is generally a very bad idea. It works around bugs
>> that should not be there in the first place, and it merely creates
>> some fancy race condition where you increase the chances that no data
>> will be lost, but that will never be anywhere near acceptable for me.
>>
>> Ergo, don’t buy cheap flash. Use a journaling file system. If your
>> flash dies because you are writing to it, it was not fit for purpose.
>>
>> As you can see, I am very annoyed about these things, because over
>> all those years they have cost us so much time to debug and “fix”,
>> and it simply is not worth it…
>>
>>> Please correct me if I'm wrong.
>>
>> I hope this clears it up :)
>>
>> -Michael
>>
>>>
>>>
>>> Regards
>>> Robin
>>>
>>> Michael Tremer wrote on Tue 30-01-2024 at 18:01 [+0000]:
>>>> Calling a global sync operation manually is generally a bad idea as
>>>> it can block forever. If people have storage that does not retain
>>>> anything that is being written to it, they need to fix their
>>>> hardware.
>>>>
>>>> Signed-off-by: Michael Tremer
>>>> ---
>>>>  src/initscripts/system/collectd | 3 ---
>>>>  1 file changed, 3 deletions(-)
>>>>
>>>> diff --git a/src/initscripts/system/collectd b/src/initscripts/system/collectd
>>>> index bb8a2f54f..56b799d56 100644
>>>> --- a/src/initscripts/system/collectd
>>>> +++ b/src/initscripts/system/collectd
>>>> @@ -146,9 +146,6 @@ case "$1" in
>>>>  			sed -i -e "s|^#LoadPlugin swap|LoadPlugin swap|g" /etc/collectd.conf
>>>>  		fi
>>>>  
>>>> -		# sync after config update...
>>>> -		sync
>>>> -
>>>>  		if [ $(date +%Y) -gt 2011 ]; then
>>>>  			boot_mesg "Starting Collection daemon..."
>>>>  			/usr/sbin/collectd -C /etc/collectd.conf
>>>> -- 
>>>> 2.39.2
>>>>
>>>>
>>>
>>> -- 
>>> This message has been scanned for viruses and other dangerous
>>> content by MailScanner and is believed to be clean.
>>>
>>
>>
>
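A minimal shell sketch of the point that collectd can read the edited config without any sync: it repeats the patch's `sed` substitution on a throwaway temporary file (a hypothetical stand-in for `/etc/collectd.conf`, assuming GNU `sed -i` as found on Linux) and reads the file back immediately. It only illustrates visibility through the kernel page cache; it says nothing about durability across a power loss, which is the separate question discussed in the thread.

```shell
#!/bin/sh
# Sketch only, not the actual IPFire initscript. The temp file and the
# "LoadPlugin cpu" line are illustrative assumptions.
set -eu

conf=$(mktemp)                 # hypothetical stand-in for /etc/collectd.conf
trap 'rm -f "$conf"' EXIT      # clean up the throwaway file on exit

printf '#LoadPlugin swap\nLoadPlugin cpu\n' > "$conf"

# The same substitution the initscript performs (GNU sed in-place edit).
sed -i -e "s|^#LoadPlugin swap|LoadPlugin swap|g" "$conf"

# No sync here: the edited data sits in the kernel page cache, which any
# process (including a daemon started right afterwards) reads from.
grep -c '^LoadPlugin' "$conf"
```

Run as-is, it prints `2`: both plugin lines are active as soon as `sed` finishes, with no global flush involved.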