A fresh build with an empty ccache also boots with the slab cache patch
applied, so RANDSTRUCT should be the real problem.

Arne

On 2022-08-09 08:23, Arne Fitzenreiter wrote:
> On 2022-08-08 17:47, Peter Müller wrote:
>> Hello Arne,
>>
>> thanks for reporting back.
>>
>> This means the slab cache patch is the problem.
>
> I'm not sure. I fear it could be RANDSTRUCT: after a version update of
> the kernel, the first build does not use the ccache, but after a small
> config change it could break if some parts of the kernel are taken from
> the cache and others are not.
>
> At the moment I am testing a clean build without ccache but with the
> slab cache patch enabled. If this works, it is the RANDSTRUCT change.
>
> Arne
>
>>
>> Unfortunately, my local C-cache appears to be completely messed up now,
>> so I will have to start with a clean cache, hence it will probably take
>> me until tomorrow to have some testing results ready.
>>
>> Will keep you updated.
>>
>> Thanks, and best regards,
>> Peter Müller
>>
>>
>>> With this
>>> https://nightly.ipfire.org/next/2022-08-06%2007:45:02%20+0000-43df4a03/
>>> nightly the kernel 5.15.59 boots on real hardware (x86_64 and aarch64).
>>> After
>>>
>>> commit 06b4164dfe269704976b52421edbbbdf3b345679
>>> Author: Peter Müller
>>> Date:   Mon Aug 1 17:39:59 2022 +0000
>>>
>>>     linux: Do not allow slab caches to be merged
>>>
>>> it doesn't boot anymore (also tested on x86_64 and aarch64).
>>>
>>> Arne
>>>
>>>
>>> On 2022-08-08 12:22, Michael Tremer wrote:
>>>> Hello,
>>>>
>>>>> On 8 Aug 2022, at 11:16, Peter Müller wrote:
>>>>>
>>>>> Hello Michael, hello Arne,
>>>>>
>>>>> just a quick reply: I think we are dealing with the combination of
>>>>> two issues here, as kernel 5.15.59 without slab cache merging
>>>>> disabled won't even boot in a VM (the screen stays blank
>>>>> indefinitely), and it crashes straight away with the slab cache
>>>>> merging patch.
>>>>>
>>>>> Since kernel 5.15.57 is running perfectly fine here with randstruct
>>>>> enabled, and has been for days, I just reverted both the update to
>>>>> 5.15.59 and the slab cache patch. For the time being, I would leave
>>>>> randstruct enabled, since it does not seem to be a root cause for
>>>>> whatever bug(s) we are dealing with at the moment.
>>>>
>>>> Is that from the first build or a consecutive one?
>>>>
>>>>> @Arne: Were you able to boot 5.15.59 successfully on hardware? If
>>>>> so, did it also boot properly in a VirtualBox VM?
>>>>>
>>>>> Apologies for this coming up so unexpectedly.
>>>>
>>>> Well, things break. We should however be fast to have at least a
>>>> booting kernel in the tree so that we won’t crash any more systems.
>>>>
>>>> And if that requires reverting both patches until we know for certain
>>>> which one is the bad one, I find that the best option.
>>>>
>>>> -Michael
>>>>
>>>>>
>>>>> Thanks, and best regards,
>>>>> Peter Müller
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> You seem to have a very classic NULL pointer dereference.
>>>>>>
>>>>>> Something is trying to follow a NULL pointer. And that isn’t
>>>>>> possible.
>>>>>>
>>>>>> Now it is interesting to know why that is. The cap_capable
>>>>>> function hasn’t been touched in the 5.15 tree in a while. The same
>>>>>> goes for ns_capable.
>>>>>>
>>>>>> I would therefore suspect that this is some issue from the
>>>>>> RANDSTRUCT plugin which seems to be incompatible with ccache.
>>>>>>
>>>>>> If you have built a kernel with a random seed for the first time,
>>>>>> that will be put into the cache. If the next build is unmodified,
>>>>>> the kernel will come out of the cache and will be exactly the same
>>>>>> as the previous build.
>>>>>>
>>>>>> If you however modify some parts of the kernel (a minor release
>>>>>> for example) you will only compile the changed parts BUT with a
>>>>>> different seed for the randstruct plugin.
>>>>>>
>>>>>> And I suspect that this has happened here where your code is now
>>>>>> simply reading the wrong memory.
>>>>>>
>>>>>> I would recommend reverting the RANDSTRUCT patch and that should
>>>>>> allow you to have a proper image again.
>>>>>>
>>>>>> If you want to keep that, the only option would be to disable the
>>>>>> ccache for the kernel. The kernel is however one of the largest
>>>>>> packages and ccache works really really well here. We can discuss
>>>>>> this if we have identified RANDSTRUCT to be the culprit.
>>>>>>
>>>>>> -Michael
>>>>>>
>>>>>>> On 7 Aug 2022, at 19:08, Peter Müller wrote:
>>>>>>>
>>>>>>> Hello *,
>>>>>>>
>>>>>>> enclosed is a screenshot of what booting the installer for Core
>>>>>>> Update 170 (dirty) with kernel 5.15.57 and slab merging disabled
>>>>>>> looks like. With kernel 5.15.59, the VM screen stays blank, so I
>>>>>>> had to revert this to get some results.
>>>>>>>
>>>>>>> Frankly, I don't see why the kernel suddenly does not know
>>>>>>> anything about efivarfs anymore, and what's sunrpc got to do with
>>>>>>> it. For the latter,
>>>>>>> /build/lib/modules/5.15.57-ipfire/kernel/net/sunrpc/auth_gss/rpcsec_gss_krb5.ko.xz
>>>>>>> is still there, just as it has been in C169 before.
>>>>>>>
>>>>>>> Any ideas are appreciated. :-)
>>>>>>>
>>>>>>> Thanks, and best regards,
>>>>>>> Peter Müller
>>>>>>>
>>>>>>>
>>>>>>>> Hello all, especially Arne,
>>>>>>>>
>>>>>>>> today, I upgraded to "IPFire 2.27 - Core Update 170 Development
>>>>>>>> Build: next/06b4164d", which primarily comes with Linux 5.15.59
>>>>>>>> and the slab cache merging disabled. On my physical testing
>>>>>>>> hardware, the boot process stalled after several kernel trace
>>>>>>>> message blocks being displayed.
>>>>>>>>
>>>>>>>> Unfortunately, I was unable to recover them in detail, but they
>>>>>>>> occurred fairly early, roughly around the mounting of the root
>>>>>>>> file system.
>>>>>>>> Since the machine is semi-productive (we all test in production,
>>>>>>>> don't we? ;-) ), I went back to C169 and will now investigate
>>>>>>>> further which change broke the update.
>>>>>>>>
>>>>>>>> An earlier version of Core Update 170 (commit
>>>>>>>> 668cf4c0d0c2dbbc607716956daace413837a8da, I believe, but it was
>>>>>>>> definitely after the randstruct changes) ran fine for days here,
>>>>>>>> so it must be a pretty recent change. Will keep you updated.
>>>>>>>>
>>>>>>>> Thanks, and best regards,
>>>>>>>> Peter Müller
>>>>>>>
>>>>>>
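
The failure mode suspected in this thread (objects taken from ccache and
freshly compiled objects disagreeing about a struct's layout because they
were built with different randstruct seeds) can be sketched in a few lines
of Python. This is only a toy model under loud assumptions: the field
names, the seeds, and the shuffle are hypothetical and do not reproduce
the real plugin's algorithm or any actual kernel struct. It only shows
why a reader built with one seed gets wrong values from memory written by
code built with another.

```python
import random

# Hypothetical field names, chosen only for illustration.
FIELDS = ["uid", "gid", "cap_effective", "user_ns"]


def layout(seed):
    """Map each field name to a slot index, shuffled deterministically by seed."""
    order = list(FIELDS)
    random.Random(seed).shuffle(order)
    return {name: slot for slot, name in enumerate(order)}


def store(seed, values):
    """Simulate an object built with `seed` writing a struct to memory."""
    slots = layout(seed)
    memory = [None] * len(FIELDS)
    for name, value in values.items():
        memory[slots[name]] = value
    return memory


def load(seed, memory, name):
    """Simulate an object built with `seed` reading one field back."""
    return memory[layout(seed)[name]]


def mismatched_fields(writer_seed, reader_seed):
    """Return the fields the reader gets wrong for this pair of seeds."""
    values = {name: f"<{name}>" for name in FIELDS}
    memory = store(writer_seed, values)
    return [n for n in FIELDS if load(reader_seed, memory, n) != values[n]]


if __name__ == "__main__":
    # Clean build: one seed everywhere, so every field reads back correctly.
    print(mismatched_fields(1, 1))
    # Cached objects (seed 1) mixed with rebuilt objects (seed 2): any field
    # whose slot differs between the two layouts is read from the wrong place.
    print(mismatched_fields(1, 2))
```

In this model, a field read from the wrong slot stands in for the kernel
following a pointer that is really some other member, which fits the NULL
pointer dereference described above. It also matches the two remedies
discussed: build everything with one seed (a clean build without ccache),
or drop the randomisation.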