Hello,
This looks like a very classic NULL pointer dereference: something is trying to follow a NULL pointer, and that cannot work.
The interesting question is why this happens. The cap_capable function has not been touched in the 5.15 tree in a while, and the same goes for ns_capable.
I therefore suspect an issue caused by the RANDSTRUCT plugin, which seems to be incompatible with ccache.
The first time you build a kernel with a random seed, the result is put into the cache. If the next build is unmodified, the kernel will come out of the cache and will be exactly the same as the previous build.
If, however, you modify some parts of the kernel (a minor release, for example), only the changed parts are recompiled, BUT with a different seed for the randstruct plugin.
I suspect that this is what happened here, and your code is now simply reading the wrong memory.
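To illustrate the suspected mechanism, here is a toy model in Python. It is not ccache or kernel code: it assumes a cache whose key covers only the source text and not the RANDSTRUCT seed, and the file names (caps.c, fs.c) and struct fields are hypothetical. Unchanged units come back from the cache with the old layout, while recompiled units get the layout from the new seed, so the two no longer agree on where a field lives.

```python
import hashlib
import random

# Hypothetical struct fields, standing in for any structure RANDSTRUCT would shuffle.
FIELDS = ["cap_effective", "cap_permitted", "user_ns", "flags"]


def layout(seed: int) -> list[str]:
    """Field order chosen deterministically from the build seed."""
    order = list(FIELDS)
    random.Random(seed).shuffle(order)
    return order


class ToyCompilerCache:
    """Caches 'object files' keyed on source text only; the seed is NOT part of the key."""

    def __init__(self) -> None:
        self.store: dict[str, list[str]] = {}
        self.misses = 0

    def compile(self, source: str, seed: int) -> list[str]:
        key = hashlib.sha256(source.encode()).hexdigest()
        if key not in self.store:
            # Cache miss: really compile, using the current build's seed.
            self.misses += 1
            self.store[key] = layout(seed)
        # Cache hit: reuse whatever layout was baked in by an earlier build.
        return self.store[key]


def build(cache: ToyCompilerCache, sources: dict[str, str], seed: int) -> dict[str, list[str]]:
    """Compile every unit and report the struct layout each one was built with."""
    return {name: cache.compile(text, seed) for name, text in sources.items()}


cache = ToyCompilerCache()
v1 = {"caps.c": "int cap_capable(void) { return 0; }", "fs.c": "int mount_root(void) { return 0; }"}
build(cache, v1, seed=1)

# "Minor release": only fs.c changes, and the build picks a new seed
# (here: the first one that yields a different layout than seed 1).
seed2 = next(s for s in range(2, 1000) if layout(s) != layout(1))
v2 = {**v1, "fs.c": "int mount_root(void) { return 1; }"}
objs = build(cache, v2, seed=seed2)

# caps.c came from the cache (old layout), fs.c was rebuilt (new layout).
print(objs["caps.c"] == objs["fs.c"])
```

In this model the final line prints False: the two units disagree on the struct layout, which is the kind of mismatch that would make code read the wrong memory.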
I would recommend reverting the RANDSTRUCT patch; that should give you a working image again.
If you want to keep RANDSTRUCT, the only option would be to disable ccache for the kernel. However, the kernel is one of the largest packages, and ccache works really, really well for it. We can discuss this once we have identified RANDSTRUCT as the culprit.
-Michael
On 7 Aug 2022, at 19:08, Peter Müller <peter.mueller@ipfire.org> wrote:
Hello *,
enclosed is a screenshot of what booting the installer for Core Update 170 (dirty) with kernel 5.15.57 and slab merging disabled looks like. With kernel 5.15.59, the VM screen stays blank, so I had to revert to 5.15.57 to get any output at all.
Frankly, I don't see why the kernel suddenly no longer knows anything about efivarfs, or what sunrpc has to do with it. As for the latter, /build/lib/modules/5.15.57-ipfire/kernel/net/sunrpc/auth_gss/rpcsec_gss_krb5.ko.xz is still there, just as it was in C169.
Any ideas are appreciated. :-)
Thanks, and best regards, Peter Müller
Hello all, especially Arne,
today, I upgraded to "IPFire 2.27 - Core Update 170 Development Build: next/06b4164d", which primarily comes with Linux 5.15.59 and slab cache merging disabled. On my physical testing hardware, the boot process stalled after several blocks of kernel trace messages had been displayed.
Unfortunately, I was unable to capture them in detail, but they appeared fairly early, roughly around the time the root file system is mounted. Since the machine is semi-productive (we all test in production, don't we? ;-) ), I went back to C169 and will now investigate further which change broke the update.
An earlier version of Core Update 170 (commit 668cf4c0d0c2dbbc607716956daace413837a8da, I believe, but it was definitely after the randstruct changes) ran fine for days here, so it must be a pretty recent change. Will keep you updated.
Thanks, and best regards, Peter Müller
<screenshot_c170_dirty_crash_on_boot_sunrpc_efivarfs.png>