From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter Müller
To: development@lists.ipfire.org
Subject: Re: Core Update 170 testing report - "next/06b4164d" crashes on my x86_64 testing machine
Date: Mon, 08 Aug 2022 10:16:31 +0000
Message-ID: <83b41711-f866-a3b8-e401-f78e2ca01611@ipfire.org>
In-Reply-To: <90E36BC4-F882-452B-A078-01EE35FE653B@ipfire.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1497922562812697804=="
List-Id:

--===============1497922562812697804==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Hello Michael, hello Arne,

just a quick reply: I think we are dealing with the combination of two issues
here, as kernel 5.15.59 without the patch that disables slab cache merging
won't even boot in a VM (the screen stays blank indefinitely), and it crashes
straight away with the slab cache merging patch.

Since kernel 5.15.57 is running perfectly fine here with randstruct enabled,
and has been for days, I just reverted both the update to 5.15.59 and the slab
cache patch.

For the time being, I would leave randstruct enabled, since it does not seem
to be a root cause of whatever bug(s) we are dealing with at the moment.

@Arne: Were you able to boot 5.15.59 successfully on hardware? If so, did it
also boot properly in a VirtualBox VM?

Apologies for this coming up so unexpectedly.

Thanks, and best regards,
Peter Müller

> Hello,
>
> You seem to have a very classic NULL pointer dereference.
>
> Something is trying to follow a NULL pointer. And that isn’t possible.
>
> Now it is interesting to know why that is. The cap_capable function hasn’t
> been touched in the 5.15 tree in a while. The same goes for ns_capable.
>
> I would therefore suspect that this is some issue from the RANDSTRUCT plugin
> which seems to be incompatible with ccache.
>
> If you have built a kernel with a random seed for the first time, that will
> be put into the cache.
> If the next build is unmodified, the kernel will come out of the cache and
> will be exactly the same as the previous build.
>
> If you however modify some parts of the kernel (a minor release for example)
> you will only compile the changed parts BUT with a different seed for the
> randstruct plugin.
>
> And I suspect that this has happened here where your code is now simply
> reading the wrong memory.
>
> I would recommend reverting the RANDSTRUCT patch and that should allow you
> to have a proper image again.
>
> If you want to keep that, the only option would be to disable the ccache for
> the kernel. The kernel is however one of the largest packages and ccache
> works really, really well here. We can discuss this if we have identified
> RANDSTRUCT to be the culprit.
>
> -Michael
>
>> On 7 Aug 2022, at 19:08, Peter Müller wrote:
>>
>> Hello *,
>>
>> enclosed is a screenshot of what booting the installer for Core Update 170 (dirty)
>> with kernel 5.15.57 and slab merging disabled looks like. With kernel 5.15.59, the
>> VM screen stays blank, so I had to revert this to get some results.
>>
>> Frankly, I don't see why the kernel suddenly does not know anything about efivarfs
>> anymore, and what's sunrpc got to do with it. For the latter,
>> /build/lib/modules/5.15.57-ipfire/kernel/net/sunrpc/auth_gss/rpcsec_gss_krb5.ko.xz
>> is still there, just as it has been in C169 before.
>>
>> Any ideas are appreciated. :-)
>>
>> Thanks, and best regards,
>> Peter Müller
>>
>>
>>> Hello all, especially Arne,
>>>
>>> today, I upgraded to "IPFire 2.27 - Core Update 170 Development Build: next/06b4164d",
>>> which primarily comes with Linux 5.15.59 and the slab cache merging disabled. On
>>> my physical testing hardware, the boot process stalled after several kernel trace
>>> message blocks being displayed.
>>>
>>> Unfortunately, I was unable to recover them in detail, but they occurred fairly
>>> early, roughly around the mounting of the root file system. Since the machine is
>>> semi-productive (we all test in production, don't we? ;-) ), I went back to C169
>>> and will now investigate further which change broke the update.
>>>
>>> An earlier version of Core Update 170 (commit 668cf4c0d0c2dbbc607716956daace413837a8da,
>>> I believe, but it was definitely after the randstruct changes) ran fine for days here,
>>> so it must be a pretty recent change. Will keep you updated.
>>>
>>> Thanks, and best regards,
>>> Peter Müller
>>
>

--===============1497922562812697804==--