From: Arne Fitzenreiter <arne_f@ipfire.org>
To: development@lists.ipfire.org
Subject: Re: Core Update 170 testing report - "next/06b4164d" crashes on my x86_64 testing machine
Date: Mon, 08 Aug 2022 16:15:45 +0200
Message-ID: <7300c922548070c647e561cbbf7817f2@ipfire.org>
In-Reply-To: <DF06E427-CFBD-42E4-8F91-731D78CA3D21@ipfire.org>
With this nightly build
https://nightly.ipfire.org/next/2022-08-06%2007:45:02%20+0000-43df4a03/
kernel 5.15.59 boots on real hardware (x86_64 and aarch64).

After

commit 06b4164dfe269704976b52421edbbbdf3b345679
Author: Peter Müller <peter.mueller(a)ipfire.org>
Date: Mon Aug 1 17:39:59 2022 +0000

    linux: Do not allow slab caches to be merged

it no longer boots (also tested on x86_64 and aarch64).

Arne
Am 2022-08-08 12:22, schrieb Michael Tremer:
> Hello,
>
>> On 8 Aug 2022, at 11:16, Peter Müller <peter.mueller(a)ipfire.org>
>> wrote:
>>
>> Hello Michael, hello Arne,
>>
>> just a quick reply: I think we are dealing with a combination of two
>> issues here. Kernel 5.15.59 without the slab cache merging patch won't
>> even boot in a VM (the screen stays blank indefinitely), and with the
>> patch applied it crashes straight away.
>>
>> Since kernel 5.15.57 is running perfectly fine here with randstruct
>> enabled, and has
>> been for days, I just reverted both the update to 5.15.59 and the slab
>> cache patch.
>> For the time being, I would leave randstruct enabled, since it does
>> not seem to be a
>> root cause for whatever bug(s) we are dealing with at the moment.
>
> Is that from the first build or a consecutive one?
>
>> @Arne: Were you able to boot 5.15.59 successfully on hardware? If so,
>> did it also
>> boot properly in a VirtualBox VM?
>>
>> Apologies for this coming up so unexpectedly.
>
> Well, things break. We should however be fast to have at least a
> booting kernel in the tree so that we won’t crash any more systems.
>
> And if that requires reverting both patches until we know for certain
> which one is the bad one, I find that the best option.
>
> -Michael
>
>>
>> Thanks, and best regards,
>> Peter Müller
>>
>>> Hello,
>>>
>>> You seem to have a very classic NULL pointer dereference.
>>>
>>> Something is trying to follow a NULL pointer. And that isn’t
>>> possible.
>>>
>>> Now it is interesting to know why that is. The cap_capable function
>>> hasn’t been touched in the 5.15 tree in a while. The same goes for
>>> ns_capable.
>>>
>>> I would therefore suspect that this is some issue from the RANDSTRUCT
>>> plugin which seems to be incompatible with ccache.
>>>
>>> If you have built a kernel with a random seed for the first time,
>>> that will be put into the cache. If the next build is unmodified, the
>>> kernel will come out of the cache and will be exactly the same as the
>>> previous build.
>>>
>>> If you however modify some parts of the kernel (a minor release for
>>> example) you will only compile the changed parts BUT with a different
>>> seed for the randstruct plugin.
>>>
>>> And I suspect that this has happened here where your code is now
>>> simply reading the wrong memory.
>>>
>>> I would recommend reverting the RANDSTRUCT patch and that should
>>> allow you to have a proper image again.
>>>
>>> If you want to keep that, the only option would be to disable the
>>> ccache for the kernel. The kernel is however one of the largest
>>> packages and ccache works really really well here. We can discuss
>>> this if we have identified RANDSTRUCT to be the culprit.
>>>
>>> -Michael
>>>
>>>> On 7 Aug 2022, at 19:08, Peter Müller <peter.mueller(a)ipfire.org>
>>>> wrote:
>>>>
>>>> Hello *,
>>>>
>>>> enclosed is a screenshot of what booting the installer for Core
>>>> Update 170 (dirty)
>>>> with kernel 5.15.57 and slab merging disabled looks like. With
>>>> kernel 5.15.59, the
>>>> VM screen stays blank, so I had to revert this to get some results.
>>>>
>>>> Frankly, I don't see why the kernel suddenly does not know anything
>>>> about efivarfs
>>>> anymore, and what's sunrpc got to do with it. For the latter,
>>>> /build/lib/modules/5.15.57-ipfire/kernel/net/sunrpc/auth_gss/rpcsec_gss_krb5.ko.xz
>>>> is still there, just as it has been in C169 before.
>>>>
>>>> Any ideas are appreciated. :-)
>>>>
>>>> Thanks, and best regards,
>>>> Peter Müller
>>>>
>>>>
>>>>> Hello all, especially Arne,
>>>>>
>>>>> today, I upgraded to "IPFire 2.27 - Core Update 170 Development
>>>>> Build: next/06b4164d",
>>>>> which primarily comes with Linux 5.15.59 and the slab cache merging
>>>>> disabled. On
>>>>> my physical testing hardware, the boot process stalled after
>>>>> several kernel trace
>>>>> message blocks being displayed.
>>>>>
>>>>> Unfortunately, I was unable to recover them in detail, but they
>>>>> occurred fairly
>>>>> early, roughly around the mounting of the root file system. Since
>>>>> the machine is
>>>>> semi-productive (we all test in production, don't we? ;-) ), I went
>>>>> back to C169
>>>>> and will now investigate further which change broke the update.
>>>>>
>>>>> An earlier version of Core Update 170 (commit
>>>>> 668cf4c0d0c2dbbc607716956daace413837a8da,
>>>>> I believe, but it was definitely after the randstruct changes) ran
>>>>> fine for days here,
>>>>> so it must be a pretty recent change. Will keep you updated.
>>>>>
>>>>> Thanks, and best regards,
>>>>> Peter Müller
>>>> <screenshot_c170_dirty_crash_on_boot_sunrpc_efivarfs.png>
>>>
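
The failure mode Michael suspects in the quoted text above (objects built with
different RANDSTRUCT seeds disagreeing about where a field lives) can be
illustrated with a toy model. This is only a sketch in Python, not kernel code:
the field names, the 4-byte field size, and the helpers layout(),
write_struct(), read_field() and find_mismatching_seed() are all hypothetical,
and random.Random merely stands in for the plugin's seeded shuffle. It does not
show that this is what actually happened in this thread.

```python
import random
import struct

# Hypothetical field names for a toy struct of four 4-byte fields.
FIELDS = ["uid", "gid", "cap_mask", "flags"]


def layout(seed):
    """Return a mapping field -> byte offset, shuffled deterministically by seed."""
    order = FIELDS[:]
    random.Random(seed).shuffle(order)
    return {name: 4 * index for index, name in enumerate(order)}


def write_struct(values, seed):
    """Serialise values using the layout derived from seed (the 'writer' object)."""
    offsets = layout(seed)
    buf = bytearray(4 * len(FIELDS))
    for name, value in values.items():
        struct.pack_into("<I", buf, offsets[name], value)
    return bytes(buf)


def read_field(buf, name, seed):
    """Read one field using the layout derived from seed (the 'reader' object)."""
    return struct.unpack_from("<I", buf, layout(seed)[name])[0]


def find_mismatching_seed(base_seed, limit=1000):
    """Return some seed whose layout differs from the layout of base_seed."""
    for seed in range(limit):
        if layout(seed) != layout(base_seed):
            return seed
    raise RuntimeError("no differing layout found within limit")
```

When writer and reader use the same seed, every field round-trips. When a
cached object was laid out with one seed and freshly compiled code assumes
another, at least one field is read from the wrong offset, which in a real
kernel could mean following a pointer that is not a pointer at all.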