From: Arne Fitzenreiter <arne_f@ipfire.org>
To: development@lists.ipfire.org
Subject: Re: Core Update 170 testing report - "next/06b4164d" crashes on my x86_64 testing machine
Date: Tue, 09 Aug 2022 08:23:33 +0200
Message-ID: <0b179153c471b0fe6b9d87486bdef423@ipfire.org>
In-Reply-To: <21efe16c-bad8-c9a0-dede-10762d269cd0@ipfire.org>
Am 2022-08-08 17:47, schrieb Peter Müller:
> Hello Arne,
>
> thanks for reporting back.
>
> This means the slab cache patch is the problem.
I'm not sure. I fear it could be RANDSTRUCT: after a kernel version
update, the first build does not use the ccache, but after a small
config change the build could break if some parts of the kernel are
taken from the cache and others are not.
At the moment I am testing a clean build without ccache but with the
slab cache patch enabled. If this works, the RANDSTRUCT change is the
problem.
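The suspected failure mode can be sketched as follows. This is a minimal
Python illustration, not kernel code: the field names, the helper
functions and the seeds are all made up, and a seeded shuffle only stands
in for what the RANDSTRUCT plugin does to struct layouts. One object is
"compiled" with one seed and writes a struct, another object is "compiled"
with a different seed and reads it back.

```python
import random

# Hypothetical struct members, chosen only for illustration.
FIELDS = ["uid", "gid", "user_ns", "securebits", "cap_effective", "cap_permitted"]


def layout(seed):
    """Map each field to a slot, shuffled deterministically by the seed."""
    order = FIELDS[:]
    random.Random(seed).shuffle(order)
    return {name: slot for slot, name in enumerate(order)}


def write_struct(values, seed):
    """Object built with `seed` stores each value in the slot its layout dictates."""
    slots = [None] * len(FIELDS)
    for name, slot in layout(seed).items():
        slots[slot] = values[name]
    return slots


def read_field(slots, name, seed):
    """Object built with `seed` reads a field from the slot its own layout dictates."""
    return slots[layout(seed)[name]]


def mismatched_fields(writer_seed, reader_seed):
    """List the fields the reader gets wrong when the seeds differ."""
    values = {name: "<" + name + ">" for name in FIELDS}
    memory = write_struct(values, writer_seed)
    return [
        name
        for name in FIELDS
        if read_field(memory, name, reader_seed) != values[name]
    ]


if __name__ == "__main__":
    # Same seed everywhere, e.g. a clean build without ccache.
    print("same seed:", mismatched_fields(1, 1))
    # Cached objects from an old seed mixed with freshly compiled ones.
    for reader_seed in range(2, 6):
        print("seed 1 vs", reader_seed, ":", mismatched_fields(1, reader_seed))
```

With identical seeds every field reads back correctly. With differing
seeds some fields generally come from the wrong slot, which in the kernel
would mean treating some other member as a pointer and following it.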
Arne
>
> Unfortunately, my local C-cache appears to be completely messed up now,
> so I
> will have to start with a clean cache, hence it will probably take me
> until
> tomorrow to have some testing results ready.
>
> Will keep you updated.
>
> Thanks, and best regards,
> Peter Müller
>
>
>> With this
>> https://nightly.ipfire.org/next/2022-08-06%2007:45:02%20+0000-43df4a03/
>> nightly the kernel 5.15.59 boots on real hardware (x86_64 and aarch64)
>> After
>> commit 06b4164dfe269704976b52421edbbbdf3b345679
>> Author: Peter Müller <peter.mueller(a)ipfire.org>
>> Date: Mon Aug 1 17:39:59 2022 +0000
>>
>> linux: Do not allow slab caches to be merged
>>
>>
>> it doesn't boot anymore. (also tested on x86_64 and aarch64)
>>
>> Arne
>>
>>
>> Am 2022-08-08 12:22, schrieb Michael Tremer:
>>> Hello,
>>>
>>>> On 8 Aug 2022, at 11:16, Peter Müller <peter.mueller(a)ipfire.org>
>>>> wrote:
>>>>
>>>> Hello Michael, hello Arne,
>>>>
>>>> just a quick reply: I think we are dealing with the combination of
>>>> two issues here,
>>>> as kernel 5.15.59 without the slab cache merging patch won't even
>>>> boot in a VM (the
>>>> screen stays blank indefinitely), and it crashes straight away with
>>>> the slab cache
>>>> merging patch.
>>>>
>>>> Since kernel 5.15.57 is running perfectly fine here with randstruct
>>>> enabled, and has
>>>> been for days, I just reverted both the update to 5.15.59 and the
>>>> slab cache patch.
>>>> For the time being, I would leave randstruct enabled, since it does
>>>> not seem to be a
>>>> root cause for whatever bug(s) we are dealing with at the moment.
>>>
>>> Is that from the first build or a consecutive one?
>>>
>>>> @Arne: Were you able to boot 5.15.59 successfully on hardware? If
>>>> so, did it also
>>>> boot properly in a VirtualBox VM?
>>>>
>>>> Apologies for this coming up so unexpectedly.
>>>
>>> Well, things break. We should however be fast to have at least a
>>> booting kernel in the tree so that we won’t crash any more systems.
>>>
>>> And if that requires reverting both patches until we know for certain
>>> which one is the bad one, I find that the best option.
>>>
>>> -Michael
>>>
>>>>
>>>> Thanks, and best regards,
>>>> Peter Müller
>>>>
>>>>> Hello,
>>>>>
>>>>> You seem to have a very classic NULL pointer dereference.
>>>>>
>>>>> Something is trying to follow a NULL pointer. And that isn’t
>>>>> possible.
>>>>>
>>>>> Now it is interesting to know why that is. The cap_capable function
>>>>> hasn’t been touched in the 5.15 tree in a while. The same goes for
>>>>> ns_capable.
>>>>>
>>>>> I would therefore suspect that this is some issue from the
>>>>> RANDSTRUCT plugin which seems to be incompatible with ccache.
>>>>>
>>>>> If you have built a kernel with a random seed for the first time,
>>>>> that will be put into the cache. If the next build is unmodified,
>>>>> the kernel will come out of the cache and will be exactly the same
>>>>> as the previous build.
>>>>>
>>>>> If you however modify some parts of the kernel (a minor release for
>>>>> example) you will only compile the changed parts BUT with a
>>>>> different seed for the randstruct plugin.
>>>>>
>>>>> And I suspect that this has happened here where your code is now
>>>>> simply reading the wrong memory.
>>>>>
>>>>> I would recommend reverting the RANDSTRUCT patch and that should
>>>>> allow you to have a proper image again.
>>>>>
>>>>> If you want to keep that, the only option would be to disable the
>>>>> ccache for the kernel. The kernel is however one of the largest
>>>>> packages and ccache works really really well here. We can discuss
>>>>> this if we have identified RANDSTRUCT to be the culprit.
>>>>>
>>>>> -Michael
>>>>>
>>>>>> On 7 Aug 2022, at 19:08, Peter Müller <peter.mueller(a)ipfire.org>
>>>>>> wrote:
>>>>>>
>>>>>> Hello *,
>>>>>>
>>>>>> enclosed is a screenshot of what booting the installer for Core
>>>>>> Update 170 (dirty)
>>>>>> with kernel 5.15.57 and slab merging disabled looks like. With
>>>>>> kernel 5.15.59, the
>>>>>> VM screen stays blank, so I had to revert this to get some
>>>>>> results.
>>>>>>
>>>>>> Frankly, I don't see why the kernel suddenly does not know
>>>>>> anything about efivarfs
>>>>>> anymore, and what's sunrpc got to do with it. For the latter,
>>>>>> /build/lib/modules/5.15.57-ipfire/kernel/net/sunrpc/auth_gss/rpcsec_gss_krb5.ko.xz
>>>>>> is still there, just as it has been in C169 before.
>>>>>>
>>>>>> Any ideas are appreciated. :-)
>>>>>>
>>>>>> Thanks, and best regards,
>>>>>> Peter Müller
>>>>>>
>>>>>>
>>>>>>> Hello all, especially Arne,
>>>>>>>
>>>>>>> today, I upgraded to "IPFire 2.27 - Core Update 170 Development
>>>>>>> Build: next/06b4164d",
>>>>>>> which primarily comes with Linux 5.15.59 and the slab cache
>>>>>>> merging disabled. On
>>>>>>> my physical testing hardware, the boot process stalled after
>>>>>>> several kernel trace
>>>>>>> message blocks being displayed.
>>>>>>>
>>>>>>> Unfortunately, I was unable to recover them in detail, but they
>>>>>>> occurred fairly
>>>>>>> early, roughly around the mounting of the root file system. Since
>>>>>>> the machine is
>>>>>>> semi-productive (we all test in production, don't we? ;-) ), I
>>>>>>> went back to C169
>>>>>>> and will now investigate further which change broke the update.
>>>>>>>
>>>>>>> An earlier version of Core Update 170 (commit
>>>>>>> 668cf4c0d0c2dbbc607716956daace413837a8da,
>>>>>>> I believe, but it was definitely after the randstruct changes)
>>>>>>> ran fine for days here,
>>>>>>> so it must be a pretty recent change. Will keep you updated.
>>>>>>>
>>>>>>> Thanks, and best regards,
>>>>>>> Peter Müller
>>>>>> <screenshot_c170_dirty_crash_on_boot_sunrpc_efivarfs.png>
>>>>>