public inbox for development@lists.ipfire.org
From: Arne Fitzenreiter <arne_f@ipfire.org>
To: development@lists.ipfire.org
Subject: Re: Core Update 170 testing report - "next/06b4164d" crashes on my x86_64 testing machine
Date: Tue, 09 Aug 2022 10:27:45 +0200
Message-ID: <93be3c121d5cd2287091924c3d92884a@ipfire.org>
In-Reply-To: <0b179153c471b0fe6b9d87486bdef423@ipfire.org>

A fresh build with an empty ccache also boots with the slab cache patch
applied, so RANDSTRUCT should be the real problem.

Arne

On 2022-08-09 08:23, Arne Fitzenreiter wrote:
> On 2022-08-08 17:47, Peter Müller wrote:
>> Hello Arne,
>> 
>> thanks for reporting back.
>> 
>> This means the slab cache patch is the problem.
> 
> I'm not sure. I fear it could be RANDSTRUCT: after a kernel version
> update, the first build does not use the ccache, but after a small
> config change the build could break if some parts of the kernel come
> from the cache and others do not.
> 
> At the moment I am testing a clean build without ccache but with the
> slab cache patch enabled. If this works, the RANDSTRUCT change is the
> culprit.
> 
> Arne
> 
>> 
>> Unfortunately, my local ccache appears to be completely messed up now,
>> so I will have to start with a clean cache. It will therefore probably
>> take me until tomorrow to have some test results ready.
>> 
>> Will keep you updated.
>> 
>> Thanks, and best regards,
>> Peter Müller
>> 
>> 
>>> With this nightly
>>> https://nightly.ipfire.org/next/2022-08-06%2007:45:02%20+0000-43df4a03/
>>> kernel 5.15.59 boots on real hardware (x86_64 and aarch64).
>>> After
>>> commit 06b4164dfe269704976b52421edbbbdf3b345679
>>> Author: Peter Müller <peter.mueller(a)ipfire.org>
>>> Date:   Mon Aug 1 17:39:59 2022 +0000
>>> 
>>>     linux: Do not allow slab caches to be merged
>>> 
>>> 
>>> it no longer boots (also tested on x86_64 and aarch64).
>>> 
>>> Arne
>>> 
>>> 
>>> On 2022-08-08 12:22, Michael Tremer wrote:
>>>> Hello,
>>>> 
>>>>> On 8 Aug 2022, at 11:16, Peter Müller <peter.mueller(a)ipfire.org> 
>>>>> wrote:
>>>>> 
>>>>> Hello Michael, hello Arne,
>>>>> 
>>>>> just a quick reply: I think we are dealing with a combination of two
>>>>> issues here. Kernel 5.15.59 without the patch that disables slab
>>>>> cache merging won't even boot in a VM (the screen stays blank
>>>>> indefinitely), and with that patch it crashes straight away.
>>>>> 
>>>>> Since kernel 5.15.57 is running perfectly fine here with randstruct 
>>>>> enabled, and has
>>>>> been for days, I just reverted both the update to 5.15.59 and the 
>>>>> slab cache patch.
>>>>> For the time being, I would leave randstruct enabled, since it does 
>>>>> not seem to be a
>>>>> root cause for whatever bug(s) we are dealing with at the moment.
>>>> 
>>>> Is that from the first build or a consecutive one?
>>>> 
>>>>> @Arne: Were you able to boot 5.15.59 successfully on hardware? If 
>>>>> so, did it also
>>>>> boot properly in a VirtualBox VM?
>>>>> 
>>>>> Apologies for this coming up so unexpectedly.
>>>> 
>>>> Well, things break. We should however be fast to have at least a
>>>> booting kernel in the tree so that we won’t crash any more systems.
>>>> 
>>>> And if that requires reverting both patches until we know for
>>>> certain which one is the bad one, I think that is the best option.
>>>> 
>>>> -Michael
>>>> 
>>>>> 
>>>>> Thanks, and best regards,
>>>>> Peter Müller
>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> You seem to have a very classic NULL pointer dereference.
>>>>>> 
>>>>>> Something is trying to follow a NULL pointer. And that isn’t 
>>>>>> possible.
>>>>>> 
>>>>>> Now it is interesting to know why that is. The cap_capable 
>>>>>> function hasn’t been touched in the 5.15 tree in a while. The same 
>>>>>> goes for ns_capable.
>>>>>> 
>>>>>> I would therefore suspect that this is some issue from the 
>>>>>> RANDSTRUCT plugin which seems to be incompatible with ccache.
>>>>>> 
>>>>>> If you have built a kernel with a random seed for the first time, 
>>>>>> that will be put into the cache. If the next build is unmodified, 
>>>>>> the kernel will come out of the cache and will be exactly the same 
>>>>>> as the previous build.
>>>>>> 
>>>>>> If you however modify some parts of the kernel (a minor release 
>>>>>> for example) you will only compile the changed parts BUT with a 
>>>>>> different seed for the randstruct plugin.
>>>>>> 
>>>>>> And I suspect that this has happened here where your code is now 
>>>>>> simply reading the wrong memory.
>>>>>> 
>>>>>> I would recommend reverting the RANDSTRUCT patch and that should 
>>>>>> allow you to have a proper image again.
>>>>>> 
>>>>>> If you want to keep that, the only option would be to disable the 
>>>>>> ccache for the kernel. The kernel is however one of the largest 
>>>>>> packages and ccache works really well here. We can discuss this 
>>>>>> once we have identified RANDSTRUCT as the culprit.
>>>>>> 
>>>>>> -Michael
>>>>>> 
>>>>>>> On 7 Aug 2022, at 19:08, Peter Müller <peter.mueller(a)ipfire.org> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>> Hello *,
>>>>>>> 
>>>>>>> enclosed is a screenshot of what booting the installer for Core 
>>>>>>> Update 170 (dirty)
>>>>>>> with kernel 5.15.57 and slab merging disabled looks like. With 
>>>>>>> kernel 5.15.59, the
>>>>>>> VM screen stays blank, so I had to revert this to get some 
>>>>>>> results.
>>>>>>> 
>>>>>>> Frankly, I don't see why the kernel suddenly does not know 
>>>>>>> anything about efivarfs
>>>>>>> anymore, and what's sunrpc got to do with it. For the latter,
>>>>>>> /build/lib/modules/5.15.57-ipfire/kernel/net/sunrpc/auth_gss/rpcsec_gss_krb5.ko.xz
>>>>>>> is still there, just as it has been in C169 before.
>>>>>>> 
>>>>>>> Any ideas are appreciated. :-)
>>>>>>> 
>>>>>>> Thanks, and best regards,
>>>>>>> Peter Müller
>>>>>>> 
>>>>>>> 
>>>>>>>> Hello all, especially Arne,
>>>>>>>> 
>>>>>>>> today, I upgraded to "IPFire 2.27 - Core Update 170 Development 
>>>>>>>> Build: next/06b4164d",
>>>>>>>> which primarily comes with Linux 5.15.59 and the slab cache 
>>>>>>>> merging disabled. On
>>>>>>>> my physical testing hardware, the boot process stalled after 
>>>>>>>> several blocks of kernel trace messages had been displayed.
>>>>>>>> 
>>>>>>>> Unfortunately, I was unable to recover them in detail, but they 
>>>>>>>> occurred fairly
>>>>>>>> early, roughly around the mounting of the root file system. 
>>>>>>>> Since the machine is
>>>>>>>> semi-productive (we all test in production, don't we? ;-) ), I 
>>>>>>>> went back to C169
>>>>>>>> and will now investigate further which change broke the update.
>>>>>>>> 
>>>>>>>> An earlier version of Core Update 170 (commit 
>>>>>>>> 668cf4c0d0c2dbbc607716956daace413837a8da,
>>>>>>>> I believe, but it was definitely after the randstruct changes) 
>>>>>>>> ran fine for days here,
>>>>>>>> so it must be a pretty recent change. Will keep you updated.
>>>>>>>> 
>>>>>>>> Thanks, and best regards,
>>>>>>>> Peter Müller
>>>>>>> <screenshot_c170_dirty_crash_on_boot_sunrpc_efivarfs.png>
>>>>>> 
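
The ccache/RANDSTRUCT interaction Michael describes in the thread can be
illustrated with a toy model. This is only a sketch under loud assumptions:
the field names, the seed values and all helper functions are hypothetical,
and Python's seeded shuffle merely stands in for the layout randomisation
done by the plugin. It models no real kernel structure and no ccache
internals; it only shows what happens when code built with one seed reads a
struct that was laid out with another seed.

```python
import random

# Hypothetical field names; NOT the layout of any real kernel structure.
FIELDS = ["uid", "gid", "cap_effective", "user_ns"]


def layout(seed):
    """Map each field to a slot, shuffled deterministically by the seed."""
    order = FIELDS[:]
    random.Random(seed).shuffle(order)
    return {name: slot for slot, name in enumerate(order)}


def write_struct(values, seed):
    """Store values into slots using the layout derived from the seed."""
    slots = [None] * len(FIELDS)
    for name, slot in layout(seed).items():
        slots[slot] = values[name]
    return slots


def read_field(slots, name, seed):
    """Read a field the way code compiled with the given seed would."""
    return slots[layout(seed)[name]]


def mismatched_fields(writer_seed, reader_seed):
    """List the fields whose slot differs between the two layouts."""
    written, read = layout(writer_seed), layout(reader_seed)
    return [name for name in FIELDS if written[name] != read[name]]


if __name__ == "__main__":
    values = {name: f"<{name}>" for name in FIELDS}

    cached_seed = 1  # assumed seed of the objects that came out of the cache
    # Pick a seed for the recompiled objects that yields a different layout.
    fresh_seed = next(
        s for s in range(2, 1000) if layout(s) != layout(cached_seed)
    )

    # Cached code fills in the struct, freshly compiled code reads it.
    slots = write_struct(values, cached_seed)
    for name in mismatched_fields(cached_seed, fresh_seed):
        got = read_field(slots, name, fresh_seed)
        print(f"{name}: expected {values[name]}, read {got}")
```

With identical seeds every read returns the value that was written. As soon
as the two seeds produce different layouts, at least two fields are read from
the wrong slot, which in a real kernel could mean following a pointer that
was never stored there.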
