public inbox for development@lists.ipfire.org
From: Arne Fitzenreiter <arne_f@ipfire.org>
To: development@lists.ipfire.org
Subject: Re: Core Update 170 testing report - "next/06b4164d" crashes on my x86_64 testing machine
Date: Mon, 08 Aug 2022 16:15:45 +0200
Message-ID: <7300c922548070c647e561cbbf7817f2@ipfire.org>
In-Reply-To: <DF06E427-CFBD-42E4-8F91-731D78CA3D21@ipfire.org>

With this nightly build
https://nightly.ipfire.org/next/2022-08-06%2007:45:02%20+0000-43df4a03/
kernel 5.15.59 boots on real hardware (x86_64 and aarch64).
After
commit 06b4164dfe269704976b52421edbbbdf3b345679
Author: Peter Müller <peter.mueller(a)ipfire.org>
Date:   Mon Aug 1 17:39:59 2022 +0000

     linux: Do not allow slab caches to be merged

it no longer boots (also tested on x86_64 and aarch64).

Arne


Am 2022-08-08 12:22, schrieb Michael Tremer:
> Hello,
> 
>> On 8 Aug 2022, at 11:16, Peter Müller <peter.mueller(a)ipfire.org> 
>> wrote:
>> 
>> Hello Michael, hello Arne,
>> 
>> just a quick reply: I think we are dealing with a combination of two
>> issues here. Kernel 5.15.59 without the slab cache merging patch won't
>> even boot in a VM (the screen stays blank indefinitely), and with the
>> patch applied it crashes straight away.
>> 
>> Since kernel 5.15.57 is running perfectly fine here with randstruct 
>> enabled, and has
>> been for days, I just reverted both the update to 5.15.59 and the slab 
>> cache patch.
>> For the time being, I would leave randstruct enabled, since it does 
>> not seem to be a
>> root cause for whatever bug(s) we are dealing with at the moment.
> 
> Is that from the first build or a consecutive one?
> 
>> @Arne: Were you able to boot 5.15.59 successfully on hardware? If so, 
>> did it also
>> boot properly in a VirtualBox VM?
>> 
>> Apologies for this coming up so unexpectedly.
> 
> Well, things break. We should, however, be quick to get at least a
> booting kernel into the tree so that we won’t crash any more systems.
> 
> And if that requires reverting both patches until we know for certain
> which one is the bad one, I find that the best option.
> 
> -Michael
> 
>> 
>> Thanks, and best regards,
>> Peter Müller
>> 
>>> Hello,
>>> 
>>> You seem to have a very classic NULL pointer dereference.
>>> 
>>> Something is trying to follow a NULL pointer. And that isn’t 
>>> possible.
>>> 
>>> Now it is interesting to know why that is. The cap_capable function 
>>> hasn’t been touched in the 5.15 tree in a while. The same goes for 
>>> ns_capable.
>>> 
>>> I would therefore suspect that this is some issue from the RANDSTRUCT 
>>> plugin which seems to be incompatible with ccache.
>>> 
>>> If you have built a kernel with a random seed for the first time, 
>>> that will be put into the cache. If the next build is unmodified, the 
>>> kernel will come out of the cache and will be exactly the same as the 
>>> previous build.
>>> 
>>> If you however modify some parts of the kernel (a minor release for 
>>> example) you will only compile the changed parts BUT with a different 
>>> seed for the randstruct plugin.
>>> 
>>> And I suspect that this has happened here where your code is now 
>>> simply reading the wrong memory.
>>> 
>>> I would recommend reverting the RANDSTRUCT patch and that should 
>>> allow you to have a proper image again.
>>> 
>>> If you want to keep that, the only option would be to disable 
>>> ccache for the kernel. The kernel is, however, one of the largest 
>>> packages, and ccache works really well here. We can discuss 
>>> this if we have identified RANDSTRUCT as the culprit.
>>> 
>>> -Michael
>>> 
>>>> On 7 Aug 2022, at 19:08, Peter Müller <peter.mueller(a)ipfire.org> 
>>>> wrote:
>>>> 
>>>> Hello *,
>>>> 
>>>> enclosed is a screenshot of what booting the installer for Core 
>>>> Update 170 (dirty)
>>>> with kernel 5.15.57 and slab merging disabled looks like. With 
>>>> kernel 5.15.59, the
>>>> VM screen stays blank, so I had to revert this to get some results.
>>>> 
>>>> Frankly, I don't see why the kernel suddenly does not know anything 
>>>> about efivarfs
>>>> anymore, and what's sunrpc got to do with it. For the latter,
>>>> /build/lib/modules/5.15.57-ipfire/kernel/net/sunrpc/auth_gss/rpcsec_gss_krb5.ko.xz
>>>> is still there, just as it has been in C169 before.
>>>> 
>>>> Any ideas are appreciated. :-)
>>>> 
>>>> Thanks, and best regards,
>>>> Peter Müller
>>>> 
>>>> 
>>>>> Hello all, especially Arne,
>>>>> 
>>>>> today, I upgraded to "IPFire 2.27 - Core Update 170 Development 
>>>>> Build: next/06b4164d",
>>>>> which primarily comes with Linux 5.15.59 and the slab cache merging 
>>>>> disabled. On
>>>>> my physical testing hardware, the boot process stalled after 
>>>>> several blocks of kernel trace messages had been displayed.
>>>>> 
>>>>> Unfortunately, I was unable to recover them in detail, but they 
>>>>> occurred fairly
>>>>> early, roughly around the mounting of the root file system. Since 
>>>>> the machine is
>>>>> semi-productive (we all test in production, don't we? ;-) ), I went 
>>>>> back to C169
>>>>> and will now investigate further which change broke the update.
>>>>> 
>>>>> An earlier version of Core Update 170 (commit 
>>>>> 668cf4c0d0c2dbbc607716956daace413837a8da,
>>>>> I believe, but it was definitely after the randstruct changes) ran 
>>>>> fine for days here,
>>>>> so it must be a pretty recent change. Will keep you updated.
>>>>> 
>>>>> Thanks, and best regards,
>>>>> Peter Müller
>>>> <screenshot_c170_dirty_crash_on_boot_sunrpc_efivarfs.png>
>>> 
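
For readers following the thread: the mechanism Michael suspects above
(objects built with different randstruct seeds disagreeing about a
struct's field order) can be illustrated with a small sketch. This is a
toy model only, not kernel code. The struct name, the field names and
the pointer value are hypothetical, and the thread does not confirm that
this is what actually happened.

```python
import ctypes


class CredSeedA(ctypes.Structure):
    # Hypothetical field order as one randstruct seed might produce it.
    _fields_ = [("uid", ctypes.c_uint64), ("user_ns", ctypes.c_uint64)]


class CredSeedB(ctypes.Structure):
    # Same fields, shuffled as a different seed might produce them.
    _fields_ = [("user_ns", ctypes.c_uint64), ("uid", ctypes.c_uint64)]


# Code built with seed A fills in the structure: uid 0 and a made-up,
# non-NULL address for the namespace pointer.
written = CredSeedA(uid=0, user_ns=0xFFFF888000001000)

# Code built with seed B interprets the very same bytes with its own layout.
view = CredSeedB.from_buffer_copy(bytes(written))

# The field read as "user_ns" is really the uid slot, so it looks like NULL.
print(hex(view.user_ns), hex(view.uid))
```

In this toy model the reader sees a zero where it expects a valid
pointer, which is the kind of mismatch that would show up as a NULL
pointer dereference if the value were followed.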
