* Re: Core Update 170 testing report - "next/06b4164d" crashes on my x86_64 testing machine
[not found] <d4d5fe5f-08c5-df44-4ba6-0a77f16bf890@ipfire.org>
@ 2022-08-08 9:50 ` Michael Tremer
2022-08-08 10:16 ` Peter Müller
0 siblings, 1 reply; 12+ messages in thread
From: Michael Tremer @ 2022-08-08 9:50 UTC (permalink / raw)
To: development
[-- Attachment #1: Type: text/plain, Size: 3062 bytes --]
Hello,
You seem to have a very classic NULL pointer dereference.
Something is trying to follow a NULL pointer. And that isn’t possible.
Now it is interesting to know why that is. The cap_capable function hasn’t been touched in the 5.15 tree in a while. The same goes for ns_capable.
I would therefore suspect that this is some issue from the RANDSTRUCT plugin which seems to be incompatible with ccache.
If you have built a kernel with a random seed for the first time, that will be put into the cache. If the next build is unmodified, the kernel will come out of the cache and will be exactly the same as the previous build.
If you however modify some parts of the kernel (a minor release for example) you will only compile the changed parts BUT with a different seed for the randstruct plugin.
And I suspect that this has happened here where your code is now simply reading the wrong memory.
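As a rough illustration of that suspected failure mode, here is a toy model in Python. It is not kernel code and not how the RANDSTRUCT plugin is implemented; the field names and the seed-driven shuffle are assumptions made up for the sketch. It only shows that when the writer and the reader of a structure were built with different layout seeds, fields are read from the wrong slots. In a real kernel, a pointer field read from the wrong offset could well turn out to be NULL.

```python
import random

# Hypothetical field names for a toy "struct"; not the kernel's real layout.
FIELDS = ["uid", "gid", "cap_effective", "user_ns"]


def layout(seed):
    """Map each field name to a slot index, shuffled deterministically by seed."""
    order = list(FIELDS)
    random.Random(seed).shuffle(order)
    return {name: slot for slot, name in enumerate(order)}


def write_struct(values, seed):
    """Store values into a flat 'memory' list using the layout for this seed."""
    slots = layout(seed)
    memory = [None] * len(FIELDS)
    for name, value in values.items():
        memory[slots[name]] = value
    return memory


def read_field(memory, name, seed):
    """Read one field using the layout for this seed."""
    return memory[layout(seed)[name]]


def misread_fields(values, write_seed, read_seed):
    """Return the fields that come back wrong when writer and reader seeds differ."""
    memory = write_struct(values, write_seed)
    return [n for n in FIELDS if read_field(memory, n, read_seed) != values[n]]


if __name__ == "__main__":
    values = {"uid": 1000, "gid": 100, "cap_effective": 0x3F, "user_ns": "init_user_ns"}
    # Writer and reader agree on the seed: every field is read correctly.
    print("same seed:", misread_fields(values, 1, 1))
    # Writer and reader disagree: some fields may be read from the wrong slot.
    print("different seeds:", misread_fields(values, 1, 2))
```

With matching seeds every field round-trips; with mismatched seeds any field whose slot moved is silently read from another field's storage.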
I would recommend reverting the RANDSTRUCT patch and that should allow you to have a proper image again.
If you want to keep that, the only option would be to disable the ccache for the kernel. The kernel is however one of the largest packages and ccache works really well here. We can discuss this if we have identified RANDSTRUCT to be the culprit.
-Michael
> On 7 Aug 2022, at 19:08, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>
> Hello *,
>
> enclosed is a screenshot of what booting the installer for Core Update 170 (dirty)
> with kernel 5.15.57 and slab merging disabled looks like. With kernel 5.15.59, the
> VM screen stays blank, so I had to revert this to get some results.
>
> Frankly, I don't see why the kernel suddenly does not know anything about efivarfs
> anymore, and what's sunrpc got to do with it. For the latter,
> /build/lib/modules/5.15.57-ipfire/kernel/net/sunrpc/auth_gss/rpcsec_gss_krb5.ko.xz
> is still there, just as it has been in C169 before.
>
> Any ideas are appreciated. :-)
>
> Thanks, and best regards,
> Peter Müller
>
>
>> Hello all, especially Arne,
>>
>> today, I upgraded to "IPFire 2.27 - Core Update 170 Development Build: next/06b4164d",
>> which primarily comes with Linux 5.15.59 and the slab cache merging disabled. On
>> my physical testing hardware, the boot process stalled after several kernel trace
>> message blocks being displayed.
>>
>> Unfortunately, I was unable to recover them in detail, but they occurred fairly
>> early, roughly around the mounting of the root file system. Since the machine is
>> semi-productive (we all test in production, don't we? ;-) ), I went back to C169
>> and will now investigate further which change broke the update.
>>
>> An earlier version of Core Update 170 (commit 668cf4c0d0c2dbbc607716956daace413837a8da,
>> I believe, but it was definitely after the randstruct changes) ran fine for days here,
>> so it must be a pretty recent change. Will keep you updated.
>>
>> Thanks, and best regards,
>> Peter Müller
> <screenshot_c170_dirty_crash_on_boot_sunrpc_efivarfs.png>
* Re: Core Update 170 testing report - "next/06b4164d" crashes on my x86_64 testing machine
2022-08-08 9:50 ` Core Update 170 testing report - "next/06b4164d" crashes on my x86_64 testing machine Michael Tremer
@ 2022-08-08 10:16 ` Peter Müller
2022-08-08 10:22 ` Michael Tremer
0 siblings, 1 reply; 12+ messages in thread
From: Peter Müller @ 2022-08-08 10:16 UTC (permalink / raw)
To: development
[-- Attachment #1: Type: text/plain, Size: 3973 bytes --]
Hello Michael, hello Arne,
just a quick reply: I think we are dealing with the combination of two issues here,
as kernel 5.15.59 without the patch that disables slab cache merging won't even boot
in a VM (the screen stays blank indefinitely), and it crashes straight away with that
patch applied.
Since kernel 5.15.57 is running perfectly fine here with randstruct enabled, and has
been for days, I just reverted both the update to 5.15.59 and the slab cache patch.
For the time being, I would leave randstruct enabled, since it does not seem to be a
root cause for whatever bug(s) we are dealing with at the moment.
@Arne: Were you able to boot 5.15.59 successfully on hardware? If so, did it also
boot properly in a VirtualBox VM?
Apologies for this coming up so unexpectedly.
Thanks, and best regards,
Peter Müller
> Hello,
>
> You seem to have a very classic NULL pointer dereference.
>
> Something is trying to follow a NULL pointer. And that isn’t possible.
>
> Now it is interesting to know why that is. The cap_capable function hasn’t been touched in the 5.15 tree in a while. The same goes for ns_capable.
>
> I would therefore suspect that this is some issue from the RANDSTRUCT plugin which seems to be incompatible with ccache.
>
> If you have built a kernel with a random seed for the first time, that will be put into the cache. If the next build is unmodified, the kernel will come out of the cache and will be exactly the same as the previous build.
>
> If you however modify some parts of the kernel (a minor release for example) you will only compile the changed parts BUT with a different seed for the randstruct plugin.
>
> And I suspect that this has happened here where your code is now simply reading the wrong memory.
>
> I would recommend reverting the RANDSTRUCT patch and that should allow you to have a proper image again.
>
> If you want to keep that, the only option would be to disable the ccache for the kernel. The kernel is however one of the largest packages and ccache works really well here. We can discuss this if we have identified RANDSTRUCT to be the culprit.
>
> -Michael
>
>> On 7 Aug 2022, at 19:08, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>>
>> Hello *,
>>
>> enclosed is a screenshot of what booting the installer for Core Update 170 (dirty)
>> with kernel 5.15.57 and slab merging disabled looks like. With kernel 5.15.59, the
>> VM screen stays blank, so I had to revert this to get some results.
>>
>> Frankly, I don't see why the kernel suddenly does not know anything about efivarfs
>> anymore, and what's sunrpc got to do with it. For the latter,
>> /build/lib/modules/5.15.57-ipfire/kernel/net/sunrpc/auth_gss/rpcsec_gss_krb5.ko.xz
>> is still there, just as it has been in C169 before.
>>
>> Any ideas are appreciated. :-)
>>
>> Thanks, and best regards,
>> Peter Müller
>>
>>
>>> Hello all, especially Arne,
>>>
>>> today, I upgraded to "IPFire 2.27 - Core Update 170 Development Build: next/06b4164d",
>>> which primarily comes with Linux 5.15.59 and the slab cache merging disabled. On
>>> my physical testing hardware, the boot process stalled after several kernel trace
>>> message blocks being displayed.
>>>
>>> Unfortunately, I was unable to recover them in detail, but they occurred fairly
>>> early, roughly around the mounting of the root file system. Since the machine is
>>> semi-productive (we all test in production, don't we? ;-) ), I went back to C169
>>> and will now investigate further which change broke the update.
>>>
>>> An earlier version of Core Update 170 (commit 668cf4c0d0c2dbbc607716956daace413837a8da,
>>> I believe, but it was definitely after the randstruct changes) ran fine for days here,
>>> so it must be a pretty recent change. Will keep you updated.
>>>
>>> Thanks, and best regards,
>>> Peter Müller
>> <screenshot_c170_dirty_crash_on_boot_sunrpc_efivarfs.png>
>
* Re: Core Update 170 testing report - "next/06b4164d" crashes on my x86_64 testing machine
2022-08-08 10:16 ` Peter Müller
@ 2022-08-08 10:22 ` Michael Tremer
2022-08-08 14:15 ` Arne Fitzenreiter
0 siblings, 1 reply; 12+ messages in thread
From: Michael Tremer @ 2022-08-08 10:22 UTC (permalink / raw)
To: development
[-- Attachment #1: Type: text/plain, Size: 4511 bytes --]
Hello,
> On 8 Aug 2022, at 11:16, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>
> Hello Michael, hello Arne,
>
> just a quick reply: I think we are dealing with the combination of two issues here,
> as kernel 5.15.59 without slab cache merging disabled won't even boot in a VM (the
> screen stays blank indefinitely), and it crashes straight away with the slab cache
> merging patch.
>
> Since kernel 5.15.57 is running perfectly fine here with randstruct enabled, and has
> been for days, I just reverted both the update to 5.15.59 and the slab cache patch.
> For the time being, I would leave randstruct enabled, since it does not seem to be a
> root cause for whatever bug(s) we are dealing with at the moment.
Is that from the first build or a consecutive one?
> @Arne: Were you able to boot 5.15.59 successfully on hardware? If so, did it also
> boot properly in a VirtualBox VM?
>
> Apologies for this coming up so unexpectedly.
Well, things break. We should however be fast to have at least a booting kernel in the tree so that we won’t crash any more systems.
And if that requires reverting both patches until we know for certain which one is the bad one, I find that the best option.
-Michael
>
> Thanks, and best regards,
> Peter Müller
>
>> Hello,
>>
>> You seem to have a very classic NULL pointer dereference.
>>
>> Something is trying to follow a NULL pointer. And that isn’t possible.
>>
>> Now it is interesting to know why that is. The cap_capable function hasn’t been touched in the 5.15 tree in a while. The same goes for ns_capable.
>>
>> I would therefore suspect that this is some issue from the RANDSTRUCT plugin which seems to be incompatible with ccache.
>>
>> If you have built a kernel with a random seed for the first time, that will be put into the cache. If the next build is unmodified, the kernel will come out of the cache and will be exactly the same as the previous build.
>>
>> If you however modify some parts of the kernel (a minor release for example) you will only compile the changed parts BUT with a different seed for the randstruct plugin.
>>
>> And I suspect that this has happened here where your code is now simply reading the wrong memory.
>>
>> I would recommend reverting the RANDSTRUCT patch and that should allow you to have a proper image again.
>>
>> If you want to keep that, the only option would be to disable the ccache for the kernel. The kernel is however one of the largest packages and ccache works really well here. We can discuss this if we have identified RANDSTRUCT to be the culprit.
>>
>> -Michael
>>
>>> On 7 Aug 2022, at 19:08, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>>>
>>> Hello *,
>>>
>>> enclosed is a screenshot of what booting the installer for Core Update 170 (dirty)
>>> with kernel 5.15.57 and slab merging disabled looks like. With kernel 5.15.59, the
>>> VM screen stays blank, so I had to revert this to get some results.
>>>
>>> Frankly, I don't see why the kernel suddenly does not know anything about efivarfs
>>> anymore, and what's sunrpc got to do with it. For the latter,
>>> /build/lib/modules/5.15.57-ipfire/kernel/net/sunrpc/auth_gss/rpcsec_gss_krb5.ko.xz
>>> is still there, just as it has been in C169 before.
>>>
>>> Any ideas are appreciated. :-)
>>>
>>> Thanks, and best regards,
>>> Peter Müller
>>>
>>>
>>>> Hello all, especially Arne,
>>>>
>>>> today, I upgraded to "IPFire 2.27 - Core Update 170 Development Build: next/06b4164d",
>>>> which primarily comes with Linux 5.15.59 and the slab cache merging disabled. On
>>>> my physical testing hardware, the boot process stalled after several kernel trace
>>>> message blocks being displayed.
>>>>
>>>> Unfortunately, I was unable to recover them in detail, but they occurred fairly
>>>> early, roughly around the mounting of the root file system. Since the machine is
>>>> semi-productive (we all test in production, don't we? ;-) ), I went back to C169
>>>> and will now investigate further which change broke the update.
>>>>
>>>> An earlier version of Core Update 170 (commit 668cf4c0d0c2dbbc607716956daace413837a8da,
>>>> I believe, but it was definitely after the randstruct changes) ran fine for days here,
>>>> so it must be a pretty recent change. Will keep you updated.
>>>>
>>>> Thanks, and best regards,
>>>> Peter Müller
>>> <screenshot_c170_dirty_crash_on_boot_sunrpc_efivarfs.png>
>>
* Re: Core Update 170 testing report - "next/06b4164d" crashes on my x86_64 testing machine
2022-08-08 10:22 ` Michael Tremer
@ 2022-08-08 14:15 ` Arne Fitzenreiter
2022-08-08 15:47 ` Peter Müller
0 siblings, 1 reply; 12+ messages in thread
From: Arne Fitzenreiter @ 2022-08-08 14:15 UTC (permalink / raw)
To: development
[-- Attachment #1: Type: text/plain, Size: 5320 bytes --]
With this
https://nightly.ipfire.org/next/2022-08-06%2007:45:02%20+0000-43df4a03/
nightly the kernel 5.15.59 boots on real hardware (x86_64 and aarch64)
After
commit 06b4164dfe269704976b52421edbbbdf3b345679
Author: Peter Müller <peter.mueller(a)ipfire.org>
Date: Mon Aug 1 17:39:59 2022 +0000
linux: Do not allow slab caches to be merged
it doesn't boot anymore. (also tested on x86_64 and aarch64)
Arne
On 2022-08-08 12:22, Michael Tremer wrote:
> Hello,
>
>> On 8 Aug 2022, at 11:16, Peter Müller <peter.mueller(a)ipfire.org>
>> wrote:
>>
>> Hello Michael, hello Arne,
>>
>> just a quick reply: I think we are dealing with the combination of two
>> issues here,
>> as kernel 5.15.59 without slab cache merging disabled won't even boot
>> in a VM (the
>> screen stays blank indefinitely), and it crashes straight away with
>> the slab cache
>> merging patch.
>>
>> Since kernel 5.15.57 is running perfectly fine here with randstruct
>> enabled, and has
>> been for days, I just reverted both the update to 5.15.59 and the slab
>> cache patch.
>> For the time being, I would leave randstruct enabled, since it does
>> not seem to be a
>> root cause for whatever bug(s) we are dealing with at the moment.
>
> Is that from the first build or a consecutive one?
>
>> @Arne: Were you able to boot 5.15.59 successfully on hardware? If so,
>> did it also
>> boot properly in a VirtualBox VM?
>>
>> Apologies for this coming up so unexpectedly.
>
> Well, things break. We should however be fast to have at least a
> booting kernel in the tree so that we won’t crash any more systems.
>
> And if that requires reverting both patches until we know for certain
> which one is the bad one, I find that the best option.
>
> -Michael
>
>>
>> Thanks, and best regards,
>> Peter Müller
>>
>>> Hello,
>>>
>>> You seem to have a very classic NULL pointer dereference.
>>>
>>> Something is trying to follow a NULL pointer. And that isn’t
>>> possible.
>>>
>>> Now it is interesting to know why that is. The cap_capable function
>>> hasn’t been touched in the 5.15 tree in a while. The same goes for
>>> ns_capable.
>>>
>>> I would therefore suspect that this is some issue from the RANDSTRUCT
>>> plugin which seems to be incompatible with ccache.
>>>
>>> If you have built a kernel with a random seed for the first time,
>>> that will be put into the cache. If the next build is unmodified, the
>>> kernel will come out of the cache and will be exactly the same as the
>>> previous build.
>>>
>>> If you however modify some parts of the kernel (a minor release for
>>> example) you will only compile the changed parts BUT with a different
>>> seed for the randstruct plugin.
>>>
>>> And I suspect that this has happened here where your code is now
>>> simply reading the wrong memory.
>>>
>>> I would recommend reverting the RANDSTRUCT patch and that should
>>> allow you to have a proper image again.
>>>
>>> If you want to keep that, the only option would be to disable the
>>> ccache for the kernel. The kernel is however one of the largest
>>> packages and ccache works really well here. We can discuss
>>> this if we have identified RANDSTRUCT to be the culprit.
>>>
>>> -Michael
>>>
>>>> On 7 Aug 2022, at 19:08, Peter Müller <peter.mueller(a)ipfire.org>
>>>> wrote:
>>>>
>>>> Hello *,
>>>>
>>>> enclosed is a screenshot of what booting the installer for Core
>>>> Update 170 (dirty)
>>>> with kernel 5.15.57 and slab merging disabled looks like. With
>>>> kernel 5.15.59, the
>>>> VM screen stays blank, so I had to revert this to get some results.
>>>>
>>>> Frankly, I don't see why the kernel suddenly does not know anything
>>>> about efivarfs
>>>> anymore, and what's sunrpc got to do with it. For the latter,
>>>> /build/lib/modules/5.15.57-ipfire/kernel/net/sunrpc/auth_gss/rpcsec_gss_krb5.ko.xz
>>>> is still there, just as it has been in C169 before.
>>>>
>>>> Any ideas are appreciated. :-)
>>>>
>>>> Thanks, and best regards,
>>>> Peter Müller
>>>>
>>>>
>>>>> Hello all, especially Arne,
>>>>>
>>>>> today, I upgraded to "IPFire 2.27 - Core Update 170 Development
>>>>> Build: next/06b4164d",
>>>>> which primarily comes with Linux 5.15.59 and the slab cache merging
>>>>> disabled. On
>>>>> my physical testing hardware, the boot process stalled after
>>>>> several kernel trace
>>>>> message blocks being displayed.
>>>>>
>>>>> Unfortunately, I was unable to recover them in detail, but they
>>>>> occurred fairly
>>>>> early, roughly around the mounting of the root file system. Since
>>>>> the machine is
>>>>> semi-productive (we all test in production, don't we? ;-) ), I went
>>>>> back to C169
>>>>> and will now investigate further which change broke the update.
>>>>>
>>>>> An earlier version of Core Update 170 (commit
>>>>> 668cf4c0d0c2dbbc607716956daace413837a8da,
>>>>> I believe, but it was definitely after the randstruct changes) ran
>>>>> fine for days here,
>>>>> so it must be a pretty recent change. Will keep you updated.
>>>>>
>>>>> Thanks, and best regards,
>>>>> Peter Müller
>>>> <screenshot_c170_dirty_crash_on_boot_sunrpc_efivarfs.png>
>>>
* Re: Core Update 170 testing report - "next/06b4164d" crashes on my x86_64 testing machine
2022-08-08 14:15 ` Arne Fitzenreiter
@ 2022-08-08 15:47 ` Peter Müller
2022-08-09 6:23 ` Arne Fitzenreiter
0 siblings, 1 reply; 12+ messages in thread
From: Peter Müller @ 2022-08-08 15:47 UTC (permalink / raw)
To: development
[-- Attachment #1: Type: text/plain, Size: 5569 bytes --]
Hello Arne,
thanks for reporting back.
This means the slab cache patch is the problem.
Unfortunately, my local ccache appears to be completely messed up now, so I
will have to start with a clean cache, hence it will probably take me until
tomorrow to have some testing results ready.
Will keep you updated.
Thanks, and best regards,
Peter Müller
> With this https://nightly.ipfire.org/next/2022-08-06%2007:45:02%20+0000-43df4a03/
> nightly the kernel 5.15.59 boots on real hardware (x86_64 and aarch64)
> After
> commit 06b4164dfe269704976b52421edbbbdf3b345679
> Author: Peter Müller <peter.mueller(a)ipfire.org>
> Date: Mon Aug 1 17:39:59 2022 +0000
>
> linux: Do not allow slab caches to be merged
>
>
> it doesn't boot anymore. (also tested on x86_64 and aarch64)
>
> Arne
>
>
> On 2022-08-08 12:22, Michael Tremer wrote:
>> Hello,
>>
>>> On 8 Aug 2022, at 11:16, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>>>
>>> Hello Michael, hello Arne,
>>>
>>> just a quick reply: I think we are dealing with the combination of two issues here,
>>> as kernel 5.15.59 without slab cache merging disabled won't even boot in a VM (the
>>> screen stays blank indefinitely), and it crashes straight away with the slab cache
>>> merging patch.
>>>
>>> Since kernel 5.15.57 is running perfectly fine here with randstruct enabled, and has
>>> been for days, I just reverted both the update to 5.15.59 and the slab cache patch.
>>> For the time being, I would leave randstruct enabled, since it does not seem to be a
>>> root cause for whatever bug(s) we are dealing with at the moment.
>>
>> Is that from the first build or a consecutive one?
>>
>>> @Arne: Were you able to boot 5.15.59 successfully on hardware? If so, did it also
>>> boot properly in a VirtualBox VM?
>>>
>>> Apologies for this coming up so unexpectedly.
>>
>> Well, things break. We should however be fast to have at least a
>> booting kernel in the tree so that we won’t crash any more systems.
>>
>> And if that requires reverting both patches until we know for certain
>> which one is the bad one, I find that the best option.
>>
>> -Michael
>>
>>>
>>> Thanks, and best regards,
>>> Peter Müller
>>>
>>>> Hello,
>>>>
>>>> You seem to have a very classic NULL pointer dereference.
>>>>
>>>> Something is trying to follow a NULL pointer. And that isn’t possible.
>>>>
>>>> Now it is interesting to know why that is. The cap_capable function hasn’t been touched in the 5.15 tree in a while. The same goes for ns_capable.
>>>>
>>>> I would therefore suspect that this is some issue from the RANDSTRUCT plugin which seems to be incompatible with ccache.
>>>>
>>>> If you have built a kernel with a random seed for the first time, that will be put into the cache. If the next build is unmodified, the kernel will come out of the cache and will be exactly the same as the previous build.
>>>>
>>>> If you however modify some parts of the kernel (a minor release for example) you will only compile the changed parts BUT with a different seed for the randstruct plugin.
>>>>
>>>> And I suspect that this has happened here where your code is now simply reading the wrong memory.
>>>>
>>>> I would recommend reverting the RANDSTRUCT patch and that should allow you to have a proper image again.
>>>>
>>>> If you want to keep that, the only option would be to disable the ccache for the kernel. The kernel is however one of the largest packages and ccache works really well here. We can discuss this if we have identified RANDSTRUCT to be the culprit.
>>>>
>>>> -Michael
>>>>
>>>>> On 7 Aug 2022, at 19:08, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>>>>>
>>>>> Hello *,
>>>>>
>>>>> enclosed is a screenshot of what booting the installer for Core Update 170 (dirty)
>>>>> with kernel 5.15.57 and slab merging disabled looks like. With kernel 5.15.59, the
>>>>> VM screen stays blank, so I had to revert this to get some results.
>>>>>
>>>>> Frankly, I don't see why the kernel suddenly does not know anything about efivarfs
>>>>> anymore, and what's sunrpc got to do with it. For the latter,
>>>>> /build/lib/modules/5.15.57-ipfire/kernel/net/sunrpc/auth_gss/rpcsec_gss_krb5.ko.xz
>>>>> is still there, just as it has been in C169 before.
>>>>>
>>>>> Any ideas are appreciated. :-)
>>>>>
>>>>> Thanks, and best regards,
>>>>> Peter Müller
>>>>>
>>>>>
>>>>>> Hello all, especially Arne,
>>>>>>
>>>>>> today, I upgraded to "IPFire 2.27 - Core Update 170 Development Build: next/06b4164d",
>>>>>> which primarily comes with Linux 5.15.59 and the slab cache merging disabled. On
>>>>>> my physical testing hardware, the boot process stalled after several kernel trace
>>>>>> message blocks being displayed.
>>>>>>
>>>>>> Unfortunately, I was unable to recover them in detail, but they occurred fairly
>>>>>> early, roughly around the mounting of the root file system. Since the machine is
>>>>>> semi-productive (we all test in production, don't we? ;-) ), I went back to C169
>>>>>> and will now investigate further which change broke the update.
>>>>>>
>>>>>> An earlier version of Core Update 170 (commit 668cf4c0d0c2dbbc607716956daace413837a8da,
>>>>>> I believe, but it was definitely after the randstruct changes) ran fine for days here,
>>>>>> so it must be a pretty recent change. Will keep you updated.
>>>>>>
>>>>>> Thanks, and best regards,
>>>>>> Peter Müller
>>>>> <screenshot_c170_dirty_crash_on_boot_sunrpc_efivarfs.png>
>>>>
* Re: Core Update 170 testing report - "next/06b4164d" crashes on my x86_64 testing machine
2022-08-08 15:47 ` Peter Müller
@ 2022-08-09 6:23 ` Arne Fitzenreiter
2022-08-09 8:27 ` Arne Fitzenreiter
0 siblings, 1 reply; 12+ messages in thread
From: Arne Fitzenreiter @ 2022-08-09 6:23 UTC (permalink / raw)
To: development
[-- Attachment #1: Type: text/plain, Size: 6476 bytes --]
On 2022-08-08 17:47, Peter Müller wrote:
> Hello Arne,
>
> thanks for reporting back.
>
> This means the slab cache patch is the problem.
I'm not sure. I fear it could be RANDSTRUCT: after a kernel version update,
the first build does not use the ccache, but after a small config change the
build could break if some parts of the kernel are taken from the cache and
others are not.
At the moment I am testing a clean build without ccache but with the slab cache
patch enabled. If this works, the RANDSTRUCT change is the cause.
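The suspicion above, and the reason a clean build is a useful test, can be sketched with a toy object cache in Python. This is not ccache: the cache here is keyed by source text only, which is an assumption of the model and not a statement about what ccache really hashes. The file names are just labels, and the seeds are made-up numbers.

```python
import hashlib


def compile_unit(source, seed):
    """Pretend to compile: the 'object' records which seed shaped its struct layouts."""
    return {"source": source, "layout_seed": seed}


def build(sources, seed, cache):
    """Build all units, reusing cached objects keyed by source text only (toy assumption)."""
    objects = {}
    for name, text in sources.items():
        key = hashlib.sha256(text.encode()).hexdigest()
        if key not in cache:
            cache[key] = compile_unit(text, seed)  # cache miss: compiled with the current seed
        objects[name] = cache[key]                 # cache hit: may carry an older seed
    return objects


def seeds_in_use(objects):
    """Return the set of layout seeds baked into the given objects."""
    return {obj["layout_seed"] for obj in objects.values()}


if __name__ == "__main__":
    cache = {}
    v1 = {"kernel/capability.c": "v1", "security/commoncap.c": "v1"}
    v2 = {"kernel/capability.c": "v1", "security/commoncap.c": "v2"}  # one file changed

    print("first build:  ", seeds_in_use(build(v1, seed=111, cache=cache)))
    print("partial build:", seeds_in_use(build(v2, seed=222, cache=cache)))
    print("clean build:  ", seeds_in_use(build(v2, seed=222, cache={})))
```

In this model the partial rebuild ends up with objects from two different seeds, while the build with an empty cache uses a single seed throughout, which is the difference the clean-build test is meant to expose.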
Arne
>
> Unfortunately, my local ccache appears to be completely messed up now,
> so I
> will have to start with a clean cache, hence it will probably take me
> until
> tomorrow to have some testing results ready.
>
> Will keep you updated.
>
> Thanks, and best regards,
> Peter Müller
>
>
>> With this
>> https://nightly.ipfire.org/next/2022-08-06%2007:45:02%20+0000-43df4a03/
>> nightly the kernel 5.15.59 boots on real hardware (x86_64 and aarch64)
>> After
>> commit 06b4164dfe269704976b52421edbbbdf3b345679
>> Author: Peter Müller <peter.mueller(a)ipfire.org>
>> Date: Mon Aug 1 17:39:59 2022 +0000
>>
>> linux: Do not allow slab caches to be merged
>>
>>
>> it doesn't boot anymore. (also tested on x86_64 and aarch64)
>>
>> Arne
>>
>>
>> On 2022-08-08 12:22, Michael Tremer wrote:
>>> Hello,
>>>
>>>> On 8 Aug 2022, at 11:16, Peter Müller <peter.mueller(a)ipfire.org>
>>>> wrote:
>>>>
>>>> Hello Michael, hello Arne,
>>>>
>>>> just a quick reply: I think we are dealing with the combination of
>>>> two issues here,
>>>> as kernel 5.15.59 without slab cache merging disabled won't even
>>>> boot in a VM (the
>>>> screen stays blank indefinitely), and it crashes straight away with
>>>> the slab cache
>>>> merging patch.
>>>>
>>>> Since kernel 5.15.57 is running perfectly fine here with randstruct
>>>> enabled, and has
>>>> been for days, I just reverted both the update to 5.15.59 and the
>>>> slab cache patch.
>>>> For the time being, I would leave randstruct enabled, since it does
>>>> not seem to be a
>>>> root cause for whatever bug(s) we are dealing with at the moment.
>>>
>>> Is that from the first build or a consecutive one?
>>>
>>>> @Arne: Were you able to boot 5.15.59 successfully on hardware? If
>>>> so, did it also
>>>> boot properly in a VirtualBox VM?
>>>>
>>>> Apologies for this coming up so unexpectedly.
>>>
>>> Well, things break. We should however be fast to have at least a
>>> booting kernel in the tree so that we won’t crash any more systems.
>>>
>>> And if that requires reverting both patches until we know for certain
>>> which one is the bad one, I find that the best option.
>>>
>>> -Michael
>>>
>>>>
>>>> Thanks, and best regards,
>>>> Peter Müller
>>>>
>>>>> Hello,
>>>>>
>>>>> You seem to have a very classic NULL pointer dereference.
>>>>>
>>>>> Something is trying to follow a NULL pointer. And that isn’t
>>>>> possible.
>>>>>
>>>>> Now it is interesting to know why that is. The cap_capable function
>>>>> hasn’t been touched in the 5.15 tree in a while. The same goes for
>>>>> ns_capable.
>>>>>
>>>>> I would therefore suspect that this is some issue from the
>>>>> RANDSTRUCT plugin which seems to be incompatible with ccache.
>>>>>
>>>>> If you have built a kernel with a random seed for the first time,
>>>>> that will be put into the cache. If the next build is unmodified,
>>>>> the kernel will come out of the cache and will be exactly the same
>>>>> as the previous build.
>>>>>
>>>>> If you however modify some parts of the kernel (a minor release for
>>>>> example) you will only compile the changed parts BUT with a
>>>>> different seed for the randstruct plugin.
>>>>>
>>>>> And I suspect that this has happened here where your code is now
>>>>> simply reading the wrong memory.
>>>>>
>>>>> I would recommend reverting the RANDSTRUCT patch and that should
>>>>> allow you to have a proper image again.
>>>>>
>>>>> If you want to keep that, the only option would be to disable the
>>>>> ccache for the kernel. The kernel is however one of the largest
>>>>> packages and ccache works really well here. We can discuss
>>>>> this if we have identified RANDSTRUCT to be the culprit.
>>>>>
>>>>> -Michael
>>>>>
>>>>>> On 7 Aug 2022, at 19:08, Peter Müller <peter.mueller(a)ipfire.org>
>>>>>> wrote:
>>>>>>
>>>>>> Hello *,
>>>>>>
>>>>>> enclosed is a screenshot of what booting the installer for Core
>>>>>> Update 170 (dirty)
>>>>>> with kernel 5.15.57 and slab merging disabled looks like. With
>>>>>> kernel 5.15.59, the
>>>>>> VM screen stays blank, so I had to revert this to get some
>>>>>> results.
>>>>>>
>>>>>> Frankly, I don't see why the kernel suddenly does not know
>>>>>> anything about efivarfs
>>>>>> anymore, and what's sunrpc got to do with it. For the latter,
>>>>>> /build/lib/modules/5.15.57-ipfire/kernel/net/sunrpc/auth_gss/rpcsec_gss_krb5.ko.xz
>>>>>> is still there, just as it has been in C169 before.
>>>>>>
>>>>>> Any ideas are appreciated. :-)
>>>>>>
>>>>>> Thanks, and best regards,
>>>>>> Peter Müller
>>>>>>
>>>>>>
>>>>>>> Hello all, especially Arne,
>>>>>>>
>>>>>>> today, I upgraded to "IPFire 2.27 - Core Update 170 Development
>>>>>>> Build: next/06b4164d",
>>>>>>> which primarily comes with Linux 5.15.59 and the slab cache
>>>>>>> merging disabled. On
>>>>>>> my physical testing hardware, the boot process stalled after
>>>>>>> several kernel trace
>>>>>>> message blocks being displayed.
>>>>>>>
>>>>>>> Unfortunately, I was unable to recover them in detail, but they
>>>>>>> occurred fairly
>>>>>>> early, roughly around the mounting of the root file system. Since
>>>>>>> the machine is
>>>>>>> semi-productive (we all test in production, don't we? ;-) ), I
>>>>>>> went back to C169
>>>>>>> and will now investigate further which change broke the update.
>>>>>>>
>>>>>>> An earlier version of Core Update 170 (commit
>>>>>>> 668cf4c0d0c2dbbc607716956daace413837a8da,
>>>>>>> I believe, but it was definitely after the randstruct changes)
>>>>>>> ran fine for days here,
>>>>>>> so it must be a pretty recent change. Will keep you updated.
>>>>>>>
>>>>>>> Thanks, and best regards,
>>>>>>> Peter Müller
>>>>>> <screenshot_c170_dirty_crash_on_boot_sunrpc_efivarfs.png>
>>>>>
* Re: Core Update 170 testing report - "next/06b4164d" crashes on my x86_64 testing machine
2022-08-09 6:23 ` Arne Fitzenreiter
@ 2022-08-09 8:27 ` Arne Fitzenreiter
2022-08-09 9:28 ` Peter Müller
0 siblings, 1 reply; 12+ messages in thread
From: Arne Fitzenreiter @ 2022-08-09 8:27 UTC (permalink / raw)
To: development
[-- Attachment #1: Type: text/plain, Size: 6852 bytes --]
A fresh build with an empty ccache also boots with the slab cache patch applied,
so RANDSTRUCT should be the real problem.
Arne
On 2022-08-09 08:23, Arne Fitzenreiter wrote:
> On 2022-08-08 17:47, Peter Müller wrote:
>> Hello Arne,
>>
>> thanks for reporting back.
>>
>> This means the slab cache patch is the problem.
>
> I'm not sure. I fear it could be RANDSTRUCT: after a kernel version
> update, the first build does not use the ccache, but after a small config
> change the build could break if some parts of the kernel are taken from
> the cache and others are not.
>
> At the moment I am testing a clean build without ccache but with the slab
> cache patch enabled. If this works, the RANDSTRUCT change is the cause.
>
> Arne
>
>>
>> Unfortunately, my local ccache appears to be completely messed up
>> now, so I
>> will have to start with a clean cache, hence it will probably take me
>> until
>> tomorrow to have some testing results ready.
>>
>> Will keep you updated.
>>
>> Thanks, and best regards,
>> Peter Müller
>>
>>
>>> With this
>>> https://nightly.ipfire.org/next/2022-08-06%2007:45:02%20+0000-43df4a03/
>>> nightly the kernel 5.15.59 boots on real hardware (x86_64 and
>>> aarch64)
>>> After
>>> commit 06b4164dfe269704976b52421edbbbdf3b345679
>>> Author: Peter Müller <peter.mueller(a)ipfire.org>
>>> Date: Mon Aug 1 17:39:59 2022 +0000
>>>
>>> linux: Do not allow slab caches to be merged
>>>
>>>
>>> it doesn't boot anymore. (also tested on x86_64 and aarch64)
>>>
>>> Arne
>>>
>>>
>>> On 2022-08-08 12:22, Michael Tremer wrote:
>>>> Hello,
>>>>
>>>>> On 8 Aug 2022, at 11:16, Peter Müller <peter.mueller(a)ipfire.org>
>>>>> wrote:
>>>>>
>>>>> Hello Michael, hello Arne,
>>>>>
>>>>> just a quick reply: I think we are dealing with the combination of
>>>>> two issues here,
>>>>> as kernel 5.15.59 without slab cache merging disabled won't even
>>>>> boot in a VM (the
>>>>> screen stays blank indefinitely), and it crashes straight away with
>>>>> the slab cache
>>>>> merging patch.
>>>>>
>>>>> Since kernel 5.15.57 is running perfectly fine here with randstruct
>>>>> enabled, and has
>>>>> been for days, I just reverted both the update to 5.15.59 and the
>>>>> slab cache patch.
>>>>> For the time being, I would leave randstruct enabled, since it does
>>>>> not seem to be a
>>>>> root cause for whatever bug(s) we are dealing with at the moment.
>>>>
>>>> Is that from the first build or a consecutive one?
>>>>
>>>>> @Arne: Were you able to boot 5.15.59 successfully on hardware? If
>>>>> so, did it also
>>>>> boot properly in a VirtualBox VM?
>>>>>
>>>>> Apologies for this coming up so unexpectedly.
>>>>
>>>> Well, things break. We should however be fast to have at least a
>>>> booting kernel in the tree so that we won’t crash any more systems.
>>>>
>>>> And if that requires reverting both patches until we know for
>>>> certain which one is the bad one, I find that the best option.
>>>>
>>>> -Michael
>>>>
>>>>>
>>>>> Thanks, and best regards,
>>>>> Peter Müller
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> You seem to have a very classic NULL pointer dereference.
>>>>>>
>>>>>> Something is trying to follow a NULL pointer. And that isn’t
>>>>>> possible.
>>>>>>
>>>>>> Now it is interesting to know why that is. The cap_capable
>>>>>> function hasn’t been touched in the 5.15 tree in a while. The same
>>>>>> goes for ns_capable.
>>>>>>
>>>>>> I would therefore suspect that this is some issue from the
>>>>>> RANDSTRUCT plugin which seems to be incompatible with ccache.
>>>>>>
>>>>>> If you have built a kernel with a random seed for the first time,
>>>>>> that will be put into the cache. If the next build is unmodified,
>>>>>> the kernel will come out of the cache and will be exactly the same
>>>>>> as the previous build.
>>>>>>
>>>>>> If you however modify some parts of the kernel (a minor release
>>>>>> for example) you will only compile the changed parts BUT with a
>>>>>> different seed for the randstruct plugin.
>>>>>>
>>>>>> And I suspect that this has happened here where your code is now
>>>>>> simply reading the wrong memory.
>>>>>>
>>>>>> I would recommend reverting the RANDSTRUCT patch and that should
>>>>>> allow you to have a proper image again.
>>>>>>
>>>>>> If you want to keep that, the only option would be to disable the
>>>>>> ccache for the kernel. The kernel is however one of the largest
>>>>>> packages and ccache works really really well here. We can discuss
>>>>>> this once we have identified RANDSTRUCT as the culprit.
>>>>>>
>>>>>> -Michael
>>>>>>
>>>>>>> On 7 Aug 2022, at 19:08, Peter Müller <peter.mueller(a)ipfire.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hello *,
>>>>>>>
>>>>>>> enclosed is a screenshot of what booting the installer for Core
>>>>>>> Update 170 (dirty)
>>>>>>> with kernel 5.15.57 and slab merging disabled looks like. With
>>>>>>> kernel 5.15.59, the
>>>>>>> VM screen stays blank, so I had to revert this to get some
>>>>>>> results.
>>>>>>>
>>>>>>> Frankly, I don't see why the kernel suddenly does not know
>>>>>>> anything about efivarfs
>>>>>>> anymore, and what's sunrpc got to do with it. For the latter,
>>>>>>> /build/lib/modules/5.15.57-ipfire/kernel/net/sunrpc/auth_gss/rpcsec_gss_krb5.ko.xz
>>>>>>> is still there, just as it has been in C169 before.
>>>>>>>
>>>>>>> Any ideas are appreciated. :-)
>>>>>>>
>>>>>>> Thanks, and best regards,
>>>>>>> Peter Müller
>>>>>>>
>>>>>>>
>>>>>>>> Hello all, especially Arne,
>>>>>>>>
>>>>>>>> today, I upgraded to "IPFire 2.27 - Core Update 170 Development
>>>>>>>> Build: next/06b4164d",
>>>>>>>> which primarily comes with Linux 5.15.59 and the slab cache
>>>>>>>> merging disabled. On
>>>>>>>> my physical testing hardware, the boot process stalled after
>>>>>>>> several kernel trace
>>>>>>>> message blocks being displayed.
>>>>>>>>
>>>>>>>> Unfortunately, I was unable to recover them in detail, but they
>>>>>>>> occurred fairly
>>>>>>>> early, roughly around the mounting of the root file system.
>>>>>>>> Since the machine is
>>>>>>>> semi-productive (we all test in production, don't we? ;-) ), I
>>>>>>>> went back to C169
>>>>>>>> and will now investigate further which change broke the update.
>>>>>>>>
>>>>>>>> An earlier version of Core Update 170 (commit
>>>>>>>> 668cf4c0d0c2dbbc607716956daace413837a8da,
>>>>>>>> I believe, but it was definitely after the randstruct changes)
>>>>>>>> ran fine for days here,
>>>>>>>> so it must be a pretty recent change. Will keep you updated.
>>>>>>>>
>>>>>>>> Thanks, and best regards,
>>>>>>>> Peter Müller
>>>>>>> <screenshot_c170_dirty_crash_on_boot_sunrpc_efivarfs.png>
>>>>>>
* Re: Core Update 170 testing report - "next/06b4164d" crashes on my x86_64 testing machine
From: Peter Müller @ 2022-08-09 9:28 UTC
To: development
Hello Arne,
thank you very much for reporting back.
Okay, then I will put the slab cache patch in again and leave randstruct disabled.
Thanks, and best regards,
Peter Müller
> A fresh build with an empty ccache also boots with the slab cache patch applied,
> so RANDSTRUCT should be the real problem.
>
> Arne
>
> On 2022-08-09 08:23, Arne Fitzenreiter wrote:
>> On 2022-08-08 17:47, Peter Müller wrote:
>>> Hello Arne,
>>>
>>> thanks for reporting back.
>>>
>>> This means the slab cache patch is the problem.
>>
>> I'm not sure. I fear it could be RANDSTRUCT: after a version update of the
>> kernel, the first build does not use the ccache, but after a small config
>> change it could break if some parts of the kernel come from the cache and
>> others do not.
>>
>> At the moment I am testing a clean build without ccache but with the slab
>> cache patch enabled. If this works, it is the RANDSTRUCT change.
>>
>> Arne
>>
>>>
>>> Unfortunately, my local C-cache appears to be completely messed up now, so I
>>> will have to start with a clean cache, hence it will probably take me until
>>> tomorrow to have some testing results ready.
>>>
>>> Will keep you updated.
>>>
>>> Thanks, and best regards,
>>> Peter Müller
>>>
>>>
>>>> With this https://nightly.ipfire.org/next/2022-08-06%2007:45:02%20+0000-43df4a03/
>>>> nightly the kernel 5.15.59 boots on real hardware (x86_64 and aarch64)
>>>> After
>>>> commit 06b4164dfe269704976b52421edbbbdf3b345679
>>>> Author: Peter Müller <peter.mueller(a)ipfire.org>
>>>> Date: Mon Aug 1 17:39:59 2022 +0000
>>>>
>>>> linux: Do not allow slab caches to be merged
>>>>
>>>>
>>>> it doesn't boot anymore. (also tested on x86_64 and aarch64)
>>>>
>>>> Arne
>>>>
>>>>
>>>> On 2022-08-08 12:22, Michael Tremer wrote:
>>>>> Hello,
>>>>>
>>>>>> On 8 Aug 2022, at 11:16, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>>>>>>
>>>>>> Hello Michael, hello Arne,
>>>>>>
>>>>>> just a quick reply: I think we are dealing with the combination of two issues here,
>>>>>> as kernel 5.15.59 without slab cache merging disabled won't even boot in a VM (the
>>>>>> screen stays blank indefinitely), and it crashes straight away with the slab cache
>>>>>> merging patch.
>>>>>>
>>>>>> Since kernel 5.15.57 is running perfectly fine here with randstruct enabled, and has
>>>>>> been for days, I just reverted both the update to 5.15.59 and the slab cache patch.
>>>>>> For the time being, I would leave randstruct enabled, since it does not seem to be a
>>>>>> root cause for whatever bug(s) we are dealing with at the moment.
>>>>>
>>>>> Is that from the first build or a consecutive one?
>>>>>
>>>>>> @Arne: Were you able to boot 5.15.59 successfully on hardware? If so, did it also
>>>>>> boot properly in a VirtualBox VM?
>>>>>>
>>>>>> Apologies for this coming up so unexpectedly.
>>>>>
>>>>> Well, things break. We should however be fast to have at least a
>>>>> booting kernel in the tree so that we won’t crash any more systems.
>>>>>
>>>>> And if that requires reverting both patches until we know for certain
>>>>> which one is the bad one, I find that the best option.
>>>>>
>>>>> -Michael
>>>>>
>>>>>>
>>>>>> Thanks, and best regards,
>>>>>> Peter Müller
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> You seem to have a very classic NULL pointer dereference.
>>>>>>>
>>>>>>> Something is trying to follow a NULL pointer. And that isn’t possible.
>>>>>>>
>>>>>>> Now it is interesting to know why that is. The cap_capable function hasn’t been touched in the 5.15 tree in a while. The same goes for ns_capable.
>>>>>>>
>>>>>>> I would therefore suspect that this is some issue from the RANDSTRUCT plugin which seems to be incompatible with ccache.
>>>>>>>
>>>>>>> If you have built a kernel with a random seed for the first time, that will be put into the cache. If the next build is unmodified, the kernel will come out of the cache and will be exactly the same as the previous build.
>>>>>>>
>>>>>>> If you however modify some parts of the kernel (a minor release for example) you will only compile the changed parts BUT with a different seed for the randstruct plugin.
>>>>>>>
>>>>>>> And I suspect that this has happened here where your code is now simply reading the wrong memory.
>>>>>>>
>>>>>>> I would recommend reverting the RANDSTRUCT patch and that should allow you to have a proper image again.
>>>>>>>
>>>>>>> If you want to keep that, the only option would be to disable the ccache for the kernel. The kernel is however one of the largest packages and ccache works really really well here. We can discuss this once we have identified RANDSTRUCT as the culprit.
>>>>>>>
>>>>>>> -Michael
>>>>>>>
>>>>>>>> On 7 Aug 2022, at 19:08, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>>>>>>>>
>>>>>>>> Hello *,
>>>>>>>>
>>>>>>>> enclosed is a screenshot of what booting the installer for Core Update 170 (dirty)
>>>>>>>> with kernel 5.15.57 and slab merging disabled looks like. With kernel 5.15.59, the
>>>>>>>> VM screen stays blank, so I had to revert this to get some results.
>>>>>>>>
>>>>>>>> Frankly, I don't see why the kernel suddenly does not know anything about efivarfs
>>>>>>>> anymore, and what's sunrpc got to do with it. For the latter,
>>>>>>>> /build/lib/modules/5.15.57-ipfire/kernel/net/sunrpc/auth_gss/rpcsec_gss_krb5.ko.xz
>>>>>>>> is still there, just as it has been in C169 before.
>>>>>>>>
>>>>>>>> Any ideas are appreciated. :-)
>>>>>>>>
>>>>>>>> Thanks, and best regards,
>>>>>>>> Peter Müller
>>>>>>>>
>>>>>>>>
>>>>>>>>> Hello all, especially Arne,
>>>>>>>>>
>>>>>>>>> today, I upgraded to "IPFire 2.27 - Core Update 170 Development Build: next/06b4164d",
>>>>>>>>> which primarily comes with Linux 5.15.59 and the slab cache merging disabled. On
>>>>>>>>> my physical testing hardware, the boot process stalled after several kernel trace
>>>>>>>>> message blocks being displayed.
>>>>>>>>>
>>>>>>>>> Unfortunately, I was unable to recover them in detail, but they occurred fairly
>>>>>>>>> early, roughly around the mounting of the root file system. Since the machine is
>>>>>>>>> semi-productive (we all test in production, don't we? ;-) ), I went back to C169
>>>>>>>>> and will now investigate further which change broke the update.
>>>>>>>>>
>>>>>>>>> An earlier version of Core Update 170 (commit 668cf4c0d0c2dbbc607716956daace413837a8da,
>>>>>>>>> I believe, but it was definitely after the randstruct changes) ran fine for days here,
>>>>>>>>> so it must be a pretty recent change. Will keep you updated.
>>>>>>>>>
>>>>>>>>> Thanks, and best regards,
>>>>>>>>> Peter Müller
>>>>>>>> <screenshot_c170_dirty_crash_on_boot_sunrpc_efivarfs.png>
>>>>>>>
* Re: Core Update 170 testing report - "next/06b4164d" crashes on my x86_64 testing machine
From: Michael Tremer @ 2022-08-09 9:31 UTC
To: development
Hello,
Okay. That leaves us with the question of whether we have corrupted the ccache on the nightly builders (or any others).
Since ccache might be unaware of the seed, we might have mixed files in the cache.
If builds still fail after reverting the RANDSTRUCT patch, we might need to wipe the cache.
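To make the suspected failure mode concrete, here is a toy model in Python. It is only a sketch under assumptions: the field names, the file labels, the `Cache` class and the rotation that stands in for the plugin's shuffle are all made up for illustration, and none of it is real ccache or randstruct code. It only shows how a cache that does not include the seed in its key could hand back objects built with an older layout.

```python
import hashlib

# Hypothetical struct fields; the real kernel structs are not modelled here.
FIELDS = ["uid", "gid", "cap_effective", "user_ns"]


def layout(seed):
    """Map each field to a slot. A rotation by the seed stands in for the
    plugin's real shuffle (an assumption that keeps the sketch deterministic)."""
    shift = seed % len(FIELDS)
    order = FIELDS[shift:] + FIELDS[:shift]
    return {name: slot for slot, name in enumerate(order)}


def compile_unit(source, seed):
    """Pretend to compile one file: the object bakes in the layout for this seed."""
    return {"source": source, "layout": layout(seed)}


class Cache:
    """Toy compiler cache. With seed_aware=False the key is the source text only."""

    def __init__(self, seed_aware):
        self.seed_aware = seed_aware
        self.store = {}

    def build(self, source, seed):
        key_material = source + (f"|seed={seed}" if self.seed_aware else "")
        key = hashlib.sha256(key_material.encode()).hexdigest()
        if key not in self.store:  # cache miss: compile with the current seed
            self.store[key] = compile_unit(source, seed)
        return self.store[key]  # cache hit: may carry an older seed's layout


def build_kernel(cache, sources, seed):
    """Build every source file through the cache with the given seed."""
    return [cache.build(source, seed) for source in sources]


def consistent(objects):
    """True if every object agrees on where each field lives."""
    return all(obj["layout"] == objects[0]["layout"] for obj in objects)


if __name__ == "__main__":
    naive = Cache(seed_aware=False)
    first = build_kernel(naive, ["cred.c v1", "capability.c v1"], seed=1)
    # Minor release: one file changes and the build picks a new seed.
    second = build_kernel(naive, ["cred.c v1", "capability.c v2"], seed=2)
    print(consistent(first), consistent(second))

    aware = Cache(seed_aware=True)
    build_kernel(aware, ["cred.c v1", "capability.c v1"], seed=1)
    third = build_kernel(aware, ["cred.c v1", "capability.c v2"], seed=2)
    print(consistent(third))
```

In this model the second build with the seed-unaware cache mixes objects from two seeds, so the parts no longer agree on field offsets. Keying the cache on the seed, or starting from an empty cache, keeps them consistent. Whether the real ccache behaves like the seed-unaware variant here is exactly the open question.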
-Michael
> On 9 Aug 2022, at 10:28, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>
> Hello Arne,
>
> thank you very much for reporting back.
>
> Okay, then I will put the slab cache patch in again and leave randstruct disabled.
>
> Thanks, and best regards,
> Peter Müller
>
>
>> A fresh build with an empty ccache also boots with the slab cache patch applied,
>> so RANDSTRUCT should be the real problem.
>>
>> Arne
>>
>> On 2022-08-09 08:23, Arne Fitzenreiter wrote:
>>> On 2022-08-08 17:47, Peter Müller wrote:
>>>> Hello Arne,
>>>>
>>>> thanks for reporting back.
>>>>
>>>> This means the slab cache patch is the problem.
>>>
>>> I'm not sure. I fear it could be RANDSTRUCT: after a version update of the
>>> kernel, the first build does not use the ccache, but after a small config
>>> change it could break if some parts of the kernel come from the cache and
>>> others do not.
>>>
>>> At the moment I am testing a clean build without ccache but with the slab
>>> cache patch enabled. If this works, it is the RANDSTRUCT change.
>>>
>>> Arne
>>>
>>>>
>>>> Unfortunately, my local C-cache appears to be completely messed up now, so I
>>>> will have to start with a clean cache, hence it will probably take me until
>>>> tomorrow to have some testing results ready.
>>>>
>>>> Will keep you updated.
>>>>
>>>> Thanks, and best regards,
>>>> Peter Müller
>>>>
>>>>
>>>>> With this https://nightly.ipfire.org/next/2022-08-06%2007:45:02%20+0000-43df4a03/
>>>>> nightly the kernel 5.15.59 boots on real hardware (x86_64 and aarch64)
>>>>> After
>>>>> commit 06b4164dfe269704976b52421edbbbdf3b345679
>>>>> Author: Peter Müller <peter.mueller(a)ipfire.org>
>>>>> Date: Mon Aug 1 17:39:59 2022 +0000
>>>>>
>>>>> linux: Do not allow slab caches to be merged
>>>>>
>>>>>
>>>>> it doesn't boot anymore. (also tested on x86_64 and aarch64)
>>>>>
>>>>> Arne
>>>>>
>>>>>
>>>>> On 2022-08-08 12:22, Michael Tremer wrote:
>>>>>> Hello,
>>>>>>
>>>>>>> On 8 Aug 2022, at 11:16, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>>>>>>>
>>>>>>> Hello Michael, hello Arne,
>>>>>>>
>>>>>>> just a quick reply: I think we are dealing with the combination of two issues here,
>>>>>>> as kernel 5.15.59 without slab cache merging disabled won't even boot in a VM (the
>>>>>>> screen stays blank indefinitely), and it crashes straight away with the slab cache
>>>>>>> merging patch.
>>>>>>>
>>>>>>> Since kernel 5.15.57 is running perfectly fine here with randstruct enabled, and has
>>>>>>> been for days, I just reverted both the update to 5.15.59 and the slab cache patch.
>>>>>>> For the time being, I would leave randstruct enabled, since it does not seem to be a
>>>>>>> root cause for whatever bug(s) we are dealing with at the moment.
>>>>>>
>>>>>> Is that from the first build or a consecutive one?
>>>>>>
>>>>>>> @Arne: Were you able to boot 5.15.59 successfully on hardware? If so, did it also
>>>>>>> boot properly in a VirtualBox VM?
>>>>>>>
>>>>>>> Apologies for this coming up so unexpectedly.
>>>>>>
>>>>>> Well, things break. We should however be fast to have at least a
>>>>>> booting kernel in the tree so that we won’t crash any more systems.
>>>>>>
>>>>>> And if that requires reverting both patches until we know for certain
>>>>>> which one is the bad one, I find that the best option.
>>>>>>
>>>>>> -Michael
>>>>>>
>>>>>>>
>>>>>>> Thanks, and best regards,
>>>>>>> Peter Müller
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> You seem to have a very classic NULL pointer dereference.
>>>>>>>>
>>>>>>>> Something is trying to follow a NULL pointer. And that isn’t possible.
>>>>>>>>
>>>>>>>> Now it is interesting to know why that is. The cap_capable function hasn’t been touched in the 5.15 tree in a while. The same goes for ns_capable.
>>>>>>>>
>>>>>>>> I would therefore suspect that this is some issue from the RANDSTRUCT plugin which seems to be incompatible with ccache.
>>>>>>>>
>>>>>>>> If you have built a kernel with a random seed for the first time, that will be put into the cache. If the next build is unmodified, the kernel will come out of the cache and will be exactly the same as the previous build.
>>>>>>>>
>>>>>>>> If you however modify some parts of the kernel (a minor release for example) you will only compile the changed parts BUT with a different seed for the randstruct plugin.
>>>>>>>>
>>>>>>>> And I suspect that this has happened here where your code is now simply reading the wrong memory.
>>>>>>>>
>>>>>>>> I would recommend reverting the RANDSTRUCT patch and that should allow you to have a proper image again.
>>>>>>>>
>>>>>>>> If you want to keep that, the only option would be to disable the ccache for the kernel. The kernel is however one of the largest packages and ccache works really really well here. We can discuss this once we have identified RANDSTRUCT as the culprit.
>>>>>>>>
>>>>>>>> -Michael
>>>>>>>>
>>>>>>>>> On 7 Aug 2022, at 19:08, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>>>>>>>>>
>>>>>>>>> Hello *,
>>>>>>>>>
>>>>>>>>> enclosed is a screenshot of what booting the installer for Core Update 170 (dirty)
>>>>>>>>> with kernel 5.15.57 and slab merging disabled looks like. With kernel 5.15.59, the
>>>>>>>>> VM screen stays blank, so I had to revert this to get some results.
>>>>>>>>>
>>>>>>>>> Frankly, I don't see why the kernel suddenly does not know anything about efivarfs
>>>>>>>>> anymore, and what's sunrpc got to do with it. For the latter,
>>>>>>>>> /build/lib/modules/5.15.57-ipfire/kernel/net/sunrpc/auth_gss/rpcsec_gss_krb5.ko.xz
>>>>>>>>> is still there, just as it has been in C169 before.
>>>>>>>>>
>>>>>>>>> Any ideas are appreciated. :-)
>>>>>>>>>
>>>>>>>>> Thanks, and best regards,
>>>>>>>>> Peter Müller
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Hello all, especially Arne,
>>>>>>>>>>
>>>>>>>>>> today, I upgraded to "IPFire 2.27 - Core Update 170 Development Build: next/06b4164d",
>>>>>>>>>> which primarily comes with Linux 5.15.59 and the slab cache merging disabled. On
>>>>>>>>>> my physical testing hardware, the boot process stalled after several kernel trace
>>>>>>>>>> message blocks being displayed.
>>>>>>>>>>
>>>>>>>>>> Unfortunately, I was unable to recover them in detail, but they occurred fairly
>>>>>>>>>> early, roughly around the mounting of the root file system. Since the machine is
>>>>>>>>>> semi-productive (we all test in production, don't we? ;-) ), I went back to C169
>>>>>>>>>> and will now investigate further which change broke the update.
>>>>>>>>>>
>>>>>>>>>> An earlier version of Core Update 170 (commit 668cf4c0d0c2dbbc607716956daace413837a8da,
>>>>>>>>>> I believe, but it was definitely after the randstruct changes) ran fine for days here,
>>>>>>>>>> so it must be a pretty recent change. Will keep you updated.
>>>>>>>>>>
>>>>>>>>>> Thanks, and best regards,
>>>>>>>>>> Peter Müller
>>>>>>>>> <screenshot_c170_dirty_crash_on_boot_sunrpc_efivarfs.png>
>>>>>>>>
* Re: Core Update 170 testing report - "next/06b4164d" crashes on my x86_64 testing machine
From: Peter Müller @ 2022-08-09 10:26 UTC
To: development
Hello Michael,
agreed. I will keep an eye on the mails emitted from the nightly builders and
delete the ccache, if necessary.
If so, sending a short notice to this mailing list to inform fellow developers
about this issue should be sufficient AFAIC, since we are only dealing with "next".
Thanks, and best regards,
Peter Müller
> Hello,
>
> Okay. That leaves us with the question of whether we have corrupted the ccache on the nightly builders (or any others).
>
> Since ccache might be unaware of the seed, we might have mixed files in the cache.
>
> If builds still fail after reverting the RANDSTRUCT patch, we might need to wipe the cache.
>
> -Michael
>
>> On 9 Aug 2022, at 10:28, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>>
>> Hello Arne,
>>
>> thank you very much for reporting back.
>>
>> Okay, then I will put the slab cache patch in again and leave randstruct disabled.
>>
>> Thanks, and best regards,
>> Peter Müller
>>
>>
>>> A fresh build with an empty ccache also boots with the slab cache patch applied,
>>> so RANDSTRUCT should be the real problem.
>>>
>>> Arne
>>>
>>> On 2022-08-09 08:23, Arne Fitzenreiter wrote:
>>>> On 2022-08-08 17:47, Peter Müller wrote:
>>>>> Hello Arne,
>>>>>
>>>>> thanks for reporting back.
>>>>>
>>>>> This means the slab cache patch is the problem.
>>>>
>>>> I'm not sure. I fear it could be RANDSTRUCT: after a version update of the
>>>> kernel, the first build does not use the ccache, but after a small config
>>>> change it could break if some parts of the kernel come from the cache and
>>>> others do not.
>>>>
>>>> At the moment I am testing a clean build without ccache but with the slab
>>>> cache patch enabled. If this works, it is the RANDSTRUCT change.
>>>>
>>>> Arne
>>>>
>>>>>
>>>>> Unfortunately, my local C-cache appears to be completely messed up now, so I
>>>>> will have to start with a clean cache, hence it will probably take me until
>>>>> tomorrow to have some testing results ready.
>>>>>
>>>>> Will keep you updated.
>>>>>
>>>>> Thanks, and best regards,
>>>>> Peter Müller
>>>>>
>>>>>
>>>>>> With this https://nightly.ipfire.org/next/2022-08-06%2007:45:02%20+0000-43df4a03/
>>>>>> nightly the kernel 5.15.59 boots on real hardware (x86_64 and aarch64)
>>>>>> After
>>>>>> commit 06b4164dfe269704976b52421edbbbdf3b345679
>>>>>> Author: Peter Müller <peter.mueller(a)ipfire.org>
>>>>>> Date: Mon Aug 1 17:39:59 2022 +0000
>>>>>>
>>>>>> linux: Do not allow slab caches to be merged
>>>>>>
>>>>>>
>>>>>> it doesn't boot anymore. (also tested on x86_64 and aarch64)
>>>>>>
>>>>>> Arne
>>>>>>
>>>>>>
>>>>>> On 2022-08-08 12:22, Michael Tremer wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>>> On 8 Aug 2022, at 11:16, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>>>>>>>>
>>>>>>>> Hello Michael, hello Arne,
>>>>>>>>
>>>>>>>> just a quick reply: I think we are dealing with the combination of two issues here,
>>>>>>>> as kernel 5.15.59 without slab cache merging disabled won't even boot in a VM (the
>>>>>>>> screen stays blank indefinitely), and it crashes straight away with the slab cache
>>>>>>>> merging patch.
>>>>>>>>
>>>>>>>> Since kernel 5.15.57 is running perfectly fine here with randstruct enabled, and has
>>>>>>>> been for days, I just reverted both the update to 5.15.59 and the slab cache patch.
>>>>>>>> For the time being, I would leave randstruct enabled, since it does not seem to be a
>>>>>>>> root cause for whatever bug(s) we are dealing with at the moment.
>>>>>>>
>>>>>>> Is that from the first build or a consecutive one?
>>>>>>>
>>>>>>>> @Arne: Were you able to boot 5.15.59 successfully on hardware? If so, did it also
>>>>>>>> boot properly in a VirtualBox VM?
>>>>>>>>
>>>>>>>> Apologies for this coming up so unexpectedly.
>>>>>>>
>>>>>>> Well, things break. We should however be fast to have at least a
>>>>>>> booting kernel in the tree so that we won’t crash any more systems.
>>>>>>>
>>>>>>> And if that requires reverting both patches until we know for certain
>>>>>>> which one is the bad one, I find that the best option.
>>>>>>>
>>>>>>> -Michael
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks, and best regards,
>>>>>>>> Peter Müller
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> You seem to have a very classic NULL pointer dereference.
>>>>>>>>>
>>>>>>>>> Something is trying to follow a NULL pointer. And that isn’t possible.
>>>>>>>>>
>>>>>>>>> Now it is interesting to know why that is. The cap_capable function hasn’t been touched in the 5.15 tree in a while. The same goes for ns_capable.
>>>>>>>>>
>>>>>>>>> I would therefore suspect that this is some issue from the RANDSTRUCT plugin which seems to be incompatible with ccache.
>>>>>>>>>
>>>>>>>>> If you have built a kernel with a random seed for the first time, that will be put into the cache. If the next build is unmodified, the kernel will come out of the cache and will be exactly the same as the previous build.
>>>>>>>>>
>>>>>>>>> If you however modify some parts of the kernel (a minor release for example) you will only compile the changed parts BUT with a different seed for the randstruct plugin.
>>>>>>>>>
>>>>>>>>> And I suspect that this has happened here where your code is now simply reading the wrong memory.
>>>>>>>>>
>>>>>>>>> I would recommend reverting the RANDSTRUCT patch and that should allow you to have a proper image again.
>>>>>>>>>
>>>>>>>>> If you want to keep that, the only option would be to disable the ccache for the kernel. The kernel is however one of the largest packages and ccache works really really well here. We can discuss this once we have identified RANDSTRUCT as the culprit.
>>>>>>>>>
>>>>>>>>> -Michael
>>>>>>>>>
>>>>>>>>>> On 7 Aug 2022, at 19:08, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>>>>>>>>>>
>>>>>>>>>> Hello *,
>>>>>>>>>>
>>>>>>>>>> enclosed is a screenshot of what booting the installer for Core Update 170 (dirty)
>>>>>>>>>> with kernel 5.15.57 and slab merging disabled looks like. With kernel 5.15.59, the
>>>>>>>>>> VM screen stays blank, so I had to revert this to get some results.
>>>>>>>>>>
>>>>>>>>>> Frankly, I don't see why the kernel suddenly does not know anything about efivarfs
>>>>>>>>>> anymore, and what's sunrpc got to do with it. For the latter,
>>>>>>>>>> /build/lib/modules/5.15.57-ipfire/kernel/net/sunrpc/auth_gss/rpcsec_gss_krb5.ko.xz
>>>>>>>>>> is still there, just as it has been in C169 before.
>>>>>>>>>>
>>>>>>>>>> Any ideas are appreciated. :-)
>>>>>>>>>>
>>>>>>>>>> Thanks, and best regards,
>>>>>>>>>> Peter Müller
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Hello all, especially Arne,
>>>>>>>>>>>
>>>>>>>>>>> today, I upgraded to "IPFire 2.27 - Core Update 170 Development Build: next/06b4164d",
>>>>>>>>>>> which primarily comes with Linux 5.15.59 and the slab cache merging disabled. On
>>>>>>>>>>> my physical testing hardware, the boot process stalled after several kernel trace
>>>>>>>>>>> message blocks being displayed.
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately, I was unable to recover them in detail, but they occurred fairly
>>>>>>>>>>> early, roughly around the mounting of the root file system. Since the machine is
>>>>>>>>>>> semi-productive (we all test in production, don't we? ;-) ), I went back to C169
>>>>>>>>>>> and will now investigate further which change broke the update.
>>>>>>>>>>>
>>>>>>>>>>> An earlier version of Core Update 170 (commit 668cf4c0d0c2dbbc607716956daace413837a8da,
>>>>>>>>>>> I believe, but it was definitely after the randstruct changes) ran fine for days here,
>>>>>>>>>>> so it must be a pretty recent change. Will keep you updated.
>>>>>>>>>>>
>>>>>>>>>>> Thanks, and best regards,
>>>>>>>>>>> Peter Müller
>>>>>>>>>> <screenshot_c170_dirty_crash_on_boot_sunrpc_efivarfs.png>
>>>>>>>>>
>
* Re: Core Update 170 testing report - "next/06b4164d" crashes on my x86_64 testing machine
From: Michael Tremer @ 2022-08-09 10:37 UTC
To: development
Hello,
> On 9 Aug 2022, at 11:26, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>
> Hello Michael,
>
> agreed. I will keep an eye on the mails emitted from the nightly builders and
> delete the ccache, if necessary.
>
> If so, sending a short notice to this mailing list to inform fellow developers
> about this issue should be sufficient AFAIC, since we are only dealing with "next".
Yes.
>
> Thanks, and best regards,
> Peter Müller
>
>
>> Hello,
>>
>> Okay. That leaves us with the question of whether we have corrupted the ccache on the nightly builders (or any others).
>>
>> Since ccache might be unaware of the seed, we might have mixed files in the cache.
>>
>> If builds still fail after reverting the RANDSTRUCT patch, we might need to wipe the cache.
>>
>> -Michael
>>
>>> On 9 Aug 2022, at 10:28, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>>>
>>> Hello Arne,
>>>
>>> thank you very much for reporting back.
>>>
>>> Okay, then I will put the slab cache patch in again and leave randstruct disabled.
>>>
>>> Thanks, and best regards,
>>> Peter Müller
>>>
>>>
>>>> A fresh build with an empty ccache also boots with the slab cache patch applied,
>>>> so RANDSTRUCT should be the real problem.
>>>>
>>>> Arne
>>>>
>>>> On 2022-08-09 08:23, Arne Fitzenreiter wrote:
>>>>> On 2022-08-08 17:47, Peter Müller wrote:
>>>>>> Hello Arne,
>>>>>>
>>>>>> thanks for reporting back.
>>>>>>
>>>>>> This means the slab cache patch is the problem.
>>>>>
>>>>> I'm not sure. I fear it could be RANDSTRUCT: after a version update of the
>>>>> kernel, the first build does not use the ccache, but after a small config
>>>>> change it could break if some parts of the kernel come from the cache and
>>>>> others do not.
>>>>>
>>>>> At the moment I am testing a clean build without ccache but with the slab
>>>>> cache patch enabled. If this works, it is the RANDSTRUCT change.
>>>>>
>>>>> Arne
>>>>>
>>>>>>
>>>>>> Unfortunately, my local C-cache appears to be completely messed up now, so I
>>>>>> will have to start with a clean cache, hence it will probably take me until
>>>>>> tomorrow to have some testing results ready.
>>>>>>
>>>>>> Will keep you updated.
>>>>>>
>>>>>> Thanks, and best regards,
>>>>>> Peter Müller
>>>>>>
>>>>>>
>>>>>>> With this https://nightly.ipfire.org/next/2022-08-06%2007:45:02%20+0000-43df4a03/
>>>>>>> nightly the kernel 5.15.59 boots on real hardware (x86_64 and aarch64)
>>>>>>> After
>>>>>>> commit 06b4164dfe269704976b52421edbbbdf3b345679
>>>>>>> Author: Peter Müller <peter.mueller(a)ipfire.org>
>>>>>>> Date: Mon Aug 1 17:39:59 2022 +0000
>>>>>>>
>>>>>>> linux: Do not allow slab caches to be merged
>>>>>>>
>>>>>>>
>>>>>>> it doesn't boot anymore. (also tested on x86_64 and aarch64)
>>>>>>>
>>>>>>> Arne
>>>>>>>
>>>>>>>
>>>>>>> On 2022-08-08 12:22, Michael Tremer wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>>> On 8 Aug 2022, at 11:16, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>>>>>>>>>
>>>>>>>>> Hello Michael, hello Arne,
>>>>>>>>>
>>>>>>>>> just a quick reply: I think we are dealing with the combination of two issues here,
>>>>>>>>> as kernel 5.15.59 without slab cache merging disabled won't even boot in a VM (the
>>>>>>>>> screen stays blank indefinitely), and it crashes straight away with the slab cache
>>>>>>>>> merging patch.
>>>>>>>>>
>>>>>>>>> Since kernel 5.15.57 is running perfectly fine here with randstruct enabled, and has
>>>>>>>>> been for days, I just reverted both the update to 5.15.59 and the slab cache patch.
>>>>>>>>> For the time being, I would leave randstruct enabled, since it does not seem to be a
>>>>>>>>> root cause for whatever bug(s) we are dealing with at the moment.
>>>>>>>>
>>>>>>>> Is that from the first build or a consecutive one?
>>>>>>>>
>>>>>>>>> @Arne: Were you able to boot 5.15.59 successfully on hardware? If so, did it also
>>>>>>>>> boot properly in a VirtualBox VM?
>>>>>>>>>
>>>>>>>>> Apologies for this coming up so unexpectedly.
>>>>>>>>
>>>>>>>> Well, things break. We should however be fast to have at least a
>>>>>>>> booting kernel in the tree so that we won’t crash any more systems.
>>>>>>>>
>>>>>>>> And if that requires reverting both patches until we know for certain
>>>>>>>> which one is the bad one, I find that the best option.
>>>>>>>>
>>>>>>>> -Michael
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks, and best regards,
>>>>>>>>> Peter Müller
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> You seem to have a very classic NULL pointer dereference.
>>>>>>>>>>
>>>>>>>>>> Something is trying to follow a NULL pointer. And that isn’t possible.
>>>>>>>>>>
>>>>>>>>>> Now it is interesting to know why that is. The cap_capable function hasn’t been touched in the 5.15 tree in a while. The same goes for ns_capable.
>>>>>>>>>>
>>>>>>>>>> I would therefore suspect that this is some issue from the RANDSTRUCT plugin which seems to be incompatible with ccache.
>>>>>>>>>>
>>>>>>>>>> If you have built a kernel with a random seed for the first time, that will be put into the cache. If the next build is unmodified, the kernel will come out of the cache and will be exactly the same as the previous build.
>>>>>>>>>>
>>>>>>>>>> If you however modify some parts of the kernel (a minor release for example) you will only compile the changed parts BUT with a different seed for the randstruct plugin.
>>>>>>>>>>
>>>>>>>>>> And I suspect that this has happened here where your code is now simply reading the wrong memory.
>>>>>>>>>>
>>>>>>>>>> I would recommend reverting the RANDSTRUCT patch and that should allow you to have a proper image again.
>>>>>>>>>>
>>>>>>>>>> If you want to keep that, the only option would be to disable the ccache for the kernel. The kernel is however one of the largest packages and ccache works really really well here. We can discuss this if we have identified RANDSTRUCT to be the culprit.
>>>>>>>>>>
>>>>>>>>>> -Michael
>>>>>>>>>>
>>>>>>>>>>> On 7 Aug 2022, at 19:08, Peter Müller <peter.mueller(a)ipfire.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hello *,
>>>>>>>>>>>
>>>>>>>>>>> enclosed is a screenshot of what booting the installer for Core Update 170 (dirty)
>>>>>>>>>>> with kernel 5.15.57 and slab merging disabled looks like. With kernel 5.15.59, the
>>>>>>>>>>> VM screen stays blank, so I had to revert this to get some results.
>>>>>>>>>>>
>>>>>>>>>>> Frankly, I don't see why the kernel suddenly does not know anything about efivarfs
>>>>>>>>>>> anymore, and what's sunrpc got to do with it. For the latter,
>>>>>>>>>>> /build/lib/modules/5.15.57-ipfire/kernel/net/sunrpc/auth_gss/rpcsec_gss_krb5.ko.xz
>>>>>>>>>>> is still there, just as it has been in C169 before.
>>>>>>>>>>>
>>>>>>>>>>> Any ideas are appreciated. :-)
>>>>>>>>>>>
>>>>>>>>>>> Thanks, and best regards,
>>>>>>>>>>> Peter Müller
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Hello all, especially Arne,
>>>>>>>>>>>>
>>>>>>>>>>>> today, I upgraded to "IPFire 2.27 - Core Update 170 Development Build: next/06b4164d",
>>>>>>>>>>>> which primarily comes with Linux 5.15.59 and the slab cache merging disabled. On
>>>>>>>>>>>> my physical testing hardware, the boot process stalled after several blocks of
>>>>>>>>>>>> kernel trace messages were displayed.
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately, I was unable to recover them in detail, but they occurred fairly
>>>>>>>>>>>> early, roughly around the mounting of the root file system. Since the machine is
>>>>>>>>>>>> semi-productive (we all test in production, don't we? ;-) ), I went back to C169
>>>>>>>>>>>> and will now investigate further which change broke the update.
>>>>>>>>>>>>
>>>>>>>>>>>> An earlier version of Core Update 170 (commit 668cf4c0d0c2dbbc607716956daace413837a8da,
>>>>>>>>>>>> I believe, but it was definitely after the randstruct changes) ran fine for days here,
>>>>>>>>>>>> so it must be a pretty recent change. Will keep you updated.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks, and best regards,
>>>>>>>>>>>> Peter Müller
>>>>>>>>>>> <screenshot_c170_dirty_crash_on_boot_sunrpc_efivarfs.png>
>>>>>>>>>>
>>
* Core Update 170 testing report - "next/06b4164d" crashes on my x86_64 testing machine
@ 2022-08-07 12:14 Peter Müller
0 siblings, 0 replies; 12+ messages in thread
From: Peter Müller @ 2022-08-07 12:14 UTC (permalink / raw)
To: development
Hello all, especially Arne,
today, I upgraded to "IPFire 2.27 - Core Update 170 Development Build: next/06b4164d",
which primarily comes with Linux 5.15.59 and the slab cache merging disabled. On
my physical testing hardware, the boot process stalled after several blocks of
kernel trace messages were displayed.
Unfortunately, I was unable to recover them in detail, but they occurred fairly
early, roughly around the mounting of the root file system. Since the machine is
semi-productive (we all test in production, don't we? ;-) ), I went back to C169
and will now investigate further which change broke the update.
An earlier version of Core Update 170 (commit 668cf4c0d0c2dbbc607716956daace413837a8da,
I believe, but it was definitely after the randstruct changes) ran fine for days here,
so it must be a pretty recent change. Will keep you updated.
Thanks, and best regards,
Peter Müller
Thread overview: 12+ messages
[not found] <d4d5fe5f-08c5-df44-4ba6-0a77f16bf890@ipfire.org>
2022-08-08 9:50 ` Core Update 170 testing report - "next/06b4164d" crashes on my x86_64 testing machine Michael Tremer
2022-08-08 10:16 ` Peter Müller
2022-08-08 10:22 ` Michael Tremer
2022-08-08 14:15 ` Arne Fitzenreiter
2022-08-08 15:47 ` Peter Müller
2022-08-09 6:23 ` Arne Fitzenreiter
2022-08-09 8:27 ` Arne Fitzenreiter
2022-08-09 9:28 ` Peter Müller
2022-08-09 9:31 ` Michael Tremer
2022-08-09 10:26 ` Peter Müller
2022-08-09 10:37 ` Michael Tremer
2022-08-07 12:14 Peter Müller