From mboxrd@z Thu Jan  1 00:00:00 1970
From: Michael Tremer <michael.tremer@ipfire.org>
To: development@lists.ipfire.org
Subject: Re: Problem during building of samba on arm builder
Date: Wed, 10 Jul 2024 10:57:16 +0100
Message-ID: <3134CD7D-25D6-48FC-B72E-AEC881EF9357@ipfire.org>
In-Reply-To: <0D208105-6697-41E7-88FF-2DAAD6483158@ipfire.org>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============4569108461891801027=="
List-Id: <development.lists.ipfire.org>

--===============4569108461891801027==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Hello again,

I managed to (finally) build the toolchain with the updated system. So hopefu=
lly there should not be any more outstanding problems that I know of so far.

Best,
-Michael

> On 9 Jul 2024, at 22:29, Michael Tremer <michael.tremer(a)ipfire.org> wrote:
>=20
> Hello Adolf,
>=20
> Thank you for testing this.
>=20
> There have indeed been plenty of problems there=E2=80=A6 I spent a lot of t=
ime on this today and hopefully fixed most of them.
>=20
> I cannot build the toolchain on my machine and I am not sure why yet, but a=
 build with the packaged toolchain runs through.
>=20
> I have also spent some time on getting rid of the strip stage because it =
annoyed me how long it takes. Creating the disk images as well as packages =
should be significantly faster now, too. I hope I didn=E2=80=99t introduce =
too many new bugs.
>=20
> Please let me know if you have more success now.
>=20
> Best,
> -Michael
>=20
>> On 8 Jul 2024, at 20:34, Adolf Belka <adolf.belka(a)ipfire.org> wrote:
>>=20
>> Hi Michael,
>>=20
>> On 08/07/2024 21:15, Adolf Belka wrote:
>>> Hi Michael,
>>>=20
>>> On 08/07/2024 18:11, Michael Tremer wrote:
>>>> Hello,
>>>>=20
>>>> I have been spending a lot of time on this problem, because it has been =
bothering me for a long time. I also saw an opportunity to make more changes =
to the build system.
>>>>=20
>>>> Currently this is all a little bit WIP, but I hope that we can merge thi=
s into next as soon as the current update has been moved to master.
>>>>=20
>>>> I am referring to this branch which is currently based on next: https://=
git.ipfire.org/?p=3Dpeople/ms/ipfire-2.x.git;a=3Dshortlog;h=3Drefs/heads/unsh=
are
>>>>=20
>>>> It makes use of the unshare command which creates new namespaces in Linu=
x. That way, we can isolate the build system better from the host system and =
in case something goes wrong, there is less damage. We can also enforce some =
more rules=E2=80=A6
>>>>=20
>>>> So, what has changed?
>>>>=20
>>>> * The make.sh script might re-execute itself into a new mount namespace =
when it is suitable. This happens for =E2=80=9Cmake.sh build=E2=80=9D and =E2=
=80=9Cmake.sh shell=E2=80=9D, but it does not happen for =E2=80=9Cmake.sh dow=
nloadsrc=E2=80=9D for example.
>>>>=20
>>>> https://git.ipfire.org/?p=3Dpeople/ms/ipfire-2.x.git;a=3Dblob;f=3Dmake.s=
h;h=3Db952627782a0d5ef4ac75f17315b689fcb3b4fe0;hb=3Drefs/heads/unshare#l2129
>>>> https://git.ipfire.org/?p=3Dpeople/ms/ipfire-2.x.git;a=3Dblob;f=3Dmake.s=
h;h=3Db952627782a0d5ef4ac75f17315b689fcb3b4fe0;hb=3Drefs/heads/unshare#l2251
>>>>=20
>>>> * The new mount namespace means that we will no longer see any bind-moun=
ts in the host system and we no longer need to umount anything ourselves whic=
h is where we occasionally wiped the entire hard drive of the host system. Wh=
en the last process exits, the namespace is being cleaned up and everything i=
s being umounted.
>>>>=20
>>>> * The function that prepares the build environment has been almost entir=
ely rewritten:
>>>>=20
>>>> https://git.ipfire.org/?p=3Dpeople/ms/ipfire-2.x.git;a=3Dblob;f=3Dmake.s=
h;h=3Db952627782a0d5ef4ac75f17315b689fcb3b4fe0;hb=3Drefs/heads/unshare#l426
>>>>=20
>>>> It used to mount parts of the host system into the build environment whi=
ch are needed to run anything. Those were /dev, /proc, /sys, etc=E2=80=A6
>>>>=20
>>>> Instead, it now creates a new /dev mount point and creates a minimal amo=
unt of device nodes and symlinks. That way, we detach from the host system an=
d no longer allow the build system access to the host=E2=80=99s filesystem an=
d block devices. We also bind-mount the sources in read-only mode now, so tha=
t the build system cannot change anything in the source tree. On top of that,=
 cache is read-only, too. ccache and the log directory are the only places th=
at are writable.
>>>>=20
>>>> We mount a separate /tmp directory.
>>>>=20
>>>> * When we then build a package, we create more namespaces for each packa=
ge. These isolate each build process from each other.
>>>>=20
>>>> Mostly, this is to detach from the host system. A new UTS namespace =
allows us to change the hostname in the build system without affecting the =
host and so on. We do the same thing with a new time namespace.
>>>>=20
>>>> We do however create a new PID namespace which means that the build =
system will no longer see any processes running on the host system. That =
requires mounting a new instance of /proc with each package. This also has =
the effect that if the shell that we launched terminates (because the =
build is done) any background processes will be killed immediately.
>>>>=20
>>>> Last, we clone the mount namespace that we have created before so that n=
o build command can modify what we set up earlier.
>>>>=20
>>>> * Since everything is now so decoupled, we gain a couple of new (maybe m=
inor?) features:
>>>>=20
>>>>   It is now possible to run =E2=80=9Cmake.sh shell=E2=80=9D while a buil=
d is running. That does not happen a lot, but we can do this now :)
>>>>=20
>>>>   If the build crashes or the host system is being shut down while a bui=
ld is running, there is nothing to clean up afterwards.
>>>>=20
>>>> * I have garnished this all with a lot of code cleanup and I suppose I m=
ight have introduced some new bugs here or there :)
>>>>=20
>>>> * This is probably mostly around a new implementation of the timer that =
updates the build time. It has been annoying me a lot that it takes a long ti=
me to walk through all packages that have been built before to finally get to=
 a package that we want to rebuild. Mostly this was all held up by a call of =
=E2=80=9Csleep 0.1=E2=80=9D
>>>>=20
>>>> Since bash does not really do any concurrency, I had to be creative and =
replaced the busy-loop with a background process that is launched whenever it=
 is needed and which will =E2=80=9Cping=E2=80=9D the main make.sh script once=
 a second. That way, we can just run as usual, but regularly get interrupted =
to update the runtime.
>>>>=20
>>>> https://git.ipfire.org/?p=3Dpeople/ms/ipfire-2.x.git;a=3Dblob;f=3Dmake.s=
h;h=3Db952627782a0d5ef4ac75f17315b689fcb3b4fe0;hb=3Drefs/heads/unshare#l361
>>>> https://git.ipfire.org/?p=3Dpeople/ms/ipfire-2.x.git;a=3Dblob;f=3Dmake.s=
h;h=3Db952627782a0d5ef4ac75f17315b689fcb3b4fe0;hb=3Drefs/heads/unshare#l834
>>>>=20
>>>> We now only fork one extra sub shell and we have to handle the timer eve=
nts which is a lot cheaper as well as more straight-forward to code.
>>>>=20
>>>> * As there is no difference between the different stages any more (those=
 stages that we inherited from LFS), I have merged them all into one.
>>>>=20
>>>> * Last but not least, I have created the option to build for multiple =
architectures on the same system. Since we can now mount the entire source =
tree into (many independent) build environments, we might as well=E2=80=A6 =
As discussed on the last call, this might not be the best option for ARM, =
but RISC-V builds at a decent speed even when emulated.
>>>>=20
>>>> The only thing that I needed to do for this is to suffix the build and l=
og directories which are now called build_${ARCH}, i.e. build_aarch64, build_=
x86_64, and so on. The packages/ directory is not changed yet, but that will =
have to happen as well. Most likely I want to merge this with the generated i=
mages, but I am not sure what to call this, yet. Happy to hear suggestions. r=
esult_x86_64? Just images_x86_64?
>>>>=20
>>>> ---
>>>>=20
>>>> I have run a build and this seems to be working just fine on my =
Debian machine. I am writing to all of you to first of all let you know =
what I am up to; and secondly to ask you to give this a go on your =
systems. I think it should run just fine, as all the tools that I =
require should be available everywhere. However, there might be some =
older kernels that might not support all of this yet, or other =
problems that I cannot think of yet. Please give me some feedback and =
send me all the bugs :)
>>> I gave this a go but it didn't work.
>>>=20
>>> Not sure if I should have run the ./make.sh clean command on the old vers=
ion before I pulled the unshare branch into my clone of your repo.
>>>=20
>>> Should I have started with a complete new clone of your repo? I might try=
 that anyway just to see.
>>>=20
>> I created a completely new clone of your ipfire-2.x repo and then checked =
out the unshare branch to a branch called unshare in my local repo clone.
>>=20
>> gettoolchain gave the same issue, except that this time the toolchain dire=
ctory ended up completely empty.
>>=20
>> downloadsrc had the same result.
>>=20
>> clean had nothing to clean up as it was a fresh clone.
>>=20
>> build then tried to build the toolchain and came up with this error, diffe=
rent from before.
>>=20
>> ./make.sh build
>> chroot: failed to run command =E2=80=98env=E2=80=99: No such file or direc=
tory
>> Full toolchain compilation
>> stage1 [ FAIL ]
>>=20
>>    Jul  8 19:26:39: Building stage1 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D Installing stage1 ...
>>    mkdir -pv /tools_x86_64/lib
>>    mkdir: cannot create directory '/tools_x86_64': File exists
>>    make: *** [stage1:50: /home/ahb/sandbox/ms/ipfire-2.x/log/stage1] Error=
 1
>>=20
>> ERROR: Building stage1 [ FAIL ]
>>    Check /home/ahb/sandbox/ms/ipfire-2.x/log_x86_64/_build.toolchain.log f=
or errors if applicable [ FAIL ]
>>=20
>> so it wasn't as simple as doing a fresh git clone.
>>=20
>> Regards,
>>=20
>> Adolf.
>>=20
>>=20
>>> So I ran ./make.sh gettoolchain first, as I usually would.
>>>=20
>>> ./make.sh gettoolchain
>>> b2sum: cache/toolchains/ipfire-2.29-toolchain-20240521-x86_64.tar.zst: No=
 such file or directory
>>> cache/toolchains/ipfire-2.29-toolchain-20240521-x86_64.tar.zst: FAILED op=
en or read
>>> b2sum: WARNING: 1 listed file could not be read
>>>=20
>>> ipfire-2.29-toolchain-20240210-x86_64.tar.zst is present together with it=
s b2 file.
>>>=20
>>>=20
>>> Then ran ./make.sh downloadsrc
>>>=20
>>> Previous version ends with
>>>=20
>>> ***Verifying BLAKE2 checksum
>>> all files BLAKE2 checksum match                                          =
             [ DONE]
>>>=20
>>> after zstd has been checked.
>>>=20
>>> New version stops at zstd entry.
>>>=20
>>>=20
>>> ./make.sh clean gave the message Cleaning Build directory... but was comp=
leted very quickly.
>>> Log and Build directories have not been cleaned out. The img and iso file=
s are still present.
>>>=20
>>>=20
>>> ./make.sh build gave message
>>>=20
>>> chroot: failed to run command =E2=80=98env=E2=80=99: No such file or dire=
ctory
>>>=20
>>> and then did a full toolchain compilation which failed with gcc but log i=
s >9000 lines.
>>>=20
>>>=20
>>> Regards,
>>> Adolf.
>>>=20
>>>>=20
>>>> Thank you for listening to this brain-dump.
>>>>=20
>>>> All the best,
>>>> -Michael
>>>>=20
>>>>> On 3 Jul 2024, at 10:58, Michael Tremer <michael.tremer(a)ipfire.org> w=
rote:
>>>>>=20
>>>>> Hello Adolf,
>>>>>=20
>>>>> This happens occasionally that the buildsystem umounts /dev and then no=
thing will really work any more.
>>>>>=20
>>>>> I rebooted the machine and it is back up again.
>>>>>=20
>>>>> -Michael
>>>>>=20
>>>>>> On 2 Jul 2024, at 15:42, Adolf Belka <adolf.belka(a)ipfire.org> wrote:
>>>>>>=20
>>>>>> Hi Michael and all,
>>>>>>=20
>>>>>>=20
>>>>>> I ran the arm builder with the 4.20.2 version of samba to test it out.
>>>>>>=20
>>>>>> The build got to building gdb and then failed.
>>>>>>=20
>>>>>> Interestingly, the nightly build of arm was successful with the same v=
ersion of gdb.
>>>>>>=20
>>>>>> The build log for gdb is attached. The actual error is at line 618.
>>>>>>=20
>>>>>> Another thing I found is that I just tried to go back into the arm bui=
lder. I successfully got into people.ipfire.org but then trying to scp into t=
he arm builder failed with the following message.
>>>>>>=20
>>>>>> ------------------------------------------
>>>>>>=20
>>>>>> ssh bonnietwin(a)arm64-01.zrh.ipfire.org
>>>>>> PTY allocation request failed on channel 0
>>>>>> Linux arm64-01.zrh.ipfire.org 6.1.0-21-cloud-arm64 #1 SMP Debian 6.1.9=
0-1 (2024-05-03) aarch64
>>>>>>=20
>>>>>> The programs included with the Debian GNU/Linux system are free softwa=
re;
>>>>>> the exact distribution terms for each program are described in the
>>>>>> individual files in /usr/share/doc/*/copyright.
>>>>>>=20
>>>>>> Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
>>>>>> permitted by applicable law.
>>>>>> /etc/profile.d/Z99-cloud-locale-test.sh: line 14: /dev/null: Permissio=
n denied
>>>>>>=20
>>>>>> ------------------------------------------
>>>>>>=20
>>>>>> Regards,
>>>>>>=20
>>>>>> Adolf.
>>>>>>=20
>>>>>> <_build.ipfire.gdb.log>
>=20
>=20


--===============4569108461891801027==--