From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michael Tremer
To: development@lists.ipfire.org
Subject: Re: Problem during building of samba on arm builder
Date: Wed, 10 Jul 2024 15:24:49 +0100
Message-ID: <9540863A-4EE7-42F4-A779-92E370266439@ipfire.org>
In-Reply-To:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============9189332420602573616=="
List-Id:

--===============9189332420602573616==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Hello,

> On 10 Jul 2024, at 14:21, Adolf Belka wrote:
>
> Hi Michael,
>
> On 10/07/2024 15:05, Adolf Belka wrote:
>> Hi Michael,
>>
>> On 10/07/2024 14:59, Adolf Belka wrote:
>>> Hi Michael,
>>>
>>> On 10/07/2024 12:33, Adolf Belka wrote:
>>>> Hi Michael,
>>>>
>>>> On 10/07/2024 11:57, Michael Tremer wrote:
>>>>> Hello again,
>>>>>
>>>>> I managed to (finally) build the toolchain with the updated system. So hopefully there should not be any more outstanding problems that I know of so far.
>>>>
>>>> I just did a git pull on your repo to my clone.
>>>>
>>>> Ran ./make.sh gettoolchain and it successfully downloaded the toolchain.
>>>>
>>>> Ran ./make.sh downloadsrc and it successfully tested everything.
>>>>
>>>> Ran ./make.sh clean and build and log directories were cleared out and removed. As far as I can tell it was successful.
>>>>
>>>> Currently running ./make.sh build. So up to this point everything going well. Will let you know how it goes.
>>>>
>>> It has got to building popt. In the normal build system this takes around 3 secs. Currently in the new build it is at nearly 2 hours. Even with an empty cache, that seems a long build time for popt, unless I am being too optimistic.
>>>
>>> I will let it keep going.
>>>
>> I just had a look at the log file and it looks like it completed popt but then is stuck on trying to leave the directory /usr/src/lfs.
Here is the output from the log, nothing new is getting written to the log.
>>
>> make[3]: Leaving directory '/usr/src/popt-1.19/po'
>> Making install in tests
>> make[3]: Entering directory '/usr/src/popt-1.19/tests'
>> make[4]: Entering directory '/usr/src/popt-1.19/tests'
>> make[4]: Nothing to be done for 'install-exec-am'.
>> make[4]: Nothing to be done for 'install-data-am'.
>> make[4]: Leaving directory '/usr/src/popt-1.19/tests'
>> make[3]: Leaving directory '/usr/src/popt-1.19/tests'
>> make[3]: Entering directory '/usr/src/popt-1.19'
>> make[4]: Entering directory '/usr/src/popt-1.19'
>> make[4]: Nothing to be done for 'install-exec-am'.
>> /bin/mkdir -p '/usr/share/man/man3'
>> /usr/bin/install -c -m 644 popt.3 '/usr/share/man/man3'
>> /bin/mkdir -p '/usr/lib/pkgconfig'
>> /usr/bin/install -c -m 644 popt.pc '/usr/lib/pkgconfig'
>> make[4]: Leaving directory '/usr/src/popt-1.19'
>> make[3]: Leaving directory '/usr/src/popt-1.19'
>> make[2]: Leaving directory '/usr/src/popt-1.19'
>> make[1]: Leaving directory '/usr/src/popt-1.19'
>> Updating linker cache...
>> Install done; saving file list to /usr/src/log/popt-1.19 ...
>>
>> make: Leaving directory '/usr/src/lfs'
>>
>>
> I stopped the build with Ctrl-C and then ran ./make.sh build again without doing a clean. It got to popt and went straight to libedit, the next package, and started to build it.
>
> Something must have just put the build into a loop of some sort. It was not writing anything to the log file.

The log file says that popt was done. It might be that you interrupted the build just after it has finished, but the root file was not created, yet. That should however not directly be a problem…

If this is however the only problem that you are seeing now then I would be happy :)

-Michael

>
> Regards,
>
> Adolf.
>
>
>> Regards,
>>
>> Adolf.
>>
>>
>>>
>>> One thing I found.
I am running the new build system while I have been running some package updates with the old system with its mount points. The two have each run without any impact on the other.
>>>
>>> Regards,
>>>
>>> Adolf.
>>>
>>>> Regards,
>>>> Adolf.
>>>>
>>>>>
>>>>> Best,
>>>>> -Michael
>>>>>
>>>>>> On 9 Jul 2024, at 22:29, Michael Tremer wrote:
>>>>>>
>>>>>> Hello Adolf,
>>>>>>
>>>>>> Thank you for testing this.
>>>>>>
>>>>>> There have indeed been plenty of problems there… I spent a lot of time on this today and hopefully fixed most of them.
>>>>>>
>>>>>> I cannot build the toolchain on my machine and I am not sure why yet, but a build with the packaged toolchain runs through.
>>>>>>
>>>>>> I have also spent some time on getting rid of the strip stage because it annoyed me how long it takes and creating the disk images as well as packages should be significantly faster now, too. I hope I didn’t introduce too many new bugs.
>>>>>>
>>>>>> Please let me know if you have more success now.
>>>>>>
>>>>>> Best,
>>>>>> -Michael
>>>>>>
>>>>>>> On 8 Jul 2024, at 20:34, Adolf Belka wrote:
>>>>>>>
>>>>>>> Hi Michael,
>>>>>>>
>>>>>>> On 08/07/2024 21:15, Adolf Belka wrote:
>>>>>>>> Hi Michael,
>>>>>>>>
>>>>>>>> On 08/07/2024 18:11, Michael Tremer wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I have been spending a lot of time on this problem, because it has been bothering me for a long time. I also saw an opportunity to make more changes to the build system.
>>>>>>>>>
>>>>>>>>> Currently this is all a little bit WIP, but I hope that we can merge this into next as soon as the current update has been moved to master.
>>>>>>>>>
>>>>>>>>> I am referring to this branch which is currently based on next: https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=shortlog;h=refs/heads/unshare
>>>>>>>>>
>>>>>>>>> It makes use of the unshare command which creates new namespaces in Linux.
That way, we can isolate the build system better from the host system and in case something goes wrong, there is less damage. We can also enforce some more rules…
>>>>>>>>>
>>>>>>>>> So, what has changed?
>>>>>>>>>
>>>>>>>>> * The make.sh script might re-execute itself into a new mount namespace when it is suitable. This happens for “make.sh build” and “make.sh shell”, but it does not happen for “make.sh downloadsrc” for example.
>>>>>>>>>
>>>>>>>>> https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=blob;f=make.sh;h=b952627782a0d5ef4ac75f17315b689fcb3b4fe0;hb=refs/heads/unshare#l2129
>>>>>>>>> https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=blob;f=make.sh;h=b952627782a0d5ef4ac75f17315b689fcb3b4fe0;hb=refs/heads/unshare#l2251
>>>>>>>>>
>>>>>>>>> * The new mount namespace means that we will no longer see any bind-mounts in the host system and we no longer need to umount anything ourselves which is where we occasionally wiped the entire hard drive of the host system. When the last process exits, the namespace is being cleaned up and everything is being umounted.
>>>>>>>>>
>>>>>>>>> * The function that prepares the build environment has been almost entirely rewritten:
>>>>>>>>>
>>>>>>>>> https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=blob;f=make.sh;h=b952627782a0d5ef4ac75f17315b689fcb3b4fe0;hb=refs/heads/unshare#l426
>>>>>>>>>
>>>>>>>>> It used to mount parts of the host system into the build environment which are needed to run anything. Those were /dev, /proc, /sys, etc…
>>>>>>>>>
>>>>>>>>> Instead, it now creates a new /dev mount point and creates a minimal amount of device nodes and symlinks. That way, we detach from the host system and no longer allow the build system access to the host’s filesystem and block devices.
We also bind-mount the sources in read-only mode now, so that the build system cannot change anything in the source tree. On top of that, cache is read-only, too. ccache and the log directory are the only places that are writable.
>>>>>>>>>
>>>>>>>>> We mount a separate /tmp directory.
>>>>>>>>>
>>>>>>>>> * When we then build a package, we create more namespaces for each package. These isolate each build process from each other.
>>>>>>>>>
>>>>>>>>> Mostly, this is to detach from the host system. A new UTS namespace allows to change the hostname in the build system without affecting the host and so on. We do the same thing with a new time namespace.
>>>>>>>>>
>>>>>>>>> We do however create a new PID namespace which means that the build system no longer will see any processes running on the host system. That requires to mount a new instance of /proc with each package. This also has the effect that if the shell that we launched terminates (because the build is done) any background processes will be killed immediately.
>>>>>>>>>
>>>>>>>>> Last, we clone the mount namespace that we have created before so that no build command can modify what we set up earlier.
>>>>>>>>>
>>>>>>>>> * Since everything is now so decoupled, we gain a couple of new (maybe minor?) features:
>>>>>>>>>
>>>>>>>>> It is now possible to run “make.sh shell” while a build is running. That does not happen a lot, but we can do this now :)
>>>>>>>>>
>>>>>>>>> If the build crashes or the host system is being shut down while a build is running, there is nothing to clean up afterwards.
>>>>>>>>>
>>>>>>>>> * I have garnished this all with a lot of code cleanup and I suppose I might have introduced some new bugs here or there :)
>>>>>>>>>
>>>>>>>>> * This is probably mostly around a new implementation of the timer that updates the build time.
It has been annoying me a lot that it takes a long time to walk through all packages that have been built before to finally get to a package that we want to rebuild. Mostly this was all held up by a call of “sleep 0.1”
>>>>>>>>>
>>>>>>>>> Since bash does not really do any concurrency, I had to be creative and replaced the busy-loop with a background process that is launched whenever it is needed and which will “ping” the main make.sh script once a second. That way, we can just run as usual, but regularly get interrupted to update the runtime.
>>>>>>>>>
>>>>>>>>> https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=blob;f=make.sh;h=b952627782a0d5ef4ac75f17315b689fcb3b4fe0;hb=refs/heads/unshare#l361
>>>>>>>>> https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=blob;f=make.sh;h=b952627782a0d5ef4ac75f17315b689fcb3b4fe0;hb=refs/heads/unshare#l834
>>>>>>>>>
>>>>>>>>> We now only fork one extra sub shell and we have to handle the timer events which is a lot cheaper as well as more straight-forward to code.
>>>>>>>>>
>>>>>>>>> * As there is no difference between the different stages any more (those stages that we inherited from LFS), I have merged them all into one.
>>>>>>>>>
>>>>>>>>> * Last but not least, I have created the option to build for multiple architectures on the same system. Since we can now mount the entire source tree into (many independent) build environments, we might as well… As discussed on the last call, this might not be the best option for ARM, but RISC-V builds at a decent speed even when emulated.
>>>>>>>>>
>>>>>>>>> The only thing that I needed to do for this is to suffix the build and log directories which are now called build_${ARCH}, i.e. build_aarch64, build_x86_64, and so on. The packages/ directory is not changed yet, but that will have to happen as well.
Most likely I want to merge this with the generated images, but I am not sure what to call this, yet. Happy to hear suggestions. result_x86_64? Just images_x86_64?
>>>>>>>>>
>>>>>>>>> ---
>>>>>>>>>
>>>>>>>>> I have run a build and this seems to be working just fine on my Debian machine. I am writing to all of you to first of all let you know what I am up to; and secondly to ask to give this a go on your systems. I think it should run just fine, as all the tools that I require should be available everywhere. However, there might be some older kernels that might not support all of this, yet or any other problems I cannot think of yet. Please give me some feedback and send me all the bugs :)
>>>>>>>> I gave this a go but it didn't work.
>>>>>>>>
>>>>>>>> Not sure if I should have run the ./make.sh clean command on the old version before I pulled the unshare branch into my clone of your repo.
>>>>>>>>
>>>>>>>> Should I have started with a complete new clone of your repo? I might try that anyway just to see.
>>>>>>>>
>>>>>>> I created a completely new clone of your ipfire-2.x repo and then checked out the unshare branch to a branch called unshare in my local repo clone.
>>>>>>>
>>>>>>> gettoolchain gave the same issue, except that this time the toolchain directory ended up completely empty.
>>>>>>>
>>>>>>> downloadsrc had the same result.
>>>>>>>
>>>>>>> clean had nothing to clean up as it was a fresh clone.
>>>>>>>
>>>>>>> build then tried to build the toolchain and came up with this error, different from before.
>>>>>>>
>>>>>>> ./make.sh build
>>>>>>> chroot: failed to run command ‘env’: No such file or directory
>>>>>>> Full toolchain compilation
>>>>>>> stage1 [ FAIL ]
>>>>>>>
>>>>>>> Jul 8 19:26:39: Building stage1 ====================================== Installing stage1 ...
>>>>>>> mkdir -pv /tools_x86_64/lib
>>>>>>> mkdir: cannot create directory '/tools_x86_64': File exists
>>>>>>> make: *** [stage1:50: /home/ahb/sandbox/ms/ipfire-2.x/log/stage1] Error 1
>>>>>>>
>>>>>>> ERROR: Building stage1 [ FAIL ]
>>>>>>> Check /home/ahb/sandbox/ms/ipfire-2.x/log_x86_64/_build.toolchain.log for errors if applicable [ FAIL ]
>>>>>>>
>>>>>>> so it wasn't as simple as doing a fresh git clone.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Adolf.
>>>>>>>
>>>>>>>
>>>>>>>> So I ran ./make.sh gettoolchain first, as I usually would.
>>>>>>>>
>>>>>>>> ./make.sh gettoolchain
>>>>>>>> b2sum: cache/toolchains/ipfire-2.29-toolchain-20240521-x86_64.tar.zst: No such file or directory
>>>>>>>> cache/toolchains/ipfire-2.29-toolchain-20240521-x86_64.tar.zst: FAILED open or read
>>>>>>>> b2sum: WARNING: 1 listed file could not be read
>>>>>>>>
>>>>>>>> ipfire-2.29-toolchain-20240210-x86_64.tar.zst is present together with its b2 file.
>>>>>>>>
>>>>>>>>
>>>>>>>> Then ran ./make.sh downloadsrc
>>>>>>>>
>>>>>>>> Previous version ends with
>>>>>>>>
>>>>>>>> ***Verifying BLAKE2 checksum
>>>>>>>> all files BLAKE2 checksum match [ DONE]
>>>>>>>>
>>>>>>>> after zstd has been checked.
>>>>>>>>
>>>>>>>> New version stops at zstd entry.
>>>>>>>>
>>>>>>>>
>>>>>>>> ./make.sh clean gave the message Cleaning Build directory... but was completed very quickly.
>>>>>>>> Log and Build directories have not been cleaned out. The img and iso files are still present.
>>>>>>>>
>>>>>>>>
>>>>>>>> ./make.sh build gave message
>>>>>>>>
>>>>>>>> chroot: failed to run command ‘env’: No such file or directory
>>>>>>>>
>>>>>>>> and then did a full toolchain compilation which failed with gcc but log is >9000 lines.
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Adolf.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thank you for listening to this brain-dump.
>>>>>>>>>
>>>>>>>>> All the best,
>>>>>>>>> -Michael
>>>>>>>>>
>>>>>>>>>> On 3 Jul 2024, at 10:58, Michael Tremer wrote:
>>>>>>>>>>
>>>>>>>>>> Hello Adolf,
>>>>>>>>>>
>>>>>>>>>> This happens occasionally that the buildsystem umounts /dev and then nothing will really work any more.
>>>>>>>>>>
>>>>>>>>>> I rebooted the machine and it is back up again.
>>>>>>>>>>
>>>>>>>>>> -Michael
>>>>>>>>>>
>>>>>>>>>>> On 2 Jul 2024, at 15:42, Adolf Belka wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Michael and all,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I ran the arm builder with the 4.20.2 version of samba to test it out.
>>>>>>>>>>>
>>>>>>>>>>> The build got to building gdb and then failed.
>>>>>>>>>>>
>>>>>>>>>>> Interestingly, the nightly build of arm was successful with the same version of gdb.
>>>>>>>>>>>
>>>>>>>>>>> The build log for gdb is attached. The actual error is at line 618.
>>>>>>>>>>>
>>>>>>>>>>> Another thing I found is that I just tried to go back into the arm builder. I successfully got into people.ipfire.org but then trying to scp into the arm builder failed with the following message.
>>>>>>>>>>>
>>>>>>>>>>> ------------------------------------------
>>>>>>>>>>>
>>>>>>>>>>> ssh bonnietwin(a)arm64-01.zrh.ipfire.org
>>>>>>>>>>> PTY allocation request failed on channel 0
>>>>>>>>>>> Linux arm64-01.zrh.ipfire.org 6.1.0-21-cloud-arm64 #1 SMP Debian 6.1.90-1 (2024-05-03) aarch64
>>>>>>>>>>>
>>>>>>>>>>> The programs included with the Debian GNU/Linux system are free software;
>>>>>>>>>>> the exact distribution terms for each program are described in the
>>>>>>>>>>> individual files in /usr/share/doc/*/copyright.
>>>>>>>>>>>
>>>>>>>>>>> Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
>>>>>>>>>>> permitted by applicable law.
>>>>>>>>>>> /etc/profile.d/Z99-cloud-locale-test.sh: line 14: /dev/null: Permission denied
>>>>>>>>>>>
>>>>>>>>>>> ------------------------------------------
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>>
>>>>>>>>>>> Adolf.
>>>>>>>>>>>
>>>>>>>>>>> <_build.ipfire.gdb.log>

--===============9189332420602573616==--
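
The timer replacement described in the quoted message of 8 Jul (a background process that "pings" the main make.sh script once a second instead of a busy-loop around sleep 0.1) could look roughly like the sketch below. It is an illustration only and is not taken from the unshare branch: using SIGUSR1 as the ping, and the names ticks, on_tick, start_timer and stop_timer, are all assumptions made for this example.

```shell
#!/bin/bash
# Hypothetical sketch of a once-a-second timer; NOT the code from make.sh.
# All names (ticks, on_tick, start_timer, stop_timer) are invented here.

ticks=0

# Runs every time the background timer signals the main shell.
# A real script would redraw the elapsed build time at this point.
on_tick() {
	ticks=$(( ticks + 1 ))
}
trap on_tick USR1

# Start a background subshell that sends SIGUSR1 to the main shell
# once per interval (in seconds). "$$" is still the main shell's PID
# inside the subshell.
start_timer() {
	interval="$1"
	(
		while sleep "${interval}"; do
			kill -USR1 "$$" 2>/dev/null || break
		done
	) &
	timer_pid="$!"
}

# Stop the timer process and reap it.
stop_timer() {
	kill "${timer_pid}" 2>/dev/null || true
	wait "${timer_pid}" 2>/dev/null || true
}

# Demo: a three-second background job stands in for building a package.
start_timer 1
sleep 3 &
job_pid="$!"

# wait returns early (status above 128) whenever a trapped signal
# arrives, so keep waiting until the job has really finished.
while kill -0 "${job_pid}" 2>/dev/null; do
	wait "${job_pid}" || true
done

stop_timer
echo "timer ticks seen while the job ran: ${ticks}"
```

Because the wait builtin returns as soon as a trapped signal arrives, the handler runs about once a second while the job is still going, and the main shell never has to poll in a loop of its own.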