From mboxrd@z Thu Jan 1 00:00:00 1970
From: Adolf Belka
To: development@lists.ipfire.org
Subject: Re: Problem during building of samba on arm builder
Date: Wed, 10 Jul 2024 15:21:56 +0200
Message-ID:
In-Reply-To:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============2201970846166765883=="
List-Id:

--===============2201970846166765883==
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Hi Michael,

On 10/07/2024 15:05, Adolf Belka wrote:
> Hi Michael,
>
> On 10/07/2024 14:59, Adolf Belka wrote:
>> Hi Michael,
>>
>> On 10/07/2024 12:33, Adolf Belka wrote:
>>> Hi Michael,
>>>
>>> On 10/07/2024 11:57, Michael Tremer wrote:
>>>> Hello again,
>>>>
>>>> I managed to (finally) build the toolchain with the updated system. So hopefully there should not be any more outstanding problems that I know of so far.
>>>
>>> I just did a git pull on your repo to my clone.
>>>
>>> Ran ./make.sh gettoolchain and it successfully downloaded the toolchain.
>>>
>>> Ran ./make.sh downloadsrc and it successfully tested everything.
>>>
>>> Ran ./make.sh clean and build and log directories were cleared out and removed. As far as I can tell it was successful.
>>>
>>> Currently running ./make.sh build. So up to this point everything going well. Will let you know how it goes.
>>>
>> It has got to building popt. In the normal build system this takes around 3 secs. Currently in the new build it is at nearly 2 hours. Even with an empty cache, that seems a long build time for popt, unless I am being too optimistic.
>>
>> I will let it keep going.
>>
> I just had a look at the log file and it looks like it completed popt but then is stuck on trying to leave the directory /usr/src/lfs. Here is the output from the log, nothing new is getting written to the log.
>
> make[3]: Leaving directory '/usr/src/popt-1.19/po'
> Making install in tests
> make[3]: Entering directory '/usr/src/popt-1.19/tests'
> make[4]: Entering directory '/usr/src/popt-1.19/tests'
> make[4]: Nothing to be done for 'install-exec-am'.
> make[4]: Nothing to be done for 'install-data-am'.
> make[4]: Leaving directory '/usr/src/popt-1.19/tests'
> make[3]: Leaving directory '/usr/src/popt-1.19/tests'
> make[3]: Entering directory '/usr/src/popt-1.19'
> make[4]: Entering directory '/usr/src/popt-1.19'
> make[4]: Nothing to be done for 'install-exec-am'.
>  /bin/mkdir -p '/usr/share/man/man3'
>  /usr/bin/install -c -m 644 popt.3 '/usr/share/man/man3'
>  /bin/mkdir -p '/usr/lib/pkgconfig'
>  /usr/bin/install -c -m 644 popt.pc '/usr/lib/pkgconfig'
> make[4]: Leaving directory '/usr/src/popt-1.19'
> make[3]: Leaving directory '/usr/src/popt-1.19'
> make[2]: Leaving directory '/usr/src/popt-1.19'
> make[1]: Leaving directory '/usr/src/popt-1.19'
> Updating linker cache...
> Install done; saving file list to /usr/src/log/popt-1.19 ...
>
> make: Leaving directory '/usr/src/lfs'
>
>

I stopped the build with Ctrl-C and then ran ./make.sh build again without doing a clean. It got to popt and went straight to libedit, the next package, and started to build it.

Something must have just put the build into a loop of some sort. It was not writing anything to the log file.

Regards,

Adolf.

> Regards,
>
> Adolf.
>
>
>>
>> One thing I found. I am running the new build system while I have been running some package updates with the old system with its mount points. The two have each run without any impact on the other.
>>
>> Regards,
>>
>> Adolf.
>>
>>> Regards,
>>> Adolf.
>>>
>>>>
>>>> Best,
>>>> -Michael
>>>>
>>>>> On 9 Jul 2024, at 22:29, Michael Tremer wrote:
>>>>>
>>>>> Hello Adolf,
>>>>>
>>>>> Thank you for testing this.
>>>>>
>>>>> There have indeed been plenty of problems there… I spent a lot of time on this today and hopefully fixed most of them.
>>>>>
>>>>> I cannot build the toolchain on my machine and I am not sure why yet, but a build with the packaged toolchain runs through.
>>>>>
>>>>> I have also spent some time on getting rid of the strip stage because it annoyed me how long it takes and creating the disk images as well as packages should be significantly faster now, too. I hope I didn’t introduce too many new bugs.
>>>>>
>>>>> Please let me know if you have more success now.
>>>>>
>>>>> Best,
>>>>> -Michael
>>>>>
>>>>>> On 8 Jul 2024, at 20:34, Adolf Belka wrote:
>>>>>>
>>>>>> Hi Michael,
>>>>>>
>>>>>> On 08/07/2024 21:15, Adolf Belka wrote:
>>>>>>> Hi Michael,
>>>>>>>
>>>>>>> On 08/07/2024 18:11, Michael Tremer wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I have been spending a lot of time on this problem, because it has been bothering me for a long time. I also saw an opportunity to make more changes to the build system.
>>>>>>>>
>>>>>>>> Currently this is all a little bit WIP, but I hope that we can merge this into next as soon as the current update has been moved to master.
>>>>>>>>
>>>>>>>> I am referring to this branch which is currently based on next: https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=shortlog;h=refs/heads/unshare
>>>>>>>>
>>>>>>>> It makes use of the unshare command which creates new namespaces in Linux. That way, we can isolate the build system better from the host system and in case something goes wrong, there is less damage. We can also enforce some more rules…
>>>>>>>>
>>>>>>>> So, what has changed?
>>>>>>>>
>>>>>>>> * The make.sh script might re-execute itself into a new mount namespace when it is suitable. This happens for “make.sh build” and “make.sh shell”, but it does not happen for “make.sh downloadsrc” for example.
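As an illustration of the re-execution described in the last point, here is a minimal shell sketch. It is not taken from make.sh: the function names and the IN_NAMESPACE guard variable are made up for this example. Only the unshare command and the build/shell versus downloadsrc distinction come from the description above.

```shell
#!/bin/sh
# Sketch only; needs_mount_namespace, enter_mount_namespace and
# IN_NAMESPACE are hypothetical names, not the ones used in make.sh.

# Decide whether a command should run inside its own mount namespace.
needs_mount_namespace() {
    case "$1" in
        build|shell) return 0 ;;   # these get a private mount namespace
        *)           return 1 ;;   # e.g. downloadsrc runs on the host as before
    esac
}

# Re-execute the current script inside a new mount namespace, once.
enter_mount_namespace() {
    # Skip if we have already been re-executed
    if [ -n "${IN_NAMESPACE:-}" ]; then
        return 0
    fi

    if needs_mount_namespace "$1"; then
        export IN_NAMESPACE=1
        # unshare --mount typically needs root; mounts made after this
        # point are not visible in the host's mount table.
        exec unshare --mount "$0" "$@"
    fi
}
```

A script would call enter_mount_namespace "$@" near the top, before it sets up any bind-mounts, so that everything mounted later disappears when the last process in the namespace exits.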
>>>>>>>>
>>>>>>>> https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=blob;f=make.sh;h=b952627782a0d5ef4ac75f17315b689fcb3b4fe0;hb=refs/heads/unshare#l2129
>>>>>>>> https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=blob;f=make.sh;h=b952627782a0d5ef4ac75f17315b689fcb3b4fe0;hb=refs/heads/unshare#l2251
>>>>>>>>
>>>>>>>> * The new mount namespace means that we will no longer see any bind-mounts in the host system and we no longer need to umount anything ourselves which is where we occasionally wiped the entire hard drive of the host system. When the last process exits, the namespace is being cleaned up and everything is being umounted.
>>>>>>>>
>>>>>>>> * The function that prepares the build environment has been almost entirely rewritten:
>>>>>>>>
>>>>>>>> https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=blob;f=make.sh;h=b952627782a0d5ef4ac75f17315b689fcb3b4fe0;hb=refs/heads/unshare#l426
>>>>>>>>
>>>>>>>> It used to mount parts of the host system into the build environment which are needed to run anything. Those were /dev, /proc, /sys, etc…
>>>>>>>>
>>>>>>>> Instead, it now creates a new /dev mount point and creates a minimal amount of device nodes and symlinks. That way, we detach from the host system and no longer allow the build system access to the host’s filesystem and block devices. We also bind-mount the sources in read-only mode now, so that the build system cannot change anything in the source tree. On top of that, cache is read-only, too. ccache and the log directory are the only places that are writable.
>>>>>>>>
>>>>>>>> We mount a separate /tmp directory.
>>>>>>>>
>>>>>>>> * When we then build a package, we create more namespaces for each package. These isolate each build process from each other.
>>>>>>>>
>>>>>>>> Mostly, this is to detach from the host system.
>>>>>>>> A new UTS namespace allows changing the hostname in the build system without affecting the host and so on. We do the same thing with a new time namespace.
>>>>>>>>
>>>>>>>> We do however create a new PID namespace which means that the build system no longer will see any processes running on the host system. That requires mounting a new instance of /proc with each package. This also has the effect that if the shell that we launched terminates (because the build is done) any background processes will be killed immediately.
>>>>>>>>
>>>>>>>> Last, we clone the mount namespace that we have created before so that no build command can modify what we set up earlier.
>>>>>>>>
>>>>>>>> * Since everything is now so decoupled, we gain a couple of new (maybe minor?) features:
>>>>>>>>
>>>>>>>>    It is now possible to run “make.sh shell” while a build is running. That does not happen a lot, but we can do this now :)
>>>>>>>>
>>>>>>>>    If the build crashes or the host system is being shut down while a build is running, there is nothing to clean up afterwards.
>>>>>>>>
>>>>>>>> * I have garnished this all with a lot of code cleanup and I suppose I might have introduced some new bugs here or there :)
>>>>>>>>
>>>>>>>> * This is probably mostly around a new implementation of the timer that updates the build time. It has been annoying me a lot that it takes a long time to walk through all packages that have been built before to finally get to a package that we want to rebuild. Mostly this was all held up by a call of “sleep 0.1”
>>>>>>>>
>>>>>>>> Since bash does not really do any concurrency, I had to be creative and replaced the busy-loop with a background process that is launched whenever it is needed and which will “ping” the main make.sh script once a second. That way, we can just run as usual, but regularly get interrupted to update the runtime.
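The timer idea in the last paragraph can be sketched roughly as follows. This is only an illustration under assumptions: the mail does not say how the background process "pings" the main script, so the use of SIGUSR1 and all names below (start_timer, stop_timer, on_tick, ticks) are hypothetical and not the code in make.sh.

```shell
#!/bin/sh
# Sketch only: a background sub shell signals the main script about once
# a second; the main script counts the pings in a trap handler.

ticks=0        # number of pings received so far
timer_pid=""   # PID of the background sub shell, empty when not running

# Runs in the main shell each time a ping is handled
on_tick() {
    ticks=$(( ticks + 1 ))
}

start_timer() {
    timer_parent="$$"
    # Install the handler before the first ping can arrive
    trap on_tick USR1
    (
        # Sleep, then ping; stop as soon as the parent is gone
        while sleep 1 && kill -USR1 "${timer_parent}" 2>/dev/null; do
            :
        done
    ) &
    timer_pid="$!"
}

stop_timer() {
    if [ -n "${timer_pid}" ]; then
        kill "${timer_pid}" 2>/dev/null
        wait "${timer_pid}" 2>/dev/null
        timer_pid=""
    fi
    # Ignore any late ping instead of restoring the default action,
    # which would terminate the script
    trap '' USR1
}
```

Note that the shell only runs a trap handler once the current foreground command has finished, so during a long-running command the counter is updated late rather than exactly once a second. Compared with polling via “sleep 0.1”, the main script does no waiting of its own; it is simply interrupted when a ping arrives.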
>>>>>>>>
>>>>>>>> https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=blob;f=make.sh;h=b952627782a0d5ef4ac75f17315b689fcb3b4fe0;hb=refs/heads/unshare#l361
>>>>>>>> https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=blob;f=make.sh;h=b952627782a0d5ef4ac75f17315b689fcb3b4fe0;hb=refs/heads/unshare#l834
>>>>>>>>
>>>>>>>> We now only fork one extra sub shell and we have to handle the timer events which is a lot cheaper as well as more straight-forward to code.
>>>>>>>>
>>>>>>>> * As there is no difference between the different stages any more (those stages that we inherited from LFS), I have merged them all into one.
>>>>>>>>
>>>>>>>> * Last but not least, I have created the option to build for multiple architectures on the same system. Since we can now mount the entire source tree into (many independent) build environments, we might as well… As discussed on the last call, this might not be the best option for ARM, but RISC-V builds at a decent speed even when emulated.
>>>>>>>>
>>>>>>>> The only thing that I needed to do for this is to suffix the build and log directories which are now called build_${ARCH}, i.e. build_aarch64, build_x86_64, and so on. The packages/ directory is not changed yet, but that will have to happen as well. Most likely I want to merge this with the generated images, but I am not sure what to call this, yet. Happy to hear suggestions. result_x86_64? Just images_x86_64?
>>>>>>>>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> I have run a build and this seems to be working just fine on my Debian machine. I am writing to all of you to first of all let you know what I am up to; and secondly to ask to give this a go on your systems. I think it should run just fine, as all the tools that I require should be available everywhere. However, there might be some older kernels that might not support all of this, yet or any other problems I cannot think of yet.
>>>>>>>> Please give me some feedback and send me all the bugs :)
>>>>>>> I gave this a go but it didn't work.
>>>>>>>
>>>>>>> Not sure if I should have run the ./make.sh clean command on the old version before I pulled the unshare branch into my clone of your repo.
>>>>>>>
>>>>>>> Should I have started with a complete new clone of your repo? I might try that anyway just to see.
>>>>>>>
>>>>>> I created a completely new clone of your ipfire-2.x repo and then checked out the unshare branch to a branch called unshare in my local repo clone.
>>>>>>
>>>>>> gettoolchain gave the same issue, except that this time the toolchain directory ended up completely empty.
>>>>>>
>>>>>> downloadsrc had the same result.
>>>>>>
>>>>>> clean had nothing to clean up as it was a fresh clone.
>>>>>>
>>>>>> build then tried to build the toolchain and came up with this error, different from before.
>>>>>>
>>>>>> ./make.sh build
>>>>>> chroot: failed to run command ‘env’: No such file or directory
>>>>>> Full toolchain compilation
>>>>>> stage1 [ FAIL ]
>>>>>>
>>>>>>     Jul  8 19:26:39: Building stage1 ====================================== Installing stage1 ...
>>>>>>     mkdir -pv /tools_x86_64/lib
>>>>>>     mkdir: cannot create directory '/tools_x86_64': File exists
>>>>>>     make: *** [stage1:50: /home/ahb/sandbox/ms/ipfire-2.x/log/stage1] Error 1
>>>>>>
>>>>>> ERROR: Building stage1 [ FAIL ]
>>>>>>     Check /home/ahb/sandbox/ms/ipfire-2.x/log_x86_64/_build.toolchain.log for errors if applicable [ FAIL ]
>>>>>>
>>>>>> so it wasn't as simple as doing a fresh git clone.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Adolf.
>>>>>>
>>>>>>
>>>>>>> So I ran ./make.sh gettoolchain first, as I usually would.
>>>>>>>
>>>>>>> ./make.sh gettoolchain
>>>>>>> b2sum: cache/toolchains/ipfire-2.29-toolchain-20240521-x86_64.tar.zst: No such file or directory
>>>>>>> cache/toolchains/ipfire-2.29-toolchain-20240521-x86_64.tar.zst: FAILED open or read
>>>>>>> b2sum: WARNING: 1 listed file could not be read
>>>>>>>
>>>>>>> ipfire-2.29-toolchain-20240210-x86_64.tar.zst is present together with its b2 file.
>>>>>>>
>>>>>>>
>>>>>>> Then ran ./make.sh downloadsrc
>>>>>>>
>>>>>>> Previous version ends with
>>>>>>>
>>>>>>> ***Verifying BLAKE2 checksum
>>>>>>> all files BLAKE2 checksum match [ DONE]
>>>>>>>
>>>>>>> after zstd has been checked.
>>>>>>>
>>>>>>> New version stops at zstd entry.
>>>>>>>
>>>>>>>
>>>>>>> ./make.sh clean gave the message Cleaning Build directory... but was completed very quickly.
>>>>>>> Log and Build directories have not been cleaned out. The img and iso files are still present.
>>>>>>>
>>>>>>>
>>>>>>> ./make.sh build gave message
>>>>>>>
>>>>>>> chroot: failed to run command ‘env’: No such file or directory
>>>>>>>
>>>>>>> and then did a full toolchain compilation which failed with gcc but log is >9000 lines.
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Adolf.
>>>>>>>
>>>>>>>>
>>>>>>>> Thank you for listening to this brain-dump.
>>>>>>>>
>>>>>>>> All the best,
>>>>>>>> -Michael
>>>>>>>>
>>>>>>>>> On 3 Jul 2024, at 10:58, Michael Tremer wrote:
>>>>>>>>>
>>>>>>>>> Hello Adolf,
>>>>>>>>>
>>>>>>>>> This happens occasionally that the buildsystem umounts /dev and then nothing will really work any more.
>>>>>>>>>
>>>>>>>>> I rebooted the machine and it is back up again.
>>>>>>>>>
>>>>>>>>> -Michael
>>>>>>>>>
>>>>>>>>>> On 2 Jul 2024, at 15:42, Adolf Belka wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Michael and all,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I ran the arm builder with the 4.20.2 version of samba to test it out.
>>>>>>>>>>
>>>>>>>>>> The build got to building gdb and then failed.
>>>>>>>>>>
>>>>>>>>>> Interestingly, the nightly build of arm was successful with the same version of gdb.
>>>>>>>>>>
>>>>>>>>>> The build log for gdb is attached. The actual error is at line 618.
>>>>>>>>>>
>>>>>>>>>> Another thing I found is that I just tried to go back into the arm builder. I successfully got into people.ipfire.org but then trying to scp into the arm builder failed with the following message.
>>>>>>>>>>
>>>>>>>>>> ------------------------------------------
>>>>>>>>>>
>>>>>>>>>> ssh bonnietwin(a)arm64-01.zrh.ipfire.org
>>>>>>>>>> PTY allocation request failed on channel 0
>>>>>>>>>> Linux arm64-01.zrh.ipfire.org 6.1.0-21-cloud-arm64 #1 SMP Debian 6.1.90-1 (2024-05-03) aarch64
>>>>>>>>>>
>>>>>>>>>> The programs included with the Debian GNU/Linux system are free software;
>>>>>>>>>> the exact distribution terms for each program are described in the
>>>>>>>>>> individual files in /usr/share/doc/*/copyright.
>>>>>>>>>>
>>>>>>>>>> Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
>>>>>>>>>> permitted by applicable law.
>>>>>>>>>> /etc/profile.d/Z99-cloud-locale-test.sh: line 14: /dev/null: Permission denied
>>>>>>>>>>
>>>>>>>>>> ------------------------------------------
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Adolf.
>>>>>>>>>>
>>>>>>>>>> <_build.ipfire.gdb.log>
>>>>>
>>>>>
>>>>

--===============2201970846166765883==--