Hello,
I have spent a lot of time on this problem, because it has been bothering me for a long time, and I saw an opportunity to make some more changes to the build system along the way.
Currently this is all a little bit WIP, but I hope that we can merge this into next as soon as the current update has been moved to master.
I am referring to this branch which is currently based on next: https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=shortlog;h=refs/heads/u...
It uses the unshare command, which creates new Linux namespaces. That way, we can isolate the build system better from the host system, and if something goes wrong, the damage is limited. We can also enforce some more rules…
So, what has changed?
* make.sh now re-executes itself in a new mount namespace where that is appropriate. This happens for “make.sh build” and “make.sh shell”, but not, for example, for “make.sh downloadsrc”.
https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=blob;f=make.sh;h=b95262... https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=blob;f=make.sh;h=b95262...
* Because of the new mount namespace, the host system no longer sees any of our bind-mounts, and we no longer need to umount anything ourselves, which is where we occasionally wiped the entire hard drive of the host system. When the last process exits, the namespace is cleaned up and everything is umounted automatically.
* The function that prepares the build environment has been almost entirely rewritten:
https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=blob;f=make.sh;h=b95262...
It used to mount the parts of the host system that are needed to run anything into the build environment: /dev, /proc, /sys, etc.
Instead, it now creates a new /dev mount point with a minimal set of device nodes and symlinks. That way, we detach from the host system and no longer give the build system access to the host’s filesystem and block devices. The sources are now bind-mounted read-only, so that the build system cannot change anything in the source tree, and the cache is read-only, too. The ccache and log directories are the only writable places.
We also mount a separate /tmp directory.
* When we then build a package, we create further namespaces for it. These isolate the individual package builds from each other.
Mostly, this is to detach from the host system: a new UTS namespace allows us to change the hostname inside the build system without affecting the host, and so on. We do the same with a new time namespace.
We do, however, also create a new PID namespace, which means that the build system no longer sees any processes running on the host system. This requires mounting a new instance of /proc for each package. It also has the effect that, when the shell we launched terminates (because the build is done), any remaining background processes are killed immediately.
Finally, we clone the mount namespace that we created earlier, so that no build command can modify what we set up before.
* Since everything is now so decoupled, we gain a couple of new (maybe minor?) features:
It is now possible to run “make.sh shell” while a build is running. That does not happen a lot, but we can do this now :)
If the build crashes or the host system is shut down while a build is running, there is nothing to clean up afterwards.
* I have garnished all of this with a lot of code cleanup, and I suppose I might have introduced some new bugs here and there :)
* Most of this is probably around the new implementation of the timer that updates the build time. It has been annoying me a lot that it takes a long time to walk through all the packages that have been built before to finally get to the package that we want to rebuild. Mostly, this was all held up by a call to “sleep 0.1”.
Since bash does not really do concurrency, I had to get creative: I replaced the busy loop with a background process that is launched whenever it is needed and “pings” the main make.sh script once a second. That way, the script runs as usual, but is regularly interrupted to update the runtime.
https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=blob;f=make.sh;h=b95262... https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=blob;f=make.sh;h=b95262...
We now fork only one extra subshell and handle the timer events, which is a lot cheaper as well as more straightforward to code.
* As there is no longer any difference between the stages (which we inherited from LFS), I have merged them all into one.
* Last but not least, I have created the option to build for multiple architectures on the same system. Since we can now mount the entire source tree into (many independent) build environments, we might as well… As discussed on the last call, this might not be the best option for ARM, but RISC-V builds at a decent speed even when emulated.
The only thing I had to do for this was to add an architecture suffix to the build and log directories, so the build directory is now called build_${ARCH}, e.g. build_aarch64, build_x86_64, and so on. The packages/ directory has not been changed yet, but that will have to happen as well. Most likely, I want to merge it with the generated images, but I am not sure what to call the result yet. Happy to hear suggestions: result_x86_64? Just images_x86_64?
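For readers who have not used unshare before, the re-execution into a new mount namespace described in the first point could look roughly like the following. This is a minimal sketch, not the code on the branch: the marker variable IPFIRE_IN_NAMESPACE and the helper names are hypothetical, and it assumes make.sh runs as root and that "$0" still resolves to the script.

```shell
#!/bin/bash
# Hypothetical sketch, not the actual make.sh code: re-execute the script
# inside a new mount namespace for commands that mount things.
# IPFIRE_IN_NAMESPACE is an assumed marker variable.

# Commands that should run isolated from the host
needs_namespace() {
	case "${1}" in
		build|shell)
			return 0
			;;
	esac

	return 1
}

enter_namespace() {
	# Already re-executed? Then carry on as usual
	if [ -n "${IPFIRE_IN_NAMESPACE}" ]; then
		return 0
	fi

	# Commands like "downloadsrc" stay in the host's namespace
	if ! needs_namespace "${1}"; then
		return 0
	fi

	export IPFIRE_IN_NAMESPACE=1

	# With private propagation, no mount made inside the namespace shows up
	# on the host; all of them disappear when the last process exits
	exec unshare --mount --propagation private -- "${0}" "$@"
}

# In make.sh, this would be called before any mounts are made:
#   enter_namespace "$@"
```

The environment variable is only there to stop the re-executed script from calling unshare a second time; everything after that point runs unchanged, just with its own private view of the mount table.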
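The rewritten preparation of the build environment (fresh /dev, read-only sources and cache, writable ccache and logs, separate /tmp) might be sketched like this. BUILD_DIR, BASEDIR, the mount targets under /usr/src and the directory names are all assumptions for illustration, and the DRY_RUN switch is a hypothetical helper that prints the commands instead of running them. The real setup needs root and must run inside the private mount namespace.

```shell
#!/bin/bash
# Hypothetical sketch, not the actual make.sh code: prepare a minimal build
# environment. BUILD_DIR, BASEDIR and the directory names are assumptions.
# Must run as root inside the new mount namespace.

# Print commands instead of running them when DRY_RUN is set
run() {
	if [ -n "${DRY_RUN}" ]; then
		echo "$*"
	else
		"$@"
	fi
}

# Bind-mount a directory, then remount the bind mount read-only
bind_ro() {
	run mkdir -p "${2}"
	run mount --bind "${1}" "${2}"
	run mount -o remount,bind,ro "${2}"
}

prepare_dev() {
	local dev="${BUILD_DIR}/dev"

	# A fresh tmpfs instead of the host's /dev
	run mkdir -p "${dev}"
	run mount -t tmpfs -o mode=0755,nosuid dev "${dev}"

	# Only the character devices a build typically needs
	run mknod -m 0666 "${dev}/null"    c 1 3
	run mknod -m 0666 "${dev}/zero"    c 1 5
	run mknod -m 0666 "${dev}/full"    c 1 7
	run mknod -m 0666 "${dev}/random"  c 1 8
	run mknod -m 0666 "${dev}/urandom" c 1 9
	run mknod -m 0666 "${dev}/tty"     c 5 0

	# Standard symlinks, resolved through the build's own /proc
	run ln -s /proc/self/fd   "${dev}/fd"
	run ln -s /proc/self/fd/0 "${dev}/stdin"
	run ln -s /proc/self/fd/1 "${dev}/stdout"
	run ln -s /proc/self/fd/2 "${dev}/stderr"
}

prepare_buildenv() {
	local dir

	prepare_dev

	# Read-only: the source tree and the download cache (assumed names)
	for dir in cache config lfs src; do
		bind_ro "${BASEDIR}/${dir}" "${BUILD_DIR}/usr/src/${dir}"
	done

	# Writable: only ccache and the logs (assumed names)
	for dir in ccache log; do
		run mkdir -p "${BUILD_DIR}/usr/src/${dir}"
		run mount --bind "${BASEDIR}/${dir}" "${BUILD_DIR}/usr/src/${dir}"
	done

	# A separate /tmp that lives and dies with the namespace
	run mkdir -p "${BUILD_DIR}/tmp"
	run mount -t tmpfs -o mode=1777,nosuid,nodev tmp "${BUILD_DIR}/tmp"
}
```

For example, `DRY_RUN=1 BASEDIR=/srv/ipfire BUILD_DIR=/srv/ipfire/build_x86_64 prepare_buildenv` only lists what would be mounted. The two-step bind mount followed by a read-only remount is used because a plain `mount --bind -o ro` is not honoured by older versions of mount.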
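The per-package namespaces (UTS, time, PID and a cloned mount namespace) map fairly directly onto unshare options. The following is a sketch under assumptions: BUILD_DIR, the NS_CMD array and both function names are hypothetical, and unshare --time needs Linux 5.6 and util-linux 2.36 or newer, which may be exactly where older host kernels fall short.

```shell
#!/bin/bash
# Hypothetical sketch, not the actual make.sh code: build the command line
# that runs one package build in fresh namespaces. BUILD_DIR, NS_CMD and
# the function names are assumptions.
#
#   --uts              private hostname
#   --time             private clocks (Linux 5.6+, util-linux 2.36+)
#   --pid --fork       private process tree; the forked command is PID 1,
#                      so when it exits the kernel kills everything left
#   --mount-proc=DIR   fresh /proc inside the chroot; implies --mount,
#                      i.e. a copy of the mount namespace set up earlier
build_namespace_cmd() {
	NS_CMD=(
		unshare
		--uts
		--time
		--pid
		--fork
		--mount-proc="${BUILD_DIR}/proc"
		chroot "${BUILD_DIR}"
		"$@"
	)
}

# Execute a command inside the build environment (requires root)
run_in_build_env() {
	build_namespace_cmd "$@"
	"${NS_CMD[@]}"
}
```

A call such as `run_in_build_env /bin/bash -c "make install"` would then see only its own processes, its own /proc and a copy of the mounts, so nothing it mounts or leaves running survives the build.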
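The timer trick described above (one background process that "pings" the main script once a second) can be illustrated with SIGUSR1 and a trap. This is a sketch of the general technique, not the branch's implementation: the choice of SIGUSR1 and all names (timer_start, timer_stop, update_runtime, TICKS) are assumptions.

```shell
#!/bin/bash
# Hypothetical sketch, not the actual make.sh code: a signal-driven timer.
# A single background subshell sends SIGUSR1 to the main script once per
# second; the trap handler updates the runtime. All names are assumptions.

TIMER_PID=""
TICKS=0
START_TIME="${SECONDS}"

# Runs in the main shell on every tick
update_runtime() {
	local runtime=$(( SECONDS - START_TIME ))

	TICKS=$(( TICKS + 1 ))
	printf '\rRuntime: %02d:%02d' $(( runtime / 60 )) $(( runtime % 60 ))
}

timer_start() {
	local main_pid="$$"

	trap update_runtime USR1

	# The only extra process: sleep, ping, repeat; it stops on its own
	# once the main script is gone because kill then fails
	(
		while sleep 1; do
			kill -USR1 "${main_pid}" 2>/dev/null || break
		done
	) &
	TIMER_PID="$!"
}

timer_stop() {
	if [ -n "${TIMER_PID}" ]; then
		kill "${TIMER_PID}" 2>/dev/null
		wait "${TIMER_PID}" 2>/dev/null
		TIMER_PID=""
	fi

	trap - USR1
}
```

One caveat worth keeping in mind: bash only runs a trap once the current foreground command has finished. A long-running command therefore has to be started in the background and waited for with `wait`, which returns early (with a status above 128) when a trapped signal arrives, so the caller must wait again until the command has really exited.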
---
I have run a build, and this seems to work just fine on my Debian machine. I am writing to all of you, first, to let you know what I am up to and, second, to ask you to give this a go on your systems. I think it should run just fine, as all the tools that I require should be available everywhere. However, some older kernels might not support all of this yet, and there may be other problems I cannot think of. Please give me some feedback and send me all the bugs :)
Thank you for listening to this brain-dump.
All the best, -Michael
On 3 Jul 2024, at 10:58, Michael Tremer michael.tremer@ipfire.org wrote:
Hello Adolf,
It occasionally happens that the build system umounts /dev, and then nothing really works any more.
I rebooted the machine and it is back up again.
-Michael
On 2 Jul 2024, at 15:42, Adolf Belka adolf.belka@ipfire.org wrote:
Hi Michael and all,
I ran the arm builder with the 4.20.2 version of samba to test it out.
The build got to building gdb and then failed.
Interestingly, the nightly build of arm was successful with the same version of gdb.
The build log for gdb is attached. The actual error is at line 618.
I also just tried to get back into the arm builder. I successfully got into people.ipfire.org, but then trying to scp into the arm builder failed with the following message:
ssh bonnietwin@arm64-01.zrh.ipfire.org
PTY allocation request failed on channel 0
Linux arm64-01.zrh.ipfire.org 6.1.0-21-cloud-arm64 #1 SMP Debian 6.1.90-1 (2024-05-03) aarch64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
/etc/profile.d/Z99-cloud-locale-test.sh: line 14: /dev/null: Permission denied
Regards,
Adolf.
<_build.ipfire.gdb.log>