From: Michael Tremer
To: "IPFire: Development-List"
Subject: This Week In Pakfire: Jail
Date: Fri, 21 Mar 2025 08:00:02 +0000
Message-Id: <54513329-A288-4D7A-96E6-110DFD38F06C@ipfire.org>

When building a package, Pakfire uses a novel approach to achieve two main goals:

• Have a predictable and individually configured build environment so that all tools needed to build a certain package are available
• Isolate everything from the host system so that the host cannot be compromised, either accidentally or by building malicious code

This is done by an internal jail that Pakfire uses. If you will, it is a glorified chroot environment, or a little container that can be created very quickly and has a lot of features to separate the build environment from the host.

This is done using Linux namespaces, which are probably best known for working inside Docker and other container technologies. Usually, when a container is spawned, it is built from a certain image that is downloaded, extracted and run. Communication will then usually only happen over the network.

That is, however, not suitable for Pakfire. We need to have a lot of control over the jail, and we want to send commands to it and read logs and status. Therefore we could not use any off-the-shelf solution and I built our own.

So how does it work? Pakfire has a single-threaded design. The build process is called by the developer from the command line and runs with root permissions (it would be nice to run this as a non-privileged user, but there are some hurdles that we have to clear first, and that work has not been implemented yet).
It will then, whenever needed, fork a second process which is set up with lower privileges in order to perform the build. That way, we can run any kind of untrusted code and can be sure that it cannot change the configuration of the build environment or of the host system itself. This is a feature which will enable a lot of other features for us later.

When that process is forked, a number of new namespaces are created. It all starts with a new user namespace. In that new user namespace, the less privileged process will be running as root, but it will not actually have all root permissions: it will only be root in its own domain, and the kernel automatically knows not to allow any special admin tasks whatsoever (like loading kernel modules).

There will be a new IPC namespace which governs inter-process communication to prevent the process from talking to any other services. A new PID namespace will ensure that inside the jail, the first process will be running as PID 1 and it won't see any other processes running on the host system or in any other concurrently running jails. A new time namespace allows adjusting the time seen inside the jail without affecting the time of the host, as is sometimes needed in test suites. And it is even possible to set a different hostname inside the jail because of a new UTS namespace.

As we do it in IPFire 2, there is no network connectivity from the build environment. We don't want anything to download any code that could either be malicious or at least is not tracked by Pakfire itself; nor would that help with the reproducibility of the packages. This is achieved by setting up a new network namespace which only has a loopback device, so that test suites can run servers which bind to an IP address and can be talked to by a client application.
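
To make the namespace setup above a little more concrete, here is a minimal sketch of how the user, IPC, PID, time, UTS and network namespaces map to kernel flags. It is not Pakfire's actual code: the helper names `jail_flags` and `enter_namespaces` are made up for illustration, and using unshare(2) through ctypes is an assumption (the real implementation may well create the namespaces differently, for example at fork time). Only the flag values are taken from the kernel headers.

```python
import ctypes
import os

# Namespace flags as defined in <linux/sched.h>.
CLONE_NEWTIME = 0x00000080  # separate clock offsets
CLONE_NEWUTS = 0x04000000   # separate hostname
CLONE_NEWIPC = 0x08000000   # separate inter-process communication
CLONE_NEWUSER = 0x10000000  # "root" only inside the jail
CLONE_NEWPID = 0x20000000   # first process in the jail becomes PID 1
CLONE_NEWNET = 0x40000000   # fresh network stack, loopback only


def jail_flags(networking: bool = False) -> int:
    """Combine the namespace flags for a jail (hypothetical helper).

    With networking=True (think: interactive mode) the network
    namespace is left shared with the host.
    """
    flags = (CLONE_NEWUSER | CLONE_NEWIPC | CLONE_NEWPID
             | CLONE_NEWTIME | CLONE_NEWUTS)
    if not networking:
        flags |= CLONE_NEWNET
    return flags


def enter_namespaces(flags: int) -> None:
    """Call unshare(2) for the given flags; raises OSError on failure.

    Note that the PID and time namespaces only take effect for
    children created after this call, not for the caller itself.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(ctypes.c_int(flags)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

In such a sketch, the forked child would call `enter_namespaces(jail_flags())` before executing any build command, while an interactive shell would use `jail_flags(networking=True)`.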
However, there is the option to run a build environment in interactive mode, where the developer gets a shell to test things manually. In this special case, we are able to enable networking support so that code can be downloaded, additional packages can be installed, and files can be uploaded, too.

Finally, a new mount namespace is created which ensures that /dev, /proc and /sys are isolated. The root filesystem of the host cannot be accessed at all. Instead, Pakfire will create its own build root environment, which will be a topic for another post.

This is still only the beginning. Using a seccomp filter, the jail is limited in which syscalls it can execute. The namespaces create some great isolation, but not everything in the Linux kernel is namespaced (yet). Therefore, we have a blacklist of some administrative actions that cannot be executed from inside the jail to further protect the host system.

All of these mechanisms prohibit certain administrative actions from inside the jail. But what if something just wants to run a denial-of-service attack on one of the builders? What if a fork bomb goes off? What if something is using up all memory? For this case, the jail creates its own cgroup, another resource-limiting feature of the Linux kernel. Using this, we limit the maximum number of processes running inside a jail to 1024, which should be plenty. We also limit (and at the same time guarantee a minimum amount of) memory so that if there are multiple build processes running simultaneously, one cannot starve the others of CPU and memory resources. Cgroups even have some accounting features which allow us to read back the CPU time used, peak memory consumption, total bytes read and written, and many more things.

In order to communicate with any process inside the jail, Pakfire creates a new PTY, which is basically a new, virtual terminal inside the jail.
That way, we can prevent even old security vulnerabilities (or should I say old features) in the Linux terminal emulation from being exploited. This is a rather complex construct of pipes and various syscalls which once again provides the best isolation between the jail and the host system.

All in all, this is a great building block for running secure, reproducible builds with untrusted code inside them. This brings huge value, because developers can run any build even if there is untrusted code in it (and should we *always* trust upstream?). And even if only good code is being built, which is obviously the most common case, we all make mistakes, and now there is zero chance, no matter how badly things are going, of ever breaking the host system.
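
As a footnote to the cgroup limits mentioned earlier (at most 1024 processes, a memory guarantee plus a memory cap, and reading back accounting data), this is roughly what that looks like against the cgroup v2 file interface. Again, this is only a sketch and not Pakfire's implementation: the helper names `apply_limits` and `read_usage` are invented, the memory values are left to the caller because the actual limits are not stated here, and I am assuming a cgroup v2 hierarchy with the `pids.max`, `memory.min`, `memory.max`, `cpu.stat` and `memory.peak` control files.

```python
from pathlib import Path
from typing import Dict, Optional


def apply_limits(cgroup: Path, max_pids: int = 1024,
                 memory_min: Optional[int] = None,
                 memory_max: Optional[int] = None) -> None:
    """Write resource limits into a cgroup v2 directory (hypothetical helper)."""
    # Cap the number of processes, which defuses fork bombs.
    (cgroup / "pids.max").write_text(f"{max_pids}\n")
    # Guaranteed amount of memory in bytes (assumed value, set by the caller).
    if memory_min is not None:
        (cgroup / "memory.min").write_text(f"{memory_min}\n")
    # Hard upper memory limit in bytes (assumed value, set by the caller).
    if memory_max is not None:
        (cgroup / "memory.max").write_text(f"{memory_max}\n")


def read_usage(cgroup: Path) -> Dict[str, int]:
    """Read back CPU time and peak memory consumption from the cgroup."""
    usage = {}
    # cpu.stat is a flat "key value" file, e.g. "usage_usec 1500".
    for line in (cgroup / "cpu.stat").read_text().splitlines():
        key, value = line.split()
        usage[key] = int(value)
    # memory.peak holds a single number in bytes.
    usage["memory_peak"] = int((cgroup / "memory.peak").read_text())
    return usage
```

On a real builder, `cgroup` would point at a freshly created directory below /sys/fs/cgroup, and the jailed process would be moved into it before the build starts.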