From: Michael Tremer
To: Adolf Belka
Cc: development@lists.ipfire.org
Subject: Re: This Week In Pakfire: Jail
Date: Sun, 23 Mar 2025 12:00:27 +0000
Message-Id: <5E237127-DA4D-49B0-AF3F-3DDD438DF819@ipfire.org>
References: <54513329-A288-4D7A-96E6-110DFD38F06C@ipfire.org>

Hello Adolf,

Thank you for getting back to me and opening up some discussion.

> On 22 Mar 2025, at 11:31, Adolf Belka wrote:
>
> The previous blog post on mirrors was interesting. I understood in general about mirrors and how they worked so it gave some flesh to the specifics for IPFire.

Great! This is exactly what I am intending. Because you will be one of the people who will be using Pakfire a lot, it massively helps if you know a couple of things about the internals. It helps to understand what is possible and what isn’t; and if something does not look right, it might give you a starting point where to look first.

> This post on the jail aspects of the IPFire3 build system was really interesting. I knew there had to be some isolation of the build system from the host to protect it but this really helped me visualise how it worked and how multifaceted it is and has to be.

Painful I used to call it, but it is a rather complex thing which on the data sheet will probably only be mentioned as “sandbox”. I think that term is slightly broken because so many people consider so many things a sandbox, but my objective was something that actually is solid enough so that we don’t have to worry about compromising our infrastructure.

> This was definitely all new stuff to me.
> I have never really used any jail techniques yet.
>
> Thanks very much.
>
> Adolf.
>
>
> On 21/03/2025 09:00, Michael Tremer wrote:
>> When building a package, Pakfire is using a novel approach to achieve two main goals:
>> • Have a predictable and individually configured build environment so that all tools needed to build a certain package are available
>> • Isolate everything from the host system so that the host cannot be compromised, either accidentally or by building malicious code
>> This is done by an internal jail that Pakfire is using. If you will, it is a glorified chroot environment, or a little container that can be created very quickly and which has a lot of features to separate the build environment from the host.
>> This is done using Linux namespaces, which are probably best known for their use in Docker and other container technologies. Usually, when a container is spawned, it is built from a certain image that is downloaded, extracted and run. Communication will then usually only happen over the network.
>> That is however not suitable for Pakfire. We need to have a lot of control over the jail and we want to send commands to it and read logs and status. Therefore we could not use any off-the-shelf solution and I built our own.
>> So how does it work? Pakfire has a single-threaded design. The build process is called by the developer from the command line and runs with root permissions (it would be nice to run this as a non-privileged user, but there are some hurdles that we have to take which have not been implemented yet). It will then, whenever needed, fork a second process which is set up with lower privileges in order to perform the build. That way, we can run any kind of untrusted code and can be sure that there is no way to ever change the configuration of the build environment or host system itself.
>> This is a feature which will enable a lot of other features for us later.
>> When that process is forked, a number of new namespaces are created. It all starts with a new user namespace. In that new user namespace, the less privileged process will be running as root, but it will not actually have all root permissions - it will only be root in its own domain, and the kernel automatically knows not to allow any special admin tasks whatsoever (like loading kernel modules).
>> There will be a new IPC namespace which governs inter-process communication to prevent the process from talking to any other services. A new PID namespace will ensure that inside the jail, the first process will be running as PID 1 and it won’t see any other processes running on the host system or in any other concurrently running jails. A new time namespace will allow setting the time to whatever you like inside the jail without affecting the time of the host, as is sometimes needed in test suites. And it is even possible to set a different hostname inside the jail because of a new UTS namespace.
>> As we do in IPFire 2, there is no network connectivity from the build environment. We don’t want anything to download any code that could either be malicious or at least is not tracked by Pakfire itself; nor would this help with reproducibility of the packages. This is achieved by setting up a new network namespace which only has a loopback device, so that test suites can run servers which can bind to an IP address and be talked to by a client application. However, there is the option to run a build environment in interactive mode where the developer gets a shell to test things manually. In this special case, we are able to enable networking support so that code can be downloaded, additional packages can be installed, and files can be uploaded, too.
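
If you want to play with the namespaces mentioned so far, here is a rough Python sketch. It is only an illustration and not how Pakfire implements its jail: it collects the kernel flags for the user, IPC, PID, time, UTS and network namespaces into one mask and hands that to the unshare(2) syscall through ctypes. The names JAIL_NAMESPACES, namespace_flags() and unshare() are made up for this example; the flag values are the ones from <linux/sched.h>.

```python
import ctypes
import os

# Namespace flags as defined in <linux/sched.h>
CLONE_NEWTIME = 0x00000080
CLONE_NEWUTS = 0x04000000
CLONE_NEWIPC = 0x08000000
CLONE_NEWUSER = 0x10000000
CLONE_NEWPID = 0x20000000
CLONE_NEWNET = 0x40000000

# Hypothetical table: the namespaces named in the post so far
JAIL_NAMESPACES = {
    "user": CLONE_NEWUSER,
    "ipc": CLONE_NEWIPC,
    "pid": CLONE_NEWPID,
    "time": CLONE_NEWTIME,
    "uts": CLONE_NEWUTS,
    "net": CLONE_NEWNET,
}


def namespace_flags(names):
    """Combine the flags of the named namespaces into one bit mask."""
    flags = 0
    for name in names:
        flags |= JAIL_NAMESPACES[name]
    return flags


def unshare(flags):
    """Call unshare(2) from libc and raise OSError if the kernel refuses."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.unshare.argtypes = [ctypes.c_int]
    libc.unshare.restype = ctypes.c_int
    if libc.unshare(flags) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

Calling unshare(namespace_flags(JAIL_NAMESPACES)) from a single-threaded process should detach it from the host’s namespaces, provided the kernel allows unprivileged user namespaces. Note that the new PID and time namespaces only take effect for children forked afterwards, which fits nicely with the fork-based design described above.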
>> Finally, there is a new mount namespace which ensures that /dev, /proc and /sys are isolated. The root filesystem of the host cannot be accessed at all. Instead, Pakfire will create its own build root environment, which will be a topic for another post.
>> This is still only the beginning. Using a seccomp filter, the jail will be limited in what syscalls it can execute. The namespaces create some great isolation, but not everything in the Linux kernel is namespaced (yet). Therefore, we have a blacklist of some administrative actions that cannot be executed from inside the jail to further protect the host system.
>> All of these mechanics prohibit certain administrative actions from inside the jail. But what if something just wants to run a denial-of-service attack on one of the builders? What if a fork bomb goes off? What if something is using up all memory? For this case, the jail creates its own cgroup - another resource-limiting feature of the Linux kernel. Using this, we limit the maximum number of processes running inside a jail to only 1024 - which should be plenty. We also limit (and at the same time guarantee a minimum amount of) memory so that if there are multiple build processes running simultaneously, one cannot starve the others of all CPU and memory resources. Cgroups even have some accounting features which allow us to read back the CPU time being used, peak memory consumption, total bytes read and written and many more things.
>> In order to communicate with any process inside the jail, Pakfire creates a new PTY, which basically creates a new, virtual terminal inside the jail. That way, we can prevent even old security vulnerabilities (or should I say old features) in the Linux terminal emulation from being used. This is a rather complex construct of pipes and various syscalls which once again provides the best isolation between the jail and the host system.
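
To make the cgroup part a bit more tangible, here is a small Python sketch of how such limits could be expressed with the cgroup v2 interface files. Again, this is an illustration under assumptions and not Pakfire’s code: jail_limits(), apply_limits() and parse_flat_keyed() are hypothetical helpers, and only the 1024 process limit comes from the post. The file names pids.max, memory.min, memory.max and cpu.stat are the standard cgroup v2 ones.

```python
from pathlib import Path


def jail_limits(max_pids=1024, memory_min=None, memory_max=None):
    """Map cgroup v2 control files to the values to write into them.

    memory_min is the guaranteed amount and memory_max the hard limit,
    both in bytes. Either one is left untouched when None.
    """
    limits = {"pids.max": str(max_pids)}
    if memory_min is not None:
        limits["memory.min"] = str(memory_min)
    if memory_max is not None:
        limits["memory.max"] = str(memory_max)
    return limits


def apply_limits(cgroup_dir, limits):
    """Write each limit into its control file inside an existing cgroup."""
    for name, value in limits.items():
        (Path(cgroup_dir) / name).write_text(value + "\n")


def parse_flat_keyed(text):
    """Parse a flat-keyed file such as cpu.stat into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            stats[key] = int(value)
    return stats
```

With a cgroup directory that already exists and is writable (the path is up to you), apply_limits(path, jail_limits(memory_max=8 * 1024**3)) would cap the jail at 1024 processes and a made-up 8 GiB of memory. For the accounting side, parse_flat_keyed() applied to the content of cpu.stat gives access to usage_usec, the CPU time consumed in microseconds.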
>> All in all, this is a great building block for running secure, reproducible builds with untrusted code inside them. This brings huge value, because developers can run any build, even if there is untrusted code in it (and should we *always* trust upstream?). And even if only good code is being built, which is obviously the most common case, we all make mistakes, and now there is zero chance, no matter how badly things are going, to ever break the host system.
>
>