This Week In Pakfire: Jail

public inbox for development@lists.ipfire.org

* This Week In Pakfire: Jail
From: Michael Tremer @ 2025-03-21  8:00 UTC
  To: IPFire: Development-List

When building a package, Pakfire uses a novel approach to achieve two main goals:

    • Have a predictable and individually configured build environment so that all tools needed to build a certain package are available
    • Isolate everything from the host system so that the host cannot be compromised, either accidentally or by building malicious code

This is done by an internal jail inside Pakfire. If you will, it is a glorified chroot environment, or a little container that can be created very quickly and has a lot of features to separate the build environment from the host.
The jail is built on Linux namespaces, which are probably best known from Docker and other container technologies. Usually, when a container is spawned, it is built from an image that is downloaded, extracted and run. Communication will then usually only happen over the network.

That is, however, not suitable for Pakfire. We need a lot of control over the jail, and we want to send commands to it and read logs and status. Therefore we could not use any off-the-shelf solution and I built our own.

So how does it work? Pakfire has a single-threaded design. The build process is started by the developer from the command line and runs with root permissions (it would be nice to run this as a non-privileged user, but there are some hurdles that we have to clear and that work has not been implemented yet). Whenever needed, it will fork a second process which is set up with lower privileges in order to perform the build. That way, we can run any kind of untrusted code and be sure that it cannot change the configuration of the build environment or the host system itself. This is a feature which will enable a lot of other features for us later.

When that process is forked, a number of new namespaces are created. It all starts with a new user namespace. In that new user namespace, the less privileged process will be running as root, but it will not actually have all root permissions - it will only be root in its own domain, and the kernel knows not to allow any special admin tasks whatsoever (like loading kernel modules).
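
To illustrate what "root only in its own domain" means: a user namespace carries an ID map that translates user IDs inside the namespace to user IDs outside of it. The kernel reads that map from /proc/<pid>/uid_map as lines of the form `inside outside count`. The sketch below only formats such a map. The helper name and the outside UID of 1000 are made up for this example; it is not Pakfire's actual code.

```python
def format_id_map(entries):
    """Render (inside_id, outside_id, count) tuples as uid_map/gid_map text."""
    lines = []
    for inside, outside, count in entries:
        if count < 1:
            raise ValueError("each mapping must cover at least one ID")
        lines.append(f"{inside} {outside} {count}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical mapping: UID 0 inside the jail is UID 1000 on the host,
    # so "root" in the jail has no more power than that host user.
    print(format_id_map([(0, 1000, 1)]), end="")
```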

There will be a new IPC namespace which governs inter-process communication and prevents the process from talking to any other services. A new PID namespace will ensure that inside the jail, the first process will be running as PID 1 and that it won’t see any other processes running on the host system or in any other concurrently running jails. A new time namespace allows the time inside the jail to be set to whatever you like without affecting the time of the host, which is sometimes needed in test suites. And it is even possible to set a different hostname inside the jail because of a new UTS namespace.

As in IPFire 2, there is no network connectivity from the build environment. We don’t want anything to download code that could be malicious or that is at least not tracked by Pakfire itself; such downloads would not help with the reproducibility of the packages either. This is achieved by setting up a new network namespace which only has a loopback device, so that test suites can still run servers which bind to an IP address and can be talked to by a client application. However, there is the option to run a build environment in interactive mode where the developer gets a shell to test things manually. In this special case, we are able to enable networking support so that code can be downloaded, additional packages can be installed, and files can be uploaded, too.

Finally, a new mount namespace is created which ensures that /dev, /proc and /sys are isolated. The root filesystem of the host cannot be accessed at all. Instead, Pakfire will create its own build root environment, which will be a topic for another post.
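
The set of namespaces described above can be pictured as a bitmask of CLONE_NEW* flags, such as one would hand to unshare(2). The following is only a sketch: the flag values are the ones from the Linux header <linux/sched.h>, but the function name is invented here, and leaving out the network namespace is merely one assumed way of enabling networking in interactive mode.

```python
# Namespace flag values as defined in the Linux UAPI header <linux/sched.h>.
CLONE_NEWTIME = 0x00000080  # time namespace
CLONE_NEWNS = 0x00020000    # mount namespace
CLONE_NEWUTS = 0x04000000   # hostname
CLONE_NEWIPC = 0x08000000   # inter-process communication
CLONE_NEWUSER = 0x10000000  # user and group IDs
CLONE_NEWPID = 0x20000000   # process IDs
CLONE_NEWNET = 0x40000000   # network devices and sockets


def jail_namespace_flags(enable_network=False):
    """Combine the namespaces named in the post into one flags value."""
    flags = (CLONE_NEWUSER | CLONE_NEWIPC | CLONE_NEWPID
             | CLONE_NEWTIME | CLONE_NEWUTS | CLONE_NEWNS)
    # Assumption: sharing the host's network namespace is how an
    # interactive shell would get network access.
    if not enable_network:
        flags |= CLONE_NEWNET
    return flags


if __name__ == "__main__":
    print(hex(jail_namespace_flags()))
    print(hex(jail_namespace_flags(enable_network=True)))
```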

This is still only the beginning. Using a seccomp filter, the jail is limited in which syscalls it can execute. The namespaces create some great isolation, but not everything in the Linux kernel is namespaced (yet). Therefore, we have a blacklist of some administrative actions that cannot be executed from inside the jail to further protect the host system.
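
As a toy model of such a blacklist: everything is allowed unless it is explicitly listed, and a listed call fails with a permission error. This is not a real seccomp filter (that would be a BPF program loaded into the kernel), and the syscall names below are examples chosen for illustration, not the list Pakfire actually uses.

```python
import errno

# Hypothetical deny-list; the real list is not given in this post.
DENIED_SYSCALLS = frozenset({
    "init_module", "finit_module", "delete_module",  # kernel modules
    "kexec_load", "reboot",                          # replacing or restarting the kernel
    "swapon", "swapoff",                             # swap management
})


def check_syscall(name):
    """Return 0 if the call is allowed, or the errno a denied call fails with."""
    return errno.EPERM if name in DENIED_SYSCALLS else 0


if __name__ == "__main__":
    for name in ("read", "init_module"):
        print(name, "->", "denied" if check_syscall(name) else "allowed")
```
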
All of these mechanisms prohibit certain administrative actions from inside the jail. But what if something just wants to run a denial-of-service attack on one of the builders? What if a fork bomb goes off? What if something is using up all memory? For this case, the jail creates its own cgroup - another feature of the Linux kernel, this one for limiting resources. Using this, we limit the maximum number of processes running inside a jail to 1024 - which should be plenty. We also limit (and at the same time guarantee a minimum amount of) memory so that if there are multiple build processes running simultaneously, one cannot starve the others of CPU and memory resources. Cgroups even have some accounting features which allow us to read back the CPU time used, peak memory consumption, total bytes read and written, and many more things.
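
In cgroup v2, limits like these are plain text written to control files inside the cgroup's directory, and the accounting data is read back from files such as cpu.stat. The sketch below shows which files would be involved and how a flat "key value" statistics file can be parsed. The file names are the standard cgroup v2 ones; the function names and the memory sizes are assumptions for this example, and only the limit of 1024 processes comes from the text above.

```python
def cgroup_limit_files(max_pids=1024, memory_min=None, memory_max=None):
    """Map cgroup v2 control file names to the values to write into them."""
    files = {"pids.max": str(max_pids)}
    if memory_min is not None:
        files["memory.min"] = str(memory_min)  # guaranteed amount, in bytes
    if memory_max is not None:
        files["memory.max"] = str(memory_max)  # hard limit, in bytes
    return files


def parse_flat_keyed(text):
    """Parse a flat-keyed file such as cpu.stat into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, value = line.split()
        stats[key] = int(value)
    return stats


if __name__ == "__main__":
    # Hypothetical sizes: guarantee 1 GiB, allow at most 4 GiB.
    print(cgroup_limit_files(memory_min=1 << 30, memory_max=4 << 30))
    print(parse_flat_keyed("usage_usec 1500\nuser_usec 1000\nsystem_usec 500\n"))
```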

In order to communicate with any process inside the jail, Pakfire creates a new PTY, which basically is a new virtual terminal inside the jail. That way, we can prevent even old security vulnerabilities (or should I say old features) in the Linux terminal emulation from being exploited. This is a rather complex construct of pipes and various syscalls which once again provides the best possible isolation between the jail and the host system.
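
The basic idea of talking to a child process through a freshly created PTY can be sketched in a few lines. This is a much simplified stand-in for what is described above, without any namespaces or the additional pipes: the child gets the slave end as its terminal, and the parent reads everything from the master end. The function name is invented for this example.

```python
import os
import subprocess


def run_in_pty(argv):
    """Run argv on a new pseudo terminal; return (exit code, raw output)."""
    master_fd, slave_fd = os.openpty()
    try:
        try:
            # The child only ever sees the slave end as stdin/stdout/stderr.
            proc = subprocess.Popen(argv, stdin=slave_fd, stdout=slave_fd,
                                    stderr=slave_fd, start_new_session=True)
        finally:
            # The parent does not need the slave end after the child has it.
            os.close(slave_fd)
        chunks = []
        while True:
            try:
                data = os.read(master_fd, 4096)
            except OSError:
                break  # Linux reports EIO once every slave end is closed
            if not data:
                break
            chunks.append(data)
        return proc.wait(), b"".join(chunks)
    finally:
        os.close(master_fd)


if __name__ == "__main__":
    code, output = run_in_pty(["echo", "hello from the pty"])
    print(code, output)
```

Note that the output passes through the terminal's line discipline, so line endings will usually arrive as carriage return plus newline.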

All in all, this is a great building block for running secure, reproducible builds with untrusted code inside them. This brings huge value, because developers can run any build even if there is untrusted code in it (and should we *always* trust upstream?). And even if only good code is being built, which is obviously the most common case, we all make mistakes. Now there is zero chance, no matter how badly things are going, of ever breaking the host system.


* Re: This Week In Pakfire: Jail
From: Adolf Belka @ 2025-03-22 11:31 UTC
  To: development

The previous blog post on mirrors was interesting. I understood in general about mirrors and how they worked so it gave some flesh to the specifics for IPFire.

This post on the jail aspects of the IPFire3 build system was really interesting. I knew there had to be some isolation of the build system from the host to protect it, but this really helped me visualise how it works and how multifaceted it is and has to be.

This was definitely all new stuff to me. I have never really used any jail techniques yet.

Thanks very much.

Adolf.



* Re: This Week In Pakfire: Jail
From: Michael Tremer @ 2025-03-23 12:00 UTC
  To: Adolf Belka; +Cc: development

Hello Adolf,

Thank you for getting back to me and opening up some discussion.

> On 22 Mar 2025, at 11:31, Adolf Belka <adolf.belka@ipfire.org> wrote:
> 
> The previous blog post on mirrors was interesting. I understood in general about mirrors and how they worked so it gave some flesh to the specifics for IPFire.

Great! This is exactly what I intended. Because you will be one of the people who will be using Pakfire a lot, it massively helps if you know a couple of things about the internals. It helps to understand what is possible and what isn’t; and if something does not look right, it might give you a starting point for where to look first.

> This post on the jail aspects of the IPFire3 build system was really interesting. I knew there had to be some isolation of the build system from the host to protect it but this really helped me visualise how it worked and how multi faceted it is and has to be.

“Painful” is what I used to call it. It is a rather complex thing which on the data sheet will probably only be mentioned as “sandbox”. I think that term is slightly broken because so many people consider so many different things a sandbox, but my objective was something that is actually solid enough that we don’t have to worry about compromising our infrastructure.
