public inbox for development@lists.ipfire.org
* This Week In Pakfire: Mirror Management
@ 2025-03-14  8:00 Michael Tremer
From: Michael Tremer @ 2025-03-14  8:00 UTC (permalink / raw)
  To: IPFire: Development-List

This is the start of a small blog series. Yes, blog series. Although this list is not a blog, I think this is the best place to keep everyone in the loop about what I am actually doing. I believe that I have been working in the dark too much for large parts of the community, which I don’t want to do.

This series might go on for a little while and the posts will be deep dives into some technology that has been built into Pakfire. Feel free to ask me questions if you are interested.

## Mirror Management

Although this is not the most essential feature of Pakfire, it is necessary to ensure that we can provide packages fast and guarantee their integrity for users all over the world.

For that, we will have a number of mirror servers spread all over the globe. They are hosted by universities, web hosting providers, and organisations of all kinds. Although we vet them and they all have good credentials, there is a chance that their infrastructure might be compromised. With so many mirror servers around the world, how could we possibly keep track of them?

And of course, they are there to fail. The system is designed so that mirrors can be down, can serve corrupted files and so on. Pakfire is there to make sure that it finds the fastest, functioning mirror server.

This needs to happen in two places:

The first is when downloading packages. They are small files and they are downloaded fairly often. IPFire is a modular distribution but most people will have the same set of base packages installed. If there is an update, Pakfire will try to download a package from the first mirror server on the mirror list that was provided. If the mirror server responds with a 404, or if there is any other problem, Pakfire will move on to the next mirror server. Simples.

Servers might not have the right files if they are out of sync. To avoid trying an out-of-sync mirror too often, Pakfire will keep track of how many downloads have failed, and if there have been too many, it will disable the mirror. A mirror might also be disabled immediately if there has been an unrecoverable error, for example an expired TLS certificate or the mirror not responding at all.
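The failover and failure counting described above can be sketched roughly as follows. This is only an illustration and not Pakfire's actual code: the `Mirror` class, the two exception types, the `fetch` callback and the threshold of three failures are all assumptions made for the example (the post only says "too many").

```python
from dataclasses import dataclass

# Assumed threshold; the real value used by Pakfire is not stated in the post.
MAX_FAILURES = 3


class UnrecoverableError(Exception):
    """Hypothetical: e.g. an expired TLS certificate or no response at all."""


class TransientError(Exception):
    """Hypothetical: e.g. a 404 because the mirror is out of sync."""


@dataclass
class Mirror:
    url: str
    failures: int = 0
    enabled: bool = True


def download(path, mirrors, fetch):
    """Walk the mirror list from top to bottom and return the first payload.

    `fetch(url, path)` is a caller-supplied function that returns the file
    content or raises one of the two exceptions above.
    """
    for mirror in mirrors:
        if not mirror.enabled:
            continue
        try:
            return fetch(mirror.url, path)
        except UnrecoverableError:
            # Disable the mirror immediately
            mirror.enabled = False
        except TransientError:
            # Count the failure and disable the mirror after too many
            mirror.failures += 1
            if mirror.failures >= MAX_FAILURES:
                mirror.enabled = False
    raise RuntimeError(f"no mirror could provide {path}")
```

Because the state lives on the `Mirror` objects, a mirror that keeps failing is skipped on later downloads without any extra bookkeeping in the download loop.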

During the download, the checksum of the downloaded file will be compared against the expected one. If it does not match, we know that the mirror is either out of sync and serving an old file, or that the file has been corrupted by a filesystem problem or broken hard drive, or has been replaced by some adversary. In that case, we will throw away the package and download it again from another mirror until we have found the right file. This feature allows us to not trust any of the providers, and it also guarantees that nobody else, such as a web proxy, has tampered with the file.
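As a minimal sketch of that check, assuming a SHA-256 digest is known for each package (the post does not name the algorithm) and a hypothetical `fetch(url, path)` function that returns the file content as bytes: For simplicity the sketch hashes the payload after it has been received rather than while it is streaming.

```python
import hashlib


def verify(payload: bytes, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Return True if the payload matches the expected hex digest."""
    return hashlib.new(algorithm, payload).hexdigest() == expected_hex


def download_verified(path, expected_hex, mirror_urls, fetch):
    """Try each mirror in order until one delivers a file with the right digest."""
    for url in mirror_urls:
        try:
            payload = fetch(url, path)
        except OSError:
            # Network problem or missing file: move on to the next mirror
            continue
        if verify(payload, expected_hex):
            return payload
        # Wrong digest: the payload is discarded and the next mirror is tried
    raise RuntimeError(f"no mirror delivered a valid copy of {path}")
```

Since the expected digest comes from trusted metadata and not from the mirror, an outdated, corrupted or replaced file is treated the same way: it is dropped and the next mirror is asked.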

## How do packages get onto a mirror server?

In the Pakfire Build Service, each repository has a flag that can be enabled to sync them to our master mirror. Usually, we only do this for anything that will be downloaded by a lot of people, like stable releases.

Testing repositories change too often and will be downloaded by only a few people, so the sync traffic is not worth it. Since mirrors are very likely to be out of sync with the fast development pace, chances are high that Pakfire will come back to the master mirror anyway.

To allow downloads when the master mirror is down, we might want to add a few selected mirrors, but currently this is not important enough to be implemented.

The build service is running on a different host to the master mirror, and so repositories will be generated on a different machine and regularly synced to the master mirror where all other mirror servers are pulling from.

## What about that second place you mentioned?

We will also have to steer people to the right mirror server for them when they are downloading an image - like an ISO file. This currently happens from the build service only, but I can see how this will also handle actual downloads from the main website.

For this, there is a special handler implemented in the build service that has a lot of features. Mainly it will redirect a downloader to a certain mirror based on where they are coming from - by their IP address that is. But this is a rather complicated algorithm. We will select all mirror servers and then order them by a priority. This priority is very different for each client, because it tries to estimate how “close” you are to the mirror.

Proximity on the Internet is hard to determine. The borders of a country don’t matter, so we don’t start here. The algorithm starts with checking if the client is in the same Autonomous System. If so, it will be most preferred, because you will be downloading either from the same provider or building. Nothing should be as fast as this. Then we consider the country code of the client and the mirror. If they match, the mirror will be preferred next. Lastly, we will check if a mirror is on the same continent.

Starting from the closest mirror, the build service will check if the file that we are looking for is available on the mirror and redirect the client. This all happens within milliseconds using IPFire Location, and we will cache whether a mirror had a file available or not.
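The ordering can be sketched as a simple ranking function. This is an illustration only and not the code from the build service: the `Endpoint` class, the `has_file` callback and the dictionary cache are hypothetical, and in the real system the AS number, country and continent would come from an IPFire Location lookup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    """A client or a mirror, described by where it sits on the Internet."""
    name: str
    asn: Optional[int]
    country: Optional[str]
    continent: Optional[str]


def proximity(client: Endpoint, mirror: Endpoint) -> int:
    """Lower is closer: same AS, then same country, then same continent."""
    if client.asn is not None and client.asn == mirror.asn:
        return 0
    if client.country is not None and client.country == mirror.country:
        return 1
    if client.continent is not None and client.continent == mirror.continent:
        return 2
    return 3


def sort_mirrors(client, mirrors):
    """Order mirrors closest first; sorted() is stable, so ties keep their order."""
    return sorted(mirrors, key=lambda mirror: proximity(client, mirror))


def pick_mirror(client, mirrors, path, has_file, cache):
    """Return the closest mirror that has the file, caching each availability check."""
    for mirror in sort_mirrors(client, mirrors):
        key = (mirror.name, path)
        if key not in cache:
            cache[key] = has_file(mirror, path)
        if cache[key]:
            return mirror
    return None
```

The cache here never expires, which a real deployment would not want; it only shows that a repeated request for the same file does not have to ask the mirror again.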

The same algorithm is used when Pakfire clients download a mirror list from the build service. This happens at least once every 24 hours, and mirrors will be sorted closest first. That allows the download process described above to be dumb and simply walk through the list from top to bottom. It would be too complicated to measure mirror distance in Pakfire itself.

Ah, and last, the download handler mentioned above can also answer HEAD requests to give download managers some extra meta information, so that images are not downloaded again if the client thinks it might already have the right file.

You can see the code here - of course it uses all other features we have like rate limiting, etc:

https://git.ipfire.org/?p=pbs.git;a=blob;f=src/web/mirrors.py;h=b68fb6284ae88e6f47bb6bc481f77ad0cf88b28e;hb=HEAD#l129
https://git.ipfire.org/?p=pbs.git;a=blob;f=src/buildservice/mirrors.py;h=5456535c91051bb8c97e953ad11e3d21c101738a;hb=HEAD#l317

I believe that all this will set us up very nicely to ensure that people can download IPFire fast and are guaranteed to get the right software that has not been injected with malware or just accidentally corrupted.
