Hello,
Hi,
[...]
A client could just move on to the next mirror if the existence of a newer version is already known (mirrors can be out of sync, too). If not, I am not sure what the best practice is - DNS lookups might come in handy...
Generally, I think Pakfire should *NOT* rely on DNS. DNS is blocked in a few networks of government agencies where we have IPFire installations and as a result they don't install any updates on anything.
For the record: We talked about this yesterday and decided to drop the DNS idea since it creates more problems than it solves. So I agree with you here.
If a system is only behind an upstream HTTP(S) proxy, that should be enough to download updates in the optimal way.
Yes. I think access to HTTPS services (either directly or via a proxy) can be safely added to IPFire's system requirements.
[...]
Should we publish the current update state (called "Core Update" in 2.x, not sure if it exists in 3.x) via DNS, too? That way, we could avoid pings to the mirrors, so installations only need to connect in case an update has been announced.
They would only download the metadata from the main service, and there would be no need to re-download the database, which is large. We have to assume that people have a slow connection and that bandwidth is expensive.
I did not get this. Which database are you talking about here?
The package database.
Since DNS does not seem to be a good idea, my point here became obsolete.
My idea was to publish a DNS TXT record (similar to ClamAV) containing the current Core Update version. Since DNSSEC is obligatory in IPFire, this information is secured. Clients could look up that record at a certain interval (twice a day?), and in case anything has changed, they would try to reach a mirror in order to download the update.
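Just to illustrate what I had in mind (the record name and the code below are made up, and we dropped this idea anyway); a client would compare the published version against its installed one, say twice a day, and only contact a mirror when something newer is announced:

# Minimal sketch of the (dropped) DNS TXT approach, using the dnspython
# module. The record name is a placeholder, not an existing record.
import dns.resolver

def current_core_update(record="core-update.version.ipfire.org"):
    # DNSSEC validation is expected to happen in the local resolver;
    # this code only reads the value.
    answer = dns.resolver.resolve(record, "TXT")
    # A TXT record like "150" would announce Core Update 150.
    return int(b"".join(answer[0].strings).decode())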
It is not guaranteed that DNSSEC is always on. I am also not trusting DNSSEC to be around forever. People feel that DNS-over-TLS seems to be enough. Different debate.
They do that with the repository metadata just like you described: a small file that is checked very often, and the big database is only downloaded when it has changed.
See above.
This assumes that we will still have Core Updates in 3.x, and I remember you saying no. Second, for databases (libloc, ...), clients need to connect to mirrors sooner or later, so maybe the DNS approach does not work well here.
For libloc we can do this in the same way. But a file on a server with a hash and signature should do the job just as well as DNS. It is easier to implement, and HTTPS connectivity is required anyway.
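Roughly this pattern (the URL, file names and on-disk layout below are only placeholders for illustration, and the signature check is left out):

# Sketch of the "small metadata file, big database" pattern.
import hashlib
import urllib.request

BASE = "https://example.ipfire.org/libloc"   # placeholder, not a real URL

def database_needs_update(cached_hash):
    # The small file is fetched often; it contains the hash of the database
    # (and, in the real implementation, a signature that must be verified).
    with urllib.request.urlopen(f"{BASE}/database.db.sha256") as f:
        published_hash = f.read().decode().strip()
    return published_hash != cached_hash, published_hash

def fetch_database(path="database.db"):
    # The large database is only downloaded when the hash has changed.
    with urllib.request.urlopen(f"{BASE}/database.db") as f, open(path, "wb") as out:
        out.write(f.read())
    return hashlib.sha256(open(path, "rb").read()).hexdigest()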
Yes.
The question here is whether we want a central redirect service like the download links, or whether we want to distribute a list of mirror servers.
My opinion is to use a distributed list of mirror servers. To avoid unnecessary and expensive DNS and libloc updates on clients, we can just include the libloc information in the distributed list, since it won't matter who does the lookup.
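To make that concrete, an entry in the distributed list could carry the country code directly, so the client never has to do its own libloc lookup for the mirrors. The field names below are made up for illustration and not an existing Pakfire format:

# Example of what a parsed mirror list entry could look like.
mirrors = [
    {"hostname": "mirror1.example.org", "path": "/ipfire/", "country": "DE"},
    {"hostname": "mirror2.example.org", "path": "/ipfire/", "country": "EC"},
]
# The whole list would be signed as one document, so every client receives
# the same data and only the selection happens locally.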
[...]
A decentralised system is better, but I do not see how we can achieve this. A distributed list could of course not be signed.
By "distributed list" you mean the mirror list? Why can't it be signed?
If multiple parties agree on *the* mirror list, then there cannot be a key, because it would have to be shared with everyone. I am talking about a distributed group of people making the list, not a list that is generated and then distributed.
I consider this a different topic and would like to discuss it once we have actually settled on signing the mirror list.
After that, a client can use a cached list and fetch updates from any mirror. In case we have a system at the other end of the world, we also avoid connectivity issues, as we currently observe them in connection with mirrors in Ecuador.
A client can use a cached list now. The list is only refreshed once a day (I think). Updates can then be fetched from any mirror as long as the repository data is recent.
I hate to say it, but this does not sound very good (signatures expire, mirrors go offline, and so on).
The signature would only be verified when the list is received, and those mirrors would be added to an internal list together with any manually configured ones.
Good idea.
Mirrors that are unreachable will of course be skipped. But the client cannot know whether a mirror is gone temporarily or for good.
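A sketch of that client-side flow; verify_signature() and download() are placeholders, not existing Pakfire functions, and the actual check would use whatever the repository key machinery gives us:

# Verify once on receipt, merge with manual mirrors, then simply fall
# through to the next mirror on failure.
def refresh_mirrors(downloaded_list, signature, manual_mirrors, verify_signature):
    if not verify_signature(downloaded_list, signature):
        raise ValueError("Mirror list signature invalid - keeping the cached list")
    # Manually configured mirrors always stay on the list.
    return list(manual_mirrors) + downloaded_list

def fetch(path, mirrors, download):
    for mirror in mirrors:
        try:
            return download(mirror, path)
        except OSError:
            # The client cannot tell whether a mirror is gone temporarily or
            # for good - it just moves on to the next one.
            continue
    raise RuntimeError("No mirror reachable")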
(b) It might be a postmaster's disease, but I was never a fan of moving knowledge from client to server (my favorite example here is MX records, which work much better than implementing fail-over and load balancing on the server side).
An individual list for every client is very hard to debug, since it becomes difficult to reproduce a connectivity scenario if you do not know which servers the client saw. Second, we have a server-side bottleneck here (signing!) and need an always-online key if we decide to sign that list anyway.
We do not really care about any connectivity issues. There might be many reasons for that and I do not want to debug any mirror issues. The client just needs to move on to the next one.
Okay, but then why bother doing all the signing and calculation at one server?
Well, the idea behind this was to let the client decide which mirrors it will use. By delivering the libloc information with the mirror list itself, we solved most of the problems you saw here initially, leaving only the task of determining its public IP to the client, which works well in most cases (direct PPPoE dial-in, etc.); otherwise, we have a fallback.
From my point of view, this approach solves more problems than it causes.
?!
I have not taken a look at the algorithm yet, but the idea is to prioritise mirror servers located near the client, assuming that geographic distance correlates with network distance today (not sure if that is correct anyway, but it is definitely better than in the 90s).
It puts everything in the same country to the top and all the rest to the bottom.
It correlates, but that is it. We should have a list of countries near one another. It would make sense to group them together by continent, etc. But that is for somewhere else.
Yes, but it sounds easy to implement (a rough sketch follows further below):
- Determine my public IP address.
That problem is a lot harder than it sounds. Look at ddns.
But in the end, it works. :-)
Ultimately, there is a central service that responds with the public IP address the request came from. If you want to avoid contacting a central service at all, then this solution doesn't solve that.
Yes. In case a system is unable to determine its public IP address, we need to either randomly select a mirror (and ignore all selection logic for that client) or lose a bit of privacy by connecting to a central server.
I prefer the second option here.
- Determine country for that IP.
- Which countries are near mine?
- Determine preferred mirror servers from these countries.
Am I missing something here?
I guess you are underestimating that this is quite complex to implement, especially in environments where DNS is not available or some other oddities happen. Pakfire needs to work like a clock.
In case we deliver the libloc information with the mirror list, we only have the "determine-my-public-IP"-problem left, which I consider to be solvable.
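Roughly, I imagine the client side like this. The echo service URL, the neighbour table and all function names are assumptions on my part, the country for the public IP would come from a local libloc lookup (left out here), and the mirror list is assumed to already carry a country code per entry as sketched above:

import random
import urllib.request

NEIGHBOURS = {"DE": {"AT", "CH", "NL", "FR"}}   # example data only

def public_ip(echo_url="https://example.ipfire.org/ip"):   # placeholder URL
    # Fallback path: ask a central echo service which IP the request came from.
    with urllib.request.urlopen(echo_url) as f:
        return f.read().decode().strip()

def order_mirrors(mirrors, my_country):
    # Same country first, neighbouring countries second, everything else last;
    # shuffle within each group so the load is spread across mirrors.
    near = NEIGHBOURS.get(my_country, set())
    groups = ([], [], [])
    for m in mirrors:
        if m["country"] == my_country:
            groups[0].append(m)
        elif m["country"] in near:
            groups[1].append(m)
        else:
            groups[2].append(m)
    for g in groups:
        random.shuffle(g)
    return groups[0] + groups[1] + groups[2]

The result (and the public IP/country lookup) can be cached for a few days, so this does not have to run on every Pakfire invocation.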
Things that spring to mind are Cisco appliances that truncate DNS packets when they are longer than 100 bytes or something (and the TXT record will be a lot longer than that). Then there needs to be a fallback mechanism, and I think it would make sense to directly use HTTPS only. If that doesn't work, there won't be any updates anyway.
See above. And about the Cisco devices stripping some DNS traffic... It's a commercial appliance, isn't it? *vomit*
Basically the client has no way to measure "distance" or "speed". And I do not think it is right to implement this in the client. Just a GeoIP lookup requires resolving DNS for all mirrors and then performing the database lookup. That takes a long time and I do not see why this is much better than the server-side approach.
True, we need DNS and GeoIP/libloc database lookups here, but this information can be safely cached for N days. After that, the lookup procedure is repeated.
That can of course be in the downloaded mirror list.
Yep, that solves many problems. Let's do it. :-)
[...] Yes, I do not want to prolong certain aspects of this. We are wasting too much time and not getting anywhere with this and I think it is wiser to spend that time on coding :)
(a) I assume we agree on the privacy and security aspects (HTTPS only and maybe Tor services) in general.
HTTPS is settled. You have been reaching out to the last remaining mirrors that do not support it yet, and I am sure we can convince a few more to enable it.
Tor: I have no technical insight into it, nor do I think that many users will be using it. So please consider contributing the technical implementation of this.
I can do so. Settled.
(b) Signed mirror list: Yes, but using a local mirror must be possible - that simply overrides the list, but that is all right since the user requested it - and it is not a magic bullet.
The question that isn't answered for me here is which key should be used. The repo's key? Guess it would be that one.
I do not know the answer to that. Perhaps you have to give me a crash course in current Pakfire 3.x first (in a different mailing list topic). Or did you mean for 2.x? In that case, it will be the repository key.
(c) Individual mirror lists vs. one-size-fits-all: Both ideas have their pros and cons: If we introduce mirror lists generated for each client individually, we have a bottleneck (signing?) and a SPOF.
SPOF yes, but that is not a problem because clients will continue using an old list.
We will have to check how long signing takes. It cannot take ages. But we will need to do this for each client since we randomize all mirrors. Or we implement the randomization at the client; then we can sign one list per country and cache it, which is quite feasible.
Signing _can_ take ages, especially when we use HSMs here. Mostly, they are not optimised for speed, but for security. Different mirror lists for several countries is what Ubuntu does (http://mirrors.ubuntu.com/), but this causes some other problems (we actually do not want countries, but world "zones", signing time, etc.), so I consider delivering one mirror list in general the best practice here.
What if an IPFire system moves (mobile device, laptop, LTE uplink in a truck, ...)? It would need to connect to a central server - which is actually what we are trying to avoid - and fetch a new list suitable for its current location to benefit from faster mirrors. Does not sound very great to me. :-|
(Playing the devil's advocate here, please do not take this personally.)
Further, some people like me might argue that this leaks IPs, since all clients must connect to a central server. If we distribute a signed mirror list via the mirrors (as we do at the moment), we need to implement an algorithm for selecting servers from that list on the clients. In addition, we bump into the problem that a client needs to know its public IP and that we need to cache the selection results to avoid excessive DNS and GeoIP/libloc queries.
"Leaking" the client's IP address isn't solved when there is a fallback to another central service. That just solves it for a number of clients but not all.
As mentioned above, we have to lose some privacy if we want connectivity; I consider that OK in special cases such as those mentioned above.
Since we need to implement a selection algorithm _somewhere_, I only consider determining public IPs a real problem and would therefore prefer the second scenario.
Unless you really really object, I would like to cut this conversation short and would like to propose that we go with the current implementation. It does not have any severe disadvantages over the other approach which just has other disadvantages. Our users won't care that much about this tiny detail and we could potentially change it later.
Okay, I agree. You won. :-)
Best regards, Peter Müller
If you want to hide your IP address, you can use Tor and then the server-based approach should tick all your boxes.
[...]