Message-ID: <5936cb35-c243-4b0f-843f-e6354226f9be@ipfire.org>
Date: Mon, 5 Jan 2026 12:11:40 +0100
Subject: Re: Let's launch our own blocklists...
From: Adolf Belka
To: Michael Tremer
Cc: "IPFire: Development-List"
References: <7EF00B55-81C0-493F-A70F-B1DDD45363E2@ipfire.org> <9ac9c734-51fb-4152-bc0b-d2442d03d42a@ipfire.org>
In-Reply-To: <9ac9c734-51fb-4152-bc0b-d2442d03d42a@ipfire.org>

Hi Michael,

I have found that the malware list includes duckduckgo.com

Regards,
Adolf.

On 02/01/2026 14:02, Adolf Belka wrote:
> Hi,
>
> On 02/01/2026 12:09, Michael Tremer wrote:
>> Hello,
>>
>>> On 30 Dec 2025, at 14:05, Adolf Belka wrote:
>>>
>>> Hi Michael,
>>>
>>> On 29/12/2025 13:05, Michael Tremer wrote:
>>>> Hello everyone,
>>>>
>>>> I hope everyone had a great Christmas and a couple of quiet days to
>>>> relax from all the stress that was the year 2025.
>>> Still relaxing.
>>
>> Very good, so let’s have a strong start into 2026 now!
>
> Starting next week, yes.
>
>>
>>>> Having a couple of quieter days, I have been working on a new,
>>>> little (hopefully) side project that has probably been high up on
>>>> our radar since the Shalla list has shut down in 2020, or maybe
>>>> even earlier. The goal of the project is to provide good lists with
>>>> categories of domain names which are usually used to block access
>>>> to these domains.
>>>>
>>>> I simply call this IPFire DNSBL which is short for IPFire DNS
>>>> Blocklists.
>>>>
>>>> How did we get here?
>>>>
>>>> As stated before, the URL filter feature in IPFire has the problem
>>>> that there are not many good blocklists available any more. There
>>>> used to be a couple more - most famously the Shalla list - but we
>>>> are now down to a single list from the University of Toulouse. It
>>>> is a great list, but it is not always the best fit for all users.
>>>>
>>>> Then there has been talk about whether we could implement more
>>>> blocking features into IPFire that don’t involve the proxy. Most
>>>> famously blocking over DNS. The problem here remains that the blocking
>>>> feature is only as good as the data that is fed into it. Some
>>>> people have been putting forward a number of lists that were
>>>> suitable for them, but they would not have replaced the blocking
>>>> functionality as we know it. Their aim is to provide “one list for
>>>> everything” but that is not what people usually want. It is
>>>> targeted at a classic home user and the only separation that is
>>>> being made is any adult/porn/NSFW content which usually is put into
>>>> a separate list.
>>>>
>>>> It would have been technically possible to include these lists and
>>>> let the users decide, but that is not the aim of IPFire. We want to
>>>> do the job for the user so that their job is getting easier.
>>>> Including obscure lists that don’t have a clear outline of what
>>>> they actually want to block (“bad content” is not a category) and
>>>> passing the burden of figuring out whether they need the “Light”,
>>>> “Normal”, “Pro”, “Pro++”, “Ultimate” or even a “Venti” list with
>>>> cream on top is really not going to work. It is all confusing and
>>>> will lead to a bad user experience.
>>>>
>>>> An even bigger problem that is however completely impossible to
>>>> solve is bad licensing of these lists. A user has asked the
>>>> publisher of the HaGeZi list whether they could be included in
>>>> IPFire and under what terms.
>>>> The response was that the list is
>>>> available under the terms of the GNU General Public License v3, but
>>>> that does not seem to be true. The list contains data from various
>>>> sources. Many of them are licensed under incompatible licenses (CC
>>>> BY-SA 4.0, MPL, Apache2, …) and unless there is a non-public
>>>> agreement that this data may be redistributed, there is a huge
>>>> legal issue here. We would expose our users to potential copyright
>>>> infringement which we cannot do under any circumstances.
>>>> Furthermore many lists are available under a non-commercial license
>>>> which excludes them from being used in any kind of business. Plenty
>>>> of IPFire systems are running in businesses, if not even the vast
>>>> majority.
>>>>
>>>> In short, these lists are completely unusable for us. Apart from
>>>> HaGeZi, I consider OISD to have the same problem.
>>>>
>>>> Enough about all the things that are bad. Let’s talk about the new,
>>>> good things:
>>>>
>>>> Many blacklists on the internet are an amalgamation of other lists.
>>>> These lists vary in quality with some of them being not that good
>>>> and without a clear focus and others being excellent data. Since we
>>>> don’t have the manpower to start from scratch, I felt that we can
>>>> copy the concept that HaGeZi and OISD have started and simply
>>>> create a new list that is based on other lists at the beginning to
>>>> have a good starting point. That way, we have much better control
>>>> over what is going onto these lists and we can shape and mould them
>>>> as we need them. Most importantly, we don’t create a single list,
>>>> but many lists that have a clear focus and allow users to choose
>>>> what they want to block and what not.
>>>>
>>>> So the current experimental stage that I am in has these lists:
>>>>
>>>>    * Ads
>>>>    * Dating
>>>>    * DoH
>>>>    * Gambling
>>>>    * Malware
>>>>    * Porn
>>>>    * Social
>>>>    * Violence
>>>>
>>>> The categories have been determined by what source lists we have
>>>> available with good data and are compatible with our chosen license
>>>> CC BY-SA 4.0. This is the same license that we are using for the
>>>> IPFire Location database, too.
>>>>
>>>> The main use-cases for any kind of blocking are to comply with
>>>> legal requirements in networks with children (i.e. schools) to
>>>> remove any kind of pornographic content, sometimes block social
>>>> media as well. Gambling and violence are commonly blocked, too.
>>>> Even more common would be filtering advertising and any malicious
>>>> content.
>>>>
>>>> The latter is especially difficult because so many source lists
>>>> throw phishing, spyware, malvertising, tracking and other things
>>>> into the same bucket. Here this is currently all in the malware
>>>> list which has therefore become quite large. I am not sure whether
>>>> this will stay like this in the future or if we will have to make
>>>> some adjustments, but that is exactly why this is now entering some
>>>> larger testing.
>>>>
>>>> What has been built so far? In order to put these lists together
>>>> properly and track any data about where it is coming from, I have
>>>> built a tool in Python available here:
>>>>
>>>>    https://git.ipfire.org/?p=dnsbl.git;a=summary
>>>>
>>>> This tool will automatically update all lists once an hour if there
>>>> have been any changes and export them in various formats. The
>>>> exported lists are available for download here:
>>>>
>>>>    https://dnsbl.ipfire.org/lists/
>>> The download using dnsbl.ipfire.org/lists/squidguard.tar.gz as the
>>> custom url works fine.
>>>
>>> However you need to remember not to put the https:// at the front of
>>> the url, otherwise the WUI page completes without any error messages
>>> but leaves an error message in the system logs saying
>>>
>>> URL filter blacklist - ERROR: Not a valid URL filter blacklist
>>>
>>> I found this out the hard way.
>>
>> Oh yes, I forgot that there is a field on the web UI. If that does
>> not accept https:// as a prefix, please file a bug and we will fix it.
>
> I will confirm it and raise a bug.
>
>>
>>> The other thing I noticed is that if you already have the Toulouse
>>> University list downloaded and you then change to the IPFire custom
>>> url then all the existing Toulouse blocklists stay in the directory
>>> on IPFire and so you end up with a huge number of category tick
>>> boxes, most of which are the old Toulouse ones, which are still
>>> available to select and it is not clear which ones are from Toulouse
>>> and which ones from IPFire.
>>
>> Yes, I got the same thing, too. I think this is a bug, too, because
>> otherwise you would have a lot of unused categories lying around that
>> will never be updated. You cannot even tell which ones are from the
>> current list and which ones from the old list.
>>
>> Long-term we could even consider removing the Univ. Toulouse list
>> entirely and only have our own lists available which would make the
>> problem go away.
>>
>>> I think if the blocklist URL source is changed or a custom url is
>>> provided the first step should be to remove the old ones already
>>> existing.
>>> That might be a problem because users can also create their own
>>> blocklists and I believe those go into the same directory.
>>
>> Good thought. We of course cannot delete the custom lists.
>>
>>> Without clearing out the old blocklists you end up with a huge
>>> number of checkboxes for lists but it is not clear what happens if
>>> there is a category that has the same name for the Toulouse list and
>>> the IPFire list such as gambling.
>>> I will have a look at that and see
>>> what happens.
>>>
>>> Not sure what the best approach to this is.
>>
>> I believe it is removing all old content.
>>
>>> After manually deleting all contents of the urlfilter/blacklists/
>>> directory and then selecting the IPFire blocklist url for the custom
>>> url, I end up with only the 8 categories from the IPFire list.
>>>
>>> I have tested some gambling sites from the IPFire list and the block
>>> worked on some. On others the site no longer exists so there is
>>> nothing to block, or it has been changed to an https site and in that
>>> case it went straight through. Also if I chose the http version of
>>> the link, it was automatically changed to https and went through
>>> without being blocked.
>>
>> The entire IPFire infrastructure always requires HTTPS. If you start
>> using HTTP, you will be automatically redirected. It is 2026 and we
>> don’t need to talk HTTP any more :)
>
> Some of the domains in the gambling list (maybe quite a lot) seem to
> only have http access. If I tried https it came back with the fact
> that it couldn't find it.
>
>>
>> I am glad to hear that the list is actually blocking. It would have
>> been bad if it didn’t. Now we have the big task to check out the
>> “quality” - however that can be determined. I think this is what
>> needs some time…
>>
>> In the meantime I have set up a small page on our website:
>>
>>    https://www.ipfire.org/dnsbl
>>
>> I would like to run this as a first-class project inside IPFire like
>> we are doing with IPFire Location. That means that we need to tell
>> people about what we are doing. Hopefully this page is a little start.
>>
>> Initially it has a couple of high-level bullet points about what we
>> are trying to achieve. I don’t think the text is very good, yet, but
>> it is the best I had in that moment. There is then also a list of the
>> lists that we currently offer.
>> For each list, a detailed page will
>> tell you about the license, how many domains are listed, when the
>> last update was and the sources, and there is even a history page
>> that shows all the changes whenever they have happened.
>>
>> Finally there is a section that explains “How To Use?” the list which
>> I would love to extend to include AdGuard Plus and things like that
>> as well as Pi-Hole and whatever else could use the list. In a later
>> step we should go ahead and talk to any projects to include our
>> list(s) into their dropdown so that people can enable them nice and
>> easy.
>>
>> Behind the web page there is an API service that is running on the
>> host that is running the DNSBL. The frontend web app that is running
>> www.ipfire.org is connecting to that API
>> service to fetch the current lists, any details and so on. That way,
>> we can split the logic and avoid creating a huge monolith of a web
>> app. This also means that the page could be down a little as I am still
>> working on the entire thing and will frequently restart it.
>>
>> The API documentation is available here and the API is publicly
>> available: https://api.dnsbl.ipfire.org/docs
>>
>> The website/API allows filing reports for anything that does not
>> seem to be right on any of the lists. I would like to keep it as an
>> open process, however, long-term, this cannot cost us any time. In
>> the current stage, the reports are getting filed and that is about
>> it. I still need to build out some way for admins or moderators (I am
>> not sure what kind of roles I want to have here) to accept or reject
>> those reports.
>>
>> In case of us receiving a domain from a source list, I would rather
>> like to submit a report to upstream for them to de-list. That way, we
>> don’t have any admin to do and we are contributing back to other
>> lists. That would be a very good thing to do.
>> We cannot however throw
>> tons of emails at some random upstream projects without co-ordinating
>> this first. By not reporting upstream, we will probably over time
>> create large whitelists and I am not sure if that is a good thing to do.
>>
>> Finally, there is a search box that can be used to find out if a
>> domain is listed on any of the lists.
>>
>>>> If you download and open any of the files, you will see a large
>>>> header that includes copyright information and lists all sources
>>>> that have been used to create the individual lists. This way we
>>>> ensure maximum transparency, comply with the terms of the
>>>> individual licenses of the source lists and give credit to the
>>>> people who help us to put together the most perfect list for our
>>>> users.
>>>>
>>>> I would like this to become a project that is not only being used
>>>> in IPFire. We can and will be compatible with other solutions like
>>>> AdGuard, PiHole so that people can use our lists if they would like
>>>> to even though they are not using IPFire. Hopefully, these users
>>>> will also feed back to us so that we can improve our lists over
>>>> time and make them one of the best options out there.
>>>>
>>>> All lists are available as a simple text file that lists the
>>>> domains. Then there is a hosts file available as well as a DNS zone
>>>> file and an RPZ file. Each list is individually available to be
>>>> used in squidGuard and there is a larger tarball available with all
>>>> lists that can be used in IPFire’s URL Filter. I am planning to add
>>>> Suricata/Snort signatures whenever I have time to do so. Even
>>>> though it is not a good idea to filter pornographic content this
>>>> way, I suppose that catching malware and blocking DoH are good
>>>> use-cases for an IPS. Time will tell…
>>>>
>>>> As a start, we will make these lists available in IPFire’s URL
>>>> Filter and collect some feedback about how we are doing.
>>>> Afterwards, we can see where else we can take this project.
>>>>
>>>> If you want to enable this on your system, simply add the URL to
>>>> your autoupdate.urls file like here:
>>>>
>>>> https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=commitdiff;h=bf675bb937faa7617474b3cc84435af3b1f7f45f
>>> I also tested out adding the IPFire url to autoupdate.urls and that
>>> also worked fine for me.
>>
>> Very good. Should we include this already with Core Update 200? I
>> don’t think we would break anything, but we might already gain a
>> couple more people who are helping us to test this all?
>
> I think that would be a good idea.
>
>>
>> The next step would be to build and test our DNS infrastructure. In
>> the “How To Use?” section on the pages of the individual lists, you
>> can already see some instructions on how to use the lists as an RPZ.
>> In comparison to other “providers”, I would prefer if people would be
>> using DNS to fetch the lists. This is simply to push out updates in a
>> cheap way for us and also do it very regularly.
>>
>> Initially, clients will pull the entire list using AXFR. There is no
>> way around this as they need to have the data in the first place.
>> After that, clients will only need the changes. As you can see in the
>> history, the lists don’t actually change that often. Sometimes only
>> once a day and therefore downloading the entire list again would be a
>> huge waste of data, both on the client side and for us hosting
>> them.
>>
>> Some other providers update their lists “every 10 minutes”, and there
>> won't be any changes whatsoever. We don’t do that. We will only
>> export the lists again when they have actually changed. The
>> timestamps on the files that we offer using HTTPS can be checked by
>> clients so that they won’t re-download the list again if it has not
>> been changed. But using HTTPS still means that they would have to
>> re-download the entire list and not only the changes.
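
The timestamp check described above maps onto an HTTP conditional
request. A minimal Python sketch follows. It assumes the server sends a
Last-Modified header and honours If-Modified-Since, which the thread
does not confirm; the function names are made up for this illustration
and are not part of the dnsbl tool.

```python
from datetime import datetime, timezone
from email.utils import formatdate, parsedate_to_datetime
from typing import Optional
from urllib.request import Request

# URL taken from the thread; everything else here is illustrative.
LIST_URL = "https://dnsbl.ipfire.org/lists/squidguard.tar.gz"


def build_conditional_request(url: str, local_mtime: Optional[float]) -> Request:
    """Build a GET request that asks for the file only if it is newer."""
    headers = {}
    if local_mtime is not None:
        # HTTP dates are always expressed in GMT
        headers["If-Modified-Since"] = formatdate(local_mtime, usegmt=True)
    return Request(url, headers=headers)


def needs_download(last_modified: Optional[str], local_mtime: Optional[float]) -> bool:
    """Compare a Last-Modified header value with the local file's mtime."""
    # Without a local copy or a usable header, fetch to be safe
    if local_mtime is None or not last_modified:
        return True
    try:
        remote = parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        # Unparseable header: treat the local copy as stale
        return True
    if remote.tzinfo is None:
        remote = remote.replace(tzinfo=timezone.utc)
    local = datetime.fromtimestamp(local_mtime, tz=timezone.utc)
    return remote > local
```

A server that supports this answers 304 Not Modified without a body
when nothing has changed, so an unchanged list costs one small round
trip. A changed list is still sent in full, which is the limitation
mentioned above.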
>>
>> Using DNS and IXFR will update the lists by only transferring a few
>> kilobytes and therefore we can have clients check once an hour if a
>> list has actually changed and only send out the raw changes. That
>> way, we will be able to serve millions of clients at very cheap cost
>> and they will always have a very up-to-date list.
>>
>> As far as I can see any DNS software that supports RPZs supports
>> AXFR/IXFR with the exception of Knot Resolver which expects the zone to
>> be downloaded externally. There is a ticket for AXFR/IXFR support
>> (https://gitlab.nic.cz/knot/knot-resolver/-/issues/195).
>>
>> Initially, some of the lists have been *huge* which is why a simple
>> HTTP download is not feasible. The porn list was over 100 MiB. We
>> could have spent thousands on just traffic alone which I don’t have
>> for this kind of project. It would also be unnecessary money being
>> spent. There are simply better solutions out there. But then I built
>> something that basically tests the data that we are receiving from
>> upstream by simply checking if a listed domain still exists. The
>> result was very astonishing to me.
>>
>> So whenever someone adds a domain to the list, we will (eventually,
>> but not immediately) check if we can resolve the domain’s SOA record.
>> If not, we mark the domain as non-active and will no longer include
>> it in the exported data. This brought down the porn list from just
>> under 5 million domains to just 421k. On the sources page
>> (https://www.ipfire.org/dnsbl/lists/porn/sources) I am listing the
>> percentage of dead domains from each of them and the UT1 list has 94%
>> dead domains. Wow.
>>
>> If we cannot resolve the domain, neither can our users. So we would
>> otherwise fill the lists with tons of domains that simply could never
>> be reached. And if they cannot be reached, why would we block them?
>> We would waste bandwidth and a lot of memory on each single client.
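
The filtering step described above can be sketched in Python as
follows. This is only an illustration, not the code of the dnsbl tool:
split_active, dead_ratio and the has_soa callable are hypothetical
names, and the actual SOA lookup is left to the caller (for example via
a DNS library) so that the sketch runs without network access.

```python
from typing import Callable, Iterable, List, Tuple


def split_active(
    domains: Iterable[str],
    has_soa: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Split domains into (active, dead) using a caller-supplied SOA check."""
    active: List[str] = []
    dead: List[str] = []
    for domain in domains:
        # Normalise: trim whitespace, lower-case, drop the trailing root dot
        name = domain.strip().lower().rstrip(".")
        if not name:
            continue
        if has_soa(name):
            active.append(name)
        else:
            dead.append(name)
    return active, dead


def dead_ratio(active_count: int, dead_count: int) -> float:
    """Fraction of all checked domains that turned out to be dead."""
    total = active_count + dead_count
    return dead_count / total if total else 0.0


if __name__ == "__main__":
    # Stand-in for a real resolver: pretend only these names still resolve
    live = {"example.org", "example.net"}
    active, dead = split_active(
        ["Example.org.", "gone.invalid", "example.net"],
        lambda name: name in live,
    )
    print(active, dead, dead_ratio(len(active), len(dead)))
```

With the figures quoted above (421k active domains out of just under
5 million), dead_ratio comes out a little over 0.9 for the porn list.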
>>
>> The other sources have similarly high ratios of dead domains. Most
>> of them are in the 50-80% range. Therefore I am happy that we are
>> doing some extra work here to give our users much better data for
>> their filtering.
>
> Removing all dead entries sounds like an excellent step.
>
> Regards,
>
> Adolf.
>
>>
>> So, if you like, please go and check out the RPZ blocking with
>> Unbound. Instructions are on the page. I would be happy to hear how
>> this is turning out.
>>
>> Please let me know if there are any more questions, and I would be
>> glad to answer them.
>>
>> Happy New Year,
>> -Michael
>>
>>>
>>> Regards,
>>> Adolf.
>>>> This email is just a brain dump from me to this list. I would be
>>>> happy to answer any questions about implementation details, etc. if
>>>> people are interested. Right now, this email is long enough already…
>>>>
>>>> All the best,
>>>> -Michael
>>>
>>> --
>>> Sent from my laptop
>>
>>
>>
>
--
Sent from my laptop