From: Michael Tremer
To: Adolf Belka
Cc: "IPFire: Development-List"
Subject: Re: Let's launch our own blocklists...
Date: Tue, 6 Jan 2026 10:20:30 +0000
Message-Id: <92BFE2B7-549F-41EC-ADC9-D2D7A29BEC82@ipfire.org>
In-Reply-To: <6AD00CB0-4937-4F7F-B67B-E88D870B4942@ipfire.org>
References: <7EF00B55-81C0-493F-A70F-B1DDD45363E2@ipfire.org> <9ac9c734-51fb-4152-bc0b-d2442d03d42a@ipfire.org> <5936cb35-c243-4b0f-843f-e6354226f9be@ipfire.org> <0bc86e25-903a-42a5-a338-72defd31c606@ipfire.org> <6AD00CB0-4937-4F7F-B67B-E88D870B4942@ipfire.org>

Good Morning Adolf,

I had a look at this problem yesterday and it seems that parsing the format is becoming a little bit difficult this way. Since this is only affecting very few domains, I have simply whitelisted them all manually and duckduckgo.com and others should now be easily reachable again.

Please let me know if you have any more findings.

All the best,
-Michael

> On 5 Jan 2026, at 11:48, Michael Tremer wrote:
>
> Hello Adolf,
>
> This is a good find.
>
> But if duckduckgo.com is blocked, we will have to have a source somewhere that blocks that domain. Not only a sub-domain of it. Otherwise we have a bug somewhere.
>
> This is most likely because the domain is listed here, but with some stuff afterwards:
>
> https://raw.githubusercontent.com/mtxadmin/ublock/refs/heads/master/hosts/_malware_typo
>
> We strip everything after a # away because we consider it a comment. However, that leaves a line with only the domain on it, which causes the domain to be listed.
>
> The # sign is used as some special character but at the same time it is being used for comments.
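As an illustration of the parsing problem described above, here is a minimal Python sketch. The sample rule and the helper names (strip_comment_naive, strip_comment_strict, is_domain) are assumptions made up for this example; the actual entries in _malware_typo and the code in the dnsbl tool may look different.

```python
import re

# Rough check for a plain domain name; an assumption for this sketch only.
DOMAIN_RE = re.compile(r"^(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$")


def is_domain(value: str) -> bool:
    """Return True if value looks like a bare domain name."""
    return DOMAIN_RE.match(value.lower()) is not None


def strip_comment_naive(line: str) -> str:
    """Cut the line at the first '#', wherever it appears."""
    return line.split("#", 1)[0].strip()


def strip_comment_strict(line: str) -> str:
    """Cut at a '#' only if it starts the line or follows whitespace."""
    for pos, char in enumerate(line):
        if char == "#" and (pos == 0 or line[pos - 1].isspace()):
            return line[:pos].strip()
    return line.strip()


# Hypothetical filter rule in which '#' is part of the syntax, not a comment
sample = "duckduckgo.com##.some-element"

naive = strip_comment_naive(sample)
strict = strip_comment_strict(sample)

# The naive parser is left with a bare domain and would list it
print(naive, is_domain(naive))
# The strict parser keeps the whole rule, which is not a valid domain
print(strict, is_domain(strict))
```

With the strict variant the whole line survives and fails the domain check, so it can be skipped instead of ending up on the list as duckduckgo.com.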
>
> I will fix this and then refresh the list.
>
> -Michael
>
>> On 5 Jan 2026, at 11:31, Adolf Belka wrote:
>>
>> Hi Michael,
>>
>>
>> On 05/01/2026 12:11, Adolf Belka wrote:
>>> Hi Michael,
>>>
>>> I have found that the malware list includes duckduckgo.com
>>>
>> I have checked through the various sources used for the malware list.
>>
>> The ShadowWhisperer (Tracking) list has improving.duckduckgo.com in its list. I suspect that this one is the one causing the problem.
>>
>> The mtxadmin (_malware_typo) list has duckduckgo.com mentioned 3 times but not directly as a domain name - looks more like a reference.
>>
>> Regards,
>>
>> Adolf.
>>
>>
>>> Regards,
>>> Adolf.
>>>
>>>
>>> On 02/01/2026 14:02, Adolf Belka wrote:
>>>> Hi,
>>>>
>>>> On 02/01/2026 12:09, Michael Tremer wrote:
>>>>> Hello,
>>>>>
>>>>>> On 30 Dec 2025, at 14:05, Adolf Belka wrote:
>>>>>>
>>>>>> Hi Michael,
>>>>>>
>>>>>> On 29/12/2025 13:05, Michael Tremer wrote:
>>>>>>> Hello everyone,
>>>>>>>
>>>>>>> I hope everyone had a great Christmas and a couple of quiet days to relax from all the stress that was the year 2025.
>>>>>> Still relaxing.
>>>>>
>>>>> Very good, so let’s have a strong start into 2026 now!
>>>>
>>>> Starting next week, yes.
>>>>
>>>>>
>>>>>>> Having a couple of quieter days, I have been working on a new, little (hopefully) side project that has probably been high up on our radar since the Shalla list has shut down in 2020, or maybe even earlier. The goal of the project is to provide good lists with categories of domain names which are usually used to block access to these domains.
>>>>>>>
>>>>>>> I simply call this IPFire DNSBL which is short for IPFire DNS Blocklists.
>>>>>>>
>>>>>>> How did we get here?
>>>>>>>
>>>>>>> As stated before, the URL filter feature in IPFire has the problem that there are not many good blocklists available any more.
>>>>>>> There used to be a couple more - most famously the Shalla list - but we are now down to a single list from the University of Toulouse. It is a great list, but it is not always the best fit for all users.
>>>>>>>
>>>>>>> Then there has been talk about whether we could implement more blocking features into IPFire that don’t involve the proxy. Most famously blocking over DNS. The problem here remains that the blocking feature is only as good as the data that is fed into it. Some people have been putting forward a number of lists that were suitable for them, but they would not have replaced the blocking functionality as we know it. Their aim is to provide “one list for everything” but that is not what people usually want. It is targeted at a classic home user and the only separation that is being made is any adult/porn/NSFW content which usually is put into a separate list.
>>>>>>>
>>>>>>> It would have been technically possible to include these lists and let the users decide, but that is not the aim of IPFire. We want to do the job for the user so that their job is getting easier. Including obscure lists that don’t have a clear outline of what they actually want to block (“bad content” is not a category) and passing the burden of figuring out whether they need the “Light”, “Normal”, “Pro”, “Pro++”, “Ultimate” or even a “Venti” list with cream on top is really not going to work. It is all confusing and will lead to a bad user experience.
>>>>>>>
>>>>>>> An even bigger problem that is however completely impossible to solve is bad licensing of these lists. A user has asked the publisher of the HaGeZi list whether they could be included in IPFire and under what terms.
>>>>>>> The response was that the list is available under the terms of the GNU General Public License v3, but that does not seem to be true. The list contains data from various sources. Many of them are licensed under incompatible licenses (CC BY-SA 4.0, MPL, Apache2, …) and unless there is a non-public agreement that this data may be redistributed, there is a huge legal issue here. We would expose our users to potential copyright infringement which we cannot do under any circumstances. Furthermore many lists are available under a non-commercial license which excludes them from being used in any kind of business. Plenty of IPFire systems are running in businesses, if not even the vast majority.
>>>>>>>
>>>>>>> In short, these lists are completely unusable for us. Apart from HaGeZi, I consider OISD to have the same problem.
>>>>>>>
>>>>>>> Enough about all the things that are bad. Let’s talk about the new, good things:
>>>>>>>
>>>>>>> Many blacklists on the internet are an amalgamation of other lists. These lists vary in quality with some of them being not that good and without a clear focus and others being excellent data. Since we don’t have the man power to start from scratch, I felt that we can copy the concept that HaGeZi and OISD have started and simply create a new list that is based on other lists at the beginning to have a good starting point. That way, we have much better control over what is going onto these lists and we can shape and mould them as we need them. Most importantly, we don’t create a single list, but many lists that have a clear focus and allow users to choose what they want to block and what not.
>>>>>>>
>>>>>>> So the current experimental stage that I am in has these lists:
>>>>>>>
>>>>>>> * Ads
>>>>>>> * Dating
>>>>>>> * DoH
>>>>>>> * Gambling
>>>>>>> * Malware
>>>>>>> * Porn
>>>>>>> * Social
>>>>>>> * Violence
>>>>>>>
>>>>>>> The categories have been determined by what source lists we have available with good data and are compatible with our chosen license CC BY-SA 4.0. This is the same license that we are using for the IPFire Location database, too.
>>>>>>>
>>>>>>> The main use-cases for any kind of blocking are to comply with legal requirements in networks with children (e.g. schools) to remove any kind of pornographic content, sometimes block social media as well. Gambling and violence are commonly blocked, too. Even more common would be filtering advertising and any malicious content.
>>>>>>>
>>>>>>> The latter is especially difficult because so many source lists throw phishing, spyware, malvertising, tracking and other things into the same bucket. Here this is currently all in the malware list which has therefore become quite large. I am not sure whether this will stay like this in the future or if we will have to make some adjustments, but that is exactly why this is now entering some larger testing.
>>>>>>>
>>>>>>> What has been built so far? In order to put these lists together properly and track where any data is coming from, I have built a tool in Python available here:
>>>>>>>
>>>>>>> https://git.ipfire.org/?p=dnsbl.git;a=summary
>>>>>>>
>>>>>>> This tool will automatically update all lists once an hour if there have been any changes and export them in various formats. The exported lists are available for download here:
>>>>>>>
>>>>>>> https://dnsbl.ipfire.org/lists/
>>>>>> The download using dnsbl.ipfire.org/lists/squidguard.tar.gz as the custom url works fine.
>>>>>>
>>>>>> However you need to remember not to put the https:// at the front of the url otherwise the WUI page completes without any error messages but leaves an error message in the system logs saying
>>>>>>
>>>>>> URL filter blacklist - ERROR: Not a valid URL filter blacklist
>>>>>>
>>>>>> I found this out the hard way.
>>>>>
>>>>> Oh yes, I forgot that there is a field on the web UI. If that does not accept https:// as a prefix, please file a bug and we will fix it.
>>>>
>>>> I will confirm it and raise a bug.
>>>>
>>>>>
>>>>>> The other thing I noticed is that if you already have the Toulouse University list downloaded and you then change to the ipfire custom url then all the existing Toulouse blocklists stay in the directory on IPFire and so you end up with a huge number of category tick boxes, most of which are the old Toulouse ones, which are still available to select and it is not clear which ones are from Toulouse and which ones from IPFire.
>>>>>
>>>>> Yes, I got the same thing, too. I think this is a bug, too, because otherwise you would have a lot of unused categories lying around that will never be updated. You cannot even tell which ones are from the current list and which ones from the old list.
>>>>>
>>>>> Long-term we could even consider removing the Univ. Toulouse list entirely and only have our own lists available which would make the problem go away.
>>>>>
>>>>>> I think if the blocklist URL source is changed or a custom url is provided the first step should be to remove the old ones already existing.
>>>>>> That might be a problem because users can also create their own blocklists and I believe those go into the same directory.
>>>>>
>>>>> Good thought. We of course cannot delete the custom lists.
>>>>>
>>>>>> Without clearing out the old blocklists you end up with a huge number of checkboxes for lists but it is not clear what happens if there is a category that has the same name for the Toulouse list and the IPFire list such as gambling. I will have a look at that and see what happens.
>>>>>>
>>>>>> Not sure what the best approach to this is.
>>>>>
>>>>> I believe it is removing all old content.
>>>>>
>>>>>> Manually deleting all contents of the urlfilter/blacklists/ directory and then selecting the IPFire blocklist url for the custom url I end up with only the 8 categories from the IPFire list.
>>>>>>
>>>>>> I have tested some gambling sites from the IPFire list and the block worked on some. On others the site no longer exists so there is nothing to block or has been changed to an https site and in that case it went straight through. Also if I chose the http version of the link, it was automatically changed to https and went through without being blocked.
>>>>>
>>>>> The entire IPFire infrastructure always requires HTTPS. If you start using HTTP, you will be automatically redirected. It is 2026 and we don’t need to talk HTTP any more :)
>>>>
>>>> Some of the domains in the gambling list (maybe quite a lot) seem to only have an http access. If I tried https it came back with the fact that it couldn't find it.
>>>>
>>>>>
>>>>> I am glad to hear that the list is actually blocking. It would have been bad if it didn’t. Now we have the big task to check out the “quality” - however that can be determined. I think this is what needs some time…
>>>>>
>>>>> In the meantime I have set up a small page on our website:
>>>>>
>>>>> https://www.ipfire.org/dnsbl
>>>>>
>>>>> I would like to run this as a first-class project inside IPFire like we are doing with IPFire Location. That means that we need to tell people about what we are doing.
>>>>> Hopefully this page is a little start.
>>>>>
>>>>> Initially it has a couple of high-level bullet points about what we are trying to achieve. I don’t think the text is very good, yet, but it is the best I had in that moment. There is then also a list of the lists that we currently offer. For each list, a detailed page will tell you about the license, how many domains are listed, when the last update has been, the sources and there is even a history page that shows all the changes whenever they have happened.
>>>>>
>>>>> Finally there is a section that explains “How To Use?” the list which I would love to extend to include AdGuard Plus and things like that as well as Pi-Hole and whatever else could use the list. In a later step we should go ahead and talk to any projects to include our list(s) into their dropdown so that people can enable them nice and easy.
>>>>>
>>>>> Behind the web page there is an API service that is running on the host that is running the DNSBL. The frontend web app that is running www.ipfire.org is connecting to that API service to fetch the current lists, any details and so on. That way, we can split the logic and avoid creating a huge monolith of a web app. This also means that the page could be down a little as I am still working on the entire thing and will frequently restart it.
>>>>>
>>>>> The API documentation is available here and the API is publicly available: https://api.dnsbl.ipfire.org/docs
>>>>>
>>>>> The website/API allows filing reports for anything that does not seem to be right on any of the lists. I would like to keep it as an open process, however, long-term, this cannot cost us any time. In the current stage, the reports are getting filed and that is about it. I still need to build out some way for admins or moderators (I am not sure what kind of roles I want to have here) to accept or reject those reports.
>>>>>
>>>>> In case of us receiving a domain from a source list, I would rather like to submit a report to upstream for them to de-list. That way, we don’t have any admin to do and we are contributing back to other lists. That would be a very good thing to do. We cannot however throw tons of emails at some random upstream projects without co-ordinating this first. By not reporting upstream, we will probably over time create large whitelists and I am not sure if that is a good thing to do.
>>>>>
>>>>> Finally, there is a search box that can be used to find out if a domain is listed on any of the lists.
>>>>>
>>>>>>> If you download and open any of the files, you will see a large header that includes copyright information and lists all sources that have been used to create the individual lists. This way we ensure maximum transparency, comply with the terms of the individual licenses of the source lists and give credit to the people who help us to put together the most perfect list for our users.
>>>>>>>
>>>>>>> I would like this to become a project that is not only being used in IPFire. We can and will be compatible with other solutions like AdGuard, PiHole so that people can use our lists if they would like to even though they are not using IPFire. Hopefully, these users will also feed back to us so that we can improve our lists over time and make them one of the best options out there.
>>>>>>>
>>>>>>> All lists are available as a simple text file that lists the domains. Then there is a hosts file available as well as a DNS zone file and an RPZ file. Each list is individually available to be used in squidGuard and there is a larger tarball available with all lists that can be used in IPFire’s URL Filter. I am planning to add Suricata/Snort signatures whenever I have time to do so.
>>>>>>> Even though it is not a good idea to filter pornographic content this way, I suppose that catching malware and blocking DoH are good use-cases for an IPS. Time will tell…
>>>>>>>
>>>>>>> As a start, we will make these lists available in IPFire’s URL Filter and collect some feedback about how we are doing. Afterwards, we can see where else we can take this project.
>>>>>>>
>>>>>>> If you want to enable this on your system, simply add the URL to your autoupdate.urls file like here:
>>>>>>>
>>>>>>> https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=commitdiff;h=bf675bb937faa7617474b3cc84435af3b1f7f45f
>>>>>> I also tested out adding the IPFire url to autoupdate.urls and that also worked fine for me.
>>>>>
>>>>> Very good. Should we include this already with Core Update 200? I don’t think we would break anything, but we might already gain a couple more people who are helping us to test this all?
>>>>
>>>> I think that would be a good idea.
>>>>
>>>>>
>>>>> The next step would be to build and test our DNS infrastructure. In the “How To Use?” Section on the pages of the individual lists, you can already see some instructions on how to use the lists as an RPZ. In comparison to other “providers”, I would prefer if people would be using DNS to fetch the lists. This is simply to push out updates in a cheap way for us and also do it very regularly.
>>>>>
>>>>> Initially, clients will pull the entire list using AXFR. There is no way around this as they need to have the data in the first place. After that, clients will only need the changes. As you can see in the history, the lists don’t actually change that often. Sometimes only once a day and therefore downloading the entire list again would be a huge waste of data, both on the client side, but also for us hosting them.
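The transfer pattern described above (one full AXFR, then only the changes) can be modelled with a small toy example. This is a simplified sketch in plain Python and not real DNS code: ZoneServer, ZoneClient and Delta are hypothetical names, serial numbers simply count up, and the real protocols (AXFR in RFC 5936, IXFR in RFC 1995) have more details such as falling back to a full transfer.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Delta:
    """Changes that lead from the previous serial to this one."""
    serial: int
    added: set[str] = field(default_factory=set)
    removed: set[str] = field(default_factory=set)


class ZoneServer:
    """Toy stand-in for the primary; not a real DNS server."""

    def __init__(self, domains: set[str]):
        self.serial = 1
        self.domains = set(domains)
        self.history: list[Delta] = []

    def update(self, added=(), removed=()):
        """Apply a change and remember it so it can be sent incrementally."""
        self.serial += 1
        self.domains |= set(added)
        self.domains -= set(removed)
        self.history.append(Delta(self.serial, set(added), set(removed)))

    def axfr(self):
        """Full transfer: every entry plus the current serial."""
        return self.serial, set(self.domains)

    def ixfr(self, client_serial: int):
        """Incremental transfer: only deltas newer than the client's serial."""
        return [d for d in self.history if d.serial > client_serial]


class ZoneClient:
    def __init__(self):
        self.serial = 0  # 0 means "no data yet" in this sketch
        self.domains: set[str] = set()

    def refresh(self, server: ZoneServer) -> int:
        """Sync with the server and return how many entries were transferred."""
        if self.serial == 0:
            self.serial, self.domains = server.axfr()
            return len(self.domains)
        transferred = 0
        for delta in server.ixfr(self.serial):
            self.domains |= delta.added
            self.domains -= delta.removed
            self.serial = delta.serial
            transferred += len(delta.added) + len(delta.removed)
        return transferred


server = ZoneServer({f"bad{i}.example" for i in range(1000)})
client = ZoneClient()

first = client.refresh(server)   # initial full transfer
server.update(added={"new.example"}, removed={"bad0.example"})
second = client.refresh(server)  # only the two changed entries

print(first, second)
```

The first refresh moves the whole list, the second one only the two changed entries, which is the saving that makes frequent checks cheap.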
>>>>>
>>>>> Some other providers update their lists “every 10 minutes”, and there won't be any changes whatsoever. We don’t do that. We will only export the lists again when they have actually changed. The timestamps on the files that we offer using HTTPS can be checked by clients so that they won’t re-download the list again if it has not been changed. But using HTTPS still means that clients would have to re-download the entire list and not only the changes.
>>>>>
>>>>> Using DNS and IXFR will update the lists by only transferring a few kilobytes and therefore we can have clients check once an hour if a list has actually changed and only send out the raw changes. That way, we will be able to serve millions of clients at very cheap cost and they will always have a very up to date list.
>>>>>
>>>>> As far as I can see any DNS software that supports RPZs supports AXFR/IXFR with exception of Knot Resolver which expects the zone to be downloaded externally. There is a ticket for AXFR/IXFR support (https://gitlab.nic.cz/knot/knot-resolver/-/issues/195).
>>>>>
>>>>> Initially, some of the lists have been *huge* which is why a simple HTTP download is not feasible. The porn list was over 100 MiB. We could have spent thousands on just traffic alone which I don’t have for this kind of project. It would also be unnecessary money being spent. There are simply better solutions out there. But then I built something that basically tests the data that we are receiving from upstream by simply checking if a listed domain still exists. The result was very astonishing to me.
>>>>>
>>>>> So whenever someone adds a domain to the list, we will (eventually, but not immediately) check if we can resolve the domain’s SOA record. If not, we mark the domain as non-active and will no longer include it in the exported data.
>>>>> This brought down the porn list from just under 5 million domains to just 421k. On the sources page (https://www.ipfire.org/dnsbl/lists/porn/sources) I am listing the percentage of dead domains from each of them and the UT1 list has 94% dead domains. Wow.
>>>>>
>>>>> If we cannot resolve the domain, neither can our users. So we would otherwise fill the lists with tons of domains that simply could never be reached. And if they cannot be reached, why would we block them? We would waste bandwidth and a lot of memory on each single client.
>>>>>
>>>>> The other sources have similarly high ratios of dead domains. Most of them are in the 50-80% range. Therefore I am happy that we are doing some extra work here to give our users much better data for their filtering.
>>>>
>>>> Removing all dead entries sounds like an excellent step.
>>>>
>>>> Regards,
>>>>
>>>> Adolf.
>>>>
>>>>>
>>>>> So, if you like, please go and check out the RPZ blocking with Unbound. Instructions are on the page. I would be happy to hear how this is turning out.
>>>>>
>>>>> Please let me know if there are any more questions, and I would be glad to answer them.
>>>>>
>>>>> Happy New Year,
>>>>> -Michael
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Adolf.
>>>>>>> This email is just a brain dump from me to this list. I would be happy to answer any questions about implementation details, etc. if people are interested. Right now, this email is long enough already…
>>>>>>>
>>>>>>> All the best,
>>>>>>> -Michael
>>>>>>
>>>>>> --
>>>>>> Sent from my laptop
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>> --
>> Sent from my laptop
>>
>>
>