public inbox for development@lists.ipfire.org
From: Adolf Belka <adolf.belka@ipfire.org>
To: Michael Tremer <michael.tremer@ipfire.org>
Cc: "IPFire: Development-List" <development@lists.ipfire.org>
Subject: Re: Let's launch our own blocklists...
Date: Mon, 5 Jan 2026 12:31:17 +0100	[thread overview]
Message-ID: <0bc86e25-903a-42a5-a338-72defd31c606@ipfire.org> (raw)
In-Reply-To: <5936cb35-c243-4b0f-843f-e6354226f9be@ipfire.org>

Hi Michael,


On 05/01/2026 12:11, Adolf Belka wrote:
> Hi Michael,
>
> I have found that the malware list includes duckduckgo.com
>
I have checked through the various sources used for the malware list.

The ShadowWhisperer (Tracking) list has improving.duckduckgo.com in it. 
I suspect that this entry is the one causing the problem.

The mtxadmin (_malware_typo) list mentions duckduckgo.com 3 times, but 
not directly as a domain name - it looks more like a reference.

Regards,

Adolf.


> Regards,
> Adolf.
>
>
> On 02/01/2026 14:02, Adolf Belka wrote:
>> Hi,
>>
>> On 02/01/2026 12:09, Michael Tremer wrote:
>>> Hello,
>>>
>>>> On 30 Dec 2025, at 14:05, Adolf Belka <adolf.belka@ipfire.org> wrote:
>>>>
>>>> Hi Michael,
>>>>
>>>> On 29/12/2025 13:05, Michael Tremer wrote:
>>>>> Hello everyone,
>>>>>
>>>>> I hope everyone had a great Christmas and a couple of quiet days 
>>>>> to relax from all the stress that was the year 2025.
>>>> Still relaxing.
>>>
>>> Very good, so let’s have a strong start into 2026 now!
>>
>> Starting next week, yes.
>>
>>>
>>>>> Having a couple of quieter days, I have been working on a new, 
>>>>> (hopefully) little side project that has probably been high up on 
>>>>> our radar since the Shalla list shut down in 2020, or maybe even 
>>>>> earlier. The goal of the project is to provide good lists with 
>>>>> categories of domain names, which are usually used to block 
>>>>> access to those domains.
>>>>>
>>>>> I simply call this IPFire DNSBL which is short for IPFire DNS 
>>>>> Blocklists.
>>>>>
>>>>> How did we get here?
>>>>>
>>>>> As stated before, the URL filter feature in IPFire has the problem 
>>>>> that there are not many good blocklists available any more. There 
>>>>> used to be a couple more - most famously the Shalla list - but we 
>>>>> are now down to a single list from the University of Toulouse. It 
>>>>> is a great list, but it is not always the best fit for all users.
>>>>>
>>>>> Then there has been talk about whether we could implement more 
>>>>> blocking features into IPFire that don’t involve the proxy - most 
>>>>> famously blocking over DNS. The problem here remains that a 
>>>>> blocking feature is only as good as the data that is fed into it. 
>>>>> Some people have put forward a number of lists that were suitable 
>>>>> for them, but they would not have replaced the blocking 
>>>>> functionality as we know it. Their aim is to provide “one list for 
>>>>> everything”, but that is not what people usually want. They are 
>>>>> targeted at the classic home user, and the only separation that is 
>>>>> made is for adult/porn/NSFW content, which is usually put into a 
>>>>> separate list.
>>>>>
>>>>> It would have been technically possible to include these lists and 
>>>>> let the users decide, but that is not the aim of IPFire. We want 
>>>>> to do the job for the user so that their job is getting easier. 
>>>>> Including obscure lists that don’t have a clear outline of what 
>>>>> they actually want to block (“bad content” is not a category) and 
>>>>> passing the burden of figuring out whether they need the “Light”, 
>>>>> “Normal”, “Pro”, “Pro++”, “Ultimate” or even a “Venti” list with 
>>>>> cream on top is really not going to work. It is all confusing and 
>>>>> will lead to a bad user experience.
>>>>>
>>>>> An even bigger problem that is however completely impossible to 
>>>>> solve is bad licensing of these lists. A user has asked the 
>>>>> publisher of the HaGeZi list whether they could be included in 
>>>>> IPFire and under what terms. The response was that the list is 
>>>>> available under the terms of the GNU General Public License v3, 
>>>>> but that does not seem to be true. The list contains data from 
>>>>> various sources. Many of them are licensed under incompatible 
>>>>> licenses (CC BY-SA 4.0, MPL, Apache2, …) and unless there is a 
>>>>> non-public agreement that this data may be redistributed, there is 
>>>>> a huge legal issue here. We would expose our users to potential 
>>>>> copyright infringement which we cannot do under any circumstances. 
>>>>> Furthermore, many lists are available under a non-commercial 
>>>>> license, which excludes them from being used in any kind of 
>>>>> business. Plenty of IPFire systems - if not the vast majority - 
>>>>> are running in businesses.
>>>>>
>>>>> In short, these lists are completely unusable for us. Apart from 
>>>>> HaGeZi, I consider OISD to have the same problem.
>>>>>
>>>>> Enough about all the things that are bad. Let’s talk about the 
>>>>> new, good things:
>>>>>
>>>>> Many blacklists on the internet are an amalgamation of other 
>>>>> lists. These lists vary in quality: some of them are not that 
>>>>> good and lack a clear focus, while others are excellent data. 
>>>>> Since we don’t have the manpower to start from scratch, I felt 
>>>>> that we could copy the concept that HaGeZi and OISD have started 
>>>>> and simply create a new list that is based on other lists at the 
>>>>> beginning to have a good starting point. That way, we have much 
>>>>> better control over what goes onto these lists and we can shape 
>>>>> and mould them as we need them. Most importantly, we don’t create 
>>>>> a single list, but many lists that each have a clear focus and 
>>>>> allow users to choose what they want to block and what not.
>>>>>
>>>>> So the current experimental stage that I am in has these lists:
>>>>>
>>>>>    * Ads
>>>>>    * Dating
>>>>>    * DoH
>>>>>    * Gambling
>>>>>    * Malware
>>>>>    * Porn
>>>>>    * Social
>>>>>    * Violence
>>>>>
>>>>> The categories have been determined by which source lists with 
>>>>> good data we have available that are compatible with our chosen 
>>>>> license, CC BY-SA 4.0. This is the same license that we use for 
>>>>> the IPFire Location database.
>>>>>
>>>>> The main use-cases for any kind of blocking are to comply with 
>>>>> legal requirements in networks with children (e.g. schools): 
>>>>> removing any kind of pornographic content and sometimes blocking 
>>>>> social media as well. Gambling and violence are commonly blocked, 
>>>>> too. Even more common is filtering advertising and any malicious 
>>>>> content.
>>>>>
>>>>> The latter is especially difficult because so many source lists 
>>>>> throw phishing, spyware, malvertising, tracking and other things 
>>>>> into the same bucket. Here this is currently all in the malware 
>>>>> list which has therefore become quite large. I am not sure whether 
>>>>> this will stay like this in the future or if we will have to make 
>>>>> some adjustments, but that is exactly why this is now entering 
>>>>> some larger testing.
>>>>>
>>>>> What has been built so far? In order to put these lists together 
>>>>> properly and track where the data is coming from, I have built a 
>>>>> tool in Python, available here:
>>>>>
>>>>>    https://git.ipfire.org/?p=dnsbl.git;a=summary
>>>>>
>>>>> This tool will automatically update all lists once an hour if 
>>>>> there have been any changes and export them in various formats. 
>>>>> The exported lists are available for download here:
>>>>>
>>>>>    https://dnsbl.ipfire.org/lists/
>>>> The download using dnsbl.ipfire.org/lists/squidguard.tar.gz as the 
>>>> custom url works fine.
>>>>
>>>> However, you need to remember not to put https:// at the front of 
>>>> the url; otherwise the WUI page completes without any error 
>>>> message but leaves an error in the system logs saying
>>>>
>>>> URL filter blacklist - ERROR: Not a valid URL filter blacklist
>>>>
>>>> I found this out the hard way.
>>>
>>> Oh yes, I forgot that there is a field on the web UI. If that does 
>>> not accept https:// as a prefix, please file a bug and we will fix it.
>>
>> I will confirm it and raise a bug.
>>
>>>
>>>> The other thing I noticed is that if you already have the Toulouse 
>>>> University list downloaded and you then change to the IPFire 
>>>> custom url, all the existing Toulouse blocklists stay in the 
>>>> directory on IPFire. You then end up with a huge number of 
>>>> category tick boxes, most of which are the old Toulouse ones, 
>>>> which are still available to select, and it is not clear which 
>>>> ones are from Toulouse and which ones from IPFire.
>>>
>>> Yes, I got the same thing, too. I think this is a bug, too, because 
>>> otherwise you would have a lot of unused categories lying around 
>>> that will never be updated. You cannot even tell which ones are from 
>>> the current list and which ones from the old list.
>>>
>>> Long-term we could even consider removing the Univ. Toulouse list 
>>> entirely and only have our own lists available, which would make 
>>> the problem go away.
>>>
>>>> I think if the blocklist URL source is changed or a custom url is 
>>>> provided the first step should be to remove the old ones already 
>>>> existing.
>>>> That might be a problem because users can also create their own 
>>>> blocklists and I believe those go into the same directory.
>>>
>>> Good thought. We of course cannot delete the custom lists.
>>>
>>>> Without clearing out the old blocklists you end up with a huge 
>>>> number of checkboxes, and it is not clear what happens if a 
>>>> category has the same name in both the Toulouse list and the 
>>>> IPFire list, such as gambling. I will have a look at that and see 
>>>> what happens.
>>>>
>>>> Not sure what the best approach to this is.
>>>
>>> I believe it is removing all old content.
>>>
>>>> Manually deleting all contents of the urlfilter/blacklists/ 
>>>> directory and then selecting the IPFire blocklist url for the 
>>>> custom url I end up with only the 8 categories from the IPFire list.
>>>>
>>>> I have tested some gambling sites from the IPFire list and the 
>>>> block worked on some. On others, the site no longer exists, so 
>>>> there is nothing to block, or it has been changed to an https 
>>>> site, in which case it went straight through. Also, if I chose the 
>>>> http version of the link, it was automatically changed to https 
>>>> and went through without being blocked.
>>>
>>> The entire IPFire infrastructure always requires HTTPS. If you start 
>>> using HTTP, you will be automatically redirected. It is 2026 and we 
>>> don’t need to talk HTTP any more :)
>>
>> Some of the domains in the gambling list (maybe quite a lot) seem 
>> to only have http access. If I tried https, it came back saying it 
>> couldn't find the site.
>>
>>>
>>> I am glad to hear that the list is actually blocking. It would have 
>>> been bad if it didn’t. Now we have the big task to check out the 
>>> “quality” - however that can be determined. I think this is what 
>>> needs some time…
>>>
>>> In the meantime I have set up a small page on our website:
>>>
>>>    https://www.ipfire.org/dnsbl
>>>
>>> I would like to run this as a first-class project inside IPFire like 
>>> we are doing with IPFire Location. That means that we need to tell 
>>> people about what we are doing. Hopefully this page is a little start.
>>>
>>> Initially it has a couple of high-level bullet points about what we 
>>> are trying to achieve. I don’t think the text is very good, yet, but 
>>> it is the best I had in that moment. There is then also a list of 
>>> the lists that we currently offer. For each list, a detailed page 
>>> will tell you about the license, how many domains are listed, when 
>>> the last update was and the sources, and there is even a history 
>>> page that shows all the changes whenever they happened.
>>>
>>> Finally there is a section that explains “How To Use?” the list 
>>> which I would love to extend to include AdGuard Plus and things like 
>>> that as well as Pi-Hole and whatever else could use the list. In a 
>>> later step, we should go ahead and talk to these projects about 
>>> including our list(s) in their dropdowns so that people can enable 
>>> them easily.
>>>
>>> Behind the web page there is an API service that runs on the host 
>>> that runs the DNSBL. The frontend web app that runs www.ipfire.org 
>>> connects to that API service to fetch the current lists, any 
>>> details and so on. That way, we can split the logic and avoid 
>>> creating a huge monolith of a web app. This also means that the 
>>> page could be down a little as I am still working on the entire 
>>> thing and will frequently restart it.
>>>
>>> The API documentation is available here and the API is publicly 
>>> available: https://api.dnsbl.ipfire.org/docs
>>>
>>> The website/API allows anyone to file reports for anything that 
>>> does not seem right on any of the lists. I would like to keep this 
>>> an open process; however, long-term, it cannot cost us any time. At 
>>> the current stage, the reports are getting filed and that is about 
>>> it. I still need to build some way for admins or moderators (I am 
>>> not sure what kind of roles I want to have here) to accept or 
>>> reject those reports.
>>>
>>> If a reported domain came to us from a source list, I would rather 
>>> submit a report upstream for them to de-list it. That way, we don’t 
>>> have any admin work to do and we are contributing back to the other 
>>> lists, which would be a very good thing. We cannot, however, throw 
>>> tons of emails at some random upstream projects without 
>>> coordinating this first. By not reporting upstream, we will 
>>> probably over time build up large whitelists, and I am not sure if 
>>> that is a good thing.
>>>
>>> Finally, there is a search box that can be used to find out if a 
>>> domain is listed on any of the lists.
>>>
>>>>> If you download and open any of the files, you will see a large 
>>>>> header that includes copyright information and lists all sources 
>>>>> that have been used to create the individual lists. This way we 
>>>>> ensure maximum transparency, comply with the terms of the 
>>>>> individual licenses of the source lists and give credit to the 
>>>>> people who help us to put together the most perfect list for our 
>>>>> users.
>>>>>
>>>>> I would like this to become a project that is not only being used 
>>>>> in IPFire. We can and will be compatible with other solutions like 
>>>>> AdGuard, PiHole so that people can use our lists if they would 
>>>>> like to even though they are not using IPFire. Hopefully, these 
>>>>> users will also feed back to us so that we can improve our lists 
>>>>> over time and make them one of the best options out there.
>>>>>
>>>>> All lists are available as a simple text file that lists the 
>>>>> domains. Then there is a hosts file available as well as a DNS 
>>>>> zone file and an RPZ file. Each list is individually available to 
>>>>> be used in squidGuard and there is a larger tarball available with 
>>>>> all lists that can be used in IPFire’s URL Filter. I am planning 
>>>>> to add Suricata/Snort signatures whenever I have time to do so. 
>>>>> Even though it is not a good idea to filter pornographic content 
>>>>> this way, I suppose that catching malware and blocking DoH are 
>>>>> good use-cases for an IPS. Time will tell…
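
For anyone who has not looked at the exports yet, the same blocked 
domain would appear roughly like this in each of the formats above 
(example.com is a placeholder; the exact rendering of the real exports 
may differ):

```
# Plain text list: one domain per line
example.com

# Hosts file: map the domain to the unspecified address
0.0.0.0 example.com

# RPZ zone data: force NXDOMAIN via a CNAME to the root
example.com     CNAME .
*.example.com   CNAME .
```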
>>>>>
>>>>> As a start, we will make these lists available in IPFire’s URL 
>>>>> Filter and collect some feedback about how we are doing. 
>>>>> Afterwards, we can see where else we can take this project.
>>>>>
>>>>> If you want to enable this on your system, simply add the URL to 
>>>>> your autoupdate.urls file like here:
>>>>>
>>>>> https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=commitdiff;h=bf675bb937faa7617474b3cc84435af3b1f7f45f 
>>>>>
>>>> I also tested out adding the IPFire url to autoupdate.urls and that 
>>>> also worked fine for me.
>>>
>>> Very good. Should we include this already with Core Update 200? I 
>>> don’t think we would break anything, but we might already gain a 
>>> couple more people who are helping us to test this all?
>>
>> I think that would be a good idea.
>>
>>>
>>> The next step would be to build and test our DNS infrastructure. In 
>>> the “How To Use?” section on the pages of the individual lists, you 
>>> can already see some instructions on how to use the lists as an 
>>> RPZ. In comparison to other “providers”, I would prefer that people 
>>> use DNS to fetch the lists. This is simply a cheap way for us to 
>>> push out updates, and to do so very regularly.
>>>
>>> Initially, clients will pull the entire list using AXFR. There is no 
>>> way around this as they need to have the data in the first place. 
>>> After that, clients will only need the changes. As you can see in 
>>> the history, the lists don’t actually change that often. Sometimes 
>>> only once a day, and therefore downloading the entire list again 
>>> would be a huge waste of data, both for the clients and for us 
>>> hosting them.
>>>
>>> Some other providers update their lists “every 10 minutes”, even 
>>> when there have not been any changes whatsoever. We don’t do that. 
>>> We will only export the lists again when they have actually 
>>> changed. Clients can check the timestamps on the files that we 
>>> offer over HTTPS so that they won’t re-download a list that has not 
>>> changed. But using HTTPS still means that we would have to 
>>> re-download the entire list and not only the changes.
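
The timestamp check described above can be done with a plain 
conditional request. A minimal sketch of the comparison logic - the 
function name and the curl example are illustrative, not taken from 
any existing IPFire tooling:

```python
# Sketch of the client-side timestamp check over HTTPS: only re-download
# a list when the server's copy is newer than the local file.

from email.utils import parsedate_to_datetime

def needs_download(last_modified: str, local_mtime: float) -> bool:
    """True if the HTTP Last-Modified header is newer than the local
    file's modification time (a Unix timestamp)."""
    remote = parsedate_to_datetime(last_modified).timestamp()
    return remote > local_mtime

# curl has the same check built in: --time-cond (-z) fetches the file
# only if the remote copy is newer than the named local one:
#   curl -z squidguard.tar.gz -o squidguard.tar.gz \
#        https://dnsbl.ipfire.org/lists/squidguard.tar.gz
```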
>>>
>>> Using DNS and IXFR will update the lists by transferring only a few 
>>> kilobytes, and therefore we can have clients check once an hour 
>>> whether a list has actually changed and send out only the raw 
>>> changes. That way, we will be able to serve millions of clients at 
>>> very low cost and they will always have a very up-to-date list.
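
As a concrete illustration, an RPZ fed over AXFR/IXFR would look 
roughly like this in an Unbound configuration. The zone name and 
server address below are placeholders; the real values are in the 
“How To Use?” section of each list’s page:

```
server:
    # RPZ needs the response-ip module in front of the resolver modules
    module-config: "respip validator iterator"

rpz:
    # Illustrative zone name -- take the real one from the list's page
    name: malware.dnsbl.ipfire.org.
    zonefile: /var/lib/unbound/malware.rpz
    # Transfer the zone via AXFR initially, then IXFR for the deltas
    # (192.0.2.1 is a documentation placeholder address)
    primary: 192.0.2.1
    rpz-action-override: nxdomain
```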
>>>
>>> As far as I can see, any DNS software that supports RPZs supports 
>>> AXFR/IXFR, with the exception of Knot Resolver, which expects the 
>>> zone to be downloaded externally. There is a ticket for AXFR/IXFR 
>>> support (https://gitlab.nic.cz/knot/knot-resolver/-/issues/195).
>>>
>>> Initially, some of the lists have been *huge* which is why a simple 
>>> HTTP download is not feasible. The porn list was over 100 MiB. We 
>>> could have spent thousands on just traffic alone which I don’t have 
>>> for this kind of project. It would also be unnecessary money being 
>>> spent. There are simply better solutions out there. But then I 
>>> built something that tests the data we are receiving from upstream 
>>> by simply checking if a listed domain still exists. The result was 
>>> quite astonishing to me.
>>>
>>> So whenever someone adds a domain to the list, we will (eventually, 
>>> but not immediately) check if we can resolve the domain’s SOA 
>>> record. If not, we mark the domain as non-active and will no longer 
>>> include it in the exported data. This brought the porn list down 
>>> from just under 5 million domains to just 421k. On the sources page 
>>> (https://www.ipfire.org/dnsbl/lists/porn/sources) I am listing the 
>>> percentage of dead domains from each of them and the UT1 list has 
>>> 94% dead domains. Wow.
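
The filtering step can be sketched like this. The real implementation 
lives in dnsbl.git; here the SOA lookup is injected so the logic can 
be shown without network I/O (in production it could wrap e.g. 
dnspython's dns.resolver.resolve(domain, "SOA")), and the function 
names are illustrative, not taken from the actual code:

```python
# Sketch of the dead-domain filter: a domain only stays in the exported
# data while its SOA record still resolves. `resolve_soa` is an
# injected callable returning True when the lookup succeeds.

from typing import Callable, Iterable

def active_domains(domains: Iterable[str],
                   resolve_soa: Callable[[str], bool]) -> list[str]:
    """Keep only the domains whose SOA lookup succeeds; everything
    else is treated as dead and dropped from the export."""
    return [d for d in domains if resolve_soa(d)]
```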
>>>
>>> If we cannot resolve a domain, neither can our users. So we would 
>>> otherwise fill the lists with tons of domains that could simply 
>>> never be reached. And if they cannot be reached, why would we block 
>>> them? We would waste bandwidth and a lot of memory on every single 
>>> client.
>>>
>>> The other sources have similarly high ratios of dead domains. Most 
>>> of them are in the 50-80% range. Therefore I am happy that we are 
>>> doing some extra work here to give our users much better data for 
>>> their filtering.
>>
>> Removing all dead entries sounds like an excellent step.
>>
>> Regards,
>>
>> Adolf.
>>
>>>
>>> So, if you like, please go and check out the RPZ blocking with 
>>> Unbound. Instructions are on the page. I would be happy to hear how 
>>> this is turning out.
>>>
>>> Please let me know if there are any more questions, and I would be 
>>> glad to answer them.
>>>
>>> Happy New Year,
>>> -Michael
>>>
>>>>
>>>> Regards,
>>>> Adolf.
>>>>> This email is just a brain dump from me to this list. I would be 
>>>>> happy to answer any questions about implementation details, etc. 
>>>>> if people are interested. Right now, this email is long enough 
>>>>> already…
>>>>>
>>>>> All the best,
>>>>> -Michael
>>>>
>>>> -- 
>>>> Sent from my laptop
>>>
>>>
>>>
>>
>

-- 
Sent from my laptop





Thread overview: 26+ messages
2025-12-29 12:05 Michael Tremer
2025-12-30 14:05 ` Adolf Belka
2025-12-30 15:49   ` Re[2]: " Jon Murphy
2026-01-02 11:13     ` Michael Tremer
2026-01-02 11:09   ` Michael Tremer
2026-01-02 13:02     ` Adolf Belka
2026-01-05 11:11       ` Adolf Belka
2026-01-05 11:31         ` Adolf Belka [this message]
2026-01-05 11:48           ` Michael Tremer
2026-01-06 10:20             ` Michael Tremer
2026-01-22 11:33               ` Michael Tremer
2026-01-23 15:02                 ` Matthias Fischer
2026-01-23 16:39                   ` Michael Tremer
2026-01-23 18:05                     ` Matthias Fischer
2026-01-24 23:41                     ` Matthias Fischer
2026-01-25 14:40                       ` Michael Tremer
2026-01-25 17:50                         ` Matthias Fischer
2026-01-26 17:18                           ` Michael Tremer
2026-01-28 16:25                             ` Matthias Fischer
2026-01-28 16:33                             ` Matthias Fischer
2026-01-28 16:59                               ` Michael Tremer
2026-01-28 20:25                                 ` Matthias Fischer
2026-01-29 18:20                                   ` Michael Tremer
2026-01-23 19:31                 ` Adam Gibbons
2026-01-25 14:42                   ` Michael Tremer
2025-12-30 15:52 Re[2]: " Jon Murphy
2026-01-02 11:14 ` Michael Tremer
