From: Adolf Belka <adolf.belka@ipfire.org>
To: Michael Tremer <michael.tremer@ipfire.org>
Cc: "IPFire: Development-List" <development@lists.ipfire.org>
Subject: Re: Let's launch our own blocklists...
Date: Mon, 5 Jan 2026 12:11:40 +0100
Message-ID: <5936cb35-c243-4b0f-843f-e6354226f9be@ipfire.org>
In-Reply-To: <9ac9c734-51fb-4152-bc0b-d2442d03d42a@ipfire.org>
Hi Michael,
I have found that the malware list includes duckduckgo.com
Regards,
Adolf.
On 02/01/2026 14:02, Adolf Belka wrote:
> Hi,
>
> On 02/01/2026 12:09, Michael Tremer wrote:
>> Hello,
>>
>>> On 30 Dec 2025, at 14:05, Adolf Belka <adolf.belka@ipfire.org> wrote:
>>>
>>> Hi Michael,
>>>
>>> On 29/12/2025 13:05, Michael Tremer wrote:
>>>> Hello everyone,
>>>>
>>>> I hope everyone had a great Christmas and a couple of quiet days to
>>>> relax from all the stress that was the year 2025.
>>> Still relaxing.
>>
>> Very good, so let’s have a strong start into 2026 now!
>
> Starting next week, yes.
>
>>
>>>> Having a couple of quieter days, I have been working on a new,
>>>> little (hopefully) side project that has probably been high up on
>>>> our radar since the Shalla list shut down in 2020, or maybe
>>>> even earlier. The goal of the project is to provide good lists with
>>>> categories of domain names which are usually used to block access
>>>> to these domains.
>>>>
>>>> I simply call this IPFire DNSBL which is short for IPFire DNS
>>>> Blocklists.
>>>>
>>>> How did we get here?
>>>>
>>>> As stated before, the URL filter feature in IPFire has the problem
>>>> that there are not many good blocklists available any more. There
>>>> used to be a couple more - most famously the Shalla list - but we
>>>> are now down to a single list from the University of Toulouse. It
>>>> is a great list, but it is not always the best fit for all users.
>>>>
>>>> Then there has been talk about whether we could implement more
>>>> blocking features into IPFire that don’t involve the proxy. Most
>>>> famously blocking over DNS. The problem here remains that a blocking
>>>> feature is only as good as the data that is fed into it. Some
>>>> people have been putting forward a number of lists that were
>>>> suitable for them, but they would not have replaced the blocking
>>>> functionality as we know it. Their aim is to provide “one list for
>>>> everything” but that is not what people usually want. It is
>>>> targeted at a classic home user and the only separation that is
>>>> being made is any adult/porn/NSFW content which usually is put into
>>>> a separate list.
>>>>
>>>> It would have been technically possible to include these lists and
>>>> let the users decide, but that is not the aim of IPFire. We want to
>>>> do the job for the user so that their job is getting easier.
>>>> Including obscure lists that don’t have a clear outline of what
>>>> they actually want to block (“bad content” is not a category) and
>>>> passing the burden of figuring out whether they need the “Light”,
>>>> “Normal”, “Pro”, “Pro++”, “Ultimate” or even a “Venti” list with
>>>> cream on top is really not going to work. It is all confusing and
>>>> will lead to a bad user experience.
>>>>
>>>> An even bigger problem that is however completely impossible to
>>>> solve is bad licensing of these lists. A user has asked the
>>>> publisher of the HaGeZi list whether they could be included in
>>>> IPFire and under what terms. The response was that the list is
>>>> available under the terms of the GNU General Public License v3, but
>>>> that does not seem to be true. The list contains data from various
>>>> sources. Many of them are licensed under incompatible licenses (CC
>>>> BY-SA 4.0, MPL, Apache2, …) and unless there is a non-public
>>>> agreement that this data may be redistributed, there is a huge
>>>> legal issue here. We would expose our users to potential copyright
>>>> infringement which we cannot do under any circumstances.
>>>> Furthermore many lists are available under a non-commercial license
>>>> which excludes them from being used in any kind of business. Plenty
>>>> of IPFire systems are running in businesses, if not even the vast
>>>> majority.
>>>>
>>>> In short, these lists are completely unusable for us. Apart from
>>>> HaGeZi, I consider OISD to have the same problem.
>>>>
>>>> Enough about all the things that are bad. Let’s talk about the new,
>>>> good things:
>>>>
>>>> Many blacklists on the internet are an amalgamation of other lists.
>>>> These lists vary in quality with some of them being not that good
>>>> and without a clear focus and others being excellent data. Since we
>>>> don’t have the manpower to start from scratch, I felt that we can
>>>> copy the concept that HaGeZi and OISD have started and simply
>>>> create a new list that is based on other lists at the beginning to
>>>> have a good starting point. That way, we have much better control
>>>> over what is going on these lists and we can shape and mould them
>>>> as we need them. Most importantly, we don’t create a single list,
>>>> but many lists that have a clear focus and allow users to choose
>>>> what they want to block and what not.
>>>>
>>>> So the current experimental stage that I am in has these lists:
>>>>
>>>> * Ads
>>>> * Dating
>>>> * DoH
>>>> * Gambling
>>>> * Malware
>>>> * Porn
>>>> * Social
>>>> * Violence
>>>>
>>>> The categories have been determined by what source lists we have
>>>> available with good data and are compatible with our chosen license
>>>> CC BY-SA 4.0. This is the same license that we are using for the
>>>> IPFire Location database, too.
>>>>
>>>> The main use-cases for any kind of blocking are to comply with
>>>> legal requirements in networks with children (i.e. schools) to
>>>> remove any kind of pornographic content, sometimes block social
>>>> media as well. Gambling and violence are commonly blocked, too.
>>>> Even more common would be filtering advertising and any malicious
>>>> content.
>>>>
>>>> The latter is especially difficult because so many source lists
>>>> throw phishing, spyware, malvertising, tracking and other things
>>>> into the same bucket. Here this is currently all in the malware
>>>> list which has therefore become quite large. I am not sure whether
>>>> this will stay like this in the future or if we will have to make
>>>> some adjustments, but that is exactly why this is now entering some
>>>> larger testing.
>>>>
>>>> What has been built so far? In order to put these lists together
>>>> properly and track where the data is coming from, I have
>>>> built a tool in Python available here:
>>>>
>>>> https://git.ipfire.org/?p=dnsbl.git;a=summary
>>>>
>>>> This tool will automatically update all lists once an hour if there
>>>> have been any changes and export them in various formats. The
>>>> exported lists are available for download here:
>>>>
>>>> https://dnsbl.ipfire.org/lists/
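
A minimal Python sketch for consuming one of these plain-text exports — the '#'-prefixed copyright/source header described later in this mail is skipped, and the exact file name under /lists/ used in the commented-out download is an assumption:

```python
import urllib.request

def parse_domain_list(text):
    """Extract domains from a plain-text blocklist, skipping blank
    lines and the '#'-prefixed copyright/source header."""
    domains = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            domains.append(line)
    return domains

# Hypothetical usage -- the actual file name is an assumption:
# with urllib.request.urlopen("https://dnsbl.ipfire.org/lists/malware") as r:
#     print(len(parse_domain_list(r.read().decode())))
```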
>>> The download using dnsbl.ipfire.org/lists/squidguard.tar.gz as the
>>> custom url works fine.
>>>
>>> However you need to remember not to put the https:// at the front of
>>> the url otherwise the WUI page completes without any error messages
>>> but leaves an error message in the system logs saying
>>>
>>> URL filter blacklist - ERROR: Not a valid URL filter blacklist
>>>
>>> I found this out the hard way.
>>
>> Oh yes, I forgot that there is a field on the web UI. If that does
>> not accept https:// as a prefix, please file a bug and we will fix it.
>
> I will confirm it and raise a bug.
>
>>
>>> The other thing I noticed is that if you already have the Toulouse
>>> University list downloaded and you then change to the ipfire custom
>>> url then all the existing Toulouse blocklists stay in the directory
>>> on IPFire and so you end up with a huge number of category tick
>>> boxes, most of which are the old Toulouse ones, which are still
>>> available to select and it is not clear which ones are from Toulouse
>>> and which ones from IPFire.
>>
>> Yes, I got the same thing, too. I think this is a bug, too, because
>> otherwise you would have a lot of unused categories lying around that
>> will never be updated. You cannot even tell which ones are from the
>> current list and which ones from the old list.
>>
>> Long-term we could even consider removing the Univ. Toulouse list
>> entirely and only have our own lists available which would make the
>> problem go away.
>>
>>> I think if the blocklist URL source is changed or a custom url is
>>> provided the first step should be to remove the old ones already
>>> existing.
>>> That might be a problem because users can also create their own
>>> blocklists and I believe those go into the same directory.
>>
>> Good thought. We of course cannot delete the custom lists.
>>
>>> Without clearing out the old blocklists you end up with a huge
>>> number of checkboxes for lists but it is not clear what happens if
>>> there is a category that has the same name for the Toulouse list and
>>> the IPFire list such as gambling. I will have a look at that and see
>>> what happens.
>>>
>>> Not sure what the best approach to this is.
>>
>> I believe it is removing all old content.
>>
>>> Manually deleting all contents of the urlfilter/blacklists/
>>> directory and then selecting the IPFire blocklist url for the custom
>>> url I end up with only the 8 categories from the IPFire list.
>>>
>>> I have tested some gambling sites from the IPFire list and the block
>>> worked on some. On others the site no longer exists so there is
>>> nothing to block or has been changed to an https site and in that
>>> case it went straight through. Also if I chose the http version of
>>> the link, it was automatically changed to https and went through
>>> without being blocked.
>>
>> The entire IPFire infrastructure always requires HTTPS. If you start
>> using HTTP, you will be automatically redirected. It is 2026 and we
>> don’t need to talk HTTP any more :)
>
> Some of the domains in the gambling list (maybe quite a lot) seem to
> only have an http access. If I tried https it came back with the fact
> that it couldn't find it.
>
>>
>> I am glad to hear that the list is actually blocking. It would have
>> been bad if it didn’t. Now we have the big task to check out the
>> “quality” - however that can be determined. I think this is what
>> needs some time…
>>
>> In the meantime I have set up a small page on our website:
>>
>> https://www.ipfire.org/dnsbl
>>
>> I would like to run this as a first-class project inside IPFire like
>> we are doing with IPFire Location. That means that we need to tell
>> people about what we are doing. Hopefully this page is a little start.
>>
>> Initially it has a couple of high-level bullet points about what we
>> are trying to achieve. I don’t think the text is very good, yet, but
>> it is the best I had in that moment. There is then also a list of the
>> lists that we currently offer. For each list, a detailed page will
>> tell you about the license, how many domains are listed, when the
>> last update was, the sources, and there is even a history page
>> that shows all the changes whenever they have happened.
>>
>> Finally there is a section that explains “How To Use?” the list which
>> I would love to extend to include AdGuard Plus and things like that
>> as well as Pi-Hole and whatever else could use the list. In a later
>> step we should go ahead and talk to any projects to include our
>> list(s) into their dropdown so that people can enable them nice and
>> easy.
>>
>> Behind the web page there is an API service that is running on the
>> host that is running the DNSBL. The frontend web app that is running
>> www.ipfire.org <http://www.ipfire.org/> is connecting to that API
>> service to fetch the current lists, any details and so on. That way,
>> we can split the logic and avoid creating a huge monolith of a web
>> app. This also means that the page could be down a little as I am still
>> working on the entire thing and will frequently restart it.
>>
>> The API documentation is available here and the API is publicly
>> available: https://api.dnsbl.ipfire.org/docs
>>
>> The website/API allows users to file reports for anything that does not
>> seem to be right on any of the lists. I would like to keep it as an
>> open process, however, long-term, this cannot cost us any time. In
>> the current stage, the reports are getting filed and that is about
>> it. I still need to build out some way for admins or moderators (I am
>> not sure what kind of roles I want to have here) to accept or reject
>> those reports.
>>
>> If a reported domain came to us from a source list, I would rather
>> submit a report to upstream for them to de-list it. That way, we
>> don’t have any admin work to do and we are contributing back to other
>> lists. That would be a very good thing to do. We cannot however throw
>> tons of emails at some random upstream projects without co-ordinating
>> this first. By not reporting upstream, we will probably over time
>> create large whitelists and I am not sure if that is a good thing to do.
>>
>> Finally, there is a search box that can be used to find out if a
>> domain is listed on any of the lists.
>>
>>>> If you download and open any of the files, you will see a large
>>>> header that includes copyright information and lists all sources
>>>> that have been used to create the individual lists. This way we
>>>> ensure maximum transparency, comply with the terms of the
>>>> individual licenses of the source lists and give credit to the
>>>> people who help us to put together the most perfect list for our
>>>> users.
>>>>
>>>> I would like this to become a project that is not only being used
>>>> in IPFire. We can and will be compatible with other solutions like
>>>> AdGuard, PiHole so that people can use our lists if they would like
>>>> to even though they are not using IPFire. Hopefully, these users
>>>> will also feed back to us so that we can improve our lists over
>>>> time and make them one of the best options out there.
>>>>
>>>> All lists are available as a simple text file that lists the
>>>> domains. Then there is a hosts file available as well as a DNS zone
>>>> file and an RPZ file. Each list is individually available to be
>>>> used in squidGuard and there is a larger tarball available with all
>>>> lists that can be used in IPFire’s URL Filter. I am planning to add
>>>> Suricata/Snort signatures whenever I have time to do so. Even
>>>> though it is not a good idea to filter pornographic content this
>>>> way, I suppose that catching malware and blocking DoH are good
>>>> use-cases for an IPS. Time will tell…
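
To illustrate the export formats mentioned above, here is a rough Python sketch of how a domain list maps to hosts-file and RPZ form. The 0.0.0.0 sink address and the CNAME-to-root NXDOMAIN convention are common choices, not necessarily what the actual exporter emits:

```python
def to_hosts(domains, sink="0.0.0.0"):
    """Render domains as hosts-file entries.
    The sink address is an assumption, not the confirmed export format."""
    return "\n".join(f"{sink} {d}" for d in domains)

def to_rpz(domains):
    """Render domains as RPZ records. 'CNAME .' is the standard RPZ
    action for returning NXDOMAIN; the wildcard covers subdomains."""
    lines = []
    for d in domains:
        lines.append(f"{d} CNAME .")
        lines.append(f"*.{d} CNAME .")
    return "\n".join(lines)
```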
>>>>
>>>> As a start, we will make these lists available in IPFire’s URL
>>>> Filter and collect some feedback about how we are doing.
>>>> Afterwards, we can see where else we can take this project.
>>>>
>>>> If you want to enable this on your system, simply add the URL to
>>>> your autoupdate.urls file like here:
>>>>
>>>> https://git.ipfire.org/?p=people/ms/ipfire-2.x.git;a=commitdiff;h=bf675bb937faa7617474b3cc84435af3b1f7f45f
>>> I also tested out adding the IPFire url to autoupdate.urls and that
>>> also worked fine for me.
>>
>> Very good. Should we include this already with Core Update 200? I
>> don’t think we would break anything, but we might already gain a
>> couple more people who are helping us to test this all?
>
> I think that would be a good idea.
>
>>
>> The next step would be to build and test our DNS infrastructure. In
>> the “How To Use?” section on the pages of the individual lists, you
>> can already see some instructions on how to use the lists as an RPZ.
>> In comparison to other “providers”, I would prefer if people would be
>> using DNS to fetch the lists. This is simply to push out updates in a
>> cheap way for us and also do it very regularly.
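
For Unbound, a transfer-fed RPZ zone looks roughly like the fragment below. The zone name, file path and primary address are all placeholders — the real values are on each list's “How To Use?” page:

```
# unbound.conf sketch -- every name and address below is a placeholder
rpz:
    name: "malware.dnsbl.ipfire.org."
    zonefile: "/var/lib/unbound/malware.rpz"
    primary: 192.0.2.1           # placeholder; enables AXFR/IXFR refresh
    rpz-action-override: nxdomain
```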
>>
>> Initially, clients will pull the entire list using AXFR. There is no
>> way around this as they need to have the data in the first place.
>> After that, clients will only need the changes. As you can see in the
>> history, the lists don’t actually change that often. Sometimes only
>> once a day and therefore downloading the entire list again would be a
>> huge waste of data, both on the client side, but also for us hosting
>> them.
>>
>> Some other providers update their lists “every 10 minutes”, and there
>> won't be any changes whatsoever. We don’t do that. We will only
>> export the lists again when they have actually changed. The
>> timestamps on the files that we offer using HTTPS can be checked by
>> clients so that they won’t re-download the list again if it has not
>> been changed. But using HTTPS still means that we would have to
>> re-download the entire list and not only the changes.
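
The timestamp check described above amounts to a conditional HTTP request. A Python sketch — the URL is illustrative only:

```python
import urllib.request
import urllib.error

def build_request(url, last_modified=None):
    """Build a request that asks the server to answer 304 Not Modified
    if the list is unchanged since `last_modified` (an HTTP-date string
    taken from a previous Last-Modified response header)."""
    req = urllib.request.Request(url)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    return req

def fetch_if_modified(url, last_modified=None):
    """Return (body, new_last_modified), or (None, last_modified)
    when the server reports the list has not changed."""
    try:
        with urllib.request.urlopen(build_request(url, last_modified)) as resp:
            return resp.read(), resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as e:
        if e.code == 304:
            return None, last_modified
        raise
```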
>>
>> Using DNS and IXFR will update the lists by only transferring a few
>> kilobytes and therefore we can have clients check once an hour if a
>> list has actually changed and only send out the raw changes. That
>> way, we will be able to serve millions of clients at very cheap cost
>> and they will always have a very up to date list.
>>
>> As far as I can see any DNS software that supports RPZs supports
>> AXFR/IXFR with the exception of Knot Resolver which expects the zone to
>> be downloaded externally. There is a ticket for AXFR/IXFR support
>> (https://gitlab.nic.cz/knot/knot-resolver/-/issues/195).
>>
>> Initially, some of the lists have been *huge* which is why a simple
>> HTTP download is not feasible. The porn list was over 100 MiB. We
>> could have spent thousands on just traffic alone which I don’t have
>> for this kind of project. It would also be unnecessary money being
>> spent. There are simply better solutions out there. But then I built
>> something that basically tests the data that we are receiving from
>> upstream by simply checking if a listed domain still exists. The
>> result was very astonishing to me.
>>
>> So whenever someone adds a domain to the list, we will (eventually,
>> but not immediately) check if we can resolve the domain’s SOA record.
>> If not, we mark the domain as non-active and will no longer include
>> it in the exported data. This brought down the porn list from just
>> under 5 million domains to just 421k. On the sources page
>> (https://www.ipfire.org/dnsbl/lists/porn/sources) I am listing the
>> percentage of dead domains from each of them and the UT1 list has 94%
>> dead domains. Wow.
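
The liveness check described above can be sketched as follows. The resolver is passed in so the logic is testable offline; in production it could wrap e.g. dnspython's dns.resolver.resolve(domain, "SOA") — an assumed dependency, not necessarily what the dnsbl tool uses:

```python
def is_active(domain, resolve_soa):
    """True if the domain's SOA record resolves; any resolution
    failure marks the domain as dead."""
    try:
        return resolve_soa(domain) is not None
    except Exception:
        return False

def filter_active(domains, resolve_soa):
    """Keep only domains whose zone still exists, mirroring the
    dead-domain pruning described in this mail."""
    return [d for d in domains if is_active(d, resolve_soa)]
```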
>>
>> If we cannot resolve the domain, neither can our users. So we would
>> otherwise fill the lists with tons of domains that simply could never
>> be reached. And if they cannot be reached, why would we block them?
>> We would waste bandwidth and a lot of memory on each single client.
>>
>> The other sources have similarly high ratios of dead domains. Most
>> of them are in the 50-80% range. Therefore I am happy that we are
>> doing some extra work here to give our users much better data for
>> their filtering.
>
> Removing all dead entries sounds like an excellent step.
>
> Regards,
>
> Adolf.
>
>>
>> So, if you like, please go and check out the RPZ blocking with
>> Unbound. Instructions are on the page. I would be happy to hear how
>> this is turning out.
>>
>> Please let me know if there are any more questions, and I would be
>> glad to answer them.
>>
>> Happy New Year,
>> -Michael
>>
>>>
>>> Regards,
>>> Adolf.
>>>> This email is just a brain dump from me to this list. I would be
>>>> happy to answer any questions about implementation details, etc. if
>>>> people are interested. Right now, this email is long enough already…
>>>>
>>>> All the best,
>>>> -Michael
>>>
>>> --
>>> Sent from my laptop
>>
>>
>>
>
--
Sent from my laptop