Using datasets like [this one](https://opendata.rapid7.com/sonar.fdns_v2/) could make DNS resolution obsolete (5 hours of resolution -> 20 minutes of download) and offer a much bigger list of known subdomains (6M -> 150M+).

Those files are very big and probably won't fit in the disk space I have available for this project, and certainly not in memory. I will need to get creative for multi-layer redirection; here are some ideas:

- Store them in a sorted and/or indexed file (easy, since they're provided sorted) and generate the blocklist by fake-resolving against it (see the first sketch below)
- Multiple passes: only store the blocking entries, and on the next pass treat what was blocked previously as another blocklist; this needs some pruning mechanism (see the second sketch below)
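For the first idea, here's a minimal sketch of fake-resolving against the sorted file without loading it into memory: a bisect-then-scan over byte offsets, the same trick as BSD's `look(1)`. It assumes the dump has already been preprocessed into a plain `name,value` text file sorted bytewise (`LC_ALL=C sort`); the actual Rapid7 dumps are gzipped JSON lines, so that preprocessing step, and the `fdns_sorted.txt` name below, are assumptions:

```python
import os

def lookup(path, domain):
    """Find `domain` in a sorted, newline-delimited "name,value" file.

    Bisect over byte offsets, then scan forward a few lines; the file
    stays on disk and each query touches O(log n) disk blocks.
    Returns the resolved value, or None if the domain is unknown.
    """
    target = domain.encode("ascii")
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path)
        while hi - lo > 1:
            mid = (lo + hi) // 2
            f.seek(mid)
            f.readline()                 # realign to the next line start
            line = f.readline()
            key = line.split(b",", 1)[0]
            if line and key < target:
                lo = mid                 # answer (if any) lies after mid
            else:
                hi = mid
        f.seek(lo)
        if lo:
            f.readline()                 # skip the partial line at lo
        for line in f:                   # short forward scan
            key, _, value = line.rstrip(b"\n").partition(b",")
            if key == target:
                return value.decode()
            if key > target:             # sorted, so we've gone past it
                return None
    return None

# Hypothetical usage: fake-resolve candidates instead of querying DNS.
# answer = lookup("fdns_sorted.txt", "tracker.example.com")
```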
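And for the multi-pass idea, a sketch of the fixed point the passes converge to: a domain whose CNAME target is already blocked becomes blocked itself on the next pass, which is what catches multi-layer redirection. For illustration this assumes the name -> CNAME-target map of one shard fits in memory (the full dataset won't; the real thing would stream over the sorted file on each pass), and the pruning mechanism, e.g. a cap on chain depth, is left out:

```python
def expand_blocklist(seed_blocked, cname_of):
    """Grow the blocklist until a fixed point is reached.

    seed_blocked: iterable of domains matched by the base blocklist.
    cname_of: dict mapping a domain to its CNAME target.
    """
    blocked = set(seed_blocked)
    changed = True
    while changed:                        # one iteration = one "pass"
        changed = False
        for name, target in cname_of.items():
            if target in blocked and name not in blocked:
                blocked.add(name)         # blocked via redirection
                changed = True
    return blocked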
A tool you may want to check out is PyFunceble:

https://github.com/funilrys/PyFunceble
Sorry, but where might this tool be relevant for this project?

I could certainly filter the domains this program outputs, however:

1. An unavailable domain in the blocklist won't cause any harm, and removing it won't help blocking performance much.
2. Some domains are about to go into production; filtering them out early could cause misses during the time the list takes to update.
3. The logic to determine whether a domain is available seems complex: trackers could make PyFunceble think a tracking domain is unavailable while its tracking scripts still work (especially since I only track the domain name, not the script URLs).

I could be wrong since I don't really know the project, but I don't see the point anyway :/
(I closed this because I fixed the original issue, but it's still open for discussion :) )