Use public DNS datasets #7
Using datasets like this one could make DNS resolution obsolete (5 hours of resolution -> 20 minutes of download) and offer a much bigger list of known subdomains (6 M -> 150 M+).
Those datasets are very big: they probably won't fit in the disk space I have available for this project, and certainly not in memory. I will need to get creative to handle multi-layer redirection. Here are some ideas:
- Store them in a sorted and/or indexed file (easy since they're provided sorted) and generate the blocklist by fake-resolving them
- Multiple passes: only store the blocked domains, and on the next pass treat what was blocked previously as another blocklist. This would need some pruning mechanism.
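The first idea above can be sketched as a seek-based binary search over the sorted file, so a lookup never loads the dataset into memory (it touches O(log n) disk blocks instead). This is a minimal sketch under my own assumptions: the dataset is a plain newline-delimited file sorted bytewise, and `contains_domain` is a name I made up for illustration.

```python
import os

def contains_domain(path: str, domain: str) -> bool:
    """Membership test on a sorted, newline-delimited domain file,
    done by bisecting byte offsets instead of loading the file."""
    target = domain.encode()
    with open(path, "rb") as f:
        # The bisection below always discards the (possibly partial)
        # line it lands in, so check the very first line up front.
        if f.readline().rstrip(b"\n") == target:
            return True
        lo, hi = 0, os.path.getsize(path)
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid)
            f.readline()                      # realign to the next line start
            line = f.readline().rstrip(b"\n")
            if line == target:
                return True
            if line and line < target:
                lo = mid + 1                  # target can only start after mid
            else:
                hi = mid                      # empty line = past EOF, or line > target
        return False
```

The comparison is bytewise, so it only works if the dataset is sorted the same way (e.g. by `sort` with `LC_ALL=C`); a reversed-domain or differently collated sort would need the same ordering applied to the key.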
A tool you may want to check out is PyFunceble
Sorry, but how would this tool be relevant to this project?
I could certainly filter the output domains that this program produces, however:
- An unavailable domain in the blocklist won't cause any harm, and removing it won't improve blocking performance much.
- Sometimes the domains are not in production yet; filtering them out early might cause misses during the time the list takes to update.
- The logic to determine whether a domain is available seems complex; trackers could make PyFunceble think a tracking domain is unavailable while the tracking scripts still work (especially since I only track domain names, not script URLs)
I could be wrong since I don't really know the project, but I don't see the point anyway :/