Use public DNS datasets #7

Closed
opened 2019-12-05 15:01:16 +00:00 by geoffrey · 3 comments

Using datasets like [this one](https://opendata.rapid7.com/sonar.fdns_v2/) could make DNS resolution obsolete (5 hours of resolution -> 20 minutes of download) and offer a much bigger list of known subdomains (6 M -> 150 M+).
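
As a sketch of how a download that size could be consumed, here is a streaming reader for a gzipped JSON-lines dump, one record at a time so memory use stays flat regardless of file size. The `name`/`type`/`value` field names are an assumption about the Sonar FDNS record format:

```python
import gzip
import json

def iter_records(fileobj):
    """Yield (name, type, value) tuples from a gzipped JSON-lines stream.

    Reads one line at a time, so a 150M-record dump never has to fit
    in memory. Field names are assumed from the FDNS format.
    """
    with gzip.open(fileobj, "rt") as f:
        for line in f:
            rec = json.loads(line)
            yield rec["name"], rec["type"], rec["value"]
```
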

Those files are very big and probably won't fit in the disk space I have available for this project, and certainly not in memory. I will need to get creative to handle multi-layer redirection; here are some ideas:

  • Store them in a sorted and/or indexed file (easy since they're provided sorted), then generate the blocklist by fake-resolving them
  • Multiple passes: only store the blocking ones, and on the next pass treat what was blocked previously as another blocklist. This needs some pruning mechanism.
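
The first idea could look roughly like this: load FDNS-style `(name, type, value)` records into an on-disk SQLite index, then "fake-resolve" a name by following CNAME chains from the index instead of issuing live DNS queries. This is a minimal sketch under stated assumptions; the table name, function names, and example domains are made up for illustration:

```python
import sqlite3

def build_index(db_path, records):
    """Load (name, type, value) records into an indexed SQLite table.

    For the real dataset db_path would be a file on disk; ":memory:"
    works for a small demo.
    """
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE dns (name TEXT, type TEXT, value TEXT)")
    con.executemany("INSERT INTO dns VALUES (?, ?, ?)", records)
    con.execute("CREATE INDEX idx_name ON dns (name)")
    con.commit()
    return con

def fake_resolve(con, name, max_depth=10):
    """Follow CNAME records from the index; return the chain of names.

    max_depth bounds multi-layer redirection so a CNAME loop in the
    dataset can't make this spin forever.
    """
    chain = [name]
    for _ in range(max_depth):
        row = con.execute(
            "SELECT type, value FROM dns WHERE name = ?", (chain[-1],)
        ).fetchone()
        if row is None or row[0] != "CNAME":
            break
        chain.append(row[1])
    return chain
```

Every name in the returned chain can then be checked against the blocklists, which is what makes the multi-layer redirection case work without any live resolution.
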
geoffrey added the
enhancement
label 2019-12-05 15:01:16 +00:00
geoffrey added a new dependency 2019-12-13 17:52:47 +00:00

A tool you may want to check out is PyFunceble

https://github.com/funilrys/PyFunceble

Poster
Owner

Sorry, but how might this tool be relevant to this project?

I could certainly filter the output domains that this program produces, however:

  1. An unavailable domain in the block list won't cause any harm, and removing such domains won't improve blocking performance much.
  2. Sometimes the domains are about to be put in production; filtering them out early might cause misses until the list is next updated.
  3. The logic to determine whether a domain is available seems complex; trackers could make PyFunceble think a tracking domain is unavailable while the tracking script still works (especially since I only track the domain name, not the script URLs)

I could be wrong since I don't really know the project, but I don't see the point anyway :/

Poster
Owner

(I closed because I fixed the original issue but it's still open for discussion :) )

Reference: geoffrey/eulaurarien#7