Use public DNS datasets #7

Closed
opened 2019-12-05 15:01:16 +00:00 by geoffrey · 3 comments

Using datasets like [this one](https://opendata.rapid7.com/sonar.fdns_v2/) could make DNS resolution obsolete (5 hours of resolution -> 20 minutes of download) and offer a much bigger list of known subdomains (6 M -> 150 M+).
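
As a sketch of how a download that size could be consumed, here is a streaming reader for a gzipped JSON-lines dump, one record at a time so memory use stays flat regardless of file size. The `name`/`type`/`value` field names are an assumption about the Sonar FDNS record format:

```python
import gzip
import json

def iter_records(fileobj):
    """Yield (name, type, value) tuples from a gzipped JSON-lines stream.

    Reads one line at a time, so a 150M-record dump never has to fit
    in memory. Field names are assumed from the FDNS format.
    """
    with gzip.open(fileobj, "rt") as f:
        for line in f:
            rec = json.loads(line)
            yield rec["name"], rec["type"], rec["value"]
```
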

Those files are very big and probably won't fit in the disk space I have available for this project, and certainly not in memory. I will need to get creative to handle multi-layer redirection; here are some ideas:

  • Store them in a sorted and/or indexed file (easy since they're provided sorted), then generate the blocklist by fake-resolving them
  • Multiple passes: only store the blocking ones, and on the next pass treat what was blocked previously as another blocklist. This needs some pruning mechanism.
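
The first idea could look roughly like this: load FDNS-style `(name, type, value)` records into an on-disk SQLite index, then "fake-resolve" a name by following CNAME chains from the index instead of issuing live DNS queries. This is a minimal sketch under stated assumptions; the table name, function names, and example domains are made up for illustration:

```python
import sqlite3

def build_index(db_path, records):
    """Load (name, type, value) records into an indexed SQLite table.

    For the real dataset db_path would be a file on disk; ":memory:"
    works for a small demo.
    """
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE dns (name TEXT, type TEXT, value TEXT)")
    con.executemany("INSERT INTO dns VALUES (?, ?, ?)", records)
    con.execute("CREATE INDEX idx_name ON dns (name)")
    con.commit()
    return con

def fake_resolve(con, name, max_depth=10):
    """Follow CNAME records from the index; return the chain of names.

    max_depth bounds multi-layer redirection so a CNAME loop in the
    dataset can't make this spin forever.
    """
    chain = [name]
    for _ in range(max_depth):
        row = con.execute(
            "SELECT type, value FROM dns WHERE name = ?", (chain[-1],)
        ).fetchone()
        if row is None or row[0] != "CNAME":
            break
        chain.append(row[1])
    return chain
```

Every name in the returned chain can then be checked against the blocklists, which is what makes the multi-layer redirection case work without any live resolution.
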
geoffrey added the
enhancement
label 2019-12-05 15:01:16 +00:00
geoffrey added a new dependency 2019-12-13 17:52:47 +00:00

A tool you may want to check out is PyFunceble

https://github.com/funilrys/PyFunceble

Poster
Owner

Sorry, but how might this tool be relevant to this project?

I could certainly filter the output domains that this program produces, however:

  1. An unavailable domain in the block list won't cause any harm, and removing such domains won't improve blocking performance much.
  2. Sometimes the domains are about to be put in production; filtering them out early might cause misses until the list is next updated.
  3. The logic to determine whether a domain is available seems complex; trackers could make PyFunceble think a tracking domain is unavailable while the tracking script still works (especially since I only track the domain name, not the script URLs)

I could be wrong since I don't really know the project, but I don't see the point anyway :/

Poster
Owner

(I closed because I fixed the original issue but it's still open for discussion :) )

Reference: geoffrey/eulaurarien#7