This program generates a list of every hostname that is a DNS redirection to any of a given list of DNS zones and IP networks.
It is primarily used to generate Geoffrey Frogeye's block list of first-party trackers (learn about first-party trackers by following this link).
If you want to contribute but don't want to create an account on this forge, contact me the way you like: https://geoffrey.frogeye.fr
How does this work
This program takes as input:
- Lists of hostnames to match
- Lists of DNS zones to match (a domain and its subdomains)
- Lists of IP addresses / IP networks to match
- Lists of Autonomous System numbers to match
- An enormous quantity of DNS records
It outputs the hostnames that are a DNS redirection to any item in the provided lists.
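The matching idea described above can be sketched as follows. This is only an illustration under simplifying assumptions, not the program's actual implementation: the function names, the dictionary-based record format, and the in-memory approach are all hypothetical, and ASN matching is left out. Note that this sketch also reports hostnames that sit directly inside a matched zone.

```python
import ipaddress


def in_zone(hostname, zone):
    """True if hostname is the zone itself or one of its subdomains."""
    return hostname == zone or hostname.endswith("." + zone)


def find_redirected(cnames, a_records, zones, networks):
    """Return hostnames whose DNS resolution leads into a zone or network.

    cnames:    dict mapping a hostname to its CNAME target (hypothetical format)
    a_records: dict mapping a hostname to a list of IP address strings
    zones:     iterable of DNS zones to match
    networks:  iterable of IP networks in CIDR notation
    """
    nets = [ipaddress.ip_network(n) for n in networks]
    matched = set()
    for start in set(cnames) | set(a_records):
        seen = set()  # guards against CNAME loops
        name = start
        while name not in seen:
            seen.add(name)
            ips = a_records.get(name, [])
            if any(in_zone(name, z) for z in zones) or any(
                ipaddress.ip_address(ip) in net for ip in ips for net in nets
            ):
                matched.add(start)
                break
            if name not in cnames:
                break
            name = cnames[name]  # follow the redirection one step further
    return matched
```

With a CNAME from `metrics.shop.example` to `shop.tracker.example` and the zone `tracker.example`, the first hostname is reported even though it looks like a first-party name.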
DNS records can either come from Rapid7 Open Data Sets or can be locally resolved from a list of subdomains using MassDNS.
Those subdomains can be provided as is, or come from the Cisco Umbrella Popularity List, from your browsing history, or from analyzing the traffic a web browser makes when opening a URL (the program provides utilities to do all that).
Remember you can get an already generated and up-to-date list of first-party trackers from here.
The following is for the people wanting to build their own list.
Depending on the sources you'll be using to generate the list, you'll need to install some of the following:
- Python 3.4+
- coloredlogs (sorry I can't help myself)
- massdns in your $PATH (only if you have subdomains as a source)
- Firefox (only if you have websites as a source)
- selenium (Python bindings) (only if you have websites as a source)
- selenium-wire (only if you have websites as a source)
Create a new database
The so-called database (in the form of blocking.p) is a file storing all the matching entities (ASN, IPs, hostnames, zones…) and every entity leading to them.
For now there's no way to remove data from it, so here's the command to recreate it:
Gather external sources
External sources are not stored in this repository.
You'll need to fetch them by running
- Third-party trackers lists
- TLD lists (used to test the validity of hostnames)
- List of public DNS resolvers (for DNS resolving from subdomains)
- Top 1M subdomains
Import rules into the database
You need to put the lists of rules for matching in the different subfolders:
- rules: Lists of DNS zones
- rules_ip: Lists of IP networks (for IP addresses append
- rules_asn: Lists of Autonomous System numbers (IP ranges will be deduced from them)
- rules_adblock: Lists of DNS zones, but in the form of AdBlock lists (only the rules concerning domains will be extracted)
- rules_hosts: Lists of DNS zones, but in the form of hosts lists
See the provided examples for syntax.
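To illustrate what extracting DNS zones from AdBlock and hosts lists can look like, here is a minimal sketch. It is not the repository's own converter: the function names are hypothetical, and it assumes that only plain `||domain^` AdBlock rules (without options) count as rules "concerning domains".

```python
import re

# Assumption: only plain domain-blocking rules such as ||example.com^ are kept;
# rules with options, path patterns, and cosmetic filters are ignored.
ADBLOCK_DOMAIN = re.compile(r"^\|\|([a-z0-9.-]+)\^$")


def domains_from_adblock(lines):
    """Yield the domains of AdBlock rules that block a whole domain."""
    for line in lines:
        match = ADBLOCK_DOMAIN.match(line.strip().lower())
        if match:
            yield match.group(1)


def domains_from_hosts(lines):
    """Yield the hostnames of a hosts-format list (address, then names)."""
    for line in lines:
        fields = line.split("#", 1)[0].split()  # drop comments
        if len(fields) >= 2:
            for name in fields[1:]:
                yield name.lower()
```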
In each folder:
- first-party.ext will be the only files considered for the first-party variant of the list
- *.cache.ext are from external sources, and thus might be deleted / overwritten
- *.custom.ext are for sources that you don't want committed
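The naming convention above can be expressed as a small helper. This is a hypothetical sketch for illustration only (the function name and the returned labels are assumptions, and the program itself may well rely on shell globs instead):

```python
from pathlib import PurePath


def classify_rule_file(filename):
    """Classify a list file by the naming convention described above."""
    parts = PurePath(filename).name.split(".")
    if len(parts) == 2 and parts[0] == "first-party":
        return "first-party"  # used for the first-party variant
    if len(parts) >= 3 and parts[-2] == "cache":
        return "cache"  # fetched from external sources, may be overwritten
    if len(parts) >= 3 and parts[-2] == "custom":
        return "custom"  # personal sources, not meant to be committed
    return "other"
```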
If you plan to resolve DNS records yourself (as the DNS records datasets are not exhaustive), the top 1M subdomains provided might not be enough.
You can add them into the
It follows the same specificities as the rules folder for
Add personal sources
Adding your own browsing history will help create a better-suited subdomain list. Here are reference commands for possible sources:
sqlite3 /etc/pihole-FTL.db "select distinct domain from queries" > /path/to/eulaurarien/subdomains/my-pihole.custom.list
cp ~/.mozilla/firefox/<your_profile>.default/places.sqlite temp; sqlite3 temp "select distinct rev_host from moz_places" | rev | sed 's|^\.||' > /path/to/eulaurarien/subdomains/my-firefox.custom.list; rm temp
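The Firefox command above works because the `rev_host` column holds each hostname reversed, which is why it pipes through `rev` and strips a leading dot. A rough Python equivalent, given only as a sketch (the function names are hypothetical, and you should still work on a copy of `places.sqlite` as the command does), could look like this:

```python
import sqlite3


def unreverse(rev_host):
    """Turn a reversed host such as 'moc.elpmaxe.' back into 'example.com'."""
    host = rev_host[::-1]
    return host[1:] if host.startswith(".") else host


def hostnames_from_places(con):
    """Return the sorted distinct hostnames found in a moz_places table.

    con: an open sqlite3 connection to a copy of places.sqlite
    """
    rows = con.execute(
        "SELECT DISTINCT rev_host FROM moz_places WHERE rev_host IS NOT NULL"
    )
    return sorted({unreverse(row[0]) for row in rows} - {""})
```

Writing the returned hostnames one per line into a `*.custom.list` file in the subdomains folder would mirror what the shell command does.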
Collect subdomains from websites
You can add the websites URLs into the
It follows the same specificities as the rules folder for
This is a long step, and might be memory-intensive from time to time.
Note: For first-party tracking, a list of subdomains collected from the websites in the repository is available here: https://hostfiles.frogeye.fr/from_websites.cache.list
Resolve DNS records
Once you've added subdomains, you'll need to resolve them to get their DNS records.
The program will use a list of public nameservers to do that, but you can add your own in the
Note that this is a network-intensive process, not in terms of bandwidth, but in terms of number of packets.
Note: Some VPS providers might detect this as a DDoS attack and cut off network access. Some Wi-Fi connections can be rendered unusable for other uses, and some routers might stop working. Since massdns does not yet support rate limiting, my best bet was a Raspberry Pi with a slow Ethernet link (Raspberry Pi < 4).
The DNS records will automatically be imported into the database.
If you want to re-import the records without re-doing the resolving, just run the last line of the
Import DNS records from Rapid7
This will download about 35 GiB of data, but only the matching records will be stored (a few MiB for the tracking rules).
Note that the download speed will most likely be limited by the throughput of the database operations (fast RAM will help).
Export the lists
For the tracking list, use ./export_lists.sh; the output will be in the dist folder (please change the links before distributing them).
For other purposes, tinker with the