Generates a host list of first-party trackers for ad-blocking.
 
Geoffrey Frogeye 80b23e2d5c Initial commit 2 years ago

eulaurarien

DISCLAIMER: I'm by no means an expert on this subject, so my vocabulary or other details might be wrong. Use at your own risk.

What's a first-party tracker?

Traditionally, websites load tracker scripts directly. For example, website1.com and website2.com both load https://trackercompany.com/trackerscript.js to track their users. To block those, one can simply block the host trackercompany.com.

However, to get around this easy block, tracker companies have the websites that use them load trackers from somethingirelevant.website1.com instead. That name is a DNS redirection (a CNAME record) to website1.trackercompany.com, which points directly to a server serving the tracking script. These are the first-party trackers.
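
The redirection can be illustrated with a minimal sketch that follows a chain of CNAME records. The records here are a hypothetical in-memory table that reuses the placeholder names above. A real lookup would query DNS instead, for example with dnspython's dns.resolver.resolve(name, "CNAME") in dnspython 2.x.

```python
# Hypothetical DNS records standing in for real lookups; the names reuse the
# README's placeholders and are not real domains.
CNAME_RECORDS = {
    "somethingirelevant.website1.com": "website1.trackercompany.com",
}


def resolve_cname_chain(hostname: str, records: dict, max_depth: int = 10) -> list:
    """Follow CNAME records from hostname and return every name seen, in order."""
    chain = [hostname]
    current = hostname
    for _ in range(max_depth):  # max_depth guards against CNAME loops
        target = records.get(current)
        if target is None:
            break  # no further CNAME: current is the final name
        chain.append(target)
        current = target
    return chain


print(resolve_cname_chain("somethingirelevant.website1.com", CNAME_RECORDS))
# → ['somethingirelevant.website1.com', 'website1.trackercompany.com']
```

The browser only ever asks for the first name in the chain. The tracker's own domain appears only further down.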

Blocking trackercompany.com no longer works, and blocking *.trackercompany.com isn't really possible because:

  1. Most ad-blockers don't support wildcards
  2. It's a DNS redirection, meaning that most ad-blockers will only see somethingirelevant.website1.com
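
Both points can be shown with a simplified model. It assumes the ad-blocker matches hostnames exactly, the way a hosts file does. Real ad-blockers vary, so treat this as an illustration only.

```python
# A hosts-file style blocklist: entries are matched exactly, with no wildcards.
BLOCKLIST = {"trackercompany.com"}


def is_blocked(hostname: str, blocklist: set) -> bool:
    """Exact-match lookup, as a hosts file or simple DNS blocklist would do."""
    return hostname in blocklist


# Third-party tracker: the browser requests trackercompany.com itself.
print(is_blocked("trackercompany.com", BLOCKLIST))  # True
# First-party tracker: the browser only asks for the website's own subdomain,
# so the trackercompany.com entry never matches.
print(is_blocked("somethingirelevant.website1.com", BLOCKLIST))  # False
# Even the CNAME target differs from the listed host; only a
# *.trackercompany.com wildcard would cover it.
print(is_blocked("website1.trackercompany.com", BLOCKLIST))  # False
```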

So the only solution is to block every known subdomain like somethingirelevant.website1.com, and there are a lot of them. That's where this script comes in: it generates a list of such subdomains.
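
A host list is commonly consumed in hosts-file form, where each blocked name is mapped to an unroutable address. The sketch below assumes that format (0.0.0.0 followed by the hostname). The list published in the releases may use a different layout, and the second hostname in the example is hypothetical.

```python
from typing import Iterable, List


def to_hosts_lines(subdomains: Iterable[str], sink: str = "0.0.0.0") -> List[str]:
    """Render one hosts-file line per subdomain, deduplicated and sorted.

    The "0.0.0.0 <name>" layout is an assumed output format.
    """
    return ["{} {}".format(sink, name) for name in sorted(set(subdomains))]


found = [
    "somethingirelevant.website1.com",
    "tracking.website2.com",  # hypothetical second match
    "somethingirelevant.website1.com",  # duplicates are collapsed
]
print("\n".join(to_hosts_lines(found)))
# 0.0.0.0 somethingirelevant.website1.com
# 0.0.0.0 tracking.website2.com
```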

How does this script work?

It takes as input a list of websites that include trackers. So far, this list is generated manually from the client lists of such first-party tracker companies (later, we should use a general list of websites to be more exhaustive).
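
Reading that input could look like the sketch below. It assumes websites.list holds one entry per line and skips blank lines and lines starting with "#". Neither convention is specified here, so both are assumptions.

```python
from typing import Iterable, List


def read_website_list(lines: Iterable[str]) -> List[str]:
    """Return website entries, one per line, ignoring blank and '#' lines.

    The one-entry-per-line layout and '#' comments are assumptions about
    websites.list.
    """
    websites = []
    for line in lines:
        entry = line.strip()
        if entry and not entry.startswith("#"):
            websites.append(entry)
    return websites


sample = ["# clients of trackercompany\n", "https://website1.com\n", "\n", "https://website2.com\n"]
print(read_website_list(sample))
# → ['https://website1.com', 'https://website2.com']

# Reading the actual file would look like:
# with open("websites.list") as handle:
#     websites = read_website_list(handle)
```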

It opens each of those websites (just the homepage) in a web browser and records the domains of the network requests the page makes. It then looks up the DNS redirections of those domains and compares them with regexes of known tracking domains. Finally, it outputs the matching ones.
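
The steps after the browser visit can be sketched as below. This is not the repository's actual code. The request URLs, the CNAME table and the regex are hypothetical stand-ins. In a real run the URLs could come from seleniumwire's driver.requests and the lookups from dnspython. For simplicity, the sketch only checks a single CNAME hop.

```python
import re
from typing import Callable, Iterable, List, Optional
from urllib.parse import urlsplit

# Hypothetical inputs: URLs a homepage might request, a stand-in resolver
# table, and one tracker regex.
REQUEST_URLS = [
    "https://website1.com/index.html",
    "https://somethingirelevant.website1.com/script.js",
    "https://cdn.example.net/lib.js",
]
FAKE_CNAMES = {"somethingirelevant.website1.com": "website1.trackercompany.com"}
TRACKER_REGEXES = [re.compile(r"^[a-z0-9-]+\.trackercompany\.com$")]


def hostnames(urls: Iterable[str]) -> List[str]:
    """Unique hostnames of the requested URLs, in first-seen order."""
    seen = []  # type: List[str]
    for url in urls:
        host = urlsplit(url).hostname
        if host and host not in seen:
            seen.append(host)
    return seen


def first_party_trackers(urls: Iterable[str],
                         resolve_cname: Callable[[str], Optional[str]],
                         regexes) -> List[str]:
    """Keep hostnames whose CNAME target matches a known tracker regex."""
    matches = []
    for host in hostnames(urls):
        target = resolve_cname(host)  # None when the name has no CNAME
        if target and any(rx.match(target) for rx in regexes):
            matches.append(host)
    return matches


print(first_party_trackers(REQUEST_URLS, FAKE_CNAMES.get, TRACKER_REGEXES))
# → ['somethingirelevant.website1.com']
```

The resolver is passed in as a function, so the fake table can be swapped for a real DNS lookup without changing the filtering logic.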

Requirements

These are only needed to build the list yourself; an already-built list is available in the releases.

  • Bash
  • Python 3.4+
  • Firefox
  • Selenium
  • seleniumwire
  • dnspython

Contributing

Adding websites

Just add them to websites.list.

Adding first-party tracker regexes

Just add them to regexes.py.
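
Before adding a regex, it can help to check it against a few known tracker names and a few names it must not match. The layout of regexes.py isn't described here, so the pattern string and the check_regex helper below are hypothetical.

```python
import re

# Hypothetical new entry. Escaping the dots stops "." from matching any
# character (e.g. "website1.trackercompanyXcom"), and the ^...$ anchors reject
# names such as "trackercompany.com.example.net".
NEW_REGEX = r"^[a-z0-9-]+\.trackercompany\.com$"


def check_regex(pattern: str, should_match, should_not_match) -> bool:
    """Return True if the pattern matches every positive and no negative example."""
    compiled = re.compile(pattern)
    return (all(compiled.match(name) for name in should_match)
            and not any(compiled.match(name) for name in should_not_match))


print(check_regex(
    NEW_REGEX,
    should_match=["website1.trackercompany.com", "website2.trackercompany.com"],
    should_not_match=["trackercompany.com", "trackercompany.com.example.net"],
))
# → True
```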