# eulaurarien
Generates a host list of first-party trackers for ad-blocking.
The latest list is available here: <https://hostfiles.frogeye.fr/firstparty-trackers-hosts.txt>
**DISCLAIMER:** I'm by no means an expert on this subject, so my vocabulary or other details might be wrong. Use at your own risk.
## What's a first-party tracker?
Traditionally, websites load tracker scripts directly from the tracker company's domain.
For example, `website1.com` and `website2.com` both load `https://trackercompany.com/trackerscript.js` to track their users.
To block those, one can simply block the host `trackercompany.com`.
However, to circumvent this easy block, tracker companies made the websites using them load trackers from `somethingirelevant.website1.com`.
The latter is a DNS redirection to `website1.trackercompany.com`, which points directly to a server serving the tracking script.
Those are first-party trackers.
Blocking `trackercompany.com` doesn't work any more, and blocking `*.trackercompany.com` isn't really possible since:

1. Most ad-blockers don't support wildcards
2. It's a DNS redirection, meaning that most ad-blockers will only see `somethingirelevant.website1.com`

So the only solution is to block every known `somethingirelevant.website1.com`-like subdomain, which is a lot.
That's where this script comes in: it generates a list of such subdomains.
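
To make the problem concrete, here is a minimal Python sketch of why a host-based blocklist misses a first-party tracker. The `CNAME_RECORDS` table is a made-up stand-in for real DNS data, and the function names are illustrative only; none of this is code from this repository.

```python
# Hypothetical DNS data: the website's subdomain is an alias (CNAME)
# for a host that belongs to the tracker company.
CNAME_RECORDS = {
    "somethingirelevant.website1.com": "website1.trackercompany.com",
}

# What a simple host-based blocklist would contain.
BLOCKED_HOSTS = {"trackercompany.com"}


def is_blocked(host):
    """Mimic a simple ad-blocker: exact match on the requested host only."""
    return host in BLOCKED_HOSTS


def final_target(host):
    """Follow CNAME aliases until reaching a host with no further alias."""
    seen = set()
    while host in CNAME_RECORDS and host not in seen:
        seen.add(host)
        host = CNAME_RECORDS[host]
    return host


requested = "somethingirelevant.website1.com"
print(is_blocked(requested))    # False: the blocker only sees the first-party name
print(final_target(requested))  # website1.trackercompany.com
```

The blocker never learns that the request ends up at the tracker company, which is why the subdomain itself has to be listed.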
## How does this script work?
It takes as input a list of websites known to include trackers.
So far, this list is manually generated from the lists of clients of such first-party trackers
(later, we should use a general list of websites to be more exhaustive).
It opens each of those websites (just the homepage) in a web browser and records the domains of the network requests the page makes.
Additionally, or alternatively, you can feed the script some browsing history and get domains from there.
It then finds the DNS redirections of those domains and compares them with regexes of known tracking domains.
It finally outputs the matching ones.
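
The last two steps can be sketched as follows. This is only an illustration under stated assumptions: the `resolve_chain` callable, the `TRACKER_REGEXES` list and the sample data are hypothetical, and the actual scripts may be organised quite differently.

```python
import re

# Hypothetical patterns; the real ones live in regexes.py.
TRACKER_REGEXES = [re.compile(r"^.+\.trackercompany\.com\.?$")]


def find_tracking_subdomains(subdomains, resolve_chain):
    """Yield the subdomains whose DNS redirections match a known tracker.

    resolve_chain(name) must return the list of names that `name`
    redirects to (its CNAME chain), or an empty list if there is none.
    """
    for subdomain in subdomains:
        targets = resolve_chain(subdomain)
        if any(rx.match(t) for rx in TRACKER_REGEXES for t in targets):
            yield subdomain


# Stand-in for real DNS lookups, so the sketch runs offline.
FAKE_DNS = {
    "somethingirelevant.website1.com": ["website1.trackercompany.com."],
    "www.website1.com": [],
}

collected = ["www.website1.com", "somethingirelevant.website1.com"]
matches = list(find_tracking_subdomains(collected, lambda n: FAKE_DNS.get(n, [])))
print(matches)  # ['somethingirelevant.website1.com']
```

Passing the resolver in as a function keeps the matching logic testable without network access; a real run would replace the lambda with actual DNS queries.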
## Requirements
These are only needed to build the list yourself; an already-built list is available in the releases.
- Bash
- [Python 3.4+](https://www.python.org/)
- [progressbar2](https://pypi.org/project/progressbar2/)
- dnspython
- [A Python wrapper for re2](https://pypi.org/project/google-re2/) (optional, just speeds things up)
(If you don't want to collect the subdomains, you can skip the following.)
- Firefox
- Selenium
- seleniumwire
## Usage
This section is only relevant if you want to build the list yourself.
If you just want to use the list, the latest build is available here: <https://hostfiles.frogeye.fr/firstparty-trackers-hosts.txt>
It was built using additional sources not included in this repository for privacy reasons.
### Add personal sources
The list of websites provided with this script is by no means exhaustive,
so adding your own browsing history will help create a better list.
Here are reference commands for possible sources:
- **Pi-hole**: `sqlite3 /etc/pihole-FTL.db "select distinct domain from queries" > /path/to/eulaurarien/subdomains/my-pihole.custom.list`
- **Firefox**: `cp ~/.mozilla/firefox/<your_profile>.default/places.sqlite temp; sqlite3 temp "select distinct rev_host from moz_places" | rev | sed 's|^\.||' > /path/to/eulaurarien/subdomains/my-firefox.custom.list; rm temp`
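
If you prefer not to chain shell tools, the Firefox command above can be approximated in Python. This is a rough sketch, not part of the repository: the function name `firefox_hosts` is made up, and it assumes the same `moz_places.rev_host` column that the shell command reads.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def firefox_hosts(places_db):
    """Return the sorted distinct hosts found in a Firefox places.sqlite file."""
    with tempfile.TemporaryDirectory() as tmp:
        # Work on a temporary copy, as the shell command above does.
        copy = Path(tmp) / "places.sqlite"
        shutil.copy(str(places_db), str(copy))
        conn = sqlite3.connect(str(copy))
        try:
            rows = conn.execute("SELECT DISTINCT rev_host FROM moz_places")
            rev_hosts = [row[0] for row in rows if row[0]]
        finally:
            conn.close()
    hosts = set()
    for rev_host in rev_hosts:
        host = rev_host[::-1]      # equivalent of `rev`
        if host.startswith("."):   # equivalent of sed 's|^\.||'
            host = host[1:]
        if host:
            hosts.add(host)
    return sorted(hosts)
```

Writing the returned hosts, one per line, to a `*.custom.list` file in the `subdomains` folder would give the same kind of file as the shell command.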
### Collect subdomains from websites
Just run `collect_subdomain.sh`.
This is a long step, and might be memory-intensive from time to time.
This step is optional if you already added personal sources.
Alternatively, you can just download the list of subdomains used to generate the official block list here: <https://hostfiles.frogeye.fr/from_websites.cache.list> (put it in the `subdomains` folder).
### Extract tracking domains
Make sure your system is configured to use a DNS server that does not restrict queries (for example, no filtering or rate limiting).
Then, run `filter_subdomain.sh`.
The files you need will be in the `dist` folder.
## Contributing
### Adding websites
Just add the URL to the relevant list: `websites/<source>.list`.
### Adding first-party tracker regexes
Just add them to `regexes.py`.
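
Before adding a pattern, it can help to check it against a few names by hand. The snippet below is only a sketch for the fictional `trackercompany.com` used earlier; the exact format expected by `regexes.py` is not shown here, so treat the variable names as hypothetical.

```python
import re

# Hypothetical pattern for the fictional trackercompany.com.
# Dots are escaped so they only match a literal dot, and the optional
# trailing dot accepts fully qualified names, which DNS libraries
# often return.
PATTERN = re.compile(r"^.+\.trackercompany\.com\.?$")

should_match = ["website1.trackercompany.com", "website1.trackercompany.com."]
should_not_match = ["trackercompany.com.evil.example", "nottrackercompany.com"]

for name in should_match:
    assert PATTERN.match(name), name
for name in should_not_match:
    assert not PATTERN.match(name), name
```

Anchoring with `^` and `$` avoids accidentally matching unrelated domains that merely contain the tracker's name.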