# eulaurarien
Generates a host list of first-party trackers for ad-blocking.
**DISCLAIMER:** I'm by no means an expert on this subject, so my vocabulary or other details might be wrong. Use at your own risk.
## What's a first-party tracker?
Traditionally, websites load tracker scripts directly.
For example, `website1.com` and `website2.com` both load `https://trackercompany.com/trackerscript.js` to track their users.
In order to block those, one can simply block the host `trackercompany.com`.
However, to circumvent this easy block, tracker companies made the websites using them load trackers from `somethingirelevant.website1.com`.
The latter is a DNS redirection (a CNAME record) to `website1.trackercompany.com`, which points directly to a server serving the tracking script.
Those are the first-party trackers.
Blocking `trackercompany.com` doesn't work any more, and blocking `*.trackercompany.com` isn't really possible since:
1. Most ad-blockers don't support wildcards.
2. It's a DNS redirection, meaning that most ad-blockers will only see `somethingirelevant.website1.com`.
So the only solution is to block every known `somethingirelevant.website1.com`-like subdomain, and there are a lot of them.
That's where this script comes in: it generates a list of such subdomains.
## How does this script work?
It takes as input a list of websites that include trackers.
So far, this list is manually generated from the list of clients of such first-party trackers
(later, we should use a general list of websites to be more exhaustive).
It opens each of those websites (just the homepage) in a web browser and records the domains of the network requests the page makes.
Additionally, or alternatively, you can feed the script some browsing history and get domains from there.
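
A minimal sketch of the "record the domains" step, assuming the browser hands back a list of request URLs. With seleniumwire these would likely come from `driver.requests`; here they are plain strings so the example runs without a browser. The function name `request_domains` is hypothetical, not taken from the script.

```python
from urllib.parse import urlsplit


def request_domains(urls):
    """Return the sorted, de-duplicated hostnames of the given request URLs (hypothetical helper)."""
    domains = set()
    for url in urls:
        # .hostname is lower-cased and excludes port and credentials;
        # it is None for URLs without a network location (e.g. "data:").
        host = urlsplit(url).hostname
        if host:
            domains.add(host)
    return sorted(domains)


if __name__ == "__main__":
    requests = [
        "https://website1.com/",
        "https://somethingirelevant.website1.com/script.js?v=2",
        "https://SomethingIrelevant.website1.com:443/pixel.gif",
        "data:image/png;base64,iVBORw0KGgo=",
    ]
    for domain in request_domains(requests):
        print(domain)
```

Using `urlsplit(...).hostname` rather than string slicing normalizes case and drops ports, so the same host is not recorded twice.
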
It then finds the DNS redirections of those domains and compares them against regexes of known tracking domains.
Finally, it outputs the matching ones.
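
The resolve-and-match steps can be sketched as below. To keep the example offline and self-contained, DNS is replaced by a hypothetical dictionary of CNAME records; a real run would query a DNS server instead (for example with dnspython, listed in the requirements). The pattern and function names are illustrative assumptions, not the contents of `regexes.py`.

```python
import re

# Hypothetical CNAME records standing in for real DNS answers.
CNAMES = {
    "somethingirelevant.website1.com": "website1.trackercompany.com",
    "stats.website2.com": "alias.website2.com",
    "alias.website2.com": "website2.trackercompany.com",
    "www.website1.com": "cdn.example.net",
}

# Illustrative pattern for one known tracking company.
TRACKER_REGEXES = [re.compile(r"^(?:[a-z0-9-]+\.)+trackercompany\.com$")]


def cname_chain(domain, records, max_depth=10):
    """Follow CNAME records from `domain`, returning each target reached (excluding `domain`)."""
    chain = []
    current = domain
    for _ in range(max_depth):  # bound the walk in case of CNAME loops
        current = records.get(current)
        if current is None:
            break
        chain.append(current)
    return chain


def first_party_trackers(domains, records, regexes):
    """Return the domains whose CNAME chain reaches a name matching a tracker regex."""
    matches = []
    for domain in domains:
        targets = cname_chain(domain, records)
        if any(rx.match(t) for t in targets for rx in regexes):
            matches.append(domain)
    return matches


if __name__ == "__main__":
    candidates = ["somethingirelevant.website1.com", "stats.website2.com",
                  "www.website1.com", "website1.com"]
    for domain in first_party_trackers(candidates, CNAMES, TRACKER_REGEXES):
        print(domain)
```

Following the whole chain, not just the first CNAME, matters because a tracker can hide behind more than one alias, as `stats.website2.com` does here.
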
## Requirements
These are only needed to build the list yourself; an already-built list is available in the releases.
- Bash
- Python 3.4+
- [progressbar2](https://pypi.org/project/progressbar2/)
- dnspython
(If you don't want to collect the subdomains yourself, you can skip the following.)
- Firefox
- Selenium
- seleniumwire
## Usage
### Add personal sources
The list of websites provided in this script is by no means exhaustive,
so adding your own browsing history will help create a better list.
Here are reference commands for possible sources:
- **Pi-hole**: `sqlite3 /etc/pihole/pihole-FTL.db "select distinct domain from queries" > /path/to/eulaurarien/subdomains/my-pihole.custom.list`
- **Firefox**: `cp ~/.mozilla/firefox/<your_profile>.default/places.sqlite temp; sqlite3 temp "select distinct rev_host from moz_places" | rev | sed 's|^\.||' > /path/to/eulaurarien/subdomains/my-firefox.custom.list`
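
For reference, the Firefox command can be sketched in Python with the standard `sqlite3` module. Firefox stores each host reversed, with a trailing dot, in `rev_host` (e.g. `moc.1etisbew.www.` for `www.website1.com`), which is why the shell pipeline reverses the string and strips the leading dot. The function names are hypothetical; like the shell version, point it at a copy of `places.sqlite`.

```python
import sqlite3
import sys


def firefox_hosts(places_db):
    """Return the distinct hostnames found in a copy of Firefox's places.sqlite (hypothetical helper)."""
    conn = sqlite3.connect(places_db)
    try:
        rows = conn.execute("SELECT DISTINCT rev_host FROM moz_places").fetchall()
    finally:
        conn.close()
    hosts = set()
    for (rev_host,) in rows:
        if not rev_host:
            continue
        host = rev_host[::-1]  # undo the reversal: ".www.website1.com"
        if host.startswith("."):  # same as sed 's|^\.||'
            host = host[1:]
        if host:  # entries without a host (rev_host ".") end up empty
            hosts.add(host)
    return sorted(hosts)


def write_list(hosts, path):
    """Write one hostname per line, like the shell redirection does."""
    with open(path, "w") as f:
        for host in hosts:
            f.write(host + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python firefox_hosts.py temp /path/to/eulaurarien/subdomains/my-firefox.custom.list
    write_list(firefox_hosts(sys.argv[1]), sys.argv[2])
```
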
### Collect subdomains from websites
This step is optional if you already added personal sources.
Just run `collect_subdomain.sh`.
This is a long step, and might be memory-intensive from time to time.
### Extract tracking domains
Make sure your system is configured with a DNS server that does not rate-limit queries, since this step resolves a large number of domains.
Then, run `filter_subdomain.sh`.
The resulting files will be in the `dist` folder.
## Contributing
### Adding websites
Just add the URL to the relevant list: `websites/<source>.list`.
### Adding first-party tracker regexes
Just add them to `regexes.py`.
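
When writing a new pattern, escaping the dots and anchoring the end of the name avoids matching unrelated domains. The sketch below assumes, hypothetically, that patterns are compiled Python regexes tested against CNAME targets; check `regexes.py` for its actual structure before adding entries.

```python
import re

# Too loose: "." matches any character and nothing anchors the match.
LOOSE = re.compile(r"trackercompany.com")

# Safer: escaped dots, at least one subdomain label, anchored at both ends.
STRICT = re.compile(r"^(?:[a-z0-9-]+\.)+trackercompany\.com$")


def is_tracker(cname, pattern):
    """Return True if the CNAME target matches the given pattern."""
    return pattern.search(cname) is not None


if __name__ == "__main__":
    samples = [
        "website1.trackercompany.com",     # intended match
        "a.b.trackercompany.com",          # deeper subdomain, also intended
        "trackercompany.com.example.org",  # unrelated domain containing the name
        "mytrackercompany-com.net",        # "." matched "-"
        "nottrackercompany.com",           # different registered domain
    ]
    for name in samples:
        print(name, is_tracker(name, LOOSE), is_tracker(name, STRICT))
```

Only the first two samples match `STRICT`, while `LOOSE` matches all five.
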