Added possibility to add personal sources
parent
333ae4eb66
commit
a0a2af281f
2
.gitignore
vendored
|
@@ -1,3 +1 @@
|
||||||
*.list
|
|
||||||
!websites.list
|
|
||||||
*.log
|
*.log
|
||||||
|
|
36
README.md
|
@@ -27,8 +27,10 @@ That's where this scripts comes in, to generate a list of such subdomains.
|
||||||
It takes an input a list of websites with trackers included.
|
It takes an input a list of websites with trackers included.
|
||||||
So far, this list is manually-generated from the list of clients of such first-party trackers
|
So far, this list is manually-generated from the list of clients of such first-party trackers
|
||||||
(latter we should use a general list of websites to be more exhaustive).
|
(latter we should use a general list of websites to be more exhaustive).
|
||||||
|
|
||||||
It open each ones of those websites (just the homepage) in a web browser, and record the domains of the network requests the page makes.
|
It open each ones of those websites (just the homepage) in a web browser, and record the domains of the network requests the page makes.
|
||||||
|
|
||||||
|
Additionaly, or alternatively, you can feed the script some browsing history and get domains from there.
|
||||||
|
|
||||||
It then find the DNS redirections of those domains, and compare with regexes of known tracking domains.
|
It then find the DNS redirections of those domains, and compare with regexes of known tracking domains.
|
||||||
It finally outputs the matching ones.
|
It finally outputs the matching ones.
|
||||||
|
|
||||||
|
@@ -38,19 +40,43 @@ Just to build the list, you can find an already-built list in the releases.
|
||||||
|
|
||||||
- Bash
|
- Bash
|
||||||
- Python 3.4+
|
- Python 3.4+
|
||||||
|
- [progressbar2](https://pypi.org/project/progressbar2/)
|
||||||
|
- dnspython
|
||||||
|
|
||||||
|
(if you don't want to collect the subdomains, you can skip the following)
|
||||||
|
|
||||||
- Firefox
|
- Firefox
|
||||||
- Selenium
|
- Selenium
|
||||||
- seleniumwire
|
- seleniumwire
|
||||||
- dnspython
|
|
||||||
- [progressbar2](https://pypi.org/project/progressbar2/)
|
|
||||||
|
|
||||||
And then just run `eulaurarien.sh`.
|
## Usage
|
||||||
|
|
||||||
|
### Add personal sources
|
||||||
|
|
||||||
|
The list of websites provided in this script is by no mean exhaustive,
|
||||||
|
so adding your own browsing history will help create a better list.
|
||||||
|
Here's reference command for possible sources:
|
||||||
|
|
||||||
|
- **Pi-hole**: `sqlite3 /etc/pihole-FTL.db "select distinct domain from queries" > /path/to/eulaurarien/subdomains/my-pihole.custom.list`
|
||||||
|
- **Firefox**: `cp ~/.mozilla/firefox/<your_profile>.default/places.sqlite temp; sqlite3 temp "select distinct rev_host from moz_places" | rev | sed 's|^\.||' > /path/to/eulaurarien/subdomains/my-firefox.custom.list`
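The Firefox command above reverses each `rev_host` value and strips the leading dot. As a minimal sketch of that one transformation in Python (the function name `rev_host_to_domain` is hypothetical and not part of this commit; it assumes, as the `rev | sed` pipeline implies, that `rev_host` holds the host name reversed with a trailing dot):

```python
def rev_host_to_domain(rev_host):
    """Turn a Firefox `rev_host` value back into a regular host name."""
    # Reverse the characters, as `rev` does for each line.
    domain = rev_host.strip()[::-1]
    # Drop a single leading dot, as `sed 's|^\.||'` does.
    if domain.startswith("."):
        domain = domain[1:]
    return domain
```

For instance, the value `moc.elpmaxe.www.` would come back as `www.example.com`.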
|
||||||
|
|
||||||
|
### Collect subdomains from websites
|
||||||
|
|
||||||
|
This step is optional if you already added personal sources.
|
||||||
|
Just run `collect_subdomain.sh`.
|
||||||
|
This is a long step, and might be memory-intensive from time to time.
|
||||||
|
|
||||||
|
### Extract tracking domains
|
||||||
|
|
||||||
|
Make sure your system is configured with a DNS server without limitation.
|
||||||
|
Then, run `filter_subdomain.sh`.
|
||||||
|
The files you need will be in the folder `dist`.
|
||||||
|
|
||||||
## Contributing
|
## Contributing
|
||||||
|
|
||||||
### Adding websites
|
### Adding websites
|
||||||
|
|
||||||
Just add them to `websites.list`.
|
Just add the URL to the relevant list: `websites/<source>.list`.
|
||||||
|
|
||||||
### Adding first-party trackers regex
|
### Adding first-party trackers regex
|
||||||
|
|
||||||
|
|
7
collect_subdomains.sh
Executable file
|
@@ -0,0 +1,7 @@
|
||||||
|
#!/usr/bin/env bash
|
||||||
|
|
||||||
|
# Get all subdomains accessed by each website in the website list
|
||||||
|
|
||||||
|
cat websites/*.list | sort -u > temp/all_websites.list
|
||||||
|
./collect_subdomains.py temp/all_websites.list > temp/subdomains_from_websites.list
|
||||||
|
sort -u temp/subdomains_from_websites.list > subdomains/from_websites.cache.list
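The `cat ... | sort -u` steps in this script merge several list files into one de-duplicated list. A rough Python equivalent of that merge might look like the sketch below. The name `merge_unique` is hypothetical, and two behaviours are assumptions that differ slightly from `sort -u`: blank lines are skipped, and ordering follows Python's code-point comparison rather than the shell locale.

```python
def merge_unique(sources):
    """Merge several iterables of lines into one sorted, de-duplicated list."""
    seen = set()
    for lines in sources:
        for line in lines:
            entry = line.strip()
            if entry:  # assumption: blank lines carry no entry and are skipped
                seen.add(entry)
    return sorted(seen)
```

Passing one open file object per path matched by `websites/*.list` would approximate the first line of the script.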
|
1
dist/.gitignore
vendored
Normal file
|
@@ -0,0 +1 @@
|
||||||
|
*.txt
|
|
@@ -2,21 +2,6 @@
|
||||||
|
|
||||||
# Main script for eulaurarien
|
# Main script for eulaurarien
|
||||||
|
|
||||||
# Get all subdomains accessed by each website in the website list
|
./collect_subdomains.sh
|
||||||
./collect_subdomains.py websites.list > subdomains.list
|
./filter_subdomains.sh
|
||||||
sort -u subdomains.list > subdomains.sorted.list
|
|
||||||
|
|
||||||
# Filter out the subdomains not pointing to a first-party tracker
|
|
||||||
./filter_subdomains.py subdomains.sorted.list > toblock.list
|
|
||||||
sort -u toblock.list > toblock.sorted.list
|
|
||||||
|
|
||||||
# Format the blocklist so it can be used as a hostlist
|
|
||||||
|
|
||||||
(
|
|
||||||
echo "# First party trackers"
|
|
||||||
echo "# List generated on $(date -Isec) by eulaurarien $(git describe --tags --dirty)"
|
|
||||||
cat toblock.sorted.list | while read host;
|
|
||||||
do
|
|
||||||
echo "0.0.0.0 $host"
|
|
||||||
done
|
|
||||||
) > toblock.hosts.list
|
|
||||||
|
|
18
filter_subdomains.sh
Executable file
|
@@ -0,0 +1,18 @@
|
||||||
|
#!/usr/bin/env bash
|
||||||
|
|
||||||
|
# Filter out the subdomains not pointing to a first-party tracker
|
||||||
|
|
||||||
|
cat subdomains/*.list | sort -u > temp/all_subdomains.list
|
||||||
|
./filter_subdomains.py temp/all_subdomains.list > temp/all_toblock.list
|
||||||
|
sort -u temp/all_toblock.list > dist/firstparty-trackers.txt
|
||||||
|
|
||||||
|
# Format the blocklist so it can be used as a hostlist
|
||||||
|
|
||||||
|
(
|
||||||
|
echo "# First-party trackers"
|
||||||
|
echo "# List generated on $(date -Isec) by eulaurarien $(git describe --tags --dirty)"
|
||||||
|
cat dist/firstparty-trackers.txt | while read host;
|
||||||
|
do
|
||||||
|
echo "0.0.0.0 $host"
|
||||||
|
done
|
||||||
|
) > dist/firstparty-trackers-hosts.txt
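The subshell above writes two comment lines followed by one `0.0.0.0 <host>` entry per blocked domain. The same layout can be sketched in Python as follows. The function `format_hosts` and its parameters are hypothetical; the timestamp and version are passed in rather than computed, standing in for `date -Isec` and `git describe --tags --dirty`.

```python
def format_hosts(hosts, generated_on, version):
    """Render a hosts-file blocklist in the layout used by filter_subdomains.sh."""
    lines = [
        "# First-party trackers",
        "# List generated on {} by eulaurarien {}".format(generated_on, version),
    ]
    # One null-routed entry per host, in the order given.
    lines.extend("0.0.0.0 {}".format(host) for host in hosts)
    return "\n".join(lines) + "\n"
```

`str.format` is used instead of f-strings so the sketch stays within the Python 3.4+ requirement stated in the README.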
|
|
@@ -4,6 +4,8 @@
|
||||||
List of regex matching first-party trackers.
|
List of regex matching first-party trackers.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
|
# Syntax: https://docs.python.org/3/library/re.html#regular-expression-syntax
|
||||||
|
|
||||||
REGEXES = [
|
REGEXES = [
|
||||||
r'^.+\.eulerian\.net\.$',
|
r'^.+\.eulerian\.net\.$',
|
||||||
r'^.+\.criteo\.com\.$',
|
r'^.+\.criteo\.com\.$',
|
||||||
|
|
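To illustrate how the patterns in this hunk behave, here is a minimal matching sketch. It copies only the two regexes visible above, and the helper `is_tracker_target` is hypothetical: `filter_subdomains.py` is not part of this diff, so this does not claim to be how the project applies them. It assumes names are given in absolute form with a trailing dot, as the patterns require.

```python
import re

# Only the two patterns visible in the hunk; the real list may be longer.
REGEXES = [
    r'^.+\.eulerian\.net\.$',
    r'^.+\.criteo\.com\.$',
]
COMPILED = [re.compile(regex) for regex in REGEXES]


def is_tracker_target(name):
    """Return True if an absolute DNS name matches any known tracker pattern."""
    return any(pattern.match(name) for pattern in COMPILED)
```

Because each pattern starts with `.+\.`, it matches subdomains of the tracker domain but not the bare domain itself, and a name without the trailing dot does not match.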
2
subdomains/.gitignore
vendored
Normal file
|
@@ -0,0 +1,2 @@
|
||||||
|
*.custom.list
|
||||||
|
*.cache.list
|
1
temp/.gitignore
vendored
Normal file
|
@@ -0,0 +1 @@
|
||||||
|
*.list
|
1
websites/.gitignore
vendored
Normal file
|
@@ -0,0 +1 @@
|
||||||
|
*.custom.list
|