Added possibility to add personal sources
This commit is contained in:
parent
333ae4eb66
commit
a0a2af281f
2
.gitignore
vendored
2
.gitignore
vendored
|
@ -1,3 +1 @@
|
|||
*.list
|
||||
!websites.list
|
||||
*.log
|
||||
|
|
36
README.md
36
README.md
|
@ -27,8 +27,10 @@ That's where this scripts comes in, to generate a list of such subdomains.
|
|||
It takes an input a list of websites with trackers included.
|
||||
So far, this list is manually-generated from the list of clients of such first-party trackers
|
||||
(latter we should use a general list of websites to be more exhaustive).
|
||||
|
||||
It open each ones of those websites (just the homepage) in a web browser, and record the domains of the network requests the page makes.
|
||||
|
||||
Additionaly, or alternatively, you can feed the script some browsing history and get domains from there.
|
||||
|
||||
It then find the DNS redirections of those domains, and compare with regexes of known tracking domains.
|
||||
It finally outputs the matching ones.
|
||||
|
||||
|
@ -38,19 +40,43 @@ Just to build the list, you can find an already-built list in the releases.
|
|||
|
||||
- Bash
|
||||
- Python 3.4+
|
||||
- [progressbar2](https://pypi.org/project/progressbar2/)
|
||||
- dnspython
|
||||
|
||||
(if you don't want to collect the subdomains, you can skip the following)
|
||||
|
||||
- Firefox
|
||||
- Selenium
|
||||
- seleniumwire
|
||||
- dnspython
|
||||
- [progressbar2](https://pypi.org/project/progressbar2/)
|
||||
|
||||
And then just run `eulaurarien.sh`.
|
||||
## Usage
|
||||
|
||||
### Add personal sources
|
||||
|
||||
The list of websites provided in this script is by no mean exhaustive,
|
||||
so adding your own browsing history will help create a better list.
|
||||
Here's reference command for possible sources:
|
||||
|
||||
- **Pi-hole**: `sqlite3 /etc/pihole-FTL.db "select distinct domain from queries" > /path/to/eulaurarien/subdomains/my-pihole.custom.list`
|
||||
- **Firefox**: `cp ~/.mozilla/firefox/<your_profile>.default/places.sqlite temp; sqlite3 temp "select distinct rev_host from moz_places" | rev | sed 's|^\.||' > /path/to/eulaurarien/subdomains/my-firefox.custom.list`
|
||||
|
||||
### Collect subdomains from websites
|
||||
|
||||
This step is optional if you already added personal sources.
|
||||
Just run `collect_subdomain.sh`.
|
||||
This is a long step, and might be memory-intensive from time to time.
|
||||
|
||||
### Extract tracking domains
|
||||
|
||||
Make sure your system is configured with a DNS server without limitation.
|
||||
Then, run `filter_subdomain.sh`.
|
||||
The files you need will be in the folder `dist`.
|
||||
|
||||
## Contributing
|
||||
|
||||
### Adding websites
|
||||
|
||||
Just add them to `websites.list`.
|
||||
Just add the URL to the relevant list: `websites/<source>.list`.
|
||||
|
||||
### Adding first-party trackers regex
|
||||
|
||||
|
|
7
collect_subdomains.sh
Executable file
7
collect_subdomains.sh
Executable file
|
@ -0,0 +1,7 @@
|
|||
#!/usr/bin/env bash
|
||||
|
||||
# Get all subdomains accessed by each website in the website list
|
||||
|
||||
cat websites/*.list | sort -u > temp/all_websites.list
|
||||
./collect_subdomains.py temp/all_websites.list > temp/subdomains_from_websites.list
|
||||
sort -u temp/subdomains_from_websites.list > subdomains/from_websites.cache.list
|
1
dist/.gitignore
vendored
Normal file
1
dist/.gitignore
vendored
Normal file
|
@ -0,0 +1 @@
|
|||
*.txt
|
|
@ -2,21 +2,6 @@
|
|||
|
||||
# Main script for eulaurarien
|
||||
|
||||
# Get all subdomains accessed by each website in the website list
|
||||
./collect_subdomains.py websites.list > subdomains.list
|
||||
sort -u subdomains.list > subdomains.sorted.list
|
||||
./collect_subdomains.sh
|
||||
./filter_subdomains.sh
|
||||
|
||||
# Filter out the subdomains not pointing to a first-party tracker
|
||||
./filter_subdomains.py subdomains.sorted.list > toblock.list
|
||||
sort -u toblock.list > toblock.sorted.list
|
||||
|
||||
# Format the blocklist so it can be used as a hostlist
|
||||
|
||||
(
|
||||
echo "# First party trackers"
|
||||
echo "# List generated on $(date -Isec) by eulaurarien $(git describe --tags --dirty)"
|
||||
cat toblock.sorted.list | while read host;
|
||||
do
|
||||
echo "0.0.0.0 $host"
|
||||
done
|
||||
) > toblock.hosts.list
|
||||
|
|
18
filter_subdomains.sh
Executable file
18
filter_subdomains.sh
Executable file
|
@ -0,0 +1,18 @@
|
|||
#!/usr/bin/env bash
|
||||
|
||||
# Filter out the subdomains not pointing to a first-party tracker
|
||||
|
||||
cat subdomains/*.list | sort -u > temp/all_subdomains.list
|
||||
./filter_subdomains.py temp/all_subdomains.list > temp/all_toblock.list
|
||||
sort -u temp/all_toblock.list > dist/firstparty-trackers.txt
|
||||
|
||||
# Format the blocklist so it can be used as a hostlist
|
||||
|
||||
(
|
||||
echo "# First-party trackers"
|
||||
echo "# List generated on $(date -Isec) by eulaurarien $(git describe --tags --dirty)"
|
||||
cat dist/firstparty-trackers.txt | while read host;
|
||||
do
|
||||
echo "0.0.0.0 $host"
|
||||
done
|
||||
) > dist/firstparty-trackers-hosts.txt
|
|
@ -4,6 +4,8 @@
|
|||
List of regex matching first-party trackers.
|
||||
"""
|
||||
|
||||
# Syntax: https://docs.python.org/3/library/re.html#regular-expression-syntax
|
||||
|
||||
REGEXES = [
|
||||
r'^.+\.eulerian\.net\.$',
|
||||
r'^.+\.criteo\.com\.$',
|
||||
|
|
2
subdomains/.gitignore
vendored
Normal file
2
subdomains/.gitignore
vendored
Normal file
|
@ -0,0 +1,2 @@
|
|||
*.custom.list
|
||||
*.cache.list
|
1
temp/.gitignore
vendored
Normal file
1
temp/.gitignore
vendored
Normal file
|
@ -0,0 +1 @@
|
|||
*.list
|
1
websites/.gitignore
vendored
Normal file
1
websites/.gitignore
vendored
Normal file
|
@ -0,0 +1 @@
|
|||
*.custom.list
|
Loading…
Reference in a new issue