Improved rules handling

Rules can now come in 3 different formats:
- AdBlock rules
- Host lists
- Domain lists
All will be converted into domain lists and aggregated
(only AdBlock rules matching a whole domain will be kept).
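
A minimal sketch of that conversion, not the code from this commit. It assumes that an AdBlock rule "matches a whole domain" only when it has the exact form `||domain^`, and that host lists use `0.0.0.0` or `127.0.0.1` as the sink address. All function names are hypothetical.

```python
import re

# Assumption: only rules of the exact form "||domain^" (no path, no options)
# block a whole domain. Everything else is dropped.
ADBLOCK_WHOLE_DOMAIN = re.compile(r"^\|\|([a-z0-9.-]+)\^$")
# Assumption: sink addresses commonly used in host lists.
HOSTS_SINK_ADDRESSES = {"0.0.0.0", "127.0.0.1"}


def strip_comment(line):
    """Remove a trailing '#' comment and surrounding whitespace."""
    return line.split("#", 1)[0].strip()


def domain_from_adblock(line):
    """Return the domain of a whole-domain AdBlock rule, else None."""
    match = ADBLOCK_WHOLE_DOMAIN.match(line.strip().lower())
    return match.group(1) if match else None


def domain_from_hosts(line):
    """Return the hostname of a hosts-file entry, else None."""
    fields = strip_comment(line).lower().split()
    # Require a dot so that entries such as "localhost" are skipped.
    if len(fields) >= 2 and fields[0] in HOSTS_SINK_ADDRESSES and "." in fields[1]:
        return fields[1]
    return None


def domain_from_list(line):
    """Return the domain of a plain domain-list entry, else None."""
    return strip_comment(line).lower() or None


def aggregate(adblock_lines, hosts_lines, domain_lines):
    """Convert all three formats to domains and merge them, deduplicated."""
    domains = set()
    sources = (
        (domain_from_adblock, adblock_lines),
        (domain_from_hosts, hosts_lines),
        (domain_from_list, domain_lines),
    )
    for parser, lines in sources:
        for line in lines:
            domain = parser(line)
            if domain:
                domains.add(domain)
    return sorted(domains)
```

With this sketch, a rule such as `||example.com/banner.js` or `||example.com^$third-party` is discarded, because it does not cover the whole domain.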

A subdomain is now matched if it falls under any domain in the rules.
This is much faster (seconds rather than hours!) but less flexible
(although that shouldn't be a problem).
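
One way such suffix matching can be done is sketched below. It is not taken from this commit: the function name `matching_rule` is hypothetical, and the rules are assumed to be held in a Python set of lowercase domains. Each hostname needs only one set lookup per label, which is consistent with the kind of speed-up described above.

```python
def matching_rule(hostname, rule_domains):
    """Return the rule domain that hostname equals or is a subdomain of.

    rule_domains is assumed to be a set of lowercase domains.
    Returns None when no rule applies.
    """
    labels = hostname.lower().rstrip(".").split(".")
    # Try the full name first, then each parent domain in turn:
    # "a.b.example.com" -> "b.example.com" -> "example.com" -> "com"
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in rule_domains:
            return candidate
    return None
```

Matching on whole labels means `notexample.com` is not caught by a rule for `example.com`, while `a.b.example.com` is.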
Geoffrey Frogeye 2019-12-03 08:48:12 +01:00
parent c23004fbff
commit 69b82d29fd
Signed by: geoffrey
GPG key ID: D8A7ECA00A8CD3DD
11 changed files with 130 additions and 28 deletions

resolve_subdomains.sh Normal file

@@ -0,0 +1,7 @@
#!/usr/bin/env bash
# Resolve the CNAME chain of all the known subdomains for later analysis
cat subdomains/*.list | sort -u > temp/all_subdomains.list
./resolve_subdomains.py --input temp/all_subdomains.list --output temp/all_resolved.csv
sort -u temp/all_resolved.csv > temp/all_resolved_sorted.csv