Commit graph

11 commits

Author SHA1 Message Date
b310ca2fc2
Clever pruning mechanism 2019-12-25 14:54:57 +01:00
c65ae94892
Added ability to use Rapid7 API
Closes #11
2019-12-24 15:08:18 +01:00
7d1c1a1d54
Implement pruning 2019-12-21 19:38:20 +01:00
aca5023c3f
Fixed scripting around 2019-12-18 13:01:32 +01:00
5023b85d7c
Added intermediate representation for DNS datasets
It's just CSV.
The DNS records from the datasets are not ordered consistently,
so we need to parse them completely.
It seems that converting to an IR before sending data to ./feed_dns.py
through a pipe is faster than decoding the JSON in ./feed_dns.py.
This will also reduce the storage of the resolved subdomains by
about 15% (compressed).
2019-12-13 21:59:35 +01:00
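
The conversion step described in this commit could look roughly like the sketch below. It is not the repository's actual code: the field names (`name`, `type`, `value`), the column order, and the function name `json_to_csv` are assumptions made for illustration. The idea is to decode the JSON once, upstream, and hand flat CSV rows to the consumer.

```python
import csv
import io
import json
from typing import Iterable, TextIO


def json_to_csv(lines: Iterable[str], out: TextIO) -> int:
    """Convert JSON-lines DNS records to CSV rows; return the number of rows written."""
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Keys may appear in any order, so each record is parsed in full.
        record = json.loads(line)
        # Assumed field names and column order (hypothetical, for illustration).
        writer.writerow([record["type"], record["name"], record["value"]])
        count += 1
    return count


# Two sample records with differently ordered keys.
sample = [
    '{"name": "www.example.com", "type": "cname", "value": "example.com"}',
    '{"value": "93.184.216.34", "type": "a", "name": "example.com"}',
]
buffer = io.StringIO()
rows_written = json_to_csv(sample, buffer)
csv_text = buffer.getvalue()
```

In a pipeline, `lines` would be `sys.stdin` and `out` would be `sys.stdout`, so the output can be piped straight into the consuming script.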
8d94b80fd0
Integrated DNS resolving to workflow
Since the bigger datasets are only updated once a month,
this might help for quick updates.
2019-12-13 13:38:23 +01:00
2b0a723c30
Fix log in scripts
Closes #8
2019-12-07 18:45:48 +01:00
fe5f0c6c05
Added more rule sources 2019-12-03 17:33:46 +01:00
0159c6037c
Improved DNS resolving performance
Also various fixes.
Also some debug stuff, make sure to remove that later.
2019-12-03 15:35:21 +01:00
c609b90390
Append top 1M subdomains rather than replacing it 2019-12-03 09:04:19 +01:00
69b82d29fd
Improved rules handling
Rules can now come in 3 different formats:
- AdBlock rules
- Host lists
- Domains lists
All will be converted into domain lists and aggregated
(only AdBlock rules matching a whole domain will be kept).

A subdomain will now be matched if it is a subdomain of any domain in the
rules.
It is way faster (seconds rather than hours!) but less flexible
(although it shouldn't be a problem).
2019-12-03 08:48:12 +01:00
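
The rule handling described in this commit can be sketched as follows. This is a minimal illustration, not the project's implementation: the regular expression is a simplified notion of "an AdBlock rule matching a whole domain", the function names `rule_to_domain` and `is_blocked` are hypothetical, and treating a listed domain as matching itself is an assumption. The suffix lookup against a set is one plausible way to get fast matching.

```python
import re
from typing import Optional, Set

# Simplified pattern for AdBlock rules that cover one whole domain, e.g. "||example.com^".
ADBLOCK_WHOLE_DOMAIN = re.compile(r"^\|\|([a-z0-9.-]+)\^$")


def rule_to_domain(line: str) -> Optional[str]:
    """Reduce an AdBlock rule, hosts line, or bare domain to a domain, or None."""
    line = line.strip().lower()
    if not line or line.startswith(("#", "!")):
        return None  # blank line or comment
    match = ADBLOCK_WHOLE_DOMAIN.match(line)
    if match:
        return match.group(1)
    if any(ch in line for ch in "|^/$*"):
        return None  # AdBlock rule that does not cover a whole domain
    parts = line.split()
    if len(parts) == 2:
        return parts[1]  # hosts line: "<ip> <hostname>"
    if len(parts) == 1:
        return parts[0]  # plain domain list entry
    return None


def is_blocked(hostname: str, domains: Set[str]) -> bool:
    """True if hostname equals, or is a subdomain of, any domain in the set."""
    labels = hostname.lower().rstrip(".").split(".")
    # Check every suffix of the hostname: one set lookup per label.
    return any(".".join(labels[i:]) in domains for i in range(len(labels)))


rules = [
    "||tracker.example^",       # AdBlock rule matching a whole domain
    "0.0.0.0 ads.example.net",  # hosts list entry
    "metrics.example.org",      # domain list entry
    "||example.com/banner.js",  # AdBlock rule for a single path: dropped
    "! comment",
]
domains = {d for d in map(rule_to_domain, rules) if d is not None}
```

Each hostname costs only as many set lookups as it has labels, regardless of how many rules are loaded. The trade-off is that path-level or wildcard AdBlock rules are discarded.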