35 Commits (fe5f0c6c05613ec47e48086d6eed12797a825bce)

Author SHA1 Message Date
Geoffrey Frogeye fe5f0c6c05
Added more rule sources 2019-12-03 17:33:46 +01:00
Geoffrey Frogeye 0159c6037c
Improved DNS resolving performance
Also various fixes.
Also some debug stuff, make sure to remove that later.
2019-12-03 15:35:21 +01:00
Geoffrey Frogeye c609b90390 Append top 1M subdomains rather than replacing it 2019-12-03 09:04:19 +01:00
Geoffrey Frogeye 69b82d29fd
Improved rules handling
Rules can now come in 3 different formats:
- AdBlock rules
- Host lists
- Domains lists
All will be converted into domain lists and aggregated
(only AdBlock rules matching a whole domain will be kept).

A subdomain will now be matched if it is a subdomain of any domain of the
rule.
It is way faster (seconds rather than hours!) but less flexible
(although it shouldn't be a problem).
2019-12-03 08:48:12 +01:00
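The commit above describes two steps: converting every rule format into a plain domain list, then matching a domain if it is a subdomain of any listed domain. It does not include the code. What follows is a minimal Python sketch of one way to do this, not the project's actual implementation. The names `rule_to_domain`, `load_rules`, `is_matched` and `SINKHOLE_ADDRESSES` are all hypothetical. The sketch also assumes that "AdBlock rules matching a whole domain" means rules of the form `||domain^`, with no path or options. Storing the domains in a set lets each candidate be checked with one lookup per label, rather than testing every rule against every domain. That is plausibly where a "seconds rather than hours" speedup could come from.

```python
from typing import Iterable, Optional, Set

# Hypothetical sketch: the commit does not show the real implementation.
SINKHOLE_ADDRESSES = ("0.0.0.0", "127.0.0.1")


def is_plain_domain(text: str) -> bool:
    """True for strings like 'tracker.example' (letters, digits, '-' and '.')."""
    return "." in text and all(c.isalnum() or c in "-." for c in text)


def rule_to_domain(line: str) -> Optional[str]:
    """Convert one AdBlock / hosts / domain-list line to a domain, or None."""
    line = line.strip()
    if not line or line.startswith(("!", "#")):
        return None  # blank line or comment
    # Assumed form of a whole-domain AdBlock rule: "||tracker.example^"
    if line.startswith("||") and line.endswith("^"):
        domain = line[2:-1]
        return domain.lower() if is_plain_domain(domain) else None
    parts = line.split()
    # Hosts-file entry, e.g. "0.0.0.0 tracker.example # optional comment"
    if len(parts) >= 2 and parts[0] in SINKHOLE_ADDRESSES:
        if len(parts) == 2 or parts[2].startswith("#"):
            return parts[1].lower() if is_plain_domain(parts[1]) else None
        return None
    # Plain domain-list entry, e.g. "tracker.example"
    if len(parts) == 1 and is_plain_domain(parts[0]):
        return parts[0].lower()
    return None  # any other AdBlock rule (paths, options, cosmetic) is dropped


def load_rules(lines: Iterable[str]) -> Set[str]:
    """Aggregate every convertible line into one set of domains."""
    return {d for d in map(rule_to_domain, lines) if d is not None}


def is_matched(domain: str, rules: Set[str]) -> bool:
    """True if domain is a rule domain or a subdomain of one."""
    labels = domain.lower().rstrip(".").split(".")
    # Try 'a.b.c', then 'b.c', then 'c': one set lookup per label,
    # instead of testing every rule against every domain.
    return any(".".join(labels[i:]) in rules for i in range(len(labels)))


if __name__ == "__main__":
    rules = load_rules([
        "||tracker.example^",        # AdBlock, whole domain: kept
        "0.0.0.0 ads.example.net",   # hosts entry: kept
        "metrics.example.org",       # domain list entry: kept
        "||example.com/ads/^",       # AdBlock rule with a path: dropped
        "! comment",
    ])
    print(sorted(rules))
    print(is_matched("eulerian.tracker.example", rules))
```

Because matching walks the label suffixes, `notracker.example` is not matched by a rule for `tracker.example`. A plain substring check would wrongly match it.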
Geoffrey Frogeye c23004fbff
Separated DNS resolution from filtering
This effectively removes the parallelism of filtering,
which increases the processing time (from 5 to 8 hours),
but it allows me to experiment with the performance of this step,
which I aim to improve drastically.
2019-12-02 19:03:08 +01:00
Geoffrey Frogeye 7d01d016a5 Can now use AdBlock lists for tracking matching
It's not very performant by itself, especially since pyre2 is neither
maintained nor really compilable/installable anymore.

The performance seems to have decreased from 200 req/s to 0.2 req/s when
using 512 threads, and to 80 req/s when using 64 threads.
This might or might not be related, as the CPU doesn't seem to be the
bottleneck.

I will probably add support for host-based rules, matching the
subdomains of such hosts (for now there doesn't seem to be any
pattern for first-party trackers other than subdomains, and this would
broadly improve both performance and compatibility with existing lists),
and convert the AdBlock lists to this format, only keeping domain-only
rules.
2019-11-15 08:57:31 +01:00
Geoffrey Frogeye 87bb24c511 Shell typo 2019-11-14 15:40:25 +01:00
Geoffrey Frogeye 300fe8e15e Added real argument parser
Just so we can have color output when running the script :)
2019-11-14 15:37:32 +01:00
Geoffrey Frogeye 88f0bcc648 Refactored for correct retry logic 2019-11-14 15:03:20 +01:00
Geoffrey Frogeye b343893c72 Merge branch 'master' of git.frogeye.fr:geoffrey/eulaurarien 2019-11-14 13:45:42 +01:00
Geoffrey Frogeye ae93593930 Statistics about explicit first-parties 2019-11-14 13:31:39 +01:00
Geoffrey Frogeye bdc691e647 Upped timeout 2019-11-14 13:10:14 +01:00
Geoffrey Frogeye 08a8eaaada Use threads not subprocesses
You dumbo
2019-11-14 12:57:06 +01:00
Geoffrey Frogeye 32377229db Retry failed requests 2019-11-14 11:35:05 +01:00
Geoffrey Frogeye 04fe454d99 Automatically get top 1M subdomains 2019-11-14 11:23:59 +01:00
Geoffrey Frogeye 7df00fc859 Automatically download nameserver list 2019-11-14 10:56:53 +01:00
Geoffrey Frogeye 1bbc17a8ec Greatly optimized subdomain filtering 2019-11-14 10:45:06 +01:00
Geoffrey Frogeye 00a0020914 Added some delay for website subdomain collection
Some websites load their trackers after the page is done loading.
2019-11-14 06:29:24 +01:00
Geoffrey Frogeye 56374e3223 Added RED by SFR website 2019-11-13 18:14:56 +01:00
Geoffrey Frogeye b17a24c047 Added more trackers and their clients 2019-11-12 13:58:17 +01:00
Geoffrey Frogeye 1c86255bb9 Added list of websites containing EA_data 2019-11-11 15:44:03 +01:00
Geoffrey Frogeye 7a7a3642a5 Added number of trackers in output 2019-11-11 13:00:14 +01:00
Geoffrey Frogeye 4e69bdbfc3 CI Test commit 2 2019-11-11 12:41:22 +01:00
Geoffrey Frogeye aab8e93abe CI Test commit 1 2019-11-11 12:31:32 +01:00
Geoffrey Frogeye e0f28d41d2 Added public updated list link 2019-11-11 12:10:46 +01:00
Geoffrey Frogeye a0a2af281f Added possibility to add personal sources 2019-11-11 11:19:46 +01:00
Geoffrey Frogeye 333ae4eb66 Fixed tracker list 2019-11-10 23:58:49 +01:00
Geoffrey Frogeye 0df749f1e0 Added more trackers 2019-11-10 23:29:30 +01:00
Geoffrey Frogeye b81c7c17ee Loosely error-proofed subdomain collection 2019-11-10 23:22:21 +01:00
Geoffrey Frogeye ed72f643fd Updated website list 2019-11-10 23:16:18 +01:00
Geoffrey Frogeye c409c2cf9b More error-proofing 2019-11-10 23:07:21 +01:00
Geoffrey Frogeye 0801bd9e44 Error-proofed DNS resolution 2019-11-10 22:18:27 +01:00
Geoffrey Frogeye 2f1af3c850 Added progressbar and ETA 2019-11-10 21:59:06 +01:00
Geoffrey Frogeye d49a7803e9 Fixed typos 2019-11-10 18:29:16 +01:00
Geoffrey Frogeye 80b23e2d5c Initial commit 2019-11-10 18:14:25 +01:00