Commit graph

26 commits

Author SHA1 Message Date
Geoffrey Frogeye cec96b7e50
Add Fukuda & co research paper to test suite 2020-12-06 22:13:05 +01:00
Geoffrey Frogeye 0ecb431728 Add AdGuard for multiparty 2020-12-06 21:01:24 +01:00
Geoffrey Frogeye 171fa93873
Force pv output
Even if redirected to a file
Allows seeing progress when run from cron or similar
2019-12-26 15:38:56 +01:00
Geoffrey Frogeye 1a6e64da3d
Forgot numpy dependency 2019-12-20 21:08:21 +01:00
Geoffrey Frogeye cd46b39756
Merge branch 'newworkflow' 2019-12-20 17:18:42 +01:00
Geoffrey Frogeye 38cf532854
Updated README
Split in two actually (program and list).

Closes #3

Also,
Closes #1
Because I forgot to do it earlier.
2019-12-20 17:15:39 +01:00
Geoffrey Frogeye 4a22054796
Added optional cache for faster IP matching 2019-12-18 21:40:24 +01:00
Geoffrey Frogeye aca5023c3f
Fixed scripting around 2019-12-18 13:01:32 +01:00
Geoffrey Frogeye dce35cb299
Harder verification before adding entries to DB 2019-12-17 19:53:05 +01:00
Geoffrey Frogeye 040ce4c14e
Typo in source 2019-12-15 01:52:45 +01:00
Geoffrey Frogeye b50c01f740 Merge branch 'master' into newworkflow 2019-12-15 01:30:03 +01:00
Geoffrey Frogeye d7c239a6f6 Workflow: Some modifications 2019-12-14 16:04:19 +01:00
Geoffrey Frogeye b076fa6c34 Typo in new source URL 2019-12-12 23:28:00 +01:00
Geoffrey Frogeye 12dcafe606
Added alternate source of Eulerian CNAMES
It was requested so.
It should be temporary, once I have a bigger subdomain list
that shouldn't be required.
2019-12-12 19:13:54 +01:00
Geoffrey Frogeye 2b0a723c30
Fix log in scripts
Closes #8
2019-12-07 18:45:48 +01:00
Geoffrey Frogeye 95d4535abd
Nitpicking 2019-12-05 19:38:26 +01:00
Geoffrey Frogeye 188a8f7455
Removed another source of false-positives 2019-12-05 00:50:32 +01:00
Geoffrey Frogeye 08f25e26ba
Removed false-positive source
Also had edgekey.net for blocking.

Thanks @TorchedPoseidon for the report!
2019-12-03 21:27:37 +01:00
Geoffrey Frogeye 8c744d621e
Removed too restrictive source
Was blocking ssl.ovh.net and akamai.net
2019-12-03 18:43:23 +01:00
Geoffrey Frogeye fe5f0c6c05
Added more rule sources 2019-12-03 17:33:46 +01:00
Geoffrey Frogeye 0159c6037c
Improved DNS resolving performance
Also various fixes.
Also some debug stuff, make sure to remove that later.
2019-12-03 15:35:21 +01:00
Geoffrey Frogeye c609b90390 Append top 1M subdomains rather than replacing it 2019-12-03 09:04:19 +01:00
Geoffrey Frogeye 69b82d29fd
Improved rules handling
Rules can now come in 3 different formats:
- AdBlock rules
- Host lists
- Domains lists
All will be converted into domain lists and aggregated
(only AdBlock rules matching a whole domain will be kept).

A name will now be matched if it is a subdomain of any domain in the
rules.
It is way faster (seconds rather than hours!) but less flexible
(although it shouldn't be a problem).
2019-12-03 08:48:12 +01:00
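
The conversion and matching described in the commit above can be sketched in Python roughly as follows. This is an illustration under assumptions, not the repository's code: the names `rule_to_domain`, `load_domains` and `is_tracked` are hypothetical, only the simple `||domain^` AdBlock form is treated as "matching a whole domain", and a name equal to a listed domain is assumed to match as well.

```python
import re
from typing import Iterable, Optional, Set

# Hypothetical helper names; none of them are taken from the repository.
DOMAIN_RE = re.compile(r"^(?:[a-z0-9_](?:[a-z0-9_-]*[a-z0-9_])?\.)+[a-z0-9-]+$")


def rule_to_domain(line: str) -> Optional[str]:
    """Convert one line (AdBlock rule, hosts entry or bare domain) to a domain."""
    line = line.strip().lower()
    if not line or line.startswith(("!", "#")):
        return None  # blank line, comment, or cosmetic rule
    if line.startswith("||") and line.endswith("^"):
        candidate = line[2:-1]  # AdBlock rule covering a whole domain
    else:
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("0.0.0.0", "127.0.0.1"):
            candidate = parts[1]  # hosts-file entry
        elif len(parts) == 1:
            candidate = parts[0]  # plain domain-list entry
        else:
            return None
    # Anything with paths, options or wildcards fails this check and is dropped.
    return candidate if DOMAIN_RE.match(candidate) else None


def load_domains(lines: Iterable[str]) -> Set[str]:
    """Aggregate every usable rule into a single set of domains."""
    return {d for d in map(rule_to_domain, lines) if d is not None}


def is_tracked(name: str, domains: Set[str]) -> bool:
    """True if name equals, or is a subdomain of, any domain in the set."""
    labels = name.lower().rstrip(".").split(".")
    # Test every suffix of the name: one set lookup per label.
    return any(".".join(labels[i:]) in domains for i in range(len(labels)))
```

Each lookup costs one set membership test per label of the name, regardless of how many rules are loaded, which is consistent with the speed-up the commit reports over pattern-based matching.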
Geoffrey Frogeye 7d01d016a5 Can now use AdBlock lists for tracking matching
It's not very performant by itself, especially since pyre2 isn't
maintained nor really compilable/installable anymore.

The performance seems to have decreased from 200 req/s to 0.2 req/s when
using 512 threads, and to 80 req/s using 64 threads.
This might or might not be related, as the CPU doesn't seem to be the
bottleneck.

I will probably add support for host-based rules, matching the
subdomains of such hosts (as for now there doesn't seem to be any other
pattern for first-party trackers than subdomains, and this would be a
very broad performance / compatibility with existing lists improvement),
and convert the AdBlock lists to this format, only keeping domains-only
rules.
2019-11-15 08:57:31 +01:00
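
Since the throughput figures in the commit above depend on the number of worker threads, here is a minimal sketch of how such a measurement could be set up with a thread pool. It is an assumption-laden illustration, not the project's resolver: `resolve_all` is a hypothetical name, and the `resolve` callable is supplied by the caller (any DNS lookup function, or a stub for testing).

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def resolve_all(
    names: Iterable[str],
    resolve: Callable[[str], T],
    workers: int = 64,
) -> Tuple[List[Tuple[str, T]], float]:
    """Run resolve() on every name in a thread pool; return results and req/s."""
    names = list(names)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as the input names.
        results = list(pool.map(resolve, names))
    elapsed = time.perf_counter() - start
    rate = len(names) / elapsed if elapsed > 0 else float("inf")
    return list(zip(names, results)), rate
```

Calling it with different `workers` values (for example 64 and 512) on the same list of names gives a rough way to compare request rates; an exception raised by `resolve` propagates to the caller.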
Geoffrey Frogeye 08a8eaaada Use threads not subprocesses
You dumbo
2019-11-14 12:57:06 +01:00
Geoffrey Frogeye 04fe454d99 Automatically get top 1M subdomains 2019-11-14 11:23:59 +01:00