66ac52c5db
Workflow: JSON parser acceleration
...
Sadly, it is even worse because of the ctypes-induced conversions.
2019-12-09 10:42:37 +01:00
55877be891
IP parsing C accelerated, use bytes everywhere
2019-12-09 09:47:48 +01:00
7937496882
Workflow: Base for new one
...
Until I automate this, you'll need to download the A set from
https://opendata.rapid7.com/sonar.fdns_v2/ to the file a.json.gz.
2019-12-09 08:12:48 +01:00
62e6c9005b
Tracker: intendmedia?
2019-12-08 01:32:49 +01:00
dc44dea505
Optimized IP matching
2019-12-08 01:23:36 +01:00
b634ae5bbd
Updated IP ranges for Criteo
2019-12-07 23:23:39 +01:00
16f8bed887
Tracker: Otto Group
2019-12-07 21:30:15 +01:00
d6df0fd4f9
Tracker: Webtrekk
2019-12-07 21:21:33 +01:00
4dd3d4a64b
Preliminary structure for testing
...
In preparation for #4
2019-12-07 19:19:37 +01:00
ae71d6b204
Tracker: 2o7
2019-12-07 19:17:18 +01:00
2b0a723c30
Fix log in scripts
...
Closes #8
2019-12-07 18:45:48 +01:00
0b2eb000c3
FP: ThreatMetrix
2019-12-07 18:23:11 +01:00
cbb0cc6f3b
Rules lists are optional
2019-12-07 18:22:20 +01:00
a5e768fe00
Filtering by IP range
...
Closes #5
2019-12-07 13:56:04 +01:00
28e33dcc7a
Fixed description generation
2019-12-05 20:51:53 +01:00
95d4535abd
Nitpicking
2019-12-05 19:38:26 +01:00
025370bbbe
Split list into curated and non-curated
...
Closes #2
2019-12-05 19:15:24 +01:00
1c20963ffd
Removed third-parties from easyprivacy
2019-12-05 01:19:10 +01:00
188a8f7455
Removed another source of false-positives
2019-12-05 00:50:32 +01:00
f2bab3ca3f
Added contact information
2019-12-03 21:45:29 +01:00
08f25e26ba
Removed false-positive source
...
It also listed edgekey.net for blocking.
Thanks @TorchedPoseidon for the report!
2019-12-03 21:27:37 +01:00
8c744d621e
Removed too restrictive source
...
Was blocking ssl.ovh.net and akaimi.net
2019-12-03 18:43:23 +01:00
fe5f0c6c05
Added more rule sources
2019-12-03 17:33:46 +01:00
0159c6037c
Improved DNS resolving performance
...
Also various fixes.
Also some debug stuff, make sure to remove that later.
2019-12-03 15:35:21 +01:00
c609b90390
Append top 1M subdomains rather than replacing them
2019-12-03 09:04:19 +01:00
69b82d29fd
Improved rules handling
...
Rules can now come in 3 different formats:
- AdBlock rules
- Host lists
- Domains lists
All will be converted into domain lists and aggregated
(only AdBlock rules matching a whole domain will be kept).
A subdomain will now be matched if it is a subdomain of any domain in
the rules.
It is way faster (seconds rather than hours!) but less flexible
(although it shouldn't be a problem).
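A minimal sketch of the subdomain matching described above, assuming the
aggregated domain list is loaded into a set. The function and parameter
names are hypothetical and not taken from the repository.

```python
def is_blocked(hostname, blocked_domains):
    """Return True if hostname equals, or is a subdomain of, a blocked domain.

    blocked_domains is assumed to be a set of lowercase domain names.
    """
    labels = hostname.lower().rstrip(".").split(".")
    # Test every suffix of the hostname:
    # a.b.example.com -> a.b.example.com, b.example.com, example.com, com
    for i in range(len(labels)):
        if ".".join(labels[i:]) in blocked_domains:
            return True
    return False
```

Each lookup costs a handful of set membership tests instead of one regex
evaluation per rule, which is where a speedup of this kind would come from.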
2019-12-03 08:48:12 +01:00
c23004fbff
Separated DNS resolution from filtering
...
This effectively removes the parallelism of filtering,
which lengthens the processing time (5->8 hours),
but this allows me to toy around with the performance of this step,
which I aim to improve drastically.
2019-12-02 19:03:08 +01:00
7d01d016a5
Can now use AdBlock lists for tracking matching
...
It's not very performant by itself, especially since pyre2 isn't
maintained nor really compilable/installable anymore.
The performance seems to have decreased from 200 req/s to 0.2 req/s when
using 512 threads, and to 80 req/s when using 64 threads.
This might or might not be related, as the CPU doesn't seem to be the
bottleneck.
I will probably add support for host-based rules, matching the
subdomains of such hosts (as for now there doesn't seem to be any other
pattern for first-party trackers than subdomains, and this would be a
very broad improvement in performance and in compatibility with existing
lists), and convert the AdBlock lists to this format, only keeping
domains-only rules.
2019-11-15 08:57:31 +01:00
87bb24c511
Shell typo
2019-11-14 15:40:25 +01:00
300fe8e15e
Added real argument parser
...
Just so we can have color output when running the script :)
2019-11-14 15:37:32 +01:00
88f0bcc648
Refactored for correct retry logic
2019-11-14 15:03:20 +01:00
b343893c72
Merge branch 'master' of git.frogeye.fr:geoffrey/eulaurarien
2019-11-14 13:45:42 +01:00
ae93593930
Statistics about explicit first-parties
2019-11-14 13:31:39 +01:00
bdc691e647
Upped timeout
2019-11-14 13:10:14 +01:00
08a8eaaada
Use threads not subprocesses
...
You dumbo
2019-11-14 12:57:06 +01:00
32377229db
Retry failed requests
2019-11-14 11:35:05 +01:00
04fe454d99
Automatically get top 1M subdomains
2019-11-14 11:23:59 +01:00
7df00fc859
Automatically download nameserver list
2019-11-14 10:56:53 +01:00
1bbc17a8ec
Greatly optimized subdomain filtering
2019-11-14 10:45:06 +01:00
00a0020914
Added some delay when collecting websites' subdomains
...
Some websites load their trackers after the page is done loading.
2019-11-14 06:29:24 +01:00
56374e3223
Added RED by SFR website
2019-11-13 18:14:56 +01:00
b17a24c047
Added more trackers and their clients
2019-11-12 13:58:17 +01:00
1c86255bb9
Added list of websites containing EA_data
2019-11-11 15:44:03 +01:00
7a7a3642a5
Added number of trackers in output
2019-11-11 13:00:14 +01:00
4e69bdbfc3
CI Test commit 2
2019-11-11 12:41:22 +01:00
aab8e93abe
CI Test commit 1
2019-11-11 12:31:32 +01:00
e0f28d41d2
Added public updated list link
2019-11-11 12:10:46 +01:00
a0a2af281f
Added possibility to add personal sources
2019-11-11 11:19:46 +01:00
333ae4eb66
Fixed tracker list
2019-11-10 23:58:49 +01:00
0df749f1e0
Added more trackers
2019-11-10 23:29:30 +01:00