Geoffrey Frogeye
269b8278b5
Workflow: Fixed rules counts
2 years ago
Geoffrey Frogeye
ab7ef609dd
Workflow: Various optimisations and fixes
I forgot to close this one earlier, so:
Closes #7
2 years ago
Geoffrey Frogeye
f3eedcba22
Updated now based on timestamp
Did I forget to add feed_asn.py a few commits ago?
Oh well...
2 years ago
Geoffrey Frogeye
8d94b80fd0
Integrated DNS resolving to workflow
Since the bigger datasets are only updated once a month,
this might help for quick updates.
2 years ago
Geoffrey Frogeye
231bb83667
Threaded feed_dns
Largely disappointing
2 years ago
Geoffrey Frogeye
9050a84670
Read-only mode
2 years ago
Geoffrey Frogeye
e19f666331
Workflow: Automatically import IP ranges from ASN
Closes #9
2 years ago
Geoffrey Frogeye
57416b6e2c
Workflow: OOP and individual tables per type
Mostly for performance reasons.
The first is to implement threading later.
The second is to speed up the binary search,
but it doesn't seem that much better so far.
2 years ago
Geoffrey Frogeye
b076fa6c34
Typo in new source URL
2 years ago
Geoffrey Frogeye
12dcafe606
Added alternate source of Eulerian CNAMEs
It was requested.
It should be temporary: once I have a bigger subdomain list,
it shouldn't be required.
2 years ago
Geoffrey Frogeye
1484733a90
Workflow: Small tweaks
2 years ago
Geoffrey Frogeye
55877be891
IP parsing C accelerated, use bytes everywhere
2 years ago
Geoffrey Frogeye
7937496882
Workflow: Base for new one
While I'm automating this you'll need to download the A set from
https://opendata.rapid7.com/sonar.fdns_v2/ to the file a.json.gz.
2 years ago
Geoffrey Frogeye
62e6c9005b
Tracker: intendmedia?
2 years ago
Geoffrey Frogeye
dc44dea505
Optimized IP matching
2 years ago
Geoffrey Frogeye
b634ae5bbd
Updated IP ranges for Criteo
2 years ago
Geoffrey Frogeye
16f8bed887
Tracker: Otto Group
2 years ago
Geoffrey Frogeye
d6df0fd4f9
Tracker: Webtrekk
2 years ago
Geoffrey Frogeye
4dd3d4a64b
Preliminary structure for testing
In preparation of #4
2 years ago
Geoffrey Frogeye
ae71d6b204
Tracker: 2o7
2 years ago
Geoffrey Frogeye
2b0a723c30
Fix log in scripts
Closes #8
2 years ago
Geoffrey Frogeye
0b2eb000c3
FP: ThreatMetrix
2 years ago
Geoffrey Frogeye
cbb0cc6f3b
Rules lists are optional
2 years ago
Geoffrey Frogeye
a5e768fe00
Filtering by IP range
Closes #5
2 years ago
Geoffrey Frogeye
28e33dcc7a
Fixed description generation
2 years ago
Geoffrey Frogeye
95d4535abd
Nitpicking
2 years ago
Geoffrey Frogeye
025370bbbe
Split list into curated and non-curated
Closes #2
2 years ago
Geoffrey Frogeye
1c20963ffd
Removed third-parties from easyprivacy
2 years ago
Geoffrey Frogeye
188a8f7455
Removed another source of false-positives
2 years ago
Geoffrey Frogeye
f2bab3ca3f
Added contact information
2 years ago
Geoffrey Frogeye
08f25e26ba
Removed false-positive source
It also had edgekey.net listed for blocking.
Thanks @TorchedPoseidon for the report!
2 years ago
Geoffrey Frogeye
8c744d621e
Removed too restrictive source
Was blocking ssl.ovh.net and akaimi.net
2 years ago
Geoffrey Frogeye
fe5f0c6c05
Added more rule sources
2 years ago
Geoffrey Frogeye
0159c6037c
Improved DNS resolving performance
Also various fixes.
Also some debug stuff, make sure to remove that later.
2 years ago
Geoffrey Frogeye
c609b90390
Append top 1M subdomains rather than replacing it
2 years ago
Geoffrey Frogeye
69b82d29fd
Improved rules handling
Rules can now come in 3 different formats:
- AdBlock rules
- Host lists
- Domain lists
All will be converted into domain lists and aggregated
(only AdBlock rules matching a whole domain will be kept).
A subdomain will now be matched if it is a subdomain of any domain in the
rules.
It is way faster (seconds rather than hours!) but less flexible
(although it shouldn't be a problem).
2 years ago
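A minimal sketch of the subdomain matching described in 69b82d29fd, assuming the aggregated rules are held as a plain set of domain names. The function names and sample domains are hypothetical and not taken from the repository.

```python
def parent_domains(name):
    """Yield the name itself, then each parent domain, most specific first."""
    labels = name.rstrip(".").lower().split(".")
    for i in range(len(labels)):
        yield ".".join(labels[i:])


def is_blocked(name, rule_domains):
    """Return True if name equals, or is a subdomain of, any rule domain."""
    # One set lookup per label, so the cost does not grow with the rule count.
    return any(parent in rule_domains for parent in parent_domains(name))


# Hypothetical aggregated domain list (assumption, for illustration only).
rules = {"tracker.example", "ads.example.org"}

print(is_blocked("a.b.tracker.example", rules))
print(is_blocked("example.org", rules))
```

Walking up the labels and doing set lookups is one plausible reason such a match takes seconds rather than hours compared with testing every pattern against every name; the actual implementation may differ.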
Geoffrey Frogeye
c23004fbff
Separated DNS resolution from filtering
This effectively removes the parallelism of filtering,
which increases the processing time (5 -> 8 hours),
but this allows me to toy around with the performance of this step,
which I aim to improve drastically.
3 years ago
Geoffrey Frogeye
7d01d016a5
Can now use AdBlock lists for tracking matching
It's not very performant by itself, especially since pyre2 isn't
maintained nor really compilable/installable anymore.
The performance seems to have decreased from 200 req/s to 0.2 req/s when
using 512 threads, and to 80 req/s when using 64 threads.
This might or might not be related, as the CPU doesn't seem to be the
bottleneck.
I will probably add support for host-based rules, matching the
subdomains of such hosts (as for now there doesn't seem to be any other
pattern for first-party trackers than subdomains, and this would be a
very broad improvement in performance and in compatibility with existing
lists), and convert the AdBlock lists to this format, only keeping
domain-only rules.
3 years ago
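A rough sketch of the conversion planned in 7d01d016a5, keeping only AdBlock rules that match a whole domain (the `||domain^` form with no options). The regular expression, the function name and the sample rules are assumptions for illustration, not the project's code.

```python
import re

# Matches only rules of the exact form "||example.com^": no path, no options.
DOMAIN_RULE = re.compile(r"^\|\|([a-z0-9-]+(?:\.[a-z0-9-]+)+)\^$")


def adblock_to_domains(lines):
    """Collect the domains of domain-only AdBlock rules, dropping the rest."""
    domains = set()
    for line in lines:
        match = DOMAIN_RULE.match(line.strip().lower())
        if match:
            domains.add(match.group(1))
    return domains


# Hypothetical sample list (assumption).
sample = [
    "! comment",
    "||tracker.example^",
    "||ads.example.org^$third-party",  # has options: skipped
    "/banner/*",                       # path pattern: skipped
    "@@||allowed.example^",            # exception rule: skipped
]

print(sorted(adblock_to_domains(sample)))
```

Dropping every rule that carries options or a path is a deliberately conservative choice: it loses some coverage but avoids turning a narrow rule into a whole-domain block.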
Geoffrey Frogeye
87bb24c511
Shell typo
3 years ago
Geoffrey Frogeye
300fe8e15e
Added real argument parser
Just so we can have color output when running the script :)
3 years ago
Geoffrey Frogeye
88f0bcc648
Refactored for correct retry logic
3 years ago
Geoffrey Frogeye
b343893c72
Merge branch 'master' of git.frogeye.fr:geoffrey/eulaurarien
3 years ago
Geoffrey Frogeye
ae93593930
Statistics about explicit first-parties
3 years ago
Geoffrey Frogeye
bdc691e647
Upped timeout
3 years ago
Geoffrey Frogeye
08a8eaaada
Use threads not subprocesses
You dumbo
3 years ago
Geoffrey Frogeye
32377229db
Retry failed requests
3 years ago
Geoffrey Frogeye
04fe454d99
Automatically get top 1M subdomains
3 years ago
Geoffrey Frogeye
7df00fc859
Automatically download nameserver list
3 years ago
Geoffrey Frogeye
1bbc17a8ec
Greatly optimized subdomain filtering
3 years ago
Geoffrey Frogeye
00a0020914
Added some delay when collecting website subdomains
Some websites load their trackers after the page is done loading.
3 years ago