Compare commits

...

168 commits

Author SHA1 Message Date
Geoffrey Frogeye 3b6f7a58b3
Remove support for Rapid7
They changed their privacy / pricing model and as such I don't have
access to their massive DNS dataset anymore,
even after asking.

Since 2022-01-02, I put the list on freeze while looking for an alternative,
but couldn't find any.
To make the list update again with the remaining DNS sources I have,
I put the last version of the list generated with the Rapid7 dataset
as an input for subdomains, that will now get resolved with MassDNS.
2022-11-13 20:10:27 +01:00
Geoffrey Frogeye 49a36f32f2
Add requirements.txt file 2022-02-26 13:01:11 +01:00
Geoffrey Frogeye 29cf72ae92 Fix most of the README being bold
Why did I go with this Markdown generator again?
2021-08-28 20:58:34 +02:00
Geoffrey Frogeye 998c3faf8f
Add SAS.com 2021-08-22 18:02:37 +02:00
Geoffrey Frogeye c8a14a4e21
Add DataUnlocker 2021-08-22 17:07:25 +02:00
Geoffrey Frogeye 1ec26e7f96
Add Plausible.io 2021-08-22 16:53:58 +02:00
Geoffrey Frogeye 5b49441bc0 Add Branch.io tracker 2021-08-22 16:37:31 +02:00
Geoffrey Frogeye afd122f2ab
Update usage recommendations 2021-08-15 13:04:55 +02:00
Geoffrey Frogeye 6ae3d5fb55
Add Lead Forensics tracker 2021-08-15 11:39:37 +02:00
Geoffrey Frogeye 10a505d84f
Add Fathom 2021-08-15 11:18:35 +02:00
Geoffrey Frogeye c06648da53
Added Pardot tracker 2021-08-15 11:06:53 +02:00
Geoffrey Frogeye f165e5a094
Fix (most) mypy / flake8 errors 2021-08-14 23:35:51 +02:00
Geoffrey Frogeye 3dcccad39a
Black pass 2021-08-14 23:27:28 +02:00
Geoffrey Frogeye a023dc8322
Fix deprecated np.bool 2021-08-14 23:21:03 +02:00
Geoffrey Frogeye 389e83d492
Fix database maximum cache size cap 2021-08-14 23:19:12 +02:00
Geoffrey Frogeye edf444cc28
Add ad-cloud.jp and improve names of Japanese trackers
Closes #19

Names from https://github.com/AdguardTeam/cname-trackers/issues/1
2021-08-14 22:55:58 +02:00
Geoffrey Frogeye fa23d466d2
Actually remove ThreatMetrix
Forgot -i when grepping
2021-08-14 21:55:44 +02:00
Geoffrey Frogeye f5f9f88c42
Remove ThreatMetrix
I received a lot of false positives for this one,
and while I wasn't able to reproduce the issue in most of the cases,
I trust the community.
It's also not in any other CNAME tracker list, probably for the same reason.
Plus, it's apparently not very nasty.
So I'll let it go.

Closes #17
2021-08-14 21:24:48 +02:00
Geoffrey Frogeye 2997e41f98
Investigated >0.5% trackers from Fukuda paper 2020-12-19 13:41:07 +01:00
Geoffrey Frogeye 6cf1028174
Added other tracking source for Adobe
Found on the Adobe documentation and in the wild
https://experienceleague.adobe.com/docs/analytics/implementation/vars/config-vars/trackingserversecure.html?lang=en#s.trackingserversecure-in-appmeasurement-and-launch-custom-code-editor
2020-12-19 13:15:38 +01:00
Geoffrey Frogeye b98a37f9da
Add 1st chain Act-On
To unclobber -only lists
2020-12-07 08:27:20 +01:00
Geoffrey Frogeye 8828d4cf24
Investigated >1% trackers from Fukuda paper 2020-12-07 00:03:58 +01:00
Geoffrey Frogeye 04205dd9fc
Add AdGuard in the distribution README 2020-12-06 23:18:27 +01:00
Geoffrey Frogeye cec96b7e50
Add Fukuda & co research paper to test suite 2020-12-06 22:13:05 +01:00
Geoffrey Frogeye eb1fcefd49
Use more correct terms 2020-12-06 21:29:48 +01:00
Geoffrey Frogeye 0ecb431728 Add AdGuard for multiparty 2020-12-06 21:01:24 +01:00
Geoffrey Frogeye c1619b3cff Add more sources and acknowledgement 2020-12-06 21:01:20 +01:00
Geoffrey Frogeye 2c0286e36b
Add genieesspv.jp CNAME tracker
Closes #18
2020-08-22 10:46:43 +02:00
Geoffrey Frogeye 954bc86eaa
More Tracedock domains
From https://gist.github.com/pietvanzoen/ed7b8322a552542bc00a83ced7332d33
2020-08-08 09:14:09 +02:00
Geoffrey Frogeye b09f861c27
README: Added more reasons the browsers trust first party 2020-01-11 13:01:51 +01:00
Geoffrey Frogeye 9326dc6aca
Added similar projects 2020-01-11 11:43:14 +01:00
Geoffrey Frogeye c803a714fa
I don't know how to write the word “explanation”... 2020-01-11 11:31:16 +01:00
Geoffrey Frogeye b3a3219f93
Improved usage scenarios for different lists 2020-01-11 11:26:54 +01:00
Geoffrey Frogeye fbc06f71bb
Added symlink to latest explaination 2020-01-07 14:37:01 +01:00
Geoffrey Frogeye 63ab7651fc
Disabled RDNS import due to #15 2020-01-07 14:17:38 +01:00
Geoffrey Frogeye 0724feed26
README: Removed help message and fixed category for finder 2020-01-06 16:44:45 +01:00
Geoffrey Frogeye adb07417f5
Fixed import_rapid7 script typo 2020-01-05 22:35:12 +01:00
Geoffrey Frogeye 0cc18303fd
Re-import Rapid7 datasets when rules have been updated 2020-01-04 10:54:46 +01:00
Geoffrey Frogeye 708c53041e
Added two japanese trackers 2020-01-03 22:09:16 +01:00
Geoffrey Frogeye 808e36dde3
Improvements to subdomain collection
I use this for tracker identification so it's not perfect but still it's
a bit better.
2020-01-03 22:08:06 +01:00
Geoffrey Frogeye 2b97ee4cb9
Better list output 2019-12-27 21:46:57 +01:00
Geoffrey Frogeye fd8bfee088
Improved -only variants descriptions 2019-12-27 15:58:20 +01:00
Geoffrey Frogeye e93807142c
Explanations folder 2019-12-27 15:35:30 +01:00
Geoffrey Frogeye a4a908955a
Added index webpage 2019-12-27 15:21:33 +01:00
Geoffrey Frogeye 7e06e98808
Added TraceDock FP tracker
Thought they did change the URL of their load balancers,
guess I was wrong.
2019-12-27 13:43:38 +01:00
Geoffrey Frogeye 4fca68c6f0
Fixed handling of unknown field error 2019-12-27 01:10:21 +01:00
Geoffrey Frogeye 54a9c78534
Handled another error 2019-12-26 20:38:35 +01:00
Geoffrey Frogeye 171fa93873
Force pv output
Even if redirected to a file
Allows seeing progress when run in a cron job or similar
2019-12-26 15:38:56 +01:00
Geoffrey Frogeye 095e51fad9
Ensure massdns output is lower case
For some reason some servers output part of their response in upper case.
This fails the reading process as it's designed to only work on lower
case for performance reasons.
2019-12-26 15:32:24 +01:00
Geoffrey Frogeye 883942ba55
Allow custom massdns path 2019-12-26 00:33:23 +01:00
Geoffrey Frogeye d3b244f317
Forgot one dependency 2019-12-26 00:16:18 +01:00
Geoffrey Frogeye 018f6548ea
Fixed feed_dns not saving in single-threaded mode
Would you believe it, seven hours of processing for nothing
2019-12-26 00:02:01 +01:00
Geoffrey Frogeye 0b9e2d0975
Validate also lowercases domains 2019-12-25 15:31:20 +01:00
Geoffrey Frogeye 2bcf6cbbf7
Added SINGLE_PROCESS environment variable 2019-12-25 15:15:49 +01:00
Geoffrey Frogeye b310ca2fc2
Clever pruning mechanism 2019-12-25 14:54:57 +01:00
Geoffrey Frogeye bb9e6de62f
Profiling is now optional 2019-12-25 13:52:19 +01:00
Geoffrey Frogeye c543e0eab6
Make multi-processing optional for feed_dns 2019-12-25 13:04:15 +01:00
Geoffrey Frogeye 195f41bd9f
Use smaller cache if it cannot allocate 2019-12-25 13:03:55 +01:00
Geoffrey Frogeye 0e7479e23e
Added handling for IPs too big 2019-12-25 12:35:06 +01:00
Geoffrey Frogeye 9f343ed296
Removed debug print 2019-12-24 15:12:38 +01:00
Geoffrey Frogeye c65ae94892
Added ability to use Rapid7 API
Closes #11
2019-12-24 15:08:18 +01:00
Geoffrey Frogeye 7d1c1a1d54
Implement pruning 2019-12-21 19:38:20 +01:00
Geoffrey Frogeye 1a6e64da3d
Forgot numpy dependency 2019-12-20 21:08:21 +01:00
Geoffrey Frogeye d66040a7b6
Added some literature
Well, not really literature in the scientific sense, but still something
to read
2019-12-20 18:22:15 +01:00
Geoffrey Frogeye 57e2919f25
Added information about CORS security issue 2019-12-20 17:58:53 +01:00
Geoffrey Frogeye 94acd106da
Acknwoledgments
Gesundheit
2019-12-20 17:46:24 +01:00
Geoffrey Frogeye 885d92dd77
Added LICENSE 2019-12-20 17:38:26 +01:00
Geoffrey Frogeye 8b7e538677
Updated links
(could not bother guessing them)
2019-12-20 17:24:05 +01:00
Geoffrey Frogeye cd46b39756
Merge branch 'newworkflow' 2019-12-20 17:18:42 +01:00
Geoffrey Frogeye 38cf532854
Updated README
Split in two actually (program and list).

Closes #3

Also,
Closes #1
Because I forgot to do it earlier.
2019-12-20 17:15:39 +01:00
Geoffrey Frogeye 53b14c6ffa
Removed TODO placeholders in commands description
It's better than nothing but not by that much
2019-12-19 08:07:01 +01:00
Geoffrey Frogeye c81be4825c
Automated tests
Very rudimentary but should do the trick

Closes #4
2019-12-18 22:46:00 +01:00
Geoffrey Frogeye 4a22054796
Added optional cache for faster IP matching 2019-12-18 21:40:24 +01:00
Geoffrey Frogeye 06b745890c
Added other first-party trackers 2019-12-18 17:03:05 +01:00
Geoffrey Frogeye aca5023c3f
Fixed scripting around 2019-12-18 13:01:32 +01:00
Geoffrey Frogeye dce35cb299
Harder verification before adding entries to DB 2019-12-17 19:53:05 +01:00
Geoffrey Frogeye 747fe46ad0
Script to automatically download from Rapid7 datasets 2019-12-17 15:04:19 +01:00
Geoffrey Frogeye b43cb1725c
Autosave
Not needed, but since the import may take multiple hours I get frustrated
if this gets interrupted for some reason.
2019-12-17 15:02:42 +01:00
Geoffrey Frogeye f5c60c482a Merge branch 'master' of git.frogeye.fr:geoffrey/eulaurarien 2019-12-17 14:28:38 +01:00
Geoffrey Frogeye 12ecfa1a5d Added outdated documentation warning in README 2019-12-17 14:28:23 +01:00
Geoffrey Frogeye e882e09b37
Added outdated documentation warning in README 2019-12-17 14:27:43 +01:00
Geoffrey Frogeye d65107f849
Save duplicates too
Maybe I won't publish them but this will help me for tracking trackers.
2019-12-17 14:10:41 +01:00
Geoffrey Frogeye ea0855bd00
Forgot to push this little guy
Good thing I cleaned up my working directory.
It only exists because pickles created from database.py itself
won't be openable from a file simply importing database.py.
So we create it when in 'imported state'.
2019-12-17 13:50:39 +01:00
Geoffrey Frogeye 7851b038f5
Reworked rule export 2019-12-17 13:30:24 +01:00
Geoffrey Frogeye 8f6e01c857
Added first_party tracking
Well, tracking if a rule is from a first or a multi rule...
Hope I did not make any mistakes
2019-12-16 19:09:02 +01:00
Geoffrey Frogeye c3bf102289
Made references work 2019-12-16 14:18:03 +01:00
Geoffrey Frogeye 03a4042238
Added level
Also fixed IP logic because this was really messed up
2019-12-16 09:31:29 +01:00
Geoffrey Frogeye 3197fa1663
Remove list usage for IpTreeNode 2019-12-16 06:54:18 +01:00
Geoffrey Frogeye a0e68f0848
Reworked match and node system
For level, and first_party later
Next: add get_match to retrieve level of source and have correct levels

... am I going somewhere with all this?
2019-12-15 23:13:25 +01:00
Geoffrey Frogeye aec8d3f8de
Reworked how paths work
Get those tuples out of my eyes
2019-12-15 22:21:05 +01:00
Geoffrey Frogeye 7af2074c7a
Small optimisation of feed_switch 2019-12-15 17:12:44 +01:00
Geoffrey Frogeye 45325782d2
Multi-processed parser 2019-12-15 17:05:41 +01:00
Geoffrey Frogeye ce52897d30
Smol fixes 2019-12-15 16:48:17 +01:00
Geoffrey Frogeye 954b33b2a6
Slightly better Rapid7 parser 2019-12-15 16:38:01 +01:00
Geoffrey Frogeye d976752797
Store Ip4Path as int instead of List[int] 2019-12-15 16:26:18 +01:00
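The space saving behind this commit can be sketched as follows (a minimal illustration of the idea; the actual `Ip4Path` internals in the repository may differ):

```python
def ip4_to_int(ip: str) -> int:
    # Pack a dotted-quad IPv4 address into one integer,
    # the idea behind storing Ip4Path as an int rather than a List[int].
    result = 0
    for part in ip.split("."):
        result = (result << 8) | int(part)
    return result
```

A single int is both smaller and cheaper to compare than a four-element list, which matters when millions of addresses are kept in memory.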
Geoffrey Frogeye 4d966371b2
Workflow: SQL -> Tree
Welp. All that for this.
2019-12-15 15:56:26 +01:00
Geoffrey Frogeye 040ce4c14e
Typo in source 2019-12-15 01:52:45 +01:00
Geoffrey Frogeye b50c01f740 Merge branch 'master' into newworkflow 2019-12-15 01:30:03 +01:00
Geoffrey Frogeye ddceed3d25
Workflow: Can now import MassDNS output
Well, in a specific format, but MassDNS nonetheless
2019-12-15 00:28:08 +01:00
Geoffrey Frogeye 189deeb559
Workflow: Multiprocess
Still trying.
It's better than multithread though.

Merge branch 'newworkflow' into newworkflow_threaded
2019-12-14 17:27:46 +01:00
Geoffrey Frogeye d7c239a6f6 Workflow: Some modifications 2019-12-14 16:04:19 +01:00
Geoffrey Frogeye 5023b85d7c
Added intermediate representation for DNS datasets
It's just CSV.
The DNS records in the datasets are not ordered consistently,
so we need to parse them completely.
It seems that converting to an IR before sending data to ./feed_dns.py
through a pipe is faster than decoding the JSON in ./feed_dns.py.
This will also reduce the storage of the resolved subdomains by
about 15% (compressed).
2019-12-13 21:59:35 +01:00
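The JSON-to-IR conversion described above can be sketched like this (the field names `name`/`type`/`value` are assumptions for illustration; the project's actual IR columns may differ):

```python
import json

def json_record_to_ir(line: str) -> str:
    # Flatten one JSON DNS record into a CSV-like intermediate line,
    # so the consumer can split on commas instead of decoding JSON.
    record = json.loads(line)
    return ",".join((record["type"], record["name"], record["value"]))
```

Decoding JSON once at conversion time, instead of in every consumer, is what makes piping the IR into `./feed_dns.py` faster.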
Geoffrey Frogeye 269b8278b5
Workflow: Fixed rules counts 2019-12-13 18:36:08 +01:00
Geoffrey Frogeye ab7ef609dd
Workflow: Various optimisations and fixes
I forgot to close this one earlier, so:
Closes #7
2019-12-13 18:08:22 +01:00
Geoffrey Frogeye f3eedcba22
Updated now based on timestamp
Did I forget to add feed_asn.py a few commits ago?
Oh well...
2019-12-13 13:54:00 +01:00
Geoffrey Frogeye 8d94b80fd0
Integrated DNS resolving to workflow
Since the bigger datasets are only updated once a month,
this might help for quick updates.
2019-12-13 13:38:23 +01:00
Geoffrey Frogeye 231bb83667
Threaded feed_dns
Largely disappointing
2019-12-13 12:36:11 +01:00
Geoffrey Frogeye 9050a84670
Read-only mode 2019-12-13 12:35:05 +01:00
Geoffrey Frogeye e19f666331
Workflow: Automatically import IP ranges from ASN
Closes #9
2019-12-13 08:23:38 +01:00
Geoffrey Frogeye 57416b6e2c
Workflow: OOP and individual tables per type
Mostly for performance reasons.
The first to implement threading later,
the second to speed up the dichotomy,
but it doesn't seem that much better so far.
2019-12-13 00:11:21 +01:00
Geoffrey Frogeye b076fa6c34 Typo in new source URL 2019-12-12 23:28:00 +01:00
Geoffrey Frogeye 12dcafe606
Added alternate source of Eulerian CNAMES
It was requested so.
It should be temporary, once I have a bigger subdomain list
that shouldn't be required.
2019-12-12 19:13:54 +01:00
Geoffrey Frogeye 1484733a90 Workflow: Small tweaks 2019-12-09 18:21:08 +01:00
Geoffrey Frogeye 55877be891
IP parsing C accelerated, use bytes everywhere 2019-12-09 09:47:48 +01:00
Geoffrey Frogeye 7937496882
Workflow: Base for new one
Until I've automated this, you'll need to download the A set from
https://opendata.rapid7.com/sonar.fdns_v2/ to the file a.json.gz.
2019-12-09 08:12:48 +01:00
Geoffrey Frogeye 62e6c9005b
Tracker: intendmedia? 2019-12-08 01:32:49 +01:00
Geoffrey Frogeye dc44dea505
Optimized IP matching 2019-12-08 01:23:36 +01:00
Geoffrey Frogeye b634ae5bbd
Updated IP ranges for Criteo 2019-12-07 23:23:39 +01:00
Geoffrey Frogeye 16f8bed887
Tracker: Otto Group 2019-12-07 21:30:15 +01:00
Geoffrey Frogeye d6df0fd4f9
Tracker: Webtrekk 2019-12-07 21:21:33 +01:00
Geoffrey Frogeye 4dd3d4a64b
Preliminary structure for testing
In preparation of #4
2019-12-07 19:19:37 +01:00
Geoffrey Frogeye ae71d6b204 Tracker: 2o7 2019-12-07 19:17:18 +01:00
Geoffrey Frogeye 2b0a723c30
Fix log in scripts
Closes #8
2019-12-07 18:45:48 +01:00
Geoffrey Frogeye 0b2eb000c3
FP: ThreatMetrix 2019-12-07 18:23:11 +01:00
Geoffrey Frogeye cbb0cc6f3b Rules lists are optional 2019-12-07 18:22:20 +01:00
Geoffrey Frogeye a5e768fe00
Filtering by IP range
Closes #5
2019-12-07 13:56:04 +01:00
Geoffrey Frogeye 28e33dcc7a
Fixed description generation 2019-12-05 20:51:53 +01:00
Geoffrey Frogeye 95d4535abd
Nitpicking 2019-12-05 19:38:26 +01:00
Geoffrey Frogeye 025370bbbe
Splitted list with curated and not curated
Closes #2
2019-12-05 19:15:24 +01:00
Geoffrey Frogeye 1c20963ffd
Removed third-parties from easyprivacy 2019-12-05 01:19:10 +01:00
Geoffrey Frogeye 188a8f7455
Removed another source of false-positives 2019-12-05 00:50:32 +01:00
Geoffrey Frogeye f2bab3ca3f Added contact information 2019-12-03 21:45:29 +01:00
Geoffrey Frogeye 08f25e26ba
Removed false-positive source
Also had edgekey.net for blocking.

Thanks @TorchedPoseidon for the report!
2019-12-03 21:27:37 +01:00
Geoffrey Frogeye 8c744d621e
Removed too restrictive source
Was blocking ssl.ovh.net and akaimi.net
2019-12-03 18:43:23 +01:00
Geoffrey Frogeye fe5f0c6c05
Added more rule sources 2019-12-03 17:33:46 +01:00
Geoffrey Frogeye 0159c6037c
Improved DNS resolving performances
Also various fixes.
Also some debug stuff, make sure to remove that later.
2019-12-03 15:35:21 +01:00
Geoffrey Frogeye c609b90390 Append top 1M subdomains rather than replacing it 2019-12-03 09:04:19 +01:00
Geoffrey Frogeye 69b82d29fd
Improved rules handling
Rules can now come in 3 different formats:
- AdBlock rules
- Host lists
- Domains lists
All will be converted into domain lists and aggregated
(only AdBlock rules matching a whole domain will be kept).

Subdomains will now be matched if it is a subdomain of any domain of the
rule.
It is way faster (seconds rather than hours!) but less flexible
(although it shouldn't be a problem).
2019-12-03 08:48:12 +01:00
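The "only AdBlock rules matching a whole domain will be kept" behavior can be sketched as follows (an illustrative check only; the project's actual parsing uses python-abp):

```python
from typing import Optional

def adblock_whole_domain(rule: str) -> Optional[str]:
    # Return the domain if the AdBlock rule matches a whole domain
    # (the ||example.com^ form), else None. Rules with options, paths
    # or wildcards cannot be reduced to a plain domain and are dropped.
    rule = rule.strip()
    if rule.startswith("||") and rule.endswith("^") and "$" not in rule:
        candidate = rule[2:-1]
        if all(c not in candidate for c in "/*^"):
            return candidate
    return None
```

Reducing every rule format to a flat domain list is what enables the subdomain-suffix matching that runs in seconds rather than hours.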
Geoffrey Frogeye c23004fbff
Separated DNS resolution from filtering
This effectively removes the parallelism of filtering,
which doubles the processing time (5->8 hours),
but this allows me to toy around with the performances of this step,
which I aim to improve drastically.
2019-12-02 19:03:08 +01:00
Geoffrey Frogeye 7d01d016a5 Can now use AdBlock lists for tracking matching
It's not very performant by itself, especially since pyre2 isn't
maintained nor really compilable/installable anymore.

The performance seems to have decreased from 200 req/s to 0.2 req/s when
using 512 threads, and to 80 req/s using 64 threads.
This might or might not be related, as the CPU doesn't seem to be the
bottleneck.

I will probably add support for host-based rules, matching the
subdomains of such hosts (as for now there doesn't seem to be any other
pattern for first-party trackers than subdomains, and this would be a
very broad performance / compatibility with existing lists improvement),
and convert the AdBlock lists to this format, only keeping domains-only
rules.
2019-11-15 08:57:31 +01:00
Geoffrey Frogeye 87bb24c511 Shell typo 2019-11-14 15:40:25 +01:00
Geoffrey Frogeye 300fe8e15e Added real argument parser
Just so we can have color output when running the script :)
2019-11-14 15:37:32 +01:00
Geoffrey Frogeye 88f0bcc648 Refactored for correct retry logic 2019-11-14 15:03:20 +01:00
Geoffrey Frogeye b343893c72 Merge branch 'master' of git.frogeye.fr:geoffrey/eulaurarien 2019-11-14 13:45:42 +01:00
Geoffrey Frogeye ae93593930 Statistics about explicit first-parties 2019-11-14 13:31:39 +01:00
Geoffrey Frogeye bdc691e647 Upped timeout 2019-11-14 13:10:14 +01:00
Geoffrey Frogeye 08a8eaaada Use threads not subprocesses
You dumbo
2019-11-14 12:57:06 +01:00
Geoffrey Frogeye 32377229db Retry failed requests 2019-11-14 11:35:05 +01:00
Geoffrey Frogeye 04fe454d99 Automatically get top 1M subdomains 2019-11-14 11:23:59 +01:00
Geoffrey Frogeye 7df00fc859 Automatically download nameserver list 2019-11-14 10:56:53 +01:00
Geoffrey Frogeye 1bbc17a8ec Greatly optimized subdomain filtering 2019-11-14 10:45:06 +01:00
Geoffrey Frogeye 00a0020914 Added some delay for websites subdomains collecting
Some websites load their trackers after the page is done loading.
2019-11-14 06:29:24 +01:00
Geoffrey Frogeye 56374e3223 Added RED by SFR website 2019-11-13 18:14:56 +01:00
Geoffrey Frogeye b17a24c047 Added more trackers and their clients 2019-11-12 13:58:17 +01:00
Geoffrey Frogeye 1c86255bb9 Added list of websites containing EA_data 2019-11-11 15:44:03 +01:00
Geoffrey Frogeye 7a7a3642a5 Added number of trackers in output 2019-11-11 13:00:14 +01:00
Geoffrey Frogeye 4e69bdbfc3 CI Test commit 2 2019-11-11 12:41:22 +01:00
Geoffrey Frogeye aab8e93abe CI Test commit 1 2019-11-11 12:31:32 +01:00
Geoffrey Frogeye e0f28d41d2 Added public updated list link 2019-11-11 12:10:46 +01:00
Geoffrey Frogeye a0a2af281f Added possibility to add personal sources 2019-11-11 11:19:46 +01:00
Geoffrey Frogeye 333ae4eb66 Fixed tracker list 2019-11-10 23:58:49 +01:00
Geoffrey Frogeye 0df749f1e0 Added more trackers 2019-11-10 23:29:30 +01:00
Geoffrey Frogeye b81c7c17ee Loosely error-proofed subdomain collection 2019-11-10 23:22:21 +01:00
Geoffrey Frogeye ed72f643fd Updated website list 2019-11-10 23:16:18 +01:00
Geoffrey Frogeye c409c2cf9b More error-proofing 2019-11-10 23:07:21 +01:00
Geoffrey Frogeye 0801bd9e44 Error-proofed DNS-resolution 2019-11-10 22:18:27 +01:00
Geoffrey Frogeye 2f1af3c850 Added progressbar and ETA 2019-11-10 21:59:06 +01:00
Geoffrey Frogeye d49a7803e9 Fixed typos 2019-11-10 18:29:16 +01:00
54 changed files with 3182 additions and 164 deletions

5
.env.default Normal file

@@ -0,0 +1,5 @@
CACHE_SIZE=536870912
MASSDNS_HASHMAP_SIZE=1000
PROFILE=0
SINGLE_PROCESS=0
MASSDNS_BINARY=massdns

6
.gitignore vendored

@@ -1,3 +1,5 @@
*.list
!websites.list
*.log
*.p
.env
__pycache__
explanations

21
LICENSE Normal file

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2019 Geoffrey 'Frogeye' Preud'homme
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

176
README.md

@@ -1,54 +1,162 @@
# eulaurarien
Generates a host list of first-party trackers for ad-blocking.
This program is able to generate a list of every hostname that is a DNS redirection to a list of DNS zones and IP networks.
**DISCLAIMER:** I'm by no means an expert on this subject so my vocabulary or other stuff might be wrong. Use at your own risk.
It is primarily used to generate [Geoffrey Frogeye's block list of first-party trackers](https://hostfiles.frogeye.fr) (learn about first-party trackers by following this link).
## What's a first-party tracker?
If you want to contribute but don't want to create an account on this forge, contact me the way you like: <https://geoffrey.frogeye.fr>
Traditionally, websites load trackers scripts directly.
For example, `website1.com` and `website2.com` both load `https://trackercompany.com/trackerscript.js` to track their users.
In order to block those, one can simply block the host `trackercompany.com`.
## How does this work
However, to circumvent this easy block, tracker companies made the websites using them load trackers from `somethingirelevant.website1.com`.
The latter being a DNS redirection to `website1.trackercompany.com`, directly pointing to a server serving the tracking script.
Those are the first-party trackers.
This program takes as input:
Blocking `trackercompany.com` doesn't work any more, and blocking `*.trackercompany.com` isn't really possible since:
- Lists of hostnames to match
- Lists of DNS zones to match (a domain and its subdomains)
- Lists of IP address / IP networks to match
- Lists of Autonomous System numbers to match
- An enormous quantity of DNS records
1. Most ad-blockers don't support wildcards
2. It's a DNS redirection, meaning that most ad-blockers will only see `somethingirelevant.website1.com`
It will be able to output the hostnames that are DNS redirections to any item in the provided lists.
So the only solution is to block every known `somethingirelevant.website1.com`-like subdomain, which is a lot.
That's where this script comes in, to generate a list of such subdomains.
DNS records can be locally resolved from a list of subdomains using [MassDNS](https://github.com/blechschmidt/massdns).
## How does this script work
Those subdomains can either be provided as is, come from [Cisco Umbrella Popularity List](http://s3-us-west-1.amazonaws.com/umbrella-static/index.html), from your browsing history, or from analyzing the traffic a web browser makes when opening an URL (the program provides utility to do all that).
It takes as input a list of websites with trackers included.
So far, this list is manually generated from the list of clients of such first-party trackers
(later we should use a general list of websites to be more exhaustive).
## Usage
It opens each one of those websites (just the homepage) in a web browser, and records the domains of the network requests the page makes.
It then finds the DNS redirections of those domains, and compares them with regexes of known tracking domains.
It finally outputs the matching ones.
Remember you can get an already generated and up-to-date list of first-party trackers from [here](https://hostfiles.frogeye.fr).
## Requirements
The following is for the people wanting to build their own list.
Just to build the list, you can find an already-built list in the releases.
### Requirements
- Bash
- Python 3.4+
- Firefox
- Selenium
- seleniumwire
- dnspython
Depending on the sources you'll be using to generate the list, you'll need to install some of the following:
## Contributing
- [Bash](https://www.gnu.org/software/bash/bash.html)
- [Coreutils](https://www.gnu.org/software/coreutils/)
- [Gawk](https://www.gnu.org/software/gawk/)
- [curl](https://curl.haxx.se)
- [pv](http://www.ivarch.com/programs/pv.shtml)
- [Python 3.4+](https://www.python.org/)
- [coloredlogs](https://pypi.org/project/coloredlogs/) (sorry I can't help myself)
- [numpy](https://www.numpy.org/)
- [python-abp](https://pypi.org/project/python-abp/) (only if you intend to use AdBlock rules as a rule source)
- [massdns](https://github.com/blechschmidt/massdns) in your `$PATH` (only if you have subdomains as a source)
- [Firefox](https://www.mozilla.org/firefox/) (only if you have websites as a source)
- [selenium (Python bindings)](https://pypi.python.org/pypi/selenium) (only if you have websites as a source)
- [selenium-wire](https://pypi.org/project/selenium-wire/) (only if you have websites as a source)
- [markdown2](https://pypi.org/project/markdown2/) (only if you intend to generate the index webpage)
### Adding websites
### Create a new database
Just add them to `websites.list`.
The so-called database (in the form of `blocking.p`) is a file storing all the matching entities (ASN, IPs, hostnames, zones…) and every entity leading to them.
It exists because the list cannot be generated in one pass, as the links of DNS redirection chains do not have to be input in order.
### Adding first-party trackers regex
You can purge old records from the database by running `./prune.sh`.
When you remove a source of data, remove its corresponding file in `last_updates` to fix the pruning process.
Just add them to `regexes.py`.
### Gather external sources
External sources are not stored in this repository.
You'll need to fetch them by running `./fetch_resources.sh`.
Those include:
- Third-party trackers lists
- TLD lists (used to test the validity of hostnames)
- List of public DNS resolvers (for DNS resolving from subdomains)
- Top 1M subdomains
### Import rules into the database
You need to put the lists of rules for matching in the different subfolders:
- `rules`: Lists of DNS zones
- `rules_ip`: Lists of IP networks (for IP addresses append `/32`)
- `rules_asn`: Lists of Autonomous System numbers (IP ranges will be deduced from them)
- `rules_adblock`: Lists of DNS zones, but in the form of AdBlock lists (only the ones concerning domains will be extracted)
- `rules_hosts`: Lists of DNS zones, but in the form of hosts lists
See the provided examples for syntax.
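For illustration, an entry of each kind could look like this (file names and values are hypothetical):

```
rules/example.list:        trackercompany.com
rules_ip/example.list:     203.0.113.0/24
rules_asn/example.list:    AS64496
rules_adblock/example.txt: ||trackercompany.com^
rules_hosts/example.txt:   0.0.0.0 trackercompany.com
```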
In each folder:
- `first-party.ext` will be the only files considered for the first-party variant of the list
- `*.cache.ext` are from external sources, and thus might be deleted / overwritten
- `*.custom.ext` are for sources that you don't want committed
Then, run `./import_rules.sh`.
If you removed rules and you want to remove every record depending on those rules immediately,
run the following command:
```
./db.py --prune --prune-before "$(cat "last_updates/rules.txt")" --prune-base
```
### Add subdomains
If you plan to resolve DNS records yourself (as the DNS records datasets are not exhaustive),
the top 1M subdomains provided might not be enough.
You can add them into the `subdomains` folder.
It follows the same conventions as the rules folders for `*.cache.ext` and `*.custom.ext` files.
#### Add personal sources
Adding your own browsing history will help create a more suited subdomains list.
Here's reference command for possible sources:
- **Pi-hole**: `sqlite3 /etc/pihole-FTL.db "select distinct domain from queries" > /path/to/eulaurarien/subdomains/my-pihole.custom.list`
- **Firefox**: `cp ~/.mozilla/firefox/<your_profile>.default/places.sqlite temp; sqlite3 temp "select distinct rev_host from moz_places" | rev | sed 's|^\.||' > /path/to/eulaurarien/subdomains/my-firefox.custom.list; rm temp`
#### Collect subdomains from websites
You can add the websites URLs into the `websites` folder.
It follows the same conventions as the rules folders for `*.cache.ext` and `*.custom.ext` files.
Then, run `collect_subdomain.sh`.
This is a long step, and might be memory-intensive from time to time.
> **Note:** For first-party tracking, a list of subdomains issued from the websites in the repository is available here: <https://hostfiles.frogeye.fr/from_websites.cache.list>
### Resolve DNS records
Once you've added subdomains, you'll need to resolve them to get their DNS records.
The program will use a list of public nameservers to do that, but you can add your own in the `nameservers` directory.
Then, run `./resolve_subdomains.sh`.
Note that this is a network-intensive process, not in terms of bandwidth, but in terms of packet count.
> **Note:** Some VPS providers might detect this as a DDoS attack and cut the network access.
> Some Wi-Fi connections can be rendered unusable for other uses, some routers might cease to work.
> Since massdns does not yet support rate limiting, my best bet was a Raspberry Pi with a slow Ethernet link (Raspberry Pi < 4).
The DNS records will automatically be imported into the database.
If you want to re-import the records without re-doing the resolving, just run the last line of the `./resolve_subdomains.sh` script.
### Export the lists
For the tracking list, use `./export_lists.sh`, the output will be in the `dist` folder (please change the links before distributing them).
For other purposes, tinker with the `./export.py` program.
#### Explanations
Note that if you created an `explanations` folder at the root of the project, a file with a timestamp will be created in it.
It contains every rule in the database and the reason for their presence (i.e. their dependency).
This might be useful to track changes between runs.
Every rule has an associated tag with four components:
1. A number: the level of the rule (1 if it is a rule present in the `rules*` folders)
2. A letter: `F` if first-party, `M` if multi-party.
3. A letter: `D` if a duplicate (e.g. `foo.bar.com` if `*.bar.com` is already a rule), `_` if not.
4. A number: the number of rules relying on this one
### Generate the index webpage
This is the one served on <https://hostfiles.frogeye.fr>.
Just run `./generate_index.py`.
### Everything
Once you've made sure every step runs fine, you can use `./eulaurarien.sh` to run every step consecutively.

59
adblock_to_domain_list.py Executable file

@@ -0,0 +1,59 @@
#!/usr/bin/env python3
# pylint: disable=C0103
"""
Extract the domains to block as a whole
from a AdBlock rules list.
"""
import argparse
import sys
import typing
import abp.filters
def get_domains(rule: abp.filters.parser.Filter) -> typing.Iterable[str]:
if rule.options:
return
selector_type = rule.selector["type"]
selector_value = rule.selector["value"]
if (
selector_type == "url-pattern"
and selector_value.startswith("||")
and selector_value.endswith("^")
):
yield selector_value[2:-1]
if __name__ == "__main__":
# Parsing arguments
parser = argparse.ArgumentParser(
description="Extract whole domains from an AdBlock blocking list"
)
parser.add_argument(
"-i",
"--input",
type=argparse.FileType("r"),
default=sys.stdin,
help="Input file with AdBlock rules",
)
parser.add_argument(
"-o",
"--output",
type=argparse.FileType("w"),
default=sys.stdout,
help="Output file with one tracking subdomain per line",
)
args = parser.parse_args()
# Reading rules
rules = abp.filters.parse_filterlist(args.input)
# Filtering
for rule in rules:
if not isinstance(rule, abp.filters.parser.Filter):
continue
for domain in get_domains(rule):
print(domain, file=args.output)
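The `||domain^` convention handled by `get_domains` above can be sketched standalone (simplified: this skips the `abp.filters` option and selector-type checks):

```python
import typing

def domain_from_abp_rule(rule_text: str) -> typing.Optional[str]:
    # '||tracker.example.com^' blocks a whole domain; anything else
    # (path filters, element hiding, ...) is ignored here.
    if rule_text.startswith("||") and rule_text.endswith("^"):
        return rule_text[2:-1]
    return None

print(domain_from_abp_rule("||tracker.example.com^"))  # tracker.example.com
print(domain_from_abp_rule("/ads/banner.png"))         # None
```

`domain_from_abp_rule` is a hypothetical name for illustration; the real script relies on `abp.filters` to parse the rule first.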

@@ -1,4 +1,5 @@
#!/usr/bin/env python3
# pylint: disable=C0103
"""
From a list of URLs, output the subdomains
@@ -8,9 +9,33 @@ accessed by the websites.
import sys
import typing
import urllib.parse
import time
import progressbar
import selenium.webdriver.firefox.options
import seleniumwire.webdriver
import logging
log = logging.getLogger("cs")
DRIVER = None
SCROLL_TIME = 10.0
SCROLL_STEPS = 100
SCROLL_CMD = f"window.scrollBy(0,document.body.scrollHeight/{SCROLL_STEPS})"
def new_driver() -> seleniumwire.webdriver.browser.Firefox:
profile = selenium.webdriver.FirefoxProfile()
profile.set_preference("privacy.trackingprotection.enabled", False)
profile.set_preference("network.cookie.cookieBehavior", 0)
profile.set_preference("privacy.trackingprotection.pbmode.enabled", False)
profile.set_preference("privacy.trackingprotection.cryptomining.enabled", False)
profile.set_preference("privacy.trackingprotection.fingerprinting.enabled", False)
options = selenium.webdriver.firefox.options.Options()
# options.add_argument('-headless')
driver = seleniumwire.webdriver.Firefox(
profile, executable_path="geckodriver", options=options
)
return driver
def subdomain_from_url(url: str) -> str:
@@ -26,22 +51,47 @@ def collect_subdomains(url: str) -> typing.Iterable[str]:
Load a URL into a headless browser and return all the domains
it tried to access.
"""
options = selenium.webdriver.firefox.options.Options()
options.add_argument('-headless')
driver = seleniumwire.webdriver.Firefox(
executable_path='geckodriver', options=options)
global DRIVER
if not DRIVER:
DRIVER = new_driver()
driver.get(url)
for request in driver.requests:
if request.response:
yield subdomain_from_url(request.path)
driver.close()
try:
DRIVER.get(url)
for s in range(SCROLL_STEPS):
DRIVER.execute_script(SCROLL_CMD)
time.sleep(SCROLL_TIME / SCROLL_STEPS)
for request in DRIVER.requests:
if request.response:
yield subdomain_from_url(request.path)
except Exception:
log.exception("Error")
DRIVER.quit()
DRIVER = None
if __name__ == '__main__':
for line in sys.stdin:
line = line.strip()
if not line:
continue
for subdomain in collect_subdomains(line):
print(subdomain)
def collect_subdomains_standalone(url: str) -> None:
url = url.strip()
if not url:
return
for subdomain in collect_subdomains(url):
print(subdomain)
if __name__ == "__main__":
assert len(sys.argv) <= 2
filename = None
if len(sys.argv) == 2 and sys.argv[1] != "-":
filename = sys.argv[1]
num_lines = sum(1 for line in open(filename))
iterator = progressbar.progressbar(open(filename), max_value=num_lines)
else:
iterator = sys.stdin
for line in iterator:
collect_subdomains_standalone(line)
if DRIVER:
DRIVER.quit()
if filename:
iterator.close()
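The body of `subdomain_from_url` is elided in this diff; what it needs to do can be sketched with the `urllib.parse` module already imported above (hypothetical helper name, assumptions labelled in the comments):

```python
import urllib.parse

def hostname_from_url(url: str) -> str:
    # Sketch of what subdomain_from_url needs to achieve:
    # keep only the host part of each URL the browser requested.
    # (The real, elided implementation may differ.)
    return urllib.parse.urlparse(url).netloc

print(hostname_from_url("https://a.tracker.example.com/pixel.gif?id=42"))
# a.tracker.example.com
```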

collect_subdomains.sh Executable file
@@ -0,0 +1,11 @@
#!/usr/bin/env bash
function log() {
echo -e "\033[33m$@\033[0m"
}
# Get all subdomains accessed by each website in the website list
cat websites/*.list | sort -u > temp/all_websites.list
./collect_subdomains.py temp/all_websites.list > temp/subdomains_from_websites.list
sort -u temp/subdomains_from_websites.list > subdomains/from_websites.cache.list

database.py Normal file
@@ -0,0 +1,799 @@
#!/usr/bin/env python3
"""
Utility functions to interact with the database.
"""
import typing
import time
import logging
import coloredlogs
import pickle
import numpy
import math
import os
TLD_LIST: typing.Set[str] = set()
coloredlogs.install(level="DEBUG", fmt="%(asctime)s %(name)s %(levelname)s %(message)s")
Asn = int
Timestamp = int
Level = int
class Path:
pass
class RulePath(Path):
def __str__(self) -> str:
return "(rule)"
class RuleFirstPath(RulePath):
def __str__(self) -> str:
return "(first-party rule)"
class RuleMultiPath(RulePath):
def __str__(self) -> str:
return "(multi-party rule)"
class DomainPath(Path):
def __init__(self, parts: typing.List[str]):
self.parts = parts
def __str__(self) -> str:
return "?." + Database.unpack_domain(self)
class HostnamePath(DomainPath):
def __str__(self) -> str:
return Database.unpack_domain(self)
class ZonePath(DomainPath):
def __str__(self) -> str:
return "*." + Database.unpack_domain(self)
class AsnPath(Path):
def __init__(self, asn: Asn):
self.asn = asn
def __str__(self) -> str:
return Database.unpack_asn(self)
class Ip4Path(Path):
def __init__(self, value: int, prefixlen: int):
self.value = value
self.prefixlen = prefixlen
def __str__(self) -> str:
return Database.unpack_ip4network(self)
class Match:
def __init__(self) -> None:
self.source: typing.Optional[Path] = None
self.updated: int = 0
self.dupplicate: bool = False
# Cache
self.level: int = 0
self.first_party: bool = False
self.references: int = 0
def active(self, first_party: bool = None) -> bool:
if self.updated == 0 or (first_party and not self.first_party):
return False
return True
def disable(self) -> None:
self.updated = 0
class AsnNode(Match):
def __init__(self) -> None:
Match.__init__(self)
self.name = ""
class DomainTreeNode:
def __init__(self) -> None:
self.children: typing.Dict[str, DomainTreeNode] = dict()
self.match_zone = Match()
self.match_hostname = Match()
class IpTreeNode(Match):
def __init__(self) -> None:
Match.__init__(self)
self.zero: typing.Optional[IpTreeNode] = None
self.one: typing.Optional[IpTreeNode] = None
Node = typing.Union[DomainTreeNode, IpTreeNode, AsnNode]
MatchCallable = typing.Callable[[Path, Match], typing.Any]
class Profiler:
def __init__(self) -> None:
do_profile = int(os.environ.get("PROFILE", "0"))
if do_profile:
self.log = logging.getLogger("profiler")
self.time_last = time.perf_counter()
self.time_step = "init"
self.time_dict: typing.Dict[str, float] = dict()
self.step_dict: typing.Dict[str, int] = dict()
self.enter_step = self.enter_step_real
self.profile = self.profile_real
else:
self.enter_step = self.enter_step_dummy
self.profile = self.profile_dummy
def enter_step_dummy(self, name: str) -> None:
return
def enter_step_real(self, name: str) -> None:
now = time.perf_counter()
try:
self.time_dict[self.time_step] += now - self.time_last
self.step_dict[self.time_step] += int(name != self.time_step)
except KeyError:
self.time_dict[self.time_step] = now - self.time_last
self.step_dict[self.time_step] = 1
self.time_step = name
self.time_last = time.perf_counter()
def profile_dummy(self) -> None:
return
def profile_real(self) -> None:
self.enter_step("profile")
total = sum(self.time_dict.values())
for key, secs in sorted(self.time_dict.items(), key=lambda t: t[1]):
times = self.step_dict[key]
self.log.debug(
f"{key:<20}: {times:9d} × {secs/times:5.3e} "
f"= {secs:9.2f} s ({secs/total:7.2%}) "
)
self.log.debug(
f"{'total':<20}: " f"{total:9.2f} s ({1:7.2%})"
)
class Database(Profiler):
VERSION = 18
PATH = "blocking.p"
def initialize(self) -> None:
self.log.warning("Creating database version: %d ", Database.VERSION)
# Dummy match objects that everything refer to
self.rules: typing.List[Match] = list()
for first_party in (False, True):
m = Match()
m.updated = 1
m.level = 0
m.first_party = first_party
self.rules.append(m)
self.domtree = DomainTreeNode()
self.asns: typing.Dict[Asn, AsnNode] = dict()
self.ip4tree = IpTreeNode()
def load(self) -> None:
self.enter_step("load")
try:
with open(self.PATH, "rb") as db_fdsec:
version, data = pickle.load(db_fdsec)
if version == Database.VERSION:
self.rules, self.domtree, self.asns, self.ip4tree = data
return
self.log.warning(
"Outdated database version found: %d, " "it will be rebuilt.",
version,
)
except (TypeError, AttributeError, EOFError):
self.log.error(
"Corrupt (or heavily outdated) database found, " "it will be rebuilt."
)
except FileNotFoundError:
pass
self.initialize()
def save(self) -> None:
self.enter_step("save")
with open(self.PATH, "wb") as db_fdsec:
data = self.rules, self.domtree, self.asns, self.ip4tree
pickle.dump((self.VERSION, data), db_fdsec)
self.profile()
def __init__(self) -> None:
Profiler.__init__(self)
self.log = logging.getLogger("db")
self.load()
self.ip4cache_shift: int = 32
self.ip4cache = numpy.ones(1)
def _set_ip4cache(self, path: Path, _: Match) -> None:
assert isinstance(path, Ip4Path)
self.enter_step("set_ip4cache")
mini = path.value >> self.ip4cache_shift
maxi = (path.value + 2 ** (32 - path.prefixlen)) >> self.ip4cache_shift
if mini == maxi:
self.ip4cache[mini] = True
else:
self.ip4cache[mini:maxi] = True
def fill_ip4cache(self, max_size: int = 512 * 1024 ** 2) -> None:
"""
Size in bytes
"""
if max_size > 2 ** 32 / 8:
self.log.warning(
"Allocating more than 512 MiB of RAM for "
"the Ip4 cache is not necessary."
)
max_cache_width = int(math.log2(max(1, max_size * 8)))
allocated = False
cache_width = min(32, max_cache_width)
while not allocated:
cache_size = 2 ** cache_width
try:
self.ip4cache = numpy.zeros(cache_size, dtype=bool)
except MemoryError:
self.log.exception("Could not allocate cache. Retrying a smaller one.")
cache_width -= 1
continue
allocated = True
self.ip4cache_shift = 32 - cache_width
for _ in self.exec_each_ip4(self._set_ip4cache):
pass
@staticmethod
def populate_tld_list() -> None:
with open("temp/all_tld.list", "r") as tld_fdesc:
for tld in tld_fdesc:
tld = tld.strip()
TLD_LIST.add(tld)
@staticmethod
def validate_domain(path: str) -> bool:
if len(path) > 255:
return False
splits = path.split(".")
if not TLD_LIST:
Database.populate_tld_list()
if splits[-1] not in TLD_LIST:
return False
for split in splits:
if not 1 <= len(split) <= 63:
return False
return True
@staticmethod
def pack_domain(domain: str) -> DomainPath:
return DomainPath(domain.split(".")[::-1])
@staticmethod
def unpack_domain(domain: DomainPath) -> str:
return ".".join(domain.parts[::-1])
@staticmethod
def pack_asn(asn: str) -> AsnPath:
asn = asn.upper()
if asn.startswith("AS"):
asn = asn[2:]
return AsnPath(int(asn))
@staticmethod
def unpack_asn(asn: AsnPath) -> str:
return f"AS{asn.asn}"
@staticmethod
def validate_ip4address(path: str) -> bool:
splits = path.split(".")
if len(splits) != 4:
return False
for split in splits:
try:
if not 0 <= int(split) <= 255:
return False
except ValueError:
return False
return True
@staticmethod
def pack_ip4address_low(address: str) -> int:
addr = 0
for split in address.split("."):
octet = int(split)
addr = (addr << 8) + octet
return addr
@staticmethod
def pack_ip4address(address: str) -> Ip4Path:
return Ip4Path(Database.pack_ip4address_low(address), 32)
@staticmethod
def unpack_ip4address(address: Ip4Path) -> str:
addr = address.value
assert address.prefixlen == 32
octets: typing.List[int] = list()
octets = [0] * 4
for o in reversed(range(4)):
octets[o] = addr & 0xFF
addr >>= 8
return ".".join(map(str, octets))
@staticmethod
def validate_ip4network(path: str) -> bool:
# A bit generous but ok for our usage
splits = path.split("/")
if len(splits) != 2:
return False
if not Database.validate_ip4address(splits[0]):
return False
try:
if not 0 <= int(splits[1]) <= 32:
return False
except ValueError:
return False
return True
@staticmethod
def pack_ip4network(network: str) -> Ip4Path:
address, prefixlen_str = network.split("/")
prefixlen = int(prefixlen_str)
addr = Database.pack_ip4address(address)
addr.prefixlen = prefixlen
return addr
@staticmethod
def unpack_ip4network(network: Ip4Path) -> str:
addr = network.value
octets: typing.List[int] = list()
octets = [0] * 4
for o in reversed(range(4)):
octets[o] = addr & 0xFF
addr >>= 8
return ".".join(map(str, octets)) + "/" + str(network.prefixlen)
def get_match(self, path: Path) -> Match:
if isinstance(path, RuleMultiPath):
return self.rules[0]
elif isinstance(path, RuleFirstPath):
return self.rules[1]
elif isinstance(path, AsnPath):
return self.asns[path.asn]
elif isinstance(path, DomainPath):
dicd = self.domtree
for part in path.parts:
dicd = dicd.children[part]
if isinstance(path, HostnamePath):
return dicd.match_hostname
elif isinstance(path, ZonePath):
return dicd.match_zone
else:
raise ValueError
elif isinstance(path, Ip4Path):
dici = self.ip4tree
for i in range(31, 31 - path.prefixlen, -1):
bit = (path.value >> i) & 0b1
dici_next = dici.one if bit else dici.zero
if not dici_next:
raise IndexError
dici = dici_next
return dici
else:
raise ValueError
def exec_each_asn(
self,
callback: MatchCallable,
) -> typing.Any:
for asn in self.asns:
match = self.asns[asn]
if match.active():
c = callback(
AsnPath(asn),
match,
)
try:
yield from c
except TypeError: # not iterable
pass
def exec_each_domain(
self,
callback: MatchCallable,
_dic: DomainTreeNode = None,
_par: DomainPath = None,
) -> typing.Any:
_dic = _dic or self.domtree
_par = _par or DomainPath([])
if _dic.match_hostname.active():
c = callback(
HostnamePath(_par.parts),
_dic.match_hostname,
)
try:
yield from c
except TypeError: # not iterable
pass
if _dic.match_zone.active():
c = callback(
ZonePath(_par.parts),
_dic.match_zone,
)
try:
yield from c
except TypeError: # not iterable
pass
for part in _dic.children:
dic = _dic.children[part]
yield from self.exec_each_domain(
callback, _dic=dic, _par=DomainPath(_par.parts + [part])
)
def exec_each_ip4(
self,
callback: MatchCallable,
_dic: IpTreeNode = None,
_par: Ip4Path = None,
) -> typing.Any:
_dic = _dic or self.ip4tree
_par = _par or Ip4Path(0, 0)
if _dic.active():
c = callback(
_par,
_dic,
)
try:
yield from c
except TypeError: # not iterable
pass
# 0
pref = _par.prefixlen + 1
dic = _dic.zero
if dic:
# addr0 = _par.value & (0xFFFFFFFF ^ (1 << (32-pref)))
# assert addr0 == _par.value
addr0 = _par.value
yield from self.exec_each_ip4(callback, _dic=dic, _par=Ip4Path(addr0, pref))
# 1
dic = _dic.one
if dic:
addr1 = _par.value | (1 << (32 - pref))
# assert addr1 != _par.value
yield from self.exec_each_ip4(callback, _dic=dic, _par=Ip4Path(addr1, pref))
def exec_each(
self,
callback: MatchCallable,
) -> typing.Any:
yield from self.exec_each_domain(callback)
yield from self.exec_each_ip4(callback)
yield from self.exec_each_asn(callback)
def update_references(self) -> None:
# Should be correctly calculated normally,
# keeping this just in case
def reset_references_cb(path: Path, match: Match) -> None:
match.references = 0
for _ in self.exec_each(reset_references_cb):
pass
def increment_references_cb(path: Path, match: Match) -> None:
if match.source:
source = self.get_match(match.source)
source.references += 1
for _ in self.exec_each(increment_references_cb):
pass
def _clean_deps(self) -> None:
# Disable the matches that depends on the targeted
# matches until all disabled matches reference count = 0
did_something = True
def clean_deps_cb(path: Path, match: Match) -> None:
nonlocal did_something
if not match.source:
return
source = self.get_match(match.source)
if not source.active():
self._unset_match(match)
elif match.first_party > source.first_party:
match.first_party = source.first_party
else:
return
did_something = True
while did_something:
did_something = False
self.enter_step("pass_clean_deps")
for _ in self.exec_each(clean_deps_cb):
pass
def prune(self, before: int, base_only: bool = False) -> None:
# Disable the matches targeted
def prune_cb(path: Path, match: Match) -> None:
if base_only and match.level > 1:
return
if match.updated > before:
return
self._unset_match(match)
self.log.debug("Prune: disabled %s", path)
self.enter_step("pass_prune")
for _ in self.exec_each(prune_cb):
pass
self._clean_deps()
# Remove branches with no match
# TODO
def explain(self, path: Path) -> str:
match = self.get_match(path)
string = str(path)
if isinstance(match, AsnNode):
string += f" ({match.name})"
party_char = "F" if match.first_party else "M"
dup_char = "D" if match.dupplicate else "_"
string += f" {match.level}{party_char}{dup_char}{match.references}"
if match.source:
string += f"{self.explain(match.source)}"
return string
def list_records(
self,
first_party_only: bool = False,
end_chain_only: bool = False,
no_dupplicates: bool = False,
rules_only: bool = False,
hostnames_only: bool = False,
explain: bool = False,
) -> typing.Iterable[str]:
def export_cb(path: Path, match: Match) -> typing.Iterable[str]:
if first_party_only and not match.first_party:
return
if end_chain_only and match.references > 0:
return
if no_dupplicates and match.dupplicate:
return
if rules_only and match.level > 1:
return
if hostnames_only and not isinstance(path, HostnamePath):
return
if explain:
yield self.explain(path)
else:
yield str(path)
yield from self.exec_each(export_cb)
def count_records(
self,
first_party_only: bool = False,
end_chain_only: bool = False,
no_dupplicates: bool = False,
rules_only: bool = False,
hostnames_only: bool = False,
) -> str:
memo: typing.Dict[str, int] = dict()
def count_records_cb(path: Path, match: Match) -> None:
if first_party_only and not match.first_party:
return
if end_chain_only and match.references > 0:
return
if no_dupplicates and match.dupplicate:
return
if rules_only and match.level > 1:
return
if hostnames_only and not isinstance(path, HostnamePath):
return
try:
memo[path.__class__.__name__] += 1
except KeyError:
memo[path.__class__.__name__] = 1
for _ in self.exec_each(count_records_cb):
pass
split: typing.List[str] = list()
for key, value in sorted(memo.items(), key=lambda s: s[0]):
split.append(f"{key[:-4].lower()}s: {value}")
return ", ".join(split)
def get_domain(self, domain_str: str) -> typing.Iterable[DomainPath]:
self.enter_step("get_domain_pack")
domain = self.pack_domain(domain_str)
self.enter_step("get_domain_brws")
dic = self.domtree
depth = 0
for part in domain.parts:
if dic.match_zone.active():
self.enter_step("get_domain_yield")
yield ZonePath(domain.parts[:depth])
self.enter_step("get_domain_brws")
if part not in dic.children:
return
dic = dic.children[part]
depth += 1
if dic.match_zone.active():
self.enter_step("get_domain_yield")
yield ZonePath(domain.parts)
if dic.match_hostname.active():
self.enter_step("get_domain_yield")
yield HostnamePath(domain.parts)
def get_ip4(self, ip4_str: str) -> typing.Iterable[Path]:
self.enter_step("get_ip4_pack")
ip4val = self.pack_ip4address_low(ip4_str)
self.enter_step("get_ip4_cache")
if not self.ip4cache[ip4val >> self.ip4cache_shift]:
return
self.enter_step("get_ip4_brws")
dic = self.ip4tree
for i in range(31, -1, -1):
bit = (ip4val >> i) & 0b1
if dic.active():
self.enter_step("get_ip4_yield")
yield Ip4Path(ip4val >> (i + 1) << (i + 1), 31 - i)
self.enter_step("get_ip4_brws")
next_dic = dic.one if bit else dic.zero
if next_dic is None:
return
dic = next_dic
if dic.active():
self.enter_step("get_ip4_yield")
yield Ip4Path(ip4val, 32)
def _unset_match(
self,
match: Match,
) -> None:
match.disable()
if match.source:
source_match = self.get_match(match.source)
source_match.references -= 1
def _set_match(
self,
match: Match,
updated: int,
source: Path,
source_match: Match = None,
dupplicate: bool = False,
) -> None:
# source_match is in parameters because most of the time
# its parent function needs it too,
# so it can pass it to save a traversal
source_match = source_match or self.get_match(source)
new_level = source_match.level + 1
if (
updated > match.updated
or new_level < match.level
or source_match.first_party > match.first_party
):
# NOTE FP and level of matches referencing this one
# won't be updated until run or prune
if match.source:
old_source = self.get_match(match.source)
old_source.references -= 1
match.updated = updated
match.level = new_level
match.first_party = source_match.first_party
match.source = source
source_match.references += 1
match.dupplicate = dupplicate
def _set_domain(
self, hostname: bool, domain_str: str, updated: int, source: Path
) -> None:
self.enter_step("set_domain_val")
if not Database.validate_domain(domain_str):
raise ValueError(f"Invalid domain: {domain_str}")
self.enter_step("set_domain_pack")
domain = self.pack_domain(domain_str)
self.enter_step("set_domain_fp")
source_match = self.get_match(source)
is_first_party = source_match.first_party
self.enter_step("set_domain_brws")
dic = self.domtree
dupplicate = False
for part in domain.parts:
if part not in dic.children:
dic.children[part] = DomainTreeNode()
dic = dic.children[part]
if dic.match_zone.active(is_first_party):
dupplicate = True
if hostname:
match = dic.match_hostname
else:
match = dic.match_zone
self._set_match(
match,
updated,
source,
source_match=source_match,
dupplicate=dupplicate,
)
def set_hostname(self, *args: typing.Any, **kwargs: typing.Any) -> None:
self._set_domain(True, *args, **kwargs)
def set_zone(self, *args: typing.Any, **kwargs: typing.Any) -> None:
self._set_domain(False, *args, **kwargs)
def set_asn(self, asn_str: str, updated: int, source: Path) -> None:
self.enter_step("set_asn")
path = self.pack_asn(asn_str)
if path.asn in self.asns:
match = self.asns[path.asn]
else:
match = AsnNode()
self.asns[path.asn] = match
self._set_match(
match,
updated,
source,
)
def _set_ip4(self, ip4: Ip4Path, updated: int, source: Path) -> None:
self.enter_step("set_ip4_fp")
source_match = self.get_match(source)
is_first_party = source_match.first_party
self.enter_step("set_ip4_brws")
dic = self.ip4tree
dupplicate = False
for i in range(31, 31 - ip4.prefixlen, -1):
bit = (ip4.value >> i) & 0b1
next_dic = dic.one if bit else dic.zero
if next_dic is None:
next_dic = IpTreeNode()
if bit:
dic.one = next_dic
else:
dic.zero = next_dic
dic = next_dic
if dic.active(is_first_party):
dupplicate = True
self._set_match(
dic,
updated,
source,
source_match=source_match,
dupplicate=dupplicate,
)
self._set_ip4cache(ip4, dic)
def set_ip4address(
self, ip4address_str: str, *args: typing.Any, **kwargs: typing.Any
) -> None:
self.enter_step("set_ip4add_val")
if not Database.validate_ip4address(ip4address_str):
raise ValueError(f"Invalid ip4address: {ip4address_str}")
self.enter_step("set_ip4add_pack")
ip4 = self.pack_ip4address(ip4address_str)
self._set_ip4(ip4, *args, **kwargs)
def set_ip4network(
self, ip4network_str: str, *args: typing.Any, **kwargs: typing.Any
) -> None:
self.enter_step("set_ip4net_val")
if not Database.validate_ip4network(ip4network_str):
raise ValueError(f"Invalid ip4network: {ip4network_str}")
self.enter_step("set_ip4net_pack")
ip4 = self.pack_ip4network(ip4network_str)
self._set_ip4(ip4, *args, **kwargs)
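Two of the packing schemes in `database.py` above can be illustrated in isolation: domains are stored as reversed label lists (so a tree walk from the TLD downwards matches zones like `*.example.com`), and IPv4 addresses as 32-bit big-endian integers. A minimal sketch, mirroring `Database.pack_domain` and `Database.pack_ip4address_low`:

```python
def pack_domain(domain: str) -> list:
    # 'tracker.example.com' -> ['com', 'example', 'tracker']
    return domain.split(".")[::-1]

def pack_ip4(address: str) -> int:
    # Big-endian packing: each octet shifted into a 32-bit integer
    addr = 0
    for octet in address.split("."):
        addr = (addr << 8) + int(octet)
    return addr

print(pack_domain("tracker.example.com"))  # ['com', 'example', 'tracker']
print(hex(pack_ip4("1.2.3.4")))            # 0x1020304
```

The reversed-label representation is what lets `get_domain` yield every matching zone with a single top-down traversal of `domtree`.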

db.py Executable file
@@ -0,0 +1,54 @@
#!/usr/bin/env python3
import argparse
import database
import time
import os
if __name__ == "__main__":
# Parsing arguments
parser = argparse.ArgumentParser(description="Database operations")
parser.add_argument(
"-i", "--initialize", action="store_true", help="Reconstruct the whole database"
)
parser.add_argument(
"-p", "--prune", action="store_true", help="Remove old entries from database"
)
parser.add_argument(
"-b",
"--prune-base",
action="store_true",
help="With --prune, only prune base rules "
"(the ones added by ./feed_rules.py)",
)
parser.add_argument(
"-s",
"--prune-before",
type=int,
default=(int(time.time()) - 60 * 60 * 24 * 31 * 6),
help="With --prune, only rules updated before "
"this UNIX timestamp will be deleted",
)
parser.add_argument(
"-r",
"--references",
action="store_true",
help="DEBUG: Update the reference count",
)
args = parser.parse_args()
if not args.initialize:
DB = database.Database()
else:
if os.path.isfile(database.Database.PATH):
os.unlink(database.Database.PATH)
DB = database.Database()
DB.enter_step("main")
if args.prune:
DB.prune(before=args.prune_before, base_only=args.prune_base)
if args.references:
DB.update_references()
DB.save()

dist/.gitignore vendored Normal file
@@ -0,0 +1,2 @@
*.txt
*.html

dist/README.md vendored Normal file
@@ -0,0 +1,114 @@
# Geoffrey Frogeye's block list of first-party trackers
## What's a first-party tracker?
A tracker is a script put on many websites to gather information about the visitor.
They can be used for multiple reasons: statistics, risk management, marketing, ads serving…
In any case, they are a threat to Internet users' privacy and many may want to block them.
Traditionally, trackers are served from a third party.
For example, `website1.com` and `website2.com` both load their tracking script from `https://trackercompany.com/trackerscript.js`.
In order to block those, one can simply block the hostname `trackercompany.com`, which is what most ad blockers do.
However, to circumvent this block, tracker companies made the websites using them load trackers from `somestring.website1.com`.
The latter is a DNS redirection to `website1.trackercompany.com`, directly to an IP address belonging to the tracking company.
Those are called first-party trackers.
On top of the aforementioned privacy issues, they also cause security issues, as websites usually place more trust in those scripts.
For more information, learn about [Content Security Policy](https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP), [same-origin policy](https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy) and [Cross-Origin Resource Sharing](https://enable-cors.org/).
In order to block those trackers, ad blockers would need to block every subdomain pointing to anything under `trackercompany.com` or to their network.
Unfortunately, most don't support those blocking methods as they are not DNS-aware, e.g. they only see `somestring.website1.com`.
This list is an inventory of every `somestring.website1.com` found, to allow non-DNS-aware ad blockers to still block first-party trackers.
### Learn more
- [CNAME Cloaking, the dangerous disguise of third-party trackers](https://medium.com/nextdns/cname-cloaking-the-dangerous-disguise-of-third-party-trackers-195205dc522a) from NextDNS
- [Trackers first-party](https://blog.imirhil.fr/2019/11/13/first-party-tracker.html) from Aeris, in French
- [uBlock Origin issue](https://github.com/uBlockOrigin/uBlock-issues/issues/780)
- [CNAME Cloaking and Bounce Tracking Defense](https://webkit.org/blog/11338/cname-cloaking-and-bounce-tracking-defense/) on WebKit's blog
- [Characterizing CNAME cloaking-based tracking](https://blog.apnic.net/2020/08/04/characterizing-cname-cloaking-based-tracking/) on APNIC's website
- [Characterizing CNAME Cloaking-Based Tracking on the Web](https://tma.ifip.org/2020/wp-content/uploads/sites/9/2020/06/tma2020-camera-paper66.pdf) is a research paper from Sokendai and ANSSI
## List variants
### First-party trackers
**Recommended for hostfiles-based ad blockers, such as [Pi-hole](https://pi-hole.net/) (&lt;v5.0, as it introduced CNAME blocking).**
**Recommended for Android ad blockers as applications, such as [Blokada](https://blokada.org/).**
- Hosts file: <https://hostfiles.frogeye.fr/firstparty-trackers-hosts.txt>
- Raw list: <https://hostfiles.frogeye.fr/firstparty-trackers.txt>
This list contains every hostname redirecting to [a hand-picked list of first-party trackers](https://git.frogeye.fr/geoffrey/eulaurarien/src/branch/master/rules/first-party.list).
It should be safe from false positives.
It also contains all tracking hostnames under company domains (e.g. `website1.trackercompany.com`),
useful for ad blockers that don't support mass regex blocking,
while still preventing fallback to third-party trackers.
Don't be afraid of the size of the list, as this is due to the nature of first-party trackers: a single tracker generates at least one hostname per client (typically two).
### First-party only trackers
**Recommended for ad blockers as web browser extensions, such as [uBlock Origin](https://ublockorigin.com/) (&lt;v1.25.0 or for Chromium-based browsers, as it introduced CNAME uncloaking for Firefox).**
- Hosts file: <https://hostfiles.frogeye.fr/firstparty-only-trackers-hosts.txt>
- Raw list: <https://hostfiles.frogeye.fr/firstparty-only-trackers.txt>
This is the same list as above, albeit not containing the hostnames under the tracking company domains (e.g. `website1.trackercompany.com`).
This allows for reducing the size of the list for ad-blockers that already block those third-party trackers with their support of regex blocking.
Use in conjunction with other block lists used in regex mode, such as [Peter Lowe's](https://pgl.yoyo.org/adservers/).
### Multi-party trackers
- Hosts file: <https://hostfiles.frogeye.fr/multiparty-trackers-hosts.txt>
- Raw list: <https://hostfiles.frogeye.fr/multiparty-trackers.txt>
As first-party trackers usually evolve from third-party trackers, this list contains every hostname redirecting to trackers found in existing lists of third-party trackers (see next section).
Since the latter were not designed with first-party trackers in mind, they are likely to contain false positives.
On the other hand, they might protect against first-party trackers that we are not aware of or have not yet confirmed.
#### Source of third-party trackers
- [EasyPrivacy](https://easylist.to/easylist/easyprivacy.txt)
- [AdGuard](https://github.com/AdguardTeam/AdguardFilters)
(Yes, there are only two for now; many of the existing lists cause a lot of false positives.)
### Multi-party only trackers
- Hosts file: <https://hostfiles.frogeye.fr/multiparty-only-trackers-hosts.txt>
- Raw list: <https://hostfiles.frogeye.fr/multiparty-only-trackers.txt>
This is the same list as above, albeit not containing the hostnames under the tracking company domains (e.g. `website1.trackercompany.com`).
This allows for reducing the size of the list for ad-blockers that already block those third-party trackers with their support of regex blocking.
Use in conjunction with other block lists used in regex-mode, such as the ones in the previous section.
## Meta
In case of false positives/negatives, or for any other question, contact me however you like: <https://geoffrey.frogeye.fr>
The software used to generate this list is available here: <https://git.frogeye.fr/geoffrey/eulaurarien>
## Acknowledgements
Some of the first-party trackers included in this list have been found by:
- [Aeris](https://imirhil.fr/)
- NextDNS and [their blocklist](https://github.com/nextdns/cname-cloaking-blocklist)'s contributors
- Yuki2718 from [Wilders Security Forums](https://www.wilderssecurity.com/threads/ublock-a-lean-and-fast-blocker.365273/page-168#post-2880361)
- Ha Dao, Johan Mazel, and Kensuke Fukuda, ["Characterizing CNAME Cloaking-Based Tracking on the Web", Proceedings of IFIP/IEEE Traffic Measurement Analysis Conference (TMA), 9 pages, 2020.](https://tma.ifip.org/2020/wp-content/uploads/sites/9/2020/06/tma2020-camera-paper66.pdf)
- AdGuard and [their blocklist](https://github.com/AdguardTeam/cname-trackers)'s contributors
The list was generated using data from:
- [Cisco Umbrella Popularity List](http://s3-us-west-1.amazonaws.com/umbrella-static/index.html)
- [Public DNS Server List](https://public-dns.info/)
Similar projects:
- [NextDNS blocklist](https://github.com/nextdns/cname-cloaking-blocklist): for DNS-aware ad blockers
- [Stefan Froberg's lists](https://www.orwell1984.today/cname/): subset of those lists grouped by tracker
- [AdGuard blocklist](https://github.com/AdguardTeam/cname-trackers): same thing with a bigger scope, maintained by a bigger team

dist/markdown7.min.css vendored Normal file
@@ -0,0 +1,2 @@
/* Source: https://github.com/jasonm23/markdown-css-themes */
body{font-family:Helvetica,arial,sans-serif;font-size:14px;line-height:1.6;padding-top:10px;padding-bottom:10px;background-color:#fff;padding:30px}body>:first-child{margin-top:0!important}body>:last-child{margin-bottom:0!important}a{color:#4183c4}a.absent{color:#c00}a.anchor{display:block;padding-left:30px;margin-left:-30px;cursor:pointer;position:absolute;top:0;left:0;bottom:0}h1,h2,h3,h4,h5,h6{margin:20px 0 10px;padding:0;font-weight:700;-webkit-font-smoothing:antialiased;cursor:text;position:relative}h1:hover a.anchor,h2:hover a.anchor,h3:hover a.anchor,h4:hover a.anchor,h5:hover a.anchor,h6:hover a.anchor{text-decoration:none}h1 code,h1 tt{font-size:inherit}h2 code,h2 tt{font-size:inherit}h3 code,h3 tt{font-size:inherit}h4 code,h4 tt{font-size:inherit}h5 code,h5 tt{font-size:inherit}h6 code,h6 tt{font-size:inherit}h1{font-size:28px;color:#000}h2{font-size:24px;border-bottom:1px solid #ccc;color:#000}h3{font-size:18px}h4{font-size:16px}h5{font-size:14px}h6{color:#777;font-size:14px}blockquote,dl,li,ol,p,pre,table,ul{margin:15px 0}hr{border:0 none;color:#ccc;height:4px;padding:0}body>h2:first-child{margin-top:0;padding-top:0}body>h1:first-child{margin-top:0;padding-top:0}body>h1:first-child+h2{margin-top:0;padding-top:0}body>h3:first-child,body>h4:first-child,body>h5:first-child,body>h6:first-child{margin-top:0;padding-top:0}a:first-child h1,a:first-child h2,a:first-child h3,a:first-child h4,a:first-child h5,a:first-child h6{margin-top:0;padding-top:0}h1 p,h2 p,h3 p,h4 p,h5 p,h6 p{margin-top:0}li p.first{display:inline-block}li{margin:0}ol,ul{padding-left:30px}ol :first-child,ul :first-child{margin-top:0}dl{padding:0}dl dt{font-size:14px;font-weight:700;font-style:italic;padding:0;margin:15px 0 5px}dl dt:first-child{padding:0}dl dt>:first-child{margin-top:0}dl dt>:last-child{margin-bottom:0}dl dd{margin:0 0 15px;padding:0 15px}dl dd>:first-child{margin-top:0}dl dd>:last-child{margin-bottom:0}blockquote{border-left:4px solid #ddd;padding:0 
15px;color:#777}blockquote>:first-child{margin-top:0}blockquote>:last-child{margin-bottom:0}table{padding:0;border-collapse:collapse}table tr{border-top:1px solid #ccc;background-color:#fff;margin:0;padding:0}table tr:nth-child(2n){background-color:#f8f8f8}table tr th{font-weight:700;border:1px solid #ccc;margin:0;padding:6px 13px}table tr td{border:1px solid #ccc;margin:0;padding:6px 13px}table tr td :first-child,table tr th :first-child{margin-top:0}table tr td :last-child,table tr th :last-child{margin-bottom:0}img{max-width:100%}span.frame{display:block;overflow:hidden}span.frame>span{border:1px solid #ddd;display:block;float:left;overflow:hidden;margin:13px 0 0;padding:7px;width:auto}span.frame span img{display:block;float:left}span.frame span span{clear:both;color:#333;display:block;padding:5px 0 0}span.align-center{display:block;overflow:hidden;clear:both}span.align-center>span{display:block;overflow:hidden;margin:13px auto 0;text-align:center}span.align-center span img{margin:0 auto;text-align:center}span.align-right{display:block;overflow:hidden;clear:both}span.align-right>span{display:block;overflow:hidden;margin:13px 0 0;text-align:right}span.align-right span img{margin:0;text-align:right}span.float-left{display:block;margin-right:13px;overflow:hidden;float:left}span.float-left span{margin:13px 0 0}span.float-right{display:block;margin-left:13px;overflow:hidden;float:right}span.float-right>span{display:block;overflow:hidden;margin:13px auto 0;text-align:right}code,tt{margin:0 2px;padding:0 5px;white-space:nowrap;border:1px solid #eaeaea;background-color:#f8f8f8;border-radius:3px}pre code{margin:0;padding:0;white-space:pre;border:none;background:0 0}.highlight pre{background-color:#f8f8f8;border:1px solid #ccc;font-size:13px;line-height:19px;overflow:auto;padding:6px 10px;border-radius:3px}pre{background-color:#f8f8f8;border:1px solid #ccc;font-size:13px;line-height:19px;overflow:auto;padding:6px 10px;border-radius:3px}pre code,pre 
tt{background-color:transparent;border:none}sup{font-size:.83em;vertical-align:super;line-height:0}*{-webkit-print-color-adjust:exact}@media screen and (min-width:914px){body{width:854px;margin:0 auto}}@media print{pre,table{page-break-inside:avoid}pre{word-wrap:break-word}}


@ -2,21 +2,13 @@
# Main script for eulaurarien
# Get all subdomains accessed by each website in the website list
cat websites.list | ./collect_subdomains.py > subdomains.list
sort -u subdomains.list > subdomains.sorted.list
[ ! -f .env ] && touch .env
# Filter out the subdomains not pointing to a first-party tracker
cat subdomains.sorted.list | ./filter_subdomains.py > toblock.list
sort -u toblock.list > toblock.sorted.list
./fetch_resources.sh
./collect_subdomains.sh
./import_rules.sh
./resolve_subdomains.sh
./prune.sh
./export_lists.sh
./generate_index.py
# Format the blocklist so it can be used as a hostlist
(
echo "# First party trackers"
echo "# List generated on $(date -Isec) by eulaurarian $(git describe --tags --dirty)"
cat toblock.sorted.list | while read host;
do
echo "0.0.0.0 $host"
done
) > toblock.hosts.list

export.py Executable file

@ -0,0 +1,91 @@
#!/usr/bin/env python3
import database
import argparse
import sys
if __name__ == "__main__":
# Parsing arguments
parser = argparse.ArgumentParser(
description="Export the hostname rules stored in the database as plain text"
)
parser.add_argument(
"-o",
"--output",
type=argparse.FileType("w"),
default=sys.stdout,
help="Output file, one rule per line",
)
parser.add_argument(
"-f",
"--first-party",
action="store_true",
help="Only output rules issued from first-party sources",
)
parser.add_argument(
"-e",
"--end-chain",
action="store_true",
help="Only output rules that are not referenced by any other",
)
parser.add_argument(
"-r",
"--rules",
action="store_true",
help="Output all kinds of rules, not just hostnames",
)
parser.add_argument(
"-b",
"--base-rules",
action="store_true",
help="Output base rules "
"(the ones added by ./feed_rules.py) "
"(implies --rules)",
)
parser.add_argument(
"-d",
"--no-dupplicates",
action="store_true",
help="Do not output rules that already match a zone/network rule "
"(e.g. dummy.example.com when there's a zone example.com rule)",
)
parser.add_argument(
"-x",
"--explain",
action="store_true",
help="Show the chain of rules leading to one "
"(and the number of references they have)",
)
parser.add_argument(
"-c",
"--count",
action="store_true",
help="Show the number of rules per type instead of listing them",
)
args = parser.parse_args()
DB = database.Database()
if args.count:
assert not args.explain
print(
DB.count_records(
first_party_only=args.first_party,
end_chain_only=args.end_chain,
no_dupplicates=args.no_dupplicates,
rules_only=args.base_rules,
hostnames_only=not (args.rules or args.base_rules),
)
)
else:
for domain in DB.list_records(
first_party_only=args.first_party,
end_chain_only=args.end_chain,
no_dupplicates=args.no_dupplicates,
rules_only=args.base_rules,
hostnames_only=not (args.rules or args.base_rules),
explain=args.explain,
):
print(domain, file=args.output)

export_lists.sh Executable file

@ -0,0 +1,98 @@
#!/usr/bin/env bash
function log() {
echo -e "\033[33m$@\033[0m"
}
log "Calculating statistics…"
oldest="$(cat last_updates/*.txt | sort -n | head -1)"
oldest_date=$(date -Isec -d @$oldest)
gen_date=$(date -Isec)
gen_software=$(git describe --tags)
number_websites=$(wc -l < temp/all_websites.list)
number_subdomains=$(wc -l < temp/all_subdomains.list)
number_dns=$(grep -c 'NOERROR' temp/all_resolved.txt)
for partyness in {first,multi}
do
if [ $partyness = "first" ]
then
partyness_flags="--first-party"
else
partyness_flags=""
fi
rules_input=$(./export.py --count --base-rules $partyness_flags)
rules_found=$(./export.py --count --rules $partyness_flags)
rules_found_nd=$(./export.py --count --rules --no-dupplicates $partyness_flags)
echo
echo "Statistics for ${partyness}-party trackers"
echo "Input rules: $rules_input"
echo "Subsequent rules: $rules_found"
echo "Subsequent rules (no duplicates): $rules_found_nd"
echo "Output hostnames: $(./export.py --count $partyness_flags)"
echo "Output hostnames (no duplicates): $(./export.py --count --no-dupplicates $partyness_flags)"
echo "Output hostnames (end-chain only): $(./export.py --count --end-chain $partyness_flags)"
echo "Output hostnames (no duplicates, end-chain only): $(./export.py --count --no-dupplicates --end-chain $partyness_flags)"
for trackerness in {trackers,only-trackers}
do
if [ $trackerness = "trackers" ]
then
trackerness_flags=""
else
trackerness_flags="--no-dupplicates"
fi
file_list="dist/${partyness}party-${trackerness}.txt"
file_host="dist/${partyness}party-${trackerness}-hosts.txt"
log "Generating lists for variant ${partyness}-party ${trackerness}"
# The actual export happens here
./export.py $partyness_flags $trackerness_flags > $file_list
# Keeping the database open while sorting the output can be heavy,
# so this is done in two steps
sort -u $file_list -o $file_list
rules_output=$(./export.py --count $partyness_flags $trackerness_flags)
(
echo "# First-party trackers host list"
echo "# Variant: ${partyness}-party ${trackerness}"
echo "#"
echo "# About first-party trackers: https://hostfiles.frogeye.fr/#whats-a-first-party-tracker"
echo "#"
echo "# In case of false positives/negatives, or any other question,"
echo "# contact me the way you like: https://geoffrey.frogeye.fr"
echo "#"
echo "# Latest versions and variants: https://hostfiles.frogeye.fr/#list-variants"
echo "# Source code: https://git.frogeye.fr/geoffrey/eulaurarien"
echo "# License: https://git.frogeye.fr/geoffrey/eulaurarien/src/branch/master/LICENSE"
echo "# Acknowledgements: https://hostfiles.frogeye.fr/#acknowledgements"
echo "#"
echo "# Generation software: eulaurarien $gen_software"
echo "# List generation date: $gen_date"
echo "# Oldest record: $oldest_date"
echo "# Number of source websites: $number_websites"
echo "# Number of source subdomains: $number_subdomains"
echo "# Number of source DNS records: $number_dns"
echo "#"
echo "# Input rules: $rules_input"
echo "# Subsequent rules: $rules_found"
echo "# … no duplicates: $rules_found_nd"
echo "# Output rules: $rules_output"
echo "#"
echo
sed 's|^|0.0.0.0 |' "$file_list"
) > "$file_host"
done
done
if [ -d explanations ]
then
filename="$(date -Isec).txt"
./export.py --explain > "explanations/$filename"
ln --force --symbolic "$filename" "explanations/latest.txt"
fi

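The host-list variant above is just the plain list with each line prefixed by `0.0.0.0` (the `sed 's|^|0.0.0.0 |'` step) under a commented header. A minimal Python sketch of that transformation, with the header reduced to a single line (the real header carries the full statistics):

```python
import typing


def to_hosts(hostnames: typing.List[str],
             comment: str = "# First-party trackers host list") -> str:
    """Turn a plain hostname list into hosts-file format, like the sed step."""
    lines = [comment, ""]
    # Prefix every hostname with the unroutable 0.0.0.0 address
    lines += [f"0.0.0.0 {h}" for h in hostnames]
    return "\n".join(lines)


# Invented hostnames, for illustration only
print(to_hosts(["tracker.example.com", "ads.example.net"]))
```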
feed_asn.py Executable file

@ -0,0 +1,68 @@
#!/usr/bin/env python3
import database
import argparse
import requests
import typing
import ipaddress
import logging
import time
IPNetwork = typing.Union[ipaddress.IPv4Network, ipaddress.IPv6Network]
def get_ranges(asn: str) -> typing.Iterable[str]:
req = requests.get(
"https://stat.ripe.net/data/as-routing-consistency/data.json",
params={"resource": asn},
)
data = req.json()
for pref in data["data"]["prefixes"]:
yield pref["prefix"]
def get_name(asn: str) -> str:
req = requests.get(
"https://stat.ripe.net/data/as-overview/data.json", params={"resource": asn}
)
data = req.json()
return data["data"]["holder"]
if __name__ == "__main__":
log = logging.getLogger("feed_asn")
# Parsing arguments
parser = argparse.ArgumentParser(
description="Add the IP ranges associated with the ASes in the database"
)
args = parser.parse_args()
DB = database.Database()
def add_ranges(
path: database.Path,
match: database.Match,
) -> None:
assert isinstance(path, database.AsnPath)
assert isinstance(match, database.AsnNode)
asn_str = database.Database.unpack_asn(path)
DB.enter_step("asn_get_name")
name = get_name(asn_str)
match.name = name
DB.enter_step("asn_get_ranges")
for prefix in get_ranges(asn_str):
parsed_prefix: IPNetwork = ipaddress.ip_network(prefix)
if parsed_prefix.version == 4:
DB.set_ip4network(prefix, source=path, updated=int(time.time()))
log.info("Added %s from %s (%s)", prefix, path, name)
elif parsed_prefix.version == 6:
log.warning("Unimplemented prefix version: %s", prefix)
else:
log.error("Unknown prefix version: %s", prefix)
for _ in DB.exec_each_asn(add_ranges):
pass
DB.save()

feed_dns.py Executable file

@ -0,0 +1,251 @@
#!/usr/bin/env python3
import argparse
import database
import logging
import sys
import typing
import multiprocessing
import time
Record = typing.Tuple[typing.Callable, typing.Callable, int, str, str]
# select, write
FUNCTION_MAP: typing.Any = {
"a": (
database.Database.get_ip4,
database.Database.set_hostname,
),
"cname": (
database.Database.get_domain,
database.Database.set_hostname,
),
"ptr": (
database.Database.get_domain,
database.Database.set_ip4address,
),
}
class Writer(multiprocessing.Process):
def __init__(
self,
recs_queue: typing.Optional[multiprocessing.Queue] = None,
autosave_interval: int = 0,
ip4_cache: int = 0,
):
if recs_queue: # MP
super(Writer, self).__init__()
self.recs_queue = recs_queue
self.log = logging.getLogger("wr")
self.autosave_interval = autosave_interval
self.ip4_cache = ip4_cache
if not recs_queue: # No MP
self.open_db()
def open_db(self) -> None:
self.db = database.Database()
self.db.log = logging.getLogger("wr")
self.db.fill_ip4cache(max_size=self.ip4_cache)
def exec_record(self, record: Record) -> None:
self.db.enter_step("exec_record")
select, write, updated, name, value = record
try:
for source in select(self.db, value):
write(self.db, name, updated, source=source)
except (ValueError, IndexError):
# ValueError: non-number in IP
# IndexError: IP too big
self.log.exception("Cannot execute: %s", record)
def end(self) -> None:
self.db.enter_step("end")
self.db.save()
def run(self) -> None:
self.open_db()
if self.autosave_interval > 0:
next_save = time.time() + self.autosave_interval
else:
next_save = 0
self.db.enter_step("block_wait")
block: typing.List[Record]
for block in iter(self.recs_queue.get, None):
assert block
record: Record
for record in block:
self.exec_record(record)
if next_save > 0 and time.time() > next_save:
self.log.info("Saving database...")
self.db.save()
self.log.info("Done!")
next_save = time.time() + self.autosave_interval
self.db.enter_step("block_wait")
self.end()
class Parser:
def __init__(
self,
buf: typing.Any,
recs_queue: typing.Optional[multiprocessing.Queue] = None,
block_size: int = 0,
writer: typing.Optional[Writer] = None,
):
assert bool(writer) ^ bool(block_size and recs_queue)
self.buf = buf
self.log = logging.getLogger("pr")
self.recs_queue = recs_queue
if writer: # No MP
self.prof: database.Profiler = writer.db
self.register = writer.exec_record
else: # MP
self.block: typing.List[Record] = list()
self.block_size = block_size
self.prof = database.Profiler()
self.prof.log = logging.getLogger("pr")
self.register = self.add_to_queue
def add_to_queue(self, record: Record) -> None:
self.prof.enter_step("register")
self.block.append(record)
if len(self.block) >= self.block_size:
self.prof.enter_step("put_block")
assert self.recs_queue
self.recs_queue.put(self.block)
self.block = list()
def run(self) -> None:
self.consume()
if self.recs_queue:
self.recs_queue.put(self.block)
self.prof.profile()
def consume(self) -> None:
raise NotImplementedError
class MassDnsParser(Parser):
# massdns --output Snrql
# --retry REFUSED,SERVFAIL --resolvers nameservers-ipv4
TYPES = {
"A": (FUNCTION_MAP["a"][0], FUNCTION_MAP["a"][1], -1, None),
# 'AAAA': (FUNCTION_MAP['aaaa'][0], FUNCTION_MAP['aaaa'][1], -1, None),
"CNAME": (FUNCTION_MAP["cname"][0], FUNCTION_MAP["cname"][1], -1, -1),
}
def consume(self) -> None:
self.prof.enter_step("parse_massdns")
timestamp = 0
header = True
for line in self.buf:
line = line[:-1]
if not line:
header = True
continue
split = line.split(" ")
try:
if header:
timestamp = int(split[1])
header = False
else:
select, write, name_offset, value_offset = MassDnsParser.TYPES[
split[1]
]
record = (
select,
write,
timestamp,
split[0][:name_offset].lower(),
split[2][:value_offset].lower(),
)
self.register(record)
self.prof.enter_step("parse_massdns")
except KeyError:
continue
PARSERS = {
"massdns": MassDnsParser,
}
if __name__ == "__main__":
# Parsing arguments
log = logging.getLogger("feed_dns")
args_parser = argparse.ArgumentParser(
description="Read DNS records and import "
"tracking-relevant data into the database"
)
args_parser.add_argument("parser", choices=PARSERS.keys(), help="Input format")
args_parser.add_argument(
"-i",
"--input",
type=argparse.FileType("r"),
default=sys.stdin,
help="Input file",
)
args_parser.add_argument(
"-b", "--block-size", type=int, default=1024, help="Records per queue block (performance tuning)"
)
args_parser.add_argument(
"-q", "--queue-size", type=int, default=128, help="Maximum blocks in the parser-to-writer queue (performance tuning)"
)
args_parser.add_argument(
"-a",
"--autosave-interval",
type=int,
default=900,
help="Interval in seconds at which the database will save. 0 to disable.",
)
args_parser.add_argument(
"-s",
"--single-process",
action="store_true",
help="Only use one process. Might be useful for single-core computers.",
)
args_parser.add_argument(
"-4",
"--ip4-cache",
type=int,
default=0,
help="RAM cache for faster IPv4 lookup. "
"Maximum useful value: 512 MiB (536870912). "
"Warning: Depending on the rules, this might already "
"be a memory-heavy process, even without the cache.",
)
args = args_parser.parse_args()
parser_cls = PARSERS[args.parser]
if args.single_process:
writer = Writer(
autosave_interval=args.autosave_interval, ip4_cache=args.ip4_cache
)
parser = parser_cls(args.input, writer=writer)
parser.run()
writer.end()
else:
recs_queue: multiprocessing.Queue = multiprocessing.Queue(
maxsize=args.queue_size
)
writer = Writer(
recs_queue,
autosave_interval=args.autosave_interval,
ip4_cache=args.ip4_cache,
)
writer.start()
parser = parser_cls(
args.input, recs_queue=recs_queue, block_size=args.block_size
)
parser.run()
recs_queue.put(None)
writer.join()

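`MassDnsParser` above relies on a simple framing: after a blank line comes a header whose second field is the epoch timestamp, and each following answer line is `name type value`, with the slice offsets in `TYPES` trimming trailing dots. A standalone sketch of that field logic (the sample lines are invented, not actual massdns output):

```python
import typing

# Slice ends for (name, value), mirroring the TYPES table above:
# names always drop the trailing dot, CNAME values do too.
TYPES = {"A": (-1, None), "CNAME": (-1, -1)}


def parse_block(lines: typing.List[str]) -> typing.List[tuple]:
    records = []
    timestamp = 0
    header = True
    for line in lines:
        if not line:
            # A blank line means the next non-blank line is a header
            header = True
            continue
        split = line.split(" ")
        if header:
            timestamp = int(split[1])
            header = False
        elif split[1] in TYPES:
            name_off, value_off = TYPES[split[1]]
            records.append((split[1],
                            split[0][:name_off].lower(),
                            split[2][:value_off].lower(),
                            timestamp))
    return records


# Hypothetical sample block (header line, then two answers)
sample = [
    ";; 1600000000",
    "tracked.example.com. CNAME x.eulerian.net.",
    "x.eulerian.net. A 203.0.113.7",
]
print(parse_block(sample))
```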
feed_rules.py Executable file

@ -0,0 +1,61 @@
#!/usr/bin/env python3
import database
import argparse
import sys
import time
import typing
FUNCTION_MAP = {
"zone": database.Database.set_zone,
"hostname": database.Database.set_hostname,
"asn": database.Database.set_asn,
"ip4network": database.Database.set_ip4network,
"ip4address": database.Database.set_ip4address,
}
if __name__ == "__main__":
# Parsing arguments
parser = argparse.ArgumentParser(description="Import base rules to the database")
parser.add_argument(
"type", choices=FUNCTION_MAP.keys(), help="Type of rule to input"
)
parser.add_argument(
"-i",
"--input",
type=argparse.FileType("r"),
default=sys.stdin,
help="File with one rule per line",
)
parser.add_argument(
"-f",
"--first-party",
action="store_true",
help="The input only comes from verified first-party sources",
)
args = parser.parse_args()
DB = database.Database()
fun = FUNCTION_MAP[args.type]
source: database.RulePath
if args.first_party:
source = database.RuleFirstPath()
else:
source = database.RuleMultiPath()
for rule in args.input:
rule = rule.strip()
try:
fun(
DB,
rule,
source=source,
updated=int(time.time()),
)
except ValueError:
DB.log.error(f"Could not add rule: {rule}")
DB.save()

fetch_resources.sh Executable file

@ -0,0 +1,45 @@
#!/usr/bin/env bash
function log() {
echo -e "\033[33m$@\033[0m"
}
function dl() {
echo "Downloading $1 to $2"
curl --silent "$1" > "$2"
if [ $? -ne 0 ]
then
echo "Failed!"
fi
}
log "Retrieving tests…"
rm -f tests/*.cache.csv
dl https://raw.githubusercontent.com/fukuda-lab/cname_cloaking/master/Subdomain_CNAME-cloaking-based-tracking.csv temp/fukuda.csv
(echo "url,allow,deny,comment"; tail -n +2 temp/fukuda.csv | awk -F, '{ print "https://" $2 "/,," $3 "," $5 }') > tests/fukuda.cache.csv
log "Retrieving rules…"
rm -f rules*/*.cache.*
dl https://easylist.to/easylist/easyprivacy.txt rules_adblock/easyprivacy.cache.txt
dl https://filters.adtidy.org/extension/chromium/filters/3.txt rules_adblock/adguard.cache.txt
log "Retrieving TLD list…"
dl http://data.iana.org/TLD/tlds-alpha-by-domain.txt temp/all_tld.temp.list
grep -v '^#' temp/all_tld.temp.list | awk '{print tolower($0)}' > temp/all_tld.list
log "Retrieving nameservers…"
dl https://public-dns.info/nameservers.txt nameservers/public-dns.cache.list
log "Retrieving top subdomains…"
dl http://s3-us-west-1.amazonaws.com/umbrella-static/top-1m.csv.zip top-1m.csv.zip
unzip top-1m.csv.zip
sed 's|^[0-9]\+,||' top-1m.csv > temp/cisco-umbrella_popularity.fresh.list
rm top-1m.csv top-1m.csv.zip
if [ -f subdomains/cisco-umbrella_popularity.cache.list ]
then
cp subdomains/cisco-umbrella_popularity.cache.list temp/cisco-umbrella_popularity.old.list
pv -f temp/cisco-umbrella_popularity.old.list temp/cisco-umbrella_popularity.fresh.list | sort -u > subdomains/cisco-umbrella_popularity.cache.list
rm temp/cisco-umbrella_popularity.old.list temp/cisco-umbrella_popularity.fresh.list
else
mv temp/cisco-umbrella_popularity.fresh.list subdomains/cisco-umbrella_popularity.cache.list
fi

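The Umbrella branch above never discards previously seen subdomains: the fresh download is merged with the old cache via `sort -u` over both files. The same union, sketched in Python with invented entries:

```python
import typing


def merge_lists(old: typing.List[str], fresh: typing.List[str]) -> typing.List[str]:
    """Union of previous cache and fresh download, deduplicated and sorted,
    like `sort -u` over both files."""
    return sorted(set(old) | set(fresh))


# Invented entries: one dropped from the fresh snapshot, one new
old = ["a.example.com", "gone.example.com"]
fresh = ["a.example.com", "new.example.com"]
print(merge_lists(old, fresh))
```

Because it is a union, a subdomain that falls out of the daily top-1m snapshot stays in the input set for later resolution runs.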

@ -1,35 +0,0 @@
#!/usr/bin/env python3
"""
From a list of subdomains, output only
the ones resolving to a first-party tracker.
"""
import re
import sys
import dns.resolver
import regexes
def is_subdomain_matching(subdomain: str) -> bool:
"""
Indicates if the subdomain redirects to a first-party tracker.
"""
# TODO Look at the whole chain rather than the last one
query = dns.resolver.query(subdomain, 'A')
canonical = query.canonical_name.to_text()
for regex in regexes.REGEXES:
if re.match(regex, canonical):
return True
return False
if __name__ == '__main__':
for line in sys.stdin:
line = line.strip()
if not line:
continue
if is_subdomain_matching(line):
print(line)

generate_index.py Executable file

@ -0,0 +1,25 @@
#!/usr/bin/env python3
import markdown2
extras = ["header-ids"]
with open("dist/README.md", "r") as fdesc:
body = markdown2.markdown(fdesc.read(), extras=extras)
output = f"""<!DOCTYPE html>
<html lang="en">
<head>
<title>Geoffrey Frogeye's block list of first-party trackers</title>
<meta charset="utf-8">
<meta name="author" content="Geoffrey 'Frogeye' Preud'homme" />
<link rel="stylesheet" type="text/css" href="markdown7.min.css">
</head>
<body>
{body}
</body>
</html>
"""
with open("dist/index.html", "w") as fdesc:
fdesc.write(output)

import_rules.sh Executable file

@ -0,0 +1,20 @@
#!/usr/bin/env bash
function log() {
echo -e "\033[33m$@\033[0m"
}
log "Importing rules…"
date +%s > "last_updates/rules.txt"
cat rules_adblock/*.txt | grep -v '^!' | grep -v '^\[Adblock' | ./adblock_to_domain_list.py | ./feed_rules.py zone
cat rules_hosts/*.txt | grep -v '^#' | grep -v '^$' | cut -d ' ' -f2 | ./feed_rules.py zone
cat rules/*.list | grep -v '^#' | grep -v '^$' | ./feed_rules.py zone
cat rules_ip/*.txt | grep -v '^#' | grep -v '^$' | ./feed_rules.py ip4network
cat rules_asn/*.txt | grep -v '^#' | grep -v '^$' | ./feed_rules.py asn
cat rules/first-party.list | grep -v '^#' | grep -v '^$' | ./feed_rules.py zone --first-party
cat rules_ip/first-party.txt | grep -v '^#' | grep -v '^$' | ./feed_rules.py ip4network --first-party
cat rules_asn/first-party.txt | grep -v '^#' | grep -v '^$' | ./feed_rules.py asn --first-party
./feed_asn.py

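Each pipeline above normalizes one source format before feeding `./feed_rules.py`; for hosts-format files that means dropping comments and blank lines and keeping the second column. A rough Python equivalent of the `grep -v '^#' | grep -v '^$' | cut -d ' ' -f2` stage (sample lines invented):

```python
import typing


def hosts_to_zones(lines: typing.List[str]) -> typing.List[str]:
    """Extract the hostname column from hosts-format rule lines,
    skipping comments and blank lines."""
    zones = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(" ")
        if len(fields) >= 2:
            # Field 0 is the IP (e.g. 0.0.0.0), field 1 the hostname
            zones.append(fields[1])
    return zones


sample = [
    "# comment",
    "",
    "0.0.0.0 tracker.example.com",
    "0.0.0.0 ads.example.net",
]
print(hosts_to_zones(sample))
```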
last_updates/.gitignore vendored Normal file

@ -0,0 +1 @@
*.txt

nameservers/.gitignore vendored Normal file

@ -0,0 +1,2 @@
*.custom.list
*.cache.list

nameservers/popular.list Normal file

@ -0,0 +1,24 @@
8.8.8.8
8.8.4.4
2001:4860:4860:0:0:0:0:8888
2001:4860:4860:0:0:0:0:8844
208.67.222.222
208.67.220.220
2620:119:35::35
2620:119:53::53
4.2.2.1
4.2.2.2
8.26.56.26
8.20.247.20
84.200.69.80
84.200.70.40
2001:1608:10:25:0:0:1c04:b12f
2001:1608:10:25:0:0:9249:d69b
9.9.9.10
149.112.112.10
2620:fe::10
2620:fe::fe:10
1.1.1.1
1.0.0.1
2606:4700:4700::1111
2606:4700:4700::1001

prune.sh Executable file

@ -0,0 +1,9 @@
#!/usr/bin/env bash
function log() {
echo -e "\033[33m$@\033[0m"
}
oldest="$(cat last_updates/*.txt | sort -n | head -1)"
log "Pruning every record before ${oldest}"
./db.py --prune --prune-before "$oldest"


@ -1,9 +0,0 @@
#!/usr/bin/env python3
"""
List of regex matching first-party trackers.
"""
REGEXES = [
r'^.+\.eulerian\.net\.$'
]

requirements.txt Normal file

@ -0,0 +1,4 @@
coloredlogs>=10
markdown2>=2.4,<3
numpy>=1.21,<2
python-abp>=0.2,<0.3

resolve_subdomains.sh Executable file

@ -0,0 +1,24 @@
#!/usr/bin/env bash
source .env.default
source .env
function log() {
echo -e "\033[33m$@\033[0m"
}
log "Compiling nameservers…"
pv -f nameservers/*.list | ./validate_list.py --ip4 | sort -u > temp/all_nameservers_ip4.list
log "Compiling subdomains…"
# Sort by last character to utilize the DNS server caching mechanism
# (not as efficient with massdns but it's almost free so why not)
pv -f subdomains/*.list | ./validate_list.py --domain | rev | sort -u | rev > temp/all_subdomains.list
log "Resolving subdomains…"
date +%s > "last_updates/massdns.txt"
"$MASSDNS_BINARY" --output Snrql --hashmap-size "$MASSDNS_HASHMAP_SIZE" --resolvers temp/all_nameservers_ip4.list --outfile temp/all_resolved.txt temp/all_subdomains.list
log "Importing into database…"
[ $SINGLE_PROCESS -eq 1 ] && EXTRA_ARGS="--single-process"
pv -f temp/all_resolved.txt | ./feed_dns.py massdns --ip4-cache "$CACHE_SIZE" $EXTRA_ARGS

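The `rev | sort -u | rev` trick above sorts domains by their reversed spelling, so names sharing a suffix land next to each other and hit the resolvers' caches more often. A Python sketch of the same grouping (example domains invented):

```python
import typing


def suffix_sort(domains: typing.Iterable[str]) -> typing.List[str]:
    """Deduplicate and sort domains by reversed spelling,
    like `rev | sort -u | rev`."""
    return [d[::-1] for d in sorted({d[::-1] for d in domains})]


hosts = ["a.example.com", "tracker.example.net", "b.example.com",
         "a.example.com"]  # duplicate on purpose
print(suffix_sort(hosts))
```

After the sort, both `.example.com` names are adjacent, so successive queries are likely served by the same cached delegation.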
rules/.gitignore vendored Normal file

@ -0,0 +1,2 @@
*.custom.list
*.cache.list

rules/first-party.list Normal file

@ -0,0 +1,91 @@
# Eulerian
eulerian.net
# Xiti (AT Internet)
ati-host.net
at-o.net
# NP6
bp01.net
# Criteo
criteo.com
dnsdelegation.io
storetail.io
# Keyade
keyade.com
# Adobe Experience Cloud
# https://experienceleague.adobe.com/docs/analytics/implementation/vars/config-vars/trackingserversecure.html?lang=en#ssl-tracking-server-in-adobe-experience-platform-launch
omtrdc.net
2o7.net
data.adobedc.net
sc.adobedc.net
# Webtrekk
wt-eu02.net
webtrekk.net
# Otto Group
oghub.io
# Intent Media
partner.intentmedia.net
# Wizaly
wizaly.com
# Commanders Act
tagcommander.com
# Ingenious Technologies
affex.org
# TraceDock
a351fec2c318c11ea9b9b0a0ae18fb0b-1529426863.eu-central-1.elb.amazonaws.com
a5e652663674a11e997c60ac8a4ec150-1684524385.eu-central-1.elb.amazonaws.com
a88045584548111e997c60ac8a4ec150-1610510072.eu-central-1.elb.amazonaws.com
afc4d9aa2a91d11e997c60ac8a4ec150-2082092489.eu-central-1.elb.amazonaws.com
# A8
trck.a8.net
# AD EBiS
# https://prtimes.jp/main/html/rd/p/000000215.000009812.html
ebis.ne.jp
# GENIEE
genieesspv.jp
# SP-Prod
sp-prod.net
# Act-On Software
actonsoftware.com
actonservice.com
# eum-appdynamics.com
eum-appdynamics.com
# Extole
extole.io
extole.com
# Eloqua
hs.eloqua.com
# segment.com
xid.segment.com
# exponea.com
exponea.com
# adclear.net
adclear.net
# contentsfeed.com
contentsfeed.com
# postaffiliatepro.com
postaffiliatepro.com
# Sugar Market (Salesfusion)
msgapp.com
# Exactag
exactag.com
# GMO Internet Group
ad-cloud.jp
# Pardot
pardot.com
# Fathom
# https://usefathom.com/docs/settings/custom-domains
starman.fathomdns.com
# Lead Forensics
# https://www.reddit.com/r/pihole/comments/g7qv3e/leadforensics_tracking_domains_blacklist/
# No real-world data but the website doesn't hide what it does
ghochv3eng.trafficmanager.net
# Branch.io
thirdparty.bnc.lt
# Plausible.io
custom.plausible.io
# DataUnlocker
# A bit different, as it is a proxy to non-first-party tracker scripts,
# but it fits I guess.
smartproxy.dataunlocker.com
# SAS
ci360.sas.com

rules_adblock/.gitignore vendored Normal file

@ -0,0 +1,2 @@
*.custom.txt
*.cache.txt

rules_asn/.gitignore vendored Normal file

@ -0,0 +1,2 @@
*.custom.txt
*.cache.txt

rules_asn/first-party.txt Normal file

@ -0,0 +1,10 @@
# Eulerian
AS50234
# Criteo
AS44788
AS19750
AS55569
# Webtrekk
AS60164
# Act-On Software
AS393648

rules_hosts/.gitignore vendored Normal file

@ -0,0 +1,2 @@
*.custom.txt
*.cache.txt

rules_ip/.gitignore vendored Normal file

@ -0,0 +1,2 @@
*.custom.txt
*.cache.txt

run_tests.py Executable file

@ -0,0 +1,75 @@
#!/usr/bin/env python3
import database
import os
import logging
import csv
TESTS_DIR = "tests"
if __name__ == "__main__":
DB = database.Database()
log = logging.getLogger("tests")
for filename in os.listdir(TESTS_DIR):
if not filename.lower().endswith(".csv"):
continue
log.info("")
log.info("Running tests from %s", filename)
path = os.path.join(TESTS_DIR, filename)
with open(path, "rt") as fdesc:
count_ent = 0
count_all = 0
count_den = 0
pass_ent = 0
pass_all = 0
pass_den = 0
reader = csv.DictReader(fdesc)
for test in reader:
log.debug("Testing %s (%s)", test["url"], test["comment"])
count_ent += 1
passed = True
for allow in test["allow"].split(":"):
if not allow:
continue
count_all += 1
if any(DB.get_domain(allow)):
log.error("False positive: %s", allow)
passed = False
else:
pass_all += 1
for deny in test["deny"].split(":"):
if not deny:
continue
count_den += 1
if not any(DB.get_domain(deny)):
log.error("False negative: %s", deny)
passed = False
else:
pass_den += 1
if passed:
pass_ent += 1
perc_ent = (100 * pass_ent / count_ent) if count_ent else 100
perc_all = (100 * pass_all / count_all) if count_all else 100
perc_den = (100 * pass_den / count_den) if count_den else 100
log.info(
(
"%s: Entries %d/%d (%.2f%%)"
" | Allow %d/%d (%.2f%%)"
" | Deny %d/%d (%.2f%%)"
),
filename,
pass_ent,
count_ent,
perc_ent,
pass_all,
count_all,
perc_all,
pass_den,
count_den,
perc_den,
)

subdomains/.gitignore vendored Normal file

@ -0,0 +1,2 @@
*.custom.list
*.cache.list

temp/.gitignore vendored Normal file

@ -0,0 +1,3 @@
*.list
*.txt
*.csv

tests/.gitignore vendored Normal file

@ -0,0 +1 @@
*.cache.csv


@ -0,0 +1,6 @@
url,allow,deny,comment
https://support.apple.com,support.apple.com,,EdgeKey / AkamaiEdge
https://www.pinterest.fr/,i.pinimg.com,,Cedexis
https://www.tumblr.com/,66.media.tumblr.com,,ChiCDN
https://www.skype.com/fr/,www.skype.com,,TrafficManager
https://www.mitsubishicars.com/,www.mitsubishicars.com,,Tracking domain as reverse DNS

tests/first-party.csv Normal file

@ -0,0 +1,28 @@
url,allow,deny,comment
https://www.red-by-sfr.fr/,static.s-sfr.fr,nrg.red-by-sfr.fr,Eulerian
https://www.cbc.ca/,,smetrics.cbc.ca,2o7 | Omniture | Adobe Experience Cloud
https://www.mytoys.de/,,web.mytoys.de,Webtrekk
https://www.baur.de/,,tp.baur.de,Otto Group
https://www.liligo.com/,,compare.liligo.com,???
https://www.boulanger.com/,,tag.boulanger.fr,TagCommander
https://www.airfrance.fr/FR/,,tk.airfrance.fr,Wizaly
https://www.vsgamers.es/,,marketing.net.vsgamers.es,Affex
https://www.vacansoleil.fr/,,tdep.vacansoleil.fr,TraceDock
https://www.ozmall.co.jp/,,js.enhance.co.jp,GENIEE
https://www.thetimes.co.uk/,,cmp.thetimes.co.uk,SP-Prod
https://agilent.com/,,seahorseinfo.agilent.com,Act-On Software
https://halifax.co.uk/,,cem.halifax.co.uk,eum-appdynamics.com
https://www.reallygoodstuff.com/,,refer.reallygoodstuff.com,Extole
https://unity.com/,,eloqua-trackings.unity.com,Eloqua
https://www.notino.gr/,,api.campaigns.notino.com,Exponea
https://www.mytoys.de/,,0815.mytoys.de.adclear.net,adclear.net
https://www.imbc.com/,,ads.imbc.com.contentsfeed.com,contentsfeed.com
https://www.cbdbiocare.com/,,affiliate.cbdbiocare.com,postaffiliatepro.com
https://www.seatadvisor.com/,,marketing.seatadvisor.com,Sugar Market (Salesfusion)
https://www.tchibo.de/,,tagm.tchibo.de,Exactag
https://www.bouygues-immobilier.com/,,go.bouygues-immobilier.fr,Pardot
https://caddyserver.com/,,mule.caddysever.com,Fathom
Reddit.com mail notifications,,click.redditmail.com,Branch.io
https://www.phpliveregex.com/,,yolo.phpliveregex.xom,Plausible.io
https://www.earthclassmail.com/,,1avhg3kanx9.www.earthclassmail.com,DataUnlocker
https://paulfredrick.com/,,execution-ci360.paulfredrick.com,SAS

validate_list.py Executable file

@ -0,0 +1,35 @@
#!/usr/bin/env python3
# pylint: disable=C0103
"""
Filter out invalid domain names
"""
import database
import argparse
import sys
if __name__ == '__main__':
# Parsing arguments
parser = argparse.ArgumentParser(
description="Filter out invalid domain name/ip addresses from a list.")
parser.add_argument(
'-i', '--input', type=argparse.FileType('r'), default=sys.stdin,
help="Input file, one element per line")
parser.add_argument(
'-o', '--output', type=argparse.FileType('w'), default=sys.stdout,
help="Output file, one element per line")
parser.add_argument(
'-d', '--domain', action='store_true',
help="Can be domain name")
parser.add_argument(
'-4', '--ip4', action='store_true',
help="Can be IP4")
args = parser.parse_args()
for line in args.input:
line = line[:-1].lower()
if (args.domain and database.Database.validate_domain(line)) or \
(args.ip4 and database.Database.validate_ip4address(line)):
print(line, file=args.output)


@ -1,52 +0,0 @@
https://oui.sncf/
https://www.voyage-prive.com/
https://www.odalys-vacances.com/
https://www.homair.com/
https://www.melia.com/
https://www.locasun.fr/
https://www.belambra.fr/
http://www.xl.com/
https://www.bordeaux.aeroport.fr/
https://www.easyvoyage.com/
https://www.leon-de-bruxelles.fr/
https://www.sarenza.com/
https://www.laredoute.fr/
https://www.galerieslafayette.com/
https://www.celio.com/
https://vente-unique.com/
https://www.francoisesaget.com/
https://www.histoiredor.com/
https://www.brandalley.fr/
https://www.fleurancenature.fr/
https://www.chausport.com/
https://www.i-run.fr/
https://fr.smallable.com/
https://www.habitat.fr/
https://www.bhv.fr/
https://www.sfr.fr/
https://www.red-by-sfr.fr/
https://www.masmovil.es/
https://www.yoigo.com/
http://www.fnacdarty.com/
https://www.fnac.com/
https://www.darty.com/
http://www.e-leclerc.com/
https://www.monoprix.fr/
https://www.officedepot.fr/
https://www.carrefour-banque.fr/
https://www.banque-casino.fr/
https://mondial-assistance.fr/
https://allianz-voyage.fr/
https://www.bankia.com/
https://www.april-moto.com/
https://www.younited-credit.com/
https://www.fortuneo.fr/
https://www.orpi.com/
https://www.warnerbros.fr/
https://www.canalplus.com/
https://www.skiset.com/
https://www.promofarma.com/
https://www.toner.fr/
https://www.rentacar.fr/
https://vivatechnology.com/
https://www.liberation.fr/

websites/.gitignore vendored Normal file

@@ -0,0 +1 @@
*.custom.list


@@ -0,0 +1 @@
https://www.ubs.com/


@@ -0,0 +1,75 @@
http://ao.com/
https://www.asus.com/
http://www.absolut.com/
http://www.adobe.com/
http://www.afterbuzztv.com/
http://www.airbnb.com/
http://www.alliantcreditunion.org/
http://www.ankama-games.com/
http://www.attraqt.com/
http://www.audi.com/
http://www.autotrader.com/
http://www.bangkokbank.com/
http://www.banzai.it/
http://www.bestbuy.com/
http://www.bigfishgames.com/
http://www.bostonscientific.com/
http://www.radio-canada.ca/
https://www.cashflows.com/
http://www.concur.com/
http://www.chinesecio.com/
http://corporate.crownmedia.com/
https://watch.dazn.com/
http://www.disa.mil/
https://www.douglas.de/
http://www.ets.org/
http://www.easy-forex.com/
http://www.fiat.com/
http://www.fidor.com/
http://www.frankandoak.com/
http://www.fubo.tv/
https://corp.gree.net/
https://www.gymgrossisten.com/
http://www.halfpricedrapes.com/
https://www.hotstar.com/
https://www.iqiyi.com/
http://www.iracing.com/
http://www.mallgroup.com/
https://www.investisdigital.com/
https://www.linenchest.com/
https://www.luisaviaroma.com/
https://www.mcnc.org/
http://www.mauijim.com/
https://www.mediacorp.sg/
http://www.cr.mufg.jp/
http://www.nbcolympics.com/
https://www.ndtv.com/
http://www.nrcs.usda.gov/
http://www.oshean.org/
https://www.ocado.com/
http://www.ottogroup.com/
https://watch.dazn.com/
http://www.philips.com/
http://www.printplanet.de/
http://www.rabobank.com/
https://corp.roblox.com/
http://www.sinet.com.kh/
http://www.schneider.de/
https://thewest.com.au/
https://www.shopdirect.com/
http://www.siemens.com/
http://www.sky.it/
https://www.sc.com/
http://www.stylesha.re/
http://www.tv2.dk/
http://www.grammy.org/
https://www.topcon.co.jp/
http://www.usnews.com/
http://www.ubisoft.com/
http://www.unionbankph.com/
http://www.urbn.com/
http://www.waters.com/
https://www.xero.com/
https://www.esky.com/
https://www.iheartmedia.com/


@@ -0,0 +1,90 @@
https://www.rte.ie/
https://www.bbc.com/
https://www.saint-gobain.com/
https://www.sbb.ch/
http://www.rfi.fr/
https://www.france24.com/
https://www.mc-doualiya.com/
https://www.francemediasmonde.com/
https://www.kmmediagroup.co.uk/
https://www.europages.fr/
https://www.ovh.com/
http://www.sa.areva.com/
https://www.orano.group/
https://www.evaluate.com/
https://www.laposte.fr/
https://www.colissimo.fr/
https://www.nrjmobile.fr/
https://www.parisaeroport.fr/
https://www.michelin.fr/
https://www.groupeseb.com/
https://www.seb.fr/
https://www.corkinternationalairporthotel.com/
https://www.donedeal.ie/
https://rmc.bfmtv.com/
https://rmcsport.bfmtv.com/
https://www.mma.fr/
http://banquepopulaire.fr/
https://www.printempsfrance.com/
https://www.pagesjaunes.fr/
https://www.nocibe.fr/
https://e24.no/
https://www.01net.com/
https://www.europe1.fr/
https://www.meilleurtaux.com/
https://www.nexity.fr/
https://www.bestwestern.com/content/
https://www.allsuites-apparthotel.com/
https://www.apec.fr/
https://www.cadremploi.fr/
https://www.eni.com/
https://mappy.com/
https://www.arte.tv/
https://conseil-constitutionnel.fr/
https://www.lcl.fr/
https://www.axa.fr/
https://www.huffpost.com/
https://www.challenges.fr/
https://www.netto.fr/
https://www.boursorama-banque.com/
https://www.marianne.net/
https://www.mediapart.fr/
https://www.tifco.com/
https://www.thalys.com/
https://schibsted.com/
https://www.se.com/
https://www.gouvernement.fr/
https://www.afm-telethon.fr/
https://www.pneus-online.fr/
https://www.lepoint.fr/
http://www.e-leclerc.com/
https://www.logic-immo.com/
https://www.longchamp.com/
https://www.maaf.fr/
https://www.futuroscope.com/
https://www.infojobs.net/
https://www.intermarche.com/
https://www.supercasino.fr/
https://www.chronopost.fr/
https://www.cic.fr/
https://www.courrierinternational.com/
https://www.credit-agricole.fr/
https://www.telekom.com/
https://www.bfmtv.com/
https://www.caisse-epargne.fr/
https://www.calor.fr/
https://www.groupebayard.com/fr/
https://www.bayard-jeunesse.com/
https://www.radiofrance.fr/
https://www.liberation.fr/
https://www.nrj.fr/
https://www.lemonde.fr/
https://www.societegenerale.fr/
https://www.pole-emploi.fr/accueil/
https://www.tf1.fr/
https://www.leboncoin.fr/
https://groupebpce.com/
https://www.france.tv/
https://www.total.com/
http://www.lagardere.com/
https://rakuten.com/


@@ -0,0 +1,82 @@
http://www.dholic.co.jp/
https://materialesdefabrica.com/
https://www.lecreuset.com/
https://www.intersport.fr/
https://www.feiradamadrugadasp.com.br/
https://www.wetteronline.de/
https://www.wolfandbadger.com/
https://www.readers.com/
https://www.fossil.com/
https://www.gemo.fr/
https://www.burda-forward.de/
https://www.bakeca.it/
https://www.sarenza.com/
https://www.mytoys.com/
https://tour2000.co.kr
https://theluxurycloset.com/
https://www.lovebonito.com/
https://www.bever.nl/
https://www.shipt.com/
https://www.petermanningnyc.com/
https://www.fashionvalet.com/
https://remixshop.com/
https://lagirl.co.kr/
https://www.avva.com.tr/
https://www.stella.nl/
https://www.maiutazas.hu/
http://www.dynacraftwheels.com/
https://www.itaka.pl/
https://www.inveon.com.tr/
https://www.dr.com.tr/
http://www.lfmall.co.kr/
https://www.beymen.com/
https://www.reebok.com/
https://www.mlmparts.com/
https://www.flyin.com/
https://www.garantibbva.com.tr/
http://www.fiat.com.tr/
https://warburtons.co.uk/
http://www.shark.com/
https://www.latam.com/
https://agilone.com/
https://www.clarks.co.uk/
https://www.joom.com/
https://www.adjust.com/
https://www.tugo.com.vn/
https://www.tatacliq.com/
https://www.valmano.de/
https://www.ab-inbev.com/
https://www.sephora.com/
https://www.sephora.fr/
https://www.officedepot.com/
http://www.officedepot.eu/
https://www.officedepot.fr/
https://www.journey.com.tr/
https://group.jumia.com/
https://www.jumia.com.ng/
http://us.vibram.com/
http://eu.vibram.com/
https://sssports.com/
https://www.theiconic.com.au/
https://spiegel.media/
https://www.halfpricedrapes.com/
https://striderbikes.com/
https://www.promod.fr/
https://www.philips.com/
https://www.hp.com/
https://www.edmunds.com/
https://www.kkfashion.vn/
https://www.newlook.com/
https://www.fragrancenet.com/
https://www.microsoft.com/
https://xbox.com/
https://www.nykaa.com/
https://www.cheapoair.com/
https://www.diageo.com/
https://trimfit.com/
https://www.vax.co.uk/
https://www.laredoute.fr/
https://www.newlook.com/
https://www.softsurroundings.com/
https://www.ebay.fr/


@@ -0,0 +1,76 @@
https://www.liberation.fr/
https://www.brandalley.fr/
https://www.greenweez.com/
https://www.melijoe.com/eu/
http://www.laforet.com/
https://www.younited-credit.com/
https://www.mathon.fr/
https://destinia.com/
https://www.habitat.fr/
https://www.vente-unique.com/
https://www.deguisetoi.fr/
https://www.voyage-prive.it/login/index
https://www.madeindesign.com/
https://www.nrjmobile.fr/
https://en.smallable.com/
https://www.voyage-prive.es/login/index
https://www.voyage-prive.de/login/index
https://www.histoiredor.com/fr/histoire-or
https://www.maeva.com/fr-fr/
https://www.voyage-prive.co.uk/login/index
https://www.aujourdhui.com/
https://www.loisirsencheres.com/
https://www.consobaby.com/
https://www.rentacar.fr/
https://www.ugap.fr/
https://www.ponant.com/
https://www.voyage-prive.ch/login/index
https://www.auchantelecom.fr/
https://www.toner.fr/
https://fr.vente-unique.ch/
https://www.iahorro.com/
https://www.vente-unique.it/
https://www.millet.fr/
https://www.venta-unica.com/
https://www.photobox.de/
https://www.futuroscope.com/
https://warnerbros.fr/
https://destinia.ir/
https://www.vegaoo.de/
https://www.fleurancenature.fr/
https://www.palladiumhotelgroup.com/en/
https://www.dcrussia.ru/
https://www.homair.com/
https://www.moonpig.com.au/
https://www.casden.fr/
https://www.madeindesign.co.uk/
https://www.voyage-prive.be/login/index
https://www.vegaoo.es/
https://destinia.co.uk/
https://www.hofmann.pt/
https://www.roxy-russia.ru/
https://www.francoisesaget.com/fr/
https://www.skiset.com/
https://www.millet-mountain.com/
https://www.chausport.com/
https://www.unclejeans.com/
https://www.vegaooparty.com/
https://www.madeindesign.de/
https://www.vegaoo.nl/
https://www.boulangerie.org/
https://www.habitat.eu/
https://www.habitat.net/
https://www.lafrancedunordausud.fr/
https://www.lesnouvellesdelaboulangerie.fr/
https://www.natiloo.com/
https://wecanimal.pt/
https://www.habitatstore.no/no/
https://fr.vente-unique.be/
https://www.madeindesign.it/
https://piensoymascotas.com/
https://destinia.be/
https://www.skiset.co.uk/
http://www.sarenza.ch/
https://www.habitat.de/
https://www.skiset.de/
https://destinia.com.br/


@@ -0,0 +1,545 @@
https://01net.com/
https://1001neumaticos.es/
https://acadomia.fr/
https://access-moto.com/
https://achatdesign.com/
https://achatdesign.com/
https://achat-or.com/
https://achat-or-et-argent.fr/
https://admyjob.com/
https://adviso.ca/
https://aegon.es/
https://aeroplan.com/
https://aireuropa.com/
https://allianz-voyage.fr/
https://allrun.fr/
https://april-moto.com/
https://armandthiery.fr/
https://asapparis.com/
https://assurance-sante.com/
https://assurances-france-loisirs.com/
https://assurances-titulaires.com/
https://assurandme.fr/
https://assuronline.com/
https://assuronline.com/
https://auchantelecom.fr/
https://audi.fr/
https://audifrance.fr/
https://audika.com/
https://aureya.com/
https://avatacar.com/
https://ayads.co/
https://bankia.es/
https://bcassurance.fr/
https://bcfinance.fr/
https://bebloom.com/
https://beinsports.com/
https://belambra.com/
https://belambra.co.uk/
https://bernardtapie.com/
https://bfmtv.com/
https://bforbank.com/
https://blesscollectionhotels.com/
https://bongo.be/
https://bongo.nl/
https://brandalley.be/
https://brandalleybyme.fr/
https://brandalley.co.nl/
https://brandalley.de/
https://brandalley.es/
https://brandalley.it/
https://brookeo.fr/
https://caci-online.fr/
https://campagne-audition.fr/
https://campagnes-france.com/
https://capifrance.fr/
https://carrefour-banque.fr/
https://carrefour.com/
https://carrefour.fr/
https://cartecarburant.leclerc/
https://carventura.com/
https://catimini.com/
https://celio.com/
https://chausport.com/
https://ciblo.net/
https://citadium.com/
https://clubavantages.net/
https://coffrefortplus.com/
https://communaute3suisses.fr/
https://comprendrechoisir.com/
https://compteczam.fr/
https://comptoirdescotonniers.com/
https://comptoirdescotonniers.co.uk/
https://comptoirdescotonniers.de/
https://comptoirdescotonniers.es/
https://comptoirdescotonniers.eu/
https://conforama.es/
https://conforama.pt/
https://corporate.com/
https://corsair.ca/
https://corsair.ci/
https://corsair.fr/
https://corsair.gp/
https://corsair.mq/
https://corsair.re/
https://corsair.sn/
https://cossettetourisme.com/
https://cpa-france.org/
https://creditec.fr/
https://credithypo.com/
https://credit-pret-hypothecaire.com/
https://crossnutrition.com/
https://culture.leclerc/
https://darty.com/
https://dcshoes-europe.com/
https://deguisetoi.fr/
https://destinia.ad/
https://destinia.ae/
https://destinia.asia/
https://destinia.at/
https://destinia.be/
https://destinia.cat/
https://destinia.ch/
https://destinia.cl/
https://destinia.cn/
https://destinia.co/
https://destinia.co.il/
https://destinia.com/
https://destinia.com.ar/
https://destinia.com.au/
https://destinia.com.br/
https://destinia.com.eg/
https://destinia.com.pa/
https://destinia.com.tr/
https://destinia.com.ua/
https://destinia.co.no/
https://destinia.co.ro/
https://destinia.co.uk/
https://destinia.co.za/
https://destinia.cr/
https://destinia.cz/
https://destinia.de/
https://destinia.dk/
https://destinia.do/
https://destinia.ec/
https://destinia.fr/
https://destinia.gr/
https://destinia.gt/
https://destinia.hu/
https://destinia.ie/
https://destinia.in/
https://destinia.ir/
https://destinia.is/
https://destinia.it/
https://destinia.jp/
https://destinia.kr/
https://destinia.lt/
https://destinia.lv/
https://destinia.ly/
https://destinia.ma/
https://destinia.mx/
https://destinia.nl/
https://destinia.pe/
https://destinia.pl/
https://destinia.pt/
https://destinia.qa/
https://destinia.ru/
https://destinia.sa/
https://destinia.se/
https://destinia.sg/
https://destinia.sk/
https://destinia.tw/
https://destinia.us/
https://destinia.uy/
https://devialet.com/
https://devred.com/
https://diamant-unique.com/
https://dmp.leclerc/
https://doctipedia.fr/
https://drust.io/
https://eafit.com/
https://easyviaggio.com/
https://easyviajar.com/
https://easyvols.fr/
https://easyvoyage.com/
https://easyvoyage.com/
https://easyvoyage.co.uk/
https://easyvoyage.de/
https://e-cartecadeauleclerc.fr/
https://ecotour.com/
https://eider.com/
https://eidershop.com/
https://eldeseazo.com/
https://e.leclerc/
https://e-leclerc.com/
https://elstarprevention.com/
https://emalu-store.com/
https://etam.com/
https://etam.de/
https://etam.es/
https://eulerian.net/
https://eurotierce.be/
https://evaway.com/
https://evobanco.com/
https://ew3.io/
https://fax-via-internet.it/
https://fdj.fr/
https://fleurancenature.com/
https://fleurancenature.fr/
https://fnac.com/
https://fnac.es/
https://fnacspectacles.com/
https://fnactickets.com/
https://fonestarz.com/
https://fortuneo.fr/
https://franceloisirsvacances.com/
https://francoisesaget.be/
https://francoisesaget.be/
https://francoisesaget.com/
https://franziskasager.de/
https://franziskasager.de/
https://futuroscope.com/
https://futuroscope.mobi/
https://galerieslafayette.com/
https://gestion-assurances.com/
https://granions.fr/
https://grantalexander.com/
https://greenweez.com/
https://greenweez.co.uk/
https://greenweez.de/
https://greenweez.es/
https://greenweez.eu/
https://greenweez.it/
https://groupefsc.com/
https://habitat.de/
https://habitat.fr/
https://habitat.net/
https://hardrockhoteltenerife.com/
https://hipp.fr/
https://histoiredor.com/
https://hofmann.es/
https://hofmann.pt/
https://holidaycheck.fr/
https://homair.com/
https://hoteldeparismontecarlo.com/
https://hotelhermitagemontecarlo.com/
https://hotelsbarriere.com/
https://hrhibiza.com/
https://iahorro.com/
https://io1g.net/
https://iperceptions.com/
https://iperceptions.com/
https://iperceptions.com/
https://iperceptions.com/
https://i-run.fr/
https://jassuremamoto.fr/
https://jassure-ma-voiture-sans-permis.fr/
https://jassuremon3roues.fr/
https://jassuremonauto.fr/
https://jassuremon-camping-car.fr/
https://jassuremonscooter.fr/
https://kauf-unique.at/
https://kauf-unique.de/
https://kidiliz.com/
https://lafrancedunordausud.fr/
https://lafuma-boutique.com/
https://lafuma.com/
https://laredoute.fr/
https://laredoute.pt/
https://lavieimmo.com/
https://leclercbilletterie.com/
https://leclercdrive.fr/
https://leclercvoyages.com/
https://lenergiemoinscher.com/
https://leon-de-bruxelles.fr/
https://leregroupementdecredits.fr/
https://lesbonscommerces.fr/
https://lesbonsservices.fr/
https://leskidunordausud.fr/
https://lespagnedunordausud.fr/
https://lesselectionsskoda.fr/
https://lexpress.fr/
https://liberation.fr/
https://locasun.co.uk/
https://locasun.de/
https://locasun.es/
https://locasun.fr/
https://locasun.it/
https://locasun.nl/
https://locasun-vp.fr/
https://location.e-leclerc.com/
https://location.leclerc/
https://lotoquebec.com/
https://lotoquebec.com/
https://macave.leclerc/
https://madeindesign.ch/
https://madeindesign.com/
https://madeindesign.co.uk/
https://madeindesign.de/
https://madeindesign.it/
https://maeva.com/
https://magnetintell.com/
https://maisonetloisirs.leclerc/
https://masmovil.com/
https://masmovil.es/
https://matby.com/
https://mathon.fr/
https://megustaescribir.com/
https://megustaleer.com/
https://megustaleer.com.co/
https://megustaleer.com.pe/
https://melia.cn/
https://melia.com/
https://melijoe.com/
https://michelin.co.uk/
https://michelin.de/
https://michelin.es/
https://michelin.fr/
https://michelin.nl/
https://miliboo.be/
https://miliboo.ch/
https://miliboo.com/
https://miliboo.co.uk/
https://miliboo.de/
https://miliboo.es/
https://miliboo.it/
https://miliboo.lu/
https://millet.fr/
https://millet-mountain.ch/
https://millet-mountain.com/
https://millet-mountain.de/
https://miropapremama.es/
https://mistergatesdirect.com/
https://mistermenuiserie.com/
https://mis.tourisme-/
https://mixa.fr/
https://molet.com/
https://monalbumphoto.be/
https://monalbumphoto.fr/
https://mondial-assistance.fr/
https://monmedicament-enligne.fr/
https://monnierfreres.com/
https://monnierfreres.com/
https://monnierfreres.co.uk/
https://monnierfreres.de/
https://monnierfreres.eu/
https://monnierfreres.fr/
https://monnierfreres.it/
https://montecarloadvancepurchase.com/
https://montecarlobay.com/
https://monte-carlo-beach.com/
https://montecarlomeeting.com/
https://montecarlosbm-/
https://montecarlosbm.book-secure.com/
https://montecarlosbm.com/
https://montecarloseasonalsale.com/
https://montecarlovirtualtour.com/
https://montecarlowellness.com/
https://monteleone.fr/
https://montreal.org/
https://moonpig.com/
https://motorisationplus.com/
https://mouvement-leclerc.com/
https://mtl.org/
https://muchoviaje.com/
https://multimedia.e-leclerc.com/
https://musique.e-leclerc.com/
https://mydailyhotel.com/
https://myfirstdressing.com/
https://mywarner.warnerbros.fr/
https://natiloo.com/
https://net/
https://netvox-assurances.com/
https://nextseguros.es/
https://nomade-aventure.com/
https://no.photobox.com/
https://nrjmobile.fr/
https://numericable.fr/
https://numericable.fr/
https://numericable.tv/
https://odalys-vacances.com/
https://odalys-vacation-rental.com/
https://officedepot.fr/
https://oki-ni.com/
https://onestep-boutique.com/
https://onestep.fr/
https://oney.es/
https://online.carrefour.fr/
https://ooreka.fr/
https://ooshop.com/
https://optique.e-leclerc.com/
https://orpi.com/
https://oui.sncf/
https://outdoor4pro.com/
https://oxboworld.com/
https://oxbowshop.com/
https://palladiumhotelgroup.com/
https://parapharmacie.leclerc/
https://parapharmacie.leclerc/
https://parfumsclub.de/
https://partenaires-verisure.fr/
https://pcnphysio.com/
https://peachdi.com/
https://pepephone.com/
https://perfumesclub.com/
https://perfumesclub.co.uk/
https://perfumesclub.fr/
https://perfumesclub.it/
https://perfumesclub.nl/
https://perfumesclub.pl/
https://perfumesclub.pt/
https://petit-bateau.be/
https://petit-bateau.co.uk/
https://petit-bateau.de/
https://petit-bateau.fr/
https://petit-bateau.it/
https://peugeot-assurance.fr/
https://photobox.at/
https://photobox.be/
https://photobox.ca/
https://photobox.ch/
https://photobox.com.au/
https://photobox.co.nz/
https://photobox.co.uk/
https://photobox.de/
https://photobox.dk/
https://photobox.es/
https://photobox.fi/
https://photobox.fr/
https://photobox.ie/
https://photobox.it/
https://photobox.nl/
https://photobox.pl/
https://photobox.se/
https://photomoinscher.leclerc/
https://piensoymascotas.com/
https://placedestendances.com/
https://placement-direct.fr/
https://pmubrasil.com.br/
https://pmu.fr/
https://pmu.fr/
https://poeleaboismaison.com/
https://ponant.com/
https://pretunique.fr/
https://primes-energie.leclerc/
https://princessetamtam.com/
https://princessetamtam.co.uk/
https://princessetamtam.de/
https://princessetamtam.eu/
https://privateoutlet.com/
https://privateoutlet.de/
https://privateoutlet.es/
https://privateoutlet.fr/
https://privateoutlet.it/
https://produits-volumineux.e-leclerc.com/
https://promocionesfarma.com/
https://promofarma.com/
https://promovacances.com/
https://quiksilver.eu/
https://rachatdecredit.net/
https://radiateurplus.com/
https://rc.monalbumphoto.be/
https://rc.monalbumphoto.fr/
https://recherche.leclerc/
https://reglotv.e-leclerc.com/
https://rentacar.fr/
https://reunica.com/
https://roxy.eu/
https://rueducommerce.fr/
https://sadyr.es/
https://scooter-assurance.fr/
https://securitasdirect.fr/
https://seevibes.com/
https://sfr.fr/
https://silvergoldtobuy.com/
https://sisley-paris.com/
https://skiset-holidays.com/
https://skiset-holidays.co.uk/
https://skodafabia.fr/
https://skoda.fr/
https://skodasuperb.fr/
https://smallable.com/
https://sncf.com/
https://sport.leclerc/
https://sport.leclerc/
https://swatch.com/
https://swisslife-direct.fr/
https://tartine-et-chocolat.com/
https://tartine-et-chocolat.fr/
https://teamekosport.com/
https://telecommandeonline.com/
https://terrassesmontecarlosbm.com/
https://theushuaiaexperience.com/
https://to-lipton.com/
https://tongalumina.ca/
https://tool-fitness.com/
https://tool-fitness.es/
https://topsante.com/
https://toscane-boutique.fr/
https://tradingsat.com/
https://tremblant.ca/
https://vegaoo.com/
https://vegaoo.co.uk/
https://vegaoo.de/
https://vegaoo.es/
https://vegaoo.it/
https://vegaoo.nl/
https://vegaooparty.com/
https://vegaoopro.com/
https://vegaoo.pt/
https://venta-del-diablo.com/
https://venta-unica.com/
https://ventealapropriete.com/
https://vente-du-diable.com/
https://vente-en-or.com/
https://vente-unique.be/
https://vente-unique.ch/
https://vente-unique.com/
https://vente-unique.it/
https://vente-unique.lu/
https://vente-unique.nl/
https://vente-unique.pt/
https://verif.com/
https://verisure.fr/
https://vin.e-leclerc.com/
https://vip-jardin.com/
https://vivus.es/
https://voyage-prive.be/
https://voyage-prive.ch/
https://voyage-prive.com/
https://voyage-prive.co.uk/
https://voyage-prive.co.uk/
https://voyage-prive.de/
https://voyage-prive.es/
https://voyage-prive.es/
https://voyage-prive.it/
https://voyage-prive.it/
https://voyage-prive.nl/
https://voyage-prive.nl/
https://voyage-prive.pl/
https://voyages-sncf.com/
https://warnerbros.fr/
https://warnerbros.fr/
https://weareknitters.ch/
https://weareknitters.com/
https://weareknitters.co.uk/
https://weareknitters.de/
https://weareknitters.dk/
https://weareknitters.es/
https://weareknitters.fr/
https://weareknitters.it/
https://weareknitters.nl/
https://weareknitters.no/
https://weareknitters.pl/
https://weareknitters.se/
https://wecanimal.pt/
https://yoigo.com/
https://yoigo.es/
https://younited-credit.com/
https://zanzicar.fr/
https://zebestof.com/
https://z-enfant.com/
https://z-eshop.com/
https://zgeneration.com/
https://zive.fr/
https://zone-turf.fr/


@@ -0,0 +1 @@
https://red-by-sfr.fr


@@ -0,0 +1,20 @@
https://www.allianz.fr/
http://www.belambra.fr/
https://www.macif.fr/
https://www.butagaz.fr/
http://www.cartier.fr/
https://www.isilines.fr/
http://www.jaeger-lecoultre.com/
http://www.laredoute.fr/
https://www.lesfurets.com/
https://www.louvrehotels.com/
http://www.mars.com/
https://www.meetic.fr/
https://www.nikon.fr/
https://www.norauto.fr/
https://www.groupe-psa.com/
https://www.rueducommerce.fr/
https://www.transavia.com/
https://www.truffaut.com/
https://www.uniqlo.com/
https://www.vancleefarpels.com/

websites/np6_clients.list Normal file

@@ -0,0 +1,20 @@
https://www.harmonie-mutuelle.fr/
https://www.henkel.fr/
https://www.canalplus.com/
http://www.casino.fr/
https://www.alinea.com/
https://www.enedis.fr/
https://www.ubisoft.com/
https://perfectstaycom.zendesk.com/
https://www.perfectstay.com/
https://www.bricodepot.fr/
https://www.sfr.fr/
http://www.prismamedia.com/
https://www.odalys-vacances.com/
https://www.macif.fr/
https://www.cofinoga.fr/
https://www.boursorama-banque.com/
https://mabanque.bnpparibas/
https://www.oui.sncf/
https://www.younited-credit.com/