Compare commits

...

4 commits

7 changed files with 66 additions and 17 deletions

@ -34,6 +34,7 @@ Depending on the sources you'll be using to generate the list, you'll need to in
- [Bash](https://www.gnu.org/software/bash/bash.html)
- [Coreutils](https://www.gnu.org/software/coreutils/)
- [Gawk](https://www.gnu.org/software/gawk/)
- [curl](https://curl.haxx.se)
- [pv](http://www.ivarch.com/programs/pv.shtml)
- [Python 3.4+](https://www.python.org/)

dist/README.md (vendored): 9 changed lines

@ -24,9 +24,12 @@ This list is an inventory of every `somestring.website1.com` found to allow non
 ### Learn more
-- [CNAME Cloaking, the dangerous disguise of third-party trackers](https://medium.com/nextdns/cname-cloaking-the-dangerous-disguise-of-third-party-trackers-195205dc522a)
-- [Trackers first-party](https://blog.imirhil.fr/2019/11/13/first-party-tracker.html) (french)
+- [CNAME Cloaking, the dangerous disguise of third-party trackers](https://medium.com/nextdns/cname-cloaking-the-dangerous-disguise-of-third-party-trackers-195205dc522a) from NextDNS
+- [Trackers first-party](https://blog.imirhil.fr/2019/11/13/first-party-tracker.html) from Aeris, in French
 - [uBlock Origin issue](https://github.com/uBlockOrigin/uBlock-issues/issues/780)
+- [CNAME Cloaking and Bounce Tracking Defense](https://webkit.org/blog/11338/cname-cloaking-and-bounce-tracking-defense/) on WebKit's blog
+- [Characterizing CNAME cloaking-based tracking](https://blog.apnic.net/2020/08/04/characterizing-cname-cloaking-based-tracking/) on APNIC's website
+- [Characterizing CNAME Cloaking-Based Tracking on the Web](https://tma.ifip.org/2020/wp-content/uploads/sites/9/2020/06/tma2020-camera-paper66.pdf) is a research paper from Sokendai and ANSSI
 ## List variants
@ -93,6 +96,7 @@ Some of the first-party tracker included in this list have been found by:
 - [Aeris](https://imirhil.fr/)
 - NextDNS and [their blocklist](https://github.com/nextdns/cname-cloaking-blocklist)'s contributors
 - Yuki2718 from [Wilders Security Forums](https://www.wilderssecurity.com/threads/ublock-a-lean-and-fast-blocker.365273/page-168#post-2880361)
+- Ha Dao, Johan Mazel, and Kensuke Fukuda, ["Characterizing CNAME Cloaking-Based Tracking on the Web", Proceedings of IFIP/IEEE Traffic Measurement Analysis Conference (TMA), 9 pages, 2020.](https://tma.ifip.org/2020/wp-content/uploads/sites/9/2020/06/tma2020-camera-paper66.pdf)
 The list was generated using data from
@ -100,6 +104,7 @@ The list was generated using data from
- [Cisco Umbrella Popularity List](http://s3-us-west-1.amazonaws.com/umbrella-static/index.html)
- [Public DNS Server List](https://public-dns.info/)
Similar projects:
- [NextDNS blocklist](https://github.com/nextdns/cname-cloaking-blocklist): for DNS-aware ad blockers

@ -13,10 +13,15 @@ function dl() {
     fi
 }
+log "Retrieving tests…"
+rm -f tests/*.cache.csv
+dl https://raw.githubusercontent.com/fukuda-lab/cname_cloaking/master/Subdomain_CNAME-cloaking-based-tracking.csv temp/fukuda.csv
+(echo "url,allow,deny,comment"; tail -n +2 temp/fukuda.csv | awk -F, '{ print "https://" $2 "/,," $3 "," $5 }') > tests/fukuda.cache.csv
 log "Retrieving rules…"
 rm -f rules*/*.cache.*
 dl https://easylist.to/easylist/easyprivacy.txt rules_adblock/easyprivacy.cache.txt
 dl https://filters.adtidy.org/extension/chromium/filters/3.txt rules_adblock/adguard.cache.txt
 log "Retrieving TLD list…"
 dl http://data.iana.org/TLD/tlds-alpha-by-domain.txt temp/all_tld.temp.list
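
For readers less familiar with awk, the conversion performed by the one-liner above can be sketched in Python. This is a minimal sketch, not the project's code: it assumes, as the awk field indices suggest, that the upstream file's second column is a site hostname, its third column the tracking subdomain, and its fifth column a label. The function name `convert_rows` and the sample row are hypothetical. Unlike `awk -F,`, the `csv` module honours quoted fields, and this sketch also skips rows with fewer than five columns.

```python
import csv
import io


def convert_rows(source_text):
    """Convert upstream rows to the url,allow,deny,comment test format."""
    reader = csv.reader(io.StringIO(source_text))
    next(reader, None)  # skip the header row, like `tail -n +2`
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["url", "allow", "deny", "comment"])
    for row in reader:
        if len(row) < 5:
            continue  # assumption: drop short rows (awk would still print them)
        # awk's $2, $3 and $5 correspond to row[1], row[2] and row[4]
        writer.writerow(["https://" + row[1] + "/", "", row[2], row[4]])
    return out.getvalue()


# Hypothetical sample data, not taken from the real upstream file.
sample = (
    "id,site,subdomain,cname,tracker\n"
    "1,example.com,metrics.example.com,t.example.net,ExampleTracker\n"
)
print(convert_rows(sample), end="")
```

The `allow` column is left empty on purpose: the one-liner only produces `deny` expectations from this source.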

@ -5,30 +5,67 @@ import os
 import logging
 import csv
-TESTS_DIR = 'tests'
+TESTS_DIR = "tests"
-if __name__ == '__main__':
+if __name__ == "__main__":
     DB = database.Database()
-    log = logging.getLogger('tests')
+    log = logging.getLogger("tests")
     for filename in os.listdir(TESTS_DIR):
         if not filename.lower().endswith(".csv"):
             continue
         log.info("")
         log.info("Running tests from %s", filename)
         path = os.path.join(TESTS_DIR, filename)
-        with open(path, 'rt') as fdesc:
+        with open(path, "rt") as fdesc:
+            count_ent = 0
+            count_all = 0
+            count_den = 0
+            pass_ent = 0
+            pass_all = 0
+            pass_den = 0
             reader = csv.DictReader(fdesc)
             for test in reader:
-                log.info("Testing %s (%s)", test['url'], test['comment'])
+                log.debug("Testing %s (%s)", test["url"], test["comment"])
+                count_ent += 1
+                passed = True
-                for white in test['white'].split(':'):
-                    if not white:
+                for allow in test["allow"].split(":"):
+                    if not allow:
                         continue
-                    if any(DB.get_domain(white)):
-                        log.error("False positive: %s", white)
+                    count_all += 1
+                    if any(DB.get_domain(allow)):
+                        log.error("False positive: %s", allow)
+                        passed = False
+                    else:
+                        pass_all += 1
-                for black in test['black'].split(':'):
-                    if not black:
+                for deny in test["deny"].split(":"):
+                    if not deny:
                         continue
-                    if not any(DB.get_domain(black)):
-                        log.error("False negative: %s", black)
+                    count_den += 1
+                    if not any(DB.get_domain(deny)):
+                        log.error("False negative: %s", deny)
+                        passed = False
+                    else:
+                        pass_den += 1
+                if passed:
+                    pass_ent += 1
+            perc_ent = (100 * pass_ent / count_ent) if count_ent else 100
+            perc_all = (100 * pass_all / count_all) if count_all else 100
+            perc_den = (100 * pass_den / count_den) if count_den else 100
+            log.info(
+                "%s: Entries %d/%d (%.2f%%) | Allow %d/%d (%.2f%%) | Deny %d/%d (%.2f%%)",
+                filename,
+                pass_ent,
+                count_ent,
+                perc_ent,
+                pass_all,
+                count_all,
+                perc_all,
+                pass_den,
+                count_den,
+                perc_den,
+            )
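
The pass-rate bookkeeping introduced in this hunk can be tried in isolation. The following is a minimal sketch, not the project's test runner: it replaces the `any(DB.get_domain(...))` lookup with a hypothetical `is_blocked` predicate backed by a plain set, and the function names `score` and `percent` are illustrative only. The sample rows reuse the test entries shown on this page; the set of blocked domains is made up for the example.

```python
import csv
import io


def score(csv_text, is_blocked):
    """Return (passed, total) pairs for entries, allow domains and deny domains."""
    counts = {"entries": [0, 0], "allow": [0, 0], "deny": [0, 0]}
    for test in csv.DictReader(io.StringIO(csv_text)):
        passed = True
        # Each column holds zero or more domains separated by ":".
        for column, expected in (("allow", False), ("deny", True)):
            for domain in filter(None, test[column].split(":")):
                counts[column][1] += 1
                if is_blocked(domain) == expected:
                    counts[column][0] += 1
                else:
                    passed = False  # one wrong domain fails the whole entry
        counts["entries"][1] += 1
        counts["entries"][0] += passed
    return {name: tuple(pair) for name, pair in counts.items()}


def percent(passed, total):
    # An empty category counts as fully passing, matching the hunk's fallback of 100.
    return 100 * passed / total if total else 100


tests_csv = (
    "url,allow,deny,comment\n"
    "https://www.red-by-sfr.fr/,static.s-sfr.fr,nrg.red-by-sfr.fr,Eulerian\n"
    "https://www.cbc.ca/,,smetrics.cbc.ca,Adobe\n"
    "https://www.discover.com/,,content.discover.com,ThreatMetrix\n"
)
# Hypothetical blocklist standing in for the real database lookup.
blocked = {"nrg.red-by-sfr.fr", "smetrics.cbc.ca"}
summary = score(tests_csv, lambda domain: domain in blocked)
for name, (passed, total) in summary.items():
    print("%s %d/%d (%.2f%%)" % (name, passed, total, percent(passed, total)))
```

Counting allow and deny domains separately from whole entries, as the hunk does, distinguishes a list that misses a few domains from one that fails most test pages.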

tests/.gitignore (vendored, normal file): 1 changed line

@ -0,0 +1 @@
*.cache.csv

@ -1,4 +1,4 @@
-url,white,black,comment
+url,allow,deny,comment
 https://support.apple.com,support.apple.com,,EdgeKey / AkamaiEdge
 https://www.pinterest.fr/,i.pinimg.com,,Cedexis
 https://www.tumblr.com/,66.media.tumblr.com,,ChiCDN

@ -1,4 +1,4 @@
-url,white,black,comment
+url,allow,deny,comment
 https://www.red-by-sfr.fr/,static.s-sfr.fr,nrg.red-by-sfr.fr,Eulerian
 https://www.cbc.ca/,,smetrics.cbc.ca,2o7 | Omniture | Adobe Experience Cloud
 https://www.discover.com/,,content.discover.com,ThreatMetrix
