Generates a host list of first-party trackers for ad-blocking.

  1. # Geoffrey Frogeye's block list of first-party trackers
  2. ## What's a first-party tracker?
  3. A tracker is a script placed on many websites to gather information about their visitors.
  4. Trackers are used for many purposes: statistics, risk management, marketing, ad serving…
  5. In any case, they are a threat to Internet users' privacy and many may want to block them.
  6. Traditionally, trackers are served from a third party.
  7. For example, `website1.com` and `website2.com` both load their tracking script from `https://trackercompany.com/trackerscript.js`.
  8. In order to block those, one can simply block the hostname `trackercompany.com`, which is what most ad blockers do.
  9. However, to circumvent this block, tracking companies have their client websites load trackers from a subdomain such as `somestring.website1.com`.
  10. That subdomain is a DNS redirection to `website1.trackercompany.com`, or directly to an IP address belonging to the tracking company.
  11. Those are called first-party trackers.
  12. On top of the aforementioned privacy issues, they also cause security issues, as websites usually trust those scripts more.
  13. For more information, learn about [Content Security Policy](https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP), [same-origin policy](https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy) and [Cross-Origin Resource Sharing](https://enable-cors.org/).
  14. In order to block those trackers, ad blockers would need to block every subdomain pointing to anything under `trackercompany.com` or to their network.
  15. Unfortunately, most don't support those blocking methods as they are not DNS-aware, i.e. they only see `somestring.website1.com`.
  16. This list is an inventory of every `somestring.website1.com` found, so that ad blockers that are not DNS-aware can still block first-party trackers.
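
The check that a DNS-aware blocker would perform can be sketched in Python as follows. This is only an illustration of the idea described above, not the code used to generate this list: the `CNAME_RECORDS` table is a hypothetical stand-in for real DNS lookups, and all function names are made up for the example.

```python
from typing import Dict, Iterable, List

# Hypothetical DNS data (hostname -> CNAME target), reusing the example names above.
CNAME_RECORDS: Dict[str, str] = {
    "somestring.website1.com": "website1.trackercompany.com",
    "www.website1.com": "website1.hosting.example",
}

# Hypothetical list of domains owned by tracking companies.
TRACKER_ZONES: List[str] = ["trackercompany.com"]


def is_under(hostname: str, zone: str) -> bool:
    """Return True if hostname equals zone or is a subdomain of it."""
    hostname = hostname.lower().rstrip(".")
    zone = zone.lower().rstrip(".")
    return hostname == zone or hostname.endswith("." + zone)


def cname_chain(hostname: str, records: Dict[str, str], max_depth: int = 10) -> List[str]:
    """Follow CNAME records starting at hostname and return every name visited."""
    chain = [hostname]
    # max_depth guards against redirection loops in the data.
    while chain[-1] in records and len(chain) <= max_depth:
        chain.append(records[chain[-1]])
    return chain


def is_first_party_tracker(
    hostname: str, records: Dict[str, str], tracker_zones: Iterable[str]
) -> bool:
    """True if hostname is outside every tracker zone but redirects into one."""
    zones = list(tracker_zones)
    if any(is_under(hostname, zone) for zone in zones):
        return False  # already a plain third-party tracker hostname
    targets = cname_chain(hostname, records)[1:]
    return any(is_under(name, zone) for name in targets for zone in zones)


if __name__ == "__main__":
    for name in CNAME_RECORDS:
        print(name, is_first_party_tracker(name, CNAME_RECORDS, TRACKER_ZONES))
```

An ad blocker that is not DNS-aware never sees the chain, which is why the list has to enumerate the `somestring.website1.com` hostnames themselves.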
  17. ### Learn more
  18. - [CNAME Cloaking, the dangerous disguise of third-party trackers](https://medium.com/nextdns/cname-cloaking-the-dangerous-disguise-of-third-party-trackers-195205dc522a) from NextDNS
  19. - [Trackers first-party](https://blog.imirhil.fr/2019/11/13/first-party-tracker.html) from Aeris, in French
  20. - [uBlock Origin issue](https://github.com/uBlockOrigin/uBlock-issues/issues/780)
  21. - [CNAME Cloaking and Bounce Tracking Defense](https://webkit.org/blog/11338/cname-cloaking-and-bounce-tracking-defense/) on WebKit's blog
  22. - [Characterizing CNAME cloaking-based tracking](https://blog.apnic.net/2020/08/04/characterizing-cname-cloaking-based-tracking/) on APNIC's website
  23. - [Characterizing CNAME Cloaking-Based Tracking on the Web](https://tma.ifip.org/2020/wp-content/uploads/sites/9/2020/06/tma2020-camera-paper66.pdf) is a research paper from Sokendai and ANSSI
  24. ## List variants
  25. ### First-party trackers
  26. **Recommended for hosts-file-based ad blockers, such as [Pi-hole](https://pi-hole.net/) (versions before v5.0, which introduced CNAME blocking).**
  27. **Recommended for ad blockers that run as Android applications, such as [Blokada](https://blokada.org/).**
  28. - Hosts file: <https://hostfiles.frogeye.fr/firstparty-trackers-hosts.txt>
  29. - Raw list: <https://hostfiles.frogeye.fr/firstparty-trackers.txt>
  30. This list contains every hostname redirecting to [a hand-picked list of first-party trackers](https://git.frogeye.fr/geoffrey/eulaurarien/src/branch/master/rules/first-party.list).
  31. It should be safe from false positives.
  32. It also contains all tracking hostnames under company domains (e.g. `website1.trackercompany.com`),
  33. useful for ad blockers that don't support mass regex blocking,
  34. while still preventing fallback to third-party trackers.
  35. Don't be afraid of the size of the list, as this is due to the nature of first-party trackers: a single tracker generates at least one hostname per client (typically two).
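
For readers wondering how the two downloads relate, here is a small Python sketch of turning a raw list into a hosts file. It assumes the raw list holds one hostname per line and that the hosts file maps each hostname to `0.0.0.0`; both are assumptions about the formats, and `to_hosts_lines` is a hypothetical helper, not part of the generator.

```python
from typing import Iterable, Iterator


def to_hosts_lines(hostnames: Iterable[str], sink_ip: str = "0.0.0.0") -> Iterator[str]:
    """Yield one hosts-file entry per hostname, skipping blanks and comments.

    The one-hostname-per-line input and the 0.0.0.0 sink address are
    assumptions made for this example.
    """
    for line in hostnames:
        name = line.strip()
        if not name or name.startswith("#"):
            continue
        yield f"{sink_ip} {name}"


if __name__ == "__main__":
    raw = ["somestring.website1.com\n", "\n", "# a comment\n", "website1.trackercompany.com\n"]
    for entry in to_hosts_lines(raw):
        print(entry)
```

Since every client of a tracker contributes at least one hostname, the output grows linearly with the number of clients, which is where the size of the list comes from.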
  36. ### First-party only trackers
  37. **Recommended for ad blockers as web browser extensions, such as [uBlock Origin](https://ublockorigin.com/) (versions before v1.25.0, or any version on Chromium-based browsers, since v1.25.0 introduced CNAME uncloaking on Firefox).**
  38. - Hosts file: <https://hostfiles.frogeye.fr/firstparty-only-trackers-hosts.txt>
  39. - Raw list: <https://hostfiles.frogeye.fr/firstparty-only-trackers.txt>
  40. This is the same list as above, without the hostnames under the tracking company domains (e.g. `website1.trackercompany.com`).
  41. This reduces the size of the list for ad blockers that already block those third-party trackers through their support of regex blocking.
  42. Use it in conjunction with other block lists used in regex mode, such as [Peter Lowe's](https://pgl.yoyo.org/adservers/).
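
The relationship between the full list and this "only" variant can be sketched in Python as a simple filter. This is an illustration under assumptions, not the generator's actual code: `drop_company_hostnames` and the sample data are hypothetical.

```python
from typing import Iterable, List


def drop_company_hostnames(hostnames: Iterable[str], tracker_zones: Iterable[str]) -> List[str]:
    """Keep only hostnames that are NOT under a tracking company's own domain."""
    zones = {zone.lower().rstrip(".") for zone in tracker_zones}
    suffixes = tuple("." + zone for zone in zones)
    kept = []
    for hostname in hostnames:
        name = hostname.lower().rstrip(".")
        # A name is "under" a zone if it is the zone itself or ends with ".zone".
        if name in zones or name.endswith(suffixes):
            continue
        kept.append(hostname)
    return kept


if __name__ == "__main__":
    full_list = [
        "somestring.website1.com",
        "website1.trackercompany.com",
        "trackercompany.com",
    ]
    print(drop_company_hostnames(full_list, ["trackercompany.com"]))
```

The hostnames dropped here are the ones a blocker with a domain-wide rule for `trackercompany.com` would already catch.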
  43. ### Multi-party trackers
  44. - Hosts file: <https://hostfiles.frogeye.fr/multiparty-trackers-hosts.txt>
  45. - Raw list: <https://hostfiles.frogeye.fr/multiparty-trackers.txt>
  46. As first-party trackers usually evolve from third-party trackers, this list contains every hostname redirecting to trackers found in existing lists of third-party trackers (see next section).
  47. Since the latter were not designed with first-party trackers in mind, they are likely to contain false positives.
  48. On the other hand, they might protect against first-party trackers that we're not aware of or have not yet confirmed.
  49. #### Source of third-party trackers
  50. - [EasyPrivacy](https://easylist.to/easylist/easyprivacy.txt)
  51. - [AdGuard](https://github.com/AdguardTeam/AdguardFilters)
  52. (Yes, there are only two for now; many of the existing lists cause a lot of false positives.)
  53. ### Multi-party only trackers
  54. - Hosts file: <https://hostfiles.frogeye.fr/multiparty-only-trackers-hosts.txt>
  55. - Raw list: <https://hostfiles.frogeye.fr/multiparty-only-trackers.txt>
  56. This is the same list as above, without the hostnames under the tracking company domains (e.g. `website1.trackercompany.com`).
  57. This reduces the size of the list for ad blockers that already block those third-party trackers through their support of regex blocking.
  58. Use it in conjunction with other block lists used in regex mode, such as the ones in the previous section.
  59. ## Meta
  60. In case of false positives or false negatives, or for any other question, contact me however you like: <https://geoffrey.frogeye.fr>
  61. The software used to generate this list is available here: <https://git.frogeye.fr/geoffrey/eulaurarien>
  62. ## Acknowledgements
  63. Some of the first-party trackers included in this list were found by:
  64. - [Aeris](https://imirhil.fr/)
  65. - NextDNS and [their blocklist](https://github.com/nextdns/cname-cloaking-blocklist)'s contributors
  66. - Yuki2718 from [Wilders Security Forums](https://www.wilderssecurity.com/threads/ublock-a-lean-and-fast-blocker.365273/page-168#post-2880361)
  67. - Ha Dao, Johan Mazel, and Kensuke Fukuda, ["Characterizing CNAME Cloaking-Based Tracking on the Web", Proceedings of IFIP/IEEE Traffic Measurement Analysis Conference (TMA), 9 pages, 2020.](https://tma.ifip.org/2020/wp-content/uploads/sites/9/2020/06/tma2020-camera-paper66.pdf)
  68. - AdGuard and [their blocklist](https://github.com/AdguardTeam/cname-trackers)'s contributors
  69. The list was generated using data from:
  70. - [Rapid7 OpenData](https://opendata.rapid7.com/sonar.fdns_v2/), who kindly provided a free account
  71. - [Cisco Umbrella Popularity List](http://s3-us-west-1.amazonaws.com/umbrella-static/index.html)
  72. - [Public DNS Server List](https://public-dns.info/)
  73. Similar projects:
  74. - [NextDNS blocklist](https://github.com/nextdns/cname-cloaking-blocklist): for DNS-aware ad blockers
  75. - [Stefan Froberg's lists](https://www.orwell1984.today/cname/): subset of those lists grouped by tracker
  76. - [AdGuard blocklist](https://github.com/AdguardTeam/cname-trackers): same thing with a bigger scope, maintained by a bigger team