Compare commits

...

168 commits

Author SHA1 Message Date
Geoffrey Frogeye 3b6f7a58b3
Remove support for Rapid7
They changed their privacy / pricing model and as such I don't have
access to their massive DNS dataset anymore,
even after asking.

Since 2022-01-02, I put the list on freeze while looking for an alternative,
but couldn't find any.
To make the list update again with the remaining DNS sources I have,
I put the last version of the list generated with the Rapid7 dataset
as an input for subdomains, that will now get resolved with MassDNS.
2022-11-13 20:10:27 +01:00
Geoffrey Frogeye 49a36f32f2
Add requirements.txt file 2022-02-26 13:01:11 +01:00
Geoffrey Frogeye 29cf72ae92 Fix most of the README being bold
Why did I go with this Markdown generator again?
2021-08-28 20:58:34 +02:00
Geoffrey Frogeye 998c3faf8f
Add SAS.com 2021-08-22 18:02:37 +02:00
Geoffrey Frogeye c8a14a4e21
Add DataUnlocker 2021-08-22 17:07:25 +02:00
Geoffrey Frogeye 1ec26e7f96
Add Plausible.io 2021-08-22 16:53:58 +02:00
Geoffrey Frogeye 5b49441bc0 Add Branch.io tracker 2021-08-22 16:37:31 +02:00
Geoffrey Frogeye afd122f2ab
Update usage recommendations 2021-08-15 13:04:55 +02:00
Geoffrey Frogeye 6ae3d5fb55
Add Lead Forensics tracker 2021-08-15 11:39:37 +02:00
Geoffrey Frogeye 10a505d84f
Add Fathom 2021-08-15 11:18:35 +02:00
Geoffrey Frogeye c06648da53
Added Pardot tracker 2021-08-15 11:06:53 +02:00
Geoffrey Frogeye f165e5a094
Fix (most) mypy / flake8 errors 2021-08-14 23:35:51 +02:00
Geoffrey Frogeye 3dcccad39a
Black pass 2021-08-14 23:27:28 +02:00
Geoffrey Frogeye a023dc8322
Fix deprecated np.bool 2021-08-14 23:21:03 +02:00
Geoffrey Frogeye 389e83d492
Fix database maximum cache size cap 2021-08-14 23:19:12 +02:00
Geoffrey Frogeye edf444cc28
Add ad-cloud.jp and improve names of Japanese trackers
Closes #19

Names from https://github.com/AdguardTeam/cname-trackers/issues/1
2021-08-14 22:55:58 +02:00
Geoffrey Frogeye fa23d466d2
Actually remove ThreatMetrix
Forgot -i when grepping
2021-08-14 21:55:44 +02:00
Geoffrey Frogeye f5f9f88c42
Remove ThreatMetrix
I received a lot of false positives for this one,
and while I wasn't able to reproduce the issue in most of the cases,
I trust the community.
It's also not in any other CNAME tracker list, probably for the same reason.
Plus, it's apparently not very nasty.
So I'll let it go.

Closes #17
2021-08-14 21:24:48 +02:00
Geoffrey Frogeye 2997e41f98
Investigated >0.5% trackers from Fukuda paper 2020-12-19 13:41:07 +01:00
Geoffrey Frogeye 6cf1028174
Added other tracking source for Adobe
Found on the Adobe documentation and in the wild
https://experienceleague.adobe.com/docs/analytics/implementation/vars/config-vars/trackingserversecure.html?lang=en#s.trackingserversecure-in-appmeasurement-and-launch-custom-code-editor
2020-12-19 13:15:38 +01:00
Geoffrey Frogeye b98a37f9da
Add 1st chain Act-On
To unclobber -only lists
2020-12-07 08:27:20 +01:00
Geoffrey Frogeye 8828d4cf24
Investigated >1% trackers from Fukuda paper 2020-12-07 00:03:58 +01:00
Geoffrey Frogeye 04205dd9fc
Add AdGuard in the distribution README 2020-12-06 23:18:27 +01:00
Geoffrey Frogeye cec96b7e50
Add Fukuda & co research paper to test suite 2020-12-06 22:13:05 +01:00
Geoffrey Frogeye eb1fcefd49
Use more correct terms 2020-12-06 21:29:48 +01:00
Geoffrey Frogeye 0ecb431728 Add AdGuard for multiparty 2020-12-06 21:01:24 +01:00
Geoffrey Frogeye c1619b3cff Add more sources and acknowledgement 2020-12-06 21:01:20 +01:00
Geoffrey Frogeye 2c0286e36b
Add genieesspv.jp CNAME tracker
Closes #18
2020-08-22 10:46:43 +02:00
Geoffrey Frogeye 954bc86eaa
More Tracedock domains
From https://gist.github.com/pietvanzoen/ed7b8322a552542bc00a83ced7332d33
2020-08-08 09:14:09 +02:00
Geoffrey Frogeye b09f861c27
README: Added more reasons the browsers trust first party 2020-01-11 13:01:51 +01:00
Geoffrey Frogeye 9326dc6aca
Added similar projects 2020-01-11 11:43:14 +01:00
Geoffrey Frogeye c803a714fa
I don't know how to write the word “explanation”... 2020-01-11 11:31:16 +01:00
Geoffrey Frogeye b3a3219f93
Improved usage scenarios for different lists 2020-01-11 11:26:54 +01:00
Geoffrey Frogeye fbc06f71bb
Added symlink to latest explaination 2020-01-07 14:37:01 +01:00
Geoffrey Frogeye 63ab7651fc
Disabled RDNS import due to #15 2020-01-07 14:17:38 +01:00
Geoffrey Frogeye 0724feed26
README: Removed help message and fixed category for finder 2020-01-06 16:44:45 +01:00
Geoffrey Frogeye adb07417f5
Fixed import_rapid7 script typo 2020-01-05 22:35:12 +01:00
Geoffrey Frogeye 0cc18303fd
Re-import Rapid7 datasets when rules have been updated 2020-01-04 10:54:46 +01:00
Geoffrey Frogeye 708c53041e
Added two japanese trackers 2020-01-03 22:09:16 +01:00
Geoffrey Frogeye 808e36dde3
Improvements to subdomain collection
I use this for tracker identification so it's not perfect but still it's
a bit better.
2020-01-03 22:08:06 +01:00
Geoffrey Frogeye 2b97ee4cb9
Better list output 2019-12-27 21:46:57 +01:00
Geoffrey Frogeye fd8bfee088
Improved -only variants descriptions 2019-12-27 15:58:20 +01:00
Geoffrey Frogeye e93807142c
Explanations folder 2019-12-27 15:35:30 +01:00
Geoffrey Frogeye a4a908955a
Added index webpage 2019-12-27 15:21:33 +01:00
Geoffrey Frogeye 7e06e98808
Added TraceDock FP tracker
Thought they did change the URL of their load balancers,
guess I was wrong.
2019-12-27 13:43:38 +01:00
Geoffrey Frogeye 4fca68c6f0
Fixed handling of unknown field error 2019-12-27 01:10:21 +01:00
Geoffrey Frogeye 54a9c78534
Handled another error 2019-12-26 20:38:35 +01:00
Geoffrey Frogeye 171fa93873
Force pv output
Even if redirected to a file
Allows seeing progress when run in a cron job or similar
2019-12-26 15:38:56 +01:00
Geoffrey Frogeye 095e51fad9
Ensure massdns output is lower case
For some reason some servers output part of their response in upper case.
This fails the reading process as it's designed to only work on lower
case for performance reasons.
2019-12-26 15:32:24 +01:00
Geoffrey Frogeye 883942ba55
Allow custom massdns path 2019-12-26 00:33:23 +01:00
Geoffrey Frogeye d3b244f317
Forgot one dependency 2019-12-26 00:16:18 +01:00
Geoffrey Frogeye 018f6548ea
Fixed feed_dns not saving in single-threaded mode
Would you believe it, seven hours of processing for nothing
2019-12-26 00:02:01 +01:00
Geoffrey Frogeye 0b9e2d0975
Validate also lowercases domains 2019-12-25 15:31:20 +01:00
Geoffrey Frogeye 2bcf6cbbf7
Added SINGLE_PROCESS environment variable 2019-12-25 15:15:49 +01:00
Geoffrey Frogeye b310ca2fc2
Clever pruning mechanism 2019-12-25 14:54:57 +01:00
Geoffrey Frogeye bb9e6de62f
Profiling is now optional 2019-12-25 13:52:19 +01:00
Geoffrey Frogeye c543e0eab6
Make multi-processing optional for feed_dns 2019-12-25 13:04:15 +01:00
Geoffrey Frogeye 195f41bd9f
Use smaller cache if it cannot allocate 2019-12-25 13:03:55 +01:00
Geoffrey Frogeye 0e7479e23e
Added handling for IPs too big 2019-12-25 12:35:06 +01:00
Geoffrey Frogeye 9f343ed296
Removed debug print 2019-12-24 15:12:38 +01:00
Geoffrey Frogeye c65ae94892
Added ability to use Rapid7 API
Closes #11
2019-12-24 15:08:18 +01:00
Geoffrey Frogeye 7d1c1a1d54
Implement pruning 2019-12-21 19:38:20 +01:00
Geoffrey Frogeye 1a6e64da3d
Forgot numpy dependency 2019-12-20 21:08:21 +01:00
Geoffrey Frogeye d66040a7b6
Added some literature
Well, not really literature in the scientific sense, but still something
to read
2019-12-20 18:22:15 +01:00
Geoffrey Frogeye 57e2919f25
Added information about CORS security issue 2019-12-20 17:58:53 +01:00
Geoffrey Frogeye 94acd106da
Acknwoledgments
Gesundheit
2019-12-20 17:46:24 +01:00
Geoffrey Frogeye 885d92dd77
Added LICENSE 2019-12-20 17:38:26 +01:00
Geoffrey Frogeye 8b7e538677
Updated links
(could not bother guessing them)
2019-12-20 17:24:05 +01:00
Geoffrey Frogeye cd46b39756
Merge branch 'newworkflow' 2019-12-20 17:18:42 +01:00
Geoffrey Frogeye 38cf532854
Updated README
Split in two actually (program and list).

Closes #3

Also,
Closes #1
Because I forgot to do it earlier.
2019-12-20 17:15:39 +01:00
Geoffrey Frogeye 53b14c6ffa
Removed TODO placeholders in commands description
It's better than nothing but not by that much
2019-12-19 08:07:01 +01:00
Geoffrey Frogeye c81be4825c
Automated tests
Very rudimentary but should do the trick

Closes #4
2019-12-18 22:46:00 +01:00
Geoffrey Frogeye 4a22054796
Added optional cache for faster IP matching 2019-12-18 21:40:24 +01:00
Geoffrey Frogeye 06b745890c
Added other first-party trackers 2019-12-18 17:03:05 +01:00
Geoffrey Frogeye aca5023c3f
Fixed scripting around 2019-12-18 13:01:32 +01:00
Geoffrey Frogeye dce35cb299
Harder verification before adding entries to DB 2019-12-17 19:53:05 +01:00
Geoffrey Frogeye 747fe46ad0
Script to automatically download from Rapid7 datasets 2019-12-17 15:04:19 +01:00
Geoffrey Frogeye b43cb1725c
Autosave
Not needed, but since the import may take multiple hours I get frustrated
if this gets interrupted for some reason.
2019-12-17 15:02:42 +01:00
Geoffrey Frogeye f5c60c482a Merge branch 'master' of git.frogeye.fr:geoffrey/eulaurarien 2019-12-17 14:28:38 +01:00
Geoffrey Frogeye 12ecfa1a5d Added outdated documentation warning in README 2019-12-17 14:28:23 +01:00
Geoffrey Frogeye e882e09b37
Added outdated documentation warning in README 2019-12-17 14:27:43 +01:00
Geoffrey Frogeye d65107f849
Save duplicates too
Maybe I won't publish them but this will help me for tracking trackers.
2019-12-17 14:10:41 +01:00
Geoffrey Frogeye ea0855bd00
Forgot to push this little guy
Good thing I cleaned up my working directory.
It only exists because pickles created from database.py itself
won't be openable from a file simply importing database.py.
So we create it when in 'imported state'.
2019-12-17 13:50:39 +01:00
Geoffrey Frogeye 7851b038f5
Reworked rule export 2019-12-17 13:30:24 +01:00
Geoffrey Frogeye 8f6e01c857
Added first_party tracking
Well, tracking if a rule is from a first or a multi rule...
Hope I did not make any mistakes
2019-12-16 19:09:02 +01:00
Geoffrey Frogeye c3bf102289
Made references work 2019-12-16 14:18:03 +01:00
Geoffrey Frogeye 03a4042238
Added level
Also fixed IP logic because this was really messed up
2019-12-16 09:31:29 +01:00
Geoffrey Frogeye 3197fa1663
Remove list usage for IpTreeNode 2019-12-16 06:54:18 +01:00
Geoffrey Frogeye a0e68f0848
Reworked match and node system
For level, and first_party later
Next: add get_match to retrieve level of source and have correct levels

... am I going somewhere with all this?
2019-12-15 23:13:25 +01:00
Geoffrey Frogeye aec8d3f8de
Reworked how paths work
Get those tuples out of my eyes
2019-12-15 22:21:05 +01:00
Geoffrey Frogeye 7af2074c7a
Small optimisation of feed_switch 2019-12-15 17:12:44 +01:00
Geoffrey Frogeye 45325782d2
Multi-processed parser 2019-12-15 17:05:41 +01:00
Geoffrey Frogeye ce52897d30
Smol fixes 2019-12-15 16:48:17 +01:00
Geoffrey Frogeye 954b33b2a6
Slightly better Rapid7 parser 2019-12-15 16:38:01 +01:00
Geoffrey Frogeye d976752797
Store Ip4Path as int instead of List[int] 2019-12-15 16:26:18 +01:00
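The space saving behind this commit can be sketched as follows (a minimal illustration of the idea; the actual `Ip4Path` internals in the repository may differ):

```python
def ip4_to_int(ip: str) -> int:
    # Pack a dotted-quad IPv4 address into one integer,
    # the idea behind storing Ip4Path as an int rather than a List[int].
    result = 0
    for part in ip.split("."):
        result = (result << 8) | int(part)
    return result
```

A single int is both smaller and cheaper to compare than a four-element list, which matters when millions of addresses are kept in memory.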
Geoffrey Frogeye 4d966371b2
Workflow: SQL -> Tree
Welp. All that for this.
2019-12-15 15:56:26 +01:00
Geoffrey Frogeye 040ce4c14e
Typo in source 2019-12-15 01:52:45 +01:00
Geoffrey Frogeye b50c01f740 Merge branch 'master' into newworkflow 2019-12-15 01:30:03 +01:00
Geoffrey Frogeye ddceed3d25
Workflow: Can now import MassDNS output
Well, in a specific format, but MassDNS nonetheless
2019-12-15 00:28:08 +01:00
Geoffrey Frogeye 189deeb559
Workflow: Multiprocess
Still trying.
It's better than multithread though.

Merge branch 'newworkflow' into newworkflow_threaded
2019-12-14 17:27:46 +01:00
Geoffrey Frogeye d7c239a6f6 Workflow: Some modifications 2019-12-14 16:04:19 +01:00
Geoffrey Frogeye 5023b85d7c
Added intermediate representation for DNS datasets
It's just CSV.
The DNS records in the datasets are not ordered consistently,
so we need to parse them completely.
It seems that converting to an IR before sending data to ./feed_dns.py
through a pipe is faster than decoding the JSON in ./feed_dns.py.
This will also reduce the storage of the resolved subdomains by
about 15% (compressed).
2019-12-13 21:59:35 +01:00
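The JSON-to-IR conversion described above can be sketched like this (the field names `name`/`type`/`value` are assumptions for illustration; the project's actual IR columns may differ):

```python
import json

def json_record_to_ir(line: str) -> str:
    # Flatten one JSON DNS record into a CSV-like intermediate line,
    # so the consumer can split on commas instead of decoding JSON.
    record = json.loads(line)
    return ",".join((record["type"], record["name"], record["value"]))
```

Decoding JSON once at conversion time, instead of in every consumer, is what makes piping the IR into `./feed_dns.py` faster.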
Geoffrey Frogeye 269b8278b5
Workflow: Fixed rules counts 2019-12-13 18:36:08 +01:00
Geoffrey Frogeye ab7ef609dd
Workflow: Various optimisations and fixes
I forgot to close this one earlier, so:
Closes #7
2019-12-13 18:08:22 +01:00
Geoffrey Frogeye f3eedcba22
Updated now based on timestamp
Did I forget to add feed_asn.py a few commits ago?
Oh well...
2019-12-13 13:54:00 +01:00
Geoffrey Frogeye 8d94b80fd0
Integrated DNS resolving to workflow
Since the bigger datasets are only updated once a month,
this might help for quick updates.
2019-12-13 13:38:23 +01:00
Geoffrey Frogeye 231bb83667
Threaded feed_dns
Largely disappointing
2019-12-13 12:36:11 +01:00
Geoffrey Frogeye 9050a84670
Read-only mode 2019-12-13 12:35:05 +01:00
Geoffrey Frogeye e19f666331
Workflow: Automatically import IP ranges from ASN
Closes #9
2019-12-13 08:23:38 +01:00
Geoffrey Frogeye 57416b6e2c
Workflow: OOP and individual tables per type
Mostly for performance reasons.
The first to implement threading later,
the second to speed up the dichotomy,
but it doesn't seem that much better so far.
2019-12-13 00:11:21 +01:00
Geoffrey Frogeye b076fa6c34 Typo in new source URL 2019-12-12 23:28:00 +01:00
Geoffrey Frogeye 12dcafe606
Added alternate source of Eulerian CNAMES
It was requested so.
It should be temporary, once I have a bigger subdomain list
that shouldn't be required.
2019-12-12 19:13:54 +01:00
Geoffrey Frogeye 1484733a90 Workflow: Small tweaks 2019-12-09 18:21:08 +01:00
Geoffrey Frogeye 55877be891
IP parsing C accelerated, use bytes everywhere 2019-12-09 09:47:48 +01:00
Geoffrey Frogeye 7937496882
Workflow: Base for new one
Until I've automated this, you'll need to download the A set from
https://opendata.rapid7.com/sonar.fdns_v2/ to the file a.json.gz.
2019-12-09 08:12:48 +01:00
Geoffrey Frogeye 62e6c9005b
Tracker: intendmedia? 2019-12-08 01:32:49 +01:00
Geoffrey Frogeye dc44dea505
Optimized IP matching 2019-12-08 01:23:36 +01:00
Geoffrey Frogeye b634ae5bbd
Updated IP ranges for Criteo 2019-12-07 23:23:39 +01:00
Geoffrey Frogeye 16f8bed887
Tracker: Otto Group 2019-12-07 21:30:15 +01:00
Geoffrey Frogeye d6df0fd4f9
Tracker: Webtrekk 2019-12-07 21:21:33 +01:00
Geoffrey Frogeye 4dd3d4a64b
Preliminary structure for testing
In preparation of #4
2019-12-07 19:19:37 +01:00
Geoffrey Frogeye ae71d6b204 Tracker: 2o7 2019-12-07 19:17:18 +01:00
Geoffrey Frogeye 2b0a723c30
Fix log in scripts
Closes #8
2019-12-07 18:45:48 +01:00
Geoffrey Frogeye 0b2eb000c3
FP: ThreatMetrix 2019-12-07 18:23:11 +01:00
Geoffrey Frogeye cbb0cc6f3b Rules lists are optional 2019-12-07 18:22:20 +01:00
Geoffrey Frogeye a5e768fe00
Filtering by IP range
Closes #5
2019-12-07 13:56:04 +01:00
Geoffrey Frogeye 28e33dcc7a
Fixed description generation 2019-12-05 20:51:53 +01:00
Geoffrey Frogeye 95d4535abd
Nitpicking 2019-12-05 19:38:26 +01:00
Geoffrey Frogeye 025370bbbe
Splitted list with curated and not curated
Closes #2
2019-12-05 19:15:24 +01:00
Geoffrey Frogeye 1c20963ffd
Removed third-parties from easyprivacy 2019-12-05 01:19:10 +01:00
Geoffrey Frogeye 188a8f7455
Removed another source of false-positives 2019-12-05 00:50:32 +01:00
Geoffrey Frogeye f2bab3ca3f Added contact information 2019-12-03 21:45:29 +01:00
Geoffrey Frogeye 08f25e26ba
Removed false-positive source
Also had edgekey.net for blocking.

Thanks @TorchedPoseidon for the report!
2019-12-03 21:27:37 +01:00
Geoffrey Frogeye 8c744d621e
Removed too restrictive source
Was blocking ssl.ovh.net and akaimi.net
2019-12-03 18:43:23 +01:00
Geoffrey Frogeye fe5f0c6c05
Added more rule sources 2019-12-03 17:33:46 +01:00
Geoffrey Frogeye 0159c6037c
Improved DNS resolving performances
Also various fixes.
Also some debug stuff, make sure to remove that later.
2019-12-03 15:35:21 +01:00
Geoffrey Frogeye c609b90390 Append top 1M subdomains rather than replacing it 2019-12-03 09:04:19 +01:00
Geoffrey Frogeye 69b82d29fd
Improved rules handling
Rules can now come in 3 different formats:
- AdBlock rules
- Host lists
- Domains lists
All will be converted into domain lists and aggregated
(only AdBlock rules matching a whole domain will be kept).

Subdomains will now be matched if it is a subdomain of any domain of the
rule.
It is way faster (seconds rather than hours!) but less flexible
(although it shouldn't be a problem).
2019-12-03 08:48:12 +01:00
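The "only AdBlock rules matching a whole domain will be kept" behavior can be sketched as follows (an illustrative check only; the project's actual parsing uses python-abp):

```python
from typing import Optional

def adblock_whole_domain(rule: str) -> Optional[str]:
    # Return the domain if the AdBlock rule matches a whole domain
    # (the ||example.com^ form), else None. Rules with options, paths
    # or wildcards cannot be reduced to a plain domain and are dropped.
    rule = rule.strip()
    if rule.startswith("||") and rule.endswith("^") and "$" not in rule:
        candidate = rule[2:-1]
        if all(c not in candidate for c in "/*^"):
            return candidate
    return None
```

Reducing every rule format to a flat domain list is what enables the subdomain-suffix matching that runs in seconds rather than hours.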
Geoffrey Frogeye c23004fbff
Separated DNS resolution from filtering
This effectively removes the parallelism of filtering,
which doubles the processing time (5->8 hours),
but this allows me to toy around with the performances of this step,
which I aim to improve drastically.
2019-12-02 19:03:08 +01:00
Geoffrey Frogeye 7d01d016a5 Can now use AdBlock lists for tracking matching
It's not very performant by itself, especially since pyre2 isn't
maintained nor really compilable/installable anymore.

The performance seems to have decreased from 200 req/s to 0.2 req/s when
using 512 threads, and to 80 req/s using 64 threads.
This might or might not be related, as the CPU doesn't seem to be the
bottleneck.

I will probably add support for host-based rules, matching the
subdomains of such hosts (as for now there doesn't seem to be any other
pattern for first-party trackers than subdomains, and this would be a
very broad performance / compatibility with existing lists improvement),
and convert the AdBlock lists to this format, only keeping domains-only
rules.
2019-11-15 08:57:31 +01:00
Geoffrey Frogeye 87bb24c511 Shell typo 2019-11-14 15:40:25 +01:00
Geoffrey Frogeye 300fe8e15e Added real argument parser
Just so we can have color output when running the script :)
2019-11-14 15:37:32 +01:00
Geoffrey Frogeye 88f0bcc648 Refactored for correct retry logic 2019-11-14 15:03:20 +01:00
Geoffrey Frogeye b343893c72 Merge branch 'master' of git.frogeye.fr:geoffrey/eulaurarien 2019-11-14 13:45:42 +01:00
Geoffrey Frogeye ae93593930 Statistics about explicit first-parties 2019-11-14 13:31:39 +01:00
Geoffrey Frogeye bdc691e647 Upped timeout 2019-11-14 13:10:14 +01:00
Geoffrey Frogeye 08a8eaaada Use threads not subprocesses
You dumbo
2019-11-14 12:57:06 +01:00
Geoffrey Frogeye 32377229db Retry failed requests 2019-11-14 11:35:05 +01:00
Geoffrey Frogeye 04fe454d99 Automatically get top 1M subdomains 2019-11-14 11:23:59 +01:00
Geoffrey Frogeye 7df00fc859 Automatically download nameserver list 2019-11-14 10:56:53 +01:00
Geoffrey Frogeye 1bbc17a8ec Greatly optimized subdomain filtering 2019-11-14 10:45:06 +01:00
Geoffrey Frogeye 00a0020914 Added some delay for websites subdomains collecting
Some websites load their trackers after the page is done loading.
2019-11-14 06:29:24 +01:00
Geoffrey Frogeye 56374e3223 Added RED by SFR website 2019-11-13 18:14:56 +01:00
Geoffrey Frogeye b17a24c047 Added more trackers and their clients 2019-11-12 13:58:17 +01:00
Geoffrey Frogeye 1c86255bb9 Added list of websites containing EA_data 2019-11-11 15:44:03 +01:00
Geoffrey Frogeye 7a7a3642a5 Added number of trackers in output 2019-11-11 13:00:14 +01:00
Geoffrey Frogeye 4e69bdbfc3 CI Test commit 2 2019-11-11 12:41:22 +01:00
Geoffrey Frogeye aab8e93abe CI Test commit 1 2019-11-11 12:31:32 +01:00
Geoffrey Frogeye e0f28d41d2 Added public updated list link 2019-11-11 12:10:46 +01:00
Geoffrey Frogeye a0a2af281f Added possibility to add personal sources 2019-11-11 11:19:46 +01:00
Geoffrey Frogeye 333ae4eb66 Fixed tracker list 2019-11-10 23:58:49 +01:00
Geoffrey Frogeye 0df749f1e0 Added more trackers 2019-11-10 23:29:30 +01:00
Geoffrey Frogeye b81c7c17ee Loosely error-proofed subdomain collection 2019-11-10 23:22:21 +01:00
Geoffrey Frogeye ed72f643fd Updated website list 2019-11-10 23:16:18 +01:00
Geoffrey Frogeye c409c2cf9b More error-proofing 2019-11-10 23:07:21 +01:00
Geoffrey Frogeye 0801bd9e44 Error-proofed DNS-resolution 2019-11-10 22:18:27 +01:00
Geoffrey Frogeye 2f1af3c850 Added progressbar and ETA 2019-11-10 21:59:06 +01:00
Geoffrey Frogeye d49a7803e9 Fixed typos 2019-11-10 18:29:16 +01:00
54 changed files with 3182 additions and 164 deletions

5
.env.default Normal file

@@ -0,0 +1,5 @@
CACHE_SIZE=536870912
MASSDNS_HASHMAP_SIZE=1000
PROFILE=0
SINGLE_PROCESS=0
MASSDNS_BINARY=massdns

6
.gitignore vendored

@@ -1,3 +1,5 @@
*.list
!websites.list
*.log
*.p
.env
__pycache__
explanations

21
LICENSE Normal file

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2019 Geoffrey 'Frogeye' Preud'homme
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

176
README.md

@@ -1,54 +1,162 @@
# eulaurarien
Generates a host list of first-party trackers for ad-blocking.
This program is able to generate a list of every hostname that is a DNS redirection to a list of DNS zones and IP networks.
**DISCLAIMER:** I'm by no means an expert on this subject so my vocabulary or other stuff might be wrong. Use at your own risk.
It is primarily used to generate [Geoffrey Frogeye's block list of first-party trackers](https://hostfiles.frogeye.fr) (learn about first-party trackers by following this link).
## What's a first-party tracker?
If you want to contribute but don't want to create an account on this forge, contact me the way you like: <https://geoffrey.frogeye.fr>
Traditionally, websites load trackers scripts directly.
For example, `website1.com` and `website2.com` both load `https://trackercompany.com/trackerscript.js` to track their users.
In order to block those, one can simply block the host `trackercompany.com`.
## How does this work
However, to circumvent this easy block, tracker companies made the websites using them load trackers from `somethingirelevant.website1.com`.
The latter being a DNS redirection to `website1.trackercompany.com`, directly pointing to a server serving the tracking script.
Those are the first-party trackers.
This program takes as input:
Blocking `trackercompany.com` doesn't work any more, and blocking `*.trackercompany.com` isn't really possible since:
- Lists of hostnames to match
- Lists of DNS zones to match (a domain and its subdomains)
- Lists of IP address / IP networks to match
- Lists of Autonomous System numbers to match
- An enormous quantity of DNS records
1. Most ad-blockers don't support wildcards
2. It's a DNS redirection, meaning that most ad-blockers will only see `somethingirelevant.website1.com`
It will be able to output the hostnames that are DNS redirections to any item in the provided lists.
So the only solution is to block every known `somethingirelevant.website1.com`-like subdomain, which is a lot.
That's where this script comes in, to generate a list of such subdomains.
DNS records can be locally resolved from a list of subdomains using [MassDNS](https://github.com/blechschmidt/massdns).
## How does this script work
Those subdomains can either be provided as is, come from [Cisco Umbrella Popularity List](http://s3-us-west-1.amazonaws.com/umbrella-static/index.html), from your browsing history, or from analyzing the traffic a web browser makes when opening an URL (the program provides utility to do all that).
It takes as input a list of websites with trackers included.
So far, this list is manually generated from the list of clients of such first-party trackers
(later we should use a general list of websites to be more exhaustive).
## Usage
It opens each one of those websites (just the homepage) in a web browser, and records the domains of the network requests the page makes.
It then finds the DNS redirections of those domains, and compares them with regexes of known tracking domains.
It finally outputs the matching ones.
Remember you can get an already generated and up-to-date list of first-party trackers from [here](https://hostfiles.frogeye.fr).
## Requirements
The following is for the people wanting to build their own list.
Just to build the list, you can find an already-built list in the releases.
### Requirements
- Bash
- Python 3.4+
- Firefox
- Selenium
- seleniumwire
- dnspython
Depending on the sources you'll be using to generate the list, you'll need to install some of the following:
## Contributing
- [Bash](https://www.gnu.org/software/bash/bash.html)
- [Coreutils](https://www.gnu.org/software/coreutils/)
- [Gawk](https://www.gnu.org/software/gawk/)
- [curl](https://curl.haxx.se)
- [pv](http://www.ivarch.com/programs/pv.shtml)
- [Python 3.4+](https://www.python.org/)
- [coloredlogs](https://pypi.org/project/coloredlogs/) (sorry I can't help myself)
- [numpy](https://www.numpy.org/)
- [python-abp](https://pypi.org/project/python-abp/) (only if you intend to use AdBlock rules as a rule source)
- [massdns](https://github.com/blechschmidt/massdns) in your `$PATH` (only if you have subdomains as a source)
- [Firefox](https://www.mozilla.org/firefox/) (only if you have websites as a source)
- [selenium (Python bindings)](https://pypi.python.org/pypi/selenium) (only if you have websites as a source)
- [selenium-wire](https://pypi.org/project/selenium-wire/) (only if you have websites as a source)
- [markdown2](https://pypi.org/project/markdown2/) (only if you intend to generate the index webpage)
### Adding websites
### Create a new database
Just add them to `websites.list`.
The so-called database (in the form of `blocking.p`) is a file storing all the matching entities (ASN, IPs, hostnames, zones…) and every entity leading to them.
It exists because the list cannot be generated in one pass, as the links of DNS redirection chains do not have to be input in order.
### Adding first-party trackers regex
You can purge old records from the database by running `./prune.sh`.
When you remove a source of data, remove its corresponding file in `last_updates` to fix the pruning process.
Just add them to `regexes.py`.
### Gather external sources
External sources are not stored in this repository.
You'll need to fetch them by running `./fetch_resources.sh`.
Those include:
- Third-party trackers lists
- TLD lists (used to test the validity of hostnames)
- List of public DNS resolvers (for DNS resolving from subdomains)
- Top 1M subdomains
### Import rules into the database
You need to put the lists of rules for matching in the different subfolders:
- `rules`: Lists of DNS zones
- `rules_ip`: Lists of IP networks (for IP addresses append `/32`)
- `rules_asn`: Lists of Autonomous System numbers (IP ranges will be deduced from them)
- `rules_adblock`: Lists of DNS zones, but in the form of AdBlock lists (only the ones concerning domains will be extracted)
- `rules_hosts`: Lists of DNS zones, but in the form of hosts lists
See the provided examples for syntax.
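For illustration, an entry of each kind could look like this (file names and values are hypothetical):

```
rules/example.list:        trackercompany.com
rules_ip/example.list:     203.0.113.0/24
rules_asn/example.list:    AS64496
rules_adblock/example.txt: ||trackercompany.com^
rules_hosts/example.txt:   0.0.0.0 trackercompany.com
```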
In each folder:
- `first-party.ext` will be the only files considered for the first-party variant of the list
- `*.cache.ext` are from external sources, and thus might be deleted / overwritten
- `*.custom.ext` are for sources that you don't want committed
Then, run `./import_rules.sh`.
If you removed rules and you want to remove every record depending on those rules immediately,
run the following command:
```
./db.py --prune --prune-before "$(cat "last_updates/rules.txt")" --prune-base
```
### Add subdomains
If you plan to resolve DNS records yourself (as the DNS records datasets are not exhaustive),
the top 1M subdomains provided might not be enough.
You can add them into the `subdomains` folder.
It follows the same conventions as the rules folders for `*.cache.ext` and `*.custom.ext` files.
#### Add personal sources
Adding your own browsing history will help create a more suited subdomains list.
Here's reference command for possible sources:
- **Pi-hole**: `sqlite3 /etc/pihole-FTL.db "select distinct domain from queries" > /path/to/eulaurarien/subdomains/my-pihole.custom.list`
- **Firefox**: `cp ~/.mozilla/firefox/<your_profile>.default/places.sqlite temp; sqlite3 temp "select distinct rev_host from moz_places" | rev | sed 's|^\.||' > /path/to/eulaurarien/subdomains/my-firefox.custom.list; rm temp`
#### Collect subdomains from websites
You can add the websites URLs into the `websites` folder.
It follows the same conventions as the rules folders for `*.cache.ext` and `*.custom.ext` files.
Then, run `collect_subdomain.sh`.
This is a long step, and might be memory-intensive from time to time.
> **Note:** For first-party tracking, a list of subdomains issued from the websites in the repository is available here: <https://hostfiles.frogeye.fr/from_websites.cache.list>
### Resolve DNS records
Once you've added subdomains, you'll need to resolve them to get their DNS records.
The program will use a list of public nameservers to do that, but you can add your own in the `nameservers` directory.
Then, run `./resolve_subdomains.sh`.
Note that this is a network-intensive process, not in terms of bandwidth, but in terms of packet count.
> **Note:** Some VPS providers might detect this as a DDoS attack and cut the network access.
> Some Wi-Fi connections can be rendered unusable for other uses, some routers might cease to work.
> Since massdns does not yet support rate limiting, my best bet was a Raspberry Pi with a slow Ethernet link (Raspberry Pi < 4).
The DNS records will automatically be imported into the database.
If you want to re-import the records without re-doing the resolving, just run the last line of the `./resolve_subdomains.sh` script.
### Export the lists
For the tracking list, use `./export_lists.sh`, the output will be in the `dist` folder (please change the links before distributing them).
For other purposes, tinker with the `./export.py` program.
#### Explanations
Note that if you created an `explanations` folder at the root of the project, a file with a timestamp will be created in it.
It contains every rule in the database and the reason for their presence (i.e. their dependency).
This might be useful to track changes between runs.
Every rule has an associated tag with four components:
1. A number: the level of the rule (1 if it is a rule present in the `rules*` folders)
2. A letter: `F` if first-party, `M` if multi-party.
3. A letter: `D` if a duplicate (e.g. `foo.bar.com` if `*.bar.com` is already a rule), `_` if not.
4. A number: the number of rules relying on this one
### Generate the index webpage
This is the one served on <https://hostfiles.frogeye.fr>.
Just run `./generate_index.py`.
### Everything
Once you've made sure every step runs fine, you can use `./eulaurarien.sh` to run every step consecutively.

59
adblock_to_domain_list.py Executable file

@@ -0,0 +1,59 @@
#!/usr/bin/env python3
# pylint: disable=C0103
"""
Extract the domains to block as a whole
from a AdBlock rules list.
"""
import argparse
import sys
import typing
import abp.filters
def get_domains(rule: abp.filters.parser.Filter) -> typing.Iterable[str]:
if rule.options:
return
selector_type = rule.selector["type"]
selector_value = rule.selector["value"]
if (
selector_type == "url-pattern"
and selector_value.startswith("||")
and selector_value.endswith("^")
):
yield selector_value[2:-1]
if __name__ == "__main__":
# Parsing arguments
parser = argparse.ArgumentParser(
description="Extract whole domains from an AdBlock blocking list"
)
parser.add_argument(
"-i",
"--input",
type=argparse.FileType("r"),
default=sys.stdin,
help="Input file with AdBlock rules",
)
parser.add_argument(
"-o",
"--output",
type=argparse.FileType("w"),
default=sys.stdout,
help="Output file with one tracking subdomain per line",
)
args = parser.parse_args()
# Reading rules
rules = abp.filters.parse_filterlist(args.input)
# Filtering
for rule in rules:
if not isinstance(rule, abp.filters.parser.Filter):
continue
for domain in get_domains(rule):
print(domain, file=args.output)
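The `||domain^` convention handled by `get_domains` above can be sketched standalone (simplified: this skips the `abp.filters` option and selector-type checks):

```python
import typing

def domain_from_abp_rule(rule_text: str) -> typing.Optional[str]:
    # '||tracker.example.com^' blocks a whole domain; anything else
    # (path filters, element hiding, ...) is ignored here.
    if rule_text.startswith("||") and rule_text.endswith("^"):
        return rule_text[2:-1]
    return None

print(domain_from_abp_rule("||tracker.example.com^"))  # tracker.example.com
print(domain_from_abp_rule("/ads/banner.png"))         # None
```

`domain_from_abp_rule` is a hypothetical name for illustration; the real script relies on `abp.filters` to parse the rule first.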

@@ -1,4 +1,5 @@
#!/usr/bin/env python3
# pylint: disable=C0103
"""
From a list of URLs, output the subdomains
@@ -8,9 +9,33 @@ accessed by the websites.
import sys
import typing
import urllib.parse
import time
import progressbar
import selenium.webdriver.firefox.options
import seleniumwire.webdriver
import logging
log = logging.getLogger("cs")
DRIVER = None
SCROLL_TIME = 10.0
SCROLL_STEPS = 100
SCROLL_CMD = f"window.scrollBy(0,document.body.scrollHeight/{SCROLL_STEPS})"
def new_driver() -> seleniumwire.webdriver.browser.Firefox:
profile = selenium.webdriver.FirefoxProfile()
profile.set_preference("privacy.trackingprotection.enabled", False)
profile.set_preference("network.cookie.cookieBehavior", 0)
profile.set_preference("privacy.trackingprotection.pbmode.enabled", False)
profile.set_preference("privacy.trackingprotection.cryptomining.enabled", False)
profile.set_preference("privacy.trackingprotection.fingerprinting.enabled", False)
options = selenium.webdriver.firefox.options.Options()
# options.add_argument('-headless')
driver = seleniumwire.webdriver.Firefox(
profile, executable_path="geckodriver", options=options
)
return driver
def subdomain_from_url(url: str) -> str:
@@ -26,22 +51,47 @@ def collect_subdomains(url: str) -> typing.Iterable[str]:
Load a URL into a headless browser and return all the domains
it tried to access.
"""
options = selenium.webdriver.firefox.options.Options()
options.add_argument('-headless')
driver = seleniumwire.webdriver.Firefox(
executable_path='geckodriver', options=options)
global DRIVER
if not DRIVER:
DRIVER = new_driver()
driver.get(url)
for request in driver.requests:
if request.response:
yield subdomain_from_url(request.path)
driver.close()
try:
DRIVER.get(url)
for s in range(SCROLL_STEPS):
DRIVER.execute_script(SCROLL_CMD)
time.sleep(SCROLL_TIME / SCROLL_STEPS)
for request in DRIVER.requests:
if request.response:
yield subdomain_from_url(request.path)
except Exception:
log.exception("Error")
DRIVER.quit()
DRIVER = None
if __name__ == '__main__':
for line in sys.stdin:
line = line.strip()
if not line:
continue
for subdomain in collect_subdomains(line):
print(subdomain)
def collect_subdomains_standalone(url: str) -> None:
url = url.strip()
if not url:
return
for subdomain in collect_subdomains(url):
print(subdomain)
if __name__ == "__main__":
assert len(sys.argv) <= 2
filename = None
if len(sys.argv) == 2 and sys.argv[1] != "-":
filename = sys.argv[1]
num_lines = sum(1 for line in open(filename))
iterator = progressbar.progressbar(open(filename), max_value=num_lines)
else:
iterator = sys.stdin
for line in iterator:
collect_subdomains_standalone(line)
if DRIVER:
DRIVER.quit()
if filename:
iterator.close()
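The body of `subdomain_from_url` is elided in this diff; what it needs to do can be sketched with the `urllib.parse` module already imported above (hypothetical helper name, assumptions labelled in the comments):

```python
import urllib.parse

def hostname_from_url(url: str) -> str:
    # Sketch of what subdomain_from_url needs to achieve:
    # keep only the host part of each URL the browser requested.
    # (The real, elided implementation may differ.)
    return urllib.parse.urlparse(url).netloc

print(hostname_from_url("https://a.tracker.example.com/pixel.gif?id=42"))
# a.tracker.example.com
```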

collect_subdomains.sh Executable file
@@ -0,0 +1,11 @@
#!/usr/bin/env bash
function log() {
echo -e "\033[33m$@\033[0m"
}
# Get all subdomains accessed by each website in the website list
cat websites/*.list | sort -u > temp/all_websites.list
./collect_subdomains.py temp/all_websites.list > temp/subdomains_from_websites.list
sort -u temp/subdomains_from_websites.list > subdomains/from_websites.cache.list

database.py Normal file
@@ -0,0 +1,799 @@
#!/usr/bin/env python3
"""
Utility functions to interact with the database.
"""
import typing
import time
import logging
import coloredlogs
import pickle
import numpy
import math
import os
TLD_LIST: typing.Set[str] = set()
coloredlogs.install(level="DEBUG", fmt="%(asctime)s %(name)s %(levelname)s %(message)s")
Asn = int
Timestamp = int
Level = int
class Path:
pass
class RulePath(Path):
def __str__(self) -> str:
return "(rule)"
class RuleFirstPath(RulePath):
def __str__(self) -> str:
return "(first-party rule)"
class RuleMultiPath(RulePath):
def __str__(self) -> str:
return "(multi-party rule)"
class DomainPath(Path):
def __init__(self, parts: typing.List[str]):
self.parts = parts
def __str__(self) -> str:
return "?." + Database.unpack_domain(self)
class HostnamePath(DomainPath):
def __str__(self) -> str:
return Database.unpack_domain(self)
class ZonePath(DomainPath):
def __str__(self) -> str:
return "*." + Database.unpack_domain(self)
class AsnPath(Path):
def __init__(self, asn: Asn):
self.asn = asn
def __str__(self) -> str:
return Database.unpack_asn(self)
class Ip4Path(Path):
def __init__(self, value: int, prefixlen: int):
self.value = value
self.prefixlen = prefixlen
def __str__(self) -> str:
return Database.unpack_ip4network(self)
class Match:
def __init__(self) -> None:
self.source: typing.Optional[Path] = None
self.updated: int = 0
self.dupplicate: bool = False
# Cache
self.level: int = 0
self.first_party: bool = False
self.references: int = 0
def active(self, first_party: bool = None) -> bool:
if self.updated == 0 or (first_party and not self.first_party):
return False
return True
def disable(self) -> None:
self.updated = 0
class AsnNode(Match):
def __init__(self) -> None:
Match.__init__(self)
self.name = ""
class DomainTreeNode:
def __init__(self) -> None:
self.children: typing.Dict[str, DomainTreeNode] = dict()
self.match_zone = Match()
self.match_hostname = Match()
class IpTreeNode(Match):
def __init__(self) -> None:
Match.__init__(self)
self.zero: typing.Optional[IpTreeNode] = None
self.one: typing.Optional[IpTreeNode] = None
Node = typing.Union[DomainTreeNode, IpTreeNode, AsnNode]
MatchCallable = typing.Callable[[Path, Match], typing.Any]
class Profiler:
def __init__(self) -> None:
do_profile = int(os.environ.get("PROFILE", "0"))
if do_profile:
self.log = logging.getLogger("profiler")
self.time_last = time.perf_counter()
self.time_step = "init"
self.time_dict: typing.Dict[str, float] = dict()
self.step_dict: typing.Dict[str, int] = dict()
self.enter_step = self.enter_step_real
self.profile = self.profile_real
else:
self.enter_step = self.enter_step_dummy
self.profile = self.profile_dummy
def enter_step_dummy(self, name: str) -> None:
return
def enter_step_real(self, name: str) -> None:
now = time.perf_counter()
try:
self.time_dict[self.time_step] += now - self.time_last
self.step_dict[self.time_step] += int(name != self.time_step)
except KeyError:
self.time_dict[self.time_step] = now - self.time_last
self.step_dict[self.time_step] = 1
self.time_step = name
self.time_last = time.perf_counter()
def profile_dummy(self) -> None:
return
def profile_real(self) -> None:
self.enter_step("profile")
total = sum(self.time_dict.values())
for key, secs in sorted(self.time_dict.items(), key=lambda t: t[1]):
times = self.step_dict[key]
self.log.debug(
f"{key:<20}: {times:9d} × {secs/times:5.3e} "
f"= {secs:9.2f} s ({secs/total:7.2%}) "
)
self.log.debug(
f"{'total':<20}: " f"{total:9.2f} s ({1:7.2%})"
)
class Database(Profiler):
VERSION = 18
PATH = "blocking.p"
def initialize(self) -> None:
self.log.warning("Creating database version: %d ", Database.VERSION)
# Dummy match objects that everything refer to
self.rules: typing.List[Match] = list()
for first_party in (False, True):
m = Match()
m.updated = 1
m.level = 0
m.first_party = first_party
self.rules.append(m)
self.domtree = DomainTreeNode()
self.asns: typing.Dict[Asn, AsnNode] = dict()
self.ip4tree = IpTreeNode()
def load(self) -> None:
self.enter_step("load")
try:
with open(self.PATH, "rb") as db_fdsec:
version, data = pickle.load(db_fdsec)
if version == Database.VERSION:
self.rules, self.domtree, self.asns, self.ip4tree = data
return
self.log.warning(
"Outdated database version found: %d, " "it will be rebuilt.",
version,
)
except (TypeError, AttributeError, EOFError):
self.log.error(
"Corrupt (or heavily outdated) database found, " "it will be rebuilt."
)
except FileNotFoundError:
pass
self.initialize()
def save(self) -> None:
self.enter_step("save")
with open(self.PATH, "wb") as db_fdsec:
data = self.rules, self.domtree, self.asns, self.ip4tree
pickle.dump((self.VERSION, data), db_fdsec)
self.profile()
def __init__(self) -> None:
Profiler.__init__(self)
self.log = logging.getLogger("db")
self.load()
self.ip4cache_shift: int = 32
self.ip4cache = numpy.ones(1)
def _set_ip4cache(self, path: Path, _: Match) -> None:
assert isinstance(path, Ip4Path)
self.enter_step("set_ip4cache")
mini = path.value >> self.ip4cache_shift
maxi = (path.value + 2 ** (32 - path.prefixlen)) >> self.ip4cache_shift
if mini == maxi:
self.ip4cache[mini] = True
else:
self.ip4cache[mini:maxi] = True
def fill_ip4cache(self, max_size: int = 512 * 1024 ** 2) -> None:
"""
Size in bytes
"""
if max_size > 2 ** 32 / 8:
self.log.warning(
"Allocating more than 512 MiB of RAM for "
"the Ip4 cache is not necessary."
)
max_cache_width = int(math.log2(max(1, max_size * 8)))
allocated = False
cache_width = min(32, max_cache_width)
while not allocated:
cache_size = 2 ** cache_width
try:
self.ip4cache = numpy.zeros(cache_size, dtype=bool)
except MemoryError:
self.log.exception("Could not allocate cache. Retrying a smaller one.")
cache_width -= 1
continue
allocated = True
self.ip4cache_shift = 32 - cache_width
for _ in self.exec_each_ip4(self._set_ip4cache):
pass
@staticmethod
def populate_tld_list() -> None:
with open("temp/all_tld.list", "r") as tld_fdesc:
for tld in tld_fdesc:
tld = tld.strip()
TLD_LIST.add(tld)
@staticmethod
def validate_domain(path: str) -> bool:
if len(path) > 255:
return False
splits = path.split(".")
if not TLD_LIST:
Database.populate_tld_list()
if splits[-1] not in TLD_LIST:
return False
for split in splits:
if not 1 <= len(split) <= 63:
return False
return True
@staticmethod
def pack_domain(domain: str) -> DomainPath:
return DomainPath(domain.split(".")[::-1])
@staticmethod
def unpack_domain(domain: DomainPath) -> str:
return ".".join(domain.parts[::-1])
@staticmethod
def pack_asn(asn: str) -> AsnPath:
asn = asn.upper()
if asn.startswith("AS"):
asn = asn[2:]
return AsnPath(int(asn))
@staticmethod
def unpack_asn(asn: AsnPath) -> str:
return f"AS{asn.asn}"
@staticmethod
def validate_ip4address(path: str) -> bool:
splits = path.split(".")
if len(splits) != 4:
return False
for split in splits:
try:
if not 0 <= int(split) <= 255:
return False
except ValueError:
return False
return True
@staticmethod
def pack_ip4address_low(address: str) -> int:
addr = 0
for split in address.split("."):
octet = int(split)
addr = (addr << 8) + octet
return addr
@staticmethod
def pack_ip4address(address: str) -> Ip4Path:
return Ip4Path(Database.pack_ip4address_low(address), 32)
@staticmethod
def unpack_ip4address(address: Ip4Path) -> str:
addr = address.value
assert address.prefixlen == 32
octets: typing.List[int] = list()
octets = [0] * 4
for o in reversed(range(4)):
octets[o] = addr & 0xFF
addr >>= 8
return ".".join(map(str, octets))
@staticmethod
def validate_ip4network(path: str) -> bool:
# A bit generous but ok for our usage
splits = path.split("/")
if len(splits) != 2:
return False
if not Database.validate_ip4address(splits[0]):
return False
try:
if not 0 <= int(splits[1]) <= 32:
return False
except ValueError:
return False
return True
@staticmethod
def pack_ip4network(network: str) -> Ip4Path:
address, prefixlen_str = network.split("/")
prefixlen = int(prefixlen_str)
addr = Database.pack_ip4address(address)
addr.prefixlen = prefixlen
return addr
@staticmethod
def unpack_ip4network(network: Ip4Path) -> str:
addr = network.value
octets: typing.List[int] = list()
octets = [0] * 4
for o in reversed(range(4)):
octets[o] = addr & 0xFF
addr >>= 8
return ".".join(map(str, octets)) + "/" + str(network.prefixlen)
def get_match(self, path: Path) -> Match:
if isinstance(path, RuleMultiPath):
return self.rules[0]
elif isinstance(path, RuleFirstPath):
return self.rules[1]
elif isinstance(path, AsnPath):
return self.asns[path.asn]
elif isinstance(path, DomainPath):
dicd = self.domtree
for part in path.parts:
dicd = dicd.children[part]
if isinstance(path, HostnamePath):
return dicd.match_hostname
elif isinstance(path, ZonePath):
return dicd.match_zone
else:
raise ValueError
elif isinstance(path, Ip4Path):
dici = self.ip4tree
for i in range(31, 31 - path.prefixlen, -1):
bit = (path.value >> i) & 0b1
dici_next = dici.one if bit else dici.zero
if not dici_next:
raise IndexError
dici = dici_next
return dici
else:
raise ValueError
def exec_each_asn(
self,
callback: MatchCallable,
) -> typing.Any:
for asn in self.asns:
match = self.asns[asn]
if match.active():
c = callback(
AsnPath(asn),
match,
)
try:
yield from c
except TypeError: # not iterable
pass
def exec_each_domain(
self,
callback: MatchCallable,
_dic: DomainTreeNode = None,
_par: DomainPath = None,
) -> typing.Any:
_dic = _dic or self.domtree
_par = _par or DomainPath([])
if _dic.match_hostname.active():
c = callback(
HostnamePath(_par.parts),
_dic.match_hostname,
)
try:
yield from c
except TypeError: # not iterable
pass
if _dic.match_zone.active():
c = callback(
ZonePath(_par.parts),
_dic.match_zone,
)
try:
yield from c
except TypeError: # not iterable
pass
for part in _dic.children:
dic = _dic.children[part]
yield from self.exec_each_domain(
callback, _dic=dic, _par=DomainPath(_par.parts + [part])
)
def exec_each_ip4(
self,
callback: MatchCallable,
_dic: IpTreeNode = None,
_par: Ip4Path = None,
) -> typing.Any:
_dic = _dic or self.ip4tree
_par = _par or Ip4Path(0, 0)
if _dic.active():
c = callback(
_par,
_dic,
)
try:
yield from c
except TypeError: # not iterable
pass
# 0
pref = _par.prefixlen + 1
dic = _dic.zero
if dic:
# addr0 = _par.value & (0xFFFFFFFF ^ (1 << (32-pref)))
# assert addr0 == _par.value
addr0 = _par.value
yield from self.exec_each_ip4(callback, _dic=dic, _par=Ip4Path(addr0, pref))
# 1
dic = _dic.one
if dic:
addr1 = _par.value | (1 << (32 - pref))
# assert addr1 != _par.value
yield from self.exec_each_ip4(callback, _dic=dic, _par=Ip4Path(addr1, pref))
def exec_each(
self,
callback: MatchCallable,
) -> typing.Any:
yield from self.exec_each_domain(callback)
yield from self.exec_each_ip4(callback)
yield from self.exec_each_asn(callback)
def update_references(self) -> None:
# Should be correctly calculated normally,
# keeping this just in case
def reset_references_cb(path: Path, match: Match) -> None:
match.references = 0
for _ in self.exec_each(reset_references_cb):
pass
def increment_references_cb(path: Path, match: Match) -> None:
if match.source:
source = self.get_match(match.source)
source.references += 1
for _ in self.exec_each(increment_references_cb):
pass
def _clean_deps(self) -> None:
# Disable the matches that depends on the targeted
# matches until all disabled matches reference count = 0
did_something = True
def clean_deps_cb(path: Path, match: Match) -> None:
nonlocal did_something
if not match.source:
return
source = self.get_match(match.source)
if not source.active():
self._unset_match(match)
elif match.first_party > source.first_party:
match.first_party = source.first_party
else:
return
did_something = True
while did_something:
did_something = False
self.enter_step("pass_clean_deps")
for _ in self.exec_each(clean_deps_cb):
pass
def prune(self, before: int, base_only: bool = False) -> None:
# Disable the matches targeted
def prune_cb(path: Path, match: Match) -> None:
if base_only and match.level > 1:
return
if match.updated > before:
return
self._unset_match(match)
self.log.debug("Prune: disabled %s", path)
self.enter_step("pass_prune")
for _ in self.exec_each(prune_cb):
pass
self._clean_deps()
# Remove branches with no match
# TODO
def explain(self, path: Path) -> str:
match = self.get_match(path)
string = str(path)
if isinstance(match, AsnNode):
string += f" ({match.name})"
party_char = "F" if match.first_party else "M"
dup_char = "D" if match.dupplicate else "_"
string += f" {match.level}{party_char}{dup_char}{match.references}"
if match.source:
string += f"{self.explain(match.source)}"
return string
def list_records(
self,
first_party_only: bool = False,
end_chain_only: bool = False,
no_dupplicates: bool = False,
rules_only: bool = False,
hostnames_only: bool = False,
explain: bool = False,
) -> typing.Iterable[str]:
def export_cb(path: Path, match: Match) -> typing.Iterable[str]:
if first_party_only and not match.first_party:
return
if end_chain_only and match.references > 0:
return
if no_dupplicates and match.dupplicate:
return
if rules_only and match.level > 1:
return
if hostnames_only and not isinstance(path, HostnamePath):
return
if explain:
yield self.explain(path)
else:
yield str(path)
yield from self.exec_each(export_cb)
def count_records(
self,
first_party_only: bool = False,
end_chain_only: bool = False,
no_dupplicates: bool = False,
rules_only: bool = False,
hostnames_only: bool = False,
) -> str:
memo: typing.Dict[str, int] = dict()
def count_records_cb(path: Path, match: Match) -> None:
if first_party_only and not match.first_party:
return
if end_chain_only and match.references > 0:
return
if no_dupplicates and match.dupplicate:
return
if rules_only and match.level > 1:
return
if hostnames_only and not isinstance(path, HostnamePath):
return
try:
memo[path.__class__.__name__] += 1
except KeyError:
memo[path.__class__.__name__] = 1
for _ in self.exec_each(count_records_cb):
pass
split: typing.List[str] = list()
for key, value in sorted(memo.items(), key=lambda s: s[0]):
split.append(f"{key[:-4].lower()}s: {value}")
return ", ".join(split)
def get_domain(self, domain_str: str) -> typing.Iterable[DomainPath]:
self.enter_step("get_domain_pack")
domain = self.pack_domain(domain_str)
self.enter_step("get_domain_brws")
dic = self.domtree
depth = 0
for part in domain.parts:
if dic.match_zone.active():
self.enter_step("get_domain_yield")
yield ZonePath(domain.parts[:depth])
self.enter_step("get_domain_brws")
if part not in dic.children:
return
dic = dic.children[part]
depth += 1
if dic.match_zone.active():
self.enter_step("get_domain_yield")
yield ZonePath(domain.parts)
if dic.match_hostname.active():
self.enter_step("get_domain_yield")
yield HostnamePath(domain.parts)
def get_ip4(self, ip4_str: str) -> typing.Iterable[Path]:
self.enter_step("get_ip4_pack")
ip4val = self.pack_ip4address_low(ip4_str)
self.enter_step("get_ip4_cache")
if not self.ip4cache[ip4val >> self.ip4cache_shift]:
return
self.enter_step("get_ip4_brws")
dic = self.ip4tree
for i in range(31, -1, -1):
bit = (ip4val >> i) & 0b1
if dic.active():
self.enter_step("get_ip4_yield")
yield Ip4Path(ip4val >> (i + 1) << (i + 1), 31 - i)
self.enter_step("get_ip4_brws")
next_dic = dic.one if bit else dic.zero
if next_dic is None:
return
dic = next_dic
if dic.active():
self.enter_step("get_ip4_yield")
yield Ip4Path(ip4val, 32)
def _unset_match(
self,
match: Match,
) -> None:
match.disable()
if match.source:
source_match = self.get_match(match.source)
source_match.references -= 1
def _set_match(
self,
match: Match,
updated: int,
source: Path,
source_match: Match = None,
dupplicate: bool = False,
) -> None:
# source_match is in parameters because most of the time
# its parent function needs it too,
# so it can pass it to save a traversal
source_match = source_match or self.get_match(source)
new_level = source_match.level + 1
if (
updated > match.updated
or new_level < match.level
or source_match.first_party > match.first_party
):
# NOTE FP and level of matches referencing this one
# won't be updated until run or prune
if match.source:
old_source = self.get_match(match.source)
old_source.references -= 1
match.updated = updated
match.level = new_level
match.first_party = source_match.first_party
match.source = source
source_match.references += 1
match.dupplicate = dupplicate
def _set_domain(
self, hostname: bool, domain_str: str, updated: int, source: Path
) -> None:
self.enter_step("set_domain_val")
if not Database.validate_domain(domain_str):
raise ValueError(f"Invalid domain: {domain_str}")
self.enter_step("set_domain_pack")
domain = self.pack_domain(domain_str)
self.enter_step("set_domain_fp")
source_match = self.get_match(source)
is_first_party = source_match.first_party
self.enter_step("set_domain_brws")
dic = self.domtree
dupplicate = False
for part in domain.parts:
if part not in dic.children:
dic.children[part] = DomainTreeNode()
dic = dic.children[part]
if dic.match_zone.active(is_first_party):
dupplicate = True
if hostname:
match = dic.match_hostname
else:
match = dic.match_zone
self._set_match(
match,
updated,
source,
source_match=source_match,
dupplicate=dupplicate,
)
def set_hostname(self, *args: typing.Any, **kwargs: typing.Any) -> None:
self._set_domain(True, *args, **kwargs)
def set_zone(self, *args: typing.Any, **kwargs: typing.Any) -> None:
self._set_domain(False, *args, **kwargs)
def set_asn(self, asn_str: str, updated: int, source: Path) -> None:
self.enter_step("set_asn")
path = self.pack_asn(asn_str)
if path.asn in self.asns:
match = self.asns[path.asn]
else:
match = AsnNode()
self.asns[path.asn] = match
self._set_match(
match,
updated,
source,
)
def _set_ip4(self, ip4: Ip4Path, updated: int, source: Path) -> None:
self.enter_step("set_ip4_fp")
source_match = self.get_match(source)
is_first_party = source_match.first_party
self.enter_step("set_ip4_brws")
dic = self.ip4tree
dupplicate = False
for i in range(31, 31 - ip4.prefixlen, -1):
bit = (ip4.value >> i) & 0b1
next_dic = dic.one if bit else dic.zero
if next_dic is None:
next_dic = IpTreeNode()
if bit:
dic.one = next_dic
else:
dic.zero = next_dic
dic = next_dic
if dic.active(is_first_party):
dupplicate = True
self._set_match(
dic,
updated,
source,
source_match=source_match,
dupplicate=dupplicate,
)
self._set_ip4cache(ip4, dic)
def set_ip4address(
self, ip4address_str: str, *args: typing.Any, **kwargs: typing.Any
) -> None:
self.enter_step("set_ip4add_val")
if not Database.validate_ip4address(ip4address_str):
raise ValueError(f"Invalid ip4address: {ip4address_str}")
self.enter_step("set_ip4add_pack")
ip4 = self.pack_ip4address(ip4address_str)
self._set_ip4(ip4, *args, **kwargs)
def set_ip4network(
self, ip4network_str: str, *args: typing.Any, **kwargs: typing.Any
) -> None:
self.enter_step("set_ip4net_val")
if not Database.validate_ip4network(ip4network_str):
raise ValueError(f"Invalid ip4network: {ip4network_str}")
self.enter_step("set_ip4net_pack")
ip4 = self.pack_ip4network(ip4network_str)
self._set_ip4(ip4, *args, **kwargs)
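Two of the packing schemes in `database.py` above can be illustrated in isolation: domains are stored as reversed label lists (so a tree walk from the TLD downwards matches zones like `*.example.com`), and IPv4 addresses as 32-bit big-endian integers. A minimal sketch, mirroring `Database.pack_domain` and `Database.pack_ip4address_low`:

```python
def pack_domain(domain: str) -> list:
    # 'tracker.example.com' -> ['com', 'example', 'tracker']
    return domain.split(".")[::-1]

def pack_ip4(address: str) -> int:
    # Big-endian packing: each octet shifted into a 32-bit integer
    addr = 0
    for octet in address.split("."):
        addr = (addr << 8) + int(octet)
    return addr

print(pack_domain("tracker.example.com"))  # ['com', 'example', 'tracker']
print(hex(pack_ip4("1.2.3.4")))            # 0x1020304
```

The reversed-label representation is what lets `get_domain` yield every matching zone with a single top-down traversal of `domtree`.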

db.py Executable file
@@ -0,0 +1,54 @@
#!/usr/bin/env python3
import argparse
import database
import time
import os
if __name__ == "__main__":
# Parsing arguments
parser = argparse.ArgumentParser(description="Database operations")
parser.add_argument(
"-i", "--initialize", action="store_true", help="Reconstruct the whole database"
)
parser.add_argument(
"-p", "--prune", action="store_true", help="Remove old entries from database"
)
parser.add_argument(
"-b",
"--prune-base",
action="store_true",
help="With --prune, only prune base rules "
"(the ones added by ./feed_rules.py)",
)
parser.add_argument(
"-s",
"--prune-before",
type=int,
default=(int(time.time()) - 60 * 60 * 24 * 31 * 6),
help="With --prune, only rules updated before "
"this UNIX timestamp will be deleted",
)
parser.add_argument(
"-r",
"--references",
action="store_true",
help="DEBUG: Update the reference count",
)
args = parser.parse_args()
if not args.initialize:
DB = database.Database()
else:
if os.path.isfile(database.Database.PATH):
os.unlink(database.Database.PATH)
DB = database.Database()
DB.enter_step("main")
if args.prune:
DB.prune(before=args.prune_before, base_only=args.prune_base)
if args.references:
DB.update_references()
DB.save()

dist/.gitignore vendored Normal file
@@ -0,0 +1,2 @@
*.txt
*.html

dist/README.md vendored Normal file
@@ -0,0 +1,114 @@
# Geoffrey Frogeye's block list of first-party trackers
## What's a first-party tracker?
A tracker is a script put on many websites to gather information about the visitor.
They can be used for multiple reasons: statistics, risk management, marketing, ads serving…
In any case, they are a threat to Internet users' privacy and many may want to block them.
Traditionally, trackers are served from a third party.
For example, `website1.com` and `website2.com` both load their tracking script from `https://trackercompany.com/trackerscript.js`.
In order to block those, one can simply block the hostname `trackercompany.com`, which is what most ad blockers do.
However, to circumvent this block, tracker companies made the websites using them load trackers from `somestring.website1.com`.
The latter is a DNS redirection to `website1.trackercompany.com`, directly to an IP address belonging to the tracking company.
Those are called first-party trackers.
On top of the aforementioned privacy issues, they also cause security issues, as websites usually place more trust in those scripts.
For more information, learn about [Content Security Policy](https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP), [same-origin policy](https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy) and [Cross-Origin Resource Sharing](https://enable-cors.org/).
In order to block those trackers, ad blockers would need to block every subdomain pointing to anything under `trackercompany.com` or to their network.
Unfortunately, most don't support those blocking methods as they are not DNS-aware, e.g. they only see `somestring.website1.com`.
This list is an inventory of every `somestring.website1.com` found, to allow non-DNS-aware ad blockers to still block first-party trackers.
### Learn more
- [CNAME Cloaking, the dangerous disguise of third-party trackers](https://medium.com/nextdns/cname-cloaking-the-dangerous-disguise-of-third-party-trackers-195205dc522a) from NextDNS
- [Trackers first-party](https://blog.imirhil.fr/2019/11/13/first-party-tracker.html) from Aeris, in French
- [uBlock Origin issue](https://github.com/uBlockOrigin/uBlock-issues/issues/780)
- [CNAME Cloaking and Bounce Tracking Defense](https://webkit.org/blog/11338/cname-cloaking-and-bounce-tracking-defense/) on WebKit's blog
- [Characterizing CNAME cloaking-based tracking](https://blog.apnic.net/2020/08/04/characterizing-cname-cloaking-based-tracking/) on APNIC's website
- [Characterizing CNAME Cloaking-Based Tracking on the Web](https://tma.ifip.org/2020/wp-content/uploads/sites/9/2020/06/tma2020-camera-paper66.pdf) is a research paper from Sokendai and ANSSI
## List variants
### First-party trackers
**Recommended for hostfiles-based ad blockers, such as [Pi-hole](https://pi-hole.net/) (&lt;v5.0, as it introduced CNAME blocking).**
**Recommended for Android ad blockers as applications, such as [Blokada](https://blokada.org/).**
- Hosts file: <https://hostfiles.frogeye.fr/firstparty-trackers-hosts.txt>
- Raw list: <https://hostfiles.frogeye.fr/firstparty-trackers.txt>
This list contains every hostname redirecting to [a hand-picked list of first-party trackers](https://git.frogeye.fr/geoffrey/eulaurarien/src/branch/master/rules/first-party.list).
It should be safe from false positives.
It also contains all tracking hostnames under company domains (e.g. `website1.trackercompany.com`),
useful for ad blockers that don't support mass regex blocking,
while still preventing fallback to third-party trackers.
Don't be afraid of the size of the list, as this is due to the nature of first-party trackers: a single tracker generates at least one hostname per client (typically two).
### First-party only trackers
**Recommended for ad blockers as web browser extensions, such as [uBlock Origin](https://ublockorigin.com/) (&lt;v1.25.0 or for Chromium-based browsers, as it introduced CNAME uncloaking for Firefox).**
- Hosts file: <https://hostfiles.frogeye.fr/firstparty-only-trackers-hosts.txt>
- Raw list: <https://hostfiles.frogeye.fr/firstparty-only-trackers.txt>
This is the same list as above, albeit not containing the hostnames under the tracking company domains (e.g. `website1.trackercompany.com`).
This allows for reducing the size of the list for ad-blockers that already block those third-party trackers with their support of regex blocking.
Use in conjunction with other block lists used in regex mode, such as [Peter Lowe's](https://pgl.yoyo.org/adservers/).
### Multi-party trackers
- Hosts file: <https://hostfiles.frogeye.fr/multiparty-trackers-hosts.txt>
- Raw list: <https://hostfiles.frogeye.fr/multiparty-trackers.txt>
As first-party trackers usually evolve from third-party trackers, this list contains every hostname redirecting to trackers found in existing lists of third-party trackers (see next section).
Since the latter were not designed with first-party trackers in mind, they are likely to contain false positives.
On the other hand, they might protect against first-party trackers that we are not aware of or have not yet confirmed.
#### Source of third-party trackers
- [EasyPrivacy](https://easylist.to/easylist/easyprivacy.txt)
- [AdGuard](https://github.com/AdguardTeam/AdguardFilters)
(Yes, there are only two for now; many of the existing lists cause a lot of false positives.)
### Multi-party only trackers
- Hosts file: <https://hostfiles.frogeye.fr/multiparty-only-trackers-hosts.txt>
- Raw list: <https://hostfiles.frogeye.fr/multiparty-only-trackers.txt>
This is the same list as above, albeit not containing the hostnames under the tracking company domains (e.g. `website1.trackercompany.com`).
This allows for reducing the size of the list for ad-blockers that already block those third-party trackers with their support of regex blocking.
Use in conjunction with other block lists used in regex-mode, such as the ones in the previous section.
## Meta
In case of false positives/negatives, or for any other question, contact me however you like: <https://geoffrey.frogeye.fr>
The software used to generate this list is available here: <https://git.frogeye.fr/geoffrey/eulaurarien>
## Acknowledgements
Some of the first-party trackers included in this list have been found by:
- [Aeris](https://imirhil.fr/)
- NextDNS and [their blocklist](https://github.com/nextdns/cname-cloaking-blocklist)'s contributors
- Yuki2718 from [Wilders Security Forums](https://www.wilderssecurity.com/threads/ublock-a-lean-and-fast-blocker.365273/page-168#post-2880361)
- Ha Dao, Johan Mazel, and Kensuke Fukuda, ["Characterizing CNAME Cloaking-Based Tracking on the Web", Proceedings of IFIP/IEEE Traffic Measurement Analysis Conference (TMA), 9 pages, 2020.](https://tma.ifip.org/2020/wp-content/uploads/sites/9/2020/06/tma2020-camera-paper66.pdf)
- AdGuard and [their blocklist](https://github.com/AdguardTeam/cname-trackers)'s contributors
The list was generated using data from:
- [Cisco Umbrella Popularity List](http://s3-us-west-1.amazonaws.com/umbrella-static/index.html)
- [Public DNS Server List](https://public-dns.info/)
Similar projects:
- [NextDNS blocklist](https://github.com/nextdns/cname-cloaking-blocklist): for DNS-aware ad blockers
- [Stefan Froberg's lists](https://www.orwell1984.today/cname/): subset of those lists grouped by tracker
- [AdGuard blocklist](https://github.com/AdguardTeam/cname-trackers): same thing with a bigger scope, maintained by a bigger team

dist/markdown7.min.css vendored Normal file
@@ -0,0 +1,2 @@
/* Source: https://github.com/jasonm23/markdown-css-themes */
body{font-family:Helvetica,arial,sans-serif;font-size:14px;line-height:1.6;padding-top:10px;padding-bottom:10px;background-color:#fff;padding:30px}body>:first-child{margin-top:0!important}body>:last-child{margin-bottom:0!important}a{color:#4183c4}a.absent{color:#c00}a.anchor{display:block;padding-left:30px;margin-left:-30px;cursor:pointer;position:absolute;top:0;left:0;bottom:0}h1,h2,h3,h4,h5,h6{margin:20px 0 10px;padding:0;font-weight:700;-webkit-font-smoothing:antialiased;cursor:text;position:relative}h1:hover a.anchor,h2:hover a.anchor,h3:hover a.anchor,h4:hover a.anchor,h5:hover a.anchor,h6:hover a.anchor{text-decoration:none}h1 code,h1 tt{font-size:inherit}h2 code,h2 tt{font-size:inherit}h3 code,h3 tt{font-size:inherit}h4 code,h4 tt{font-size:inherit}h5 code,h5 tt{font-size:inherit}h6 code,h6 tt{font-size:inherit}h1{font-size:28px;color:#000}h2{font-size:24px;border-bottom:1px solid #ccc;color:#000}h3{font-size:18px}h4{font-size:16px}h5{font-size:14px}h6{color:#777;font-size:14px}blockquote,dl,li,ol,p,pre,table,ul{margin:15px 0}hr{border:0 none;color:#ccc;height:4px;padding:0}body>h2:first-child{margin-top:0;padding-top:0}body>h1:first-child{margin-top:0;padding-top:0}body>h1:first-child+h2{margin-top:0;padding-top:0}body>h3:first-child,body>h4:first-child,body>h5:first-child,body>h6:first-child{margin-top:0;padding-top:0}a:first-child h1,a:first-child h2,a:first-child h3,a:first-child h4,a:first-child h5,a:first-child h6{margin-top:0;padding-top:0}h1 p,h2 p,h3 p,h4 p,h5 p,h6 p{margin-top:0}li p.first{display:inline-block}li{margin:0}ol,ul{padding-left:30px}ol :first-child,ul :first-child{margin-top:0}dl{padding:0}dl dt{font-size:14px;font-weight:700;font-style:italic;padding:0;margin:15px 0 5px}dl dt:first-child{padding:0}dl dt>:first-child{margin-top:0}dl dt>:last-child{margin-bottom:0}dl dd{margin:0 0 15px;padding:0 15px}dl dd>:first-child{margin-top:0}dl dd>:last-child{margin-bottom:0}blockquote{border-left:4px solid #ddd;padding:0 
15px;color:#777}blockquote>:first-child{margin-top:0}blockquote>:last-child{margin-bottom:0}table{padding:0;border-collapse:collapse}table tr{border-top:1px solid #ccc;background-color:#fff;margin:0;padding:0}table tr:nth-child(2n){background-color:#f8f8f8}table tr th{font-weight:700;border:1px solid #ccc;margin:0;padding:6px 13px}table tr td{border:1px solid #ccc;margin:0;padding:6px 13px}table tr td :first-child,table tr th :first-child{margin-top:0}table tr td :last-child,table tr th :last-child{margin-bottom:0}img{max-width:100%}span.frame{display:block;overflow:hidden}span.frame>span{border:1px solid #ddd;display:block;float:left;overflow:hidden;margin:13px 0 0;padding:7px;width:auto}span.frame span img{display:block;float:left}span.frame span span{clear:both;color:#333;display:block;padding:5px 0 0}span.align-center{display:block;overflow:hidden;clear:both}span.align-center>span{display:block;overflow:hidden;margin:13px auto 0;text-align:center}span.align-center span img{margin:0 auto;text-align:center}span.align-right{display:block;overflow:hidden;clear:both}span.align-right>span{display:block;overflow:hidden;margin:13px 0 0;text-align:right}span.align-right span img{margin:0;text-align:right}span.float-left{display:block;margin-right:13px;overflow:hidden;float:left}span.float-left span{margin:13px 0 0}span.float-right{display:block;margin-left:13px;overflow:hidden;float:right}span.float-right>span{display:block;overflow:hidden;margin:13px auto 0;text-align:right}code,tt{margin:0 2px;padding:0 5px;white-space:nowrap;border:1px solid #eaeaea;background-color:#f8f8f8;border-radius:3px}pre code{margin:0;padding:0;white-space:pre;border:none;background:0 0}.highlight pre{background-color:#f8f8f8;border:1px solid #ccc;font-size:13px;line-height:19px;overflow:auto;padding:6px 10px;border-radius:3px}pre{background-color:#f8f8f8;border:1px solid #ccc;font-size:13px;line-height:19px;overflow:auto;padding:6px 10px;border-radius:3px}pre code,pre 
tt{background-color:transparent;border:none}sup{font-size:.83em;vertical-align:super;line-height:0}*{-webkit-print-color-adjust:exact}@media screen and (min-width:914px){body{width:854px;margin:0 auto}}@media print{pre,table{page-break-inside:avoid}pre{word-wrap:break-word}}


@ -2,21 +2,13 @@
# Main script for eulaurarien
# Get all subdomains accessed by each website in the website list
cat websites.list | ./collect_subdomains.py > subdomains.list
sort -u subdomains.list > subdomains.sorted.list
[ ! -f .env ] && touch .env
# Filter out the subdomains not pointing to a first-party tracker
cat subdomains.sorted.list | ./filter_subdomains.py > toblock.list
sort -u toblock.list > toblock.sorted.list
./fetch_resources.sh
./collect_subdomains.sh
./import_rules.sh
./resolve_subdomains.sh
./prune.sh
./export_lists.sh
./generate_index.py
# Format the blocklist so it can be used as a hostlist
(
echo "# First party trackers"
echo "# List generated on $(date -Isec) by eulaurarian $(git describe --tags --dirty)"
cat toblock.sorted.list | while read host;
do
echo "0.0.0.0 $host"
done
) > toblock.hosts.list

export.py Executable file

@ -0,0 +1,91 @@
#!/usr/bin/env python3
import database
import argparse
import sys
if __name__ == "__main__":
# Parsing arguments
parser = argparse.ArgumentParser(
description="Export the hostname rules stored in the database as plain text"
)
parser.add_argument(
"-o",
"--output",
type=argparse.FileType("w"),
default=sys.stdout,
help="Output file, one rule per line",
)
parser.add_argument(
"-f",
"--first-party",
action="store_true",
help="Only output rules issued from first-party sources",
)
parser.add_argument(
"-e",
"--end-chain",
action="store_true",
help="Only output rules that are not referenced by any other",
)
parser.add_argument(
"-r",
"--rules",
action="store_true",
help="Output all kinds of rules, not just hostnames",
)
parser.add_argument(
"-b",
"--base-rules",
action="store_true",
help="Output base rules "
"(the ones added by ./feed_rules.py) "
"(implies --rules)",
)
parser.add_argument(
"-d",
"--no-dupplicates",
action="store_true",
help="Do not output rules that already match a zone/network rule "
"(e.g. dummy.example.com when there's a zone example.com rule)",
)
parser.add_argument(
"-x",
"--explain",
action="store_true",
help="Show the chain of rules leading to one "
"(and the number of references they have)",
)
parser.add_argument(
"-c",
"--count",
action="store_true",
help="Show the number of rules per type instead of listing them",
)
args = parser.parse_args()
DB = database.Database()
if args.count:
assert not args.explain
print(
DB.count_records(
first_party_only=args.first_party,
end_chain_only=args.end_chain,
no_dupplicates=args.no_dupplicates,
rules_only=args.base_rules,
hostnames_only=not (args.rules or args.base_rules),
)
)
else:
for domain in DB.list_records(
first_party_only=args.first_party,
end_chain_only=args.end_chain,
no_dupplicates=args.no_dupplicates,
rules_only=args.base_rules,
hostnames_only=not (args.rules or args.base_rules),
explain=args.explain,
):
print(domain, file=args.output)

export_lists.sh Executable file

@ -0,0 +1,98 @@
#!/usr/bin/env bash
function log() {
echo -e "\033[33m$@\033[0m"
}
log "Calculating statistics…"
oldest="$(cat last_updates/*.txt | sort -n | head -1)"
oldest_date=$(date -Isec -d @$oldest)
gen_date=$(date -Isec)
gen_software=$(git describe --tags)
number_websites=$(wc -l < temp/all_websites.list)
number_subdomains=$(wc -l < temp/all_subdomains.list)
number_dns=$(grep -c 'NOERROR' temp/all_resolved.txt)
for partyness in {first,multi}
do
if [ $partyness = "first" ]
then
partyness_flags="--first-party"
else
partyness_flags=""
fi
rules_input=$(./export.py --count --base-rules $partyness_flags)
rules_found=$(./export.py --count --rules $partyness_flags)
rules_found_nd=$(./export.py --count --rules --no-dupplicates $partyness_flags)
echo
echo "Statistics for ${partyness}-party trackers"
echo "Input rules: $rules_input"
echo "Subsequent rules: $rules_found"
echo "Subsequent rules (no duplicates): $rules_found_nd"
echo "Output hostnames: $(./export.py --count $partyness_flags)"
echo "Output hostnames (no duplicates): $(./export.py --count --no-dupplicates $partyness_flags)"
echo "Output hostnames (end-chain only): $(./export.py --count --end-chain $partyness_flags)"
echo "Output hostnames (no duplicates, end-chain only): $(./export.py --count --no-dupplicates --end-chain $partyness_flags)"
for trackerness in {trackers,only-trackers}
do
if [ $trackerness = "trackers" ]
then
trackerness_flags=""
else
trackerness_flags="--no-dupplicates"
fi
file_list="dist/${partyness}party-${trackerness}.txt"
file_host="dist/${partyness}party-${trackerness}-hosts.txt"
log "Generating lists for variant ${partyness}-party ${trackerness}"
# The actual export happens here
./export.py $partyness_flags $trackerness_flags > $file_list
# Keeping the database open while sorting the output can be heavy,
# so this is done in two steps
sort -u $file_list -o $file_list
rules_output=$(./export.py --count $partyness_flags $trackerness_flags)
(
echo "# First-party trackers host list"
echo "# Variant: ${partyness}-party ${trackerness}"
echo "#"
echo "# About first-party trackers: https://hostfiles.frogeye.fr/#whats-a-first-party-tracker"
echo "#"
echo "# In case of false positives/negatives, or any other question,"
echo "# contact me the way you like: https://geoffrey.frogeye.fr"
echo "#"
echo "# Latest versions and variants: https://hostfiles.frogeye.fr/#list-variants"
echo "# Source code: https://git.frogeye.fr/geoffrey/eulaurarien"
echo "# License: https://git.frogeye.fr/geoffrey/eulaurarien/src/branch/master/LICENSE"
echo "# Acknowledgements: https://hostfiles.frogeye.fr/#acknowledgements"
echo "#"
echo "# Generation software: eulaurarien $gen_software"
echo "# List generation date: $gen_date"
echo "# Oldest record: $oldest_date"
echo "# Number of source websites: $number_websites"
echo "# Number of source subdomains: $number_subdomains"
echo "# Number of source DNS records: $number_dns"
echo "#"
echo "# Input rules: $rules_input"
echo "# Subsequent rules: $rules_found"
echo "# … no duplicates: $rules_found_nd"
echo "# Output rules: $rules_output"
echo "#"
echo
sed 's|^|0.0.0.0 |' "$file_list"
) > "$file_host"
done
done
if [ -d explanations ]
then
filename="$(date -Isec).txt"
./export.py --explain > "explanations/$filename"
ln --force --symbolic "$filename" "explanations/latest.txt"
fi

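The host-list variant above is just the plain list with each line prefixed by `0.0.0.0` (the `sed 's|^|0.0.0.0 |'` step) under a commented header. A minimal Python sketch of that transformation, with the header reduced to a single line (the real header carries the full statistics):

```python
import typing


def to_hosts(hostnames: typing.List[str],
             comment: str = "# First-party trackers host list") -> str:
    """Turn a plain hostname list into hosts-file format, like the sed step."""
    lines = [comment, ""]
    # Prefix every hostname with the unroutable 0.0.0.0 address
    lines += [f"0.0.0.0 {h}" for h in hostnames]
    return "\n".join(lines)


# Invented hostnames, for illustration only
print(to_hosts(["tracker.example.com", "ads.example.net"]))
```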
feed_asn.py Executable file

@ -0,0 +1,68 @@
#!/usr/bin/env python3
import database
import argparse
import requests
import typing
import ipaddress
import logging
import time
IPNetwork = typing.Union[ipaddress.IPv4Network, ipaddress.IPv6Network]
def get_ranges(asn: str) -> typing.Iterable[str]:
req = requests.get(
"https://stat.ripe.net/data/as-routing-consistency/data.json",
params={"resource": asn},
)
data = req.json()
for pref in data["data"]["prefixes"]:
yield pref["prefix"]
def get_name(asn: str) -> str:
req = requests.get(
"https://stat.ripe.net/data/as-overview/data.json", params={"resource": asn}
)
data = req.json()
return data["data"]["holder"]
if __name__ == "__main__":
log = logging.getLogger("feed_asn")
# Parsing arguments
parser = argparse.ArgumentParser(
description="Add the IP ranges associated with the ASes in the database"
)
args = parser.parse_args()
DB = database.Database()
def add_ranges(
path: database.Path,
match: database.Match,
) -> None:
assert isinstance(path, database.AsnPath)
assert isinstance(match, database.AsnNode)
asn_str = database.Database.unpack_asn(path)
DB.enter_step("asn_get_name")
name = get_name(asn_str)
match.name = name
DB.enter_step("asn_get_ranges")
for prefix in get_ranges(asn_str):
parsed_prefix: IPNetwork = ipaddress.ip_network(prefix)
if parsed_prefix.version == 4:
DB.set_ip4network(prefix, source=path, updated=int(time.time()))
log.info("Added %s from %s (%s)", prefix, path, name)
elif parsed_prefix.version == 6:
log.warning("Unimplemented prefix version: %s", prefix)
else:
log.error("Unknown prefix version: %s", prefix)
for _ in DB.exec_each_asn(add_ranges):
pass
DB.save()

feed_dns.py Executable file

@ -0,0 +1,251 @@
#!/usr/bin/env python3
import argparse
import database
import logging
import sys
import typing
import multiprocessing
import time
Record = typing.Tuple[typing.Callable, typing.Callable, int, str, str]
# select, write
FUNCTION_MAP: typing.Any = {
"a": (
database.Database.get_ip4,
database.Database.set_hostname,
),
"cname": (
database.Database.get_domain,
database.Database.set_hostname,
),
"ptr": (
database.Database.get_domain,
database.Database.set_ip4address,
),
}
class Writer(multiprocessing.Process):
def __init__(
self,
recs_queue: typing.Optional[multiprocessing.Queue] = None,
autosave_interval: int = 0,
ip4_cache: int = 0,
):
if recs_queue: # MP
super(Writer, self).__init__()
self.recs_queue = recs_queue
self.log = logging.getLogger("wr")
self.autosave_interval = autosave_interval
self.ip4_cache = ip4_cache
if not recs_queue: # No MP
self.open_db()
def open_db(self) -> None:
self.db = database.Database()
self.db.log = logging.getLogger("wr")
self.db.fill_ip4cache(max_size=self.ip4_cache)
def exec_record(self, record: Record) -> None:
self.db.enter_step("exec_record")
select, write, updated, name, value = record
try:
for source in select(self.db, value):
write(self.db, name, updated, source=source)
except (ValueError, IndexError):
# ValueError: non-number in IP
# IndexError: IP too big
self.log.exception("Cannot execute: %s", record)
def end(self) -> None:
self.db.enter_step("end")
self.db.save()
def run(self) -> None:
self.open_db()
if self.autosave_interval > 0:
next_save = time.time() + self.autosave_interval
else:
next_save = 0
self.db.enter_step("block_wait")
block: typing.List[Record]
for block in iter(self.recs_queue.get, None):
assert block
record: Record
for record in block:
self.exec_record(record)
if next_save > 0 and time.time() > next_save:
self.log.info("Saving database...")
self.db.save()
self.log.info("Done!")
next_save = time.time() + self.autosave_interval
self.db.enter_step("block_wait")
self.end()
class Parser:
def __init__(
self,
buf: typing.Any,
recs_queue: typing.Optional[multiprocessing.Queue] = None,
block_size: int = 0,
writer: typing.Optional[Writer] = None,
):
assert bool(writer) ^ bool(block_size and recs_queue)
self.buf = buf
self.log = logging.getLogger("pr")
self.recs_queue = recs_queue
if writer: # No MP
self.prof: database.Profiler = writer.db
self.register = writer.exec_record
else: # MP
self.block: typing.List[Record] = list()
self.block_size = block_size
self.prof = database.Profiler()
self.prof.log = logging.getLogger("pr")
self.register = self.add_to_queue
def add_to_queue(self, record: Record) -> None:
self.prof.enter_step("register")
self.block.append(record)
if len(self.block) >= self.block_size:
self.prof.enter_step("put_block")
assert self.recs_queue
self.recs_queue.put(self.block)
self.block = list()
def run(self) -> None:
self.consume()
if self.recs_queue:
self.recs_queue.put(self.block)
self.prof.profile()
def consume(self) -> None:
raise NotImplementedError
class MassDnsParser(Parser):
# massdns --output Snrql
# --retry REFUSED,SERVFAIL --resolvers nameservers-ipv4
TYPES = {
"A": (FUNCTION_MAP["a"][0], FUNCTION_MAP["a"][1], -1, None),
# 'AAAA': (FUNCTION_MAP['aaaa'][0], FUNCTION_MAP['aaaa'][1], -1, None),
"CNAME": (FUNCTION_MAP["cname"][0], FUNCTION_MAP["cname"][1], -1, -1),
}
def consume(self) -> None:
self.prof.enter_step("parse_massdns")
timestamp = 0
header = True
for line in self.buf:
line = line[:-1]
if not line:
header = True
continue
split = line.split(" ")
try:
if header:
timestamp = int(split[1])
header = False
else:
select, write, name_offset, value_offset = MassDnsParser.TYPES[
split[1]
]
record = (
select,
write,
timestamp,
split[0][:name_offset].lower(),
split[2][:value_offset].lower(),
)
self.register(record)
self.prof.enter_step("parse_massdns")
except KeyError:
continue
PARSERS = {
"massdns": MassDnsParser,
}
if __name__ == "__main__":
# Parsing arguments
log = logging.getLogger("feed_dns")
args_parser = argparse.ArgumentParser(
description="Read DNS records and import "
"tracking-relevant data into the database"
)
args_parser.add_argument("parser", choices=PARSERS.keys(), help="Input format")
args_parser.add_argument(
"-i",
"--input",
type=argparse.FileType("r"),
default=sys.stdin,
help="Input file",
)
args_parser.add_argument(
"-b", "--block-size", type=int, default=1024, help="Records per queue block (performance tuning)"
)
args_parser.add_argument(
"-q", "--queue-size", type=int, default=128, help="Maximum blocks in the parser-to-writer queue (performance tuning)"
)
args_parser.add_argument(
"-a",
"--autosave-interval",
type=int,
default=900,
help="Interval in seconds at which the database will save. 0 to disable.",
)
args_parser.add_argument(
"-s",
"--single-process",
action="store_true",
help="Only use one process. Might be useful for single-core computers.",
)
args_parser.add_argument(
"-4",
"--ip4-cache",
type=int,
default=0,
help="RAM cache for faster IPv4 lookup. "
"Maximum useful value: 512 MiB (536870912). "
"Warning: Depending on the rules, this might already "
"be a memory-heavy process, even without the cache.",
)
args = args_parser.parse_args()
parser_cls = PARSERS[args.parser]
if args.single_process:
writer = Writer(
autosave_interval=args.autosave_interval, ip4_cache=args.ip4_cache
)
parser = parser_cls(args.input, writer=writer)
parser.run()
writer.end()
else:
recs_queue: multiprocessing.Queue = multiprocessing.Queue(
maxsize=args.queue_size
)
writer = Writer(
recs_queue,
autosave_interval=args.autosave_interval,
ip4_cache=args.ip4_cache,
)
writer.start()
parser = parser_cls(
args.input, recs_queue=recs_queue, block_size=args.block_size
)
parser.run()
recs_queue.put(None)
writer.join()

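`MassDnsParser` above relies on a simple framing: after a blank line comes a header whose second field is the epoch timestamp, and each following answer line is `name type value`, with the slice offsets in `TYPES` trimming trailing dots. A standalone sketch of that field logic (the sample lines are invented, not actual massdns output):

```python
import typing

# Slice ends for (name, value), mirroring the TYPES table above:
# names always drop the trailing dot, CNAME values do too.
TYPES = {"A": (-1, None), "CNAME": (-1, -1)}


def parse_block(lines: typing.List[str]) -> typing.List[tuple]:
    records = []
    timestamp = 0
    header = True
    for line in lines:
        if not line:
            # A blank line means the next non-blank line is a header
            header = True
            continue
        split = line.split(" ")
        if header:
            timestamp = int(split[1])
            header = False
        elif split[1] in TYPES:
            name_off, value_off = TYPES[split[1]]
            records.append((split[1],
                            split[0][:name_off].lower(),
                            split[2][:value_off].lower(),
                            timestamp))
    return records


# Hypothetical sample block (header line, then two answers)
sample = [
    ";; 1600000000",
    "tracked.example.com. CNAME x.eulerian.net.",
    "x.eulerian.net. A 203.0.113.7",
]
print(parse_block(sample))
```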
feed_rules.py Executable file

@ -0,0 +1,61 @@
#!/usr/bin/env python3
import database
import argparse
import sys
import time
import typing
FUNCTION_MAP = {
"zone": database.Database.set_zone,
"hostname": database.Database.set_hostname,
"asn": database.Database.set_asn,
"ip4network": database.Database.set_ip4network,
"ip4address": database.Database.set_ip4address,
}
if __name__ == "__main__":
# Parsing arguments
parser = argparse.ArgumentParser(description="Import base rules to the database")
parser.add_argument(
"type", choices=FUNCTION_MAP.keys(), help="Type of rule to input"
)
parser.add_argument(
"-i",
"--input",
type=argparse.FileType("r"),
default=sys.stdin,
help="File with one rule per line",
)
parser.add_argument(
"-f",
"--first-party",
action="store_true",
help="The input only comes from verified first-party sources",
)
args = parser.parse_args()
DB = database.Database()
fun = FUNCTION_MAP[args.type]
source: database.RulePath
if args.first_party:
source = database.RuleFirstPath()
else:
source = database.RuleMultiPath()
for rule in args.input:
rule = rule.strip()
try:
fun(
DB,
rule,
source=source,
updated=int(time.time()),
)
except ValueError:
DB.log.error(f"Could not add rule: {rule}")
DB.save()

fetch_resources.sh Executable file

@ -0,0 +1,45 @@
#!/usr/bin/env bash
function log() {
echo -e "\033[33m$@\033[0m"
}
function dl() {
echo "Downloading $1 to $2"
curl --silent "$1" > "$2"
if [ $? -ne 0 ]
then
echo "Failed!"
fi
}
log "Retrieving tests…"
rm -f tests/*.cache.csv
dl https://raw.githubusercontent.com/fukuda-lab/cname_cloaking/master/Subdomain_CNAME-cloaking-based-tracking.csv temp/fukuda.csv
(echo "url,allow,deny,comment"; tail -n +2 temp/fukuda.csv | awk -F, '{ print "https://" $2 "/,," $3 "," $5 }') > tests/fukuda.cache.csv
log "Retrieving rules…"
rm -f rules*/*.cache.*
dl https://easylist.to/easylist/easyprivacy.txt rules_adblock/easyprivacy.cache.txt
dl https://filters.adtidy.org/extension/chromium/filters/3.txt rules_adblock/adguard.cache.txt
log "Retrieving TLD list…"
dl http://data.iana.org/TLD/tlds-alpha-by-domain.txt temp/all_tld.temp.list
grep -v '^#' temp/all_tld.temp.list | awk '{print tolower($0)}' > temp/all_tld.list
log "Retrieving nameservers…"
dl https://public-dns.info/nameservers.txt nameservers/public-dns.cache.list
log "Retrieving top subdomains…"
dl http://s3-us-west-1.amazonaws.com/umbrella-static/top-1m.csv.zip top-1m.csv.zip
unzip top-1m.csv.zip
sed 's|^[0-9]\+,||' top-1m.csv > temp/cisco-umbrella_popularity.fresh.list
rm top-1m.csv top-1m.csv.zip
if [ -f subdomains/cisco-umbrella_popularity.cache.list ]
then
cp subdomains/cisco-umbrella_popularity.cache.list temp/cisco-umbrella_popularity.old.list
pv -f temp/cisco-umbrella_popularity.old.list temp/cisco-umbrella_popularity.fresh.list | sort -u > subdomains/cisco-umbrella_popularity.cache.list
rm temp/cisco-umbrella_popularity.old.list temp/cisco-umbrella_popularity.fresh.list
else
mv temp/cisco-umbrella_popularity.fresh.list subdomains/cisco-umbrella_popularity.cache.list
fi

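The Umbrella branch above never discards previously seen subdomains: the fresh download is merged with the old cache via `sort -u` over both files. The same union, sketched in Python with invented entries:

```python
import typing


def merge_lists(old: typing.List[str], fresh: typing.List[str]) -> typing.List[str]:
    """Union of previous cache and fresh download, deduplicated and sorted,
    like `sort -u` over both files."""
    return sorted(set(old) | set(fresh))


# Invented entries: one dropped from the fresh snapshot, one new
old = ["a.example.com", "gone.example.com"]
fresh = ["a.example.com", "new.example.com"]
print(merge_lists(old, fresh))
```

Because it is a union, a subdomain that falls out of the daily top-1m snapshot stays in the input set for later resolution runs.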

@ -1,35 +0,0 @@
#!/usr/bin/env python3
"""
From a list of subdomains, output only
the ones resolving to a first-party tracker.
"""
import re
import sys
import dns.resolver
import regexes
def is_subdomain_matching(subdomain: str) -> bool:
"""
Indicates if the subdomain redirects to a first-party tracker.
"""
# TODO Look at the whole chain rather than the last one
query = dns.resolver.query(subdomain, 'A')
canonical = query.canonical_name.to_text()
for regex in regexes.REGEXES:
if re.match(regex, canonical):
return True
return False
if __name__ == '__main__':
for line in sys.stdin:
line = line.strip()
if not line:
continue
if is_subdomain_matching(line):
print(line)

generate_index.py Executable file

@ -0,0 +1,25 @@
#!/usr/bin/env python3
import markdown2
extras = ["header-ids"]
with open("dist/README.md", "r") as fdesc:
body = markdown2.markdown(fdesc.read(), extras=extras)
output = f"""<!DOCTYPE html>
<html lang="en">
<head>
<title>Geoffrey Frogeye's block list of first-party trackers</title>
<meta charset="utf-8">
<meta name="author" content="Geoffrey 'Frogeye' Preud'homme" />
<link rel="stylesheet" type="text/css" href="markdown7.min.css">
</head>
<body>
{body}
</body>
</html>
"""
with open("dist/index.html", "w") as fdesc:
fdesc.write(output)

import_rules.sh Executable file

@ -0,0 +1,20 @@
#!/usr/bin/env bash
function log() {
echo -e "\033[33m$@\033[0m"
}
log "Importing rules…"
date +%s > "last_updates/rules.txt"
cat rules_adblock/*.txt | grep -v '^!' | grep -v '^\[Adblock' | ./adblock_to_domain_list.py | ./feed_rules.py zone
cat rules_hosts/*.txt | grep -v '^#' | grep -v '^$' | cut -d ' ' -f2 | ./feed_rules.py zone
cat rules/*.list | grep -v '^#' | grep -v '^$' | ./feed_rules.py zone
cat rules_ip/*.txt | grep -v '^#' | grep -v '^$' | ./feed_rules.py ip4network
cat rules_asn/*.txt | grep -v '^#' | grep -v '^$' | ./feed_rules.py asn
cat rules/first-party.list | grep -v '^#' | grep -v '^$' | ./feed_rules.py zone --first-party
cat rules_ip/first-party.txt | grep -v '^#' | grep -v '^$' | ./feed_rules.py ip4network --first-party
cat rules_asn/first-party.txt | grep -v '^#' | grep -v '^$' | ./feed_rules.py asn --first-party
./feed_asn.py

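Each pipeline above normalizes one source format before feeding `./feed_rules.py`; for hosts-format files that means dropping comments and blank lines and keeping the second column. A rough Python equivalent of the `grep -v '^#' | grep -v '^$' | cut -d ' ' -f2` stage (sample lines invented):

```python
import typing


def hosts_to_zones(lines: typing.List[str]) -> typing.List[str]:
    """Extract the hostname column from hosts-format rule lines,
    skipping comments and blank lines."""
    zones = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(" ")
        if len(fields) >= 2:
            # Field 0 is the IP (e.g. 0.0.0.0), field 1 the hostname
            zones.append(fields[1])
    return zones


sample = [
    "# comment",
    "",
    "0.0.0.0 tracker.example.com",
    "0.0.0.0 ads.example.net",
]
print(hosts_to_zones(sample))
```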
last_updates/.gitignore vendored Normal file

@ -0,0 +1 @@
*.txt

nameservers/.gitignore vendored Normal file

@ -0,0 +1,2 @@
*.custom.list
*.cache.list

nameservers/popular.list Normal file

@ -0,0 +1,24 @@
8.8.8.8
8.8.4.4
2001:4860:4860:0:0:0:0:8888
2001:4860:4860:0:0:0:0:8844
208.67.222.222
208.67.220.220
2620:119:35::35
2620:119:53::53
4.2.2.1
4.2.2.2
8.26.56.26
8.20.247.20
84.200.69.80
84.200.70.40
2001:1608:10:25:0:0:1c04:b12f
2001:1608:10:25:0:0:9249:d69b
9.9.9.10
149.112.112.10
2620:fe::10
2620:fe::fe:10
1.1.1.1
1.0.0.1
2606:4700:4700::1111
2606:4700:4700::1001

prune.sh Executable file

@ -0,0 +1,9 @@
#!/usr/bin/env bash
function log() {
echo -e "\033[33m$@\033[0m"
}
oldest="$(cat last_updates/*.txt | sort -n | head -1)"
log "Pruning every record before ${oldest}"
./db.py --prune --prune-before "$oldest"


@ -1,9 +0,0 @@
#!/usr/bin/env python3
"""
List of regex matching first-party trackers.
"""
REGEXES = [
r'^.+\.eulerian\.net\.$'
]

requirements.txt Normal file

@ -0,0 +1,4 @@
coloredlogs>=10
markdown2>=2.4,<3
numpy>=1.21,<2
python-abp>=0.2,<0.3

resolve_subdomains.sh Executable file

@ -0,0 +1,24 @@
#!/usr/bin/env bash
source .env.default
source .env
function log() {
echo -e "\033[33m$@\033[0m"
}
log "Compiling nameservers…"
pv -f nameservers/*.list | ./validate_list.py --ip4 | sort -u > temp/all_nameservers_ip4.list
log "Compiling subdomains…"
# Sort by last character to utilize the DNS server caching mechanism
# (not as efficient with massdns but it's almost free so why not)
pv -f subdomains/*.list | ./validate_list.py --domain | rev | sort -u | rev > temp/all_subdomains.list
log "Resolving subdomains…"
date +%s > "last_updates/massdns.txt"
"$MASSDNS_BINARY" --output Snrql --hashmap-size "$MASSDNS_HASHMAP_SIZE" --resolvers temp/all_nameservers_ip4.list --outfile temp/all_resolved.txt temp/all_subdomains.list
log "Importing into database…"
[ $SINGLE_PROCESS -eq 1 ] && EXTRA_ARGS="--single-process"
pv -f temp/all_resolved.txt | ./feed_dns.py massdns --ip4-cache "$CACHE_SIZE" $EXTRA_ARGS

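The `rev | sort -u | rev` trick above sorts domains by their reversed spelling, so names sharing a suffix land next to each other and hit the resolvers' caches more often. A Python sketch of the same grouping (example domains invented):

```python
import typing


def suffix_sort(domains: typing.Iterable[str]) -> typing.List[str]:
    """Deduplicate and sort domains by reversed spelling,
    like `rev | sort -u | rev`."""
    return [d[::-1] for d in sorted({d[::-1] for d in domains})]


hosts = ["a.example.com", "tracker.example.net", "b.example.com",
         "a.example.com"]  # duplicate on purpose
print(suffix_sort(hosts))
```

After the sort, both `.example.com` names are adjacent, so successive queries are likely served by the same cached delegation.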
rules/.gitignore vendored Normal file

@ -0,0 +1,2 @@
*.custom.list
*.cache.list

rules/first-party.list Normal file

@ -0,0 +1,91 @@
# Eulerian
eulerian.net
# Xiti (AT Internet)
ati-host.net
at-o.net
# NP6
bp01.net
# Criteo
criteo.com
dnsdelegation.io
storetail.io
# Keyade
keyade.com
# Adobe Experience Cloud
# https://experienceleague.adobe.com/docs/analytics/implementation/vars/config-vars/trackingserversecure.html?lang=en#ssl-tracking-server-in-adobe-experience-platform-launch
omtrdc.net
2o7.net
data.adobedc.net
sc.adobedc.net
# Webtrekk
wt-eu02.net
webtrekk.net
# Otto Group
oghub.io
# Intent Media
partner.intentmedia.net
# Wizaly
wizaly.com
# Commanders Act
tagcommander.com
# Ingenious Technologies
affex.org
# TraceDock
a351fec2c318c11ea9b9b0a0ae18fb0b-1529426863.eu-central-1.elb.amazonaws.com
a5e652663674a11e997c60ac8a4ec150-1684524385.eu-central-1.elb.amazonaws.com
a88045584548111e997c60ac8a4ec150-1610510072.eu-central-1.elb.amazonaws.com
afc4d9aa2a91d11e997c60ac8a4ec150-2082092489.eu-central-1.elb.amazonaws.com
# A8
trck.a8.net
# AD EBiS
# https://prtimes.jp/main/html/rd/p/000000215.000009812.html
ebis.ne.jp
# GENIEE
genieesspv.jp
# SP-Prod
sp-prod.net
# Act-On Software
actonsoftware.com
actonservice.com
# eum-appdynamics.com
eum-appdynamics.com
# Extole
extole.io
extole.com
# Eloqua
hs.eloqua.com
# segment.com
xid.segment.com
# exponea.com
exponea.com
# adclear.net
adclear.net
# contentsfeed.com
contentsfeed.com
# postaffiliatepro.com
postaffiliatepro.com
# Sugar Market (Salesfusion)
msgapp.com
# Exactag
exactag.com
# GMO Internet Group
ad-cloud.jp
# Pardot
pardot.com
# Fathom
# https://usefathom.com/docs/settings/custom-domains
starman.fathomdns.com
# Lead Forensics
# https://www.reddit.com/r/pihole/comments/g7qv3e/leadforensics_tracking_domains_blacklist/
# No real-world data but the website doesn't hide what it does
ghochv3eng.trafficmanager.net
# Branch.io
thirdparty.bnc.lt
# Plausible.io
custom.plausible.io
# DataUnlocker
# A bit different, as it is a proxy to non-first-party tracker scripts,
# but it fits I guess.
smartproxy.dataunlocker.com
# SAS
ci360.sas.com

rules_adblock/.gitignore vendored Normal file

@ -0,0 +1,2 @@
*.custom.txt
*.cache.txt

rules_asn/.gitignore vendored Normal file

@ -0,0 +1,2 @@
*.custom.txt
*.cache.txt

rules_asn/first-party.txt Normal file

@ -0,0 +1,10 @@
# Eulerian
AS50234
# Criteo
AS44788
AS19750
AS55569
# Webtrekk
AS60164
# Act-On Software
AS393648

rules_hosts/.gitignore vendored Normal file

@ -0,0 +1,2 @@
*.custom.txt
*.cache.txt

rules_ip/.gitignore vendored Normal file

@ -0,0 +1,2 @@
*.custom.txt
*.cache.txt

run_tests.py Executable file

@ -0,0 +1,75 @@
#!/usr/bin/env python3
import database
import os
import logging
import csv
TESTS_DIR = "tests"
if __name__ == "__main__":
DB = database.Database()
log = logging.getLogger("tests")
for filename in os.listdir(TESTS_DIR):
if not filename.lower().endswith(".csv"):
continue
log.info("")
log.info("Running tests from %s", filename)
path = os.path.join(TESTS_DIR, filename)
with open(path, "rt") as fdesc:
count_ent = 0
count_all = 0
count_den = 0
pass_ent = 0
pass_all = 0
pass_den = 0
reader = csv.DictReader(fdesc)
for test in reader:
log.debug("Testing %s (%s)", test["url"], test["comment"])
count_ent += 1
passed = True
for allow in test["allow"].split(":"):
if not allow:
continue
count_all += 1
if any(DB.get_domain(allow)):
log.error("False positive: %s", allow)
passed = False
else:
pass_all += 1
for deny in test["deny"].split(":"):
if not deny:
continue
count_den += 1
if not any(DB.get_domain(deny)):
log.error("False negative: %s", deny)
passed = False
else:
pass_den += 1
if passed:
pass_ent += 1
perc_ent = (100 * pass_ent / count_ent) if count_ent else 100
perc_all = (100 * pass_all / count_all) if count_all else 100
perc_den = (100 * pass_den / count_den) if count_den else 100
log.info(
(
"%s: Entries %d/%d (%.2f%%)"
" | Allow %d/%d (%.2f%%)"
" | Deny %d/%d (%.2f%%)"
),
filename,
pass_ent,
count_ent,
perc_ent,
pass_all,
count_all,
perc_all,
pass_den,
count_den,
perc_den,
)

subdomains/.gitignore vendored Normal file

@ -0,0 +1,2 @@
*.custom.list
*.cache.list

temp/.gitignore vendored Normal file

@ -0,0 +1,3 @@
*.list
*.txt
*.csv

tests/.gitignore vendored Normal file

@ -0,0 +1 @@
*.cache.csv


@ -0,0 +1,6 @@
url,allow,deny,comment
https://support.apple.com,support.apple.com,,EdgeKey / AkamaiEdge
https://www.pinterest.fr/,i.pinimg.com,,Cedexis
https://www.tumblr.com/,66.media.tumblr.com,,ChiCDN
https://www.skype.com/fr/,www.skype.com,,TrafficManager
https://www.mitsubishicars.com/,www.mitsubishicars.com,,Tracking domain as reverse DNS

tests/first-party.csv Normal file

@ -0,0 +1,28 @@
url,allow,deny,comment
https://www.red-by-sfr.fr/,static.s-sfr.fr,nrg.red-by-sfr.fr,Eulerian
https://www.cbc.ca/,,smetrics.cbc.ca,2o7 | Omniture | Adobe Experience Cloud
https://www.mytoys.de/,,web.mytoys.de,Webtrekk
https://www.baur.de/,,tp.baur.de,Otto Group
https://www.liligo.com/,,compare.liligo.com,???
https://www.boulanger.com/,,tag.boulanger.fr,TagCommander
https://www.airfrance.fr/FR/,,tk.airfrance.fr,Wizaly
https://www.vsgamers.es/,,marketing.net.vsgamers.es,Affex
https://www.vacansoleil.fr/,,tdep.vacansoleil.fr,TraceDock
https://www.ozmall.co.jp/,,js.enhance.co.jp,GENIEE
https://www.thetimes.co.uk/,,cmp.thetimes.co.uk,SP-Prod
https://agilent.com/,,seahorseinfo.agilent.com,Act-On Software
https://halifax.co.uk/,,cem.halifax.co.uk,eum-appdynamics.com
https://www.reallygoodstuff.com/,,refer.reallygoodstuff.com,Extole
https://unity.com/,,eloqua-trackings.unity.com,Eloqua
https://www.notino.gr/,,api.campaigns.notino.com,Exponea
https://www.mytoys.de/,,0815.mytoys.de.adclear.net,adclear.net
https://www.imbc.com/,,ads.imbc.com.contentsfeed.com,contentsfeed.com
https://www.cbdbiocare.com/,,affiliate.cbdbiocare.com,postaffiliatepro.com
https://www.seatadvisor.com/,,marketing.seatadvisor.com,Sugar Market (Salesfusion)
https://www.tchibo.de/,,tagm.tchibo.de,Exactag
https://www.bouygues-immobilier.com/,,go.bouygues-immobilier.fr,Pardot
https://caddyserver.com/,,mule.caddysever.com,Fathom
Reddit.com mail notifications,,click.redditmail.com,Branch.io
https://www.phpliveregex.com/,,yolo.phpliveregex.xom,Plausible.io
https://www.earthclassmail.com/,,1avhg3kanx9.www.earthclassmail.com,DataUnlocker
https://paulfredrick.com/,,execution-ci360.paulfredrick.com,SAS

validate_list.py Executable file

@ -0,0 +1,35 @@
#!/usr/bin/env python3
# pylint: disable=C0103
"""
Filter out invalid domain names
"""
import database
import argparse
import sys
if __name__ == '__main__':
# Parsing arguments
parser = argparse.ArgumentParser(
description="Filter out invalid domain name/ip addresses from a list.")
parser.add_argument(
'-i', '--input', type=argparse.FileType('r'), default=sys.stdin,
help="Input file, one element per line")
parser.add_argument(
'-o', '--output', type=argparse.FileType('w'), default=sys.stdout,
help="Output file, one element per line")
parser.add_argument(
'-d', '--domain', action='store_true',
help="Can be domain name")
parser.add_argument(
'-4', '--ip4', action='store_true',
help="Can be IP4")
args = parser.parse_args()
for line in args.input:
line = line[:-1].lower()
if (args.domain and database.Database.validate_domain(line)) or \
(args.ip4 and database.Database.validate_ip4address(line)):
print(line, file=args.output)


@ -1,52 +0,0 @@
https://oui.sncf/
https://www.voyage-prive.com/
https://www.odalys-vacances.com/
https://www.homair.com/
https://www.melia.com/
https://www.locasun.fr/
https://www.belambra.fr/
http://www.xl.com/
https://www.bordeaux.aeroport.fr/
https://www.easyvoyage.com/
https://www.leon-de-bruxelles.fr/
https://www.sarenza.com/
https://www.laredoute.fr/
https://www.galerieslafayette.com/
https://www.celio.com/
https://vente-unique.com/
https://www.francoisesaget.com/
https://www.histoiredor.com/
https://www.brandalley.fr/
https://www.fleurancenature.fr/
https://www.chausport.com/
https://www.i-run.fr/
https://fr.smallable.com/
https://www.habitat.fr/
https://www.bhv.fr/
https://www.sfr.fr/
https://www.red-by-sfr.fr/
https://www.masmovil.es/
https://www.yoigo.com/
http://www.fnacdarty.com/
https://www.fnac.com/
https://www.darty.com/
http://www.e-leclerc.com/
https://www.monoprix.fr/
https://www.officedepot.fr/
https://www.carrefour-banque.fr/
https://www.banque-casino.fr/
https://mondial-assistance.fr/
https://allianz-voyage.fr/
https://www.bankia.com/
https://www.april-moto.com/
https://www.younited-credit.com/
https://www.fortuneo.fr/
https://www.orpi.com/
https://www.warnerbros.fr/
https://www.canalplus.com/
https://www.skiset.com/
https://www.promofarma.com/
https://www.toner.fr/
https://www.rentacar.fr/
https://vivatechnology.com/
https://www.liberation.fr/

websites/.gitignore vendored Normal file

@@ -0,0 +1 @@
*.custom.list


@@ -0,0 +1 @@
https://www.ubs.com/


@@ -0,0 +1,75 @@
http://ao.com/
https://www.asus.com/
http://www.absolut.com/
http://www.adobe.com/
http://www.afterbuzztv.com/
http://www.airbnb.com/
http://www.alliantcreditunion.org/
http://www.ankama-games.com/
http://www.attraqt.com/
http://www.audi.com/
http://www.autotrader.com/
http://www.bangkokbank.com/
http://www.banzai.it/
http://www.bestbuy.com/
http://www.bigfishgames.com/
http://www.bostonscientific.com/
http://www.radio-canada.ca/
https://www.cashflows.com/
http://www.concur.com/
http://www.chinesecio.com/
http://corporate.crownmedia.com/
https://watch.dazn.com/
http://www.disa.mil/
https://www.douglas.de/
http://www.ets.org/
http://www.easy-forex.com/
http://www.fiat.com/
http://www.fidor.com/
http://www.frankandoak.com/
http://www.fubo.tv/
https://corp.gree.net/
https://www.gymgrossisten.com/
http://www.halfpricedrapes.com/
https://www.hotstar.com/
https://www.iqiyi.com/
http://www.iracing.com/
http://www.mallgroup.com/
https://www.investisdigital.com/
https://www.linenchest.com/
https://www.luisaviaroma.com/
https://www.mcnc.org/
http://www.mauijim.com/
https://www.mediacorp.sg/
http://www.cr.mufg.jp/
http://www.nbcolympics.com/
https://www.ndtv.com/
http://www.nrcs.usda.gov/
http://www.oshean.org/
https://www.ocado.com/
http://www.ottogroup.com/
https://watch.dazn.com/
http://www.philips.com/
http://www.printplanet.de/
http://www.rabobank.com/
https://corp.roblox.com/
http://www.sinet.com.kh/
http://www.schneider.de/
https://thewest.com.au/
https://www.shopdirect.com/
http://www.siemens.com/
http://www.sky.it/
https://www.sc.com/
http://www.stylesha.re/
http://www.tv2.dk/
http://www.grammy.org/
https://www.topcon.co.jp/
http://www.usnews.com/
http://www.ubisoft.com/
http://www.unionbankph.com/
http://www.urbn.com/
http://www.waters.com/
https://www.xero.com/
https://www.esky.com/
https://www.iheartmedia.com/


@@ -0,0 +1,90 @@
https://www.rte.ie/
https://www.bbc.com/
https://www.saint-gobain.com/
https://www.sbb.ch/
http://www.rfi.fr/
https://www.france24.com/
https://www.mc-doualiya.com/
https://www.francemediasmonde.com/
https://www.kmmediagroup.co.uk/
https://www.europages.fr/
https://www.ovh.com/
http://www.sa.areva.com/
https://www.orano.group/
https://www.evaluate.com/
https://www.laposte.fr/
https://www.colissimo.fr/
https://www.nrjmobile.fr/
https://www.parisaeroport.fr/
https://www.michelin.fr/
https://www.groupeseb.com/
https://www.seb.fr/
https://www.corkinternationalairporthotel.com/
https://www.donedeal.ie/
https://rmc.bfmtv.com/
https://rmcsport.bfmtv.com/
https://www.mma.fr/
http://banquepopulaire.fr/
https://www.printempsfrance.com/
https://www.pagesjaunes.fr/
https://www.nocibe.fr/
https://e24.no/
https://www.01net.com/
https://www.europe1.fr/
https://www.meilleurtaux.com/
https://www.nexity.fr/
https://www.bestwestern.com/content/
https://www.allsuites-apparthotel.com/
https://www.apec.fr/
https://www.cadremploi.fr/
https://www.eni.com/
https://mappy.com/
https://www.arte.tv/
https://conseil-constitutionnel.fr/
https://www.lcl.fr/
https://www.axa.fr/
https://www.huffpost.com/
https://www.challenges.fr/
https://www.netto.fr/
https://www.boursorama-banque.com/
https://www.marianne.net/
https://www.mediapart.fr/
https://www.tifco.com/
https://www.thalys.com/
https://schibsted.com/
https://www.se.com/
https://www.gouvernement.fr/
https://www.afm-telethon.fr/
https://www.pneus-online.fr/
https://www.lepoint.fr/
http://www.e-leclerc.com/
https://www.logic-immo.com/
https://www.longchamp.com/
https://www.maaf.fr/
https://www.futuroscope.com/
https://www.infojobs.net/
https://www.intermarche.com/
https://www.supercasino.fr/
https://www.chronopost.fr/
https://www.cic.fr/
https://www.courrierinternational.com/
https://www.credit-agricole.fr/
https://www.telekom.com/
https://www.bfmtv.com/
https://www.caisse-epargne.fr/
https://www.calor.fr/
https://www.groupebayard.com/fr/
https://www.bayard-jeunesse.com/
https://www.radiofrance.fr/
https://www.liberation.fr/
https://www.nrj.fr/
https://www.lemonde.fr/
https://www.societegenerale.fr/
https://www.pole-emploi.fr/accueil/
https://www.tf1.fr/
https://www.leboncoin.fr/
https://groupebpce.com/
https://www.france.tv/
https://www.total.com/
http://www.lagardere.com/
https://rakuten.com/


@@ -0,0 +1,82 @@
http://www.dholic.co.jp/
https://materialesdefabrica.com/
https://www.lecreuset.com/
https://www.intersport.fr/
https://www.feiradamadrugadasp.com.br/
https://www.wetteronline.de/
https://www.wolfandbadger.com/
https://www.readers.com/
https://www.fossil.com/
https://www.gemo.fr/
https://www.burda-forward.de/
https://www.bakeca.it/
https://www.sarenza.com/
https://www.mytoys.com/
https://tour2000.co.kr
https://theluxurycloset.com/
https://www.lovebonito.com/
https://www.bever.nl/
https://www.shipt.com/
https://www.petermanningnyc.com/
https://www.fashionvalet.com/
https://remixshop.com/
https://lagirl.co.kr/
https://www.avva.com.tr/
https://www.stella.nl/
https://www.maiutazas.hu/
http://www.dynacraftwheels.com/
https://www.itaka.pl/
https://www.inveon.com.tr/
https://www.dr.com.tr/
http://www.lfmall.co.kr/
https://www.beymen.com/
https://www.reebok.com/
https://www.mlmparts.com/
https://www.flyin.com/
https://www.garantibbva.com.tr/
http://www.fiat.com.tr/
https://warburtons.co.uk/
http://www.shark.com/
https://www.latam.com/
https://agilone.com/
https://www.clarks.co.uk/
https://www.joom.com/
https://www.adjust.com/
https://www.tugo.com.vn/
https://www.tatacliq.com/
https://www.valmano.de/
https://www.ab-inbev.com/
https://www.sephora.com/
https://www.sephora.fr/
https://www.officedepot.com/
http://www.officedepot.eu/
https://www.officedepot.fr/
https://www.journey.com.tr/
https://group.jumia.com/
https://www.jumia.com.ng/
http://us.vibram.com/
http://eu.vibram.com/
https://sssports.com/
https://www.theiconic.com.au/
https://spiegel.media/
https://www.halfpricedrapes.com/
https://striderbikes.com/
https://www.promod.fr/
https://www.philips.com/
https://www.hp.com/
https://www.edmunds.com/
https://www.kkfashion.vn/
https://www.newlook.com/
https://www.fragrancenet.com/
https://www.microsoft.com/
https://xbox.com/
https://www.nykaa.com/
https://www.cheapoair.com/
https://www.diageo.com/
https://trimfit.com/
https://www.vax.co.uk/
https://www.laredoute.fr/
https://www.newlook.com/
https://www.softsurroundings.com/
https://www.ebay.fr/


@@ -0,0 +1,76 @@
https://www.liberation.fr/
https://www.brandalley.fr/
https://www.greenweez.com/
https://www.melijoe.com/eu/
http://www.laforet.com/
https://www.younited-credit.com/
https://www.mathon.fr/
https://destinia.com/
https://www.habitat.fr/
https://www.vente-unique.com/
https://www.deguisetoi.fr/
https://www.voyage-prive.it/login/index
https://www.madeindesign.com/
https://www.nrjmobile.fr/
https://en.smallable.com/
https://www.voyage-prive.es/login/index
https://www.voyage-prive.de/login/index
https://www.histoiredor.com/fr/histoire-or
https://www.maeva.com/fr-fr/
https://www.voyage-prive.co.uk/login/index
https://www.aujourdhui.com/
https://www.loisirsencheres.com/
https://www.consobaby.com/
https://www.rentacar.fr/
https://www.ugap.fr/
https://www.ponant.com/
https://www.voyage-prive.ch/login/index
https://www.auchantelecom.fr/
https://www.toner.fr/
https://fr.vente-unique.ch/
https://www.iahorro.com/
https://www.vente-unique.it/
https://www.millet.fr/
https://www.venta-unica.com/
https://www.photobox.de/
https://www.futuroscope.com/
https://warnerbros.fr/
https://destinia.ir/
https://www.vegaoo.de/
https://www.fleurancenature.fr/
https://www.palladiumhotelgroup.com/en/
https://www.dcrussia.ru/
https://www.homair.com/
https://www.moonpig.com.au/
https://www.casden.fr/
https://www.madeindesign.co.uk/
https://www.voyage-prive.be/login/index
https://www.vegaoo.es/
https://destinia.co.uk/
https://www.hofmann.pt/
https://www.roxy-russia.ru/
https://www.francoisesaget.com/fr/
https://www.skiset.com/
https://www.millet-mountain.com/
https://www.chausport.com/
https://www.unclejeans.com/
https://www.vegaooparty.com/
https://www.madeindesign.de/
https://www.vegaoo.nl/
https://www.boulangerie.org/
https://www.habitat.eu/
https://www.habitat.net/
https://www.lafrancedunordausud.fr/
https://www.lesnouvellesdelaboulangerie.fr/
https://www.natiloo.com/
https://wecanimal.pt/
https://www.habitatstore.no/no/
https://fr.vente-unique.be/
https://www.madeindesign.it/
https://piensoymascotas.com/
https://destinia.be/
https://www.skiset.co.uk/
http://www.sarenza.ch/
https://www.habitat.de/
https://www.skiset.de/
https://destinia.com.br/


@@ -0,0 +1,545 @@
https://01net.com/
https://1001neumaticos.es/
https://acadomia.fr/
https://access-moto.com/
https://achatdesign.com/
https://achatdesign.com/
https://achat-or.com/
https://achat-or-et-argent.fr/
https://admyjob.com/
https://adviso.ca/
https://aegon.es/
https://aeroplan.com/
https://aireuropa.com/
https://allianz-voyage.fr/
https://allrun.fr/
https://april-moto.com/
https://armandthiery.fr/
https://asapparis.com/
https://assurance-sante.com/
https://assurances-france-loisirs.com/
https://assurances-titulaires.com/
https://assurandme.fr/
https://assuronline.com/
https://assuronline.com/
https://auchantelecom.fr/
https://audi.fr/
https://audifrance.fr/
https://audika.com/
https://aureya.com/
https://avatacar.com/
https://ayads.co/
https://bankia.es/
https://bcassurance.fr/
https://bcfinance.fr/
https://bebloom.com/
https://beinsports.com/
https://belambra.com/
https://belambra.co.uk/
https://bernardtapie.com/
https://bfmtv.com/
https://bforbank.com/
https://blesscollectionhotels.com/
https://bongo.be/
https://bongo.nl/
https://brandalley.be/
https://brandalleybyme.fr/
https://brandalley.co.nl/
https://brandalley.de/
https://brandalley.es/
https://brandalley.it/
https://brookeo.fr/
https://caci-online.fr/
https://campagne-audition.fr/
https://campagnes-france.com/
https://capifrance.fr/
https://carrefour-banque.fr/
https://carrefour.com/
https://carrefour.fr/
https://cartecarburant.leclerc/
https://carventura.com/
https://catimini.com/
https://celio.com/
https://chausport.com/
https://ciblo.net/
https://citadium.com/
https://clubavantages.net/
https://coffrefortplus.com/
https://communaute3suisses.fr/
https://comprendrechoisir.com/
https://compteczam.fr/
https://comptoirdescotonniers.com/
https://comptoirdescotonniers.co.uk/
https://comptoirdescotonniers.de/
https://comptoirdescotonniers.es/
https://comptoirdescotonniers.eu/
https://conforama.es/
https://conforama.pt/
https://corporate.com/
https://corsair.ca/
https://corsair.ci/
https://corsair.fr/
https://corsair.gp/
https://corsair.mq/
https://corsair.re/
https://corsair.sn/
https://cossettetourisme.com/
https://cpa-france.org/
https://creditec.fr/
https://credithypo.com/
https://credit-pret-hypothecaire.com/
https://crossnutrition.com/
https://culture.leclerc/
https://darty.com/
https://dcshoes-europe.com/
https://deguisetoi.fr/
https://destinia.ad/
https://destinia.ae/
https://destinia.asia/
https://destinia.at/
https://destinia.be/
https://destinia.cat/
https://destinia.ch/
https://destinia.cl/
https://destinia.cn/
https://destinia.co/
https://destinia.co.il/
https://destinia.com/
https://destinia.com.ar/
https://destinia.com.au/
https://destinia.com.br/
https://destinia.com.eg/
https://destinia.com.pa/
https://destinia.com.tr/
https://destinia.com.ua/
https://destinia.co.no/
https://destinia.co.ro/
https://destinia.co.uk/
https://destinia.co.za/
https://destinia.cr/
https://destinia.cz/
https://destinia.de/
https://destinia.dk/
https://destinia.do/
https://destinia.ec/
https://destinia.fr/
https://destinia.gr/
https://destinia.gt/
https://destinia.hu/
https://destinia.ie/
https://destinia.in/
https://destinia.ir/
https://destinia.is/
https://destinia.it/
https://destinia.jp/
https://destinia.kr/
https://destinia.lt/
https://destinia.lv/
https://destinia.ly/
https://destinia.ma/
https://destinia.mx/
https://destinia.nl/
https://destinia.pe/
https://destinia.pl/
https://destinia.pt/
https://destinia.qa/
https://destinia.ru/
https://destinia.sa/
https://destinia.se/
https://destinia.sg/
https://destinia.sk/
https://destinia.tw/
https://destinia.us/
https://destinia.uy/
https://devialet.com/
https://devred.com/
https://diamant-unique.com/
https://dmp.leclerc/
https://doctipedia.fr/
https://drust.io/
https://eafit.com/
https://easyviaggio.com/
https://easyviajar.com/
https://easyvols.fr/
https://easyvoyage.com/
https://easyvoyage.com/
https://easyvoyage.co.uk/
https://easyvoyage.de/
https://e-cartecadeauleclerc.fr/
https://ecotour.com/
https://eider.com/
https://eidershop.com/
https://eldeseazo.com/
https://e.leclerc/
https://e-leclerc.com/
https://elstarprevention.com/
https://emalu-store.com/
https://etam.com/
https://etam.de/
https://etam.es/
https://eulerian.net/
https://eurotierce.be/
https://evaway.com/
https://evobanco.com/
https://ew3.io/
https://fax-via-internet.it/
https://fdj.fr/
https://fleurancenature.com/
https://fleurancenature.fr/
https://fnac.com/
https://fnac.es/
https://fnacspectacles.com/
https://fnactickets.com/
https://fonestarz.com/
https://fortuneo.fr/
https://franceloisirsvacances.com/
https://francoisesaget.be/
https://francoisesaget.be/
https://francoisesaget.com/
https://franziskasager.de/
https://franziskasager.de/
https://futuroscope.com/
https://futuroscope.mobi/
https://galerieslafayette.com/
https://gestion-assurances.com/
https://granions.fr/
https://grantalexander.com/
https://greenweez.com/
https://greenweez.co.uk/
https://greenweez.de/
https://greenweez.es/
https://greenweez.eu/
https://greenweez.it/
https://groupefsc.com/
https://habitat.de/
https://habitat.fr/
https://habitat.net/
https://hardrockhoteltenerife.com/
https://hipp.fr/
https://histoiredor.com/
https://hofmann.es/
https://hofmann.pt/
https://holidaycheck.fr/
https://homair.com/
https://hoteldeparismontecarlo.com/
https://hotelhermitagemontecarlo.com/
https://hotelsbarriere.com/
https://hrhibiza.com/
https://iahorro.com/
https://io1g.net/
https://iperceptions.com/
https://iperceptions.com/
https://iperceptions.com/
https://iperceptions.com/
https://i-run.fr/
https://jassuremamoto.fr/
https://jassure-ma-voiture-sans-permis.fr/
https://jassuremon3roues.fr/
https://jassuremonauto.fr/
https://jassuremon-camping-car.fr/
https://jassuremonscooter.fr/
https://kauf-unique.at/
https://kauf-unique.de/
https://kidiliz.com/
https://lafrancedunordausud.fr/
https://lafuma-boutique.com/
https://lafuma.com/
https://laredoute.fr/
https://laredoute.pt/
https://lavieimmo.com/
https://leclercbilletterie.com/
https://leclercdrive.fr/
https://leclercvoyages.com/
https://lenergiemoinscher.com/
https://leon-de-bruxelles.fr/
https://leregroupementdecredits.fr/
https://lesbonscommerces.fr/
https://lesbonsservices.fr/
https://leskidunordausud.fr/
https://lespagnedunordausud.fr/
https://lesselectionsskoda.fr/
https://lexpress.fr/
https://liberation.fr/
https://locasun.co.uk/
https://locasun.de/
https://locasun.es/
https://locasun.fr/
https://locasun.it/
https://locasun.nl/
https://locasun-vp.fr/
https://location.e-leclerc.com/
https://location.leclerc/
https://lotoquebec.com/
https://lotoquebec.com/
https://macave.leclerc/
https://madeindesign.ch/
https://madeindesign.com/
https://madeindesign.co.uk/
https://madeindesign.de/
https://madeindesign.it/
https://maeva.com/
https://magnetintell.com/
https://maisonetloisirs.leclerc/
https://masmovil.com/
https://masmovil.es/
https://matby.com/
https://mathon.fr/
https://megustaescribir.com/
https://megustaleer.com/
https://megustaleer.com.co/
https://megustaleer.com.pe/
https://melia.cn/
https://melia.com/
https://melijoe.com/
https://michelin.co.uk/
https://michelin.de/
https://michelin.es/
https://michelin.fr/
https://michelin.nl/
https://miliboo.be/
https://miliboo.ch/
https://miliboo.com/
https://miliboo.co.uk/
https://miliboo.de/
https://miliboo.es/
https://miliboo.it/
https://miliboo.lu/
https://millet.fr/
https://millet-mountain.ch/
https://millet-mountain.com/
https://millet-mountain.de/
https://miropapremama.es/
https://mistergatesdirect.com/
https://mistermenuiserie.com/
https://mis.tourisme-/
https://mixa.fr/
https://molet.com/
https://monalbumphoto.be/
https://monalbumphoto.fr/
https://mondial-assistance.fr/
https://monmedicament-enligne.fr/
https://monnierfreres.com/
https://monnierfreres.com/
https://monnierfreres.co.uk/
https://monnierfreres.de/
https://monnierfreres.eu/
https://monnierfreres.fr/
https://monnierfreres.it/
https://montecarloadvancepurchase.com/
https://montecarlobay.com/
https://monte-carlo-beach.com/
https://montecarlomeeting.com/
https://montecarlosbm-/
https://montecarlosbm.book-secure.com/
https://montecarlosbm.com/
https://montecarloseasonalsale.com/
https://montecarlovirtualtour.com/
https://montecarlowellness.com/
https://monteleone.fr/
https://montreal.org/
https://moonpig.com/
https://motorisationplus.com/
https://mouvement-leclerc.com/
https://mtl.org/
https://muchoviaje.com/
https://multimedia.e-leclerc.com/
https://musique.e-leclerc.com/
https://mydailyhotel.com/
https://myfirstdressing.com/
https://mywarner.warnerbros.fr/
https://natiloo.com/
https://net/
https://netvox-assurances.com/
https://nextseguros.es/
https://nomade-aventure.com/
https://no.photobox.com/
https://nrjmobile.fr/
https://numericable.fr/
https://numericable.fr/
https://numericable.tv/
https://odalys-vacances.com/
https://odalys-vacation-rental.com/
https://officedepot.fr/
https://oki-ni.com/
https://onestep-boutique.com/
https://onestep.fr/
https://oney.es/
https://online.carrefour.fr/
https://ooreka.fr/
https://ooshop.com/
https://optique.e-leclerc.com/
https://orpi.com/
https://oui.sncf/
https://outdoor4pro.com/
https://oxboworld.com/
https://oxbowshop.com/
https://palladiumhotelgroup.com/
https://parapharmacie.leclerc/
https://parapharmacie.leclerc/
https://parfumsclub.de/
https://partenaires-verisure.fr/
https://pcnphysio.com/
https://peachdi.com/
https://pepephone.com/
https://perfumesclub.com/
https://perfumesclub.co.uk/
https://perfumesclub.fr/
https://perfumesclub.it/
https://perfumesclub.nl/
https://perfumesclub.pl/
https://perfumesclub.pt/
https://petit-bateau.be/
https://petit-bateau.co.uk/
https://petit-bateau.de/
https://petit-bateau.fr/
https://petit-bateau.it/
https://peugeot-assurance.fr/
https://photobox.at/
https://photobox.be/
https://photobox.ca/
https://photobox.ch/
https://photobox.com.au/
https://photobox.co.nz/
https://photobox.co.uk/
https://photobox.de/
https://photobox.dk/
https://photobox.es/
https://photobox.fi/
https://photobox.fr/
https://photobox.ie/
https://photobox.it/
https://photobox.nl/
https://photobox.pl/
https://photobox.se/
https://photomoinscher.leclerc/
https://piensoymascotas.com/
https://placedestendances.com/
https://placement-direct.fr/
https://pmubrasil.com.br/
https://pmu.fr/
https://pmu.fr/
https://poeleaboismaison.com/
https://ponant.com/
https://pretunique.fr/
https://primes-energie.leclerc/
https://princessetamtam.com/
https://princessetamtam.co.uk/
https://princessetamtam.de/
https://princessetamtam.eu/
https://privateoutlet.com/
https://privateoutlet.de/
https://privateoutlet.es/
https://privateoutlet.fr/
https://privateoutlet.it/
https://produits-volumineux.e-leclerc.com/
https://promocionesfarma.com/
https://promofarma.com/
https://promovacances.com/
https://quiksilver.eu/
https://rachatdecredit.net/
https://radiateurplus.com/
https://rc.monalbumphoto.be/
https://rc.monalbumphoto.fr/
https://recherche.leclerc/
https://reglotv.e-leclerc.com/
https://rentacar.fr/
https://reunica.com/
https://roxy.eu/
https://rueducommerce.fr/
https://sadyr.es/
https://scooter-assurance.fr/
https://securitasdirect.fr/
https://seevibes.com/
https://sfr.fr/
https://silvergoldtobuy.com/
https://sisley-paris.com/
https://skiset-holidays.com/
https://skiset-holidays.co.uk/
https://skodafabia.fr/
https://skoda.fr/
https://skodasuperb.fr/
https://smallable.com/
https://sncf.com/
https://sport.leclerc/
https://sport.leclerc/
https://swatch.com/
https://swisslife-direct.fr/
https://tartine-et-chocolat.com/
https://tartine-et-chocolat.fr/
https://teamekosport.com/
https://telecommandeonline.com/
https://terrassesmontecarlosbm.com/
https://theushuaiaexperience.com/
https://to-lipton.com/
https://tongalumina.ca/
https://tool-fitness.com/
https://tool-fitness.es/
https://topsante.com/
https://toscane-boutique.fr/
https://tradingsat.com/
https://tremblant.ca/
https://vegaoo.com/
https://vegaoo.co.uk/
https://vegaoo.de/
https://vegaoo.es/
https://vegaoo.it/
https://vegaoo.nl/
https://vegaooparty.com/
https://vegaoopro.com/
https://vegaoo.pt/
https://venta-del-diablo.com/
https://venta-unica.com/
https://ventealapropriete.com/
https://vente-du-diable.com/
https://vente-en-or.com/
https://vente-unique.be/
https://vente-unique.ch/
https://vente-unique.com/
https://vente-unique.it/
https://vente-unique.lu/
https://vente-unique.nl/
https://vente-unique.pt/
https://verif.com/
https://verisure.fr/
https://vin.e-leclerc.com/
https://vip-jardin.com/
https://vivus.es/
https://voyage-prive.be/
https://voyage-prive.ch/
https://voyage-prive.com/
https://voyage-prive.co.uk/
https://voyage-prive.co.uk/
https://voyage-prive.de/
https://voyage-prive.es/
https://voyage-prive.es/
https://voyage-prive.it/
https://voyage-prive.it/
https://voyage-prive.nl/
https://voyage-prive.nl/
https://voyage-prive.pl/
https://voyages-sncf.com/
https://warnerbros.fr/
https://warnerbros.fr/
https://weareknitters.ch/
https://weareknitters.com/
https://weareknitters.co.uk/
https://weareknitters.de/
https://weareknitters.dk/
https://weareknitters.es/
https://weareknitters.fr/
https://weareknitters.it/
https://weareknitters.nl/
https://weareknitters.no/
https://weareknitters.pl/
https://weareknitters.se/
https://wecanimal.pt/
https://yoigo.com/
https://yoigo.es/
https://younited-credit.com/
https://zanzicar.fr/
https://zebestof.com/
https://z-enfant.com/
https://z-eshop.com/
https://zgeneration.com/
https://zive.fr/
https://zone-turf.fr/


@@ -0,0 +1 @@
https://red-by-sfr.fr


@@ -0,0 +1,20 @@
https://www.allianz.fr/
http://www.belambra.fr/
https://www.macif.fr/
https://www.butagaz.fr/
http://www.cartier.fr/
https://www.isilines.fr/
http://www.jaeger-lecoultre.com/
http://www.laredoute.fr/
https://www.lesfurets.com/
https://www.louvrehotels.com/
http://www.mars.com/
https://www.meetic.fr/
https://www.nikon.fr/
https://www.norauto.fr/
https://www.groupe-psa.com/
https://www.rueducommerce.fr/
https://www.transavia.com/
https://www.truffaut.com/
https://www.uniqlo.com/
https://www.vancleefarpels.com/

websites/np6_clients.list Normal file

@@ -0,0 +1,20 @@
https://www.harmonie-mutuelle.fr/
https://www.henkel.fr/
https://www.canalplus.com/
http://www.casino.fr/
https://www.alinea.com/
https://www.enedis.fr/
https://www.ubisoft.com/
https://perfectstaycom.zendesk.com/
https://www.perfectstay.com/
https://www.bricodepot.fr/
https://www.sfr.fr/
http://www.prismamedia.com/
https://www.odalys-vacances.com/
https://www.macif.fr/
https://www.cofinoga.fr/
https://www.boursorama-banque.com/
https://mabanque.bnpparibas/
https://www.oui.sncf/
https://www.younited-credit.com/