Compare commits


No commits in common. "master" and "newworkflow" have entirely different histories.

33 changed files with 522 additions and 931 deletions


@@ -1,5 +0,0 @@
-CACHE_SIZE=536870912
-MASSDNS_HASHMAP_SIZE=1000
-PROFILE=0
-SINGLE_PROCESS=0
-MASSDNS_BINARY=massdns
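These defaults are consumed as environment variables by the master-side scripts. `PROFILE` is read exactly as sketched below (see the `Profiler` hunk in database.py further down); reading the other variables the same way is an assumption, not something this diff shows:

```
import os

# PROFILE is read like this in database.py below; the others are assumed analogous.
do_profile = int(os.environ.get("PROFILE", "0"))
cache_size = int(os.environ.get("CACHE_SIZE", str(512 * 1024 ** 2)))  # 536870912 B = 512 MiB
massdns_binary = os.environ.get("MASSDNS_BINARY", "massdns")
```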

3 .gitignore vendored

@@ -1,5 +1,2 @@
 *.log
 *.p
-.env
-__pycache__
-explanations

21 LICENSE

@@ -1,21 +0,0 @@
-MIT License
-
-Copyright (c) 2019 Geoffrey 'Frogeye' Preud'homme
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.


@@ -2,7 +2,7 @@
 This program is able to generate a list of every hostname being a DNS redirection to a list of DNS zones and IP networks.
-It is primarily used to generate [Geoffrey Frogeye's block list of first-party trackers](https://hostfiles.frogeye.fr) (learn about first-party trackers by following this link).
+It is primarily used to generate [Geoffrey Frogeye's block list of first-party trackers](https://git.frogeye.fr/geoffrey/eulaurarien/src/branch/master/dist/README.md) (learn about first-party trackers by following this link).
 If you want to contribute but don't want to create an account on this forge, contact me the way you like: <https://geoffrey.frogeye.fr>
@@ -18,13 +18,13 @@ This program takes as input:
 It will be able to output hostnames being a DNS redirection to any item in the lists provided.

-DNS records can be locally resolved from a list of subdomains using [MassDNS](https://github.com/blechschmidt/massdns).
+DNS records can either come from [Rapid7 Open Data Sets](https://opendata.rapid7.com/sonar.fdns_v2/) or can be locally resolved from a list of subdomains using [MassDNS](https://github.com/blechschmidt/massdns).
 Those subdomains can either be provided as is, come from [Cisco Umbrella Popularity List](http://s3-us-west-1.amazonaws.com/umbrella-static/index.html), from your browsing history, or from analyzing the traffic a web browser makes when opening an URL (the program provides utility to do all that).

 ## Usage

-Remember you can get an already generated and up-to-date list of first-party trackers from [here](https://hostfiles.frogeye.fr).
+Remember you can get an already generated and up-to-date list of first-party trackers from [here](https://git.frogeye.fr/geoffrey/eulaurarien/src/branch/master/dist/README.md).
 The following is for the people wanting to build their own list.
@@ -34,26 +34,19 @@ Depending on the sources you'll be using to generate the list, you'll need to install
 - [Bash](https://www.gnu.org/software/bash/bash.html)
 - [Coreutils](https://www.gnu.org/software/coreutils/)
-- [Gawk](https://www.gnu.org/software/gawk/)
 - [curl](https://curl.haxx.se)
 - [pv](http://www.ivarch.com/programs/pv.shtml)
 - [Python 3.4+](https://www.python.org/)
 - [coloredlogs](https://pypi.org/project/coloredlogs/) (sorry I can't help myself)
-- [numpy](https://www.numpy.org/)
-- [python-abp](https://pypi.org/project/python-abp/) (only if you intend to use AdBlock rules as a rule source)
 - [massdns](https://github.com/blechschmidt/massdns) in your `$PATH` (only if you have subdomains as a source)
 - [Firefox](https://www.mozilla.org/firefox/) (only if you have websites as a source)
 - [selenium (Python bindings)](https://pypi.python.org/pypi/selenium) (only if you have websites as a source)
 - [selenium-wire](https://pypi.org/project/selenium-wire/) (only if you have websites as a source)
-- [markdown2](https://pypi.org/project/markdown2/) (only if you intend to generate the index webpage)

 ### Create a new database

 The so-called database (in the form of `blocking.p`) is a file storing all the matching entities (ASN, IPs, hostnames, zones…) and every entity leading to it.
-It exists because the list cannot be generated in one pass, as DNS redirection chain links do not have to be input in order.
-You can purge the database of old records by running `./prune.sh`.
-When you remove a source of data, remove its corresponding file in `last_updates` to fix the pruning process.
+For now there's no way to remove data from it, so here's the command to recreate it: `./db.py --initialize`.

 ### Gather external sources
@@ -86,13 +79,6 @@ In each folder:
 Then, run `./import_rules.sh`.

-If you removed rules and you want to remove every record depending on those rules immediately,
-run the following command:
-
-```
-./db.py --prune --prune-before "$(cat "last_updates/rules.txt")" --prune-base
-```
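For context on the command being removed here: `db.py` (later in this diff) takes `--prune-before` as a UNIX timestamp, so `last_updates/rules.txt` presumably holds the timestamp of the last rules import. A rough Python equivalent of the shell one-liner, under that assumption:

```
import subprocess

with open("last_updates/rules.txt") as fd:  # assumed to contain a bare UNIX timestamp
    cutoff = fd.read().strip()

subprocess.run(
    ["./db.py", "--prune", "--prune-before", cutoff, "--prune-base"],
    check=True,
)
```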
 ### Add subdomains

 If you plan to resolve DNS records yourself (as the DNS records datasets are not exhaustive),
@@ -127,36 +113,21 @@ The program will use a list of public nameservers to do that, but you can add your own

 Then, run `./resolve_subdomains.sh`.
 Note that this is a network-intensive process, not in terms of bandwidth, but in terms of packet number.

-> **Note:** Some VPS providers might detect this as a DDoS attack and cut the network access.
+> Some VPS providers might detect this as a DDoS attack and cut the network access.
 > Some Wi-Fi connections can be rendered unusable for other uses, some routers might cease to work.
 > Since massdns does not yet support rate limiting, my best bet was a Raspberry Pi with a slow Ethernet link (Raspberry Pi < 4).

 The DNS records will automatically be imported into the database.
 If you want to re-import the records without re-doing the resolving, just run the last line of the `./resolve_subdomains.sh` script.
+### Import DNS records from Rapid7
+
+Just run `./import_rapid7.sh`.
+This will download about 35 GiB of data, but only the matching records will be stored (about a few MiB for the tracking rules).
+Note the download speed will most likely be limited by the database operation throughput (fast RAM will help).
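For readers unfamiliar with the source: Rapid7's forward-DNS dumps are, to the best of my knowledge, gzipped newline-delimited JSON (an assumption, the diff itself doesn't say), which is why the importer can stream them and keep only matching records. A minimal sketch:

```
import gzip
import json

# Assumed record shape: {"timestamp": ..., "name": ..., "type": ..., "value": ...}
with gzip.open("fdns_cname.json.gz", "rt") as fd:  # hypothetical local file name
    for line in fd:
        record = json.loads(line)
        if record.get("type") == "cname":
            pass  # hand record["name"]/record["value"] to the database only if they match a rule
```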
 ### Export the lists

-For the tracking list, use `./export_lists.sh`, the output will be in the `dist` folder (please change the links before distributing them).
+For the tracking list, use `./export_lists.sh`, the output will be in the `dist` forlder (please change the links before distributing them).

 For other purposes, tinker with the `./export.py` program.
-#### Explanations
-
-Note that if you created an `explanations` folder at the root of the project, a file with a timestamp will be created in it.
-It contains every rule in the database and the reason for their presence (i.e. their dependency).
-This might be useful to track changes between runs.
-
-Every rule has an associated tag with four components:
-
-1. A number: the level of the rule (1 if it is a rule present in the `rules*` folders)
-2. A letter: `F` if first-party, `M` if multi-party.
-3. A letter: `D` if a dupplicate (e.g. `foo.bar.com` if `*.bar.com` is already a rule), `_` if not.
-4. A number: the number of rules relying on this one
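Putting the four components together, a hypothetical explanations line would look roughly like this (invented values; the exact assembly is in `explain()` in database.py below):

```
*.trackercompany.com 1F_3(first-party rule)
```

Read as: a zone rule at level 1, first-party, not a dupplicate, with 3 rules relying on it, sourced from the first-party rule placeholder.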
-### Generate the index webpage
-
-This is the one served on <https://hostfiles.frogeye.fr>.
-Just run `./generate_index.py`.
-
-### Everything
-
-Once you've made sure every step runs fine, you can use `./eulaurarien.sh` to run every step consecutively.


@@ -16,36 +16,25 @@ import abp.filters

 def get_domains(rule: abp.filters.parser.Filter) -> typing.Iterable[str]:
     if rule.options:
         return
-    selector_type = rule.selector["type"]
-    selector_value = rule.selector["value"]
-    if (
-        selector_type == "url-pattern"
-        and selector_value.startswith("||")
-        and selector_value.endswith("^")
-    ):
+    selector_type = rule.selector['type']
+    selector_value = rule.selector['value']
+    if selector_type == 'url-pattern' \
+            and selector_value.startswith('||') \
+            and selector_value.endswith('^'):
         yield selector_value[2:-1]


-if __name__ == "__main__":
+if __name__ == '__main__':

     # Parsing arguments
     parser = argparse.ArgumentParser(
-        description="Extract whole domains from an AdBlock blocking list"
-    )
+        description="Extract whole domains from an AdBlock blocking list")
     parser.add_argument(
-        "-i",
-        "--input",
-        type=argparse.FileType("r"),
-        default=sys.stdin,
-        help="Input file with AdBlock rules",
-    )
+        '-i', '--input', type=argparse.FileType('r'), default=sys.stdin,
+        help="Input file with AdBlock rules")
     parser.add_argument(
-        "-o",
-        "--output",
-        type=argparse.FileType("w"),
-        default=sys.stdout,
-        help="Output file with one rule tracking subdomain per line",
-    )
+        '-o', '--output', type=argparse.FileType('w'), default=sys.stdout,
+        help="Output file with one rule tracking subdomain per line")
     args = parser.parse_args()

     # Reading rules


@@ -14,28 +14,6 @@ import time
 import progressbar
 import selenium.webdriver.firefox.options
 import seleniumwire.webdriver

-import logging
-
-log = logging.getLogger("cs")
-DRIVER = None
-SCROLL_TIME = 10.0
-SCROLL_STEPS = 100
-SCROLL_CMD = f"window.scrollBy(0,document.body.scrollHeight/{SCROLL_STEPS})"
-
-
-def new_driver() -> seleniumwire.webdriver.browser.Firefox:
-    profile = selenium.webdriver.FirefoxProfile()
-    profile.set_preference("privacy.trackingprotection.enabled", False)
-    profile.set_preference("network.cookie.cookieBehavior", 0)
-    profile.set_preference("privacy.trackingprotection.pbmode.enabled", False)
-    profile.set_preference("privacy.trackingprotection.cryptomining.enabled", False)
-    profile.set_preference("privacy.trackingprotection.fingerprinting.enabled", False)
-    options = selenium.webdriver.firefox.options.Options()
-    # options.add_argument('-headless')
-    driver = seleniumwire.webdriver.Firefox(
-        profile, executable_path="geckodriver", options=options
-    )
-    return driver


 def subdomain_from_url(url: str) -> str:
@@ -51,36 +29,34 @@ def collect_subdomains(url: str) -> typing.Iterable[str]:
     Load an URL into an headless browser and return all the domains
     it tried to access.
     """
-    global DRIVER
-    if not DRIVER:
-        DRIVER = new_driver()
-
-    try:
-        DRIVER.get(url)
-        for s in range(SCROLL_STEPS):
-            DRIVER.execute_script(SCROLL_CMD)
-            time.sleep(SCROLL_TIME / SCROLL_STEPS)
-        for request in DRIVER.requests:
-            if request.response:
-                yield subdomain_from_url(request.path)
-    except Exception:
-        log.exception("Error")
-        DRIVER.quit()
-        DRIVER = None
+    options = selenium.webdriver.firefox.options.Options()
+    options.add_argument('-headless')
+    driver = seleniumwire.webdriver.Firefox(
+        executable_path='geckodriver', options=options)
+    driver.get(url)
+    time.sleep(10)
+    for request in driver.requests:
+        if request.response:
+            yield subdomain_from_url(request.path)
+    driver.close()


 def collect_subdomains_standalone(url: str) -> None:
     url = url.strip()
     if not url:
         return
-    try:
-        for subdomain in collect_subdomains(url):
-            print(subdomain)
-    except:
-        pass
+    for subdomain in collect_subdomains(url):
+        print(subdomain)


-if __name__ == "__main__":
+if __name__ == '__main__':
     assert len(sys.argv) <= 2
     filename = None
-    if len(sys.argv) == 2 and sys.argv[1] != "-":
+    if len(sys.argv) == 2 and sys.argv[1] != '-':
         filename = sys.argv[1]
         num_lines = sum(1 for line in open(filename))
         iterator = progressbar.progressbar(open(filename), max_value=num_lines)
@@ -90,8 +66,5 @@ if __name__ == "__main__":
     for line in iterator:
         collect_subdomains_standalone(line)

-    if DRIVER:
-        DRIVER.quit()
-
     if filename:
         iterator.close()
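Both versions keep the same generator interface for `collect_subdomains`, so usage is identical on either side; a quick sketch (needs Firefox, geckodriver and selenium-wire installed; the URL is an arbitrary example):

```
for subdomain in collect_subdomains("https://example.com"):
    print(subdomain)
```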


@@ -11,34 +11,37 @@ import coloredlogs
 import pickle
 import numpy
 import math
-import os

 TLD_LIST: typing.Set[str] = set()

-coloredlogs.install(level="DEBUG", fmt="%(asctime)s %(name)s %(levelname)s %(message)s")
+coloredlogs.install(
+    level='DEBUG',
+    fmt='%(asctime)s %(name)s %(levelname)s %(message)s'
+)

 Asn = int
 Timestamp = int
 Level = int


-class Path:
-    # FP add boolean here
+class Path():
     pass


 class RulePath(Path):
     def __str__(self) -> str:
-        return "(rule)"
+        return '(rule)'


 class RuleFirstPath(RulePath):
     def __str__(self) -> str:
-        return "(first-party rule)"
+        return '(first-party rule)'


 class RuleMultiPath(RulePath):
     def __str__(self) -> str:
-        return "(multi-party rule)"
+        return '(multi-party rule)'


 class DomainPath(Path):
@@ -46,7 +49,7 @@ class DomainPath(Path):
         self.parts = parts

     def __str__(self) -> str:
-        return "?." + Database.unpack_domain(self)
+        return '?.' + Database.unpack_domain(self)


 class HostnamePath(DomainPath):
@@ -56,7 +59,7 @@ class HostnamePath(DomainPath):

 class ZonePath(DomainPath):
     def __str__(self) -> str:
-        return "*." + Database.unpack_domain(self)
+        return '*.' + Database.unpack_domain(self)


 class AsnPath(Path):
@@ -76,7 +79,7 @@ class Ip4Path(Path):
         return Database.unpack_ip4network(self)


-class Match:
+class Match():
     def __init__(self) -> None:
         self.source: typing.Optional[Path] = None
         self.updated: int = 0
@@ -92,17 +95,14 @@ class Match:
             return False
         return True

-    def disable(self) -> None:
-        self.updated = 0
-

 class AsnNode(Match):
     def __init__(self) -> None:
         Match.__init__(self)
-        self.name = ""
+        self.name = ''


-class DomainTreeNode:
+class DomainTreeNode():
     def __init__(self) -> None:
         self.children: typing.Dict[str, DomainTreeNode] = dict()
         self.match_zone = Match()
@@ -117,28 +117,20 @@ class IpTreeNode(Match):
 Node = typing.Union[DomainTreeNode, IpTreeNode, AsnNode]

-MatchCallable = typing.Callable[[Path, Match], typing.Any]
+MatchCallable = typing.Callable[[Path,
+                                 Match],
+                                typing.Any]


-class Profiler:
+class Profiler():
     def __init__(self) -> None:
-        do_profile = int(os.environ.get("PROFILE", "0"))
-        if do_profile:
-            self.log = logging.getLogger("profiler")
-            self.time_last = time.perf_counter()
-            self.time_step = "init"
-            self.time_dict: typing.Dict[str, float] = dict()
-            self.step_dict: typing.Dict[str, int] = dict()
-            self.enter_step = self.enter_step_real
-            self.profile = self.profile_real
-        else:
-            self.enter_step = self.enter_step_dummy
-            self.profile = self.profile_dummy
-
-    def enter_step_dummy(self, name: str) -> None:
-        return
-
-    def enter_step_real(self, name: str) -> None:
+        self.log = logging.getLogger('profiler')
+        self.time_last = time.perf_counter()
+        self.time_step = 'init'
+        self.time_dict: typing.Dict[str, float] = dict()
+        self.step_dict: typing.Dict[str, int] = dict()
+
+    def enter_step(self, name: str) -> None:
         now = time.perf_counter()
         try:
             self.time_dict[self.time_step] += now - self.time_last
@@ -149,21 +141,15 @@ class Profiler:
         self.time_step = name
         self.time_last = time.perf_counter()

-    def profile_dummy(self) -> None:
-        return
-
-    def profile_real(self) -> None:
-        self.enter_step("profile")
+    def profile(self) -> None:
+        self.enter_step('profile')
         total = sum(self.time_dict.values())
         for key, secs in sorted(self.time_dict.items(), key=lambda t: t[1]):
             times = self.step_dict[key]
-            self.log.debug(
-                f"{key:<20}: {times:9d} × {secs/times:5.3e} "
-                f"= {secs:9.2f} s ({secs/total:7.2%}) "
-            )
-        self.log.debug(
-            f"{'total':<20}: " f"{total:9.2f} s ({1:7.2%})"
-        )
+            self.log.debug(f"{key:<20}: {times:9d} × {secs/times:5.3e} "
+                           f"= {secs:9.2f} s ({secs/total:7.2%}) ")
+        self.log.debug(f"{'total':<20}: "
+                       f"{total:9.2f} s ({1:7.2%})")


 class Database(Profiler):
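Both variants of `Profiler` are driven the same way: each `enter_step()` closes the previous step, and `profile()` logs totals. A minimal usage sketch (step names invented; on the master side, set `PROFILE=1` in the environment to get real measurements):

```
import time

p = Profiler()
p.enter_step("parse")    # closes the implicit "init" step, opens "parse"
time.sleep(0.1)          # stand-in for real work
p.enter_step("resolve")
time.sleep(0.2)
p.profile()              # logs per-step totals, call counts and percentages
```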
@@ -171,7 +157,9 @@ class Database(Profiler):
     PATH = "blocking.p"

     def initialize(self) -> None:
-        self.log.warning("Creating database version: %d ", Database.VERSION)
+        self.log.warning(
+            "Creating database version: %d ",
+            Database.VERSION)
         # Dummy match objects that everything refer to
         self.rules: typing.List[Match] = list()
         for first_party in (False, True):
@@ -185,77 +173,67 @@
         self.ip4tree = IpTreeNode()

     def load(self) -> None:
-        self.enter_step("load")
+        self.enter_step('load')
         try:
-            with open(self.PATH, "rb") as db_fdsec:
+            with open(self.PATH, 'rb') as db_fdsec:
                 version, data = pickle.load(db_fdsec)
             if version == Database.VERSION:
                 self.rules, self.domtree, self.asns, self.ip4tree = data
                 return
             self.log.warning(
-                "Outdated database version found: %d, " "it will be rebuilt.",
-                version,
-            )
+                "Outdated database version found: %d, "
+                "it will be rebuilt.",
+                version)
         except (TypeError, AttributeError, EOFError):
             self.log.error(
-                "Corrupt (or heavily outdated) database found, " "it will be rebuilt."
-            )
+                "Corrupt (or heavily outdated) database found, "
+                "it will be rebuilt.")
         except FileNotFoundError:
             pass
         self.initialize()

     def save(self) -> None:
-        self.enter_step("save")
-        with open(self.PATH, "wb") as db_fdsec:
+        self.enter_step('save')
+        with open(self.PATH, 'wb') as db_fdsec:
             data = self.rules, self.domtree, self.asns, self.ip4tree
             pickle.dump((self.VERSION, data), db_fdsec)
         self.profile()

     def __init__(self) -> None:
         Profiler.__init__(self)
-        self.log = logging.getLogger("db")
+        self.log = logging.getLogger('db')
         self.load()
         self.ip4cache_shift: int = 32
         self.ip4cache = numpy.ones(1)

     def _set_ip4cache(self, path: Path, _: Match) -> None:
         assert isinstance(path, Ip4Path)
-        self.enter_step("set_ip4cache")
+        self.enter_step('set_ip4cache')
         mini = path.value >> self.ip4cache_shift
-        maxi = (path.value + 2 ** (32 - path.prefixlen)) >> self.ip4cache_shift
+        maxi = (path.value + 2**(32-path.prefixlen)) >> self.ip4cache_shift
         if mini == maxi:
             self.ip4cache[mini] = True
         else:
             self.ip4cache[mini:maxi] = True

-    def fill_ip4cache(self, max_size: int = 512 * 1024 ** 2) -> None:
+    def fill_ip4cache(self, max_size: int = 512*1024**2) -> None:
         """
         Size in bytes
         """
-        if max_size > 2 ** 32 / 8:
-            self.log.warning(
-                "Allocating more than 512 MiB of RAM for "
-                "the Ip4 cache is not necessary."
-            )
-        max_cache_width = int(math.log2(max(1, max_size * 8)))
-        allocated = False
-        cache_width = min(32, max_cache_width)
-        while not allocated:
-            cache_size = 2 ** cache_width
-            try:
-                self.ip4cache = numpy.zeros(cache_size, dtype=bool)
-            except MemoryError:
-                self.log.exception("Could not allocate cache. Retrying a smaller one.")
-                cache_width -= 1
-                continue
-            allocated = True
-        self.ip4cache_shift = 32 - cache_width
+        if max_size > 2**32/8:
+            self.log.warning("Allocating more than 512 MiB of RAM for "
+                             "the Ip4 cache is not necessary.")
+        max_cache_width = int(math.log2(max(1, max_size*8)))
+        cache_width = min(2**32, max_cache_width)
+        self.ip4cache_shift = 32-cache_width
+        cache_size = 2**cache_width
+        self.ip4cache = numpy.zeros(cache_size, dtype=numpy.bool)
         for _ in self.exec_each_ip4(self._set_ip4cache):
             pass

     @staticmethod
     def populate_tld_list() -> None:
-        with open("temp/all_tld.list", "r") as tld_fdesc:
+        with open('temp/all_tld.list', 'r') as tld_fdesc:
             for tld in tld_fdesc:
                 tld = tld.strip()
                 TLD_LIST.add(tld)
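To make the cache arithmetic concrete (a worked example with the master-side constants, not code from the repository):

```
import math

max_size = 512 * 1024 ** 2                           # 536870912 bytes, the CACHE_SIZE default
cache_width = min(32, int(math.log2(max_size * 8)))  # 2**32 one-bit entries fit -> width 32
shift = 32 - cache_width                             # 0: every /32 address gets its own cache bit
# With max_size = 1 MiB instead: width 23, shift 9, one bit per /23 (512 addresses).
```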
@@ -264,7 +242,7 @@ class Database(Profiler):
     def validate_domain(path: str) -> bool:
         if len(path) > 255:
             return False
-        splits = path.split(".")
+        splits = path.split('.')
         if not TLD_LIST:
             Database.populate_tld_list()
         if splits[-1] not in TLD_LIST:
@@ -276,26 +254,26 @@ class Database(Profiler):
     @staticmethod
     def pack_domain(domain: str) -> DomainPath:
-        return DomainPath(domain.split(".")[::-1])
+        return DomainPath(domain.split('.')[::-1])

     @staticmethod
     def unpack_domain(domain: DomainPath) -> str:
-        return ".".join(domain.parts[::-1])
+        return '.'.join(domain.parts[::-1])

     @staticmethod
     def pack_asn(asn: str) -> AsnPath:
         asn = asn.upper()
-        if asn.startswith("AS"):
+        if asn.startswith('AS'):
             asn = asn[2:]
         return AsnPath(int(asn))

     @staticmethod
     def unpack_asn(asn: AsnPath) -> str:
-        return f"AS{asn.asn}"
+        return f'AS{asn.asn}'

     @staticmethod
     def validate_ip4address(path: str) -> bool:
-        splits = path.split(".")
+        splits = path.split('.')
         if len(splits) != 4:
             return False
         for split in splits:
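A quick round-trip of the packing scheme above (identical on both sides apart from quoting; the domain is invented):

```
dp = Database.pack_domain("tracker.example.com")
print(dp.parts)                    # ['com', 'example', 'tracker']
print(Database.unpack_domain(dp))  # 'tracker.example.com'
# Parts are reversed so a zone is a prefix of every hostname beneath it in the tree.
```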
@@ -306,17 +284,12 @@ class Database(Profiler):
             return False
         return True

-    @staticmethod
-    def pack_ip4address_low(address: str) -> int:
-        addr = 0
-        for split in address.split("."):
-            octet = int(split)
-            addr = (addr << 8) + octet
-        return addr
-
     @staticmethod
     def pack_ip4address(address: str) -> Ip4Path:
-        return Ip4Path(Database.pack_ip4address_low(address), 32)
+        addr = 0
+        for split in address.split('.'):
+            addr = (addr << 8) + int(split)
+        return Ip4Path(addr, 32)

     @staticmethod
     def unpack_ip4address(address: Ip4Path) -> str:
@@ -327,12 +300,12 @@ class Database(Profiler):
         for o in reversed(range(4)):
             octets[o] = addr & 0xFF
             addr >>= 8
-        return ".".join(map(str, octets))
+        return '.'.join(map(str, octets))

     @staticmethod
     def validate_ip4network(path: str) -> bool:
         # A bit generous but ok for our usage
-        splits = path.split("/")
+        splits = path.split('/')
         if len(splits) != 2:
             return False
         if not Database.validate_ip4address(splits[0]):
@@ -346,7 +319,7 @@ class Database(Profiler):
     @staticmethod
     def pack_ip4network(network: str) -> Ip4Path:
-        address, prefixlen_str = network.split("/")
+        address, prefixlen_str = network.split('/')
         prefixlen = int(prefixlen_str)
         addr = Database.pack_ip4address(address)
         addr.prefixlen = prefixlen
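The packed value is simply the address as a 32-bit big-endian integer, with the prefix length carried alongside; for example (worked numbers, not repository code):

```
value = (192 << 24) | (0 << 16) | (2 << 8) | 4  # "192.0.2.4" -> 3221225988
net = Database.pack_ip4network("192.0.2.0/24")  # net.value == 3221225984, net.prefixlen == 24
```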
@@ -360,7 +333,7 @@ class Database(Profiler):
         for o in reversed(range(4)):
             octets[o] = addr & 0xFF
             addr >>= 8
-        return ".".join(map(str, octets)) + "/" + str(network.prefixlen)
+        return '.'.join(map(str, octets)) + '/' + str(network.prefixlen)

     def get_match(self, path: Path) -> Match:
         if isinstance(path, RuleMultiPath):
@@ -381,7 +354,7 @@ class Database(Profiler):
             raise ValueError
         elif isinstance(path, Ip4Path):
             dici = self.ip4tree
-            for i in range(31, 31 - path.prefixlen, -1):
+            for i in range(31, 31-path.prefixlen, -1):
                 bit = (path.value >> i) & 0b1
                 dici_next = dici.one if bit else dici.zero
                 if not dici_next:
@@ -391,8 +364,7 @@ class Database(Profiler):
         else:
             raise ValueError

-    def exec_each_asn(
-        self,
+    def exec_each_asn(self,
         callback: MatchCallable,
     ) -> typing.Any:
         for asn in self.asns:
@@ -407,8 +379,7 @@ class Database(Profiler):
         except TypeError:  # not iterable
             pass

-    def exec_each_domain(
-        self,
+    def exec_each_domain(self,
         callback: MatchCallable,
         _dic: DomainTreeNode = None,
         _par: DomainPath = None,
@@ -436,11 +407,12 @@ class Database(Profiler):
         for part in _dic.children:
             dic = _dic.children[part]
             yield from self.exec_each_domain(
-                callback, _dic=dic, _par=DomainPath(_par.parts + [part])
+                callback,
+                _dic=dic,
+                _par=DomainPath(_par.parts + [part])
             )

-    def exec_each_ip4(
-        self,
+    def exec_each_ip4(self,
         callback: MatchCallable,
         _dic: IpTreeNode = None,
         _par: Ip4Path = None,
@@ -464,16 +436,23 @@ class Database(Profiler):
             # addr0 = _par.value & (0xFFFFFFFF ^ (1 << (32-pref)))
             # assert addr0 == _par.value
             addr0 = _par.value
-            yield from self.exec_each_ip4(callback, _dic=dic, _par=Ip4Path(addr0, pref))
+            yield from self.exec_each_ip4(
+                callback,
+                _dic=dic,
+                _par=Ip4Path(addr0, pref)
+            )
         # 1
         dic = _dic.one
         if dic:
-            addr1 = _par.value | (1 << (32 - pref))
+            addr1 = _par.value | (1 << (32-pref))
             # assert addr1 != _par.value
-            yield from self.exec_each_ip4(callback, _dic=dic, _par=Ip4Path(addr1, pref))
+            yield from self.exec_each_ip4(
+                callback,
+                _dic=dic,
+                _par=Ip4Path(addr1, pref)
+            )
-    def exec_each(
-        self,
+    def exec_each(self,
         callback: MatchCallable,
     ) -> typing.Any:
         yield from self.exec_each_domain(callback)
@@ -483,77 +462,36 @@ class Database(Profiler):
     def update_references(self) -> None:
         # Should be correctly calculated normally,
         # keeping this just in case
-        def reset_references_cb(path: Path, match: Match) -> None:
+        def reset_references_cb(path: Path,
+                                match: Match
+                                ) -> None:
             match.references = 0

         for _ in self.exec_each(reset_references_cb):
             pass

-        def increment_references_cb(path: Path, match: Match) -> None:
+        def increment_references_cb(path: Path,
+                                    match: Match
+                                    ) -> None:
             if match.source:
                 source = self.get_match(match.source)
                 source.references += 1

         for _ in self.exec_each(increment_references_cb):
             pass
-    def _clean_deps(self) -> None:
-        # Disable the matches that depends on the targeted
-        # matches until all disabled matches reference count = 0
-        did_something = True
-
-        def clean_deps_cb(path: Path, match: Match) -> None:
-            nonlocal did_something
-            if not match.source:
-                return
-            source = self.get_match(match.source)
-            if not source.active():
-                self._unset_match(match)
-            elif match.first_party > source.first_party:
-                match.first_party = source.first_party
-            else:
-                return
-            did_something = True
-
-        while did_something:
-            did_something = False
-            self.enter_step("pass_clean_deps")
-            for _ in self.exec_each(clean_deps_cb):
-                pass
-
     def prune(self, before: int, base_only: bool = False) -> None:
-        # Disable the matches targeted
-        def prune_cb(path: Path, match: Match) -> None:
-            if base_only and match.level > 1:
-                return
-            if match.updated > before:
-                return
-            self._unset_match(match)
-            self.log.debug("Print: disabled %s", path)
-
-        self.enter_step("pass_prune")
-        for _ in self.exec_each(prune_cb):
-            pass
-
-        self._clean_deps()
-
-        # Remove branches with no match
-        # TODO
+        raise NotImplementedError
     def explain(self, path: Path) -> str:
         match = self.get_match(path)
-        string = str(path)
         if isinstance(match, AsnNode):
-            string += f" ({match.name})"
-        party_char = "F" if match.first_party else "M"
-        dup_char = "D" if match.dupplicate else "_"
-        string += f" {match.level}{party_char}{dup_char}{match.references}"
+            string = f'{path} ({match.name}) #{match.references}'
+        else:
+            string = f'{path} #{match.references}'
         if match.source:
-            string += f"{self.explain(match.source)}"
+            string += f'{self.explain(match.source)}'
         return string
-    def list_records(
-        self,
+    def list_records(self,
         first_party_only: bool = False,
         end_chain_only: bool = False,
         no_dupplicates: bool = False,
@@ -561,7 +499,9 @@ class Database(Profiler):
         hostnames_only: bool = False,
         explain: bool = False,
     ) -> typing.Iterable[str]:

-        def export_cb(path: Path, match: Match) -> typing.Iterable[str]:
+        def export_cb(path: Path, match: Match
+                      ) -> typing.Iterable[str]:
             if first_party_only and not match.first_party:
                 return
             if end_chain_only and match.references > 0:
@@ -580,8 +520,7 @@ class Database(Profiler):
         yield from self.exec_each(export_cb)

-    def count_records(
-        self,
+    def count_records(self,
         first_party_only: bool = False,
         end_chain_only: bool = False,
         no_dupplicates: bool = False,
@@ -612,64 +551,54 @@ class Database(Profiler):
         split: typing.List[str] = list()
         for key, value in sorted(memo.items(), key=lambda s: s[0]):
-            split.append(f"{key[:-4].lower()}s: {value}")
-        return ", ".join(split)
+            split.append(f'{key[:-4].lower()}s: {value}')
+        return ', '.join(split)

     def get_domain(self, domain_str: str) -> typing.Iterable[DomainPath]:
-        self.enter_step("get_domain_pack")
+        self.enter_step('get_domain_pack')
         domain = self.pack_domain(domain_str)
-        self.enter_step("get_domain_brws")
+        self.enter_step('get_domain_brws')
         dic = self.domtree
         depth = 0
         for part in domain.parts:
             if dic.match_zone.active():
-                self.enter_step("get_domain_yield")
+                self.enter_step('get_domain_yield')
                 yield ZonePath(domain.parts[:depth])
-                self.enter_step("get_domain_brws")
+                self.enter_step('get_domain_brws')
             if part not in dic.children:
                 return
             dic = dic.children[part]
             depth += 1
         if dic.match_zone.active():
-            self.enter_step("get_domain_yield")
+            self.enter_step('get_domain_yield')
             yield ZonePath(domain.parts)
         if dic.match_hostname.active():
-            self.enter_step("get_domain_yield")
+            self.enter_step('get_domain_yield')
             yield HostnamePath(domain.parts)

     def get_ip4(self, ip4_str: str) -> typing.Iterable[Path]:
-        self.enter_step("get_ip4_pack")
-        ip4val = self.pack_ip4address_low(ip4_str)
-        self.enter_step("get_ip4_cache")
-        if not self.ip4cache[ip4val >> self.ip4cache_shift]:
+        self.enter_step('get_ip4_pack')
+        ip4 = self.pack_ip4address(ip4_str)
+        self.enter_step('get_ip4_cache')
+        if not self.ip4cache[ip4.value >> self.ip4cache_shift]:
             return
-        self.enter_step("get_ip4_brws")
+        self.enter_step('get_ip4_brws')
         dic = self.ip4tree
-        for i in range(31, -1, -1):
-            bit = (ip4val >> i) & 0b1
+        for i in range(31, 31-ip4.prefixlen, -1):
+            bit = (ip4.value >> i) & 0b1
             if dic.active():
-                self.enter_step("get_ip4_yield")
-                yield Ip4Path(ip4val >> (i + 1) << (i + 1), 31 - i)
-                self.enter_step("get_ip4_brws")
+                self.enter_step('get_ip4_yield')
+                yield Ip4Path(ip4.value >> (i+1) << (i+1), 31-i)
+                self.enter_step('get_ip4_brws')
             next_dic = dic.one if bit else dic.zero
             if next_dic is None:
                 return
             dic = next_dic
         if dic.active():
-            self.enter_step("get_ip4_yield")
-            yield Ip4Path(ip4val, 32)
+            self.enter_step('get_ip4_yield')
+            yield ip4
-    def _unset_match(
-        self,
-        match: Match,
-    ) -> None:
-        match.disable()
-        if match.source:
-            source_match = self.get_match(match.source)
-            source_match.references -= 1
-
-    def _set_match(
-        self,
-        match: Match,
-        updated: int,
-        source: Path,
+    def _set_match(self,
+                   match: Match,
+                   updated: int,
+                   source: Path,
@@ -681,11 +610,8 @@ class Database(Profiler):
         # so it can pass it to save a traversal
         source_match = source_match or self.get_match(source)
         new_level = source_match.level + 1
-        if (
-            updated > match.updated
-            or new_level < match.level
-            or source_match.first_party > match.first_party
-        ):
+        if updated > match.updated or new_level < match.level \
+                or source_match.first_party > match.first_party:
             # NOTE FP and level of matches referencing this one
             # won't be updated until run or prune
             if match.source:
@@ -698,18 +624,20 @@ class Database(Profiler):
             source_match.references += 1
         match.dupplicate = dupplicate

-    def _set_domain(
-        self, hostname: bool, domain_str: str, updated: int, source: Path
-    ) -> None:
-        self.enter_step("set_domain_val")
+    def _set_domain(self,
+                    hostname: bool,
+                    domain_str: str,
+                    updated: int,
+                    source: Path) -> None:
+        self.enter_step('set_domain_val')
         if not Database.validate_domain(domain_str):
             raise ValueError(f"Invalid domain: {domain_str}")
-        self.enter_step("set_domain_pack")
+        self.enter_step('set_domain_pack')
         domain = self.pack_domain(domain_str)
-        self.enter_step("set_domain_fp")
+        self.enter_step('set_domain_fp')
         source_match = self.get_match(source)
         is_first_party = source_match.first_party
-        self.enter_step("set_domain_brws")
+        self.enter_step('set_domain_brws')
         dic = self.domtree
         dupplicate = False
         for part in domain.parts:
@@ -730,14 +658,21 @@ class Database(Profiler):
             dupplicate=dupplicate,
         )

-    def set_hostname(self, *args: typing.Any, **kwargs: typing.Any) -> None:
+    def set_hostname(self,
+                     *args: typing.Any, **kwargs: typing.Any
+                     ) -> None:
         self._set_domain(True, *args, **kwargs)

-    def set_zone(self, *args: typing.Any, **kwargs: typing.Any) -> None:
+    def set_zone(self,
+                 *args: typing.Any, **kwargs: typing.Any
+                 ) -> None:
         self._set_domain(False, *args, **kwargs)

-    def set_asn(self, asn_str: str, updated: int, source: Path) -> None:
-        self.enter_step("set_asn")
+    def set_asn(self,
+                asn_str: str,
+                updated: int,
+                source: Path) -> None:
+        self.enter_step('set_asn')
         path = self.pack_asn(asn_str)
         if path.asn in self.asns:
             match = self.asns[path.asn]
@@ -750,14 +685,17 @@ class Database(Profiler):
             source,
         )

-    def _set_ip4(self, ip4: Ip4Path, updated: int, source: Path) -> None:
-        self.enter_step("set_ip4_fp")
+    def _set_ip4(self,
+                 ip4: Ip4Path,
+                 updated: int,
+                 source: Path) -> None:
+        self.enter_step('set_ip4_fp')
         source_match = self.get_match(source)
         is_first_party = source_match.first_party
-        self.enter_step("set_ip4_brws")
+        self.enter_step('set_ip4_brws')
         dic = self.ip4tree
         dupplicate = False
-        for i in range(31, 31 - ip4.prefixlen, -1):
+        for i in range(31, 31-ip4.prefixlen, -1):
             bit = (ip4.value >> i) & 0b1
             next_dic = dic.one if bit else dic.zero
             if next_dic is None:
@@ -778,22 +716,24 @@ class Database(Profiler):
         )
         self._set_ip4cache(ip4, dic)

-    def set_ip4address(
-        self, ip4address_str: str, *args: typing.Any, **kwargs: typing.Any
-    ) -> None:
-        self.enter_step("set_ip4add_val")
+    def set_ip4address(self,
+                       ip4address_str: str,
+                       *args: typing.Any, **kwargs: typing.Any
+                       ) -> None:
+        self.enter_step('set_ip4add_val')
         if not Database.validate_ip4address(ip4address_str):
             raise ValueError(f"Invalid ip4address: {ip4address_str}")
-        self.enter_step("set_ip4add_pack")
+        self.enter_step('set_ip4add_pack')
         ip4 = self.pack_ip4address(ip4address_str)
         self._set_ip4(ip4, *args, **kwargs)

-    def set_ip4network(
-        self, ip4network_str: str, *args: typing.Any, **kwargs: typing.Any
-    ) -> None:
-        self.enter_step("set_ip4net_val")
+    def set_ip4network(self,
+                       ip4network_str: str,
+                       *args: typing.Any, **kwargs: typing.Any
+                       ) -> None:
+        self.enter_step('set_ip4net_val')
         if not Database.validate_ip4network(ip4network_str):
             raise ValueError(f"Invalid ip4network: {ip4network_str}")
-        self.enter_step("set_ip4net_pack")
+        self.enter_step('set_ip4net_pack')
         ip4 = self.pack_ip4network(ip4network_str)
         self._set_ip4(ip4, *args, **kwargs)
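Stepping back from the formatting churn, the write path is the same on both sides: validate, pack, then attach the entry to its source match. A minimal sketch of driving the class (hypothetical values; assumes `temp/all_tld.list` exists for domain validation):

```
import time

db = Database()  # __init__ -> load(), which falls back to initialize() on first run
now = int(time.time())
db.set_zone("trackercompany.com", now, RuleFirstPath())
db.set_hostname("website1.trackercompany.com", now, RuleFirstPath())
db.save()
```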

38 db.py

@@ -5,37 +5,29 @@ import database
 import time
 import os

-if __name__ == "__main__":
+if __name__ == '__main__':

     # Parsing arguments
-    parser = argparse.ArgumentParser(description="Database operations")
+    parser = argparse.ArgumentParser(
+        description="Database operations")
     parser.add_argument(
-        "-i", "--initialize", action="store_true", help="Reconstruct the whole database"
-    )
+        '-i', '--initialize', action='store_true',
+        help="Reconstruct the whole database")
     parser.add_argument(
-        "-p", "--prune", action="store_true", help="Remove old entries from database"
-    )
+        '-p', '--prune', action='store_true',
+        help="Remove old entries from database")
     parser.add_argument(
-        "-b",
-        "--prune-base",
-        action="store_true",
+        '-b', '--prune-base', action='store_true',
         help="With --prune, only prune base rules "
-        "(the ones added by ./feed_rules.py)",
-    )
+        "(the ones added by ./feed_rules.py)")
     parser.add_argument(
-        "-s",
-        "--prune-before",
-        type=int,
-        default=(int(time.time()) - 60 * 60 * 24 * 31 * 6),
+        '-s', '--prune-before', type=int,
+        default=(int(time.time()) - 60*60*24*31*6),
         help="With --prune, only rules updated before "
-        "this UNIX timestamp will be deleted",
-    )
+        "this UNIX timestamp will be deleted")
     parser.add_argument(
-        "-r",
-        "--references",
-        action="store_true",
-        help="DEBUG: Update the reference count",
-    )
+        '-r', '--references', action='store_true',
+        help="DEBUG: Update the reference count")
     args = parser.parse_args()

     if not args.initialize:
@@ -45,7 +37,7 @@ if __name__ == "__main__":
         os.unlink(database.Database.PATH)

     DB = database.Database()
-    DB.enter_step("main")
+    DB.enter_step('main')
     if args.prune:
         DB.prune(before=args.prune_before, base_only=args.prune_base)
     if args.references:
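One number worth decoding on both sides: the default `--prune-before` cutoff (worked arithmetic, not repository code):

```
import time

cutoff = int(time.time()) - 60 * 60 * 24 * 31 * 6
# 60*60*24 = 86400 s/day; *31*6 = 16070400 s, i.e. about 186 days (~six months).
```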

1 dist/.gitignore vendored

@@ -1,2 +1 @@
 *.txt
-*.html

58 dist/README.md vendored

@@ -12,52 +12,32 @@ In order to block those, one can simply block the hostname `trackercompany.com`,
 However, to circumvent this block, tracker companies made the websites using them load trackers from `somestring.website1.com`.
 The latter is a DNS redirection to `website1.trackercompany.com`, directly to an IP address belonging to the tracking company.
 Those are called first-party trackers.

-On top of the aforementioned privacy issues, they also cause some security issues, as websites usually trust those scripts more.
-For more information, learn about [Content Security Policy](https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP), [same-origin policy](https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy) and [Cross-Origin Resource Sharing](https://enable-cors.org/).

 In order to block those trackers, ad blockers would need to block every subdomain pointing to anything under `trackercompany.com` or to their network.
 Unfortunately, most don't support those blocking methods as they are not DNS-aware, e.g. they only see `somestring.website1.com`.
 This list is an inventory of every `somestring.website1.com` found, to allow non-DNS-aware ad blockers to still block first-party trackers.

-### Learn more
-
-- [CNAME Cloaking, the dangerous disguise of third-party trackers](https://medium.com/nextdns/cname-cloaking-the-dangerous-disguise-of-third-party-trackers-195205dc522a) from NextDNS
-- [Trackers first-party](https://blog.imirhil.fr/2019/11/13/first-party-tracker.html) from Aeris, in French
-- [uBlock Origin issue](https://github.com/uBlockOrigin/uBlock-issues/issues/780)
-- [CNAME Cloaking and Bounce Tracking Defense](https://webkit.org/blog/11338/cname-cloaking-and-bounce-tracking-defense/) on WebKit's blog
-- [Characterizing CNAME cloaking-based tracking](https://blog.apnic.net/2020/08/04/characterizing-cname-cloaking-based-tracking/) on APNIC's website
-- [Characterizing CNAME Cloaking-Based Tracking on the Web](https://tma.ifip.org/2020/wp-content/uploads/sites/9/2020/06/tma2020-camera-paper66.pdf), a research paper from Sokendai and ANSSI
 ## List variants

-### First-party trackers
+### First-party trackers (recommended)

-**Recommended for hostfiles-based ad blockers, such as [Pi-hole](https://pi-hole.net/) (<v5.0, as it introduced CNAME blocking).**
-**Recommended for Android ad blockers as applications, such as [Blokada](https://blokada.org/).**

 - Hosts file: <https://hostfiles.frogeye.fr/firstparty-trackers-hosts.txt>
 - Raw list: <https://hostfiles.frogeye.fr/firstparty-trackers.txt>

 This list contains every hostname redirecting to [a hand-picked list of first-party trackers](https://git.frogeye.fr/geoffrey/eulaurarien/src/branch/master/rules/first-party.list).
 It should be safe from false-positives.
-It also contains all tracking hostnames under company domains (e.g. `website1.trackercompany.com`),
-useful for ad blockers that don't support mass regex blocking,
-while still preventing fallback to third-party trackers.
 Don't be afraid of the size of the list, as this is due to the nature of first-party trackers: a single tracker generates at least one hostname per client (typically two).

 ### First-party only trackers

-**Recommended for ad blockers as web browser extensions, such as [uBlock Origin](https://ublockorigin.com/) (<v1.25.0 or for Chromium-based browsers, as it introduced CNAME uncloaking for Firefox).**

 - Hosts file: <https://hostfiles.frogeye.fr/firstparty-only-trackers-hosts.txt>
 - Raw list: <https://hostfiles.frogeye.fr/firstparty-only-trackers.txt>

-This is the same list as above, albeit not containing the hostnames under the tracking company domains (e.g. `website1.trackercompany.com`).
-This allows for reducing the size of the list for ad blockers that already block those third-party trackers with their support of regex blocking.
-Use in conjunction with other block lists used in regex-mode, such as [Peter Lowe's](https://pgl.yoyo.org/adservers/)
+This is the same list as above, albeit not containing the hostnames under the tracking company domains.
+This reduces the size of the list, but it doesn't prevent third-party tracking either.
+Use in conjunction with other block lists.

 ### Multi-party trackers
@@ -66,23 +46,22 @@ Use in conjunction with other block lists used in regex-mode, such as [Peter Lowe's]
 As first-party trackers usually evolve from third-party trackers, this list contains every hostname redirecting to trackers found in existing lists of third-party trackers (see next section).
 Since the latter were not designed with first-party trackers in mind, they are likely to contain false-positives.
-On the other hand, they might protect against first-party trackers that we're not aware of / have not yet confirmed.
+In the other hand, they might protect against first-party trackers that we're not aware of / have not yet confirmed.

 #### Source of third-party trackers

 - [EasyPrivacy](https://easylist.to/easylist/easyprivacy.txt)
-- [AdGuard](https://github.com/AdguardTeam/AdguardFilters)

-(yes there's only two for now. A lot of existing ones cause a lot of false positives)
+(yes there's only one for now. A lot of existing ones cause a lot of false positives)

 ### Multi-party only trackers

 - Hosts file: <https://hostfiles.frogeye.fr/multiparty-only-trackers-hosts.txt>
 - Raw list: <https://hostfiles.frogeye.fr/multiparty-only-trackers.txt>

-This is the same list as above, albeit not containing the hostnames under the tracking company domains (e.g. `website1.trackercompany.com`).
-This allows for reducing the size of the list for ad blockers that already block those third-party trackers with their support of regex blocking.
-Use in conjunction with other block lists used in regex-mode, such as the ones in the previous section.
+This is the same list as above, albeit not containing the hostnames under the tracking company domains.
+This reduces the size of the list, but it doesn't prevent third-party tracking either.
+Use in conjunction with other block lists, especially the ones used to generate this list in the previous section.
 ## Meta

@@ -90,25 +69,6 @@ In case of false positives/negatives, or any other question contact me the way you like
 The software used to generate this list is available here: <https://git.frogeye.fr/geoffrey/eulaurarien>

-## Acknowledgements
-
 Some of the first-party trackers included in this list have been found by:

 - [Aeris](https://imirhil.fr/)
 - NextDNS and [their blocklist](https://github.com/nextdns/cname-cloaking-blocklist)'s contributors
-- Yuki2718 from [Wilders Security Forums](https://www.wilderssecurity.com/threads/ublock-a-lean-and-fast-blocker.365273/page-168#post-2880361)
-- Ha Dao, Johan Mazel, and Kensuke Fukuda, ["Characterizing CNAME Cloaking-Based Tracking on the Web", Proceedings of IFIP/IEEE Traffic Measurement Analysis Conference (TMA), 9 pages, 2020.](https://tma.ifip.org/2020/wp-content/uploads/sites/9/2020/06/tma2020-camera-paper66.pdf)
-- AdGuard and [their blocklist](https://github.com/AdguardTeam/cname-trackers)'s contributors
-
-The list was generated using data from
-
-- [Cisco Umbrella Popularity List](http://s3-us-west-1.amazonaws.com/umbrella-static/index.html)
-- [Public DNS Server List](https://public-dns.info/)
-
-Similar projects:
-
-- [NextDNS blocklist](https://github.com/nextdns/cname-cloaking-blocklist): for DNS-aware ad blockers
-- [Stefan Froberg's lists](https://www.orwell1984.today/cname/): subset of those lists grouped by tracker
-- [AdGuard blocklist](https://github.com/AdguardTeam/cname-trackers): same thing with a bigger scope, maintained by a bigger team


@@ -1,2 +0,0 @@
-/* Source: https://github.com/jasonm23/markdown-css-themes */
-body{font-family:Helvetica,arial,sans-serif;font-size:14px;line-height:1.6;padding-top:10px;padding-bottom:10px;background-color:#fff;padding:30px}body>:first-child{margin-top:0!important}body>:last-child{margin-bottom:0!important}a{color:#4183c4}a.absent{color:#c00}a.anchor{display:block;padding-left:30px;margin-left:-30px;cursor:pointer;position:absolute;top:0;left:0;bottom:0}h1,h2,h3,h4,h5,h6{margin:20px 0 10px;padding:0;font-weight:700;-webkit-font-smoothing:antialiased;cursor:text;position:relative}h1:hover a.anchor,h2:hover a.anchor,h3:hover a.anchor,h4:hover a.anchor,h5:hover a.anchor,h6:hover a.anchor{text-decoration:none}h1 code,h1 tt{font-size:inherit}h2 code,h2 tt{font-size:inherit}h3 code,h3 tt{font-size:inherit}h4 code,h4 tt{font-size:inherit}h5 code,h5 tt{font-size:inherit}h6 code,h6 tt{font-size:inherit}h1{font-size:28px;color:#000}h2{font-size:24px;border-bottom:1px solid #ccc;color:#000}h3{font-size:18px}h4{font-size:16px}h5{font-size:14px}h6{color:#777;font-size:14px}blockquote,dl,li,ol,p,pre,table,ul{margin:15px 0}hr{border:0 none;color:#ccc;height:4px;padding:0}body>h2:first-child{margin-top:0;padding-top:0}body>h1:first-child{margin-top:0;padding-top:0}body>h1:first-child+h2{margin-top:0;padding-top:0}body>h3:first-child,body>h4:first-child,body>h5:first-child,body>h6:first-child{margin-top:0;padding-top:0}a:first-child h1,a:first-child h2,a:first-child h3,a:first-child h4,a:first-child h5,a:first-child h6{margin-top:0;padding-top:0}h1 p,h2 p,h3 p,h4 p,h5 p,h6 p{margin-top:0}li p.first{display:inline-block}li{margin:0}ol,ul{padding-left:30px}ol :first-child,ul :first-child{margin-top:0}dl{padding:0}dl dt{font-size:14px;font-weight:700;font-style:italic;padding:0;margin:15px 0 5px}dl dt:first-child{padding:0}dl dt>:first-child{margin-top:0}dl dt>:last-child{margin-bottom:0}dl dd{margin:0 0 15px;padding:0 15px}dl dd>:first-child{margin-top:0}dl dd>:last-child{margin-bottom:0}blockquote{border-left:4px solid #ddd;padding:0 15px;color:#777}blockquote>:first-child{margin-top:0}blockquote>:last-child{margin-bottom:0}table{padding:0;border-collapse:collapse}table tr{border-top:1px solid #ccc;background-color:#fff;margin:0;padding:0}table tr:nth-child(2n){background-color:#f8f8f8}table tr th{font-weight:700;border:1px solid #ccc;margin:0;padding:6px 13px}table tr td{border:1px solid #ccc;margin:0;padding:6px 13px}table tr td :first-child,table tr th :first-child{margin-top:0}table tr td :last-child,table tr th :last-child{margin-bottom:0}img{max-width:100%}span.frame{display:block;overflow:hidden}span.frame>span{border:1px solid #ddd;display:block;float:left;overflow:hidden;margin:13px 0 0;padding:7px;width:auto}span.frame span img{display:block;float:left}span.frame span span{clear:both;color:#333;display:block;padding:5px 0 0}span.align-center{display:block;overflow:hidden;clear:both}span.align-center>span{display:block;overflow:hidden;margin:13px auto 0;text-align:center}span.align-center span img{margin:0 auto;text-align:center}span.align-right{display:block;overflow:hidden;clear:both}span.align-right>span{display:block;overflow:hidden;margin:13px 0 0;text-align:right}span.align-right span img{margin:0;text-align:right}span.float-left{display:block;margin-right:13px;overflow:hidden;float:left}span.float-left span{margin:13px 0 0}span.float-right{display:block;margin-left:13px;overflow:hidden;float:right}span.float-right>span{display:block;overflow:hidden;margin:13px auto 0;text-align:right}code,tt{margin:0 2px;padding:0 5px;white-space:nowrap;border:1px solid #eaeaea;background-color:#f8f8f8;border-radius:3px}pre code{margin:0;padding:0;white-space:pre;border:none;background:0 0}.highlight pre{background-color:#f8f8f8;border:1px solid #ccc;font-size:13px;line-height:19px;overflow:auto;padding:6px 10px;border-radius:3px}pre{background-color:#f8f8f8;border:1px solid #ccc;font-size:13px;line-height:19px;overflow:auto;padding:6px 10px;border-radius:3px}pre code,pre tt{background-color:transparent;border:none}sup{font-size:.83em;vertical-align:super;line-height:0}*{-webkit-print-color-adjust:exact}@media screen and (min-width:914px){body{width:854px;margin:0 auto}}@media print{pre,table{page-break-inside:avoid}pre{word-wrap:break-word}}

View file

@ -2,13 +2,8 @@
# Main script for eulaurarien # Main script for eulaurarien
[ ! -f .env ] && touch .env
./fetch_resources.sh ./fetch_resources.sh
./collect_subdomains.sh ./collect_subdomains.sh
./import_rules.sh
./resolve_subdomains.sh ./resolve_subdomains.sh
./prune.sh ./filter_subdomains.sh
./export_lists.sh
./generate_index.py

View file

@ -5,80 +5,53 @@ import argparse
import sys import sys
if __name__ == "__main__": if __name__ == '__main__':
# Parsing arguments # Parsing arguments
parser = argparse.ArgumentParser( parser = argparse.ArgumentParser(
description="Export the hostnames rules stored " "in the Database as plain text" description="Export the hostnames rules stored "
) "in the Database as plain text")
parser.add_argument( parser.add_argument(
"-o", '-o', '--output', type=argparse.FileType('w'), default=sys.stdout,
"--output", help="Output file, one rule per line")
type=argparse.FileType("w"),
default=sys.stdout,
help="Output file, one rule per line",
)
parser.add_argument( parser.add_argument(
"-f", '-f', '--first-party', action='store_true',
"--first-party", help="Only output rules issued from first-party sources")
action="store_true",
help="Only output rules issued from first-party sources",
)
parser.add_argument( parser.add_argument(
"-e", '-e', '--end-chain', action='store_true',
"--end-chain", help="Only output rules that are not referenced by any other")
action="store_true",
help="Only output rules that are not referenced by any other",
)
parser.add_argument( parser.add_argument(
"-r", '-r', '--rules', action='store_true',
"--rules", help="Output all kinds of rules, not just hostnames")
action="store_true",
help="Output all kinds of rules, not just hostnames",
)
parser.add_argument( parser.add_argument(
"-b", '-b', '--base-rules', action='store_true',
"--base-rules",
action="store_true",
help="Output base rules " help="Output base rules "
"(the ones added by ./feed_rules.py) " "(the ones added by ./feed_rules.py) "
"(implies --rules)", "(implies --rules)")
)
parser.add_argument( parser.add_argument(
"-d", '-d', '--no-dupplicates', action='store_true',
"--no-dupplicates",
action="store_true",
help="Do not output rules that already match a zone/network rule " help="Do not output rules that already match a zone/network rule "
"(e.g. dummy.example.com when there's a zone example.com rule)", "(e.g. dummy.example.com when there's a zone example.com rule)")
)
parser.add_argument( parser.add_argument(
"-x", '-x', '--explain', action='store_true',
"--explain",
action="store_true",
help="Show the chain of rules leading to one " help="Show the chain of rules leading to one "
"(and the number of references they have)", "(and the number of references they have)")
)
parser.add_argument( parser.add_argument(
"-c", '-c', '--count', action='store_true',
"--count", help="Show the number of rules per type instead of listing them")
action="store_true",
help="Show the number of rules per type instead of listing them",
)
args = parser.parse_args() args = parser.parse_args()
DB = database.Database() DB = database.Database()
if args.count: if args.count:
assert not args.explain assert not args.explain
print( print(DB.count_records(
DB.count_records(
first_party_only=args.first_party, first_party_only=args.first_party,
end_chain_only=args.end_chain, end_chain_only=args.end_chain,
no_dupplicates=args.no_dupplicates, no_dupplicates=args.no_dupplicates,
rules_only=args.base_rules, rules_only=args.base_rules,
hostnames_only=not (args.rules or args.base_rules), hostnames_only=not (args.rules or args.base_rules),
) ))
)
else: else:
for domain in DB.list_records( for domain in DB.list_records(
first_party_only=args.first_party, first_party_only=args.first_party,
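
Aside: the flags above compose as filters over the same database, and --base-rules implies --rules, so scripted consumers usually shell out to export.py rather than import it. A minimal wrapper sketch, assuming the CLI exactly as diffed above (the helper name is hypothetical):

#!/usr/bin/env python3
# Hypothetical wrapper around ./export.py as shown in this diff.
# Assumes the script is executable and the database is already populated.
import subprocess

def export_count(*flags: str) -> str:
    # Run ./export.py --count with extra flags and return its output.
    proc = subprocess.run(
        ["./export.py", "--count", *flags],
        capture_output=True, text=True, check=True,
    )
    return proc.stdout.strip()

if __name__ == "__main__":
    # Mirrors two of the statistics gathered by export_lists.sh below.
    print("input rules:", export_count("--base-rules"))
    print("output hostnames (no dupplicates):", export_count("--no-dupplicates"))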

View file

@ -5,13 +5,11 @@ function log() {
} }
log "Calculating statistics…" log "Calculating statistics…"
oldest="$(cat last_updates/*.txt | sort -n | head -1)"
oldest_date=$(date -Isec -d @$oldest)
gen_date=$(date -Isec) gen_date=$(date -Isec)
gen_software=$(git describe --tags) gen_software=$(git describe --tags)
number_websites=$(wc -l < temp/all_websites.list) number_websites=$(wc -l < temp/all_websites.list)
number_subdomains=$(wc -l < temp/all_subdomains.list) number_subdomains=$(wc -l < temp/all_subdomains.list)
number_dns=$(grep 'NOERROR' temp/all_resolved.txt | wc -l) number_dns=$(grep '^$' temp/all_resolved.txt | wc -l)
for partyness in {first,multi} for partyness in {first,multi}
do do
@ -22,19 +20,15 @@ do
partyness_flags="" partyness_flags=""
fi fi
rules_input=$(./export.py --count --base-rules $partyness_flags)
rules_found=$(./export.py --count --rules $partyness_flags)
rules_found_nd=$(./export.py --count --rules --no-dupplicates $partyness_flags)
echo
echo "Statistics for ${partyness}-party trackers" echo "Statistics for ${partyness}-party trackers"
echo "Input rules: $rules_input" echo "Input rules: $(./export.py --count --base-rules $partyness_flags)"
echo "Subsequent rules: $rules_found" echo "Subsequent rules: $(./export.py --count --rules $partyness_flags)"
echo "Subsequent rules (no dupplicate): $rules_found_nd" echo "Subsequent rules (no dupplicate): $(./export.py --count --rules --no-dupplicates $partyness_flags)"
echo "Output hostnames: $(./export.py --count $partyness_flags)" echo "Output hostnames: $(./export.py --count $partyness_flags)"
echo "Output hostnames (no dupplicate): $(./export.py --count --no-dupplicates $partyness_flags)" echo "Output hostnames (no dupplicate): $(./export.py --count --no-dupplicates $partyness_flags)"
echo "Output hostnames (end-chain only): $(./export.py --count --end-chain $partyness_flags)" echo "Output hostnames (end-chain only): $(./export.py --count --end-chain $partyness_flags)"
echo "Output hostnames (no dupplicate, end-chain only): $(./export.py --count --no-dupplicates --end-chain $partyness_flags)" echo "Output hostnames (no dupplicate, end-chain only): $(./export.py --count --no-dupplicates --end-chain $partyness_flags)"
echo
for trackerness in {trackers,only-trackers} for trackerness in {trackers,only-trackers}
do do
@ -42,7 +36,7 @@ do
then then
trackerness_flags="" trackerness_flags=""
else else
trackerness_flags="--no-dupplicates" trackerness_flags="--end-chain --no-dupplicates"
fi fi
file_list="dist/${partyness}party-${trackerness}.txt" file_list="dist/${partyness}party-${trackerness}.txt"
file_host="dist/${partyness}party-${trackerness}-hosts.txt" file_host="dist/${partyness}party-${trackerness}-hosts.txt"
@ -55,32 +49,45 @@ do
# so this is done in two steps # so this is done in two steps
sort -u $file_list -o $file_list sort -u $file_list -o $file_list
rules_input=$(./export.py --count --base-rules $partyness_flags)
rules_found=$(./export.py --count --rules $partyness_flags)
rules_output=$(./export.py --count $partyness_flags $trackerness_flags) rules_output=$(./export.py --count $partyness_flags $trackerness_flags)
function link() { # link partyness, link trackerness
url="https://hostfiles.frogeye.fr/${1}party-${2}-hosts.txt"
if [ "$1" = "$partyness" ] && [ "$2" = "$trackerness" ]
then
url="$url (this one)"
fi
echo $url
}
( (
echo "# First-party trackers host list" echo "# First-party trackers host list"
echo "# Variant: ${partyness}-party ${trackerness}" echo "# Variant: ${partyness}-party ${trackerness}"
echo "#" echo "#"
echo "# About first-party trackers: https://hostfiles.frogeye.fr/#whats-a-first-party-tracker" echo "# About first-party trackers: TODO"
echo "# Source code: https://git.frogeye.fr/geoffrey/eulaurarien"
echo "#" echo "#"
echo "# In case of false positives/negatives, or any other question," echo "# In case of false positives/negatives, or any other question,"
echo "# contact me the way you like: https://geoffrey.frogeye.fr" echo "# contact me the way you like: https://geoffrey.frogeye.fr"
echo "#" echo "#"
echo "# Latest versions and variants: https://hostfiles.frogeye.fr/#list-variants" echo "# Latest versions and variants:"
echo "# Source code: https://git.frogeye.fr/geoffrey/eulaurarien" echo "# - First-party trackers : $(link first trackers)"
echo "# License: https://git.frogeye.fr/geoffrey/eulaurarien/src/branch/master/LICENSE" echo "# - … excluding redirected: $(link first only-trackers)"
echo "# Acknowledgements: https://hostfiles.frogeye.fr/#acknowledgements" echo "# - First and third party : $(link multi trackers)"
echo "# - … excluding redirected: $(link multi only-trackers)"
echo '# (variants informations: TODO)'
echo '# (you can remove `-hosts` to get the raw list)'
echo "#" echo "#"
echo "# Generation date: $gen_date"
echo "# Generation software: eulaurarien $gen_software" echo "# Generation software: eulaurarien $gen_software"
echo "# List generation date: $gen_date"
echo "# Oldest record: $oldest_date"
echo "# Number of source websites: $number_websites" echo "# Number of source websites: $number_websites"
echo "# Number of source subdomains: $number_subdomains" echo "# Number of source subdomains: $number_subdomains"
echo "# Number of source DNS records: $number_dns" echo "# Number of source DNS records: ~2E9 + $number_dns"
echo "#" echo "#"
echo "# Input rules: $rules_input" echo "# Input rules: $rules_input"
echo "# Subsequent rules: $rules_found" echo "# Subsequent rules: $rules_found"
echo "# … no dupplicates: $rules_found_nd"
echo "# Output rules: $rules_output" echo "# Output rules: $rules_output"
echo "#" echo "#"
echo echo
@ -89,10 +96,3 @@ do
done done
done done
if [ -d explanations ]
then
filename="$(date -Isec).txt"
./export.py --explain > "explanations/$filename"
ln --force --symbolic "$filename" "explanations/latest.txt"
fi
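
For clarity: the link() helper added on the newworkflow side only maps a (partyness, trackerness) pair to its published URL, tagging the variant currently being generated. A hypothetical Python rendering of the same logic:

# Hypothetical Python equivalent of this script's link() helper.
def link(partyness: str, trackerness: str,
         current: tuple = ("first", "trackers")) -> str:
    url = f"https://hostfiles.frogeye.fr/{partyness}party-{trackerness}-hosts.txt"
    if (partyness, trackerness) == current:
        url += " (this one)"
    return url

for p in ("first", "multi"):
    for t in ("trackers", "only-trackers"):
        print(link(p, t))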

View file

@ -13,54 +13,57 @@ IPNetwork = typing.Union[ipaddress.IPv4Network, ipaddress.IPv6Network]
def get_ranges(asn: str) -> typing.Iterable[str]: def get_ranges(asn: str) -> typing.Iterable[str]:
req = requests.get( req = requests.get(
"https://stat.ripe.net/data/as-routing-consistency/data.json", 'https://stat.ripe.net/data/as-routing-consistency/data.json',
params={"resource": asn}, params={'resource': asn}
) )
data = req.json() data = req.json()
for pref in data["data"]["prefixes"]: for pref in data['data']['prefixes']:
yield pref["prefix"] yield pref['prefix']
def get_name(asn: str) -> str: def get_name(asn: str) -> str:
req = requests.get( req = requests.get(
"https://stat.ripe.net/data/as-overview/data.json", params={"resource": asn} 'https://stat.ripe.net/data/as-overview/data.json',
params={'resource': asn}
) )
data = req.json() data = req.json()
return data["data"]["holder"] return data['data']['holder']
if __name__ == "__main__": if __name__ == '__main__':
log = logging.getLogger("feed_asn") log = logging.getLogger('feed_asn')
# Parsing arguments # Parsing arguments
parser = argparse.ArgumentParser( parser = argparse.ArgumentParser(
description="Add the IP ranges associated to the AS in the database" description="Add the IP ranges associated to the AS in the database")
)
args = parser.parse_args() args = parser.parse_args()
DB = database.Database() DB = database.Database()
def add_ranges( def add_ranges(path: database.Path,
path: database.Path,
match: database.Match, match: database.Match,
) -> None: ) -> None:
assert isinstance(path, database.AsnPath) assert isinstance(path, database.AsnPath)
assert isinstance(match, database.AsnNode) assert isinstance(match, database.AsnNode)
asn_str = database.Database.unpack_asn(path) asn_str = database.Database.unpack_asn(path)
DB.enter_step("asn_get_name") DB.enter_step('asn_get_name')
name = get_name(asn_str) name = get_name(asn_str)
match.name = name match.name = name
DB.enter_step("asn_get_ranges") DB.enter_step('asn_get_ranges')
for prefix in get_ranges(asn_str): for prefix in get_ranges(asn_str):
parsed_prefix: IPNetwork = ipaddress.ip_network(prefix) parsed_prefix: IPNetwork = ipaddress.ip_network(prefix)
if parsed_prefix.version == 4: if parsed_prefix.version == 4:
DB.set_ip4network(prefix, source=path, updated=int(time.time())) DB.set_ip4network(
log.info("Added %s from %s (%s)", prefix, path, name) prefix,
source=path,
updated=int(time.time())
)
log.info('Added %s from %s (%s)', prefix, path, name)
elif parsed_prefix.version == 6: elif parsed_prefix.version == 6:
log.warning("Unimplemented prefix version: %s", prefix) log.warning('Unimplemented prefix version: %s', prefix)
else: else:
log.error("Unknown prefix version: %s", prefix) log.error('Unknown prefix version: %s', prefix)
for _ in DB.exec_each_asn(add_ranges): for _ in DB.exec_each_asn(add_ranges):
pass pass
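
Both helpers in feed_asn.py query public RIPEstat endpoints. A standalone sketch of the same two calls, runnable as-is to inspect the response shapes (the ASN is illustrative, taken from rules_asn below):

#!/usr/bin/env python3
# Minimal sketch of the RIPEstat queries used by feed_asn.py.
import requests

ASN = "AS60164"  # illustrative: the Webtrekk ASN listed in rules_asn

overview = requests.get(
    "https://stat.ripe.net/data/as-overview/data.json",
    params={"resource": ASN},
).json()
print("holder:", overview["data"]["holder"])

consistency = requests.get(
    "https://stat.ripe.net/data/as-routing-consistency/data.json",
    params={"resource": ASN},
).json()
for pref in consistency["data"]["prefixes"]:
    print("prefix:", pref["prefix"])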

View file

@ -12,15 +12,15 @@ Record = typing.Tuple[typing.Callable, typing.Callable, int, str, str]
# select, write # select, write
FUNCTION_MAP: typing.Any = { FUNCTION_MAP: typing.Any = {
"a": ( 'a': (
database.Database.get_ip4, database.Database.get_ip4,
database.Database.set_hostname, database.Database.set_hostname,
), ),
"cname": ( 'cname': (
database.Database.get_domain, database.Database.get_domain,
database.Database.set_hostname, database.Database.set_hostname,
), ),
"ptr": ( 'ptr': (
database.Database.get_domain, database.Database.get_domain,
database.Database.set_ip4address, database.Database.set_ip4address,
), ),
@ -28,56 +28,41 @@ FUNCTION_MAP: typing.Any = {
class Writer(multiprocessing.Process): class Writer(multiprocessing.Process):
def __init__( def __init__(self,
self, recs_queue: multiprocessing.Queue,
recs_queue: multiprocessing.Queue = None,
autosave_interval: int = 0, autosave_interval: int = 0,
ip4_cache: int = 0, ip4_cache: int = 0,
): ):
if recs_queue: # MP
super(Writer, self).__init__() super(Writer, self).__init__()
self.log = logging.getLogger(f'wr')
self.recs_queue = recs_queue self.recs_queue = recs_queue
self.log = logging.getLogger("wr")
self.autosave_interval = autosave_interval self.autosave_interval = autosave_interval
self.ip4_cache = ip4_cache self.ip4_cache = ip4_cache
if not recs_queue: # No MP
self.open_db()
def open_db(self) -> None:
self.db = database.Database()
self.db.log = logging.getLogger("wr")
self.db.fill_ip4cache(max_size=self.ip4_cache)
def exec_record(self, record: Record) -> None:
self.db.enter_step("exec_record")
select, write, updated, name, value = record
try:
for source in select(self.db, value):
write(self.db, name, updated, source=source)
except (ValueError, IndexError):
# ValueError: non-number in IP
# IndexError: IP too big
self.log.exception("Cannot execute: %s", record)
def end(self) -> None:
self.db.enter_step("end")
self.db.save()
def run(self) -> None: def run(self) -> None:
self.open_db() self.db = database.Database()
self.db.log = logging.getLogger(f'wr')
self.db.fill_ip4cache(max_size=self.ip4_cache)
if self.autosave_interval > 0: if self.autosave_interval > 0:
next_save = time.time() + self.autosave_interval next_save = time.time() + self.autosave_interval
else: else:
next_save = 0 next_save = 0
self.db.enter_step("block_wait") self.db.enter_step('block_wait')
block: typing.List[Record] block: typing.List[Record]
for block in iter(self.recs_queue.get, None): for block in iter(self.recs_queue.get, None):
assert block
record: Record record: Record
for record in block: for record in block:
self.exec_record(record)
select, write, updated, name, value = record
self.db.enter_step('feed_switch')
try:
for source in select(self.db, value):
write(self.db, name, updated, source=source)
except ValueError:
self.log.exception("Cannot execute: %s", record)
if next_save > 0 and time.time() > next_save: if next_save > 0 and time.time() > next_save:
self.log.info("Saving database...") self.log.info("Saving database...")
@ -85,44 +70,37 @@ class Writer(multiprocessing.Process):
self.log.info("Done!") self.log.info("Done!")
next_save = time.time() + self.autosave_interval next_save = time.time() + self.autosave_interval
self.db.enter_step("block_wait") self.db.enter_step('block_wait')
self.end()
self.db.enter_step('end')
self.db.save()
class Parser: class Parser():
def __init__( def __init__(self,
self,
buf: typing.Any, buf: typing.Any,
recs_queue: multiprocessing.Queue = None, recs_queue: multiprocessing.Queue,
block_size: int = 0, block_size: int,
writer: Writer = None,
): ):
assert bool(writer) ^ bool(block_size and recs_queue) super(Parser, self).__init__()
self.buf = buf self.buf = buf
self.log = logging.getLogger("pr") self.log = logging.getLogger('pr')
self.recs_queue = recs_queue self.recs_queue = recs_queue
if writer: # No MP
self.prof: database.Profiler = writer.db
self.register = writer.exec_record
else: # MP
self.block: typing.List[Record] = list() self.block: typing.List[Record] = list()
self.block_size = block_size self.block_size = block_size
self.prof = database.Profiler() self.prof = database.Profiler()
self.prof.log = logging.getLogger("pr") self.prof.log = logging.getLogger('pr')
self.register = self.add_to_queue
def add_to_queue(self, record: Record) -> None: def register(self, record: Record) -> None:
self.prof.enter_step("register") self.prof.enter_step('register')
self.block.append(record) self.block.append(record)
if len(self.block) >= self.block_size: if len(self.block) >= self.block_size:
self.prof.enter_step("put_block") self.prof.enter_step('put_block')
assert self.recs_queue
self.recs_queue.put(self.block) self.recs_queue.put(self.block)
self.block = list() self.block = list()
def run(self) -> None: def run(self) -> None:
self.consume() self.consume()
if self.recs_queue:
self.recs_queue.put(self.block) self.recs_queue.put(self.block)
self.prof.profile() self.prof.profile()
@ -130,17 +108,43 @@ class Parser:
raise NotImplementedError raise NotImplementedError
class Rapid7Parser(Parser):
def consume(self) -> None:
data = dict()
for line in self.buf:
self.prof.enter_step('parse_rapid7')
split = line.split('"')
try:
for k in range(1, 14, 4):
key = split[k]
val = split[k+2]
data[key] = val
select, writer = FUNCTION_MAP[data['type']]
record = (
select,
writer,
int(data['timestamp']),
data['name'],
data['value']
)
except IndexError:
self.log.exception("Cannot parse: %s", line)
self.register(record)
class MassDnsParser(Parser): class MassDnsParser(Parser):
# massdns --output Snrql # massdns --output Snrql
# --retry REFUSED,SERVFAIL --resolvers nameservers-ipv4 # --retry REFUSED,SERVFAIL --resolvers nameservers-ipv4
TYPES = { TYPES = {
"A": (FUNCTION_MAP["a"][0], FUNCTION_MAP["a"][1], -1, None), 'A': (FUNCTION_MAP['a'][0], FUNCTION_MAP['a'][1], -1, None),
# 'AAAA': (FUNCTION_MAP['aaaa'][0], FUNCTION_MAP['aaaa'][1], -1, None), # 'AAAA': (FUNCTION_MAP['aaaa'][0], FUNCTION_MAP['aaaa'][1], -1, None),
"CNAME": (FUNCTION_MAP["cname"][0], FUNCTION_MAP["cname"][1], -1, -1), 'CNAME': (FUNCTION_MAP['cname'][0], FUNCTION_MAP['cname'][1], -1, -1),
} }
def consume(self) -> None: def consume(self) -> None:
self.prof.enter_step("parse_massdns") self.prof.enter_step('parse_massdns')
timestamp = 0 timestamp = 0
header = True header = True
for line in self.buf: for line in self.buf:
@ -149,102 +153,74 @@ class MassDnsParser(Parser):
header = True header = True
continue continue
split = line.split(" ") split = line.split(' ')
try: try:
if header: if header:
timestamp = int(split[1]) timestamp = int(split[1])
header = False header = False
else: else:
select, write, name_offset, value_offset = MassDnsParser.TYPES[ select, write, name_offset, value_offset = \
split[1] MassDnsParser.TYPES[split[1]]
]
record = ( record = (
select, select,
write, write,
timestamp, timestamp,
split[0][:name_offset].lower(), split[0][:name_offset],
split[2][:value_offset].lower(), split[2][:value_offset],
) )
self.register(record) self.register(record)
self.prof.enter_step("parse_massdns") self.prof.enter_step('parse_massdns')
except KeyError: except KeyError:
continue continue
PARSERS = { PARSERS = {
"massdns": MassDnsParser, 'rapid7': Rapid7Parser,
'massdns': MassDnsParser,
} }
if __name__ == "__main__": if __name__ == '__main__':
# Parsing arguments # Parsing arguments
log = logging.getLogger("feed_dns") log = logging.getLogger('feed_dns')
args_parser = argparse.ArgumentParser( args_parser = argparse.ArgumentParser(
description="Read DNS records and import " description="Read DNS records and import "
"tracking-relevant data into the database" "tracking-relevant data into the database")
)
args_parser.add_argument("parser", choices=PARSERS.keys(), help="Input format")
args_parser.add_argument( args_parser.add_argument(
"-i", 'parser',
"--input", choices=PARSERS.keys(),
type=argparse.FileType("r"), help="Input format")
default=sys.stdin,
help="Input file",
)
args_parser.add_argument( args_parser.add_argument(
"-b", "--block-size", type=int, default=1024, help="Performance tuning value" '-i', '--input', type=argparse.FileType('r'), default=sys.stdin,
) help="Input file")
args_parser.add_argument( args_parser.add_argument(
"-q", "--queue-size", type=int, default=128, help="Performance tuning value" '-b', '--block-size', type=int, default=1024,
) help="Performance tuning value")
args_parser.add_argument( args_parser.add_argument(
"-a", '-q', '--queue-size', type=int, default=128,
"--autosave-interval", help="Performance tuning value")
type=int,
default=900,
help="Interval to which the database will save in seconds. " "0 to disable.",
)
args_parser.add_argument( args_parser.add_argument(
"-s", '-a', '--autosave-interval', type=int, default=900,
"--single-process", help="Interval to which the database will save in seconds. "
action="store_true", "0 to disable.")
help="Only use one process. " "Might be useful for single core computers.",
)
args_parser.add_argument( args_parser.add_argument(
"-4", '-4', '--ip4-cache', type=int, default=0,
"--ip4-cache",
type=int,
default=0,
help="RAM cache for faster IPv4 lookup. " help="RAM cache for faster IPv4 lookup. "
"Maximum useful value: 512 MiB (536870912). " "Maximum useful value: 512 MiB (536870912). "
"Warning: Depending on the rules, this might already " "Warning: Depending on the rules, this might already "
"be a memory-heavy process, even without the cache.", "be a memory-heavy process, even without the cache.")
)
args = args_parser.parse_args() args = args_parser.parse_args()
parser_cls = PARSERS[args.parser]
if args.single_process:
writer = Writer(
autosave_interval=args.autosave_interval, ip4_cache=args.ip4_cache
)
parser = parser_cls(args.input, writer=writer)
parser.run()
writer.end()
else:
recs_queue: multiprocessing.Queue = multiprocessing.Queue( recs_queue: multiprocessing.Queue = multiprocessing.Queue(
maxsize=args.queue_size maxsize=args.queue_size)
)
writer = Writer( writer = Writer(recs_queue,
recs_queue,
autosave_interval=args.autosave_interval, autosave_interval=args.autosave_interval,
ip4_cache=args.ip4_cache, ip4_cache=args.ip4_cache
) )
writer.start() writer.start()
parser = parser_cls( parser = PARSERS[args.parser](args.input, recs_queue, args.block_size)
args.input, recs_queue=recs_queue, block_size=args.block_size
)
parser.run() parser.run()
recs_queue.put(None) recs_queue.put(None)
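
The Rapid7Parser added on the newworkflow side avoids json.loads() by exploiting the fixed key order of Sonar FDNS/RDNS lines: splitting on double quotes leaves the keys at indices 1, 5, 9 and 13, each value two slots later. A sketch with an illustrative (not real) line:

# How Rapid7Parser's split-on-quotes extraction works.
# The sample line is illustrative, not from a real dataset.
line = '{"timestamp":"1587000000","name":"example.com","type":"cname","value":"tracker.example.net"}'
split = line.split('"')
data = {split[k]: split[k + 2] for k in range(1, 14, 4)}
print(data)
# -> {'timestamp': '1587000000', 'name': 'example.com',
#     'type': 'cname', 'value': 'tracker.example.net'}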

View file

@ -4,36 +4,30 @@ import database
import argparse import argparse
import sys import sys
import time import time
import typing
FUNCTION_MAP = { FUNCTION_MAP = {
"zone": database.Database.set_zone, 'zone': database.Database.set_zone,
"hostname": database.Database.set_hostname, 'hostname': database.Database.set_hostname,
"asn": database.Database.set_asn, 'asn': database.Database.set_asn,
"ip4network": database.Database.set_ip4network, 'ip4network': database.Database.set_ip4network,
"ip4address": database.Database.set_ip4address, 'ip4address': database.Database.set_ip4address,
} }
if __name__ == "__main__": if __name__ == '__main__':
# Parsing arguments # Parsing arguments
parser = argparse.ArgumentParser(description="Import base rules to the database") parser = argparse.ArgumentParser(
description="Import base rules to the database")
parser.add_argument( parser.add_argument(
"type", choices=FUNCTION_MAP.keys(), help="Type of rule inputed" 'type',
) choices=FUNCTION_MAP.keys(),
help="Type of rule inputed")
parser.add_argument( parser.add_argument(
"-i", '-i', '--input', type=argparse.FileType('r'), default=sys.stdin,
"--input", help="File with one rule per line")
type=argparse.FileType("r"),
default=sys.stdin,
help="File with one rule per line",
)
parser.add_argument( parser.add_argument(
"-f", '-f', '--first-party', action='store_true',
"--first-party", help="The input only comes from verified first-party sources")
action="store_true",
help="The input only comes from verified first-party sources",
)
args = parser.parse_args() args = parser.parse_args()
DB = database.Database() DB = database.Database()
@ -49,8 +43,7 @@ if __name__ == "__main__":
for rule in args.input: for rule in args.input:
rule = rule.strip() rule = rule.strip()
try: try:
fun( fun(DB,
DB,
rule, rule,
source=source, source=source,
updated=int(time.time()), updated=int(time.time()),

View file

@ -13,15 +13,10 @@ function dl() {
fi fi
} }
log "Retrieving tests…"
rm -f tests/*.cache.csv
dl https://raw.githubusercontent.com/fukuda-lab/cname_cloaking/master/Subdomain_CNAME-cloaking-based-tracking.csv temp/fukuda.csv
(echo "url,allow,deny,comment"; tail -n +2 temp/fukuda.csv | awk -F, '{ print "https://" $2 "/,," $3 "," $5 }') > tests/fukuda.cache.csv
log "Retrieving rules…" log "Retrieving rules…"
rm -f rules*/*.cache.* rm -f rules*/*.cache.*
dl https://easylist.to/easylist/easyprivacy.txt rules_adblock/easyprivacy.cache.txt dl https://easylist.to/easylist/easyprivacy.txt rules_adblock/easyprivacy.cache.txt
dl https://filters.adtidy.org/extension/chromium/filters/3.txt rules_adblock/adguard.cache.txt
log "Retrieving TLD list…" log "Retrieving TLD list…"
dl http://data.iana.org/TLD/tlds-alpha-by-domain.txt temp/all_tld.temp.list dl http://data.iana.org/TLD/tlds-alpha-by-domain.txt temp/all_tld.temp.list
@ -38,7 +33,7 @@ rm top-1m.csv top-1m.csv.zip
if [ -f subdomains/cisco-umbrella_popularity.cache.list ] if [ -f subdomains/cisco-umbrella_popularity.cache.list ]
then then
cp subdomains/cisco-umbrella_popularity.cache.list temp/cisco-umbrella_popularity.old.list cp subdomains/cisco-umbrella_popularity.cache.list temp/cisco-umbrella_popularity.old.list
pv -f temp/cisco-umbrella_popularity.old.list temp/cisco-umbrella_popularity.fresh.list | sort -u > subdomains/cisco-umbrella_popularity.cache.list pv temp/cisco-umbrella_popularity.old.list temp/cisco-umbrella_popularity.fresh.list | sort -u > subdomains/cisco-umbrella_popularity.cache.list
rm temp/cisco-umbrella_popularity.old.list temp/cisco-umbrella_popularity.fresh.list rm temp/cisco-umbrella_popularity.old.list temp/cisco-umbrella_popularity.fresh.list
else else
mv temp/cisco-umbrella_popularity.fresh.list subdomains/cisco-umbrella_popularity.cache.list mv temp/cisco-umbrella_popularity.fresh.list subdomains/cisco-umbrella_popularity.cache.list

View file

@ -1,25 +0,0 @@
#!/usr/bin/env python3
import markdown2
extras = ["header-ids"]
with open("dist/README.md", "r") as fdesc:
body = markdown2.markdown(fdesc.read(), extras=extras)
output = f"""<!DOCTYPE html>
<html lang="en">
<head>
<title>Geoffrey Frogeye's block list of first-party trackers</title>
<meta charset="utf-8">
<meta name="author" content="Geoffrey 'Frogeye' Preud'homme" />
<link rel="stylesheet" type="text/css" href="markdown7.min.css">
</head>
<body>
{body}
</body>
</html>
"""
with open("dist/index.html", "w") as fdesc:
fdesc.write(output)

26
import_rapid7.sh Executable file
View file

@ -0,0 +1,26 @@
#!/usr/bin/env bash
function log() {
echo -e "\033[33m$@\033[0m"
}
function feed_rapid7_fdns { # dataset
dataset=$1
line=$(curl -s https://opendata.rapid7.com/sonar.fdns_v2/ | grep "href=\".\+-fdns_$dataset.json.gz\"")
link="https://opendata.rapid7.com$(echo "$line" | cut -d'"' -f2)"
log "Reading $(echo "$dataset" | awk '{print toupper($0)}') records from $link"
curl -L "$link" | gunzip
}
function feed_rapid7_rdns {
dataset=$1
line=$(curl -s https://opendata.rapid7.com/sonar.rdns_v2/ | grep "href=\".\+-rdns.json.gz\"")
link="https://opendata.rapid7.com$(echo "$line" | cut -d'"' -f2)"
log "Reading PTR records from $link"
curl -L "$link" | gunzip
}
feed_rapid7_rdns | ./feed_dns.py rapid7
feed_rapid7_fdns a | ./feed_dns.py rapid7 --ip4-cache 536870912
# feed_rapid7_fdns aaaa | ./feed_dns.py rapid7 --ip6-cache 536870912
feed_rapid7_fdns cname | ./feed_dns.py rapid7
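
The feed_rapid7_* helpers scrape the Rapid7 Open Data index pages for the latest dataset link before streaming it through gunzip. A hedged Python equivalent of that link discovery, assuming the same page layout the curl/grep pipeline relies on:

#!/usr/bin/env python3
# Python sketch of import_rapid7.sh's link discovery.
import re
import requests

def latest_fdns_link(dataset: str) -> str:
    # Find the newest -fdns_<dataset>.json.gz href on the index page.
    index = requests.get("https://opendata.rapid7.com/sonar.fdns_v2/").text
    match = re.search(r'href="([^"]*-fdns_' + re.escape(dataset) + r'\.json\.gz)"', index)
    if not match:
        raise RuntimeError("no link found; the page layout may have changed")
    return "https://opendata.rapid7.com" + match.group(1)

print(latest_fdns_link("cname"))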

View file

@ -5,7 +5,7 @@ function log() {
} }
log "Importing rules…" log "Importing rules…"
date +%s > "last_updates/rules.txt" BEFORE="$(date +%s)"
cat rules_adblock/*.txt | grep -v '^!' | grep -v '^\[Adblock' | ./adblock_to_domain_list.py | ./feed_rules.py zone cat rules_adblock/*.txt | grep -v '^!' | grep -v '^\[Adblock' | ./adblock_to_domain_list.py | ./feed_rules.py zone
cat rules_hosts/*.txt | grep -v '^#' | grep -v '^$' | cut -d ' ' -f2 | ./feed_rules.py zone cat rules_hosts/*.txt | grep -v '^#' | grep -v '^$' | cut -d ' ' -f2 | ./feed_rules.py zone
cat rules/*.list | grep -v '^#' | grep -v '^$' | ./feed_rules.py zone cat rules/*.list | grep -v '^#' | grep -v '^$' | ./feed_rules.py zone
@ -18,3 +18,5 @@ cat rules_asn/first-party.txt | grep -v '^#' | grep -v '^$' | ./feed_rules.py as
./feed_asn.py ./feed_asn.py
# log "Pruning old rules…"
# ./db.py --prune --prune-before "$BEFORE" --prune-base

View file

@ -1 +0,0 @@
*.txt

View file

@ -1,9 +0,0 @@
#!/usr/bin/env bash
function log() {
echo -e "\033[33m$@\033[0m"
}
oldest="$(cat last_updates/*.txt | sort -n | head -1)"
log "Pruning every record before ${oldest}"
./db.py --prune --prune-before "$oldest"
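
prune.sh (master side) keys pruning off the oldest timestamp recorded by the import scripts. A sketch of that computation, assuming each last_updates/*.txt holds a single UNIX timestamp as written by date +%s:

# Sketch of prune.sh's "oldest" computation.
import glob

timestamps = [int(open(path).read().strip())
              for path in glob.glob("last_updates/*.txt")]
print("Pruning every record before", min(timestamps))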

View file

@ -1,4 +0,0 @@
coloredlogs>=10
markdown2>=2.4<3
numpy>=1.21<2
python-abp>=0.2<0.3

View file

@ -1,24 +1,19 @@
#!/usr/bin/env bash #!/usr/bin/env bash
source .env.default
source .env
function log() { function log() {
echo -e "\033[33m$@\033[0m" echo -e "\033[33m$@\033[0m"
} }
log "Compiling nameservers…" log "Compiling nameservers…"
pv -f nameservers/*.list | ./validate_list.py --ip4 | sort -u > temp/all_nameservers_ip4.list pv nameservers/*.list | ./validate_list.py --ip4 | sort -u > temp/all_nameservers_ip4.list
log "Compiling subdomains…" log "Compiling subdomain…"
# Sort by last character to utilize the DNS server caching mechanism # Sort by last character to utilize the DNS server caching mechanism
# (not as efficient with massdns but it's almost free so why not) # (not as efficient with massdns but it's almost free so why not)
pv -f subdomains/*.list | ./validate_list.py --domain | rev | sort -u | rev > temp/all_subdomains.list pv subdomains/*.list | ./validate_list.py --domain | rev | sort -u | rev > temp/all_subdomains.list
log "Resolving subdomain…" log "Resolving subdomain…"
date +%s > "last_updates/massdns.txt" massdns --output Snrql --retry REFUSED,SERVFAIL --resolvers temp/all_nameservers_ip4.list --outfile temp/all_resolved.txt temp/all_subdomains.list
"$MASSDNS_BINARY" --output Snrql --hashmap-size "$MASSDNS_HASHMAP_SIZE" --resolvers temp/all_nameservers_ip4.list --outfile temp/all_resolved.txt temp/all_subdomains.list
log "Importing into database…" log "Importing into database…"
[ $SINGLE_PROCESS -eq 1 ] && EXTRA_ARGS="--single-process" pv temp/all_resolved.txt | ./feed_dns.py massdns
pv -f temp/all_resolved.txt | ./feed_dns.py massdns --ip4-cache "$CACHE_SIZE" $EXTRA_ARGS
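
The rev | sort -u | rev trick orders subdomains by their reversed text, so hostnames sharing a zone end up adjacent and consecutive massdns queries tend to hit the same caches. The same ordering in Python, for illustration:

# Sorting by reversed string groups hostnames by TLD and zone,
# mirroring `rev | sort -u | rev` in the script above.
domains = ["a.example.com", "b.example.org", "c.example.com"]
for domain in sorted(set(domains), key=lambda s: s[::-1]):
    print(domain)
# b.example.org prints first; both example.com entries are adjacent.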

View file

@ -12,17 +12,16 @@ storetail.io
# Keyade # Keyade
keyade.com keyade.com
# Adobe Experience Cloud # Adobe Experience Cloud
# https://experienceleague.adobe.com/docs/analytics/implementation/vars/config-vars/trackingserversecure.html?lang=en#ssl-tracking-server-in-adobe-experience-platform-launch
omtrdc.net omtrdc.net
2o7.net 2o7.net
data.adobedc.net # ThreatMetrix
sc.adobedc.net online-metrix.net
# Webtrekk # Webtrekk
wt-eu02.net wt-eu02.net
webtrekk.net webtrekk.net
# Otto Group # Otto Group
oghub.io oghub.io
# Intent Media # Intent.com
partner.intentmedia.net partner.intentmedia.net
# Wizaly # Wizaly
wizaly.com wizaly.com
@ -30,62 +29,3 @@ wizaly.com
tagcommander.com tagcommander.com
# Ingenious Technologies # Ingenious Technologies
affex.org affex.org
# TraceDock
a351fec2c318c11ea9b9b0a0ae18fb0b-1529426863.eu-central-1.elb.amazonaws.com
a5e652663674a11e997c60ac8a4ec150-1684524385.eu-central-1.elb.amazonaws.com
a88045584548111e997c60ac8a4ec150-1610510072.eu-central-1.elb.amazonaws.com
afc4d9aa2a91d11e997c60ac8a4ec150-2082092489.eu-central-1.elb.amazonaws.com
# A8
trck.a8.net
# AD EBiS
# https://prtimes.jp/main/html/rd/p/000000215.000009812.html
ebis.ne.jp
# GENIEE
genieesspv.jp
# SP-Prod
sp-prod.net
# Act-On Software
actonsoftware.com
actonservice.com
# eum-appdynamics.com
eum-appdynamics.com
# Extole
extole.io
extole.com
# Eloqua
hs.eloqua.com
# segment.com
xid.segment.com
# exponea.com
exponea.com
# adclear.net
adclear.net
# contentsfeed.com
contentsfeed.com
# postaffiliatepro.com
postaffiliatepro.com
# Sugar Market (Salesfusion)
msgapp.com
# Exactag
exactag.com
# GMO Internet Group
ad-cloud.jp
# Pardot
pardot.com
# Fathom
# https://usefathom.com/docs/settings/custom-domains
starman.fathomdns.com
# Lead Forensics
# https://www.reddit.com/r/pihole/comments/g7qv3e/leadforensics_tracking_domains_blacklist/
# No real-world data but the website doesn't hide what it does
ghochv3eng.trafficmanager.net
# Branch.io
thirdparty.bnc.lt
# Plausible.io
custom.plausible.io
# DataUnlocker
# Bit different as it is a proxy to non first-party trackers scripts
# but it fits I guess.
smartproxy.dataunlocker.com
# SAS
ci360.sas.com

View file

@ -4,7 +4,7 @@ AS50234
AS44788 AS44788
AS19750 AS19750
AS55569 AS55569
# ThreatMetrix
AS30286
# Webtrekk # Webtrekk
AS60164 AS60164
# Act-On Software
AS393648

0
rules_ip/first-party.txt Normal file
View file

View file

@ -5,71 +5,30 @@ import os
import logging import logging
import csv import csv
TESTS_DIR = "tests" TESTS_DIR = 'tests'
if __name__ == "__main__": if __name__ == '__main__':
DB = database.Database() DB = database.Database()
log = logging.getLogger("tests") log = logging.getLogger('tests')
for filename in os.listdir(TESTS_DIR): for filename in os.listdir(TESTS_DIR):
if not filename.lower().endswith(".csv"):
continue
log.info("") log.info("")
log.info("Running tests from %s", filename) log.info("Running tests from %s", filename)
path = os.path.join(TESTS_DIR, filename) path = os.path.join(TESTS_DIR, filename)
with open(path, "rt") as fdesc: with open(path, 'rt') as fdesc:
count_ent = 0
count_all = 0
count_den = 0
pass_ent = 0
pass_all = 0
pass_den = 0
reader = csv.DictReader(fdesc) reader = csv.DictReader(fdesc)
for test in reader: for test in reader:
log.debug("Testing %s (%s)", test["url"], test["comment"]) log.info("Testing %s (%s)", test['url'], test['comment'])
count_ent += 1
passed = True
for allow in test["allow"].split(":"): for white in test['white'].split(':'):
if not allow: if not white:
continue continue
count_all += 1 if any(DB.get_domain(white)):
if any(DB.get_domain(allow)): log.error("False positive: %s", white)
log.error("False positive: %s", allow)
passed = False
else:
pass_all += 1
for deny in test["deny"].split(":"): for black in test['black'].split(':'):
if not deny: if not black:
continue continue
count_den += 1 if not any(DB.get_domain(black)):
if not any(DB.get_domain(deny)): log.error("False negative: %s", black)
log.error("False negative: %s", deny)
passed = False
else:
pass_den += 1
if passed:
pass_ent += 1
perc_ent = (100 * pass_ent / count_ent) if count_ent else 100
perc_all = (100 * pass_all / count_all) if count_all else 100
perc_den = (100 * pass_den / count_den) if count_den else 100
log.info(
(
"%s: Entries %d/%d (%.2f%%)"
" | Allow %d/%d (%.2f%%)"
"| Deny %d/%d (%.2f%%)"
),
filename,
pass_ent,
count_ent,
perc_ent,
pass_all,
count_all,
perc_all,
pass_den,
count_den,
perc_den,
)
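
On the master side, run_tests.py reads per-site expectations from CSV: each colon-separated hostname in the allow column must not match any rule (else it counts as a false positive) and each one in deny must match (else a false negative). A sketch of that row format, reusing a row from the first-party test file below:

# How run_tests.py interprets a test row (master-side column names).
import csv
import io

sample = io.StringIO(
    "url,allow,deny,comment\n"
    "https://www.red-by-sfr.fr/,static.s-sfr.fr,nrg.red-by-sfr.fr,Eulerian\n"
)
for test in csv.DictReader(sample):
    allow = [h for h in test["allow"].split(":") if h]
    deny = [h for h in test["deny"].split(":") if h]
    print(test["url"], "| must not match:", allow, "| must match:", deny)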

1
tests/.gitignore vendored
View file

@ -1 +0,0 @@
*.cache.csv

View file

@ -1,6 +1,5 @@
url,allow,deny,comment url,white,black,comment
https://support.apple.com,support.apple.com,,EdgeKey / AkamaiEdge https://support.apple.com,support.apple.com,,EdgeKey / AkamaiEdge
https://www.pinterest.fr/,i.pinimg.com,,Cedexis https://www.pinterest.fr/,i.pinimg.com,,Cedexis
https://www.tumblr.com/,66.media.tumblr.com,,ChiCDN https://www.tumblr.com/,66.media.tumblr.com,,ChiCDN
https://www.skype.com/fr/,www.skype.com,,TrafficManager https://www.skype.com/fr/,www.skype.com,,TrafficManager
https://www.mitsubishicars.com/,www.mitsubishicars.com,,Tracking domain as reverse DNS


View file

@ -1,28 +1,10 @@
url,allow,deny,comment url,white,black,comment
https://www.red-by-sfr.fr/,static.s-sfr.fr,nrg.red-by-sfr.fr,Eulerian https://www.red-by-sfr.fr/,static.s-sfr.fr,nrg.red-by-sfr.fr,Eulerian
https://www.cbc.ca/,,smetrics.cbc.ca,2o7 | Ominuture | Adobe Experience Cloud https://www.cbc.ca/,,smetrics.cbc.ca,2o7 | Ominuture | Adobe Experience Cloud
https://www.discover.com/,,content.discover.com,ThreatMetrix
https://www.mytoys.de/,,web.mytoys.de,Webtrekk https://www.mytoys.de/,,web.mytoys.de,Webtrekk
https://www.baur.de/,,tp.baur.de,Otto Group https://www.baur.de/,,tp.baur.de,Otto Group
https://www.liligo.com/,,compare.liligo.com,??? https://www.liligo.com/,,compare.liligo.com,???
https://www.boulanger.com/,,tag.boulanger.fr,TagCommander https://www.boulanger.com/,,tag.boulanger.fr,TagCommander
https://www.airfrance.fr/FR/,,tk.airfrance.fr,Wizaly https://www.airfrance.fr/FR/,,tk.airfrance.fr,Wizaly
https://www.vsgamers.es/,,marketing.net.vsgamers.es,Affex https://www.vsgamers.es/,,marketing.net.vsgamers.es,Affex
https://www.vacansoleil.fr/,,tdep.vacansoleil.fr,TraceDock
https://www.ozmall.co.jp/,,js.enhance.co.jp,GENIEE
https://www.thetimes.co.uk/,,cmp.thetimes.co.uk,SP-Prod
https://agilent.com/,,seahorseinfo.agilent.com,Act-On Software
https://halifax.co.uk/,,cem.halifax.co.uk,eum-appdynamics.com
https://www.reallygoodstuff.com/,,refer.reallygoodstuff.com,Extole
https://unity.com/,,eloqua-trackings.unity.com,Eloqua
https://www.notino.gr/,,api.campaigns.notino.com,Exponea
https://www.mytoys.de/,,0815.mytoys.de.adclear.net
https://www.imbc.com/,,ads.imbc.com.contentsfeed.com
https://www.cbdbiocare.com/,,affiliate.cbdbiocare.com,postaffiliatepro.com
https://www.seatadvisor.com/,,marketing.seatadvisor.com,Sugar Market (Salesfusion)
https://www.tchibo.de/,,tagm.tchibo.de,Exactag
https://www.bouygues-immobilier.com/,,go.bouygues-immobilier.fr,Pardot
https://caddyserver.com/,,mule.caddysever.com,Fathom
Reddit.com mail notifications,,click.redditmail.com,Branch.io
https://www.phpliveregex.com/,,yolo.phpliveregex.xom,Plausible.io
https://www.earthclassmail.com/,,1avhg3kanx9.www.earthclassmail.com,DataUnlocker
https://paulfredrick.com/,,execution-ci360.paulfredrick.com,SAS


View file

@ -29,7 +29,7 @@ if __name__ == '__main__':
args = parser.parse_args() args = parser.parse_args()
for line in args.input: for line in args.input:
line = line[:-1].lower() line = line.strip()
if (args.domain and database.Database.validate_domain(line)) or \ if (args.domain and database.Database.validate_domain(line)) or \
(args.ip4 and database.Database.validate_ip4address(line)): (args.ip4 and database.Database.validate_ip4address(line)):
print(line, file=args.output) print(line, file=args.output)
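
The master side replaces strip() with line[:-1].lower(): DNS names are case-insensitive, so lowercasing before the sort -u steps keeps case variants of the same hostname from surviving deduplication. For illustration:

# Why validate_list.py lowercases on the master side: case variants of
# the same hostname should collapse to a single entry before `sort -u`.
lines = ["A.EXAMPLE.COM\n", "a.example.com\n"]
print({line[:-1].lower() for line in lines})  # one entry, not two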