33 changed files with 522 additions and 931 deletions
--- a/.env.default
+++ b/.env.default
@@ -1,5 +0,0 @@
-CACHE_SIZE=536870912
-MASSDNS_HASHMAP_SIZE=1000
-PROFILE=0
-SINGLE_PROCESS=0
-MASSDNS_BINARY=massdns
--- a/.gitignore
+++ b/.gitignore
@@ -1,5 +1,2 @@
 *.log
 *.p
-.env
-__pycache__
-explanations
--- a/21
+++ b/21
@@ -1,21 +0,0 @@
-MIT License
-
-Copyright (c) 2019 Geoffrey 'Frogeye' Preud'homme
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@

 This program is able to generate a list of every hostnames being a DNS redirection to a list of DNS zones and IP networks.

-It is primarily used to generate [Geoffrey Frogeye's block list of first-party trackers](https://hostfiles.frogeye.fr) (learn about first-party trackers by following this link).
+It is primarily used to generate [Geoffrey Frogeye's block list of first-party trackers](https://git.frogeye.fr/geoffrey/eulaurarien/src/branch/master/dist/README.md) (learn about first-party trackers by following this link).

 If you want to contribute but don't want to create an account on this forge, contact me the way you like: <https://geoffrey.frogeye.fr>

@@ -18,13 +18,13 @@ This program takes as input:

 It will be able to output hostnames being a DNS redirection to any item in the lists provided.

-DNS records can be locally resolved from a list of subdomains using [MassDNS](https://github.com/blechschmidt/massdns).
+DNS records can either come from [Rapid7 Open Data Sets](https://opendata.rapid7.com/sonar.fdns_v2/) or can be locally resolved from a list of subdomains using [MassDNS](https://github.com/blechschmidt/massdns).

 Those subdomains can either be provided as is, come from [Cisco Umbrella Popularity List](http://s3-us-west-1.amazonaws.com/umbrella-static/index.html), from your browsing history, or from analyzing the traffic a web browser makes when opening an URL (the program provides utility to do all that).

 ## Usage

-Remember you can get an already generated and up-to-date list of first-party trackers from [here](https://hostfiles.frogeye.fr).
+Remember you can get an already generated and up-to-date list of first-party trackers from [here](https://git.frogeye.fr/geoffrey/eulaurarien/src/branch/master/dist/README.md).

 The following is for the people wanting to build their own list.

@@ -34,26 +34,19 @@ Depending on the sources you'll be using to generate the list, you'll need to in

 - [Bash](https://www.gnu.org/software/bash/bash.html)
 - [Coreutils](https://www.gnu.org/software/coreutils/)
-- [Gawk](https://www.gnu.org/software/gawk/)
 - [curl](https://curl.haxx.se)
 - [pv](http://www.ivarch.com/programs/pv.shtml)
 - [Python 3.4+](https://www.python.org/)
 - [coloredlogs](https://pypi.org/project/coloredlogs/) (sorry I can't help myself)
-- [numpy](https://www.numpy.org/)
-- [python-abp](https://pypi.org/project/python-abp/) (only if you intend to use AdBlock rules as a rule source)
 - [massdns](https://github.com/blechschmidt/massdns) in your `$PATH` (only if you have subdomains as a source)
 - [Firefox](https://www.mozilla.org/firefox/) (only if you have websites as a source)
 - [selenium (Python bindings)](https://pypi.python.org/pypi/selenium) (only if you have websites as a source)
 - [selenium-wire](https://pypi.org/project/selenium-wire/) (only if you have websites as a source)
-- [markdown2](https://pypi.org/project/markdown2/) (only if you intend to generate the index webpage)

 ### Create a new database

 The so-called database (in the form of `blocking.p`) is a file storing all the matching entities (ASN, IPs, hostnames, zones…) and every entity leading to it.
-It exists because the list cannot be generated in one pass, as DNS redirections chain links do not have to be inputed in order.
-
-You can purge of old records the database by running `./prune.sh`.
-When you remove a source of data, remove its corresponding file in `last_updates` to fix the pruning process.
+For now there's no way to remove data from it, so here's the command to recreate it: `./db.py --initialize`.

 ### Gather external sources

@@ -86,13 +79,6 @@ In each folder:

 Then, run `./import_rules.sh`.

-If you removed rules and you want to remove every record depending on those rules immediately,
-run the following command:
-
-```
-./db.py --prune --prune-before "$(cat "last_updates/rules.txt")" --prune-base
-```
-
 ### Add subdomains

 If you plan to resolve DNS records yourself (as the DNS records datasets are not exhaustive),
@@ -127,36 +113,21 @@ The program will use a list of public nameservers to do that, but you can add yo
 Then, run `./resolve_subdomains.sh`.
 Note that this is a network intensive process, not in term of bandwith, but in terms of packet number.

-> **Note:** Some VPS providers might detect this as a DDoS attack and cut the network access.
+> Some VPS providers might detect this as a DDoS attack and cut the network access.
 > Some Wi-Fi connections can be rendered unusable for other uses, some routers might cease to work.
 > Since massdns does not support yet rate limiting, my best bet was a Raspberry Pi with a slow ethernet link (Raspberry Pi < 4).

 The DNS records will automatically be imported into the database.
 If you want to re-import the records without re-doing the resolving, just run the last line of the `./resolve_subdomains.sh` script.

+### Import DNS records from Rapid7
+
+Just run `./import_rapid7.sh`.
+This will download about 35 GiB of data, but only the matching records will be stored (a few MiB for the tracking rules).
+Note that the download speed will most likely be limited by the database operation throughput (fast RAM will help).
+
 ### Export the lists

-For the tracking list, use `./export_lists.sh`, the output will be in the `dist` folder (please change the links before distributing them).
+For the tracking list, use `./export_lists.sh`; the output will be in the `dist` folder (please change the links before distributing them).
 For other purposes, tinker with the `./export.py` program.

-#### Explanations
-
-Note that if you created an `explanations` folder at the root of the project, a file with a timestamp will be created in it.
-It contains every rule in the database and the reason of their presence (i.e. their dependency).
-This might be useful to track changes between runs.
-
-Every rule has an associated tag with four components:
-
-1. A number: the level of the rule (1 if it is a rule present in the `rules*` folders)
-2. A letter: `F` if first-party, `M` if multi-party.
-3. A letter: `D` if a dupplicate (e.g. `foo.bar.com` if `*.bar.com` is already a rule), `_` if not.
-4. A number: the number of rules relying on this one
-
-### Generate the index webpage
-
-This is the one served on <https://hostfiles.frogeye.fr>.
-Just run `./generate_index.py`.
-
-### Everything
-
-Once you've made sure every step runs fine, you can use `./eulaurarien.sh` to run every step consecutively.
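Reviewer note on the README text above: it describes matching hostnames against a list of DNS zones. The sketch below is a reading aid for that idea only. It borrows the reversed-label split that `Database.pack_domain` uses in this diff, but `matching_zones` and the flat `zones` set are hypothetical stand-ins for the tree the real database walks.

```python
from typing import List, Set


def pack_domain(domain: str) -> List[str]:
    """Split a domain into labels, TLD first (same idea as Database.pack_domain)."""
    return domain.split('.')[::-1]


def matching_zones(hostname: str, zones: Set[str]) -> List[str]:
    """Return every zone in `zones` that `hostname` falls under.

    Hypothetical helper: it checks a flat set of zone names rather than
    walking a per-label tree as the real database does.
    """
    parts = pack_domain(hostname)
    found = []
    # Try each suffix of the hostname, from the TLD alone up to the full name
    for depth in range(1, len(parts) + 1):
        candidate = '.'.join(parts[:depth][::-1])
        if candidate in zones:
            found.append(candidate)
    return found


if __name__ == '__main__':
    zones = {'tracker.example', 'example.net'}
    print(matching_zones('a.b.tracker.example', zones))
```

Storing the labels TLD first means all hostnames under one zone share a common prefix, which is presumably why the database keeps them in that order.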
--- a/adblock_to_domain_list.py
+++ b/adblock_to_domain_list.py
@@ -16,36 +16,25 @@ import abp.filters
 def get_domains(rule: abp.filters.parser.Filter) -> typing.Iterable[str]:
    if rule.options:
        return
-    selector_type = rule.selector["type"]
-    selector_value = rule.selector["value"]
-    if (
-        selector_type == "url-pattern"
-        and selector_value.startswith("||")
-        and selector_value.endswith("^")
-    ):
+    selector_type = rule.selector['type']
+    selector_value = rule.selector['value']
+    if selector_type == 'url-pattern' \
+            and selector_value.startswith('||') \
+            and selector_value.endswith('^'):
        yield selector_value[2:-1]


-if __name__ == "__main__":
+if __name__ == '__main__':

    # Parsing arguments
    parser = argparse.ArgumentParser(
-        description="Extract whole domains from an AdBlock blocking list"
-    )
+        description="Extract whole domains from an AdBlock blocking list")
    parser.add_argument(
-        "-i",
-        "--input",
-        type=argparse.FileType("r"),
-        default=sys.stdin,
-        help="Input file with AdBlock rules",
-    )
+        '-i', '--input', type=argparse.FileType('r'), default=sys.stdin,
+        help="Input file with AdBlock rules")
    parser.add_argument(
-        "-o",
-        "--output",
-        type=argparse.FileType("w"),
-        default=sys.stdout,
-        help="Output file with one rule tracking subdomain per line",
-    )
+        '-o', '--output', type=argparse.FileType('w'), default=sys.stdout,
+        help="Output file with one rule tracking subdomain per line")
    args = parser.parse_args()

    # Reading rules
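Reviewer note on `get_domains` above: it only yields a domain when the selector value has the `||domain^` shape. A minimal sketch of that string check, without the `abp` dependency, might look like this. The name `domain_from_pattern` is hypothetical and not part of the script.

```python
import typing


def domain_from_pattern(pattern: str) -> typing.Optional[str]:
    """Return the domain of a `||domain^` URL pattern, or None otherwise."""
    # Same test as get_domains(): anchored on both sides, nothing else allowed
    if pattern.startswith('||') and pattern.endswith('^'):
        return pattern[2:-1]
    return None


if __name__ == '__main__':
    for rule in ('||tracker.example^', '/ads/banner.js', '||cdn.example/path'):
        print(rule, '->', domain_from_pattern(rule))
```

Rules that carry a path or lack the trailing `^` are skipped, so only whole-domain blocks end up in the output list.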
--- a/collect_subdomains.py
+++ b/collect_subdomains.py
@@ -14,28 +14,6 @@ import time
 import progressbar
 import selenium.webdriver.firefox.options
 import seleniumwire.webdriver
-import logging
-
-log = logging.getLogger("cs")
-DRIVER = None
-SCROLL_TIME = 10.0
-SCROLL_STEPS = 100
-SCROLL_CMD = f"window.scrollBy(0,document.body.scrollHeight/{SCROLL_STEPS})"
-
-
-def new_driver() -> seleniumwire.webdriver.browser.Firefox:
-    profile = selenium.webdriver.FirefoxProfile()
-    profile.set_preference("privacy.trackingprotection.enabled", False)
-    profile.set_preference("network.cookie.cookieBehavior", 0)
-    profile.set_preference("privacy.trackingprotection.pbmode.enabled", False)
-    profile.set_preference("privacy.trackingprotection.cryptomining.enabled", False)
-    profile.set_preference("privacy.trackingprotection.fingerprinting.enabled", False)
-    options = selenium.webdriver.firefox.options.Options()
-    # options.add_argument('-headless')
-    driver = seleniumwire.webdriver.Firefox(
-        profile, executable_path="geckodriver", options=options
-    )
-    return driver


 def subdomain_from_url(url: str) -> str:
@@ -51,36 +29,34 @@ def collect_subdomains(url: str) -> typing.Iterable[str]:
    Load an URL into an headless browser and return all the domains
    it tried to access.
    """
-    global DRIVER
-    if not DRIVER:
-        DRIVER = new_driver()
+    options = selenium.webdriver.firefox.options.Options()
+    options.add_argument('-headless')
+    driver = seleniumwire.webdriver.Firefox(
+        executable_path='geckodriver', options=options)

-    try:
-        DRIVER.get(url)
-        for s in range(SCROLL_STEPS):
-            DRIVER.execute_script(SCROLL_CMD)
-            time.sleep(SCROLL_TIME / SCROLL_STEPS)
-        for request in DRIVER.requests:
+    driver.get(url)
+    time.sleep(10)
+    for request in driver.requests:
        if request.response:
            yield subdomain_from_url(request.path)
-    except Exception:
-        log.exception("Error")
-        DRIVER.quit()
-        DRIVER = None
+    driver.close()


 def collect_subdomains_standalone(url: str) -> None:
    url = url.strip()
    if not url:
        return
+    try:
        for subdomain in collect_subdomains(url):
            print(subdomain)
+    except Exception:
+        pass


-if __name__ == "__main__":
+if __name__ == '__main__':
    assert len(sys.argv) <= 2
    filename = None
-    if len(sys.argv) == 2 and sys.argv[1] != "-":
+    if len(sys.argv) == 2 and sys.argv[1] != '-':
        filename = sys.argv[1]
        num_lines = sum(1 for line in open(filename))
        iterator = progressbar.progressbar(open(filename), max_value=num_lines)
@@ -90,8 +66,5 @@ if __name__ == "__main__":
    for line in iterator:
        collect_subdomains_standalone(line)

-    if DRIVER:
-        DRIVER.quit()
-
    if filename:
        iterator.close()
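Reviewer note on `collect_subdomains` above: the `+` side creates one browser per URL and calls `driver.close()` only on the success path, so a failing `driver.get()` would leave the browser running. One possible way to guarantee cleanup is a `try`/`finally`, sketched here with a stand-in class. `FakeDriver` and `visit` are hypothetical names, and the sketch assumes the real driver's `quit()` method, which the `-` side of this diff already calls.

```python
import typing


class FakeDriver:
    """Stand-in for a browser driver (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.closed = False

    def get(self, url: str) -> None:
        # Simulate a page load that fails on unsupported URLs
        if not url.startswith('http'):
            raise ValueError(f'unsupported URL: {url}')

    def quit(self) -> None:
        self.closed = True


def visit(driver: FakeDriver, url: str) -> typing.List[str]:
    """Load a URL and always release the driver, even when loading fails."""
    try:
        driver.get(url)
        return [url]
    finally:
        # Runs on success and on exception alike
        driver.quit()


if __name__ == '__main__':
    driver = FakeDriver()
    print(visit(driver, 'http://example.com'), driver.closed)
```

The caller can still catch and log the exception; the point is only that the browser process does not outlive the failed visit.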
--- a/database.py
+++ b/database.py
@@ -11,34 +11,37 @@ import coloredlogs
 import pickle
 import numpy
 import math
-import os

 TLD_LIST: typing.Set[str] = set()

-coloredlogs.install(level="DEBUG", fmt="%(asctime)s %(name)s %(levelname)s %(message)s")
+coloredlogs.install(
+    level='DEBUG',
+    fmt='%(asctime)s %(name)s %(levelname)s %(message)s'
+)

 Asn = int
 Timestamp = int
 Level = int


-class Path:
+class Path():
+    # FP add boolean here
    pass


 class RulePath(Path):
    def __str__(self) -> str:
-        return "(rule)"
+        return '(rule)'


 class RuleFirstPath(RulePath):
    def __str__(self) -> str:
-        return "(first-party rule)"
+        return '(first-party rule)'


 class RuleMultiPath(RulePath):
    def __str__(self) -> str:
-        return "(multi-party rule)"
+        return '(multi-party rule)'


 class DomainPath(Path):
@@ -46,7 +49,7 @@ class DomainPath(Path):
        self.parts = parts

    def __str__(self) -> str:
-        return "?." + Database.unpack_domain(self)
+        return '?.' + Database.unpack_domain(self)


 class HostnamePath(DomainPath):
@@ -56,7 +59,7 @@ class HostnamePath(DomainPath):

 class ZonePath(DomainPath):
    def __str__(self) -> str:
-        return "*." + Database.unpack_domain(self)
+        return '*.' + Database.unpack_domain(self)


 class AsnPath(Path):
@@ -76,7 +79,7 @@ class Ip4Path(Path):
        return Database.unpack_ip4network(self)


-class Match:
+class Match():
    def __init__(self) -> None:
        self.source: typing.Optional[Path] = None
        self.updated: int = 0
@@ -92,17 +95,14 @@ class Match:
            return False
        return True

-    def disable(self) -> None:
-        self.updated = 0
-

 class AsnNode(Match):
    def __init__(self) -> None:
        Match.__init__(self)
-        self.name = ""
+        self.name = ''


-class DomainTreeNode:
+class DomainTreeNode():
    def __init__(self) -> None:
        self.children: typing.Dict[str, DomainTreeNode] = dict()
        self.match_zone = Match()
@@ -117,28 +117,20 @@ class IpTreeNode(Match):


 Node = typing.Union[DomainTreeNode, IpTreeNode, AsnNode]
-MatchCallable = typing.Callable[[Path, Match], typing.Any]
+MatchCallable = typing.Callable[[Path,
+                                 Match],
+                                typing.Any]


-class Profiler:
+class Profiler():
    def __init__(self) -> None:
-        do_profile = int(os.environ.get("PROFILE", "0"))
-        if do_profile:
-            self.log = logging.getLogger("profiler")
+        self.log = logging.getLogger('profiler')
        self.time_last = time.perf_counter()
-            self.time_step = "init"
+        self.time_step = 'init'
        self.time_dict: typing.Dict[str, float] = dict()
        self.step_dict: typing.Dict[str, int] = dict()
-            self.enter_step = self.enter_step_real
-            self.profile = self.profile_real
-        else:
-            self.enter_step = self.enter_step_dummy
-            self.profile = self.profile_dummy

-    def enter_step_dummy(self, name: str) -> None:
-        return
-
-    def enter_step_real(self, name: str) -> None:
+    def enter_step(self, name: str) -> None:
        now = time.perf_counter()
        try:
            self.time_dict[self.time_step] += now - self.time_last
@@ -149,21 +141,15 @@ class Profiler:
        self.time_step = name
        self.time_last = time.perf_counter()

-    def profile_dummy(self) -> None:
-        return
-
-    def profile_real(self) -> None:
-        self.enter_step("profile")
+    def profile(self) -> None:
+        self.enter_step('profile')
        total = sum(self.time_dict.values())
        for key, secs in sorted(self.time_dict.items(), key=lambda t: t[1]):
            times = self.step_dict[key]
-            self.log.debug(
-                f"{key:<20}: {times:9d} × {secs/times:5.3e} "
-                f"= {secs:9.2f} s ({secs/total:7.2%}) "
-            )
-        self.log.debug(
-            f"{'total':<20}:                         " f"{total:9.2f} s ({1:7.2%})"
-        )
+            self.log.debug(f"{key:<20}: {times:9d} × {secs/times:5.3e} "
+                           f"= {secs:9.2f} s ({secs/total:7.2%}) ")
+        self.log.debug(f"{'total':<20}:                         "
+                       f"{total:9.2f} s ({1:7.2%})")


 class Database(Profiler):
@@ -171,7 +157,9 @@ class Database(Profiler):
    PATH = "blocking.p"

    def initialize(self) -> None:
-        self.log.warning("Creating database version: %d ", Database.VERSION)
+        self.log.warning(
+            "Creating database version: %d ",
+            Database.VERSION)
        # Dummy match objects that everything refer to
        self.rules: typing.List[Match] = list()
        for first_party in (False, True):
@@ -185,77 +173,67 @@ class Database(Profiler):
        self.ip4tree = IpTreeNode()

    def load(self) -> None:
-        self.enter_step("load")
+        self.enter_step('load')
        try:
-            with open(self.PATH, "rb") as db_fdsec:
+            with open(self.PATH, 'rb') as db_fdsec:
                version, data = pickle.load(db_fdsec)
                if version == Database.VERSION:
                    self.rules, self.domtree, self.asns, self.ip4tree = data
                    return
                self.log.warning(
-                    "Outdated database version found: %d, " "it will be rebuilt.",
-                    version,
-                )
+                    "Outdated database version found: %d, "
+                    "it will be rebuilt.",
+                    version)
        except (TypeError, AttributeError, EOFError):
            self.log.error(
-                "Corrupt (or heavily outdated) database found, " "it will be rebuilt."
-            )
+                "Corrupt (or heavily outdated) database found, "
+                "it will be rebuilt.")
        except FileNotFoundError:
            pass
        self.initialize()

    def save(self) -> None:
-        self.enter_step("save")
-        with open(self.PATH, "wb") as db_fdsec:
+        self.enter_step('save')
+        with open(self.PATH, 'wb') as db_fdsec:
            data = self.rules, self.domtree, self.asns, self.ip4tree
            pickle.dump((self.VERSION, data), db_fdsec)
        self.profile()

    def __init__(self) -> None:
        Profiler.__init__(self)
-        self.log = logging.getLogger("db")
+        self.log = logging.getLogger('db')
        self.load()
        self.ip4cache_shift: int = 32
        self.ip4cache = numpy.ones(1)

    def _set_ip4cache(self, path: Path, _: Match) -> None:
        assert isinstance(path, Ip4Path)
-        self.enter_step("set_ip4cache")
+        self.enter_step('set_ip4cache')
        mini = path.value >> self.ip4cache_shift
-        maxi = (path.value + 2 ** (32 - path.prefixlen)) >> self.ip4cache_shift
+        maxi = (path.value + 2**(32-path.prefixlen)) >> self.ip4cache_shift
        if mini == maxi:
            self.ip4cache[mini] = True
        else:
            self.ip4cache[mini:maxi] = True

-    def fill_ip4cache(self, max_size: int = 512 * 1024 ** 2) -> None:
+    def fill_ip4cache(self, max_size: int = 512*1024**2) -> None:
        """
        Size in bytes
        """
-        if max_size > 2 ** 32 / 8:
-            self.log.warning(
-                "Allocating more than 512 MiB of RAM for "
-                "the Ip4 cache is not necessary."
-            )
-        max_cache_width = int(math.log2(max(1, max_size * 8)))
-        allocated = False
-        cache_width = min(32, max_cache_width)
-        while not allocated:
-            cache_size = 2 ** cache_width
-            try:
-                self.ip4cache = numpy.zeros(cache_size, dtype=bool)
-            except MemoryError:
-                self.log.exception("Could not allocate cache. Retrying a smaller one.")
-                cache_width -= 1
-                continue
-            allocated = True
-        self.ip4cache_shift = 32 - cache_width
+        if max_size > 2**32/8:
+            self.log.warning("Allocating more than 512 MiB of RAM for "
+                             "the Ip4 cache is not necessary.")
+        max_cache_width = int(math.log2(max(1, max_size*8)))
+        cache_width = min(32, max_cache_width)
+        self.ip4cache_shift = 32-cache_width
+        cache_size = 2**cache_width
+        self.ip4cache = numpy.zeros(cache_size, dtype=bool)
        for _ in self.exec_each_ip4(self._set_ip4cache):
            pass

    @staticmethod
    def populate_tld_list() -> None:
-        with open("temp/all_tld.list", "r") as tld_fdesc:
+        with open('temp/all_tld.list', 'r') as tld_fdesc:
            for tld in tld_fdesc:
                tld = tld.strip()
                TLD_LIST.add(tld)
@@ -264,7 +242,7 @@ class Database(Profiler):
    def validate_domain(path: str) -> bool:
        if len(path) > 255:
            return False
-        splits = path.split(".")
+        splits = path.split('.')
        if not TLD_LIST:
            Database.populate_tld_list()
        if splits[-1] not in TLD_LIST:
@@ -276,26 +254,26 @@ class Database(Profiler):

    @staticmethod
    def pack_domain(domain: str) -> DomainPath:
-        return DomainPath(domain.split(".")[::-1])
+        return DomainPath(domain.split('.')[::-1])

    @staticmethod
    def unpack_domain(domain: DomainPath) -> str:
-        return ".".join(domain.parts[::-1])
+        return '.'.join(domain.parts[::-1])

    @staticmethod
    def pack_asn(asn: str) -> AsnPath:
        asn = asn.upper()
-        if asn.startswith("AS"):
+        if asn.startswith('AS'):
            asn = asn[2:]
        return AsnPath(int(asn))

    @staticmethod
    def unpack_asn(asn: AsnPath) -> str:
-        return f"AS{asn.asn}"
+        return f'AS{asn.asn}'

    @staticmethod
    def validate_ip4address(path: str) -> bool:
-        splits = path.split(".")
+        splits = path.split('.')
        if len(splits) != 4:
            return False
        for split in splits:
@@ -306,17 +284,12 @@ class Database(Profiler):
                return False
        return True

-    @staticmethod
-    def pack_ip4address_low(address: str) -> int:
-        addr = 0
-        for split in address.split("."):
-            octet = int(split)
-            addr = (addr << 8) + octet
-        return addr
-
    @staticmethod
    def pack_ip4address(address: str) -> Ip4Path:
-        return Ip4Path(Database.pack_ip4address_low(address), 32)
+        addr = 0
+        for split in address.split('.'):
+            addr = (addr << 8) + int(split)
+        return Ip4Path(addr, 32)

    @staticmethod
    def unpack_ip4address(address: Ip4Path) -> str:
@@ -327,12 +300,12 @@ class Database(Profiler):
        for o in reversed(range(4)):
            octets[o] = addr & 0xFF
            addr >>= 8
-        return ".".join(map(str, octets))
+        return '.'.join(map(str, octets))

    @staticmethod
    def validate_ip4network(path: str) -> bool:
        # A bit generous but ok for our usage
-        splits = path.split("/")
+        splits = path.split('/')
        if len(splits) != 2:
            return False
        if not Database.validate_ip4address(splits[0]):
@@ -346,7 +319,7 @@ class Database(Profiler):

    @staticmethod
    def pack_ip4network(network: str) -> Ip4Path:
-        address, prefixlen_str = network.split("/")
+        address, prefixlen_str = network.split('/')
        prefixlen = int(prefixlen_str)
        addr = Database.pack_ip4address(address)
        addr.prefixlen = prefixlen
@@ -360,7 +333,7 @@ class Database(Profiler):
        for o in reversed(range(4)):
            octets[o] = addr & 0xFF
            addr >>= 8
-        return ".".join(map(str, octets)) + "/" + str(network.prefixlen)
+        return '.'.join(map(str, octets)) + '/' + str(network.prefixlen)

    def get_match(self, path: Path) -> Match:
        if isinstance(path, RuleMultiPath):
@@ -381,7 +354,7 @@ class Database(Profiler):
                raise ValueError
        elif isinstance(path, Ip4Path):
            dici = self.ip4tree
-            for i in range(31, 31 - path.prefixlen, -1):
+            for i in range(31, 31-path.prefixlen, -1):
                bit = (path.value >> i) & 0b1
                dici_next = dici.one if bit else dici.zero
                if not dici_next:
@@ -391,8 +364,7 @@ class Database(Profiler):
        else:
            raise ValueError

-    def exec_each_asn(
-        self,
+    def exec_each_asn(self,
                      callback: MatchCallable,
                      ) -> typing.Any:
        for asn in self.asns:
@@ -407,8 +379,7 @@ class Database(Profiler):
                except TypeError:  # not iterable
                    pass

-    def exec_each_domain(
-        self,
+    def exec_each_domain(self,
                         callback: MatchCallable,
                         _dic: DomainTreeNode = None,
                         _par: DomainPath = None,
@@ -436,11 +407,12 @@ class Database(Profiler):
        for part in _dic.children:
            dic = _dic.children[part]
            yield from self.exec_each_domain(
-                callback, _dic=dic, _par=DomainPath(_par.parts + [part])
+                callback,
+                _dic=dic,
+                _par=DomainPath(_par.parts + [part])
            )

-    def exec_each_ip4(
-        self,
+    def exec_each_ip4(self,
                      callback: MatchCallable,
                      _dic: IpTreeNode = None,
                      _par: Ip4Path = None,
@@ -464,16 +436,23 @@ class Database(Profiler):
            # addr0 = _par.value & (0xFFFFFFFF ^ (1 << (32-pref)))
            # assert addr0 == _par.value
            addr0 = _par.value
-            yield from self.exec_each_ip4(callback, _dic=dic, _par=Ip4Path(addr0, pref))
+            yield from self.exec_each_ip4(
+                callback,
+                _dic=dic,
+                _par=Ip4Path(addr0, pref)
+            )
        # 1
        dic = _dic.one
        if dic:
-            addr1 = _par.value | (1 << (32 - pref))
+            addr1 = _par.value | (1 << (32-pref))
            # assert addr1 != _par.value
-            yield from self.exec_each_ip4(callback, _dic=dic, _par=Ip4Path(addr1, pref))
+            yield from self.exec_each_ip4(
+                callback,
+                _dic=dic,
+                _par=Ip4Path(addr1, pref)
+            )

-    def exec_each(
-        self,
+    def exec_each(self,
                  callback: MatchCallable,
                  ) -> typing.Any:
        yield from self.exec_each_domain(callback)
@@ -483,77 +462,36 @@ class Database(Profiler):
    def update_references(self) -> None:
        # Should be correctly calculated normally,
        # keeping this just in case
-        def reset_references_cb(path: Path, match: Match) -> None:
+        def reset_references_cb(path: Path,
+                                match: Match
+                                ) -> None:
            match.references = 0
-
        for _ in self.exec_each(reset_references_cb):
            pass

-        def increment_references_cb(path: Path, match: Match) -> None:
+        def increment_references_cb(path: Path,
+                                    match: Match
+                                    ) -> None:
            if match.source:
                source = self.get_match(match.source)
                source.references += 1
-
        for _ in self.exec_each(increment_references_cb):
            pass

-    def _clean_deps(self) -> None:
-        # Disable the matches that depends on the targeted
-        # matches until all disabled matches reference count = 0
-        did_something = True
-
-        def clean_deps_cb(path: Path, match: Match) -> None:
-            nonlocal did_something
-            if not match.source:
-                return
-            source = self.get_match(match.source)
-            if not source.active():
-                self._unset_match(match)
-            elif match.first_party > source.first_party:
-                match.first_party = source.first_party
-            else:
-                return
-            did_something = True
-
-        while did_something:
-            did_something = False
-            self.enter_step("pass_clean_deps")
-            for _ in self.exec_each(clean_deps_cb):
-                pass
-
    def prune(self, before: int, base_only: bool = False) -> None:
-        # Disable the matches targeted
-        def prune_cb(path: Path, match: Match) -> None:
-            if base_only and match.level > 1:
-                return
-            if match.updated > before:
-                return
-            self._unset_match(match)
-            self.log.debug("Print: disabled %s", path)
-
-        self.enter_step("pass_prune")
-        for _ in self.exec_each(prune_cb):
-            pass
-
-        self._clean_deps()
-
-        # Remove branches with no match
-        # TODO
+        raise NotImplementedError

    def explain(self, path: Path) -> str:
        match = self.get_match(path)
-        string = str(path)
        if isinstance(match, AsnNode):
-            string += f" ({match.name})"
-        party_char = "F" if match.first_party else "M"
-        dup_char = "D" if match.dupplicate else "_"
-        string += f" {match.level}{party_char}{dup_char}{match.references}"
+            string = f'{path} ({match.name}) #{match.references}'
+        else:
+            string = f'{path} #{match.references}'
        if match.source:
-            string += f" ← {self.explain(match.source)}"
+            string += f' ← {self.explain(match.source)}'
        return string

-    def list_records(
-        self,
+    def list_records(self,
                     first_party_only: bool = False,
                     end_chain_only: bool = False,
                     no_dupplicates: bool = False,
@@ -561,7 +499,9 @@ class Database(Profiler):
                     hostnames_only: bool = False,
                     explain: bool = False,
                     ) -> typing.Iterable[str]:
-        def export_cb(path: Path, match: Match) -> typing.Iterable[str]:
+
+        def export_cb(path: Path, match: Match
+                      ) -> typing.Iterable[str]:
            if first_party_only and not match.first_party:
                return
            if end_chain_only and match.references > 0:
@@ -580,8 +520,7 @@ class Database(Profiler):

        yield from self.exec_each(export_cb)

-    def count_records(
-        self,
+    def count_records(self,
                      first_party_only: bool = False,
                      end_chain_only: bool = False,
                      no_dupplicates: bool = False,
@@ -612,64 +551,54 @@ class Database(Profiler):

        split: typing.List[str] = list()
        for key, value in sorted(memo.items(), key=lambda s: s[0]):
-            split.append(f"{key[:-4].lower()}s: {value}")
-        return ", ".join(split)
+            split.append(f'{key[:-4].lower()}s: {value}')
+        return ', '.join(split)

    def get_domain(self, domain_str: str) -> typing.Iterable[DomainPath]:
-        self.enter_step("get_domain_pack")
+        self.enter_step('get_domain_pack')
        domain = self.pack_domain(domain_str)
-        self.enter_step("get_domain_brws")
+        self.enter_step('get_domain_brws')
        dic = self.domtree
        depth = 0
        for part in domain.parts:
            if dic.match_zone.active():
-                self.enter_step("get_domain_yield")
+                self.enter_step('get_domain_yield')
                yield ZonePath(domain.parts[:depth])
-            self.enter_step("get_domain_brws")
+            self.enter_step('get_domain_brws')
            if part not in dic.children:
                return
            dic = dic.children[part]
            depth += 1
        if dic.match_zone.active():
-            self.enter_step("get_domain_yield")
+            self.enter_step('get_domain_yield')
            yield ZonePath(domain.parts)
        if dic.match_hostname.active():
-            self.enter_step("get_domain_yield")
+            self.enter_step('get_domain_yield')
            yield HostnamePath(domain.parts)

    def get_ip4(self, ip4_str: str) -> typing.Iterable[Path]:
-        self.enter_step("get_ip4_pack")
-        ip4val = self.pack_ip4address_low(ip4_str)
-        self.enter_step("get_ip4_cache")
-        if not self.ip4cache[ip4val >> self.ip4cache_shift]:
+        self.enter_step('get_ip4_pack')
+        ip4 = self.pack_ip4address(ip4_str)
+        self.enter_step('get_ip4_cache')
+        if not self.ip4cache[ip4.value >> self.ip4cache_shift]:
            return
-        self.enter_step("get_ip4_brws")
+        self.enter_step('get_ip4_brws')
        dic = self.ip4tree
-        for i in range(31, -1, -1):
-            bit = (ip4val >> i) & 0b1
+        for i in range(31, 31-ip4.prefixlen, -1):
+            bit = (ip4.value >> i) & 0b1
            if dic.active():
-                self.enter_step("get_ip4_yield")
-                yield Ip4Path(ip4val >> (i + 1) << (i + 1), 31 - i)
-                self.enter_step("get_ip4_brws")
+                self.enter_step('get_ip4_yield')
+                yield Ip4Path(ip4.value >> (i+1) << (i+1), 31-i)
+                self.enter_step('get_ip4_brws')
            next_dic = dic.one if bit else dic.zero
            if next_dic is None:
                return
            dic = next_dic
        if dic.active():
-            self.enter_step("get_ip4_yield")
-            yield Ip4Path(ip4val, 32)
+            self.enter_step('get_ip4_yield')
+            yield ip4

-    def _unset_match(
-        self,
-        match: Match,
-    ) -> None:
-        match.disable()
-        if match.source:
-            source_match = self.get_match(match.source)
-            source_match.references -= 1
-
-    def _set_match(
-        self,
+    def _set_match(self,
                   match: Match,
                   updated: int,
                   source: Path,
@@ -681,11 +610,8 @@ class Database(Profiler):
        # so it can pass it to save a traversal
        source_match = source_match or self.get_match(source)
        new_level = source_match.level + 1
-        if (
-            updated > match.updated
-            or new_level < match.level
-            or source_match.first_party > match.first_party
-        ):
+        if updated > match.updated or new_level < match.level \
+                or source_match.first_party > match.first_party:
            # NOTE FP and level of matches referencing this one
            # won't be updated until run or prune
            if match.source:
@@ -698,18 +624,20 @@ class Database(Profiler):
            source_match.references += 1
            match.dupplicate = dupplicate

-    def _set_domain(
-        self, hostname: bool, domain_str: str, updated: int, source: Path
-    ) -> None:
-        self.enter_step("set_domain_val")
+    def _set_domain(self,
+                    hostname: bool,
+                    domain_str: str,
+                    updated: int,
+                    source: Path) -> None:
+        self.enter_step('set_domain_val')
        if not Database.validate_domain(domain_str):
            raise ValueError(f"Invalid domain: {domain_str}")
-        self.enter_step("set_domain_pack")
+        self.enter_step('set_domain_pack')
        domain = self.pack_domain(domain_str)
-        self.enter_step("set_domain_fp")
+        self.enter_step('set_domain_fp')
        source_match = self.get_match(source)
        is_first_party = source_match.first_party
-        self.enter_step("set_domain_brws")
+        self.enter_step('set_domain_brws')
        dic = self.domtree
        dupplicate = False
        for part in domain.parts:
@@ -730,14 +658,21 @@ class Database(Profiler):
            dupplicate=dupplicate,
        )

-    def set_hostname(self, *args: typing.Any, **kwargs: typing.Any) -> None:
+    def set_hostname(self,
+                     *args: typing.Any, **kwargs: typing.Any
+                     ) -> None:
        self._set_domain(True, *args, **kwargs)

-    def set_zone(self, *args: typing.Any, **kwargs: typing.Any) -> None:
+    def set_zone(self,
+                 *args: typing.Any, **kwargs: typing.Any
+                 ) -> None:
        self._set_domain(False, *args, **kwargs)

-    def set_asn(self, asn_str: str, updated: int, source: Path) -> None:
-        self.enter_step("set_asn")
+    def set_asn(self,
+                asn_str: str,
+                updated: int,
+                source: Path) -> None:
+        self.enter_step('set_asn')
        path = self.pack_asn(asn_str)
        if path.asn in self.asns:
            match = self.asns[path.asn]
@@ -750,14 +685,17 @@ class Database(Profiler):
            source,
        )

-    def _set_ip4(self, ip4: Ip4Path, updated: int, source: Path) -> None:
-        self.enter_step("set_ip4_fp")
+    def _set_ip4(self,
+                 ip4: Ip4Path,
+                 updated: int,
+                 source: Path) -> None:
+        self.enter_step('set_ip4_fp')
        source_match = self.get_match(source)
        is_first_party = source_match.first_party
-        self.enter_step("set_ip4_brws")
+        self.enter_step('set_ip4_brws')
        dic = self.ip4tree
        dupplicate = False
-        for i in range(31, 31 - ip4.prefixlen, -1):
+        for i in range(31, 31-ip4.prefixlen, -1):
            bit = (ip4.value >> i) & 0b1
            next_dic = dic.one if bit else dic.zero
            if next_dic is None:
@@ -778,22 +716,24 @@ class Database(Profiler):
        )
        self._set_ip4cache(ip4, dic)

-    def set_ip4address(
-        self, ip4address_str: str, *args: typing.Any, **kwargs: typing.Any
+    def set_ip4address(self,
+                       ip4address_str: str,
+                       *args: typing.Any, **kwargs: typing.Any
                       ) -> None:
-        self.enter_step("set_ip4add_val")
+        self.enter_step('set_ip4add_val')
        if not Database.validate_ip4address(ip4address_str):
            raise ValueError(f"Invalid ip4address: {ip4address_str}")
-        self.enter_step("set_ip4add_pack")
+        self.enter_step('set_ip4add_pack')
        ip4 = self.pack_ip4address(ip4address_str)
        self._set_ip4(ip4, *args, **kwargs)

-    def set_ip4network(
-        self, ip4network_str: str, *args: typing.Any, **kwargs: typing.Any
+    def set_ip4network(self,
+                       ip4network_str: str,
+                       *args: typing.Any, **kwargs: typing.Any
                       ) -> None:
-        self.enter_step("set_ip4net_val")
+        self.enter_step('set_ip4net_val')
        if not Database.validate_ip4network(ip4network_str):
            raise ValueError(f"Invalid ip4network: {ip4network_str}")
-        self.enter_step("set_ip4net_pack")
+        self.enter_step('set_ip4net_pack')
        ip4 = self.pack_ip4network(ip4network_str)
        self._set_ip4(ip4, *args, **kwargs)
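Note on the `get_ip4` hunk above: the lookup walks a binary trie one address bit at a time, from the most significant bit down, and yields every active node it passes, so a single address can match several enclosing networks. The following is a minimal, self-contained sketch of that idea, not the project's implementation: `Node`, `insert` and `lookup` are hypothetical names, and plain `(value, prefixlen)` tuples stand in for `Ip4Path`.

```python
import ipaddress
import typing


class Node:
    """One node of a binary trie keyed on IPv4 bits (hypothetical stand-in)."""

    def __init__(self) -> None:
        self.zero: typing.Optional["Node"] = None
        self.one: typing.Optional["Node"] = None
        self.active = False


def insert(root: Node, network: str) -> None:
    """Mark the node reached by following the network's prefix bits."""
    net = ipaddress.IPv4Network(network)
    value = int(net.network_address)
    node = root
    for i in range(31, 31 - net.prefixlen, -1):
        bit = (value >> i) & 0b1
        if bit:
            if node.one is None:
                node.one = Node()
            node = node.one
        else:
            if node.zero is None:
                node.zero = Node()
            node = node.zero
    node.active = True


def lookup(root: Node, address: str) -> typing.Iterator[typing.Tuple[int, int]]:
    """Yield (network value, prefix length) for every active prefix of address."""
    value = int(ipaddress.IPv4Address(address))
    node = root
    for i in range(31, -1, -1):
        if node.active:
            # 31 - i bits have been consumed so far; clear the remaining low bits.
            yield (value >> (i + 1) << (i + 1), 31 - i)
        bit = (value >> i) & 0b1
        next_node = node.one if bit else node.zero
        if next_node is None:
            return
        node = next_node
    if node.active:
        # All 32 bits matched: an exact host entry.
        yield (value, 32)
```

For instance, after inserting `10.0.0.0/8` and `10.1.0.0/16`, looking up `10.1.2.3` would yield both networks, shortest prefix first.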
--- a/db.py
+++ b/db.py
@@ -5,37 +5,29 @@ import database
 import time
 import os

-if __name__ == "__main__":
+if __name__ == '__main__':

    # Parsing arguments
-    parser = argparse.ArgumentParser(description="Database operations")
+    parser = argparse.ArgumentParser(
+        description="Database operations")
    parser.add_argument(
-        "-i", "--initialize", action="store_true", help="Reconstruct the whole database"
-    )
+        '-i', '--initialize', action='store_true',
+        help="Reconstruct the whole database")
    parser.add_argument(
-        "-p", "--prune", action="store_true", help="Remove old entries from database"
-    )
+        '-p', '--prune', action='store_true',
+        help="Remove old entries from database")
    parser.add_argument(
-        "-b",
-        "--prune-base",
-        action="store_true",
+        '-b', '--prune-base', action='store_true',
        help="With --prune, only prune base rules "
-        "(the ones added by ./feed_rules.py)",
-    )
+        "(the ones added by ./feed_rules.py)")
    parser.add_argument(
-        "-s",
-        "--prune-before",
-        type=int,
-        default=(int(time.time()) - 60 * 60 * 24 * 31 * 6),
+        '-s', '--prune-before', type=int,
+        default=(int(time.time()) - 60*60*24*31*6),
        help="With --prune, only rules updated before "
-        "this UNIX timestamp will be deleted",
-    )
+        "this UNIX timestamp will be deleted")
    parser.add_argument(
-        "-r",
-        "--references",
-        action="store_true",
-        help="DEBUG: Update the reference count",
-    )
+        '-r', '--references', action='store_true',
+        help="DEBUG: Update the reference count")
    args = parser.parse_args()

    if not args.initialize:
@@ -45,7 +37,7 @@ if __name__ == "__main__":
            os.unlink(database.Database.PATH)
        DB = database.Database()

-    DB.enter_step("main")
+    DB.enter_step('main')
    if args.prune:
        DB.prune(before=args.prune_before, base_only=args.prune_base)
    if args.references:
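Note on the `--prune-before` default in the hunk above: the expression subtracts six 31-day periods (186 days) from the current UNIX time, and it is evaluated once, when the argument parser is built. A small sketch of the same arithmetic follows; `default_prune_before` and its `now` parameter are hypothetical names introduced here only for illustration.

```python
import time
import typing

SECONDS_PER_DAY = 60 * 60 * 24


def default_prune_before(now: typing.Optional[int] = None) -> int:
    """Return the cutoff timestamp: six 31-day periods before `now`."""
    if now is None:
        now = int(time.time())
    return now - SECONDS_PER_DAY * 31 * 6
```

Rules whose `updated` timestamp is older than this cutoff are the ones `--prune` would delete, per the help text in the hunk.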
--- a/dist/.gitignore
+++ b/dist/.gitignore
@@ -1,2 +1 @@
 *.txt
-*.html
--- a/dist/README.md
+++ b/dist/README.md
@@ -12,52 +12,32 @@ In order to block those, one can simply block the hostname `trackercompany.com`,

 However, to circumvent this block, tracker companies made the websites using them load trackers from `somestring.website1.com`.
 The latter is a DNS redirection to `website1.trackercompany.com`, directly to an IP address belonging to the tracking company.
-
 Those are called first-party trackers.
-On top of aforementionned privacy issues, they also cause some security issue, as websites usually trust those scripts more.
-For more information, learn about [Content Security Policy](https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP), [same-origin policy](https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy) and [Cross-Origin Resource Sharing](https://enable-cors.org/).

 In order to block those trackers, ad blockers would need to block every subdomain pointing to anything under `trackercompany.com` or to their network.
 Unfortunately, most don't support those blocking methods as they are not DNS-aware, e.g. they only see `somestring.website1.com`.

 This list is an inventory of every `somestring.website1.com` found to allow non DNS-aware ad blocker to still block first-party trackers.

-### Learn more
-
- [CNAME Cloaking, the dangerous disguise of third-party trackers](https://medium.com/nextdns/cname-cloaking-the-dangerous-disguise-of-third-party-trackers-195205dc522a) from NextDNS
- [Trackers first-party](https://blog.imirhil.fr/2019/11/13/first-party-tracker.html) from Aeris, in french
- [uBlock Origin issue](https://github.com/uBlockOrigin/uBlock-issues/issues/780)
- [CNAME Cloaking and Bounce Tracking Defense](https://webkit.org/blog/11338/cname-cloaking-and-bounce-tracking-defense/) on WebKit's blog
- [Characterizing CNAME cloaking-based tracking](https://blog.apnic.net/2020/08/04/characterizing-cname-cloaking-based-tracking/) on APNIC's webiste
- [Characterizing CNAME Cloaking-Based Tracking on the Web](https://tma.ifip.org/2020/wp-content/uploads/sites/9/2020/06/tma2020-camera-paper66.pdf) is a research paper from Sokendai and ANSSI
-
 ## List variants

-### First-party trackers
-
-**Recommended for hostfiles-based ad blockers, such as [Pi-hole](https://pi-hole.net/) (&lt;v5.0, as it introduced CNAME blocking).**
-**Recommended for Android ad blockers as applications, such ad [Blokada](https://blokada.org/).**
+### First-party trackers (recommended)

 - Hosts file: <https://hostfiles.frogeye.fr/firstparty-trackers-hosts.txt>
 - Raw list: <https://hostfiles.frogeye.fr/firstparty-trackers.txt>

 This list contains every hostname redirecting to [a hand-picked list of first-party trackers](https://git.frogeye.fr/geoffrey/eulaurarien/src/branch/master/rules/first-party.list).
 It should be safe from false-positives.
-It also contains all tracking hostnames under company domains (e.g. `website1.trackercompany.com`),
-useful for ad blockers that don't support mass regex blocking,
-while still preventing fallback to third-party trackers.
 Don't be afraid of the size of the list, as this is due to the nature of first-party trackers: a single tracker generates at least one hostname per client (typically two).

 ### First-party only trackers

-**Recommended for ad blockers as web browser extensions, such as [uBlock Origin](https://ublockorigin.com/) (&lt;v1.25.0 or for Chromium-based browsers, as it introduced CNAME uncloaking for Firefox).**
-
 - Hosts file: <https://hostfiles.frogeye.fr/firstparty-only-trackers-hosts.txt>
 - Raw list: <https://hostfiles.frogeye.fr/firstparty-only-trackers.txt>

-This is the same list as above, albeit not containing the hostnames under the tracking company domains (e.g. `website1.trackercompany.com`).
-This allows for reducing the size of the list for ad-blockers that already block those third-party trackers with their support of regex blocking.
-Use in conjunction with other block lists used in regex-mode, such as [Peter Lowe's](https://pgl.yoyo.org/adservers/)
+This is the same list as above, albeit not containing the hostnames under the tracking company domains.
+This reduces the size of the list, but it doesn't prevent from third-party tracking too.
+Use in conjunction with other block lists.

 ### Multi-party trackers

@@ -66,23 +46,22 @@ Use in conjunction with other block lists used in regex-mode, such as [Peter Low

 As first-party trackers usually evolve from third-party trackers, this list contains every hostname redirecting to trackers found in existing lists of third-party trackers (see next section).
 Since the latter were not designed with first-party trackers in mind, they are likely to contain false-positives.
-On the other hand, they might protect against first-party tracker that we're not aware of / have not yet confirmed.
+In the other hand, they might protect against first-party tracker that we're not aware of / have not yet confirmed.

 #### Source of third-party trackers

 - [EasyPrivacy](https://easylist.to/easylist/easyprivacy.txt)
- [AdGuard](https://github.com/AdguardTeam/AdguardFilters)

-(yes there's only two for now. A lot of existing ones cause a lot of false positives)
+(yes there's only one for now. A lot of existing ones cause a lot of false positives)

 ### Multi-party only trackers

 - Hosts file: <https://hostfiles.frogeye.fr/multiparty-only-trackers-hosts.txt>
 - Raw list: <https://hostfiles.frogeye.fr/multiparty-only-trackers.txt>

-This is the same list as above, albeit not containing the hostnames under the tracking company domains (e.g. `website1.trackercompany.com`).
-This allows for reducing the size of the list for ad-blockers that already block those third-party trackers with their support of regex blocking.
-Use in conjunction with other block lists used in regex-mode, such as the ones in the previous section.
+This is the same list as above, albeit not containing the hostnames under the tracking company domains.
+This reduces the size of the list, but it doesn't prevent from third-party tracking too.
+Use in conjunction with other block lists, especially the ones used to generate this list in the previous section.

 ## Meta

@@ -90,25 +69,6 @@ In case of false positives/negatives, or any other question contact me the way y

 The software used to generate this list is available here: <https://git.frogeye.fr/geoffrey/eulaurarien>

-## Acknowledgements
-
 Some of the first-party tracker included in this list have been found by:
-
 - [Aeris](https://imirhil.fr/)
 - NextDNS and [their blocklist](https://github.com/nextdns/cname-cloaking-blocklist)'s contributors
- Yuki2718 from [Wilders Security Forums](https://www.wilderssecurity.com/threads/ublock-a-lean-and-fast-blocker.365273/page-168#post-2880361)
- Ha Dao, Johan Mazel, and Kensuke Fukuda, ["Characterizing CNAME Cloaking-Based Tracking on the Web", Proceedings of IFIP/IEEE Traffic Measurement Analysis Conference (TMA), 9 pages, 2020.](https://tma.ifip.org/2020/wp-content/uploads/sites/9/2020/06/tma2020-camera-paper66.pdf)
- AdGuard and [their blocklist](https://github.com/AdguardTeam/cname-trackers)'s contributors
-
-The list was generated using data from
-
- [Cisco Umbrella Popularity List](http://s3-us-west-1.amazonaws.com/umbrella-static/index.html)
- [Public DNS Server List](https://public-dns.info/)
-
-
-Similar projects:
-
- [NextDNS blocklist](https://github.com/nextdns/cname-cloaking-blocklist): for DNS-aware ad blockers
- [Stefan Froberg's lists](https://www.orwell1984.today/cname/): subset of those lists grouped by tracker
- [AdGuard blocklist](https://github.com/AdguardTeam/cname-trackers): same thing with a bigger scope, maintained by a bigger team
-
--- a/dist/markdown7.min.css
+++ b/dist/markdown7.min.css
@@ -1,2 +0,0 @@
-/* Source: https://github.com/jasonm23/markdown-css-themes */
-body{font-family:Helvetica,arial,sans-serif;font-size:14px;line-height:1.6;padding-top:10px;padding-bottom:10px;background-color:#fff;padding:30px}body>:first-child{margin-top:0!important}body>:last-child{margin-bottom:0!important}a{color:#4183c4}a.absent{color:#c00}a.anchor{display:block;padding-left:30px;margin-left:-30px;cursor:pointer;position:absolute;top:0;left:0;bottom:0}h1,h2,h3,h4,h5,h6{margin:20px 0 10px;padding:0;font-weight:700;-webkit-font-smoothing:antialiased;cursor:text;position:relative}h1:hover a.anchor,h2:hover a.anchor,h3:hover a.anchor,h4:hover a.anchor,h5:hover a.anchor,h6:hover a.anchor{text-decoration:none}h1 code,h1 tt{font-size:inherit}h2 code,h2 tt{font-size:inherit}h3 code,h3 tt{font-size:inherit}h4 code,h4 tt{font-size:inherit}h5 code,h5 tt{font-size:inherit}h6 code,h6 tt{font-size:inherit}h1{font-size:28px;color:#000}h2{font-size:24px;border-bottom:1px solid #ccc;color:#000}h3{font-size:18px}h4{font-size:16px}h5{font-size:14px}h6{color:#777;font-size:14px}blockquote,dl,li,ol,p,pre,table,ul{margin:15px 0}hr{border:0 none;color:#ccc;height:4px;padding:0}body>h2:first-child{margin-top:0;padding-top:0}body>h1:first-child{margin-top:0;padding-top:0}body>h1:first-child+h2{margin-top:0;padding-top:0}body>h3:first-child,body>h4:first-child,body>h5:first-child,body>h6:first-child{margin-top:0;padding-top:0}a:first-child h1,a:first-child h2,a:first-child h3,a:first-child h4,a:first-child h5,a:first-child h6{margin-top:0;padding-top:0}h1 p,h2 p,h3 p,h4 p,h5 p,h6 p{margin-top:0}li p.first{display:inline-block}li{margin:0}ol,ul{padding-left:30px}ol :first-child,ul :first-child{margin-top:0}dl{padding:0}dl dt{font-size:14px;font-weight:700;font-style:italic;padding:0;margin:15px 0 5px}dl dt:first-child{padding:0}dl dt>:first-child{margin-top:0}dl dt>:last-child{margin-bottom:0}dl dd{margin:0 0 15px;padding:0 15px}dl dd>:first-child{margin-top:0}dl dd>:last-child{margin-bottom:0}blockquote{border-left:4px solid #ddd;padding:0 
15px;color:#777}blockquote>:first-child{margin-top:0}blockquote>:last-child{margin-bottom:0}table{padding:0;border-collapse:collapse}table tr{border-top:1px solid #ccc;background-color:#fff;margin:0;padding:0}table tr:nth-child(2n){background-color:#f8f8f8}table tr th{font-weight:700;border:1px solid #ccc;margin:0;padding:6px 13px}table tr td{border:1px solid #ccc;margin:0;padding:6px 13px}table tr td :first-child,table tr th :first-child{margin-top:0}table tr td :last-child,table tr th :last-child{margin-bottom:0}img{max-width:100%}span.frame{display:block;overflow:hidden}span.frame>span{border:1px solid #ddd;display:block;float:left;overflow:hidden;margin:13px 0 0;padding:7px;width:auto}span.frame span img{display:block;float:left}span.frame span span{clear:both;color:#333;display:block;padding:5px 0 0}span.align-center{display:block;overflow:hidden;clear:both}span.align-center>span{display:block;overflow:hidden;margin:13px auto 0;text-align:center}span.align-center span img{margin:0 auto;text-align:center}span.align-right{display:block;overflow:hidden;clear:both}span.align-right>span{display:block;overflow:hidden;margin:13px 0 0;text-align:right}span.align-right span img{margin:0;text-align:right}span.float-left{display:block;margin-right:13px;overflow:hidden;float:left}span.float-left span{margin:13px 0 0}span.float-right{display:block;margin-left:13px;overflow:hidden;float:right}span.float-right>span{display:block;overflow:hidden;margin:13px auto 0;text-align:right}code,tt{margin:0 2px;padding:0 5px;white-space:nowrap;border:1px solid #eaeaea;background-color:#f8f8f8;border-radius:3px}pre code{margin:0;padding:0;white-space:pre;border:none;background:0 0}.highlight pre{background-color:#f8f8f8;border:1px solid #ccc;font-size:13px;line-height:19px;overflow:auto;padding:6px 10px;border-radius:3px}pre{background-color:#f8f8f8;border:1px solid #ccc;font-size:13px;line-height:19px;overflow:auto;padding:6px 10px;border-radius:3px}pre code,pre 
tt{background-color:transparent;border:none}sup{font-size:.83em;vertical-align:super;line-height:0}*{-webkit-print-color-adjust:exact}@media screen and (min-width:914px){body{width:854px;margin:0 auto}}@media print{pre,table{page-break-inside:avoid}pre{word-wrap:break-word}}
--- a/eulaurarien.sh
+++ b/eulaurarien.sh
@@ -2,13 +2,8 @@

 # Main script for eulaurarien

-[ ! -f .env ] && touch .env
-
 ./fetch_resources.sh
 ./collect_subdomains.sh
-./import_rules.sh
 ./resolve_subdomains.sh
-./prune.sh
-./export_lists.sh
-./generate_index.py
+./filter_subdomains.sh

--- a/export.py
+++ b/export.py
@@ -5,80 +5,53 @@ import argparse
 import sys


-if __name__ == "__main__":
+if __name__ == '__main__':

    # Parsing arguments
    parser = argparse.ArgumentParser(
-        description="Export the hostnames rules stored " "in the Database as plain text"
-    )
+        description="Export the hostnames rules stored "
+        "in the Database as plain text")
    parser.add_argument(
-        "-o",
-        "--output",
-        type=argparse.FileType("w"),
-        default=sys.stdout,
-        help="Output file, one rule per line",
-    )
+        '-o', '--output', type=argparse.FileType('w'), default=sys.stdout,
+        help="Output file, one rule per line")
    parser.add_argument(
-        "-f",
-        "--first-party",
-        action="store_true",
-        help="Only output rules issued from first-party sources",
-    )
+        '-f', '--first-party', action='store_true',
+        help="Only output rules issued from first-party sources")
    parser.add_argument(
-        "-e",
-        "--end-chain",
-        action="store_true",
-        help="Only output rules that are not referenced by any other",
-    )
+        '-e', '--end-chain', action='store_true',
+        help="Only output rules that are not referenced by any other")
    parser.add_argument(
-        "-r",
-        "--rules",
-        action="store_true",
-        help="Output all kinds of rules, not just hostnames",
-    )
+        '-r', '--rules', action='store_true',
+        help="Output all kinds of rules, not just hostnames")
    parser.add_argument(
-        "-b",
-        "--base-rules",
-        action="store_true",
+        '-b', '--base-rules', action='store_true',
        help="Output base rules "
        "(the ones added by ./feed_rules.py) "
-        "(implies --rules)",
-    )
+        "(implies --rules)")
    parser.add_argument(
-        "-d",
-        "--no-dupplicates",
-        action="store_true",
+        '-d', '--no-dupplicates', action='store_true',
        help="Do not output rules that already match a zone/network rule "
-        "(e.g. dummy.example.com when there's a zone example.com rule)",
-    )
+        "(e.g. dummy.example.com when there's a zone example.com rule)")
    parser.add_argument(
-        "-x",
-        "--explain",
-        action="store_true",
+        '-x', '--explain', action='store_true',
        help="Show the chain of rules leading to one "
-        "(and the number of references they have)",
-    )
+        "(and the number of references they have)")
    parser.add_argument(
-        "-c",
-        "--count",
-        action="store_true",
-        help="Show the number of rules per type instead of listing them",
-    )
+        '-c', '--count', action='store_true',
+        help="Show the number of rules per type instead of listing them")
    args = parser.parse_args()

    DB = database.Database()

    if args.count:
        assert not args.explain
-        print(
-            DB.count_records(
+        print(DB.count_records(
            first_party_only=args.first_party,
            end_chain_only=args.end_chain,
            no_dupplicates=args.no_dupplicates,
            rules_only=args.base_rules,
            hostnames_only=not (args.rules or args.base_rules),
-            )
-        )
+        ))
    else:
        for domain in DB.list_records(
            first_party_only=args.first_party,
--- a/export_lists.sh
+++ b/export_lists.sh
@@ -5,13 +5,11 @@ function log() {
 }

 log "Calculating statistics…"
-oldest="$(cat last_updates/*.txt | sort -n | head -1)"
-oldest_date=$(date -Isec -d @$oldest)
 gen_date=$(date -Isec)
 gen_software=$(git describe --tags)
 number_websites=$(wc -l < temp/all_websites.list)
 number_subdomains=$(wc -l < temp/all_subdomains.list)
-number_dns=$(grep 'NOERROR' temp/all_resolved.txt | wc -l)
+number_dns=$(grep '^$' temp/all_resolved.txt | wc -l)

 for partyness in {first,multi}
 do
@@ -22,19 +20,15 @@ do
        partyness_flags=""
    fi

-    rules_input=$(./export.py --count --base-rules $partyness_flags)
-    rules_found=$(./export.py --count --rules $partyness_flags)
-    rules_found_nd=$(./export.py --count --rules --no-dupplicates $partyness_flags)
-
-    echo
    echo "Statistics for ${partyness}-party trackers"
-    echo "Input rules: $rules_input"
-    echo "Subsequent rules: $rules_found"
-    echo "Subsequent rules (no dupplicate): $rules_found_nd"
+    echo "Input rules: $(./export.py --count --base-rules $partyness_flags)"
+    echo "Subsequent rules: $(./export.py --count --rules $partyness_flags)"
+    echo "Subsequent rules (no dupplicate): $(./export.py --count --rules --no-dupplicates $partyness_flags)"
    echo "Output hostnames: $(./export.py --count $partyness_flags)"
    echo "Output hostnames (no dupplicate): $(./export.py --count --no-dupplicates $partyness_flags)"
    echo "Output hostnames (end-chain only): $(./export.py --count --end-chain $partyness_flags)"
    echo "Output hostnames (no dupplicate, end-chain only): $(./export.py --count --no-dupplicates --end-chain $partyness_flags)"
+    echo

    for trackerness in {trackers,only-trackers}
    do
@@ -42,7 +36,7 @@ do
        then
            trackerness_flags=""
        else
-            trackerness_flags="--no-dupplicates"
+            trackerness_flags="--end-chain --no-dupplicates"
        fi
        file_list="dist/${partyness}party-${trackerness}.txt"
        file_host="dist/${partyness}party-${trackerness}-hosts.txt"
@@ -55,32 +49,45 @@ do
        # so this is done in two steps
        sort -u $file_list -o $file_list

+        rules_input=$(./export.py --count --base-rules $partyness_flags)
+        rules_found=$(./export.py --count --rules $partyness_flags)
        rules_output=$(./export.py --count $partyness_flags $trackerness_flags)

+        function link() { # link partyness, link trackerness
+            url="https://hostfiles.frogeye.fr/${1}party-${2}-hosts.txt"
+            if [ "$1" = "$partyness" ] && [ "$2" = "$trackerness" ]
+            then
+                url="$url (this one)"
+            fi
+            echo $url
+        }
+
        (
            echo "# First-party trackers host list"
            echo "# Variant: ${partyness}-party ${trackerness}"
            echo "#"
-            echo "# About first-party trackers: https://hostfiles.frogeye.fr/#whats-a-first-party-tracker"
+            echo "# About first-party trackers: TODO"
+            echo "# Source code: https://git.frogeye.fr/geoffrey/eulaurarien"
            echo "#"
            echo "# In case of false positives/negatives, or any other question,"
            echo "# contact me the way you like: https://geoffrey.frogeye.fr"
            echo "#"
-            echo "# Latest versions and variants: https://hostfiles.frogeye.fr/#list-variants"
-            echo "# Source code: https://git.frogeye.fr/geoffrey/eulaurarien"
-            echo "# License: https://git.frogeye.fr/geoffrey/eulaurarien/src/branch/master/LICENSE"
-            echo "# Acknowledgements: https://hostfiles.frogeye.fr/#acknowledgements"
+            echo "# Latest versions and variants:"
+            echo "# - First-party trackers  : $(link first trackers)"
+            echo "# - … excluding redirected: $(link first only-trackers)"
+            echo "# - First and third party : $(link multi trackers)"
+            echo "# - … excluding redirected: $(link multi only-trackers)"
+            echo '# (variants informations: TODO)'
+            echo '# (you can remove `-hosts` to get the raw list)'
            echo "#"
+            echo "# Generation date: $gen_date"
            echo "# Generation software: eulaurarien $gen_software"
-            echo "# List generation date: $gen_date"
-            echo "# Oldest record: $oldest_date"
            echo "# Number of source websites: $number_websites"
            echo "# Number of source subdomains: $number_subdomains"
-            echo "# Number of source DNS records: $number_dns"
+            echo "# Number of source DNS records: ~2E9 + $number_dns"
            echo "#"
            echo "# Input rules: $rules_input"
            echo "# Subsequent rules: $rules_found"
-            echo "# … no dupplicates: $rules_found_nd"
            echo "# Output rules: $rules_output"
            echo "#"
            echo
@@ -89,10 +96,3 @@ do

    done
 done
-
-if [ -d explanations ]
-then
-    filename="$(date -Isec).txt"
-    ./export.py --explain > "explanations/$filename"
-    ln --force --symbolic "$filename" "explanations/latest.txt"
-fi
--- a/feed_asn.py
+++ b/feed_asn.py
@@ -13,54 +13,57 @@ IPNetwork = typing.Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

 def get_ranges(asn: str) -> typing.Iterable[str]:
    req = requests.get(
-        "https://stat.ripe.net/data/as-routing-consistency/data.json",
-        params={"resource": asn},
+        'https://stat.ripe.net/data/as-routing-consistency/data.json',
+        params={'resource': asn}
    )
    data = req.json()
-    for pref in data["data"]["prefixes"]:
-        yield pref["prefix"]
+    for pref in data['data']['prefixes']:
+        yield pref['prefix']


 def get_name(asn: str) -> str:
    req = requests.get(
-        "https://stat.ripe.net/data/as-overview/data.json", params={"resource": asn}
+        'https://stat.ripe.net/data/as-overview/data.json',
+        params={'resource': asn}
    )
    data = req.json()
-    return data["data"]["holder"]
+    return data['data']['holder']


-if __name__ == "__main__":
+if __name__ == '__main__':

-    log = logging.getLogger("feed_asn")
+    log = logging.getLogger('feed_asn')

    # Parsing arguments
    parser = argparse.ArgumentParser(
-        description="Add the IP ranges associated to the AS in the database"
-    )
+        description="Add the IP ranges associated to the AS in the database")
    args = parser.parse_args()

    DB = database.Database()

-    def add_ranges(
-        path: database.Path,
+    def add_ranges(path: database.Path,
                   match: database.Match,
                   ) -> None:
        assert isinstance(path, database.AsnPath)
        assert isinstance(match, database.AsnNode)
        asn_str = database.Database.unpack_asn(path)
-        DB.enter_step("asn_get_name")
+        DB.enter_step('asn_get_name')
        name = get_name(asn_str)
        match.name = name
-        DB.enter_step("asn_get_ranges")
+        DB.enter_step('asn_get_ranges')
        for prefix in get_ranges(asn_str):
            parsed_prefix: IPNetwork = ipaddress.ip_network(prefix)
            if parsed_prefix.version == 4:
-                DB.set_ip4network(prefix, source=path, updated=int(time.time()))
-                log.info("Added %s from %s (%s)", prefix, path, name)
+                DB.set_ip4network(
+                    prefix,
+                    source=path,
+                    updated=int(time.time())
+                )
+                log.info('Added %s from %s (%s)', prefix, path, name)
            elif parsed_prefix.version == 6:
-                log.warning("Unimplemented prefix version: %s", prefix)
+                log.warning('Unimplemented prefix version: %s', prefix)
            else:
-                log.error("Unknown prefix version: %s", prefix)
+                log.error('Unknown prefix version: %s', prefix)

    for _ in DB.exec_each_asn(add_ranges):
        pass
--- a/feed_dns.py
+++ b/feed_dns.py
@ -12,15 +12,15 @@ Record = typing.Tuple[typing.Callable, typing.Callable, int, str, str]

 # select, write
 FUNCTION_MAP: typing.Any = {
-    "a": (
+    'a': (
        database.Database.get_ip4,
        database.Database.set_hostname,
    ),
-    "cname": (
+    'cname': (
        database.Database.get_domain,
        database.Database.set_hostname,
    ),
-    "ptr": (
+    'ptr': (
        database.Database.get_domain,
        database.Database.set_ip4address,
    ),
@ -28,56 +28,41 @@ FUNCTION_MAP: typing.Any = {


 class Writer(multiprocessing.Process):
-    def __init__(
-        self,
-        recs_queue: multiprocessing.Queue = None,
+    def __init__(self,
+                 recs_queue: multiprocessing.Queue,
                 autosave_interval: int = 0,
                 ip4_cache: int = 0,
                 ):
-        if recs_queue:  # MP
        super(Writer, self).__init__()
+        self.log = logging.getLogger('wr')
        self.recs_queue = recs_queue
-        self.log = logging.getLogger("wr")
        self.autosave_interval = autosave_interval
        self.ip4_cache = ip4_cache
-        if not recs_queue:  # No MP
-            self.open_db()
-
-    def open_db(self) -> None:
-        self.db = database.Database()
-        self.db.log = logging.getLogger("wr")
-        self.db.fill_ip4cache(max_size=self.ip4_cache)
-
-    def exec_record(self, record: Record) -> None:
-        self.db.enter_step("exec_record")
-        select, write, updated, name, value = record
-        try:
-            for source in select(self.db, value):
-                write(self.db, name, updated, source=source)
-        except (ValueError, IndexError):
-            # ValueError: non-number in IP
-            # IndexError: IP too big
-            self.log.exception("Cannot execute: %s", record)
-
-    def end(self) -> None:
-        self.db.enter_step("end")
-        self.db.save()

    def run(self) -> None:
-        self.open_db()
+        self.db = database.Database()
+        self.db.log = logging.getLogger('wr')
+        self.db.fill_ip4cache(max_size=self.ip4_cache)
        if self.autosave_interval > 0:
            next_save = time.time() + self.autosave_interval
        else:
            next_save = 0

-        self.db.enter_step("block_wait")
+        self.db.enter_step('block_wait')
        block: typing.List[Record]
        for block in iter(self.recs_queue.get, None):

-            assert block
            record: Record
            for record in block:
-                self.exec_record(record)
+
+                select, write, updated, name, value = record
+                self.db.enter_step('feed_switch')
+
+                try:
+                    for source in select(self.db, value):
+                        write(self.db, name, updated, source=source)
+                except ValueError:
+                    self.log.exception("Cannot execute: %s", record)

            if next_save > 0 and time.time() > next_save:
                self.log.info("Saving database...")
@ -85,44 +70,37 @@ class Writer(multiprocessing.Process):
                self.log.info("Done!")
                next_save = time.time() + self.autosave_interval

-            self.db.enter_step("block_wait")
-        self.end()
+            self.db.enter_step('block_wait')
+
+        self.db.enter_step('end')
+        self.db.save()


-class Parser:
-    def __init__(
-        self,
+class Parser():
+    def __init__(self,
                 buf: typing.Any,
-        recs_queue: multiprocessing.Queue = None,
-        block_size: int = 0,
-        writer: Writer = None,
+                 recs_queue: multiprocessing.Queue,
+                 block_size: int,
                 ):
-        assert bool(writer) ^ bool(block_size and recs_queue)
+        super(Parser, self).__init__()
        self.buf = buf
-        self.log = logging.getLogger("pr")
+        self.log = logging.getLogger('pr')
        self.recs_queue = recs_queue
-        if writer:  # No MP
-            self.prof: database.Profiler = writer.db
-            self.register = writer.exec_record
-        else:  # MP
        self.block: typing.List[Record] = list()
        self.block_size = block_size
        self.prof = database.Profiler()
-            self.prof.log = logging.getLogger("pr")
-            self.register = self.add_to_queue
+        self.prof.log = logging.getLogger('pr')

-    def add_to_queue(self, record: Record) -> None:
-        self.prof.enter_step("register")
+    def register(self, record: Record) -> None:
+        self.prof.enter_step('register')
        self.block.append(record)
        if len(self.block) >= self.block_size:
-            self.prof.enter_step("put_block")
-            assert self.recs_queue
+            self.prof.enter_step('put_block')
            self.recs_queue.put(self.block)
            self.block = list()

    def run(self) -> None:
        self.consume()
-        if self.recs_queue:
        self.recs_queue.put(self.block)
        self.prof.profile()

@ -130,17 +108,43 @@ class Parser:
        raise NotImplementedError


+class Rapid7Parser(Parser):
+    def consume(self) -> None:
+        data = dict()
+        for line in self.buf:
+            self.prof.enter_step('parse_rapid7')
+            split = line.split('"')
+
+            try:
+                for k in range(1, 14, 4):
+                    key = split[k]
+                    val = split[k+2]
+                    data[key] = val
+
+                select, writer = FUNCTION_MAP[data['type']]
+                record = (
+                    select,
+                    writer,
+                    int(data['timestamp']),
+                    data['name'],
+                    data['value']
+                )
+                self.register(record)
+            except IndexError:
+                self.log.exception("Cannot parse: %s", line)
+
+
 class MassDnsParser(Parser):
    # massdns --output Snrql
    # --retry REFUSED,SERVFAIL --resolvers nameservers-ipv4
    TYPES = {
-        "A": (FUNCTION_MAP["a"][0], FUNCTION_MAP["a"][1], -1, None),
+        'A': (FUNCTION_MAP['a'][0], FUNCTION_MAP['a'][1], -1, None),
        # 'AAAA': (FUNCTION_MAP['aaaa'][0], FUNCTION_MAP['aaaa'][1], -1, None),
-        "CNAME": (FUNCTION_MAP["cname"][0], FUNCTION_MAP["cname"][1], -1, -1),
+        'CNAME': (FUNCTION_MAP['cname'][0], FUNCTION_MAP['cname'][1], -1, -1),
    }

    def consume(self) -> None:
-        self.prof.enter_step("parse_massdns")
+        self.prof.enter_step('parse_massdns')
        timestamp = 0
        header = True
        for line in self.buf:
@ -149,102 +153,74 @@ class MassDnsParser(Parser):
                header = True
                continue

-            split = line.split(" ")
+            split = line.split(' ')
            try:
                if header:
                    timestamp = int(split[1])
                    header = False
                else:
-                    select, write, name_offset, value_offset = MassDnsParser.TYPES[
-                        split[1]
-                    ]
+                    select, write, name_offset, value_offset = \
+                        MassDnsParser.TYPES[split[1]]
                    record = (
                        select,
                        write,
                        timestamp,
-                        split[0][:name_offset].lower(),
-                        split[2][:value_offset].lower(),
+                        split[0][:name_offset],
+                        split[2][:value_offset],
                    )
                    self.register(record)
-                    self.prof.enter_step("parse_massdns")
+                    self.prof.enter_step('parse_massdns')
            except KeyError:
                continue


 PARSERS = {
-    "massdns": MassDnsParser,
+    'rapid7': Rapid7Parser,
+    'massdns': MassDnsParser,
 }

-if __name__ == "__main__":
+if __name__ == '__main__':

    # Parsing arguments
-    log = logging.getLogger("feed_dns")
+    log = logging.getLogger('feed_dns')
    args_parser = argparse.ArgumentParser(
        description="Read DNS records and import "
-        "tracking-relevant data into the database"
-    )
-    args_parser.add_argument("parser", choices=PARSERS.keys(), help="Input format")
+        "tracking-relevant data into the database")
    args_parser.add_argument(
-        "-i",
-        "--input",
-        type=argparse.FileType("r"),
-        default=sys.stdin,
-        help="Input file",
-    )
+        'parser',
+        choices=PARSERS.keys(),
+        help="Input format")
    args_parser.add_argument(
-        "-b", "--block-size", type=int, default=1024, help="Performance tuning value"
-    )
+        '-i', '--input', type=argparse.FileType('r'), default=sys.stdin,
+        help="Input file")
    args_parser.add_argument(
-        "-q", "--queue-size", type=int, default=128, help="Performance tuning value"
-    )
+        '-b', '--block-size', type=int, default=1024,
+        help="Performance tuning value")
    args_parser.add_argument(
-        "-a",
-        "--autosave-interval",
-        type=int,
-        default=900,
-        help="Interval to which the database will save in seconds. " "0 to disable.",
-    )
+        '-q', '--queue-size', type=int, default=128,
+        help="Performance tuning value")
    args_parser.add_argument(
-        "-s",
-        "--single-process",
-        action="store_true",
-        help="Only use one process. " "Might be useful for single core computers.",
-    )
+        '-a', '--autosave-interval', type=int, default=900,
+        help="Interval at which the database is saved, in seconds. "
+        "0 to disable.")
    args_parser.add_argument(
-        "-4",
-        "--ip4-cache",
-        type=int,
-        default=0,
+        '-4', '--ip4-cache', type=int, default=0,
        help="RAM cache for faster IPv4 lookup. "
        "Maximum useful value: 512 MiB (536870912). "
        "Warning: Depending on the rules, this might already "
-        "be a memory-heavy process, even without the cache.",
-    )
+        "be a memory-heavy process, even without the cache.")
    args = args_parser.parse_args()

-    parser_cls = PARSERS[args.parser]
-    if args.single_process:
-        writer = Writer(
-            autosave_interval=args.autosave_interval, ip4_cache=args.ip4_cache
-        )
-        parser = parser_cls(args.input, writer=writer)
-        parser.run()
-        writer.end()
-    else:
    recs_queue: multiprocessing.Queue = multiprocessing.Queue(
-            maxsize=args.queue_size
-        )
+        maxsize=args.queue_size)

-        writer = Writer(
-            recs_queue,
+    writer = Writer(recs_queue,
                    autosave_interval=args.autosave_interval,
-            ip4_cache=args.ip4_cache,
+                    ip4_cache=args.ip4_cache
                    )
    writer.start()

-        parser = parser_cls(
-            args.input, recs_queue=recs_queue, block_size=args.block_size
-        )
+    parser = PARSERS[args.parser](args.input, recs_queue, args.block_size)
    parser.run()

    recs_queue.put(None)
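Note on `Rapid7Parser.consume` above: it avoids a JSON decoder and instead splits each line on double quotes, reading keys at indices 1, 5, 9, 13 and values two positions later. The sketch below isolates that indexing and compares it with `json.loads`. The sample line and the helper name `parse_by_quotes` are assumptions made for illustration (flat object, four string fields, no escaped quotes); they are not taken from the repository.

```python
import json
import typing


def parse_by_quotes(line: str) -> typing.Dict[str, str]:
    """Extract four "key":"value" pairs by splitting on double quotes.

    Hypothetical helper mirroring the indexing used in Rapid7Parser.
    It only works for flat objects with exactly four string fields
    and no escaped quotes inside the values.
    """
    split = line.split('"')
    data: typing.Dict[str, str] = {}
    # Keys sit at indices 1, 5, 9, 13; each value is two items further.
    for k in range(1, 14, 4):
        data[split[k]] = split[k + 2]
    return data


# Assumed shape of one input line (illustrative values only).
SAMPLE = (
    '{"timestamp":"1571875200","name":"example.com",'
    '"type":"a","value":"93.184.216.34"}\n'
)

if __name__ == '__main__':
    fast = parse_by_quotes(SAMPLE)
    # On well-formed input both approaches agree.
    assert fast == json.loads(SAMPLE)
    print(fast['type'], fast['name'], fast['value'])
```

A line with fewer fields raises `IndexError` in the loop, which is the case the parser's `except IndexError` branch is there to log.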
--- a/feed_rules.py
+++ b/feed_rules.py
@ -4,36 +4,30 @@ import database
 import argparse
 import sys
 import time
-import typing

 FUNCTION_MAP = {
-    "zone": database.Database.set_zone,
-    "hostname": database.Database.set_hostname,
-    "asn": database.Database.set_asn,
-    "ip4network": database.Database.set_ip4network,
-    "ip4address": database.Database.set_ip4address,
+    'zone': database.Database.set_zone,
+    'hostname': database.Database.set_hostname,
+    'asn': database.Database.set_asn,
+    'ip4network': database.Database.set_ip4network,
+    'ip4address': database.Database.set_ip4address,
 }

-if __name__ == "__main__":
+if __name__ == '__main__':

    # Parsing arguments
-    parser = argparse.ArgumentParser(description="Import base rules to the database")
+    parser = argparse.ArgumentParser(
+        description="Import base rules to the database")
    parser.add_argument(
-        "type", choices=FUNCTION_MAP.keys(), help="Type of rule inputed"
-    )
+        'type',
+        choices=FUNCTION_MAP.keys(),
+        help="Type of rule inputted")
    parser.add_argument(
-        "-i",
-        "--input",
-        type=argparse.FileType("r"),
-        default=sys.stdin,
-        help="File with one rule per line",
-    )
+        '-i', '--input', type=argparse.FileType('r'), default=sys.stdin,
+        help="File with one rule per line")
    parser.add_argument(
-        "-f",
-        "--first-party",
-        action="store_true",
-        help="The input only comes from verified first-party sources",
-    )
+        '-f', '--first-party', action='store_true',
+        help="The input only comes from verified first-party sources")
    args = parser.parse_args()

    DB = database.Database()
@ -49,8 +43,7 @@ if __name__ == "__main__":
    for rule in args.input:
        rule = rule.strip()
        try:
-            fun(
-                DB,
+            fun(DB,
                rule,
                source=source,
                updated=int(time.time()),
--- a/fetch_resources.sh
+++ b/fetch_resources.sh
@ -13,15 +13,10 @@ function dl() {
    fi
 }

-log "Retrieving tests…"
-rm -f tests/*.cache.csv
-dl https://raw.githubusercontent.com/fukuda-lab/cname_cloaking/master/Subdomain_CNAME-cloaking-based-tracking.csv temp/fukuda.csv
-(echo "url,allow,deny,comment"; tail -n +2 temp/fukuda.csv | awk -F, '{ print "https://" $2 "/,," $3 "," $5 }') > tests/fukuda.cache.csv

 log "Retrieving rules…"
 rm -f rules*/*.cache.*
 dl https://easylist.to/easylist/easyprivacy.txt rules_adblock/easyprivacy.cache.txt
-dl https://filters.adtidy.org/extension/chromium/filters/3.txt rules_adblock/adguard.cache.txt

 log "Retrieving TLD list…"
 dl http://data.iana.org/TLD/tlds-alpha-by-domain.txt temp/all_tld.temp.list
@ -38,7 +33,7 @@ rm top-1m.csv top-1m.csv.zip
 if [ -f subdomains/cisco-umbrella_popularity.cache.list ]
 then
    cp subdomains/cisco-umbrella_popularity.cache.list temp/cisco-umbrella_popularity.old.list
-    pv -f temp/cisco-umbrella_popularity.old.list temp/cisco-umbrella_popularity.fresh.list | sort -u > subdomains/cisco-umbrella_popularity.cache.list
+    pv temp/cisco-umbrella_popularity.old.list temp/cisco-umbrella_popularity.fresh.list | sort -u > subdomains/cisco-umbrella_popularity.cache.list
    rm temp/cisco-umbrella_popularity.old.list temp/cisco-umbrella_popularity.fresh.list
 else
    mv temp/cisco-umbrella_popularity.fresh.list subdomains/cisco-umbrella_popularity.cache.list
--- a/generate_index.py
+++ b/generate_index.py
@ -1,25 +0,0 @@
-#!/usr/bin/env python3
-
-import markdown2
-
-extras = ["header-ids"]
-
-with open("dist/README.md", "r") as fdesc:
-    body = markdown2.markdown(fdesc.read(), extras=extras)
-
-output = f"""<!DOCTYPE html>
-<html lang="en">
-<head>
-<title>Geoffrey Frogeye's block list of first-party trackers</title>
-<meta charset="utf-8">
-<meta name="author" content="Geoffrey 'Frogeye' Preud'homme" />
-<link rel="stylesheet" type="text/css" href="markdown7.min.css">
-</head>
-<body>
-{body}
-</body>
-</html>
-"""
-
-with open("dist/index.html", "w") as fdesc:
-    fdesc.write(output)
--- a/import_rapid7.sh
+++ b/import_rapid7.sh
@ -0,0 +1,26 @@
+#!/usr/bin/env bash
+
+function log() {
+    echo -e "\033[33m$@\033[0m"
+}
+
+function feed_rapid7_fdns { # dataset
+    dataset=$1
+    line=$(curl -s https://opendata.rapid7.com/sonar.fdns_v2/ | grep "href=\".\+-fdns_$dataset.json.gz\"")
+    link="https://opendata.rapid7.com$(echo "$line" | cut -d'"' -f2)"
+    log "Reading $(echo "$dataset" | awk '{print toupper($0)}') records from $link"
+    curl -L "$link" | gunzip
+}
+
+function feed_rapid7_rdns {
+    dataset=$1
+    line=$(curl -s https://opendata.rapid7.com/sonar.rdns_v2/ | grep "href=\".\+-rdns.json.gz\"")
+    link="https://opendata.rapid7.com$(echo "$line" | cut -d'"' -f2)"
+    log "Reading PTR records from $link"
+    curl -L "$link" | gunzip
+}
+
+feed_rapid7_rdns | ./feed_dns.py rapid7
+feed_rapid7_fdns a | ./feed_dns.py rapid7 --ip4-cache 536870912
+# feed_rapid7_fdns aaaa | ./feed_dns.py rapid7 --ip6-cache 536870912
+feed_rapid7_fdns cname | ./feed_dns.py rapid7
--- a/import_rules.sh
+++ b/import_rules.sh
@ -5,7 +5,7 @@ function log() {
 }

 log "Importing rules…"
-date +%s > "last_updates/rules.txt"
+BEFORE="$(date +%s)"
 cat rules_adblock/*.txt | grep -v '^!' | grep -v '^\[Adblock' | ./adblock_to_domain_list.py | ./feed_rules.py zone
 cat rules_hosts/*.txt | grep -v '^#' | grep -v '^$' | cut -d ' ' -f2 | ./feed_rules.py zone
 cat rules/*.list | grep -v '^#' | grep -v '^$' | ./feed_rules.py zone
@ -18,3 +18,5 @@ cat rules_asn/first-party.txt | grep -v '^#' | grep -v '^$' | ./feed_rules.py as

 ./feed_asn.py

+# log "Pruning old rules…"
+# ./db.py --prune --prune-before "$BEFORE" --prune-base
--- a/last_updates/.gitignore
+++ b/last_updates/.gitignore
@ -1 +0,0 @@
-*.txt
--- a/prune.sh
+++ b/prune.sh
@ -1,9 +0,0 @@
-#!/usr/bin/env bash
-
-function log() {
-    echo -e "\033[33m$@\033[0m"
-}
-
-oldest="$(cat last_updates/*.txt | sort -n | head -1)"
-log "Pruning every record before ${oldest}…"
-./db.py --prune --prune-before "$oldest"
--- a/requirements.txt
+++ b/requirements.txt
@ -1,4 +0,0 @@
-coloredlogs>=10
-markdown2>=2.4<3
-numpy>=1.21<2
-python-abp>=0.2<0.3
--- a/resolve_subdomains.sh
+++ b/resolve_subdomains.sh
@ -1,24 +1,19 @@
 #!/usr/bin/env bash

-source .env.default
-source .env
-
 function log() {
    echo -e "\033[33m$@\033[0m"
 }

 log "Compiling nameservers…"
-pv -f nameservers/*.list | ./validate_list.py --ip4 | sort -u > temp/all_nameservers_ip4.list
+pv nameservers/*.list | ./validate_list.py --ip4 | sort -u > temp/all_nameservers_ip4.list

-log "Compiling subdomains…"
+log "Compiling subdomain…"
 # Sort by last character to utilize the DNS server caching mechanism
 # (not as efficient with massdns but it's almost free so why not)
-pv -f subdomains/*.list | ./validate_list.py --domain | rev | sort -u | rev > temp/all_subdomains.list
+pv subdomains/*.list | ./validate_list.py --domain | rev | sort -u | rev > temp/all_subdomains.list

 log "Resolving subdomain…"
-date +%s > "last_updates/massdns.txt"
-"$MASSDNS_BINARY" --output Snrql --hashmap-size "$MASSDNS_HASHMAP_SIZE" --resolvers temp/all_nameservers_ip4.list --outfile temp/all_resolved.txt temp/all_subdomains.list
+massdns --output Snrql --retry REFUSED,SERVFAIL --resolvers temp/all_nameservers_ip4.list --outfile temp/all_resolved.txt temp/all_subdomains.list

 log "Importing into database…"
-[ $SINGLE_PROCESS -eq 1 ] && EXTRA_ARGS="--single-process"
-pv -f temp/all_resolved.txt | ./feed_dns.py massdns --ip4-cache "$CACHE_SIZE" $EXTRA_ARGS
+pv temp/all_resolved.txt | ./feed_dns.py massdns
--- a/rules/first-party.list
+++ b/rules/first-party.list
@ -12,17 +12,16 @@ storetail.io
 # Keyade
 keyade.com
 # Adobe Experience Cloud
-# https://experienceleague.adobe.com/docs/analytics/implementation/vars/config-vars/trackingserversecure.html?lang=en#ssl-tracking-server-in-adobe-experience-platform-launch
 omtrdc.net
 2o7.net
-data.adobedc.net
-sc.adobedc.net
+# ThreatMetrix
+online-metrix.net
 # Webtrekk
 wt-eu02.net
 webtrekk.net
 # Otto Group
 oghub.io
-# Intent Media
+# Intent.com
 partner.intentmedia.net
 # Wizaly
 wizaly.com
@ -30,62 +29,3 @@ wizaly.com
 tagcommander.com
 # Ingenious Technologies
 affex.org
-# TraceDock
-a351fec2c318c11ea9b9b0a0ae18fb0b-1529426863.eu-central-1.elb.amazonaws.com
-a5e652663674a11e997c60ac8a4ec150-1684524385.eu-central-1.elb.amazonaws.com
-a88045584548111e997c60ac8a4ec150-1610510072.eu-central-1.elb.amazonaws.com
-afc4d9aa2a91d11e997c60ac8a4ec150-2082092489.eu-central-1.elb.amazonaws.com
-# A8
-trck.a8.net
-# AD EBiS
-# https://prtimes.jp/main/html/rd/p/000000215.000009812.html
-ebis.ne.jp
-# GENIEE
-genieesspv.jp
-# SP-Prod
-sp-prod.net
-# Act-On Software
-actonsoftware.com
-actonservice.com
-# eum-appdynamics.com
-eum-appdynamics.com
-# Extole
-extole.io
-extole.com
-# Eloqua
-hs.eloqua.com
-# segment.com
-xid.segment.com
-# exponea.com
-exponea.com
-# adclear.net
-adclear.net
-# contentsfeed.com
-contentsfeed.com
-# postaffiliatepro.com
-postaffiliatepro.com
-# Sugar Market (Salesfusion)
-msgapp.com
-# Exactag
-exactag.com
-# GMO Internet Group
-ad-cloud.jp
-# Pardot
-pardot.com
-# Fathom
-# https://usefathom.com/docs/settings/custom-domains
-starman.fathomdns.com
-# Lead Forensics
-# https://www.reddit.com/r/pihole/comments/g7qv3e/leadforensics_tracking_domains_blacklist/
-# No real-world data but the website doesn't hide what it does
-ghochv3eng.trafficmanager.net
-# Branch.io
-thirdparty.bnc.lt
-# Plausible.io
-custom.plausible.io
-# DataUnlocker
-# Bit different as it is a proxy to non first-party trackers scripts
-# but it fits I guess.
-smartproxy.dataunlocker.com
-# SAS
-ci360.sas.com
--- a/rules_asn/first-party.txt
+++ b/rules_asn/first-party.txt
@ -4,7 +4,7 @@ AS50234
 AS44788
 AS19750
 AS55569
+# ThreatMetrix
+AS30286
 # Webtrekk
 AS60164
-# Act-On Software
-AS393648
--- a/rules_ip/first-party.txt
+++ b/rules_ip/first-party.txt
--- a/run_tests.py
+++ b/run_tests.py
@ -5,71 +5,30 @@ import os
 import logging
 import csv

-TESTS_DIR = "tests"
+TESTS_DIR = 'tests'

-if __name__ == "__main__":
+if __name__ == '__main__':

    DB = database.Database()
-    log = logging.getLogger("tests")
+    log = logging.getLogger('tests')

    for filename in os.listdir(TESTS_DIR):
-        if not filename.lower().endswith(".csv"):
-            continue
        log.info("")
        log.info("Running tests from %s", filename)
        path = os.path.join(TESTS_DIR, filename)
-        with open(path, "rt") as fdesc:
-            count_ent = 0
-            count_all = 0
-            count_den = 0
-            pass_ent = 0
-            pass_all = 0
-            pass_den = 0
+        with open(path, 'rt') as fdesc:
            reader = csv.DictReader(fdesc)
            for test in reader:
-                log.debug("Testing %s (%s)", test["url"], test["comment"])
-                count_ent += 1
-                passed = True
+                log.info("Testing %s (%s)", test['url'], test['comment'])

-                for allow in test["allow"].split(":"):
-                    if not allow:
+                for white in test['white'].split(':'):
+                    if not white:
                        continue
-                    count_all += 1
-                    if any(DB.get_domain(allow)):
-                        log.error("False positive: %s", allow)
-                        passed = False
-                    else:
-                        pass_all += 1
+                    if any(DB.get_domain(white)):
+                        log.error("False positive: %s", white)

-                for deny in test["deny"].split(":"):
-                    if not deny:
+                for black in test['black'].split(':'):
+                    if not black:
                        continue
-                    count_den += 1
-                    if not any(DB.get_domain(deny)):
-                        log.error("False negative: %s", deny)
-                        passed = False
-                    else:
-                        pass_den += 1
-
-                if passed:
-                    pass_ent += 1
-            perc_ent = (100 * pass_ent / count_ent) if count_ent else 100
-            perc_all = (100 * pass_all / count_all) if count_all else 100
-            perc_den = (100 * pass_den / count_den) if count_den else 100
-            log.info(
-                (
-                    "%s: Entries %d/%d (%.2f%%)"
-                    " | Allow %d/%d (%.2f%%)"
-                    "| Deny %d/%d (%.2f%%)"
-                ),
-                filename,
-                pass_ent,
-                count_ent,
-                perc_ent,
-                pass_all,
-                count_all,
-                perc_all,
-                pass_den,
-                count_den,
-                perc_den,
-            )
+                    if not any(DB.get_domain(black)):
+                        log.error("False negative: %s", black)
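Note on the test file format used by `run_tests.py` above: each CSV row carries a `white` and a `black` column, each holding hostnames separated by `:`; a blocked `white` entry is a false positive and an unblocked `black` entry is a false negative. The sketch below reproduces that check without the database. The function `check_tests`, the `is_blocked` callback and the sample rows are assumptions for illustration; the real script queries `DB.get_domain` instead.

```python
import csv
import io
import typing


def check_tests(
    fdesc: typing.TextIO,
    is_blocked: typing.Callable[[str], bool],
) -> typing.List[str]:
    """Return one message per failed expectation in a tests CSV.

    `is_blocked` is a stand-in for the database lookup.
    """
    errors: typing.List[str] = []
    for test in csv.DictReader(fdesc):
        # Hosts that must NOT be matched by any rule.
        for white in test['white'].split(':'):
            if white and is_blocked(white):
                errors.append(f"False positive: {white}")
        # Hosts that MUST be matched by at least one rule.
        for black in test['black'].split(':'):
            if black and not is_blocked(black):
                errors.append(f"False negative: {black}")
    return errors


# Illustrative rows in the url,white,black,comment layout.
SAMPLE = """url,white,black,comment
https://www.red-by-sfr.fr/,static.s-sfr.fr,nrg.red-by-sfr.fr,Eulerian
https://www.cbc.ca/,,smetrics.cbc.ca,Adobe Experience Cloud
"""

if __name__ == '__main__':
    blocked = {"nrg.red-by-sfr.fr"}
    for message in check_tests(io.StringIO(SAMPLE), lambda h: h in blocked):
        print(message)
```

Empty cells are skipped, so a row may list only `white` hosts, only `black` hosts, or both.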
--- a/tests/.gitignore
+++ b/tests/.gitignore
@ -1 +0,0 @@
-*.cache.csv
--- a/tests/false-positives.csv
+++ b/tests/false-positives.csv
@ -1,6 +1,5 @@
-url,allow,deny,comment
+url,white,black,comment
 https://support.apple.com,support.apple.com,,EdgeKey / AkamaiEdge
 https://www.pinterest.fr/,i.pinimg.com,,Cedexis
 https://www.tumblr.com/,66.media.tumblr.com,,ChiCDN
 https://www.skype.com/fr/,www.skype.com,,TrafficManager
-https://www.mitsubishicars.com/,www.mitsubishicars.com,,Tracking domain as reverse DNS
--- a/tests/first-party.csv
+++ b/tests/first-party.csv
@ -1,28 +1,10 @@
-url,allow,deny,comment
+url,white,black,comment
 https://www.red-by-sfr.fr/,static.s-sfr.fr,nrg.red-by-sfr.fr,Eulerian
 https://www.cbc.ca/,,smetrics.cbc.ca,2o7 | Ominuture | Adobe Experience Cloud
+https://www.discover.com/,,content.discover.com,ThreatMetrix
 https://www.mytoys.de/,,web.mytoys.de,Webtrekk
 https://www.baur.de/,,tp.baur.de,Otto Group
 https://www.liligo.com/,,compare.liligo.com,???
 https://www.boulanger.com/,,tag.boulanger.fr,TagCommander
 https://www.airfrance.fr/FR/,,tk.airfrance.fr,Wizaly
 https://www.vsgamers.es/,,marketing.net.vsgamers.es,Affex
-https://www.vacansoleil.fr/,,tdep.vacansoleil.fr,TraceDock
-https://www.ozmall.co.jp/,,js.enhance.co.jp,GENIEE
-https://www.thetimes.co.uk/,,cmp.thetimes.co.uk,SP-Prod
-https://agilent.com/,,seahorseinfo.agilent.com,Act-On Software
-https://halifax.co.uk/,,cem.halifax.co.uk,eum-appdynamics.com
-https://www.reallygoodstuff.com/,,refer.reallygoodstuff.com,Extole
-https://unity.com/,,eloqua-trackings.unity.com,Eloqua
-https://www.notino.gr/,,api.campaigns.notino.com,Exponea
-https://www.mytoys.de/,,0815.mytoys.de.adclear.net
-https://www.imbc.com/,,ads.imbc.com.contentsfeed.com
-https://www.cbdbiocare.com/,,affiliate.cbdbiocare.com,postaffiliatepro.com
-https://www.seatadvisor.com/,,marketing.seatadvisor.com,Sugar Market (Salesfusion)
-https://www.tchibo.de/,,tagm.tchibo.de,Exactag
-https://www.bouygues-immobilier.com/,,go.bouygues-immobilier.fr,Pardot
-https://caddyserver.com/,,mule.caddysever.com,Fathom
-Reddit.com mail notifications,,click.redditmail.com,Branch.io
-https://www.phpliveregex.com/,,yolo.phpliveregex.xom,Plausible.io
-https://www.earthclassmail.com/,,1avhg3kanx9.www.earthclassmail.com,DataUnlocker
-https://paulfredrick.com/,,execution-ci360.paulfredrick.com,SAS
--- a/validate_list.py
+++ b/validate_list.py
@ -29,7 +29,7 @@ if __name__ == '__main__':
    args = parser.parse_args()

    for line in args.input:
-        line = line[:-1].lower()
+        line = line.strip()
        if (args.domain and database.Database.validate_domain(line)) or \
                (args.ip4 and database.Database.validate_ip4address(line)):
            print(line, file=args.output)