Compare commits


1 commit

Author SHA1 Message Date
Geoffrey Frogeye 66ac52c5db
Workflow: JSON parser acceleration
Sadly, it is even worse because of the ctypes-induced conversions.
2019-12-09 10:42:37 +01:00
42 changed files with 1154 additions and 2017 deletions


@ -1,5 +0,0 @@
CACHE_SIZE=536870912
MASSDNS_HASHMAP_SIZE=1000
PROFILE=0
SINGLE_PROCESS=0
MASSDNS_BINARY=massdns
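These removed defaults plausibly map to the workflow's tunables: 536870912 matches the `--ip4-cache` maximum mentioned in feed_dns.py's help further down, and SINGLE_PROCESS its `--single-process` flag. A sketch of reading them, with that flag mapping being an assumption:

```
import os

# Hypothetical mirror of the defaults above; the names come from the deleted
# file, while the flag mapping is an assumption based on feed_dns.py's help.
CACHE_SIZE = int(os.environ.get('CACHE_SIZE', '536870912'))
MASSDNS_HASHMAP_SIZE = int(os.environ.get('MASSDNS_HASHMAP_SIZE', '1000'))
PROFILE = os.environ.get('PROFILE', '0') == '1'
SINGLE_PROCESS = os.environ.get('SINGLE_PROCESS', '0') == '1'
MASSDNS_BINARY = os.environ.get('MASSDNS_BINARY', 'massdns')
```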

10
.gitignore vendored

@ -1,5 +1,7 @@
 *.log
-*.p
-.env
-__pycache__
-explanations
+*.db
+*.db-journal
+nameservers
+nameservers.head
+*.o
+*.so

21
LICENSE

@ -1,21 +0,0 @@
MIT License
Copyright (c) 2019 Geoffrey 'Frogeye' Preud'homme
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

5
Makefile Normal file

@ -0,0 +1,5 @@
libaccel.so: accel.o
	clang -shared -Wl,-soname,libaccel.so -o libaccel.so accel.o

accel.o: accel.c
	clang -c -fPIC -O3 accel.c -o accel.o

194
README.md

@ -1,162 +1,92 @@
 # eulaurarien
-This program is able to generate a list of every hostname that is a DNS redirection to a list of DNS zones and IP networks.
+Generates a host list of first-party trackers for ad-blocking.
-It is primarily used to generate [Geoffrey Frogeye's block list of first-party trackers](https://hostfiles.frogeye.fr) (learn about first-party trackers by following this link).
+The latest list is available here: <https://hostfiles.frogeye.fr/firstparty-trackers-hosts.txt>
-If you want to contribute but don't want to create an account on this forge, contact me the way you like: <https://geoffrey.frogeye.fr>
+**DISCLAIMER:** I'm by no means an expert on this subject, so my vocabulary or other details might be wrong. Use at your own risk.
-## How does this work
+## What's a first-party tracker?
-This program takes as input:
-- Lists of hostnames to match
-- Lists of DNS zones to match (a domain and its subdomains)
-- Lists of IP addresses / IP networks to match
-- Lists of Autonomous System numbers to match
-- An enormous quantity of DNS records
-It will be able to output hostnames being a DNS redirection to any item in the lists provided.
-DNS records can be locally resolved from a list of subdomains using [MassDNS](https://github.com/blechschmidt/massdns).
-Those subdomains can either be provided as is, come from the [Cisco Umbrella Popularity List](http://s3-us-west-1.amazonaws.com/umbrella-static/index.html), come from your browsing history, or come from analyzing the traffic a web browser makes when opening a URL (the program provides utilities to do all that).
+Traditionally, websites load tracker scripts directly.
+For example, `website1.com` and `website2.com` both load `https://trackercompany.com/trackerscript.js` to track their users.
+In order to block those, one can simply block the host `trackercompany.com`.
+However, to circumvent this easy block, tracker companies made the websites using them load trackers from `somethingirelevant.website1.com`,
+the latter being a DNS redirection to `website1.trackercompany.com`, directly pointing to a server serving the tracking script.
+Those are the first-party trackers.
+Blocking `trackercompany.com` doesn't work any more, and blocking `*.trackercompany.com` isn't really possible since:
+1. Most ad-blockers don't support wildcards
+2. It's a DNS redirection, meaning that most ad-blockers will only see `somethingirelevant.website1.com`
+So the only solution is to block every known `somethingirelevant.website1.com`-like subdomain, which is a lot of them.
+That's where this script comes in: it generates a list of such subdomains.
+## How does this script work
+It takes as input a list of websites with trackers included.
+So far, this list is manually generated from the list of clients of such first-party trackers
+(later we should use a general list of websites to be more exhaustive).
+It opens each of those websites (just the homepage) in a web browser and records the domains of the network requests the page makes.
+Additionally, or alternatively, you can feed the script some browsing history and get domains from there.
+It then finds the DNS redirections of those domains and compares them with regexes of known tracking domains; a sketch of this step follows just below.
+It finally outputs the matching ones.
+## Requirements
+These are needed just to build the list; an already-built list can be found in the releases.
+- Bash
+- [Python 3.4+](https://www.python.org/)
+- [progressbar2](https://pypi.org/project/progressbar2/)
+- dnspython
+- [A Python wrapper for re2](https://pypi.org/project/google-re2/) (optional, just speeds things up)
+(if you don't want to collect the subdomains, you can skip the following)
+- Firefox
+- Selenium
+- seleniumwire
 ## Usage
-Remember you can get an already generated and up-to-date list of first-party trackers from [here](https://hostfiles.frogeye.fr).
+This is only if you want to build the list yourself.
+If you just want to use the list, the latest build is available here: <https://hostfiles.frogeye.fr/firstparty-trackers-hosts.txt>
+It was built using additional sources not included in this repository for privacy reasons.
-The following is for the people wanting to build their own list.
-### Requirements
+### Add personal sources
+The list of websites provided in this script is by no means exhaustive,
+so adding your own browsing history will help create a better list.
-Depending on the sources you'll be using to generate the list, you'll need to install some of the following:
-- [Bash](https://www.gnu.org/software/bash/bash.html)
-- [Coreutils](https://www.gnu.org/software/coreutils/)
-- [Gawk](https://www.gnu.org/software/gawk/)
-- [curl](https://curl.haxx.se)
-- [pv](http://www.ivarch.com/programs/pv.shtml)
-- [Python 3.4+](https://www.python.org/)
-- [coloredlogs](https://pypi.org/project/coloredlogs/) (sorry, I can't help myself)
-- [numpy](https://www.numpy.org/)
-- [python-abp](https://pypi.org/project/python-abp/) (only if you intend to use AdBlock rules as a rule source)
-- [massdns](https://github.com/blechschmidt/massdns) in your `$PATH` (only if you have subdomains as a source)
-- [Firefox](https://www.mozilla.org/firefox/) (only if you have websites as a source)
-- [selenium (Python bindings)](https://pypi.python.org/pypi/selenium) (only if you have websites as a source)
-- [selenium-wire](https://pypi.org/project/selenium-wire/) (only if you have websites as a source)
-- [markdown2](https://pypi.org/project/markdown2/) (only if you intend to generate the index webpage)
-### Create a new database
-The so-called database (in the form of `blocking.p`) is a file storing all the matching entities (ASNs, IPs, hostnames, zones…) and every entity leading to them.
-It exists because the list cannot be generated in one pass, as the links of a DNS redirection chain do not have to be input in order.
-You can purge old records from the database by running `./prune.sh`.
-When you remove a source of data, remove its corresponding file in `last_updates` to fix the pruning process.
-### Gather external sources
-External sources are not stored in this repository.
-You'll need to fetch them by running `./fetch_resources.sh`.
-Those include:
-- Third-party tracker lists
-- TLD lists (used to test the validity of hostnames)
-- Lists of public DNS resolvers (for DNS resolving from subdomains)
-- Top 1M subdomains
-### Import rules into the database
-You need to put the lists of rules for matching in the different subfolders:
-- `rules`: Lists of DNS zones
-- `rules_ip`: Lists of IP networks (for IP addresses append `/32`)
-- `rules_asn`: Lists of Autonomous System numbers (IP ranges will be deduced from them)
-- `rules_adblock`: Lists of DNS zones, but in the form of AdBlock lists (only the ones concerning domains will be extracted)
-- `rules_hosts`: Lists of DNS zones, but in the form of hosts lists
-See the provided examples for syntax.
-In each folder:
-- `first-party.ext` will be the only files considered for the first-party variant of the list
-- `*.cache.ext` are from external sources, and thus might be deleted / overwritten
-- `*.custom.ext` are for sources that you don't want committed
-Then, run `./import_rules.sh`.
-If you removed rules and want to remove every record depending on those rules immediately,
-run the following command:
-```
-./db.py --prune --prune-before "$(cat "last_updates/rules.txt")" --prune-base
-```
-### Add subdomains
-If you plan to resolve DNS records yourself (as the DNS records datasets are not exhaustive),
-the top 1M subdomains provided might not be enough.
-You can add your own into the `subdomains` folder.
-It follows the same conventions as the rules folders for `*.cache.ext` and `*.custom.ext` files.
-#### Add personal sources
-Adding your own browsing history will help create a better-suited subdomains list.
 Here are reference commands for possible sources:
 - **Pi-hole**: `sqlite3 /etc/pihole-FTL.db "select distinct domain from queries" > /path/to/eulaurarien/subdomains/my-pihole.custom.list`
 - **Firefox**: `cp ~/.mozilla/firefox/<your_profile>.default/places.sqlite temp; sqlite3 temp "select distinct rev_host from moz_places" | rev | sed 's|^\.||' > /path/to/eulaurarien/subdomains/my-firefox.custom.list; rm temp`
-#### Collect subdomains from websites
+### Collect subdomains from websites
-You can add the websites' URLs into the `websites` folder.
+Just run `collect_subdomain.sh`.
-It follows the same conventions as the rules folders for `*.cache.ext` and `*.custom.ext` files.
-Then, run `collect_subdomain.sh`.
 This is a long step, and might be memory-intensive from time to time.
-> **Note:** For first-party tracking, a list of subdomains issued from the websites in the repository is available here: <https://hostfiles.frogeye.fr/from_websites.cache.list>
+This step is optional if you already added personal sources.
+Alternatively, you can just download the list of subdomains used to generate the official block list here: <https://hostfiles.frogeye.fr/from_websites.cache.list> (put it in the `subdomains` folder).
-### Resolve DNS records
+### Extract tracking domains
-Once you've added subdomains, you'll need to resolve them to get their DNS records.
+Make sure your system is configured with a DNS server without limitation.
-The program will use a list of public nameservers to do that, but you can add your own in the `nameservers` directory.
+Then, run `filter_subdomain.sh`.
+The files you need will be in the folder `dist`.
-Then, run `./resolve_subdomains.sh`.
-Note that this is a network-intensive process, not in terms of bandwidth, but in terms of packet count.
-> **Note:** Some VPS providers might detect this as a DDoS attack and cut the network access.
-> Some Wi-Fi connections can be rendered unusable for other uses, and some routers might cease to work.
-> Since massdns does not yet support rate limiting, my best bet was a Raspberry Pi with a slow Ethernet link (Raspberry Pi < 4).
-The DNS records will automatically be imported into the database.
-If you want to re-import the records without re-doing the resolving, just run the last line of the `./resolve_subdomains.sh` script.
+## Contributing
+### Adding websites
+Just add the URL to the relevant list: `websites/<source>.list`.
+### Adding first-party trackers regex
+Just add them to `regexes.py`.
-### Export the lists
-For the tracking list, use `./export_lists.sh`; the output will be in the `dist` folder (please change the links before distributing them).
-For other purposes, tinker with the `./export.py` program.
-#### Explanations
-Note that if you created an `explanations` folder at the root of the project, a file with a timestamp will be created in it.
-It contains every rule in the database and the reason for their presence (i.e. their dependency).
-This might be useful to track changes between runs.
-Every rule has an associated tag with four components (a parsing sketch follows this list):
-1. A number: the level of the rule (1 if it is a rule present in the `rules*` folders)
-2. A letter: `F` if first-party, `M` if multi-party.
-3. A letter: `D` if a duplicate (e.g. `foo.bar.com` if `*.bar.com` is already a rule), `_` if not.
-4. A number: the number of rules relying on this one
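A minimal sketch of unpacking such a tag, assuming the four components are simply concatenated (e.g. `1F_0`); `RuleTag` and `parse_tag` are hypothetical names, not part of the repository:

```
import re
from typing import NamedTuple

class RuleTag(NamedTuple):
    level: int         # 1 for base rules from the rules* folders
    first_party: bool  # F: first-party, M: multi-party
    duplicate: bool    # D: shadowed by a broader rule, _: not
    references: int    # number of rules relying on this one

def parse_tag(tag: str) -> RuleTag:
    match = re.fullmatch(r'(\d+)([FM])([D_])(\d+)', tag)
    if not match:
        raise ValueError(f"Not a rule tag: {tag!r}")
    level, party, dup, refs = match.groups()
    return RuleTag(int(level), party == 'F', dup == 'D', int(refs))

# e.g. parse_tag('1F_0') -> RuleTag(level=1, first_party=True, duplicate=False, references=0)
```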
-### Generate the index webpage
-This is the one served on <https://hostfiles.frogeye.fr>.
-Just run `./generate_index.py`.
-### Everything
-Once you've made sure every step runs fine, you can use `./eulaurarien.sh` to run every step consecutively.

101
accel.c Normal file

@ -0,0 +1,101 @@
#include <stdlib.h>
#include <stdio.h>

// Parse a dotted-quad IPv4 string into 32 '0'/'1' wide characters.
// Returns 0 on success, 1 on parse error.
char ip4_flat(char* value, wchar_t* flat)
{
    unsigned char value_index = 0;
    unsigned char octet_index = 0;
    unsigned char octet_value = 0;
    char flat_index;
    unsigned char value_chara;
    do {
        value_chara = value[value_index];
        if (value_chara >= '0' && value_chara <= '9') {
            octet_value *= 10;
            octet_value += value_chara - '0';
        } else if (value_chara == '.') {
            for (flat_index = (octet_index+1)*8-1; flat_index >= octet_index*8; flat_index--) {
                flat[flat_index] = '0' + (octet_value & 1);
                octet_value >>= 1;
            }
            octet_index++;
            octet_value = 0;
        } else if (value_chara == '\0') {
            if (octet_index != 3) {
                return 1;
            }
            for (flat_index = 31; flat_index >= 24; flat_index--) {
                flat[flat_index] = '0' + (octet_value & 1);
                octet_value >>= 1;
            }
            return 0;
        } else {
            return 1;
        }
        value_index++;
    } while (1); // This ugly thing saves one comparison
    return 1;
}

#define MAX_OUTPUT 255

// Extract name and value from a massdns JSON line by counting quotes.
// Returns the record type (0: error, 1: cname, 2: a, 3: aaaa).
char feed_dns_parse_json(char* line, char* name, char* value)
{
    unsigned short line_index = 0;
    unsigned char quote_index = 0;
    unsigned char output_index = 0;
    char line_chara;
    char* current_output = NULL;
    char type = 0; // 0: error, 1: cname, 2: a, 3: aaaa
    do {
        line_chara = line[line_index];
        if (line_chara == '"') {
            quote_index += 1;
            switch (quote_index) {
                case 7: // Start of name
                    current_output = name;
                    break;
                case 8: // End of name
                    name[output_index] = '\0';
                    current_output = NULL;
                    break;
                case 11: // Start of type
                    line_chara = line[++line_index];
                    if (line_chara == 'c') { // Must be CNAME
                        type = 1;
                        break;
                    } else if (line_chara == 'a') { // A or AAAA
                        line_chara = line[++line_index];
                        if (line_chara == '"') { // Is A: closing quote right after the 'a'
                            type = 2;
                            quote_index++; // account for the quote consumed here
                            break;
                        } else if (line_chara == 'a') { // Must be AAAA
                            type = 3;
                            break;
                        }
                    }
                    return 0;
                case 15: // Start of value
                    current_output = value;
                    break;
                case 16: // End of value
                    value[output_index] = '\0';
                    return type;
            }
            output_index = 0;
        } else if (line_chara == '\0') {
            return 0;
        } else {
            if (current_output != 0) {
                if (output_index >= MAX_OUTPUT) {
                    return 0;
                }
                current_output[output_index] = line_chara;
                output_index++;
            }
        }
        line_index++;
    } while (1); // This ugly thing saves one comparison
    return 0;
}
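As a quick sanity check of `ip4_flat` from Python, a sketch assuming `libaccel.so` was built with the Makefile above (this test is not part of the commit):

```
import ctypes

accel = ctypes.cdll.LoadLibrary('./libaccel.so')
# ip4_flat writes 32 wchar_t '0'/'1' characters; one extra slot keeps a NUL terminator.
flat = ctypes.create_unicode_buffer(33)
error = accel.ip4_flat(ctypes.c_char_p(b'192.0.2.1'), flat)
assert error == 0
print(flat.value)  # 11000000000000000000001000000001
```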

adblock_to_domain_list.py

@ -16,36 +16,25 @@ import abp.filters
 def get_domains(rule: abp.filters.parser.Filter) -> typing.Iterable[str]:
     if rule.options:
         return
-    selector_type = rule.selector["type"]
-    selector_value = rule.selector["value"]
-    if (
-        selector_type == "url-pattern"
-        and selector_value.startswith("||")
-        and selector_value.endswith("^")
-    ):
+    selector_type = rule.selector['type']
+    selector_value = rule.selector['value']
+    if selector_type == 'url-pattern' \
+            and selector_value.startswith('||') \
+            and selector_value.endswith('^'):
         yield selector_value[2:-1]
-if __name__ == "__main__":
+if __name__ == '__main__':
     # Parsing arguments
     parser = argparse.ArgumentParser(
-        description="Extract whole domains from an AdBlock blocking list"
-    )
+        description="Extract whole domains from an AdBlock blocking list")
     parser.add_argument(
-        "-i",
-        "--input",
-        type=argparse.FileType("r"),
-        default=sys.stdin,
-        help="Input file with AdBlock rules",
-    )
+        '-i', '--input', type=argparse.FileType('r'), default=sys.stdin,
+        help="Input file with AdBlock rules")
     parser.add_argument(
-        "-o",
-        "--output",
-        type=argparse.FileType("w"),
-        default=sys.stdout,
-        help="Output file with one rule tracking subdomain per line",
-    )
+        '-o', '--output', type=argparse.FileType('w'), default=sys.stdout,
+        help="Output file with one rule tracking subdomain per line")
     args = parser.parse_args()
     # Reading rules

collect_subdomains.py

@ -14,28 +14,6 @@ import time
 import progressbar
 import selenium.webdriver.firefox.options
 import seleniumwire.webdriver
-import logging
-log = logging.getLogger("cs")
-DRIVER = None
-SCROLL_TIME = 10.0
-SCROLL_STEPS = 100
-SCROLL_CMD = f"window.scrollBy(0,document.body.scrollHeight/{SCROLL_STEPS})"
-def new_driver() -> seleniumwire.webdriver.browser.Firefox:
-    profile = selenium.webdriver.FirefoxProfile()
-    profile.set_preference("privacy.trackingprotection.enabled", False)
-    profile.set_preference("network.cookie.cookieBehavior", 0)
-    profile.set_preference("privacy.trackingprotection.pbmode.enabled", False)
-    profile.set_preference("privacy.trackingprotection.cryptomining.enabled", False)
-    profile.set_preference("privacy.trackingprotection.fingerprinting.enabled", False)
-    options = selenium.webdriver.firefox.options.Options()
-    # options.add_argument('-headless')
-    driver = seleniumwire.webdriver.Firefox(
-        profile, executable_path="geckodriver", options=options
-    )
-    return driver
 def subdomain_from_url(url: str) -> str:
@ -51,36 +29,34 @@ def collect_subdomains(url: str) -> typing.Iterable[str]:
     Load a URL into a headless browser and return all the domains
     it tried to access.
     """
-    global DRIVER
-    if not DRIVER:
-        DRIVER = new_driver()
-    try:
-        DRIVER.get(url)
-        for s in range(SCROLL_STEPS):
-            DRIVER.execute_script(SCROLL_CMD)
-            time.sleep(SCROLL_TIME / SCROLL_STEPS)
-        for request in DRIVER.requests:
-            if request.response:
-                yield subdomain_from_url(request.path)
-    except Exception:
-        log.exception("Error")
-        DRIVER.quit()
-        DRIVER = None
+    options = selenium.webdriver.firefox.options.Options()
+    options.add_argument('-headless')
+    driver = seleniumwire.webdriver.Firefox(
+        executable_path='geckodriver', options=options)
+    driver.get(url)
+    time.sleep(10)
+    for request in driver.requests:
+        if request.response:
+            yield subdomain_from_url(request.path)
+    driver.close()
 def collect_subdomains_standalone(url: str) -> None:
     url = url.strip()
     if not url:
         return
-    try:
-        for subdomain in collect_subdomains(url):
-            print(subdomain)
-    except:
-        pass
+    for subdomain in collect_subdomains(url):
+        print(subdomain)
-if __name__ == "__main__":
+if __name__ == '__main__':
     assert len(sys.argv) <= 2
     filename = None
-    if len(sys.argv) == 2 and sys.argv[1] != "-":
+    if len(sys.argv) == 2 and sys.argv[1] != '-':
         filename = sys.argv[1]
         num_lines = sum(1 for line in open(filename))
         iterator = progressbar.progressbar(open(filename), max_value=num_lines)
@ -90,8 +66,5 @@ if __name__ == "__main__":
     for line in iterator:
         collect_subdomains_standalone(line)
-    if DRIVER:
-        DRIVER.quit()
     if filename:
         iterator.close()

1011
database.py Normal file → Executable file

File diff suppressed because it is too large

23
database_schema.sql Normal file

@ -0,0 +1,23 @@
-- Remember to increment DB_VERSION
-- in database.py on changes to this file

CREATE TABLE blocking (
    key TEXT PRIMARY KEY, -- Contains the reversed domain name or IP in binary form
    source TEXT, -- The rule this one is based on
    type INTEGER, -- Type of the field: 1: AS, 2: domain tree, 3: domain, 4: IPv4 network, 6: IPv6 network
    updated INTEGER, -- If the row was updated during the last data import (0: No, 1: Yes)
    firstparty INTEGER, -- Which blocking list this row is issued from (0: first-party, 1: multi-party)
    -- refs INTEGER, -- Number of other rules relying on this one
    FOREIGN KEY (source) REFERENCES blocking(key) ON DELETE CASCADE
);

CREATE INDEX "blocking_type_updated_key" ON "blocking" (
    "type",
    "updated",
    "key" DESC
);

-- Store various things
CREATE TABLE meta (
    key TEXT PRIMARY KEY,
    value integer
);
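To picture the `key` encoding described above ("the reversed domain name or IP in binary form"), here is one plausible reading; the authoritative packing lives in database.py, so the helpers below are illustrative only:

```
def domain_key(fqdn: str) -> str:
    # 'www.example.com' -> 'com.example.www.': a zone and all its subdomains
    # then share a common key prefix, which suits the index's range scans.
    return '.'.join(reversed(fqdn.split('.'))) + '.'

def ip4_key(address: str) -> str:
    # '198.51.100.7' -> 32-character '0'/'1' string: a /24 network is then
    # simply a 24-character prefix of its addresses' keys.
    return ''.join(f'{int(octet):08b}' for octet in address.split('.'))

assert domain_key('www.example.com') == 'com.example.www.'
assert ip4_key('198.51.100.7')[:24] == ip4_key('198.51.100.0')[:24]
```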

54
db.py

@ -1,54 +0,0 @@
#!/usr/bin/env python3

import argparse
import database
import time
import os

if __name__ == "__main__":

    # Parsing arguments
    parser = argparse.ArgumentParser(description="Database operations")
    parser.add_argument(
        "-i", "--initialize", action="store_true", help="Reconstruct the whole database"
    )
    parser.add_argument(
        "-p", "--prune", action="store_true", help="Remove old entries from database"
    )
    parser.add_argument(
        "-b",
        "--prune-base",
        action="store_true",
        help="With --prune, only prune base rules "
        "(the ones added by ./feed_rules.py)",
    )
    parser.add_argument(
        "-s",
        "--prune-before",
        type=int,
        default=(int(time.time()) - 60 * 60 * 24 * 31 * 6),
        help="With --prune, only rules updated before "
        "this UNIX timestamp will be deleted",
    )
    parser.add_argument(
        "-r",
        "--references",
        action="store_true",
        help="DEBUG: Update the reference count",
    )
    args = parser.parse_args()

    if not args.initialize:
        DB = database.Database()
    else:
        if os.path.isfile(database.Database.PATH):
            os.unlink(database.Database.PATH)
        DB = database.Database()

    DB.enter_step("main")
    if args.prune:
        DB.prune(before=args.prune_before, base_only=args.prune_base)
    if args.references:
        DB.update_references()

    DB.save()

1
dist/.gitignore vendored

@ -1,2 +1 @@
 *.txt
-*.html

114
dist/README.md vendored

@ -1,114 +0,0 @@
# Geoffrey Frogeye's block list of first-party trackers
## What's a first-party tracker?
A tracker is a script put on many websites to gather information about the visitor.
They can be used for multiple reasons: statistics, risk management, marketing, ads serving…
In any case, they are a threat to Internet users' privacy and many may want to block them.
Traditionally, trackers are served from a third party.
For example, `website1.com` and `website2.com` both load their tracking script from `https://trackercompany.com/trackerscript.js`.
In order to block those, one can simply block the hostname `trackercompany.com`, which is what most ad blockers do.
However, to circumvent this block, tracker companies made the websites using them load trackers from `somestring.website1.com`.
The latter is a DNS redirection to `website1.trackercompany.com`, directly to an IP address belonging to the tracking company.
Those are called first-party trackers.
On top of the aforementioned privacy issues, they also cause some security issues, as websites usually trust those scripts more.
For more information, learn about [Content Security Policy](https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP), [same-origin policy](https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy) and [Cross-Origin Resource Sharing](https://enable-cors.org/).
In order to block those trackers, ad blockers would need to block every subdomain pointing to anything under `trackercompany.com` or to their network.
Unfortunately, most don't support those blocking methods as they are not DNS-aware, e.g. they only see `somestring.website1.com`.
This list is an inventory of every `somestring.website1.com` found, to allow non-DNS-aware ad blockers to still block first-party trackers.
### Learn more
- [CNAME Cloaking, the dangerous disguise of third-party trackers](https://medium.com/nextdns/cname-cloaking-the-dangerous-disguise-of-third-party-trackers-195205dc522a) from NextDNS
- [Trackers first-party](https://blog.imirhil.fr/2019/11/13/first-party-tracker.html) from Aeris, in french
- [uBlock Origin issue](https://github.com/uBlockOrigin/uBlock-issues/issues/780)
- [CNAME Cloaking and Bounce Tracking Defense](https://webkit.org/blog/11338/cname-cloaking-and-bounce-tracking-defense/) on WebKit's blog
- [Characterizing CNAME cloaking-based tracking](https://blog.apnic.net/2020/08/04/characterizing-cname-cloaking-based-tracking/) on APNIC's website
- [Characterizing CNAME Cloaking-Based Tracking on the Web](https://tma.ifip.org/2020/wp-content/uploads/sites/9/2020/06/tma2020-camera-paper66.pdf) is a research paper from Sokendai and ANSSI
## List variants
### First-party trackers
**Recommended for hostfiles-based ad blockers, such as [Pi-hole](https://pi-hole.net/) (<v5.0, as it introduced CNAME blocking).**
**Recommended for Android ad blockers as applications, such as [Blokada](https://blokada.org/).**
- Hosts file: <https://hostfiles.frogeye.fr/firstparty-trackers-hosts.txt>
- Raw list: <https://hostfiles.frogeye.fr/firstparty-trackers.txt>
This list contains every hostname redirecting to [a hand-picked list of first-party trackers](https://git.frogeye.fr/geoffrey/eulaurarien/src/branch/master/rules/first-party.list).
It should be safe from false-positives.
It also contains all tracking hostnames under company domains (e.g. `website1.trackercompany.com`),
useful for ad blockers that don't support mass regex blocking,
while still preventing fallback to third-party trackers.
Don't be afraid of the size of the list, as this is due to the nature of first-party trackers: a single tracker generates at least one hostname per client (typically two).
### First-party only trackers
**Recommended for ad blockers as web browser extensions, such as [uBlock Origin](https://ublockorigin.com/) (<v1.25.0 or for Chromium-based browsers, as it introduced CNAME uncloaking for Firefox).**
- Hosts file: <https://hostfiles.frogeye.fr/firstparty-only-trackers-hosts.txt>
- Raw list: <https://hostfiles.frogeye.fr/firstparty-only-trackers.txt>
This is the same list as above, albeit not containing the hostnames under the tracking company domains (e.g. `website1.trackercompany.com`).
This allows for reducing the size of the list for ad-blockers that already block those third-party trackers with their support of regex blocking.
Use in conjunction with other block lists used in regex-mode, such as [Peter Lowe's](https://pgl.yoyo.org/adservers/)
### Multi-party trackers
- Hosts file: <https://hostfiles.frogeye.fr/multiparty-trackers-hosts.txt>
- Raw list: <https://hostfiles.frogeye.fr/multiparty-trackers.txt>
As first-party trackers usually evolve from third-party trackers, this list contains every hostname redirecting to trackers found in existing lists of third-party trackers (see next section).
Since the latter were not designed with first-party trackers in mind, they are likely to contain false-positives.
On the other hand, they might protect against first-party trackers that we're not aware of / have not yet confirmed.
#### Source of third-party trackers
- [EasyPrivacy](https://easylist.to/easylist/easyprivacy.txt)
- [AdGuard](https://github.com/AdguardTeam/AdguardFilters)
(yes, there are only two for now; a lot of existing ones cause a lot of false positives)
### Multi-party only trackers
- Hosts file: <https://hostfiles.frogeye.fr/multiparty-only-trackers-hosts.txt>
- Raw list: <https://hostfiles.frogeye.fr/multiparty-only-trackers.txt>
This is the same list as above, albeit not containing the hostnames under the tracking company domains (e.g. `website1.trackercompany.com`).
This allows for reducing the size of the list for ad-blockers that already block those third-party trackers with their support of regex blocking.
Use in conjunction with other block lists used in regex-mode, such as the ones in the previous section.
## Meta
In case of false positives/negatives, or any other question contact me the way you like: <https://geoffrey.frogeye.fr>
The software used to generate this list is available here: <https://git.frogeye.fr/geoffrey/eulaurarien>
## Acknowledgements
Some of the first-party trackers included in this list have been found by:
- [Aeris](https://imirhil.fr/)
- NextDNS and [their blocklist](https://github.com/nextdns/cname-cloaking-blocklist)'s contributors
- Yuki2718 from [Wilders Security Forums](https://www.wilderssecurity.com/threads/ublock-a-lean-and-fast-blocker.365273/page-168#post-2880361)
- Ha Dao, Johan Mazel, and Kensuke Fukuda, ["Characterizing CNAME Cloaking-Based Tracking on the Web", Proceedings of IFIP/IEEE Traffic Measurement Analysis Conference (TMA), 9 pages, 2020.](https://tma.ifip.org/2020/wp-content/uploads/sites/9/2020/06/tma2020-camera-paper66.pdf)
- AdGuard and [their blocklist](https://github.com/AdguardTeam/cname-trackers)'s contributors
The list was generated using data from
- [Cisco Umbrella Popularity List](http://s3-us-west-1.amazonaws.com/umbrella-static/index.html)
- [Public DNS Server List](https://public-dns.info/)
Similar projects:
- [NextDNS blocklist](https://github.com/nextdns/cname-cloaking-blocklist): for DNS-aware ad blockers
- [Stefan Froberg's lists](https://www.orwell1984.today/cname/): subset of those lists grouped by tracker
- [AdGuard blocklist](https://github.com/AdguardTeam/cname-trackers): same thing with a bigger scope, maintained by a bigger team


@ -1,2 +0,0 @@
/* Source: https://github.com/jasonm23/markdown-css-themes */
body{font-family:Helvetica,arial,sans-serif;font-size:14px;line-height:1.6;padding-top:10px;padding-bottom:10px;background-color:#fff;padding:30px}body>:first-child{margin-top:0!important}body>:last-child{margin-bottom:0!important}a{color:#4183c4}a.absent{color:#c00}a.anchor{display:block;padding-left:30px;margin-left:-30px;cursor:pointer;position:absolute;top:0;left:0;bottom:0}h1,h2,h3,h4,h5,h6{margin:20px 0 10px;padding:0;font-weight:700;-webkit-font-smoothing:antialiased;cursor:text;position:relative}h1:hover a.anchor,h2:hover a.anchor,h3:hover a.anchor,h4:hover a.anchor,h5:hover a.anchor,h6:hover a.anchor{text-decoration:none}h1 code,h1 tt{font-size:inherit}h2 code,h2 tt{font-size:inherit}h3 code,h3 tt{font-size:inherit}h4 code,h4 tt{font-size:inherit}h5 code,h5 tt{font-size:inherit}h6 code,h6 tt{font-size:inherit}h1{font-size:28px;color:#000}h2{font-size:24px;border-bottom:1px solid #ccc;color:#000}h3{font-size:18px}h4{font-size:16px}h5{font-size:14px}h6{color:#777;font-size:14px}blockquote,dl,li,ol,p,pre,table,ul{margin:15px 0}hr{border:0 none;color:#ccc;height:4px;padding:0}body>h2:first-child{margin-top:0;padding-top:0}body>h1:first-child{margin-top:0;padding-top:0}body>h1:first-child+h2{margin-top:0;padding-top:0}body>h3:first-child,body>h4:first-child,body>h5:first-child,body>h6:first-child{margin-top:0;padding-top:0}a:first-child h1,a:first-child h2,a:first-child h3,a:first-child h4,a:first-child h5,a:first-child h6{margin-top:0;padding-top:0}h1 p,h2 p,h3 p,h4 p,h5 p,h6 p{margin-top:0}li p.first{display:inline-block}li{margin:0}ol,ul{padding-left:30px}ol :first-child,ul :first-child{margin-top:0}dl{padding:0}dl dt{font-size:14px;font-weight:700;font-style:italic;padding:0;margin:15px 0 5px}dl dt:first-child{padding:0}dl dt>:first-child{margin-top:0}dl dt>:last-child{margin-bottom:0}dl dd{margin:0 0 15px;padding:0 15px}dl dd>:first-child{margin-top:0}dl dd>:last-child{margin-bottom:0}blockquote{border-left:4px solid #ddd;padding:0 15px;color:#777}blockquote>:first-child{margin-top:0}blockquote>:last-child{margin-bottom:0}table{padding:0;border-collapse:collapse}table tr{border-top:1px solid #ccc;background-color:#fff;margin:0;padding:0}table tr:nth-child(2n){background-color:#f8f8f8}table tr th{font-weight:700;border:1px solid #ccc;margin:0;padding:6px 13px}table tr td{border:1px solid #ccc;margin:0;padding:6px 13px}table tr td :first-child,table tr th :first-child{margin-top:0}table tr td :last-child,table tr th :last-child{margin-bottom:0}img{max-width:100%}span.frame{display:block;overflow:hidden}span.frame>span{border:1px solid #ddd;display:block;float:left;overflow:hidden;margin:13px 0 0;padding:7px;width:auto}span.frame span img{display:block;float:left}span.frame span span{clear:both;color:#333;display:block;padding:5px 0 0}span.align-center{display:block;overflow:hidden;clear:both}span.align-center>span{display:block;overflow:hidden;margin:13px auto 0;text-align:center}span.align-center span img{margin:0 auto;text-align:center}span.align-right{display:block;overflow:hidden;clear:both}span.align-right>span{display:block;overflow:hidden;margin:13px 0 0;text-align:right}span.align-right span img{margin:0;text-align:right}span.float-left{display:block;margin-right:13px;overflow:hidden;float:left}span.float-left span{margin:13px 0 0}span.float-right{display:block;margin-left:13px;overflow:hidden;float:right}span.float-right>span{display:block;overflow:hidden;margin:13px auto 0;text-align:right}code,tt{margin:0 2px;padding:0 5px;white-space:nowrap;border:1px solid 
#eaeaea;background-color:#f8f8f8;border-radius:3px}pre code{margin:0;padding:0;white-space:pre;border:none;background:0 0}.highlight pre{background-color:#f8f8f8;border:1px solid #ccc;font-size:13px;line-height:19px;overflow:auto;padding:6px 10px;border-radius:3px}pre{background-color:#f8f8f8;border:1px solid #ccc;font-size:13px;line-height:19px;overflow:auto;padding:6px 10px;border-radius:3px}pre code,pre tt{background-color:transparent;border:none}sup{font-size:.83em;vertical-align:super;line-height:0}*{-webkit-print-color-adjust:exact}@media screen and (min-width:914px){body{width:854px;margin:0 auto}}@media print{pre,table{page-break-inside:avoid}pre{word-wrap:break-word}}

eulaurarien.sh

@ -2,13 +2,8 @@
 # Main script for eulaurarien
-[ ! -f .env ] && touch .env
 ./fetch_resources.sh
 ./collect_subdomains.sh
-./import_rules.sh
 ./resolve_subdomains.sh
-./prune.sh
+./filter_subdomains.sh
-./export_lists.sh
-./generate_index.py

export.py

@ -1,91 +0,0 @@
#!/usr/bin/env python3

import database
import argparse
import sys

if __name__ == "__main__":

    # Parsing arguments
    parser = argparse.ArgumentParser(
        description="Export the hostname rules stored in the database as plain text"
    )
    parser.add_argument(
        "-o",
        "--output",
        type=argparse.FileType("w"),
        default=sys.stdout,
        help="Output file, one rule per line",
    )
    parser.add_argument(
        "-f",
        "--first-party",
        action="store_true",
        help="Only output rules issued from first-party sources",
    )
    parser.add_argument(
        "-e",
        "--end-chain",
        action="store_true",
        help="Only output rules that are not referenced by any other",
    )
    parser.add_argument(
        "-r",
        "--rules",
        action="store_true",
        help="Output all kinds of rules, not just hostnames",
    )
    parser.add_argument(
        "-b",
        "--base-rules",
        action="store_true",
        help="Output base rules "
        "(the ones added by ./feed_rules.py) "
        "(implies --rules)",
    )
    parser.add_argument(
        "-d",
        "--no-dupplicates",
        action="store_true",
        help="Do not output rules that already match a zone/network rule "
        "(e.g. dummy.example.com when there's a zone example.com rule)",
    )
    parser.add_argument(
        "-x",
        "--explain",
        action="store_true",
        help="Show the chain of rules leading to one "
        "(and the number of references they have)",
    )
    parser.add_argument(
        "-c",
        "--count",
        action="store_true",
        help="Show the number of rules per type instead of listing them",
    )
    args = parser.parse_args()
    DB = database.Database()

    if args.count:
        assert not args.explain
        print(
            DB.count_records(
                first_party_only=args.first_party,
                end_chain_only=args.end_chain,
                no_dupplicates=args.no_dupplicates,
                rules_only=args.base_rules,
                hostnames_only=not (args.rules or args.base_rules),
            )
        )
    else:
        for domain in DB.list_records(
            first_party_only=args.first_party,
            end_chain_only=args.end_chain,
            no_dupplicates=args.no_dupplicates,
            rules_only=args.base_rules,
            hostnames_only=not (args.rules or args.base_rules),
            explain=args.explain,
        ):
            print(domain, file=args.output)

export_lists.sh

@ -1,98 +0,0 @@
#!/usr/bin/env bash
function log() {
echo -e "\033[33m$@\033[0m"
}
log "Calculating statistics…"
oldest="$(cat last_updates/*.txt | sort -n | head -1)"
oldest_date=$(date -Isec -d @$oldest)
gen_date=$(date -Isec)
gen_software=$(git describe --tags)
number_websites=$(wc -l < temp/all_websites.list)
number_subdomains=$(wc -l < temp/all_subdomains.list)
number_dns=$(grep 'NOERROR' temp/all_resolved.txt | wc -l)
for partyness in {first,multi}
do
if [ $partyness = "first" ]
then
partyness_flags="--first-party"
else
partyness_flags=""
fi
rules_input=$(./export.py --count --base-rules $partyness_flags)
rules_found=$(./export.py --count --rules $partyness_flags)
rules_found_nd=$(./export.py --count --rules --no-dupplicates $partyness_flags)
echo
echo "Statistics for ${partyness}-party trackers"
echo "Input rules: $rules_input"
echo "Subsequent rules: $rules_found"
echo "Subsequent rules (no dupplicate): $rules_found_nd"
echo "Output hostnames: $(./export.py --count $partyness_flags)"
echo "Output hostnames (no dupplicate): $(./export.py --count --no-dupplicates $partyness_flags)"
echo "Output hostnames (end-chain only): $(./export.py --count --end-chain $partyness_flags)"
echo "Output hostnames (no dupplicate, end-chain only): $(./export.py --count --no-dupplicates --end-chain $partyness_flags)"
for trackerness in {trackers,only-trackers}
do
if [ $trackerness = "trackers" ]
then
trackerness_flags=""
else
trackerness_flags="--no-dupplicates"
fi
file_list="dist/${partyness}party-${trackerness}.txt"
file_host="dist/${partyness}party-${trackerness}-hosts.txt"
log "Generating lists for variant ${partyness}-party ${trackerness}"
# Real export heeere
./export.py $partyness_flags $trackerness_flags > $file_list
# Sometimes a bit heavy to have the DB open and sort the output
# so this is done in two steps
sort -u $file_list -o $file_list
rules_output=$(./export.py --count $partyness_flags $trackerness_flags)
(
echo "# First-party trackers host list"
echo "# Variant: ${partyness}-party ${trackerness}"
echo "#"
echo "# About first-party trackers: https://hostfiles.frogeye.fr/#whats-a-first-party-tracker"
echo "#"
echo "# In case of false positives/negatives, or any other question,"
echo "# contact me the way you like: https://geoffrey.frogeye.fr"
echo "#"
echo "# Latest versions and variants: https://hostfiles.frogeye.fr/#list-variants"
echo "# Source code: https://git.frogeye.fr/geoffrey/eulaurarien"
echo "# License: https://git.frogeye.fr/geoffrey/eulaurarien/src/branch/master/LICENSE"
echo "# Acknowledgements: https://hostfiles.frogeye.fr/#acknowledgements"
echo "#"
echo "# Generation software: eulaurarien $gen_software"
echo "# List generation date: $gen_date"
echo "# Oldest record: $oldest_date"
echo "# Number of source websites: $number_websites"
echo "# Number of source subdomains: $number_subdomains"
echo "# Number of source DNS records: $number_dns"
echo "#"
echo "# Input rules: $rules_input"
echo "# Subsequent rules: $rules_found"
echo "# … no dupplicates: $rules_found_nd"
echo "# Output rules: $rules_output"
echo "#"
echo
sed 's|^|0.0.0.0 |' "$file_list"
) > "$file_host"
done
done
if [ -d explanations ]
then
filename="$(date -Isec).txt"
./export.py --explain > "explanations/$filename"
ln --force --symbolic "$filename" "explanations/latest.txt"
fi

feed_asn.py

@ -1,68 +0,0 @@
#!/usr/bin/env python3

import database
import argparse
import requests
import typing
import ipaddress
import logging
import time

IPNetwork = typing.Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

def get_ranges(asn: str) -> typing.Iterable[str]:
    req = requests.get(
        "https://stat.ripe.net/data/as-routing-consistency/data.json",
        params={"resource": asn},
    )
    data = req.json()
    for pref in data["data"]["prefixes"]:
        yield pref["prefix"]

def get_name(asn: str) -> str:
    req = requests.get(
        "https://stat.ripe.net/data/as-overview/data.json", params={"resource": asn}
    )
    data = req.json()
    return data["data"]["holder"]

if __name__ == "__main__":

    log = logging.getLogger("feed_asn")

    # Parsing arguments
    parser = argparse.ArgumentParser(
        description="Add the IP ranges associated to the AS in the database"
    )
    args = parser.parse_args()

    DB = database.Database()

    def add_ranges(
        path: database.Path,
        match: database.Match,
    ) -> None:
        assert isinstance(path, database.AsnPath)
        assert isinstance(match, database.AsnNode)
        asn_str = database.Database.unpack_asn(path)
        DB.enter_step("asn_get_name")
        name = get_name(asn_str)
        match.name = name
        DB.enter_step("asn_get_ranges")
        for prefix in get_ranges(asn_str):
            parsed_prefix: IPNetwork = ipaddress.ip_network(prefix)
            if parsed_prefix.version == 4:
                DB.set_ip4network(prefix, source=path, updated=int(time.time()))
                log.info("Added %s from %s (%s)", prefix, path, name)
            elif parsed_prefix.version == 6:
                log.warning("Unimplemented prefix version: %s", prefix)
            else:
                log.error("Unknown prefix version: %s", prefix)

    for _ in DB.exec_each_asn(add_ranges):
        pass

    DB.save()

feed_dns.py

@ -1,251 +1,53 @@
 #!/usr/bin/env python3
-import argparse
 import database
-import logging
+import argparse
 import sys
-import typing
+import ctypes
-import multiprocessing
+import json
-import time
-Record = typing.Tuple[typing.Callable, typing.Callable, int, str, str]
+ACCEL = ctypes.cdll.LoadLibrary('./libaccel.so')
+ACCEL_NAME_BUF = ctypes.create_string_buffer(b'Z'*255, 255)
+ACCEL_VALUE_BUF = ctypes.create_string_buffer(b'Z'*255, 255)
-# select, write
-FUNCTION_MAP: typing.Any = {
-    "a": (
-        database.Database.get_ip4,
-        database.Database.set_hostname,
-    ),
-    "cname": (
-        database.Database.get_domain,
-        database.Database.set_hostname,
-    ),
-    "ptr": (
-        database.Database.get_domain,
-        database.Database.set_ip4address,
-    ),
+FUNCTION_MAP = {
+    b'a': database.feed_a,
+    b'cname': database.feed_cname,
 }
+if __name__ == '__main__':
-class Writer(multiprocessing.Process):
-    def __init__(
-        self,
-        recs_queue: multiprocessing.Queue = None,
-        autosave_interval: int = 0,
-        ip4_cache: int = 0,
-    ):
-        if recs_queue:  # MP
-            super(Writer, self).__init__()
-            self.recs_queue = recs_queue
-        self.log = logging.getLogger("wr")
-        self.autosave_interval = autosave_interval
-        self.ip4_cache = ip4_cache
-        if not recs_queue:  # No MP
-            self.open_db()
-
-    def open_db(self) -> None:
-        self.db = database.Database()
-        self.db.log = logging.getLogger("wr")
-        self.db.fill_ip4cache(max_size=self.ip4_cache)
-
-    def exec_record(self, record: Record) -> None:
-        self.db.enter_step("exec_record")
-        select, write, updated, name, value = record
-        try:
-            for source in select(self.db, value):
-                write(self.db, name, updated, source=source)
-        except (ValueError, IndexError):
-            # ValueError: non-number in IP
-            # IndexError: IP too big
-            self.log.exception("Cannot execute: %s", record)
-
-    def end(self) -> None:
-        self.db.enter_step("end")
-        self.db.save()
-
-    def run(self) -> None:
-        self.open_db()
-        if self.autosave_interval > 0:
-            next_save = time.time() + self.autosave_interval
-        else:
-            next_save = 0
-
-        self.db.enter_step("block_wait")
-        block: typing.List[Record]
-        for block in iter(self.recs_queue.get, None):
-            assert block
-            record: Record
-            for record in block:
-                self.exec_record(record)
-            if next_save > 0 and time.time() > next_save:
-                self.log.info("Saving database...")
-                self.db.save()
-                self.log.info("Done!")
-                next_save = time.time() + self.autosave_interval
-            self.db.enter_step("block_wait")
-        self.end()
-
-class Parser:
-    def __init__(
-        self,
-        buf: typing.Any,
-        recs_queue: multiprocessing.Queue = None,
-        block_size: int = 0,
-        writer: Writer = None,
-    ):
-        assert bool(writer) ^ bool(block_size and recs_queue)
-        self.buf = buf
-        self.log = logging.getLogger("pr")
-        self.recs_queue = recs_queue
-        if writer:  # No MP
-            self.prof: database.Profiler = writer.db
-            self.register = writer.exec_record
-        else:  # MP
-            self.block: typing.List[Record] = list()
-            self.block_size = block_size
-            self.prof = database.Profiler()
-            self.prof.log = logging.getLogger("pr")
-            self.register = self.add_to_queue
-
-    def add_to_queue(self, record: Record) -> None:
-        self.prof.enter_step("register")
-        self.block.append(record)
-        if len(self.block) >= self.block_size:
-            self.prof.enter_step("put_block")
-            assert self.recs_queue
-            self.recs_queue.put(self.block)
-            self.block = list()
-
-    def run(self) -> None:
-        self.consume()
-        if self.recs_queue:
-            self.recs_queue.put(self.block)
-        self.prof.profile()
-
-    def consume(self) -> None:
-        raise NotImplementedError
-
-class MassDnsParser(Parser):
-    # massdns --output Snrql
-    # --retry REFUSED,SERVFAIL --resolvers nameservers-ipv4
-    TYPES = {
-        "A": (FUNCTION_MAP["a"][0], FUNCTION_MAP["a"][1], -1, None),
-        # 'AAAA': (FUNCTION_MAP['aaaa'][0], FUNCTION_MAP['aaaa'][1], -1, None),
-        "CNAME": (FUNCTION_MAP["cname"][0], FUNCTION_MAP["cname"][1], -1, -1),
-    }
-
-    def consume(self) -> None:
-        self.prof.enter_step("parse_massdns")
-        timestamp = 0
-        header = True
-        for line in self.buf:
-            line = line[:-1]
-            if not line:
-                header = True
-                continue
-
-            split = line.split(" ")
-            try:
-                if header:
-                    timestamp = int(split[1])
-                    header = False
-                else:
-                    select, write, name_offset, value_offset = MassDnsParser.TYPES[
-                        split[1]
-                    ]
-                    record = (
-                        select,
-                        write,
-                        timestamp,
-                        split[0][:name_offset].lower(),
-                        split[2][:value_offset].lower(),
-                    )
-                    self.register(record)
-                    self.prof.enter_step("parse_massdns")
-            except KeyError:
-                continue
-
-PARSERS = {
-    "massdns": MassDnsParser,
-}
-if __name__ == "__main__":
     # Parsing arguments
-    log = logging.getLogger("feed_dns")
-    args_parser = argparse.ArgumentParser(
-        description="Read DNS records and import "
-        "tracking-relevant data into the database"
-    )
-    args_parser.add_argument("parser", choices=PARSERS.keys(), help="Input format")
+    parser = argparse.ArgumentParser(
+        description="TODO")
+    parser.add_argument(
+        '-i', '--input', type=argparse.FileType('rb'), default=sys.stdin.buffer,
+        help="TODO")
+    args = parser.parse_args()
-    args_parser.add_argument(
-        "-i",
-        "--input",
-        type=argparse.FileType("r"),
-        default=sys.stdin,
-        help="Input file",
-    )
-    args_parser.add_argument(
-        "-b", "--block-size", type=int, default=1024, help="Performance tuning value"
-    )
-    args_parser.add_argument(
-        "-q", "--queue-size", type=int, default=128, help="Performance tuning value"
-    )
-    args_parser.add_argument(
-        "-a",
-        "--autosave-interval",
-        type=int,
-        default=900,
-        help="Interval to which the database will save in seconds. 0 to disable.",
-    )
-    args_parser.add_argument(
-        "-s",
-        "--single-process",
-        action="store_true",
-        help="Only use one process. Might be useful for single core computers.",
-    )
-    args_parser.add_argument(
-        "-4",
-        "--ip4-cache",
-        type=int,
-        default=0,
-        help="RAM cache for faster IPv4 lookup. "
-        "Maximum useful value: 512 MiB (536870912). "
-        "Warning: Depending on the rules, this might already "
-        "be a memory-heavy process, even without the cache.",
-    )
-    args = args_parser.parse_args()
-    parser_cls = PARSERS[args.parser]
-    if args.single_process:
-        writer = Writer(
-            autosave_interval=args.autosave_interval, ip4_cache=args.ip4_cache
-        )
-        parser = parser_cls(args.input, writer=writer)
-        parser.run()
-        writer.end()
-    else:
-        recs_queue: multiprocessing.Queue = multiprocessing.Queue(
-            maxsize=args.queue_size
-        )
-        writer = Writer(
-            recs_queue,
-            autosave_interval=args.autosave_interval,
-            ip4_cache=args.ip4_cache,
-        )
-        writer.start()
-        parser = parser_cls(
-            args.input, recs_queue=recs_queue, block_size=args.block_size
-        )
-        parser.run()
-        recs_queue.put(None)
-        writer.join()
+    database.open_db()
+    line = b'(none)'
+
+    def err(name: bytes, value: bytes) -> None:
+        print(f"Error with line: {line!r}")
+
+    FUNCTIONS = [err, database.feed_cname, database.feed_a]
+    try:
+        database.time_step('iowait')
+        for line in args.input:
+            database.time_step('feed_json_parse')
+            dtype = ACCEL.feed_dns_parse_json(
+                ctypes.c_char_p(line),
+                ACCEL_NAME_BUF,
+                ACCEL_VALUE_BUF
+            )
+            database.time_step('feed_switch')
+            FUNCTIONS[dtype](ACCEL_NAME_BUF.value, ACCEL_VALUE_BUF.value)
+            database.time_step('iowait')
+    except KeyboardInterrupt:
+        print("Interrupted.")
+        pass
+    database.close_db()
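For comparison, a pure-Python fallback for the C helper could lean on the json module. The massdns JSON field layout is assumed here and `parse_answers` is a hypothetical name; the per-line `json.loads` cost is exactly what the quote-counting C parser tries to avoid (and, per the commit message, the ctypes conversions ended up costing even more):

```
import json

def parse_answers(line: bytes):
    # Hypothetical pure-Python fallback for ACCEL.feed_dns_parse_json:
    # yields (rtype, name, value) per answer; the field layout of massdns's
    # JSON output mode is assumed, and AAAA records are skipped like above.
    data = json.loads(line)
    for answer in data.get('data', {}).get('answers', []):
        rtype = answer.get('type', '').lower()
        if rtype in ('a', 'cname'):
            yield rtype, answer['name'].rstrip('.'), answer['data'].rstrip('.')
```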

feed_rules.py

@ -3,59 +3,38 @@
 import database
 import argparse
 import sys
-import time
+import ipaddress
-import typing
-FUNCTION_MAP = {
-    "zone": database.Database.set_zone,
-    "hostname": database.Database.set_hostname,
-    "asn": database.Database.set_asn,
-    "ip4network": database.Database.set_ip4network,
-    "ip4address": database.Database.set_ip4address,
-}
-if __name__ == "__main__":
+if __name__ == '__main__':
     # Parsing arguments
-    parser = argparse.ArgumentParser(description="Import base rules to the database")
+    parser = argparse.ArgumentParser(
+        description="TODO")
     parser.add_argument(
-        "type", choices=FUNCTION_MAP.keys(), help="Type of rule input"
-    )
+        'type',
+        choices={'subdomains', 'ip4network'},
+        help="Type of rule input")
     parser.add_argument(
-        "-i",
-        "--input",
-        type=argparse.FileType("r"),
-        default=sys.stdin,
-        help="File with one rule per line",
-    )
+        '-i', '--input', type=argparse.FileType('r'), default=sys.stdin,
+        help="List of domains to block (with their subdomains)")
     parser.add_argument(
-        "-f",
-        "--first-party",
-        action="store_true",
-        help="The input only comes from verified first-party sources",
-    )
+        '-f', '--first-party', action='store_true',
+        help="The input only comes from verified first-party sources")
     args = parser.parse_args()
-    DB = database.Database()
+    database.open_db()
-    fun = FUNCTION_MAP[args.type]
-    source: database.RulePath
-    if args.first_party:
-        source = database.RuleFirstPath()
-    else:
-        source = database.RuleMultiPath()
-    for rule in args.input:
-        rule = rule.strip()
-        try:
-            fun(
-                DB,
-                rule,
-                source=source,
-                updated=int(time.time()),
-            )
-        except ValueError:
-            DB.log.error(f"Could not add rule: {rule}")
+    if args.type == 'subdomains':
+        for rule in args.input:
+            database.feed_rule_subdomains(
+                rule.strip(), first_party=args.first_party)
+    elif args.type == 'ip4network':
+        for rule in args.input:
+            network = ipaddress.ip_network(rule.strip())
+            database.feed_rule_ip4network(
+                network, first_party=args.first_party)
+    else:
+        assert False
-    DB.save()
+    database.close_db()
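One detail of the new `ip4network` branch worth noting: `ipaddress.ip_network` is strict by default, which is why the rules files write single addresses with an explicit `/32` (see the README). A quick illustration:

```
import ipaddress

print(ipaddress.ip_network('203.0.113.7/32'))   # a single address as a /32 network
print(ipaddress.ip_network('198.51.100.0/24'))  # a proper network
try:
    ipaddress.ip_network('203.0.113.7/24')      # host bits set: rejected in strict mode
except ValueError as error:
    print(error)  # "203.0.113.7/24 has host bits set"
```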

fetch_resources.sh

@ -13,22 +13,30 @@ function dl() {
 fi
 }
-log "Retrieving tests…"
-rm -f tests/*.cache.csv
-dl https://raw.githubusercontent.com/fukuda-lab/cname_cloaking/master/Subdomain_CNAME-cloaking-based-tracking.csv temp/fukuda.csv
-(echo "url,allow,deny,comment"; tail -n +2 temp/fukuda.csv | awk -F, '{ print "https://" $2 "/,," $3 "," $5 }') > tests/fukuda.cache.csv
 log "Retrieving rules…"
 rm -f rules*/*.cache.*
 dl https://easylist.to/easylist/easyprivacy.txt rules_adblock/easyprivacy.cache.txt
-dl https://filters.adtidy.org/extension/chromium/filters/3.txt rules_adblock/adguard.cache.txt
-log "Retrieving TLD list…"
-dl http://data.iana.org/TLD/tlds-alpha-by-domain.txt temp/all_tld.temp.list
-grep -v '^#' temp/all_tld.temp.list | awk '{print tolower($0)}' > temp/all_tld.list
+# From firebog.net Tracking & Telemetry Lists
+dl https://v.firebog.net/hosts/Prigent-Ads.txt rules/prigent-ads.cache.list
+# dl https://gitlab.com/quidsup/notrack-blocklists/raw/master/notrack-blocklist.txt rules/notrack-blocklist.cache.list
+# False positives: https://github.com/WaLLy3K/wally3k.github.io/issues/73 -> 69.media.tumblr.com chicdn.net
+dl https://raw.githubusercontent.com/StevenBlack/hosts/master/data/add.2o7Net/hosts rules_hosts/add2o7.cache.txt
+dl https://raw.githubusercontent.com/crazy-max/WindowsSpyBlocker/master/data/hosts/spy.txt rules_hosts/spy.cache.txt
+# dl https://raw.githubusercontent.com/Kees1958/WS3_annual_most_used_survey_blocklist/master/w3tech_hostfile.txt rules/w3tech.cache.list
+# False positives: agreements.apple.com -> edgekey.net
+# dl https://www.github.developerdan.com/hosts/lists/ads-and-tracking-extended.txt rules_hosts/ads-and-tracking-extended.cache.txt # Lots of false-positives
+# dl https://raw.githubusercontent.com/Perflyst/PiHoleBlocklist/master/android-tracking.txt rules_hosts/android-tracking.cache.txt
+# dl https://raw.githubusercontent.com/Perflyst/PiHoleBlocklist/master/SmartTV.txt rules_hosts/smart-tv.cache.txt
+# dl https://raw.githubusercontent.com/Perflyst/PiHoleBlocklist/master/AmazonFireTV.txt rules_hosts/amazon-fire-tv.cache.txt
 log "Retrieving nameservers…"
-dl https://public-dns.info/nameservers.txt nameservers/public-dns.cache.list
+rm -f nameservers
+touch nameservers
+[ -f nameservers.head ] && cat nameservers.head >> nameservers
+dl https://public-dns.info/nameservers.txt nameservers.temp
+sort -R nameservers.temp >> nameservers
+rm nameservers.temp
 log "Retrieving top subdomains…"
 dl http://s3-us-west-1.amazonaws.com/umbrella-static/top-1m.csv.zip top-1m.csv.zip
@ -38,7 +46,7 @@ rm top-1m.csv top-1m.csv.zip
 if [ -f subdomains/cisco-umbrella_popularity.cache.list ]
 then
     cp subdomains/cisco-umbrella_popularity.cache.list temp/cisco-umbrella_popularity.old.list
-    pv -f temp/cisco-umbrella_popularity.old.list temp/cisco-umbrella_popularity.fresh.list | sort -u > subdomains/cisco-umbrella_popularity.cache.list
+    pv temp/cisco-umbrella_popularity.old.list temp/cisco-umbrella_popularity.fresh.list | sort -u > subdomains/cisco-umbrella_popularity.cache.list
     rm temp/cisco-umbrella_popularity.old.list temp/cisco-umbrella_popularity.fresh.list
 else
     mv temp/cisco-umbrella_popularity.fresh.list subdomains/cisco-umbrella_popularity.cache.list

160
filter_subdomains.py Executable file

@ -0,0 +1,160 @@
#!/usr/bin/env python3
# pylint: disable=C0103
"""
From a list of subdomains, output only
the ones resolving to a first-party tracker.
"""
import argparse
import sys
import progressbar
import csv
import typing
import ipaddress
# DomainRule = typing.Union[bool, typing.Dict[str, 'DomainRule']]
DomainRule = typing.Union[bool, typing.Dict]
# IpRule = typing.Union[bool, typing.Dict[int, 'DomainRule']]
IpRule = typing.Union[bool, typing.Dict]
RULES_DICT: DomainRule = dict()
RULES_IP_DICT: IpRule = dict()
def get_bits(address: ipaddress.IPv4Address) -> typing.Iterator[int]:
for char in address.packed:
for i in range(7, -1, -1):
yield (char >> i) & 0b1
def subdomain_matching(subdomain: str) -> bool:
parts = subdomain.split('.')
parts.reverse()
dic = RULES_DICT
for part in parts:
if isinstance(dic, bool) or part not in dic:
break
dic = dic[part]
if isinstance(dic, bool):
return dic
return False
def ip_matching(ip_str: str) -> bool:
ip = ipaddress.ip_address(ip_str)
dic = RULES_IP_DICT
i = 0
for bit in get_bits(ip):
i += 1
if isinstance(dic, bool) or bit not in dic:
break
dic = dic[bit]
if isinstance(dic, bool):
return dic
return False
def get_matching(chain: typing.List[str], no_explicit: bool = False
) -> typing.Iterable[str]:
if len(chain) <= 1:
return
initial = chain[0]
cname_destinations = chain[1:-1]
a_destination = chain[-1]
initial_matching = subdomain_matching(initial)
if no_explicit and initial_matching:
return
cname_matching = any(map(subdomain_matching, cname_destinations))
if cname_matching or initial_matching or ip_matching(a_destination):
yield initial
def register_rule(subdomain: str) -> None:
# Make a tree with domain parts
parts = subdomain.split('.')
parts.reverse()
dic = RULES_DICT
last_part = len(parts) - 1
for p, part in enumerate(parts):
if isinstance(dic, bool):
return
if p == last_part:
dic[part] = True
else:
dic.setdefault(part, dict())
dic = dic[part]
def register_rule_ip(network: str) -> None:
net = ipaddress.ip_network(network)
ip = net.network_address
dic = RULES_IP_DICT
last_bit = net.prefixlen - 1
for b, bit in enumerate(get_bits(ip)):
if isinstance(dic, bool):
return
if b == last_bit:
dic[bit] = True
else:
dic.setdefault(bit, dict())
dic = dic[bit]
if __name__ == '__main__':
# Parsing arguments
parser = argparse.ArgumentParser(
description="Filter first-party trackers from a list of subdomains")
parser.add_argument(
'-i', '--input', type=argparse.FileType('r'), default=sys.stdin,
help="Input file with DNS chains")
parser.add_argument(
'-o', '--output', type=argparse.FileType('w'), default=sys.stdout,
help="Outptut file with one tracking subdomain per line")
parser.add_argument(
'-n', '--no-explicit', action='store_true',
help="Don't output domains already blocked with rules without CNAME")
parser.add_argument(
'-r', '--rules', type=argparse.FileType('r'),
help="List of domains domains to block (with their subdomains)")
parser.add_argument(
'-p', '--rules-ip', type=argparse.FileType('r'),
help="List of IPs ranges to block")
args = parser.parse_args()
# Progress bar
widgets = [
progressbar.Percentage(),
' ', progressbar.SimpleProgress(),
' ', progressbar.Bar(),
' ', progressbar.Timer(),
' ', progressbar.AdaptiveTransferSpeed(unit='req'),
' ', progressbar.AdaptiveETA(),
]
progress = progressbar.ProgressBar(widgets=widgets)
# Reading rules
if args.rules:
for rule in args.rules:
register_rule(rule.strip())
if args.rules_ip:
for rule in args.rules_ip:
register_rule_ip(rule.strip())
# Approximating line count
if args.input.seekable():
lines = 0
for line in args.input:
lines += 1
progress.max_value = lines
args.input.seek(0)
# Reading domains to filter
reader = csv.reader(args.input)
progress.start()
for chain in reader:
for match in get_matching(chain, no_explicit=args.no_explicit):
print(match, file=args.output)
progress.update(progress.value + 1)
progress.finish()

85
filter_subdomains.sh Executable file
View file

@ -0,0 +1,85 @@
#!/usr/bin/env bash
function log() {
echo -e "\033[33m$@\033[0m"
}
if [ ! -f temp/all_resolved.csv ]
then
echo "Run ./resolve_subdomains.sh first!"
exit 1
fi
# Gather all the rules for filtering
log "Compiling rules…"
cat rules_adblock/*.txt | grep -v '^!' | grep -v '^\[Adblock' | sort -u > temp/all_rules_adblock.txt
./adblock_to_domain_list.py --input temp/all_rules_adblock.txt --output rules/from_adblock.cache.list
cat rules_hosts/*.txt | grep -v '^#' | grep -v '^$' | cut -d ' ' -f2 > rules/from_hosts.cache.list
cat rules/*.list | grep -v '^#' | grep -v '^$' | sort -u > temp/all_rules_multi.list
cat rules/first-party.list | grep -v '^#' | grep -v '^$' | sort -u > temp/all_rules_first.list
cat rules_ip/*.txt | grep -v '^#' | grep -v '^$' | sort -u > temp/all_ip_rules_multi.txt
cat rules_ip/first-party.txt | grep -v '^#' | grep -v '^$' | sort -u > temp/all_ip_rules_first.txt
log "Filtering first-party tracking domains…"
./filter_subdomains.py --rules temp/all_rules_first.list --rules-ip temp/all_ip_rules_first.txt --input temp/all_resolved_sorted.csv --output temp/firstparty-trackers.list
sort -u temp/firstparty-trackers.list > dist/firstparty-trackers.txt
log "Filtering first-party curated tracking domains…"
./filter_subdomains.py --rules temp/all_rules_first.list --rules-ip temp/all_ip_rules_first.txt --input temp/all_resolved_sorted.csv --no-explicit --output temp/firstparty-only-trackers.list
sort -u temp/firstparty-only-trackers.list > dist/firstparty-only-trackers.txt
log "Filtering multi-party tracking domains…"
./filter_subdomains.py --rules temp/all_rules_multi.list --rules-ip temp/all_ip_rules_multi.txt --input temp/all_resolved_sorted.csv --output temp/multiparty-trackers.list
sort -u temp/multiparty-trackers.list > dist/multiparty-trackers.txt
log "Filtering multi-party curated tracking domains…"
./filter_subdomains.py --rules temp/all_rules_multi.list --rules-ip temp/all_ip_rules_multi.txt --input temp/all_resolved_sorted.csv --no-explicit --output temp/multiparty-only-trackers.list
sort -u temp/multiparty-only-trackers.list > dist/multiparty-only-trackers.txt
# Format the blocklist so it can be used as a hostlist
function generate_hosts {
basename="$1"
description="$2"
description2="$3"
(
echo "# First-party trackers host list"
echo "# $description"
echo "# $description2"
echo "#"
echo "# About first-party trackers: https://git.frogeye.fr/geoffrey/eulaurarien#whats-a-first-party-tracker"
echo "# Source code: https://git.frogeye.fr/geoffrey/eulaurarien"
echo "#"
echo "# In case of false positives/negatives, or any other question,"
echo "# contact me the way you like: https://geoffrey.frogeye.fr"
echo "#"
echo "# Latest version:"
echo "# - First-party trackers : https://hostfiles.frogeye.fr/firstparty-trackers-hosts.txt"
echo "# - … excluding redirected: https://hostfiles.frogeye.fr/firstparty-only-trackers-hosts.txt"
echo "# - First and third party : https://hostfiles.frogeye.fr/multiparty-trackers-hosts.txt"
echo "# - … excluding redirected: https://hostfiles.frogeye.fr/multiparty-only-trackers-hosts.txt"
echo "#"
echo "# Generation date: $(date -Isec)"
echo "# Generation software: eulaurarien $(git describe --tags)"
echo "# Number of source websites: $(wc -l temp/all_websites.list | cut -d' ' -f1)"
echo "# Number of source subdomains: $(wc -l temp/all_subdomains.list | cut -d' ' -f1)"
echo "#"
echo "# Number of known first-party trackers: $(wc -l temp/all_rules_first.list | cut -d' ' -f1)"
echo "# Number of first-party subdomains: $(wc -l dist/firstparty-trackers.txt | cut -d' ' -f1)"
echo "# … excluding redirected: $(wc -l dist/firstparty-only-trackers.txt | cut -d' ' -f1)"
echo "#"
echo "# Number of known multi-party trackers: $(wc -l temp/all_rules_multi.list | cut -d' ' -f1)"
echo "# Number of multi-party subdomains: $(wc -l dist/multiparty-trackers.txt | cut -d' ' -f1)"
echo "# … excluding redirected: $(wc -l dist/multiparty-only-trackers.txt | cut -d' ' -f1)"
echo
cat "dist/$basename.txt" | while read host;
do
echo "0.0.0.0 $host"
done
) > "dist/$basename-hosts.txt"
}
generate_hosts "firstparty-trackers" "Generated from a curated list of first-party trackers" ""
generate_hosts "firstparty-only-trackers" "Generated from a curated list of first-party trackers" "Only contain the first chain of redirection."
generate_hosts "multiparty-trackers" "Generated from known third-party trackers." "Also contains trackers used as third-party."
generate_hosts "multiparty-only-trackers" "Generated from known third-party trackers." "Do not contain trackers used in third-party. Use in combination with third-party lists."

View file

@ -1,25 +0,0 @@
#!/usr/bin/env python3
import markdown2
extras = ["header-ids"]
with open("dist/README.md", "r") as fdesc:
body = markdown2.markdown(fdesc.read(), extras=extras)
output = f"""<!DOCTYPE html>
<html lang="en">
<head>
<title>Geoffrey Frogeye's block list of first-party trackers</title>
<meta charset="utf-8">
<meta name="author" content="Geoffrey 'Frogeye' Preud'homme" />
<link rel="stylesheet" type="text/css" href="markdown7.min.css">
</head>
<body>
{body}
</body>
</html>
"""
with open("dist/index.html", "w") as fdesc:
fdesc.write(output)

View file

@ -1 +0,0 @@
*.txt

View file

@ -1,2 +0,0 @@
*.custom.list
*.cache.list

View file

@ -1,24 +0,0 @@
8.8.8.8
8.8.4.4
2001:4860:4860:0:0:0:0:8888
2001:4860:4860:0:0:0:0:8844
208.67.222.222
208.67.220.220
2620:119:35::35
2620:119:53::53
4.2.2.1
4.2.2.2
8.26.56.26
8.20.247.20
84.200.69.80
84.200.70.40
2001:1608:10:25:0:0:1c04:b12f
2001:1608:10:25:0:0:9249:d69b
9.9.9.10
149.112.112.10
2620:fe::10
2620:fe::fe:10
1.1.1.1
1.0.0.1
2606:4700:4700::1111
2606:4700:4700::1001

View file

@ -4,17 +4,19 @@ function log() {
echo -e "\033[33m$@\033[0m"
}
-log "Importing rules…"
-date +%s > "last_updates/rules.txt"
-cat rules_adblock/*.txt | grep -v '^!' | grep -v '^\[Adblock' | ./adblock_to_domain_list.py | ./feed_rules.py zone
-cat rules_hosts/*.txt | grep -v '^#' | grep -v '^$' | cut -d ' ' -f2 | ./feed_rules.py zone
-cat rules/*.list | grep -v '^#' | grep -v '^$' | ./feed_rules.py zone
+log "Preparing database…"
+./database.py --refresh
+log "Compiling rules…"
+cat rules_adblock/*.txt | grep -v '^!' | grep -v '^\[Adblock' | ./adblock_to_domain_list.py | ./feed_rules.py subdomains
+cat rules_hosts/*.txt | grep -v '^#' | grep -v '^$' | cut -d ' ' -f2 | ./feed_rules.py subdomains
+cat rules/*.list | grep -v '^#' | grep -v '^$' | ./feed_rules.py subdomains
cat rules_ip/*.txt | grep -v '^#' | grep -v '^$' | ./feed_rules.py ip4network
-cat rules_asn/*.txt | grep -v '^#' | grep -v '^$' | ./feed_rules.py asn
-cat rules/first-party.list | grep -v '^#' | grep -v '^$' | ./feed_rules.py zone --first-party
+# NOTE: Ensure first-party sources are last
+cat rules/first-party.list | grep -v '^#' | grep -v '^$' | ./feed_rules.py subdomains --first-party
cat rules_ip/first-party.txt | grep -v '^#' | grep -v '^$' | ./feed_rules.py ip4network --first-party
-cat rules_asn/first-party.txt | grep -v '^#' | grep -v '^$' | ./feed_rules.py asn --first-party
-./feed_asn.py
-# log "Reading A records…"
-# pv a.json.gz | gunzip | ./feed_dns.py
-# log "Reading CNAME records…"
-# pv cname.json.gz | gunzip | ./feed_dns.py

View file

@ -1,9 +0,0 @@
#!/usr/bin/env bash
function log() {
echo -e "\033[33m$@\033[0m"
}
oldest="$(cat last_updates/*.txt | sort -n | head -1)"
log "Pruning every record before ${oldest}"
./db.py --prune --prune-before "$oldest"

21
regexes.py Normal file
View file

@ -0,0 +1,21 @@
#!/usr/bin/env python3
"""
List of regex matching first-party trackers.
"""
# Syntax: https://docs.python.org/3/library/re.html#regular-expression-syntax
REGEXES = [
r'^.+\.eulerian\.net\.$', # Eulerian
r'^.+\.criteo\.com\.$', # Criteo
r'^.+\.dnsdelegation\.io\.$', # Criteo
r'^.+\.keyade\.com\.$', # Keyade
r'^.+\.omtrdc\.net\.$', # Adobe Experience Cloud
r'^.+\.bp01\.net\.$', # NP6
r'^.+\.ati-host\.net\.$', # Xiti (AT Internet)
r'^.+\.at-o\.net\.$', # Xiti (AT Internet)
r'^.+\.edgekey\.net\.$', # Edgekey (Akamai)
r'^.+\.akamaiedge\.net\.$', # Edgekey (Akamai)
r'^.+\.storetail\.io\.$', # Storetail (Criteo)
]
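# A minimal sketch of how these patterns can be consumed elsewhere
# (illustrative only, not part of this file; the trailing dot means
# they expect fully-qualified names such as 'sub.example.eulerian.net.'):
#   import re, regexes
#   matchers = [re.compile(regex) for regex in regexes.REGEXES]
#   is_tracker = any(matcher.match(name) for matcher in matchers)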

View file

@ -1,4 +0,0 @@
coloredlogs>=10
markdown2>=2.4<3
numpy>=1.21<2
python-abp>=0.2<0.3

284
resolve_subdomains.py Executable file
View file

@ -0,0 +1,284 @@
#!/usr/bin/env python3
"""
From a list of subdomains, resolve them and output
their DNS resolution chains (CNAMEs and final A record).
"""
import argparse
import logging
import os
import queue
import sys
import threading
import typing
import csv
import coloredlogs
import dns.exception
import dns.resolver
import progressbar
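# Tuning knobs: per-query lifetime in seconds, number of resolver
# threads, and number of attempts per subdomain before giving up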
DNS_TIMEOUT = 5.0
NUMBER_THREADS = 512
NUMBER_TRIES = 5
# TODO Not all the domains get processed,
# so 4-5 subdomains are left unresolved at the end
glob = None
class Worker(threading.Thread):
"""
Worker process for a DNS resolver.
Will resolve DNS to match first-party subdomains.
"""
def change_nameserver(self) -> None:
"""
Assign this worker another nameserver from the queue.
"""
server = None
while server is None:
try:
server = self.orchestrator.nameservers_queue.get(block=False)
except queue.Empty:
self.orchestrator.refill_nameservers_queue()
self.log.info("Using nameserver: %s", server)
self.resolver.nameservers = [server]
def __init__(self,
orchestrator: 'Orchestrator',
index: int = 0):
super(Worker, self).__init__()
self.log = logging.getLogger(f'worker{index:03d}')
self.orchestrator = orchestrator
self.resolver = dns.resolver.Resolver()
self.change_nameserver()
def resolve_subdomain(self, subdomain: str) -> typing.Optional[
typing.List[
str
]
]:
"""
Returns the resolution chain of the subdomain to an A record,
including any intermediary CNAME.
The last element is an IP address.
Returns None if the nameserver was unable to satisfy the request.
Returns [] if the request points to nothing.
"""
self.log.debug("Querying %s", subdomain)
try:
query = self.resolver.query(subdomain, 'A', lifetime=DNS_TIMEOUT)
except dns.resolver.NXDOMAIN:
return []
except dns.resolver.NoAnswer:
return []
except dns.resolver.YXDOMAIN:
self.log.warning("Query name too long for %s", subdomain)
return None
except dns.resolver.NoNameservers:
# NOTE Most of the time this error message means that the domain
# does not exist, but sometimes it means that the server
# itself is broken. So we count on the retry logic.
self.log.warning("All nameservers broken for %s", subdomain)
return None
except dns.exception.Timeout:
# NOTE Same as above
self.log.warning("Timeout for %s", subdomain)
return None
except dns.name.EmptyLabel:
self.log.warning("Empty label for %s", subdomain)
return None
resolved = list()
last = len(query.response.answer) - 1
for a, answer in enumerate(query.response.answer):
if answer.rdtype == dns.rdatatype.CNAME:
assert a < last
resolved.append(answer.items[0].to_text()[:-1])
elif answer.rdtype == dns.rdatatype.A:
assert a == last
resolved.append(answer.items[0].address)
else:
assert False
return resolved
def run(self) -> None:
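"""
Consume subdomains from the queue until the None sentinel,
resolve each one (retrying with another nameserver on error),
and push the resolution chains to the results queue.
"""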
self.log.info("Started")
subdomain: str
for subdomain in iter(self.orchestrator.subdomains_queue.get, None):
for _ in range(NUMBER_TRIES):
resolved = self.resolve_subdomain(subdomain)
# Retry with another nameserver if error
if resolved is None:
self.change_nameserver()
else:
break
# If it wasn't found after multiple tries
if resolved is None:
self.log.error("Gave up on %s", subdomain)
resolved = []
resolved.insert(0, subdomain)
assert isinstance(resolved, list)
self.orchestrator.results_queue.put(resolved)
self.orchestrator.results_queue.put(None)
self.log.info("Stopped")
class Orchestrator():
"""
Orchestrator of the different Worker threads.
"""
def refill_nameservers_queue(self) -> None:
"""
Refill the nameservers queue with the given nameservers.
Done every time the queue is empty, which effectively
makes it loop over the list indefinitely.
"""
# Might be in a race condition but that's probably fine
for nameserver in self.nameservers:
self.nameservers_queue.put(nameserver)
self.log.info("Refilled nameserver queue")
def __init__(self, subdomains: typing.Iterable[str],
nameservers: typing.Optional[typing.List[str]] = None,
):
self.log = logging.getLogger('orchestrator')
self.subdomains = subdomains
# Use the internal resolver by default
self.nameservers = nameservers or dns.resolver.Resolver().nameservers
self.subdomains_queue: queue.Queue = queue.Queue(
maxsize=NUMBER_THREADS)
self.results_queue: queue.Queue = queue.Queue()
self.nameservers_queue: queue.Queue = queue.Queue()
self.refill_nameservers_queue()
def fill_subdomain_queue(self) -> None:
"""
Read the subdomains in input and put them into the queue.
Done in a thread so we can both:
- yield the results as they come
- not store all the subdomains at once
"""
self.log.info("Started reading subdomains")
# Send data to workers
for subdomain in self.subdomains:
self.subdomains_queue.put(subdomain)
self.log.info("Finished reading subdomains")
# Send sentinel to each worker
# sentinel = None ~= EOF
for _ in range(NUMBER_THREADS):
self.subdomains_queue.put(None)
def run(self) -> typing.Iterable[typing.List[str]]:
"""
Yield the results.
"""
# Create workers
self.log.info("Creating workers")
for i in range(NUMBER_THREADS):
Worker(self, i).start()
fill_thread = threading.Thread(target=self.fill_subdomain_queue)
fill_thread.start()
# Wait for one sentinel per worker
# In the meantime output results
for _ in range(NUMBER_THREADS):
result: typing.List[str]
for result in iter(self.results_queue.get, None):
yield result
self.log.info("Waiting for reader thread")
fill_thread.join()
self.log.info("Done!")
def main() -> None:
"""
Main function when used directly.
Read the subdomains provided and output each of them
with the last CNAME resolved and the IP address it resolves to.
Takes as an input a filename (or nothing, for stdin),
and as an output a filename (or nothing, for stdout).
The input must be one subdomain per line; the output is a
comma-separated file with the source, CNAME and A columns.
Use the file `nameservers` as the list of nameservers
to use, or else it will use the system defaults.
Also shows a nice progressbar.
"""
# Initialization
coloredlogs.install(
level='DEBUG',
fmt='%(asctime)s %(name)s %(levelname)s %(message)s'
)
# Parsing arguments
parser = argparse.ArgumentParser(
description="Massively resolves subdomains and store them in a file.")
parser.add_argument(
'-i', '--input', type=argparse.FileType('r'), default=sys.stdin,
help="Input file with one subdomain per line")
parser.add_argument(
'-o', '--output', type=argparse.FileType('w'), default=sys.stdout,
help="Outptut file with DNS chains")
# parser.add_argument(
# '-n', '--nameserver', type=argparse.FileType('r'),
# default='nameservers', help="File with one nameserver per line")
# parser.add_argument(
# '-j', '--workers', type=int, default=512,
# help="Number of threads to use")
args = parser.parse_args()
# Progress bar
widgets = [
progressbar.Percentage(),
' ', progressbar.SimpleProgress(),
' ', progressbar.Bar(),
' ', progressbar.Timer(),
' ', progressbar.AdaptiveTransferSpeed(unit='req'),
' ', progressbar.AdaptiveETA(),
]
progress = progressbar.ProgressBar(widgets=widgets)
if args.input.seekable():
progress.max_value = len(args.input.readlines())
args.input.seek(0)
# Cleaning input
iterator = iter(args.input)
iterator = map(str.strip, iterator)
iterator = filter(None, iterator)
# Reading nameservers
servers: typing.List[str] = list()
if os.path.isfile('nameservers'):
servers = open('nameservers').readlines()
servers = list(filter(None, map(str.strip, servers)))
writer = csv.writer(args.output)
progress.start()
global glob
glob = Orchestrator(iterator, servers)
for resolved in glob.run():
progress.update(progress.value + 1)
writer.writerow(resolved)
progress.finish()
if __name__ == '__main__':
main()

View file

@ -1,24 +1,14 @@
#!/usr/bin/env bash
-source .env.default
-source .env
function log() {
echo -e "\033[33m$@\033[0m"
}
-log "Compiling nameservers…"
-pv -f nameservers/*.list | ./validate_list.py --ip4 | sort -u > temp/all_nameservers_ip4.list
-log "Compiling subdomains…"
+# Resolve the CNAME chain of all the known subdomains for later analysis
+log "Compiling subdomain lists..."
+pv subdomains/*.list | sort -u > temp/all_subdomains.list
# Sort by last character to utilize the DNS server caching mechanism
-# (not as efficient with massdns but it's almost free so why not)
-pv -f subdomains/*.list | ./validate_list.py --domain | rev | sort -u | rev > temp/all_subdomains.list
-log "Resolving subdomain…"
-date +%s > "last_updates/massdns.txt"
-"$MASSDNS_BINARY" --output Snrql --hashmap-size "$MASSDNS_HASHMAP_SIZE" --resolvers temp/all_nameservers_ip4.list --outfile temp/all_resolved.txt temp/all_subdomains.list
-log "Importing into database…"
-[ $SINGLE_PROCESS -eq 1 ] && EXTRA_ARGS="--single-process"
-pv -f temp/all_resolved.txt | ./feed_dns.py massdns --ip4-cache "$CACHE_SIZE" $EXTRA_ARGS
+pv temp/all_subdomains.list | rev | sort | rev > temp/all_subdomains_reversort.list
+./resolve_subdomains.py --input temp/all_subdomains_reversort.list --output temp/all_resolved.csv
+sort -u temp/all_resolved.csv > temp/all_resolved_sorted.csv

View file

@ -12,80 +12,13 @@ storetail.io
# Keyade
keyade.com
# Adobe Experience Cloud
-# https://experienceleague.adobe.com/docs/analytics/implementation/vars/config-vars/trackingserversecure.html?lang=en#ssl-tracking-server-in-adobe-experience-platform-launch
omtrdc.net
2o7.net
-data.adobedc.net
-sc.adobedc.net
+# ThreatMetrix
+online-metrix.net
# Webtrekk
wt-eu02.net
-webtrekk.net
# Otto Group
oghub.io
-# Intent Media
+# ???
partner.intentmedia.net
-# Wizaly
-wizaly.com
-# Commanders Act
-tagcommander.com
-# Ingenious Technologies
-affex.org
-# TraceDock
-a351fec2c318c11ea9b9b0a0ae18fb0b-1529426863.eu-central-1.elb.amazonaws.com
-a5e652663674a11e997c60ac8a4ec150-1684524385.eu-central-1.elb.amazonaws.com
-a88045584548111e997c60ac8a4ec150-1610510072.eu-central-1.elb.amazonaws.com
-afc4d9aa2a91d11e997c60ac8a4ec150-2082092489.eu-central-1.elb.amazonaws.com
-# A8
-trck.a8.net
-# AD EBiS
-# https://prtimes.jp/main/html/rd/p/000000215.000009812.html
-ebis.ne.jp
-# GENIEE
-genieesspv.jp
-# SP-Prod
-sp-prod.net
-# Act-On Software
-actonsoftware.com
-actonservice.com
-# eum-appdynamics.com
-eum-appdynamics.com
-# Extole
-extole.io
-extole.com
-# Eloqua
-hs.eloqua.com
-# segment.com
-xid.segment.com
-# exponea.com
-exponea.com
-# adclear.net
-adclear.net
-# contentsfeed.com
-contentsfeed.com
-# postaffiliatepro.com
-postaffiliatepro.com
-# Sugar Market (Salesfusion)
-msgapp.com
-# Exactag
-exactag.com
-# GMO Internet Group
-ad-cloud.jp
-# Pardot
-pardot.com
-# Fathom
-# https://usefathom.com/docs/settings/custom-domains
-starman.fathomdns.com
-# Lead Forensics
-# https://www.reddit.com/r/pihole/comments/g7qv3e/leadforensics_tracking_domains_blacklist/
-# No real-world data but the website doesn't hide what it does
-ghochv3eng.trafficmanager.net
-# Branch.io
-thirdparty.bnc.lt
-# Plausible.io
-custom.plausible.io
-# DataUnlocker
-# Bit different as it is a proxy to non first-party trackers scripts
-# but it fits I guess.
-smartproxy.dataunlocker.com
-# SAS
-ci360.sas.com

View file

@ -1,2 +0,0 @@
*.custom.txt
*.cache.txt

View file

@ -1,10 +0,0 @@
# Eulerian
AS50234
# Criteo
AS44788
AS19750
AS55569
# Webtrekk
AS60164
# Act-On Software
AS393648

51
rules_ip/first-party.txt Normal file
View file

@ -0,0 +1,51 @@
# Eulerian (AS50234 EULERIAN TECHNOLOGIES S.A.S.)
109.232.192.0/21
# Criteo (AS44788 Criteo SA)
91.199.242.0/24
91.212.98.0/24
178.250.0.0/21
178.250.0.0/24
178.250.1.0/24
178.250.2.0/24
178.250.3.0/24
178.250.4.0/24
178.250.6.0/24
185.235.84.0/24
# Criteo (AS19750 Criteo Corp.)
74.119.116.0/22
74.119.117.0/24
74.119.118.0/24
74.119.119.0/24
91.199.242.0/24
185.235.85.0/24
199.204.168.0/22
199.204.168.0/24
199.204.169.0/24
199.204.170.0/24
199.204.171.0/24
178.250.0.0/21
91.212.98.0/24
91.199.242.0/24
185.235.84.0/24
# Criteo (AS55569 Criteo APAC)
91.199.242.0/24
116.213.20.0/22
116.213.20.0/24
116.213.21.0/24
182.161.72.0/22
182.161.72.0/24
182.161.73.0/24
185.235.86.0/24
185.235.87.0/24
# ThreatMetrix (AS30286 ThreatMetrix Inc.)
69.84.176.0/24
173.254.179.0/24
185.32.240.0/23
185.32.242.0/23
192.225.156.0/22
199.101.156.0/23
199.101.158.0/23
# Webtrekk (AS60164 Webtrekk GmbH)
185.54.148.0/22
185.54.150.0/24
185.54.151.0/24

View file

@ -1,75 +0,0 @@
#!/usr/bin/env python3
import database
import os
import logging
import csv
TESTS_DIR = "tests"
if __name__ == "__main__":
DB = database.Database()
log = logging.getLogger("tests")
for filename in os.listdir(TESTS_DIR):
if not filename.lower().endswith(".csv"):
continue
log.info("")
log.info("Running tests from %s", filename)
path = os.path.join(TESTS_DIR, filename)
with open(path, "rt") as fdesc:
count_ent = 0
count_all = 0
count_den = 0
pass_ent = 0
pass_all = 0
pass_den = 0
reader = csv.DictReader(fdesc)
for test in reader:
log.debug("Testing %s (%s)", test["url"], test["comment"])
count_ent += 1
passed = True
for allow in test["allow"].split(":"):
if not allow:
continue
count_all += 1
if any(DB.get_domain(allow)):
log.error("False positive: %s", allow)
passed = False
else:
pass_all += 1
for deny in test["deny"].split(":"):
if not deny:
continue
count_den += 1
if not any(DB.get_domain(deny)):
log.error("False negative: %s", deny)
passed = False
else:
pass_den += 1
if passed:
pass_ent += 1
perc_ent = (100 * pass_ent / count_ent) if count_ent else 100
perc_all = (100 * pass_all / count_all) if count_all else 100
perc_den = (100 * pass_den / count_den) if count_den else 100
log.info(
(
"%s: Entries %d/%d (%.2f%%)"
" | Allow %d/%d (%.2f%%)"
"| Deny %d/%d (%.2f%%)"
),
filename,
pass_ent,
count_ent,
perc_ent,
pass_all,
count_all,
perc_all,
pass_den,
count_den,
perc_den,
)

1
tests/.gitignore vendored
View file

@ -1 +0,0 @@
*.cache.csv

View file

@ -1,6 +1,6 @@
-url,allow,deny,comment
+url,white,black,comment
https://support.apple.com,support.apple.com,,EdgeKey / AkamaiEdge
https://www.pinterest.fr/,i.pinimg.com,,Cedexis
+https://www.pinterest.fr/,i.pinimg.com,,Cedexis
https://www.tumblr.com/,66.media.tumblr.com,,ChiCDN
https://www.skype.com/fr/,www.skype.com,,TrafficManager
-https://www.mitsubishicars.com/,www.mitsubishicars.com,,Tracking domain as reverse DNS

View file

@ -1,28 +1,7 @@
-url,allow,deny,comment
+url,white,black,comment
https://www.red-by-sfr.fr/,static.s-sfr.fr,nrg.red-by-sfr.fr,Eulerian
https://www.cbc.ca/,,smetrics.cbc.ca,2o7 | Ominuture | Adobe Experience Cloud
+https://www.discover.com/,,content.discover.com,ThreatMetrix
https://www.mytoys.de/,,web.mytoys.de,Webtrekk
https://www.baur.de/,,tp.baur.de,Otto Group
https://www.liligo.com/,,compare.liligo.com,???
-https://www.boulanger.com/,,tag.boulanger.fr,TagCommander
-https://www.airfrance.fr/FR/,,tk.airfrance.fr,Wizaly
-https://www.vsgamers.es/,,marketing.net.vsgamers.es,Affex
-https://www.vacansoleil.fr/,,tdep.vacansoleil.fr,TraceDock
-https://www.ozmall.co.jp/,,js.enhance.co.jp,GENIEE
-https://www.thetimes.co.uk/,,cmp.thetimes.co.uk,SP-Prod
-https://agilent.com/,,seahorseinfo.agilent.com,Act-On Software
-https://halifax.co.uk/,,cem.halifax.co.uk,eum-appdynamics.com
-https://www.reallygoodstuff.com/,,refer.reallygoodstuff.com,Extole
-https://unity.com/,,eloqua-trackings.unity.com,Eloqua
-https://www.notino.gr/,,api.campaigns.notino.com,Exponea
-https://www.mytoys.de/,,0815.mytoys.de.adclear.net
-https://www.imbc.com/,,ads.imbc.com.contentsfeed.com
-https://www.cbdbiocare.com/,,affiliate.cbdbiocare.com,postaffiliatepro.com
-https://www.seatadvisor.com/,,marketing.seatadvisor.com,Sugar Market (Salesfusion)
-https://www.tchibo.de/,,tagm.tchibo.de,Exactag
-https://www.bouygues-immobilier.com/,,go.bouygues-immobilier.fr,Pardot
-https://caddyserver.com/,,mule.caddysever.com,Fathom
-Reddit.com mail notifications,,click.redditmail.com,Branch.io
-https://www.phpliveregex.com/,,yolo.phpliveregex.xom,Plausible.io
-https://www.earthclassmail.com/,,1avhg3kanx9.www.earthclassmail.com,DataUnlocker
-https://paulfredrick.com/,,execution-ci360.paulfredrick.com,SAS

View file

@ -1,35 +0,0 @@
#!/usr/bin/env python3
# pylint: disable=C0103
"""
Filter out invalid domain names
"""
import database
import argparse
import sys
if __name__ == '__main__':
# Parsing arguments
parser = argparse.ArgumentParser(
description="Filter out invalid domain name/ip addresses from a list.")
parser.add_argument(
'-i', '--input', type=argparse.FileType('r'), default=sys.stdin,
help="Input file, one element per line")
parser.add_argument(
'-o', '--output', type=argparse.FileType('w'), default=sys.stdout,
help="Output file, one element per line")
parser.add_argument(
'-d', '--domain', action='store_true',
help="Can be domain name")
parser.add_argument(
'-4', '--ip4', action='store_true',
help="Can be IP4")
args = parser.parse_args()
for line in args.input:
line = line[:-1].lower()
if (args.domain and database.Database.validate_domain(line)) or \
(args.ip4 and database.Database.validate_ip4address(line)):
print(line, file=args.output)