eulaurarien

Author	SHA1	Message	Date
Geoffrey Frogeye	5023b85d7c	Added intermediate representation for DNS datasets. It's just CSV. The DNS entries in the datasets are not ordered consistently, so we need to parse them completely. It seems that converting to an IR before sending data to ./feed_dns.py through a pipe is faster than decoding the JSON in ./feed_dns.py. This will also reduce the storage of the resolved subdomains by about 15% (compressed).	2019-12-13 21:59:35 +01:00
Geoffrey Frogeye	ab7ef609dd	Workflow: Various optimisations and fixes. I forgot to close this one earlier, so: Closes #7	2019-12-13 18:08:22 +01:00
Geoffrey Frogeye	8d94b80fd0	Integrated DNS resolving into workflow. Since the bigger datasets are only updated once a month, this might help for quick updates.	2019-12-13 13:38:23 +01:00
Geoffrey Frogeye	0159c6037c	Improved DNS resolving performance. Also various fixes. Also some debug stuff, make sure to remove that later.	2019-12-03 15:35:21 +01:00
Geoffrey Frogeye	c23004fbff	Separated DNS resolution from filtering. This effectively removes the parallelism of filtering, which lengthens the processing time (5 -> 8 hours), but this allows me to toy around with the performance of this step, which I aim to improve drastically.	2019-12-02 19:03:08 +01:00
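
The intermediate representation mentioned in commit 5023b85d7c (JSON dataset entries converted to CSV before being piped to ./feed_dns.py) could look roughly like the sketch below. This is not the repository's actual code: the function name `json_to_csv`, the one-JSON-object-per-line input layout, and the `type`, `name`, and `value` field names are all assumptions made for illustration.

```python
import csv
import io
import json
from typing import Iterable, TextIO


def json_to_csv(lines: Iterable[str], out: TextIO) -> int:
    """Convert JSON-lines DNS records to CSV rows; return the number of rows written.

    Hypothetical helper: assumes each non-empty input line is one JSON object.
    """
    writer = csv.writer(out, lineterminator="\n")
    written = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumed field names; a real dataset may use different keys.
        writer.writerow((record["type"], record["name"], record["value"]))
        written += 1
    return written


if __name__ == "__main__":
    sample = [
        '{"type": "a", "name": "example.com", "value": "192.0.2.1"}',
        '{"type": "cname", "name": "www.example.com", "value": "example.com"}',
    ]
    buffer = io.StringIO()
    json_to_csv(sample, buffer)
    print(buffer.getvalue(), end="")
```

In a pipeline, the same function could read from `sys.stdin` and write to `sys.stdout`, so that the consumer only has to split CSV rows instead of decoding JSON for every entry.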