83 Commits (newworkflow_packbefore)

Author SHA1 Message Date
Geoffrey Frogeye dcf39c9582
Put packing in parsing thread
Why did I think this would be a good idea?
- values don't need to be packed most of the time, but we don't know that
early
- packed domain (it's one most of the time) is way larger than its
unpacked counterpart
2019-12-16 10:38:37 +01:00
Geoffrey Frogeye 03a4042238
Added level
Also fixed IP logic because this was real messed up
2019-12-16 09:31:29 +01:00
Geoffrey Frogeye 3197fa1663
Remove list usage for IpTreeNode 2019-12-16 06:54:18 +01:00
Geoffrey Frogeye a0e68f0848
Reworked match and node system
For level, and first_party later
Next: add get_match to retrieve level of source and have correct levels

... am I going somewhere with all this?
2019-12-15 23:13:25 +01:00
Geoffrey Frogeye aec8d3f8de
Reworked how paths work
Get those tuples out of my eyes
2019-12-15 22:21:05 +01:00
Geoffrey Frogeye 7af2074c7a
Small optimisation of feed_switch 2019-12-15 17:12:44 +01:00
Geoffrey Frogeye 45325782d2
Multi-processed parser 2019-12-15 17:05:41 +01:00
Geoffrey Frogeye ce52897d30
Smol fixes 2019-12-15 16:48:17 +01:00
Geoffrey Frogeye 954b33b2a6
Slightly better Rapid7 parser 2019-12-15 16:38:01 +01:00
Geoffrey Frogeye d976752797
Store Ip4Path as int instead of List[int] 2019-12-15 16:26:18 +01:00
Geoffrey Frogeye 4d966371b2
Workflow: SQL -> Tree
Welp. All that for this.
2019-12-15 15:56:26 +01:00
Geoffrey Frogeye 040ce4c14e
Typo in source 2019-12-15 01:52:45 +01:00
Geoffrey Frogeye b50c01f740 Merge branch 'master' into newworkflow 2019-12-15 01:30:03 +01:00
Geoffrey Frogeye ddceed3d25
Workflow: Can now import DnsMass output
Well, in a specific format but DnsMass nonetheless
2019-12-15 00:28:08 +01:00
Geoffrey Frogeye 189deeb559
Workflow: Multiprocess
Still trying.
It's better than multithread though.

Merge branch 'newworkflow' into newworkflow_threaded
2019-12-14 17:27:46 +01:00
Geoffrey Frogeye d7c239a6f6 Workflow: Some modifications 2019-12-14 16:04:19 +01:00
Geoffrey Frogeye 5023b85d7c
Added intermediate representation for DNS datasets
It's just CSV.
The DNS records from the datasets are not ordered consistently,
so we need to parse them completely.
It seems that converting to an IR before sending data to ./feed_dns.py
through a pipe is faster than decoding the JSON in ./feed_dns.py.
This will also reduce the storage of the resolved subdomains by
about 15% (compressed).
2019-12-13 21:59:35 +01:00
Geoffrey Frogeye 269b8278b5
Workflow: Fixed rules counts 2019-12-13 18:36:08 +01:00
Geoffrey Frogeye ab7ef609dd
Workflow: Various optimisations and fixes
I forgot to close this one earlier, so:
Closes #7
2019-12-13 18:08:22 +01:00
Geoffrey Frogeye f3eedcba22
Updated now based on timestamp
Did I forget to add feed_asn.py a few commits ago?
Oh well...
2019-12-13 13:54:00 +01:00
Geoffrey Frogeye 8d94b80fd0
Integrated DNS resolving to workflow
Since the bigger datasets are only updated once a month,
this might help for quick updates.
2019-12-13 13:38:23 +01:00
Geoffrey Frogeye 231bb83667
Threaded feed_dns
Largely disappointing
2019-12-13 12:36:11 +01:00
Geoffrey Frogeye 9050a84670
Read-only mode 2019-12-13 12:35:05 +01:00
Geoffrey Frogeye e19f666331
Workflow: Automatically import IP ranges from ASN
Closes #9
2019-12-13 08:23:38 +01:00
Geoffrey Frogeye 57416b6e2c
Workflow: OOP and individual tables per type
Mostly for performance reasons.
The first one is to implement threading later.
The second one is to speed up the binary search,
but it doesn't seem that much better so far.
2019-12-13 00:11:21 +01:00
Geoffrey Frogeye 12dcafe606
Added alternate source of Eulerian CNAMEs
It was requested.
It should be temporary: once I have a bigger subdomain list,
it shouldn't be required.
2019-12-12 19:13:54 +01:00
Geoffrey Frogeye 1484733a90 Workflow: Small tweaks 2019-12-09 18:21:08 +01:00
Geoffrey Frogeye 55877be891
IP parsing C accelerated, use bytes everywhere 2019-12-09 09:47:48 +01:00
Geoffrey Frogeye 7937496882
Workflow: Base for new one
While I'm automating this you'll need to download the A set from
https://opendata.rapid7.com/sonar.fdns_v2/ to the file a.json.gz.
2019-12-09 08:12:48 +01:00
Geoffrey Frogeye 62e6c9005b
Tracker: intendmedia? 2019-12-08 01:32:49 +01:00
Geoffrey Frogeye dc44dea505
Optimized IP matching 2019-12-08 01:23:36 +01:00
Geoffrey Frogeye b634ae5bbd
Updated IP ranges for Criteo 2019-12-07 23:23:39 +01:00
Geoffrey Frogeye 16f8bed887
Tracker: Otto Group 2019-12-07 21:30:15 +01:00
Geoffrey Frogeye d6df0fd4f9
Tracker: Webtrekk 2019-12-07 21:21:33 +01:00
Geoffrey Frogeye 4dd3d4a64b
Preliminary structure for testing
In preparation of #4
2019-12-07 19:19:37 +01:00
Geoffrey Frogeye ae71d6b204 Tracker: 2o7 2019-12-07 19:17:18 +01:00
Geoffrey Frogeye 2b0a723c30
Fix log in scripts
Closes #8
2019-12-07 18:45:48 +01:00
Geoffrey Frogeye 0b2eb000c3
FP: ThreatMetrix 2019-12-07 18:23:11 +01:00
Geoffrey Frogeye cbb0cc6f3b Rules lists are optional 2019-12-07 18:22:20 +01:00
Geoffrey Frogeye a5e768fe00
Filtering by IP range
Closes #5
2019-12-07 13:56:04 +01:00
Geoffrey Frogeye 28e33dcc7a
Fixed description generation 2019-12-05 20:51:53 +01:00
Geoffrey Frogeye 95d4535abd
Nitpicking 2019-12-05 19:38:26 +01:00
Geoffrey Frogeye 025370bbbe
Split list into curated and non-curated
Closes #2
2019-12-05 19:15:24 +01:00
Geoffrey Frogeye 1c20963ffd
Removed third-parties from easyprivacy 2019-12-05 01:19:10 +01:00
Geoffrey Frogeye 188a8f7455
Removed another source of false-positives 2019-12-05 00:50:32 +01:00
Geoffrey Frogeye f2bab3ca3f Added contact information 2019-12-03 21:45:29 +01:00
Geoffrey Frogeye 08f25e26ba
Removed false-positive source
Also had edgekey.net for blocking.

Thanks @TorchedPoseidon for the report!
2019-12-03 21:27:37 +01:00
Geoffrey Frogeye 8c744d621e
Removed too restrictive source
Was blocking ssl.ovh.net and akaimi.net
2019-12-03 18:43:23 +01:00
Geoffrey Frogeye fe5f0c6c05
Added more rule sources 2019-12-03 17:33:46 +01:00
Geoffrey Frogeye 0159c6037c
Improved DNS resolving performance
Also various fixes.
Also some debug stuff, make sure to remove that later.
2019-12-03 15:35:21 +01:00