38cf532854
Updated README
...
Actually split it in two (program and list).
Closes #3
Also, because I forgot to do it earlier:
Closes #1
2019-12-20 17:15:39 +01:00
53b14c6ffa
Removed TODO placeholders in commands description
...
It's better than nothing but not by that much
2019-12-19 08:07:01 +01:00
c81be4825c
Automated tests
...
Very rudimentary but should do the trick
Closes #4
2019-12-18 22:46:00 +01:00
4a22054796
Added optional cache for faster IP matching
2019-12-18 21:40:24 +01:00
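A minimal sketch of what an optional cache for IP matching could look like; the networks and function names below are illustrative assumptions, not the repository's actual code:

```python
# Assumed design: memoize lookups so repeated addresses skip the scan.
import functools
import ipaddress

# Example blocked ranges; the real project derives these from its database.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("178.250.0.0/21"),
    ipaddress.ip_network("8.8.8.0/24"),
]


@functools.lru_cache(maxsize=65536)
def ip_is_blocked(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


print(ip_is_blocked("178.250.4.12"))  # True, and cached for next time
```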
06b745890c
Added other first-party trackers
2019-12-18 17:03:05 +01:00
aca5023c3f
Fixed scripting around
2019-12-18 13:01:32 +01:00
dce35cb299
Stricter verification before adding entries to DB
2019-12-17 19:53:05 +01:00
747fe46ad0
Script to automatically download from Rapid7 datasets
2019-12-17 15:04:19 +01:00
b43cb1725c
Autosave
...
Not needed, but since the import may take multiple hours, I get
frustrated if it gets interrupted for some reason.
2019-12-17 15:02:42 +01:00
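A hedged sketch of the autosave idea described above (function and argument names assumed): persist the in-memory database at regular intervals so an interrupted import loses little work.

```python
# Hypothetical sketch, not the project's actual code.
import pickle
import time


def feed_records(records, database, save_path="db.pickle", interval=600):
    """Ingest records, pickling the database every `interval` seconds."""
    last_save = time.time()
    for record in records:
        database.update(record)  # stand-in for the real ingestion step
        if time.time() - last_save > interval:
            with open(save_path, "wb") as fd:
                pickle.dump(database, fd)
            last_save = time.time()
    with open(save_path, "wb") as fd:  # final save once the import ends
        pickle.dump(database, fd)
```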
e882e09b37
Added outdated documentation warning in README
2019-12-17 14:27:43 +01:00
d65107f849
Save duplicates too
...
Maybe I won't publish them, but this will help me track trackers.
2019-12-17 14:10:41 +01:00
ea0855bd00
Forgot to push this little guy
...
Good thing I cleaned up my working directory.
It only exists because pickles created from database.py itself
won't be openable from a file that simply imports database.py.
So we create it while in the 'imported' state.
2019-12-17 13:50:39 +01:00
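The quirk described here is standard pickle behaviour: a class pickled while its module runs as the main script is recorded under `__main__`, so an importer cannot resolve it. A small self-contained illustration (not project code):

```python
import pickle


class Database:
    pass


if __name__ == "__main__":
    dump = pickle.dumps(Database())
    print(dump)  # contains b"__main__" and b"Database"
    # A script that only does `import database` cannot resolve
    # "__main__.Database" when loading this dump. The fix described
    # above: produce the pickle from a small file that imports
    # database.py, so the recorded path becomes "database.Database".
```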
7851b038f5
Reworked rule export
2019-12-17 13:30:24 +01:00
8f6e01c857
Added first_party tracking
...
Well, tracking whether a rule comes from a first-party or a multi-party rule...
Hope I did not make any mistakes.
2019-12-16 19:09:02 +01:00
c3bf102289
Made references work
2019-12-16 14:18:03 +01:00
03a4042238
Added level
...
Also fixed the IP logic because it was really messed up
2019-12-16 09:31:29 +01:00
3197fa1663
Remove list usage for IpTreeNode
2019-12-16 06:54:18 +01:00
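A speculative sketch of what dropping the list might mean for a binary IP trie node: two explicit child slots instead of a `children` list, with `__slots__` to keep each node small. The attribute names are assumptions, not the actual IpTreeNode definition.

```python
class IpTreeNode:
    __slots__ = ("zero", "one", "match")

    def __init__(self) -> None:
        self.zero = None   # subtree for the next address bit being 0
        self.one = None    # subtree for the next address bit being 1
        self.match = None  # rule ending at this prefix, if any
```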
a0e68f0848
Reworked match and node system
...
For level, and first_party later.
Next: add get_match to retrieve the level of the source and have correct levels.
...am I going somewhere with all this?
2019-12-15 23:13:25 +01:00
aec8d3f8de
Reworked how paths work
...
Get those tuples out of my eyes
2019-12-15 22:21:05 +01:00
7af2074c7a
Small optimisation of feed_switch
2019-12-15 17:12:44 +01:00
45325782d2
Multi-processed parser
2019-12-15 17:05:41 +01:00
ce52897d30
Smol fixes
2019-12-15 16:48:17 +01:00
954b33b2a6
Slightly better Rapid7 parser
2019-12-15 16:38:01 +01:00
d976752797
Store Ip4Path as int instead of List[int]
2019-12-15 16:26:18 +01:00
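A hedged sketch of the change this subject describes: pack the four octets of an IPv4 address into one integer rather than keeping a `List[int]` (helper names assumed):

```python
from typing import List


def pack_ip4(octets: List[int]) -> int:
    value = 0
    for octet in octets:
        value = (value << 8) | octet
    return value


def unpack_ip4(value: int) -> List[int]:
    return [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]


assert pack_ip4([192, 168, 0, 1]) == 0xC0A80001
assert unpack_ip4(0xC0A80001) == [192, 168, 0, 1]
```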
4d966371b2
Workflow: SQL -> Tree
...
Welp. All that for this.
2019-12-15 15:56:26 +01:00
040ce4c14e
Typo in source
2019-12-15 01:52:45 +01:00
b50c01f740
Merge branch 'master' into newworkflow
2019-12-15 01:30:03 +01:00
ddceed3d25
Workflow: Can now import DnsMass output
...
Well, in a specific format but DnsMass nonetheless
2019-12-15 00:28:08 +01:00
189deeb559
Workflow: Multiprocess
...
Still trying.
It's better than multithreading, though.
Merge branch 'newworkflow' into newworkflow_threaded
2019-12-14 17:27:46 +01:00
d7c239a6f6
Workflow: Some modifications
2019-12-14 16:04:19 +01:00
5023b85d7c
Added intermediate representation for DNS datasets
...
It's just CSV.
The DNS records from the datasets are not ordered consistently,
so we need to parse them completely.
It seems that converting to an IR before sending data to ./feed_dns.py
through a pipe is faster than decoding the JSON in ./feed_dns.py.
This will also reduce the storage of the resolved subdomains by
about 15% (compressed).
2019-12-13 21:59:35 +01:00
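A minimal sketch of such a converter, assuming the standard Rapid7 FDNS JSON fields (timestamp, type, name, value); the exact column order of the project's CSV IR is an assumption:

```python
# Read Rapid7 FDNS JSON lines on stdin, emit CSV on stdout,
# intended to be piped into ./feed_dns.py.
import csv
import json
import sys

writer = csv.writer(sys.stdout)
for line in sys.stdin:
    record = json.loads(line)
    writer.writerow([record["timestamp"], record["type"],
                     record["name"], record["value"]])
```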
269b8278b5
Workflow: Fixed rule counts
2019-12-13 18:36:08 +01:00
ab7ef609dd
Workflow: Various optimisations and fixes
...
I forgot to close this one earlier, so:
Closes #7
2019-12-13 18:08:22 +01:00
f3eedcba22
Updated now based on timestamp
...
Did I forget to add feed_asn.py a few commits ago?
Oh well...
2019-12-13 13:54:00 +01:00
8d94b80fd0
Integrated DNS resolving to workflow
...
Since the bigger datasets are only updated once a month,
this might help for quick updates.
2019-12-13 13:38:23 +01:00
231bb83667
Threaded feed_dns
...
Largely disappointing
2019-12-13 12:36:11 +01:00
9050a84670
Read-only mode
2019-12-13 12:35:05 +01:00
e19f666331
Workflow: Automatically import IP ranges from ASN
...
Closes #9
2019-12-13 08:23:38 +01:00
57416b6e2c
Workflow: OOP and individual tables per type
...
Mostly for performance reasons.
The first is to allow implementing threading later.
The second is to speed up the dichotomy,
but it doesn't seem that much better so far.
2019-12-13 00:11:21 +01:00
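A hedged illustration of the dichotomy speed-up: one sorted table per record type, so the binary search scans a smaller list. The table names and contents are illustrative, not the project's actual structures.

```python
import bisect

tables = {
    "hostname": ["ads.example.com", "tracker.example.org"],
    "ip4network": [0x08080800, 0xB2FA0000],
}


def exists(kind: str, key) -> bool:
    table = tables[kind]
    i = bisect.bisect_left(table, key)
    return i < len(table) and table[i] == key


print(exists("hostname", "tracker.example.org"))  # True
```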
12dcafe606
Added alternate source of Eulerian CNAMES
...
It was requested.
It should be temporary: once I have a bigger subdomain list,
it shouldn't be required.
2019-12-12 19:13:54 +01:00
1484733a90
Workflow: Small tweaks
2019-12-09 18:21:08 +01:00
55877be891
IP parsing C accelerated, use bytes everywhere
2019-12-09 09:47:48 +01:00
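"C accelerated" most plausibly means leaning on a C-implemented stdlib routine rather than splitting strings in pure Python; whether the project uses exactly this call is an assumption. A self-contained example operating on bytes:

```python
import socket
import struct


def parse_ip4(text: bytes) -> int:
    # socket.inet_aton is implemented in C and validates the dotted quad.
    packed = socket.inet_aton(text.decode("ascii"))
    return struct.unpack("!I", packed)[0]


assert parse_ip4(b"192.168.0.1") == 0xC0A80001
```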
7937496882
Workflow: Base for new one
...
Until I automate this, you'll need to download the A set from
https://opendata.rapid7.com/sonar.fdns_v2/ to the file a.json.gz.
2019-12-09 08:12:48 +01:00
62e6c9005b
Tracker: intendmedia?
2019-12-08 01:32:49 +01:00
dc44dea505
Optimized IP matching
2019-12-08 01:23:36 +01:00
b634ae5bbd
Updated IP ranges for Criteo
2019-12-07 23:23:39 +01:00
16f8bed887
Tracker: Otto Group
2019-12-07 21:30:15 +01:00
d6df0fd4f9
Tracker: Webtrekk
2019-12-07 21:21:33 +01:00
4dd3d4a64b
Preliminary structure for testing
...
In preparation of #4
2019-12-07 19:19:37 +01:00
ae71d6b204
Tracker: 2o7
2019-12-07 19:17:18 +01:00