Commit graph

167 commits

Author SHA1 Message Date
Geoffrey Frogeye 0b9e2d0975
Validate also lower the case of domains 2019-12-25 15:31:20 +01:00
Geoffrey Frogeye 2bcf6cbbf7
Added SINGLE_PROCESS environment variable 2019-12-25 15:15:49 +01:00
Geoffrey Frogeye b310ca2fc2
Clever pruning mechanism 2019-12-25 14:54:57 +01:00
Geoffrey Frogeye bb9e6de62f
Profiling is now optional 2019-12-25 13:52:19 +01:00
Geoffrey Frogeye c543e0eab6
Make multi-processing optional for feed_dns 2019-12-25 13:04:15 +01:00
Geoffrey Frogeye 195f41bd9f
Use smaller cache if it cannot allocate 2019-12-25 13:03:55 +01:00
Geoffrey Frogeye 0e7479e23e
Added handling for IPs too big 2019-12-25 12:35:06 +01:00
Geoffrey Frogeye 9f343ed296
Removed debug print 2019-12-24 15:12:38 +01:00
Geoffrey Frogeye c65ae94892
Added ability to use Rapid7 API
Closes #11
2019-12-24 15:08:18 +01:00
Geoffrey Frogeye 7d1c1a1d54
Implement pruning 2019-12-21 19:38:20 +01:00
Geoffrey Frogeye 1a6e64da3d
Forgot numpy dependency 2019-12-20 21:08:21 +01:00
Geoffrey Frogeye d66040a7b6
Added some litterature
Well not really litterature in the scientific term but still something
to read
2019-12-20 18:22:15 +01:00
Geoffrey Frogeye 57e2919f25
Added information about CORS security issue 2019-12-20 17:58:53 +01:00
Geoffrey Frogeye 94acd106da
Acknwoledgments
Gesundheit
2019-12-20 17:46:24 +01:00
Geoffrey Frogeye 885d92dd77
Added LICENSE 2019-12-20 17:38:26 +01:00
Geoffrey Frogeye 8b7e538677
Updated links
(could not bother guessing them)
2019-12-20 17:24:05 +01:00
Geoffrey Frogeye cd46b39756
Merge branch 'newworkflow' 2019-12-20 17:18:42 +01:00
Geoffrey Frogeye 38cf532854
Updated README
Split in two actually (program and list).

Closes #3

Also,
Closes #1
Because I forgot to do it earlier.
2019-12-20 17:15:39 +01:00
Geoffrey Frogeye 53b14c6ffa
Removed TODO placeholders in commands description
It's better than nothing but not by that much
2019-12-19 08:07:01 +01:00
Geoffrey Frogeye c81be4825c
Automated tests
Very rudimentary but should do the trick

Closes #4
2019-12-18 22:46:00 +01:00
Geoffrey Frogeye 4a22054796
Added optional cache for faster IP matching 2019-12-18 21:40:24 +01:00
Geoffrey Frogeye 06b745890c
Added other first-party trackers 2019-12-18 17:03:05 +01:00
Geoffrey Frogeye aca5023c3f
Fixed scripting around 2019-12-18 13:01:32 +01:00
Geoffrey Frogeye dce35cb299
Harder verficiation before adding entries to DB 2019-12-17 19:53:05 +01:00
Geoffrey Frogeye 747fe46ad0
Script to automatically download from Rapid7 datasets 2019-12-17 15:04:19 +01:00
Geoffrey Frogeye b43cb1725c
Autosave
Not needed but since the import may take multiple hour I get frustrated
if this gets interrupted for some reason.
2019-12-17 15:02:42 +01:00
Geoffrey Frogeye f5c60c482a Merge branch 'master' of git.frogeye.fr:geoffrey/eulaurarien 2019-12-17 14:28:38 +01:00
Geoffrey Frogeye 12ecfa1a5d Added outdated documentation warning in README 2019-12-17 14:28:23 +01:00
Geoffrey Frogeye e882e09b37
Added outdated documentation warning in README 2019-12-17 14:27:43 +01:00
Geoffrey Frogeye d65107f849
Save dupplicates too
Maybe I won't publish them but this will help me for tracking trackers.
2019-12-17 14:10:41 +01:00
Geoffrey Frogeye ea0855bd00
Forgot to push this little guy
Good thing I cleaned up my working directory.
It only exists because pickles created from database.py itself
won't be openable from a file simply importing databse.py.
So we create it when in 'imported state'.
2019-12-17 13:50:39 +01:00
Geoffrey Frogeye 7851b038f5
Reworked rule export 2019-12-17 13:30:24 +01:00
Geoffrey Frogeye 8f6e01c857
Added first_party tracking
Well, tracking if a rule is from a first or a multi rule...
Hope I did not do any mistake
2019-12-16 19:09:02 +01:00
Geoffrey Frogeye c3bf102289
Made references work 2019-12-16 14:18:03 +01:00
Geoffrey Frogeye 03a4042238
Added level
Also fixed IP logic because this was real messed up
2019-12-16 09:31:29 +01:00
Geoffrey Frogeye 3197fa1663
Remove list usage for IpTreeNode 2019-12-16 06:54:18 +01:00
Geoffrey Frogeye a0e68f0848
Reworked match and node system
For level, and first_party later
Next: add get_match to retrieve level of source and have correct levels

... am I going somewhere with all this?
2019-12-15 23:13:25 +01:00
Geoffrey Frogeye aec8d3f8de
Reworked how paths work
Get those tuples out of my eyes
2019-12-15 22:21:05 +01:00
Geoffrey Frogeye 7af2074c7a
Small optimisation of feed_switch 2019-12-15 17:12:44 +01:00
Geoffrey Frogeye 45325782d2
Multi-processed parser 2019-12-15 17:05:41 +01:00
Geoffrey Frogeye ce52897d30
Smol fixes 2019-12-15 16:48:17 +01:00
Geoffrey Frogeye 954b33b2a6
Slightly better Rapid7 parser 2019-12-15 16:38:01 +01:00
Geoffrey Frogeye d976752797
Store Ip4Path as int instead of List[int] 2019-12-15 16:26:18 +01:00
Geoffrey Frogeye 4d966371b2
Workflow: SQL -> Tree
Welp. All that for this.
2019-12-15 15:56:26 +01:00
Geoffrey Frogeye 040ce4c14e
Typo in source 2019-12-15 01:52:45 +01:00
Geoffrey Frogeye b50c01f740 Merge branch 'master' into newworkflow 2019-12-15 01:30:03 +01:00
Geoffrey Frogeye ddceed3d25
Workflow: Can now import DnsMass output
Well, in a specific format but DnsMass nonetheless
2019-12-15 00:28:08 +01:00
Geoffrey Frogeye 189deeb559
Workflow: Multiprocess
Still trying.
It's better than multithread though.

Merge branch 'newworkflow' into newworkflow_threaded
2019-12-14 17:27:46 +01:00
Geoffrey Frogeye d7c239a6f6 Workflow: Some modifications 2019-12-14 16:04:19 +01:00
Geoffrey Frogeye 5023b85d7c
Added intermediate representation for DNS datasets
It's just CSV.
The DNS from the datasets are not ordered consistently,
so we need to parse it completly.
It seems that converting to an IR before sending data to ./feed_dns.py
through a pipe is faster than decoding the JSON in ./feed_dns.py.
This will also reduce the storage of the resolved subdomains by
about 15% (compressed).
2019-12-13 21:59:35 +01:00