Geoffrey Frogeye
a0e68f0848
Reworked match and node system
...
For level, and first_party later
Next: add get_match to retrieve level of source and have correct levels
... am I going somewhere with all this?
2019-12-15 23:13:25 +01:00
Geoffrey Frogeye
aec8d3f8de
Reworked how paths work
...
Get those tuples out of my eyes
2019-12-15 22:21:05 +01:00
Geoffrey Frogeye
7af2074c7a
Small optimisation of feed_switch
2019-12-15 17:12:44 +01:00
Geoffrey Frogeye
45325782d2
Multi-processed parser
2019-12-15 17:05:41 +01:00
Geoffrey Frogeye
ce52897d30
Smol fixes
2019-12-15 16:48:17 +01:00
Geoffrey Frogeye
954b33b2a6
Slightly better Rapid7 parser
2019-12-15 16:38:01 +01:00
Geoffrey Frogeye
d976752797
Store Ip4Path as int instead of List[int]
2019-12-15 16:26:18 +01:00
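A minimal sketch of why an integer is a convenient representation for an IPv4 value, compared with a list of integers. What `Ip4Path` actually stores is not shown in this log; the function names below are hypothetical and only illustrate the general idea of packing an address into one `int` and testing a prefix with shifts.

```python
import ipaddress


def ip4_to_int(address: str) -> int:
    """Pack a dotted-quad IPv4 address into a single 32-bit integer."""
    return int(ipaddress.IPv4Address(address))


def int_to_ip4(value: int) -> str:
    """Inverse of ip4_to_int."""
    return str(ipaddress.IPv4Address(value))


def in_network(value: int, prefix: int, prefix_len: int) -> bool:
    """Check whether `value` shares its first `prefix_len` bits with `prefix`."""
    shift = 32 - prefix_len
    return (value >> shift) == (prefix >> shift)
```

With this layout, a prefix comparison is two shifts and one equality test, with no per-element list traversal.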
Geoffrey Frogeye
4d966371b2
Workflow: SQL -> Tree
...
Welp. All that for this.
2019-12-15 15:56:26 +01:00
Geoffrey Frogeye
040ce4c14e
Typo in source
2019-12-15 01:52:45 +01:00
Geoffrey Frogeye
b50c01f740
Merge branch 'master' into newworkflow
2019-12-15 01:30:03 +01:00
Geoffrey Frogeye
ddceed3d25
Workflow: Can now import DnsMass output
...
Well, in a specific format but DnsMass nonetheless
2019-12-15 00:28:08 +01:00
Geoffrey Frogeye
189deeb559
Workflow: Multiprocess
...
Still trying.
It's better than multithread though.
Merge branch 'newworkflow' into newworkflow_threaded
2019-12-14 17:27:46 +01:00
Geoffrey Frogeye
d7c239a6f6
Workflow: Some modifications
2019-12-14 16:04:19 +01:00
Geoffrey Frogeye
5023b85d7c
Added intermediate representation for DNS datasets
...
It's just CSV.
The DNS from the datasets are not ordered consistently,
so we need to parse it completely.
It seems that converting to an IR before sending data to ./feed_dns.py
through a pipe is faster than decoding the JSON in ./feed_dns.py.
This will also reduce the storage of the resolved subdomains by
about 15% (compressed).
2019-12-13 21:59:35 +01:00
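The intermediate representation described in the commit above could look roughly like the following sketch. The JSON field names (`type`, `timestamp`, `name`, `value`), the column order, and the function name `json_to_ir` are assumptions made for illustration; the commit only says that the IR is CSV and that it is piped into ./feed_dns.py.

```python
import csv
import io
import json
from typing import IO, Iterable


def json_to_ir(lines: Iterable[str], out: IO[str]) -> int:
    """Convert JSON-lines DNS records to CSV rows; return the row count.

    Assumed input: one JSON object per line with the keys
    type, timestamp, name and value (hypothetical schema).
    """
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        writer.writerow(
            [record["type"], record["timestamp"], record["name"], record["value"]]
        )
        count += 1
    return count
```

In a pipeline, such a converter would read from standard input and write to standard output, so the consumer only has to split lines on commas instead of decoding JSON.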
Geoffrey Frogeye
269b8278b5
Workflow: Fixed rules counts
2019-12-13 18:36:08 +01:00
Geoffrey Frogeye
ab7ef609dd
Workflow: Various optimisations and fixes
...
I forgot to close this one earlier, so:
Closes #7
2019-12-13 18:08:22 +01:00
Geoffrey Frogeye
f3eedcba22
Updated now based on timestamp
...
Did I forget to add feed_asn.py a few commits ago?
Oh well...
2019-12-13 13:54:00 +01:00
Geoffrey Frogeye
8d94b80fd0
Integrated DNS resolving to workflow
...
Since the bigger datasets are only updated once a month,
this might help for quick updates.
2019-12-13 13:38:23 +01:00
Geoffrey Frogeye
231bb83667
Threaded feed_dns
...
Largely disappointing
2019-12-13 12:36:11 +01:00
Geoffrey Frogeye
9050a84670
Read-only mode
2019-12-13 12:35:05 +01:00
Geoffrey Frogeye
e19f666331
Workflow: Automatically import IP ranges from ASN
...
Closes #9
2019-12-13 08:23:38 +01:00
Geoffrey Frogeye
57416b6e2c
Workflow: OOP and individual tables per type
...
Mostly for performance reasons.
The first is meant to make threading easier to implement later.
The second is meant to speed up the dichotomy (binary search),
but it doesn't seem that much better so far.
2019-12-13 00:11:21 +01:00
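For readers unfamiliar with the "dichotomy" mentioned in the commit above, here is a sketch of a binary search over sorted IP ranges. It is an in-memory illustration only: the commit refers to database tables, and the names `build_index` and `lookup` as well as the assumption that ranges do not overlap are hypothetical.

```python
import bisect
from typing import Iterable, List, Optional, Tuple

# (first address, last address, label), addresses stored as integers
Range = Tuple[int, int, str]


def build_index(ranges: Iterable[Range]) -> Tuple[List[int], List[Range]]:
    """Sort ranges by first address and extract the list of starts."""
    ordered = sorted(ranges)
    starts = [r[0] for r in ordered]
    return starts, ordered


def lookup(value: int, starts: List[int], ordered: List[Range]) -> Optional[str]:
    """Return the label of the range containing `value`, or None.

    Assumes the ranges do not overlap.
    """
    # Index of the last range whose start is <= value
    i = bisect.bisect_right(starts, value) - 1
    if i >= 0 and ordered[i][0] <= value <= ordered[i][1]:
        return ordered[i][2]
    return None
```

Each lookup takes a logarithmic number of comparisons in the number of ranges, which is the property the dichotomy relies on.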
Geoffrey Frogeye
b076fa6c34
Typo in new source URL
2019-12-12 23:28:00 +01:00
Geoffrey Frogeye
12dcafe606
Added alternate source of Eulerian CNAMES
...
It was requested.
It should be temporary: once I have a bigger subdomain list,
it shouldn't be required.
2019-12-12 19:13:54 +01:00
Geoffrey Frogeye
1484733a90
Workflow: Small tweaks
2019-12-09 18:21:08 +01:00
Geoffrey Frogeye
55877be891
IP parsing C accelerated, use bytes everywhere
2019-12-09 09:47:48 +01:00
Geoffrey Frogeye
7937496882
Workflow: Base for new one
...
While I'm automating this you'll need to download the A set from
https://opendata.rapid7.com/sonar.fdns_v2/ to the file a.json.gz.
2019-12-09 08:12:48 +01:00
Geoffrey Frogeye
62e6c9005b
Tracker: intendmedia?
2019-12-08 01:32:49 +01:00
Geoffrey Frogeye
dc44dea505
Optimized IP matching
2019-12-08 01:23:36 +01:00
Geoffrey Frogeye
b634ae5bbd
Updated IP ranges for Criteo
2019-12-07 23:23:39 +01:00
Geoffrey Frogeye
16f8bed887
Tracker: Otto Group
2019-12-07 21:30:15 +01:00
Geoffrey Frogeye
d6df0fd4f9
Tracker: Webtrekk
2019-12-07 21:21:33 +01:00
Geoffrey Frogeye
4dd3d4a64b
Preliminary structure for testing
...
In preparation of #4
2019-12-07 19:19:37 +01:00
Geoffrey Frogeye
ae71d6b204
Tracker: 2o7
2019-12-07 19:17:18 +01:00
Geoffrey Frogeye
2b0a723c30
Fix log in scripts
...
Closes #8
2019-12-07 18:45:48 +01:00
Geoffrey Frogeye
0b2eb000c3
FP: ThreatMetrix
2019-12-07 18:23:11 +01:00
Geoffrey Frogeye
cbb0cc6f3b
Rules lists are optional
2019-12-07 18:22:20 +01:00
Geoffrey Frogeye
a5e768fe00
Filtering by IP range
...
Closes #5
2019-12-07 13:56:04 +01:00
Geoffrey Frogeye
28e33dcc7a
Fixed description generation
2019-12-05 20:51:53 +01:00
Geoffrey Frogeye
95d4535abd
Nitpicking
2019-12-05 19:38:26 +01:00
Geoffrey Frogeye
025370bbbe
Split list into curated and non-curated
...
Closes #2
2019-12-05 19:15:24 +01:00
Geoffrey Frogeye
1c20963ffd
Removed third-parties from easyprivacy
2019-12-05 01:19:10 +01:00
Geoffrey Frogeye
188a8f7455
Removed another source of false-positives
2019-12-05 00:50:32 +01:00
Geoffrey Frogeye
f2bab3ca3f
Added contact information
2019-12-03 21:45:29 +01:00
Geoffrey Frogeye
08f25e26ba
Removed false-positive source
...
It also listed edgekey.net for blocking.
Thanks @TorchedPoseidon for the report!
2019-12-03 21:27:37 +01:00
Geoffrey Frogeye
8c744d621e
Removed too restrictive source
...
Was blocking ssl.ovh.net and akaimi.net
2019-12-03 18:43:23 +01:00
Geoffrey Frogeye
fe5f0c6c05
Added more rule sources
2019-12-03 17:33:46 +01:00
Geoffrey Frogeye
0159c6037c
Improved DNS resolving performance
...
Also various fixes.
Also some debug stuff, make sure to remove that later.
2019-12-03 15:35:21 +01:00
Geoffrey Frogeye
c609b90390
Append top 1M subdomains rather than replacing it
2019-12-03 09:04:19 +01:00
Geoffrey Frogeye
69b82d29fd
Improved rules handling
...
Rules can now come in 3 different formats:
- AdBlock rules
- Host lists
- Domains lists
All will be converted into domain lists and aggregated
(only AdBlock rules matching a whole domain will be kept).
A subdomain is now matched if it is a subdomain of any domain in the
rules.
It is way faster (seconds rather than hours!) but less flexible
(although it shouldn't be a problem).
2019-12-03 08:48:12 +01:00
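The rule handling described in the commit above can be sketched as follows: each rule is reduced to a plain domain, and a hostname matches if any of its suffixes is in the resulting set. This is an illustration under assumptions, not the repository's code: the function names, the format labels, and the regular expression used to decide that an AdBlock rule "matches a whole domain" are all hypothetical.

```python
import re
from typing import Optional, Set

# Assumption: only rules of the exact form ||domain^ cover a whole domain.
ADBLOCK_WHOLE_DOMAIN = re.compile(r"^\|\|([a-z0-9.-]+)\^$")


def rule_to_domain(line: str, fmt: str) -> Optional[str]:
    """Reduce one rule line to a domain, or None if it cannot be kept."""
    line = line.strip().lower()
    if not line or line.startswith(("#", "!")):
        return None  # blank line or comment
    if fmt == "adblock":
        match = ADBLOCK_WHOLE_DOMAIN.match(line)
        return match.group(1) if match else None
    if fmt == "hosts":
        parts = line.split()  # "<ip> <hostname> [...]"
        return parts[1] if len(parts) >= 2 else None
    if fmt == "domains":
        return line
    raise ValueError(f"unknown rule format: {fmt}")


def is_blocked(hostname: str, domains: Set[str]) -> bool:
    """True if hostname equals, or is a subdomain of, any listed domain."""
    labels = hostname.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in domains for i in range(len(labels)))
```

Checking suffixes against a set costs one hash lookup per label, which is consistent with the speed-up the commit reports, at the price of ignoring path-level or pattern-based AdBlock rules.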