Project for the course: Middleware Technologies for Distributed Systems.
You need Java ≥ 8 and Maven ≥ 3.1.
mvn package
You need Python 3.
./genVectors.py $DIMENSION $NUMBER > $FILE
(example: ./genVectors.py 2 1000 > input.csv)
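The contents of genVectors.py are not shown here, so the following is only a minimal sketch of what a generator of this kind could look like. It assumes uniformly distributed coordinates in [0, 1) range bounds and one comma-separated point per line; the function names `generate_points` and `format_point`, the bounds, and the seed parameter are all hypothetical.

```python
import random


def generate_points(dimension, number, low=0.0, high=1.0, seed=None):
    """Return `number` points, each a list of `dimension` random floats.

    Assumption: coordinates are drawn uniformly from [low, high]; the real
    genVectors.py may use a different distribution.
    """
    rng = random.Random(seed)
    return [
        [rng.uniform(low, high) for _ in range(dimension)]
        for _ in range(number)
    ]


def format_point(point):
    """Format one point as a comma-separated line, e.g. '0.25,0.75'."""
    return ",".join(repr(coord) for coord in point)


def to_csv_lines(dimension, number, seed=None):
    """Return the generated points as a list of CSV lines, one point each."""
    return [format_point(p) for p in generate_points(dimension, number, seed=seed)]
```

Printing each element of `to_csv_lines(2, 1000)` would produce a file with the same shape as the input.csv in the example above.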
You need a running Apache Flink cluster.
Input data is one point per line, in the following format: xCoords,yCoords.
Output data is one point per line, in the following format: xCoords,yCoords,clusterIndex.
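To illustrate the output format, here is a small sketch that parses lines of the form xCoords,yCoords,clusterIndex and groups the points by cluster. It assumes that clusterIndex is written as an integer and that every preceding field is a numeric coordinate; the helper names `parse_classified_line` and `group_by_cluster` are hypothetical and not part of this project.

```python
from collections import defaultdict


def parse_classified_line(line):
    """Split 'x,y,clusterIndex' into ((x, y), cluster_index).

    Assumption: the last field is an integer cluster index and all
    preceding fields are floating-point coordinates.
    """
    *coords, cluster = line.strip().split(",")
    return tuple(float(c) for c in coords), int(cluster)


def group_by_cluster(lines):
    """Map each cluster index to the list of points assigned to it."""
    clusters = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        point, cluster = parse_classified_line(line)
        clusters[cluster].append(point)
    return dict(clusters)
```

For example, the lines `0.5,1.5,0` and `2.0,3.0,1` would be grouped as point (0.5, 1.5) in cluster 0 and point (2.0, 3.0) in cluster 1.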
flink run -p $NBWORKERS target/project-*.jar --input $INPUT --output $OUTPUT [--k $K] [--maxIterations $ITERATIONS]
(example: flink run -p 4 target/project-1.0.jar --input $PWD/input.csv --output $PWD/output.csv --k 5)
You need Python 3, NumPy, Matplotlib.
./plotClassification.py $FILE
(example: ./plotClassification.py output.csv)