i using https://github.com/alitouka/spark_dbscan, , determine parameters, using utility class supply, org.alitouka.spark.dbscan.exploratoryanalysis.distancetonearestneighbordriver.
i on 10 node cluster 1 machine 8 cores , 32g of memory , 9 machines 6 cores , 16g of memory.
i have 442m of data, seems joke, job stalls @ last stage.
it stuck in scheduler delay 10 hours overnight, , have tried number of things last couple days, nothing seems helping.
i have tried:
- increasing heap sizes , numbers of cores
- more/less executors different amounts of resources.
- kyro serialization
- fair scheduling
the spark version 1.4.1
the logs full of standard fair, nothing exception or interesting [info] lines.
here script using: https://gist.github.com/isaacsanders/660f480810fbc07d4df2
hadoop is: hdp 2.3.2.0-2950
here gist (pastebin) of versions en masse , stacktrace: https://gist.github.com/isaacsanders/2e59131758469097651b
Comments
Post a Comment