What to do with a Spark Job that hangs on the last task?


I am using https://github.com/alitouka/spark_dbscan, and to determine its parameters, I am using the utility class it supplies, org.alitouka.spark.dbscan.exploratoryAnalysis.DistanceToNearestNeighborDriver.
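
For context on what that driver does: judging by its name and the project's documentation, it computes each point's distance to its nearest neighbor, to help pick DBSCAN's epsilon. Below is a minimal conceptual sketch of that kind of computation, not the library's actual implementation; the input path and the naive all-pairs approach are illustrative assumptions only:

    import org.apache.spark.{SparkConf, SparkContext}

    // Conceptual sketch only: NOT spark_dbscan's implementation.
    object NearestNeighborSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("nn-distance-sketch"))

        // Assumed input: a CSV of numeric coordinates, one point per line.
        val points = sc.textFile("hdfs:///path/to/data.csv")
          .map(_.split(",").map(_.toDouble).toVector)

        // Naive all-pairs pass. The shuffle it produces piles work onto the
        // final stage, which is one way even a small dataset can hang on one task.
        val nnDistance = points.cartesian(points)
          .filter { case (a, b) => a != b }
          .map { case (a, b) =>
            val d = math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)
            (a, d)
          }
          .reduceByKey(math.min)

        nnDistance.saveAsTextFile("hdfs:///path/to/nn-distances")
      }
    }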

I am on a 10-node cluster: 1 machine with 8 cores and 32 GB of memory, and 9 machines with 6 cores and 16 GB of memory each.

I have 442 MB of data, which seems like a joke, but the job stalls at the last stage.

It was stuck in scheduler delay for 10 hours overnight, and I have tried a number of things over the last couple of days, but nothing seems to help.
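
Since it is a single last task that hangs, I suspect partition skew could be involved: one partition holding most of the records shows up as one straggler task. Here is a quick sketch of how one might check for that (the input path is a placeholder, not my real dataset):

    import org.apache.spark.{SparkConf, SparkContext}

    // Count records per partition and print the largest ones.
    object PartitionSkewCheck {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("skew-check"))
        val rdd = sc.textFile("hdfs:///path/to/data.csv") // placeholder path
        val sizes = rdd
          .mapPartitionsWithIndex((i, it) => Iterator((i, it.size)))
          .collect()
          .sortBy(-_._2)
        sizes.take(10).foreach { case (i, n) => println(s"partition $i: $n records") }
      }
    }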

I have tried the following (sketched as Spark configuration below this list):

  • increasing heap sizes and the number of cores
  • more/fewer executors with different amounts of resources
  • Kryo serialization
  • FAIR scheduling
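
To be concrete, these are the kinds of settings I have been varying, expressed as Spark configuration; the values here are illustrative, not the exact ones from my runs:

    import org.apache.spark.SparkConf

    // Illustrative values only, not the actual settings from my job.
    val conf = new SparkConf()
      .setAppName("dbscan-exploration")
      .set("spark.executor.memory", "12g")   // executor heap size
      .set("spark.executor.cores", "4")      // cores per executor
      .set("spark.executor.instances", "9")  // executor count (YARN)
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.scheduler.mode", "FAIR")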

The Spark version is 1.4.1.

The logs are full of standard fare, with no exceptions or interesting [INFO] lines.

Here is the script I am using: https://gist.github.com/isaacsanders/660f480810fbc07d4df2

The Hadoop version is HDP 2.3.2.0-2950.

Here is a gist (pastebin) of the versions en masse and a stack trace: https://gist.github.com/isaacsanders/2e59131758469097651b

The driver class in question: https://github.com/alitouka/spark_dbscan/blob/master/src/src/main/scala/org/alitouka/spark/dbscan/exploratoryAnalysis/DistanceToNearestNeighborDriver.scala

