java - Openfire server crashes randomly


Deployment: 3 Openfire instances in a cluster using the Hazelcast clustering plugin 2.1.2. We have 2 custom plugins doing work in addition to the clustering plugin.

Connections (BOSH, TCP): around 10,000 at peak load time.

Nature of the problem: we are seeing random crashes, which manifest at random times. One of the few observations is that around the time of a crash, the number of threads in the Openfire process shoots up to 30,000 within a few seconds. The Java stack trace shows that these threads are blocked on an operation. The stack trace is pasted here for reference.

The stack trace observed is:

"TaskEngine-pool-64675" #68993 daemon prio=5 os_prio=0 tid=0x00007f125502e800 nid=0x54e7 waiting for monitor entry [0x00007f1312156000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at java.util.Collections$SynchronizedCollection.add(Collections.java:2035)
    - waiting to lock <0x00000004e58dfd00> (a java.util.Collections$SynchronizedRandomAccessList)
    at org.jivesoftware.openfire.http.HttpSession.deliver(HttpSession.java:1004)
    at org.jivesoftware.openfire.http.HttpSession.deliver(HttpSession.java:970)
    at org.jivesoftware.openfire.session.LocalSession.process(LocalSession.java:289)
    at org.jivesoftware.openfire.spi.RoutingTableImpl.routeToBareJID(RoutingTableImpl.java:633)
    at org.jivesoftware.openfire.spi.RoutingTableImpl.routeToLocalDomain(RoutingTableImpl.java:303)
    at org.jivesoftware.openfire.spi.RoutingTableImpl.routePacket(RoutingTableImpl.java:239)
    at org.jivesoftware.openfire.net.SocketPacketWriteHandler.process(SocketPacketWriteHandler.java:68)
    at org.jivesoftware.openfire.spi.PacketDelivererImpl.deliver(PacketDelivererImpl.java:56)
    at org.jivesoftware.openfire.http.HttpSession$4.run(HttpSession.java:1083)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

The above is the jstack output taken while the thread count had reached a high value of 30,000 and the server was unresponsive. Any pointers? The trace seems to indicate that a data structure can't be accessed.
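
To narrow down which thread is holding the monitor that the others are waiting on, something like the following could be run inside the JVM (for example from one of the custom plugins; that placement is an assumption, not something I have tried). It is only a minimal sketch built on the standard `java.lang.management` API. The class name `BlockedThreadReport` is hypothetical and not part of Openfire.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Openfire: lists threads in the BLOCKED state
// together with the monitor they are waiting for and the thread that owns it.
public class BlockedThreadReport {

    /** Returns one line per BLOCKED thread: its name, the wanted lock, and the lock owner. */
    public static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> report = new ArrayList<>();
        // (false, false): do not collect the monitors/synchronizers each thread holds;
        // the lock a thread is blocked on is still reported.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                report.add(info.getThreadName()
                        + " waiting for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Prints nothing when no thread is currently blocked on a monitor.
        blockedThreads().forEach(System.out::println);
    }
}
```

If thousands of lines name the same owner thread, that thread's own stack is probably the one worth looking at. The same information should also be visible in the full jstack dump as a `- locked <0x00000004e58dfd00>` line under whichever thread holds the list.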

