Hang Detection Policy

A common error in J2EE applications is a hung thread. A hung thread can result from a simple software defect (such as an infinite loop) or a more complex cause (for example, a resource deadlock). A hung thread can consume system resources, such as CPU time, when it runs an unbounded code path such as an infinite loop. Alternatively, a system can become unresponsive even though all resources are idle, as in a deadlock scenario. Unless an end user or a monitoring tool reports the problem, the system may remain in this degraded state indefinitely.

Using the hang detection policy, you can specify how long is too long for a unit of work to complete. The thread monitor checks all managed threads, and if any thread has been active for longer than the threshold value, it writes a message to the SystemOut.log file to let you know. The hang detection policy is on by default.

Important Note: The hang detection policy only monitors managed threads, such as web container threads or object request broker (ORB) threads (used for executing EJBs). Unmanaged threads, which are created by the application, are not monitored.
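To make the monitoring pass concrete, here is a minimal sketch in plain Java. It is not WebSphere's implementation: the class HangMonitorSketch, its methods, and the thread names are hypothetical and exist only to illustrate the idea of recording when each managed thread started its unit of work and reporting those that have been active longer than the threshold.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative toy monitor (hypothetical, NOT the WebSphere thread monitor).
class HangMonitorSketch {
    // Thread name -> time (ms) at which its current unit of work started.
    private final Map<String, Long> startTimes = new ConcurrentHashMap<>();
    private final long thresholdMillis;

    HangMonitorSketch(long thresholdSeconds) {
        this.thresholdMillis = thresholdSeconds * 1000L;
    }

    // Called when a managed thread begins a unit of work.
    void workStarted(String threadName, long nowMillis) {
        startTimes.put(threadName, nowMillis);
    }

    // Called when the unit of work completes; the thread is no longer tracked.
    void workFinished(String threadName) {
        startTimes.remove(threadName);
    }

    // One monitoring pass: names of threads active longer than the threshold.
    List<String> findHung(long nowMillis) {
        List<String> hung = new ArrayList<>();
        for (Map.Entry<String, Long> e : startTimes.entrySet()) {
            if (nowMillis - e.getValue() > thresholdMillis) {
                hung.add(e.getKey());
            }
        }
        Collections.sort(hung); // stable order for reporting
        return hung;
    }

    public static void main(String[] args) {
        HangMonitorSketch monitor = new HangMonitorSketch(600); // 600 s threshold
        monitor.workStarted("WebContainer : 0", 0L);        // example thread names
        monitor.workStarted("WebContainer : 1", 500_000L);
        // At t = 700 s only the first thread has been active for more than 600 s.
        System.out.println(monitor.findHung(700_000L)); // prints [WebContainer : 0]
    }
}
```

Timestamps are passed in explicitly so the sketch stays deterministic; a real monitor would read the clock itself and run the pass on a timer.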

You can configure the hang detection policy by setting the following properties.


  • com.ibm.websphere.threadmonitor.threshold: The length of time (in seconds) for which a thread can be active before it is considered hung. Any thread that is detected as active for longer than this length of time is reported as hung. The default value is 600 seconds (10 minutes).

  • com.ibm.websphere.threadmonitor.interval: The frequency (in seconds) at which managed threads in the selected application server are interrogated. The default value is 180 seconds (3 minutes).

  • com.ibm.websphere.threadmonitor.false.alarm.threshold: The number of times (T) that false alarms can occur before the hang threshold is automatically increased. It is possible that a thread that is reported as hung eventually completes its work, resulting in a false alarm. A large number of these events indicates that the threshold value is too small. The hang detection facility can automatically respond to this situation: for every T false alarms, the hang threshold is increased by a factor of 1.5. Set the value to zero (or less) to disable the automatic adjustment. The default value is 100.

  • com.ibm.websphere.threadmonitor.dump.java: Set to true to cause a javacore to be created when a hung thread is detected and a WSVR0605W message is printed. The threads section of the javacore can be analyzed to determine what the reported thread and other related threads are doing.
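The automatic adjustment described for the false alarm threshold can be worked through with a small sketch. This is only an illustration of the arithmetic, assuming the rule is applied exactly as stated (multiply the hang threshold by 1.5 after every T false alarms); the class ThresholdAdjusterSketch and its methods are hypothetical and not part of any WebSphere API.

```java
// Illustrative arithmetic only (hypothetical class, NOT a WebSphere API).
class ThresholdAdjusterSketch {
    private double thresholdSeconds;   // current hang threshold
    private final int falseAlarmLimit; // T: false alarms per adjustment
    private int falseAlarms = 0;       // false alarms seen so far

    ThresholdAdjusterSketch(double initialThresholdSeconds, int falseAlarmLimit) {
        this.thresholdSeconds = initialThresholdSeconds;
        this.falseAlarmLimit = falseAlarmLimit;
    }

    // Record one thread that was reported hung but later completed its work.
    void recordFalseAlarm() {
        if (falseAlarmLimit <= 0) {
            return; // zero or less disables the automatic adjustment
        }
        falseAlarms++;
        if (falseAlarms % falseAlarmLimit == 0) {
            thresholdSeconds *= 1.5; // every T false alarms, grow by a factor of 1.5
        }
    }

    double getThresholdSeconds() {
        return thresholdSeconds;
    }

    public static void main(String[] args) {
        // Defaults from the list above: 600 s threshold, T = 100.
        ThresholdAdjusterSketch adjuster = new ThresholdAdjusterSketch(600.0, 100);
        for (int i = 0; i < 200; i++) {
            adjuster.recordFalseAlarm();
        }
        // 600 -> 900 after 100 false alarms, 900 -> 1350 after 200.
        System.out.println(adjuster.getThresholdSeconds()); // prints 1350.0
    }
}
```

With the default values, a server would therefore need 100 false alarms before the 10-minute threshold grows to 15 minutes.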
