Wednesday 6 July 2016

To kill a thread or not to kill a thread… tis the WLS question

I was looking into a stuck thread issue at a client.  WLS console was reporting everything as “healthy”, but there was a problem.  See the screen shots below, indicating that not only was there some excessive CPU usage, there was also some crazy network traffic.  Lucky this is not metered (hopefully it was intra VPC).

clip_image001

This was on Linux, so I was able to use top to find the PID that is stuck, generally this is java for WLS

Use ps to find the thread that is spinning it’s wheels, you’ll generally get that from the time on the CPU, see below – there are two troublesome threads.

>ps -eT |grep 1577 |more

1577  2983 ?        00:00:00 java

1577  2984 ?        1-04:12:25 java

1577  2985 ?        00:01:15 java

1577  2986 ?        00:00:50 java

1577  2987 ?        00:07:41 java

1577  2988 ?        1-04:12:50 java

1577  2989 ?        00:01:10 java

Note that the part in yellow is the time on the CPU – there is a problem with those two threads.  Can I work out anything else about these threads?

Also can you kill a WLS thread and the JVM survives?  To be honest without detailed development knowledge of the program and the type of thread that is spinning, this is not recommended behaviour.  Also, I think that you might find it hard to kill (or a command to kill it).

You can also use the following command to see how complicated the thread structure is:

pstree –p 1577

java(1577)-+-{java}(1578)

           |-{java}(1579)

           |-{java}(1580)

           |-{java}(1581)

           |-{java}(1582)

           |-{java}(1583)

           |-{java}(1584)

           |-{java}(1585)

           |-{java}(1586)

           |-{java}(1587)

           |-{java}(1588)

           |-{java}(1589)

           |-{java}(1590)

Now to the WLS console to see if I can find out anymore:

clip_image002

So WLS thinks things are fine, but it also thinks that there are only 27 execute threads.

But, if you do a “Dump Thread stacks”, you get the attached:  A 90Kb file with loads of threads in wait and sleep, but two that are not!

I believe that I’ve locked the two internal JVM threads that are each sitting on a single CPU (Hence consuming 50% of the 4 core machine).

Worker Thread - OWVirtual_Thread_Group4 Assigned To com.jdedwards.runtime.virtual.OWVirtual@16e99e46" RUNNABLE

java.lang.Object.hashCode(Native Method)

                        java.util.Hashtable.hash(Hashtable.java:239)

                        java.util.Hashtable.get(Hashtable.java:434)

                        com.jdedwards.jas.BaseServlet.unregisterServletInfo(Unknown Source)

                        com.jdedwards.jas.ServletProxy.callServerComponent(Unknown Source)

                        com.jdedwards.jas.ServletProxy.servletProxy(Unknown Source)

                        com.jdedwards.jas.ServletProxy.dispatch(Unknown Source)

                        com.jdedwards.jas.net.NetConnection.callJAS(Unknown Source)

                        com.jdedwards.jas.net.NetManager.processRequest(Unknown Source)

                        com.jdedwards.jas.net.NetManager.callJAS(Unknown Source)

                        com.jdedwards.jas.net.NetManager.callJAS(Unknown Source)

                        com.jdedwards.jas.net.NetManager.nextpage(Unknown Source)

                        com.jdedwards.runtime.model.BusinessView.fetchNextPage(Unknown Source)

                        com.jdedwards.runtime.vtcomponent.form.VTFetchManager.rtFetchOnePage(Unknown Source)

                        com.jdedwards.runtime.vtcomponent.form.VTFetchManager.runtimeFetchStartPage(Unknown Source)

                        com.jdedwards.runtime.vtcomponent.form.VTFetchManager.runtimeFetch(Unknown Source)

                        com.jdedwards.runtime.engine.form.FindBrowseEngine.fetchDetailData(Unknown Source)

                        com.jdedwards.runtime.engine.form.FindBrowseEngine.fetchDetailData(Unknown Source)

                        com.jdedwards.runtime.engine.form.FindBrowseEngine.processFind(Unknown Source)

                        com.jdedwards.runtime.vtcomponent.form.VTForm.processFind(Unknown Source)

                        com.jdedwards.runtime.engine.form.FindBrowseEngine.initializeForm(Unknown Source)

                        com.jdedwards.runtime.vtcomponent.form.VTForm.windowDidShow(Unknown Source)

                        com.jdedwards.runtime.virtual.OWVirtual.showWindow(Unknown Source)

                        com.jdedwards.runtime.virtual.OWVirtual.showForm(Unknown Source)

                        com.jdedwards.runtime.er.OWShortcut.launchShortcut(Unknown Source)

                        com.jdedwards.runtime.virtual.SystemEventProcessor.processVTSystemEvent(Unknown Source)

                        com.jdedwards.runtime.virtual.OWVirtual.processVTEvent(Unknown Source)

                        com.jdedwards.runtime.virtual.OWVirtual.processEventLoop(Unknown Source)

                        com.jdedwards.runtime.virtual.OWVirtual.run(Unknown Source)

                        com.jdedwards.base.util.ThreadPool$WorkerThread.run(Unknown Source)

AND

" Worker Thread - OWVirtual_Thread_Group0 Assigned To com.jdedwards.runtime.virtual.OWVirtual@4396116a" RUNNABLE

com.jdedwards.jas.BaseServlet.unregisterServletInfo(Unknown Source)

                        com.jdedwards.jas.ServletProxy.callServerComponent(Unknown Source)

                        com.jdedwards.jas.ServletProxy.servletProxy(Unknown Source)

                        com.jdedwards.jas.ServletProxy.dispatch(Unknown Source)

                        com.jdedwards.jas.net.NetConnection.callJAS(Unknown Source)

                        com.jdedwards.jas.net.NetManager.processRequest(Unknown Source)

                        com.jdedwards.jas.net.NetManager.callJAS(Unknown Source)

                        com.jdedwards.jas.net.NetManager.callJAS(Unknown Source)

                        com.jdedwards.jas.net.NetManager.nextpage(Unknown Source)

                        com.jdedwards.runtime.model.BusinessView.fetchNextPage(Unknown Source)

                        com.jdedwards.runtime.vtcomponent.form.VTFetchManager.rtFetchOnePage(Unknown Source)

                        com.jdedwards.runtime.vtcomponent.form.VTFetchManager.runtimeFetchStartPage(Unknown Source)

                        com.jdedwards.runtime.vtcomponent.form.VTFetchManager.runtimeFetch(Unknown Source)

                        com.jdedwards.runtime.engine.form.FindBrowseEngine.fetchDetailData(Unknown Source)

                        com.jdedwards.runtime.engine.form.FindBrowseEngine.fetchDetailData(Unknown Source)

                        com.jdedwards.runtime.engine.form.FindBrowseEngine.processFind(Unknown Source)

                        com.jdedwards.runtime.vtcomponent.form.VTForm.processFind(Unknown Source)

                        com.jdedwards.runtime.engine.form.FindBrowseEngine.initializeForm(Unknown Source)

                        com.jdedwards.runtime.vtcomponent.form.VTForm.windowDidShow(Unknown Source)

                        com.jdedwards.runtime.virtual.OWVirtual.showWindow(Unknown Source)

                        com.jdedwards.runtime.virtual.OWVirtual.showForm(Unknown Source)

                        com.jdedwards.runtime.er.OWShortcut.launchShortcut(Unknown Source)

                        com.jdedwards.runtime.virtual.SystemEventProcessor.processVTSystemEvent(Unknown Source)

                        com.jdedwards.runtime.virtual.OWVirtual.processVTEvent(Unknown Source)

                        com.jdedwards.runtime.virtual.OWVirtual.processEventLoop(Unknown Source)

                        com.jdedwards.runtime.virtual.OWVirtual.run(Unknown Source)

                        com.jdedwards.base.util.ThreadPool$WorkerThread.run(Unknown Source)

So, you read these stacks upwards, they seem to be worker threads for the findbrowse engine, both have got some sort of problem.  I’d guess that the second one might be causing the network traffic, but it’s all dead now.  I used the AWS ELB to take a server out and restart the JVM when all of the sessions left, then added back to the ELB.

What did I learn – nothing really…  But if I see it again I can start to build a pattern of the problem and perhaps find a solution

2 comments:

Brandon K said...

We are struggling with the same issue on Apps 9.1 / Tools 9.1.5.4 / WebLogic 12.1.2

This is occurring on a nearly daily basis. Sometimes it occurs more than once, leading to 50% cpu usage on a 4-core server, just as you have experienced.

Do you have any updates or insights as to what may be the cause of this problem?

Shannon Moir said...

Hi there, there are a couple of known issues with "Stuck Threads" https://support.oracle.com/epmos/faces/DocContentDisplay?_afrLoop=405150906427657&id=1616726.1&_afrWindowMode=0&_adf.ctrl-state=fdh20j5l2_165 . It'll happen all of the time unless you've patched your WLS. search MOS for "stuck threads"