Monday 20 July 2009

Pop quiz

Web users are getting heaps of:

“An exception has been caught by the Web client. Please contact your system administrator. See log for details. Exception id =Exception_2009-07-20 13:18:20.465_WARRENG   There was a problem with the sever while running the business function F06116EndDoc.\nThe server may still be available, but because of state information, the entire unit of work must be resubmitted.\nPlease exit the application and restart it.\nPost Button Clicked Line number 0\nP051121_W051121C com.jdedwards.runtime.base.SystemException: There was a problem with the sever while running the business function F06116EndDoc.\nThe server may still be available, but because of state information, the entire unit of work must be resubmitted.\nPlease exit the application and restart it.\nPost Button Clicked Line number 0\nP051121_W051121C”

Application server is saying nothing…

What is it going to be…  Obvious, timeouts…  Remember that a call object kernel is not going to complain that it takes too long to perform an action, it’ll keep trying…  Meanwhile, the web will have given up a long time ago.

Note that if there are not ERRORs causing the timeout, the log server logs will reveal nothing!

In the above situation, enabling logging revealed the following:

Jul 20 13:30:00.521003 - 1008/6140 WRK:JDE_08090008_P051121              SELECT  *  FROM JDE_PROD.PRODCTL.F0002  (UPDLOCK)  WHERE  ( NNSY = '06' )  ORDER BY NNSY ASC FOR UPDATE OF NNN001, NNN002, NNN003, NNN004, NNN005, NNN006, NNN007, NNN008, NNN009, NNN010
Jul 20 13:30:00.521004 - 1008/6140 WRK:JDE_08090008_P051121              Entering DBPerformRequest
Jul 20 13:30:00.521005 - 1008/6140 WRK:JDE_08090008_P051121              ODBC:S DBPerformRequest req=08FC3B40 con=07F0A5E8 env=00A614E8 dbc=00A74640 spid=718 JDEDATA A (JDE@Control Tables - Prod)
Jul 20 13:30:00.521006 - 1008/6140 WRK:JDE_08090008_P051121              Exiting DBPerformRequest
Jul 20 13:30:14.897000 - 1008/4600 SYS:Dispatch                          About to call dispatch function, flags=0x0, Type=916
Jul 20 13:30:14.897001 - 1008/4600 SYS:Dispatch                          ADD TO POOL type: 916, 172.19.1.65_5796_m283302
Jul 20 13:30:14.897002 - 1008/4600 SYS:Dispatch                          PSTHREAD_TPJ: Entering psthread_pool_job_createEx()
Jul 20 13:30:14.897003 - 1008/4600 SYS:Dispatch                          PSTHREAD_TPJ: psthread_pool_job_createEx(): Exiting
Jul 20 13:30:14.897004 - 1008/4600 SYS:Dispatch                          PSTHREAD_TPL: Entering: psthread_pool_add_job()
Jul 20 13:30:14.897005 - 1008/4600 SYS:Dispatch                          PSTHREAD_TPL: threads: total (5) idle (3) waiting (3) jobs: queued (0)
Jul 20 13:30:14.897006 - 1008/4600 SYS:Dispatch                          PSTHREAD_BLQ: ps_blocking_queue_enqueue(WRK:Queue): Entering
Jul 20 13:30:14.897007 - 1008/4600 SYS:Dispatch                          PSTHREAD_BLQ: ps_blocking_queue_enqueue(WRK:Queue): Exiting
Jul 20 13:30:14.897008 - 1008/4600 SYS:Dispatch                          PSTHREAD_TPL: psthread_pool_add_job(): Exiting
Jul 20 13:30:14.897009 - 1008/5956 WRK:Idle Worker                       PSTHREAD_BLQ: ps_blocking_queue_dequeue(WRK:Queue): Exiting
Jul 20 13:30:14.897010 - 1008/5956 WRK:Active Worker                     START SYSTEM JOB type 916, 172.19.1.65_5796_m283302
Jul 20 13:30:14.897011 - 1008/5956 WRK:Get User List                     END SYSTEM JOB   type 916, 172.19.1.65_5796_m283302
Jul 20 13:30:14.897012 - 1008/5956 WRK:Idle Worker                       PSTHREAD_BLQ: ps_blocking_queue_dequeue(WRK:Queue): Entering
Jul 20 13:30:29.899000 - 1008/4600 SYS:Dispatch                          ADD TO POOL type: 10, 0.0.0.0_0_m0
Jul 20 13:30:29.899001 - 1008/4600 SYS:Dispatch                          PSTHREAD_TPJ: Entering psthread_pool_job_createEx()
Jul 20 13:30:29.899002 - 1008/4600 SYS:Dispatch                          PSTHREAD_TPJ: psthread_pool_job_createEx(): Exiting
Jul 20 13:30:29.899003 - 1008/4600 SYS:Dispatch                          PSTHREAD_TPL: Entering: psthread_pool_add_job()
Jul 20 13:30:29.899004 - 1008/4600 SYS:Dispatch                          PSTHREAD_TPL: threads: total (5) idle (3) waiting (3) jobs: queued (0)
Jul 20 13:30:29.899005 - 1008/4600 SYS:Dispatch                          PSTHREAD_BLQ: ps_blocking_queue_enqueue(WRK:Queue): Entering
Jul 20 13:30:29.899006 - 1008/4600 SYS:Dispatch                          PSTHREAD_BLQ: ps_blocking_queue_enqueue(WRK:Queue): Exiting
Jul 20 13:30:29.899007 - 1008/4600 SYS:Dispatch                          PSTHREAD_TPL: psthread_pool_add_job(): Exiting
Jul 20 13:30:29.899008 - 1008/3068 WRK:Idle Worker                       PSTHREAD_BLQ: ps_blocking_queue_dequeue(WRK:Queue): Exiting
Jul 20 13:30:29.899009 - 1008/3068 WRK:Active Worker                     START SYSTEM JOB type 10, 0.0.0.0_0_m0
Jul 20 13:30:29.899010 - 1008/3068 WRK:eNetKernelIdle                    END SYSTEM JOB   type 10, 0.0.0.0_0_m0
Jul 20 13:30:29.899011 - 1008/3068 WRK:Idle Worker                       PSTHREAD_BLQ: ps_blocking_queue_dequeue(WRK:Queue): Entering
Jul 20 13:30:35.977000 - 1008/4600 SYS:Dispatch                          About to call dispatch function, flags=0x0, Type=916
Jul 20 13:30:35.977001 - 1008/4600 SYS:Dispatch                          ADD TO POOL type: 916, 172.19.1.65_5796_m283571
Jul 20 13:30:35.977002 - 1008/4600 SYS:Dispatch                          PSTHREAD_TPJ: Entering psthread_pool_job_createEx()
Jul 20 13:30:35.977003 - 1008/4600 SYS:Dispatch                          PSTHREAD_TPJ: psthread_pool_job_createEx(): Exiting
Jul 20 13:30:35.977004 - 1008/4600 SYS:Dispatch                          PSTHREAD_TPL: Entering: psthread_pool_add_job()
Jul 20 13:30:35.977005 - 1008/4600 SYS:Dispatch                          PSTHREAD_TPL: threads: total (5) idle (3) waiting (3) jobs: queued (0)
Jul 20 13:30:35.977006 - 1008/4600 SYS:Dispatch                          PSTHREAD_BLQ: ps_blocking_queue_enqueue(WRK:Queue): Entering

Note that there is a bunch of timeouts for the SQL operation…  Hmm, me thinks that when there is a problem with the F002 – it’s locking.   A hunting we will go.

Run the following queries to find the locks:  Note that the database is SQLServer 2005

sp_lock ;
--Gives a very terse list of locks.

--This was the winning query below:

SELECT tl.request_session_id, wt.blocking_session_id, DB_NAME(tl.resource_database_id) AS DatabaseName, tl.resource_type, tl.request_mode, tl.resource_associated_entity_id
FROM sys.dm_tran_locks as tl
INNER JOIN sys.dm_os_waiting_tasks as wt
ON tl.lock_owner_address = wt.resource_address;
GO

-- Query for specific lock types
SELECT resource_type, request_session_id, resource_database_id, resource_associated_entity_id, resource_subtype, resource_description, request_status, request_owner_type, request_mode
FROM sys.dm_tran_locks
WHERE resource_type IN ('PAGE', 'KEY', 'EXTENT', 'RID');
GO

I then opened the instance activity monitor and saw the blocked process and killed it!

Job done!

2 comments:

Unknown said...

Hey Shannon,

Most of the times I've resolved this one by removing out the zombies on the enterprise server.

But I never knew it was bcoz of the SQL Locks on the SQL Server.

Thanks a lot mate for sharing this info.

-Irfan

Unknown said...

u r genius mate...
sheer genius

saw the same error for a client but the client is on Oracle

checked the locks on the tables using TOAD, released the locks and boom.....the error is gone

u r da man muh friend

-Irfan