Thursday 13 December 2018

JDE scheduler problems

Who loves seeing logs like this for their scheduler kernel?

108/1168     Tue Dec 11 21:49:02.125002        jdbodbc.C7611
       ODB0000164 - STMT:00 [08S01][10054][2] [Microsoft][SQL Server Native Client 11.0]TCP Provider: An existing connection was forcibly closed by the remote host.
108/1168     Tue Dec 11 21:49:02.125003        jdbodbc.C7611
       ODB0000164 - STMT:01 [08S01][10054][2] [Microsoft][SQL Server Native Client 11.0]Communication link failure

108/1168     Tue Dec 11 21:49:02.125004        JDB_DRVM.C998
       JDB9900401 - Failed to execute db request

108/1168     Tue Dec 11 21:49:02.125005        JTP_CM.C1335
       JDB9900255 - Database connection to F98611 (PJDEENT02 - 920 Server Map) has been lost.

108/1168     Tue Dec 11 21:49:02.125006        JTP_CM.C1295
       JDB9900256 - Database connection to (PJDEENT02 - 920 Server Map) has been re-established.

108/1168     Tue Dec 11 21:49:02.125007        jdbodbc.C2702
       ODB0000020 - DBInitRequest failed - lost database connection.

108/1168     Tue Dec 11 21:49:02.125008        JDB_DRVM.C908
       JDB9900168 - Failed to initialize db request

Who loves spending the morning fixing jobs from the night before and moving batch queues and UBEs until things are back to normal?  No one!

Here is something that may help. Note, I must admit I gotta thank an amazing colleague for this: it's not my SQL, but I do like it.

What you need to do is write a basic shell script (say, one that lives on the enterprise server) that runs this:

select count (*) from SY910.F91300
    where SJSCHJBTYP = '1'
    and SJSCHSTTIME > (select
                      ((extract(day from (current_timestamp-timestamp '1970-01-01 00:00:00 +00:00'))*86400+
                        extract(hour from (current_timestamp-timestamp '1970-01-01 00:00:00 +00:00'))*3600+
                        extract(minute from (current_timestamp-timestamp '1970-01-01 00:00:00 +00:00'))*60+
                        extract(second from (current_timestamp-timestamp '1970-01-01 00:00:00 +00:00')))/60)-60 current_utime_minus_1hour
                        from dual);

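If you get a 1, that is good; if you get 0, that is bad. The subquery works out "one hour ago" in minutes since 1970-01-01 UTC, so a 0 means the control record has no start time inside the last hour and you probably need to recycle your scheduler kernel (that control record should change every 15 mins at least).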
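
To make that check a bit more concrete, here is a minimal Python sketch of the two bits of logic a wrapper script needs: the cutoff the subquery computes (minutes since the Unix epoch, minus 60) and the "1 is good, 0 is bad" decision. The function names are made up for illustration, and actually running the SQL (through whatever database client you have) is deliberately left out.

```python
from datetime import datetime, timezone


def cutoff_minutes(now: datetime, window_minutes: int = 60) -> float:
    """Minutes since 1970-01-01 UTC, minus the look-back window.

    Mirrors the subquery in the SQL above. `now` must be timezone-aware
    so the result does not depend on the machine's local time zone.
    """
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    return now.timestamp() / 60 - window_minutes


def scheduler_is_healthy(row_count: int) -> bool:
    """Interpret the COUNT(*) result: 1 (or more) is good, 0 is bad."""
    return row_count >= 1


if __name__ == "__main__":
    print("cutoff:", cutoff_minutes(datetime.now(timezone.utc)))
    # Hypothetical: row_count would come from running the query above.
    row_count = 0
    if not scheduler_is_healthy(row_count):
        print("control record is stale; scheduler kernel may need a recycle")
```

A wrapper along these lines could run on a timer and only raise an alert (or trigger the recycle) when the count comes back 0.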

So, if you have a script that runs that, you can tell if the kernel is updating the control record...

Then you can grep through the logs to find the PID of the scheduler kernel and kill it from the OS.  Then I write a little executable that gives the scheduler kernel a kick in the pants (start a new one) - and BOOM!  You have a resilient JD Edwards scheduler.
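
The "grep the logs for the PID" step might look roughly like the Python sketch below. It assumes the leading `108/1168` on each log header line is process ID / thread ID, and that you pick some message text (the `needle`) that only your scheduler kernel writes; both are assumptions to check against your own logs. The function names are hypothetical, and the part that starts a new kernel is not shown.

```python
import os
import re
import signal

# Header lines look like: "108/1168     Tue Dec 11 21:49:02.125005   JTP_CM.C1335"
# Assumption: the first number is the PID, the second the thread ID.
HEADER = re.compile(r"^(\d+)/(\d+)\s")


def find_pids(log_lines, needle):
    """Return the PIDs whose log entries contain `needle`.

    The message text sits on the line after its header, so remember the
    PID from the most recent header and attach it to the message lines.
    """
    pids = set()
    current_pid = None
    for line in log_lines:
        match = HEADER.match(line)
        if match:
            current_pid = int(match.group(1))
        elif needle in line and current_pid is not None:
            pids.add(current_pid)
    return pids


def terminate(pid):
    """Ask the OS to end the process (forceful termination on Windows)."""
    os.kill(pid, signal.SIGTERM)
```

For example, feeding it the excerpt at the top of this post with a needle of `JDB9900255` yields PID 108; you would confirm that really is the scheduler kernel before calling `terminate` on it.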



