Tuesday 20 October 2009

metadata kernel restarts

We just had a problem where the metadata kernel stopped servicing requests for UBEs.  The UBE would be ‘P’ing (hehe), yet nothing was happening.  We could not see any “introspective” data either.  After running a job with logging on, we saw these messages:

Oct 20 13:44:29.323150    specopen.c2949     - 585730/1 MAIN_THREAD                           Spec Encapsulation UBE Job Number set.  Job Number = 97900
Oct 20 13:44:29.323195    specmisc.c1669     - 585730/1 MAIN_THREAD                           Waiting for metadata kernel to finish cache loading for job 97900.

And that is where it would stay.  I found the metadata kernel pid and kill -9’d the sucker on the enterprise server.  Another one started immediately.  (I did actually test the process on the DEV machine before blazing away at prod, which is unlike me, I know.)
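If you would rather not eyeball the job log to spot this hang, here is a rough sketch of a check for it.  It only looks at whether the last non-blank line of a log is the "Waiting for metadata kernel" message shown above.  The function name and the idea of treating "last line is the wait message" as "stuck" are my own assumptions, not anything JDE provides.

```python
import re

# Matches the wait message from the job log and captures the job number.
WAIT_PATTERN = re.compile(
    r"Waiting for metadata kernel to finish cache loading for job (\d+)"
)


def waiting_job(log_lines):
    """Return the job number if the last non-blank log line is the
    metadata kernel wait message, otherwise None.

    Hypothetical helper: "stuck" here just means the log ends on the
    wait message.
    """
    last = None
    for line in log_lines:
        if line.strip():
            last = line
    if last is None:
        return None
    match = WAIT_PATTERN.search(last)
    return int(match.group(1)) if match else None


# The two log lines from this post.
sample = [
    "Oct 20 13:44:29.323150    specopen.c2949     - 585730/1 MAIN_THREAD"
    "                           Spec Encapsulation UBE Job Number set."
    "  Job Number = 97900",
    "Oct 20 13:44:29.323195    specmisc.c1669     - 585730/1 MAIN_THREAD"
    "                           Waiting for metadata kernel to finish cache"
    " loading for job 97900.",
]

print(waiting_job(sample))
```

A log that ends on that message is only a hint.  A healthy job passes through the same line briefly, so you would want to check again after a minute or so before reaching for kill.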

I also made a change to the JDE.INI on the ent server to increase the number of metadata kernels to 2.  This was from a Metalink article.  This will not take effect until the next restart.
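For what it is worth, this is roughly the kind of edit involved, sketched against a made-up fragment rather than a real JDE.INI.  The section name below is a placeholder, and I am assuming the process count lives in a maxNumberOfProcesses key, so check your own JDE.INI and the Metalink article for the real section and key before touching anything.

```python
import configparser
import io

# Hypothetical fragment: the section name is a placeholder and the key
# name is an assumption; neither is copied from a real JDE.INI.
SAMPLE_INI = """\
[METADATA_KERNEL_SECTION]
maxNumberOfProcesses=1
"""


def set_max_processes(ini_text, section, count):
    """Return ini_text with maxNumberOfProcesses in `section` set to count."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names case-sensitive
    parser.read_string(ini_text)
    parser[section]["maxNumberOfProcesses"] = str(count)
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


updated = set_max_processes(SAMPLE_INI, "METADATA_KERNEL_SECTION", 2)
print(updated)
```

I would not point this at the live file as is.  configparser drops comments when it writes the file back out and may reject things a real JDE.INI contains, so a careful hand edit with a backup is probably the safer route.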

This actually occurred because the JDE account for our PP812 environment (which shares the prod ent server) got locked out.  The metadata kernel seemed to get into a spin.  The only jobs that were running were those that already had a “runtimeCache” directory.

Note that you can do the same to queue kernels: go into JDE, open P986130 and choose “refresh queue” for any queue on the ent server whose queue kernel you smashed.  This will restart it immediately and the business will be none the wiser!  (Note that this applies to E812 queues, where there is a single queue kernel.)
