Monday 30 November 2009

Transaction server, 8.98, WAS, JAS, RTE, TLA

A bit more information about the transaction server.

Everything had been running for a couple of months without any problems.  Regular scrutiny of F90710 was revealing 0 records, and regular scrutiny of the MQ queues via SIBExplorer (oh how I love that program) was revealing 0 records in all of the queues, or at most a couple of records in the various queues when the system was very busy.

We’ve had one incident of everything falling over, but it was repaired quickly with a transaction server restart…  until…

We started to get a bunch of messages staying at status 4 in sy812.f90710, sometimes > 5000 a day.  This is nay good, I was thinking.  What has changed?

It finally occurred to me (like when Bart was trying to explain to Homer about Sideshow Bob’s plans to kill Aunt Selma) that an additional server must be stealing the messages…  And because the prod txn server is the only one that should be getting the messages, this is the problem. I’m still trying to jump through the hoops to understand more about this model.

I think that the other txn server is updating the records to 4, but because the message never gets to the prod txn server, the records are not getting deleted (nice and reliable I guess). So what we do have is a bunch of messages at status 4 that the prod transaction server does not know about.  This is “sub optimal”.

The transaction server seems to poll the F90710 table for any status 3 records.  When it finds them it updates them to 4.  It then grabs the message, puts it into the MQ queue, and deletes the record from the F90710 once this has been done.  We are having the problem that another txn server is grabbing the messages (I think) and is not able to process them.  Therefore they are staying at status 4.  All I need to do is update all of them back to a 3 and they process successfully.
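To make the model above a bit more concrete, here is a rough sketch of it in Python against an in-memory SQLite table. This is only my understanding of the behaviour, not how the transaction server is actually written. The table `f90710_demo` and its columns `event_id` and `status` are made up for the demo (the real F90710 has its own column names), and the "rogue" server is simulated simply by claiming rows and never delivering them.

```python
import sqlite3

# Hypothetical stand-in for F90710; the real table uses different column names.
def make_demo_table(conn, n):
    conn.execute(
        "CREATE TABLE f90710_demo (event_id INTEGER PRIMARY KEY, status INTEGER)"
    )
    conn.executemany(
        "INSERT INTO f90710_demo (event_id, status) VALUES (?, 3)",
        [(i,) for i in range(1, n + 1)],
    )

def claim(conn, limit):
    """Flip up to `limit` status 3 rows to status 4 and return their ids."""
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT event_id FROM f90710_demo WHERE status = 3 "
            "ORDER BY event_id LIMIT ?",
            (limit,),
        )
    ]
    conn.executemany(
        "UPDATE f90710_demo SET status = 4 WHERE event_id = ?",
        [(i,) for i in ids],
    )
    return ids

def deliver_and_delete(conn, ids):
    """Pretend the messages reached the queue, then delete the rows."""
    conn.executemany(
        "DELETE FROM f90710_demo WHERE event_id = ?", [(i,) for i in ids]
    )

def count_status(conn, status):
    return conn.execute(
        "SELECT COUNT(*) FROM f90710_demo WHERE status = ?", (status,)
    ).fetchone()[0]

def reset_stuck(conn):
    """The manual fix: put every status 4 row back to status 3."""
    return conn.execute(
        "UPDATE f90710_demo SET status = 3 WHERE status = 4"
    ).rowcount

def demo():
    conn = sqlite3.connect(":memory:")
    make_demo_table(conn, 10)
    claim(conn, 4)                               # rogue server claims, never delivers
    deliver_and_delete(conn, claim(conn, 100))   # prod server handles the rest
    stuck = count_status(conn, 4)                # rows left behind at status 4
    reset = reset_stuck(conn)                    # manual update back to 3
    deliver_and_delete(conn, claim(conn, 100))   # prod server picks them up
    remaining = conn.execute("SELECT COUNT(*) FROM f90710_demo").fetchone()[0]
    return stuck, reset, remaining
```

Calling `demo()` returns the number of rows stuck at 4, the number reset to 3, and the number left in the table after the prod server polls again. On a real system the reset would only be safe once the other server has been stopped from polling the same table, otherwise it can just grab them again.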

A couple of risks with the above: the messages in the F90710 might contain old data (or might contain new data??), and that old data might be going to all of the servers.  I can’t be sure of this.

So, pretty amazing really.  If you are getting messages staying in the F90710 at status 4 and you have multiple transaction servers, I’d be checking on them and making sure that your OCMs are right.  Make sure that your other txn servers are polling the correct F90710 tables!
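If you have written down which F90710 each txn server resolves to through its OCM, the check is easy to automate. A small sketch, assuming you collect the mappings by hand: the server names and the `DEMO_DV` owner below are invented for illustration, and nothing here reads the OCM itself.

```python
from collections import defaultdict

def shared_tables(mappings):
    """Return tables polled by more than one server, keyed by lower-cased table name."""
    by_table = defaultdict(list)
    for server, table in mappings.items():
        by_table[table.lower()].append(server)
    return {t: sorted(s) for t, s in by_table.items() if len(s) > 1}

# Hypothetical, hand-collected mappings of txn server -> owner.table
mappings = {
    "prod_txn": "SY812.F90710",
    "test_txn": "sy812.f90710",    # wrong: points at the prod table
    "dev_txn": "DEMO_DV.F90710",
}

for table, servers in shared_tables(mappings).items():
    print(f"{table} is polled by: {', '.join(servers)}")
```

Any table that shows up here with more than one server is a candidate for the status 4 problem.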

1 comment:

Breves said...

Hi,

Interesting post. I just had the same problem and had a hunch about this situation too. I created new tables and remapped them through the OCM. It worked fine, with very little trouble. Your post confirmed my suspicion. Thanks...