Wednesday 8 August 2018

A long way to tell a short story – HAFS mounts on an ODA for PrintQueue

Everyone wants to create a disposable compute environment; it’s the right thing to do.

If your machines / servers are stateless, then this is the first step to being elastic and more portable – think containers…  I know that I’m talking about an ODA here, but you can still put constructs into the design that allow you to be more flexible for HA and DR…  that is, taking stateful data off your machines and creating a level of abstraction.

So, helping out with JDE, PrintQueue needs to go!

If you want to make your environment elastic, then eventually you need to put printqueue somewhere else and mount the location on your enterprise server.

Imagine that you did all of this, and then when the filesystem was mounting on boot, it wanted to fsck the mounted drive, and perhaps you do not have a network yet…  Guess what happens – NOTHING

Imagine if this was an ODA and the VM did not really give you great access to grub – wow – you have got a problem!

Welcome to my world!

/etc/fstab looked something like:

10.255.252.180:/u01/app/sharedrepo/printqdv /mnt/printqueue  nfs nfsvers=3,rw,bg,intr 0 0

Easy – this leaves the mount automatic.  The hard option is implied if not specified, so it does not need to appear in the mount options.

We created this FS on ODA_BASE as a repo

oakcli create repo printqdv -size 200G -dg DATA

Then, on ODA_BASE, we created the NFS server using the grid-based srvctl (HAFS):

srvctl stop exportfs -name printqdv
srvctl remove exportfs -name printqdv
srvctl add exportfs -name printqdv -id havip_1 -path /u01/app/sharedrepo/printqdv -clients 10.255.252.150 -options "rw,no_root_squash"
srvctl start exportfs -name printqdv

So we have a shared printqueue as a repo on ODA_BASE that all of the guests can mount using NFS.  Therefore when using WSJ, all jobs are in a single location and we can support seamless augmentation of logic hosts.

But when automounting on the guests (enterprise servers), we found that when a machine needed to fsck, it wanted to do this to the printqueue mount and would not boot.

This is highly risky, so we implemented the following:

10.255.252.180:/u01/app/sharedrepo/printqdv /mnt/printqueue  nfs nfsvers=3,rw,bg,intr,_netdev 0 0

Adding _netdev tells the boot sequence to only attempt this mount once there is a network, which should be much nicer.  Though I am still a little worried that this is a massive FS and I don’t want to wait for an fsck EVER!

But could I risk this?  I want to use noauto and have a systemd service mount the printqueue after boot.

Go to this directory as root:

/etc/systemd/system

Create a new file (use whatever name you want for your service):

vi jdePrintQ.service

Add the following contents:

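[Unit]
Description=JD Edwards PrintQueue
# wait until the network is actually up before trying the NFS mount
Wants=network-online.target
After=network-online.target

[Service]
# mount exits straight away, so run it as a oneshot and keep the unit active
# afterwards; otherwise systemd treats the unit as stopped once mount returns
Type=oneshot
RemainAfterExit=yes
# these need a full path
ExecStart=/usr/bin/mount /mnt/printqueue
ExecStop=/usr/bin/umount /mnt/printqueue

[Install]
WantedBy=multi-user.target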

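Now set the permissions (a unit file only needs to be readable, and systemd logs a warning if it is marked executable):

chmod 644 ./jdePrintQ.service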

You can now use the command systemctl start jdePrintQ and she’ll start – easy.

You can stop it too:

systemctl stop jdePrintQ

Get the status:

systemctl status jdePrintQ

Aug 08 12:41:29 bear. systemd[1]: Started JD Edwards PrintQueue.
Aug 08 12:41:29 bear. systemd[1]: Starting JD Edwards PrintQueue...
Aug 08 12:41:29 bear. mount[3351]: mount.nfs: /mnt/printqueue is busy or already mounted
Aug 08 12:41:29 bear. systemd[1]: jdePrintQ.service: main process exited, code=exited, status=32/n/a
Aug 08 12:41:29 bear. systemd[1]: Unit jdePrintQ.service entered failed state.
Aug 08 12:41:29 bear. systemd[1]: jdePrintQ.service failed.
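The failure in the status output above is simply mount refusing to mount something that is already there.  A minimal sketch of a pre-check, assuming a POSIX shell with awk; is_mounted is a hypothetical helper name, and its second argument only exists so the check can be pointed at a file other than /proc/mounts:

```shell
# is_mounted: hypothetical helper, succeeds if the given directory
# appears as a mount point in the mounts table.
is_mounted() {
    # $1 = mount point, $2 = mounts table (defaults to /proc/mounts)
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit(found ? 0 : 1) }' "${2:-/proc/mounts}"
}
```

Something like is_mounted /mnt/printqueue || systemctl start jdePrintQ would then only start the service when the mount is missing.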

I guess that this is belt and braces, but rescuing a non-booting VM on an ODA is not the most fun job in the world.

Just ensure that fstab now looks like this too:

10.255.252.180:/u01/app/sharedrepo/printqdv /mnt/printqueue  nfs nfsvers=3,rw,bg,intr,_netdev,noauto 0 0

The only change is the addition of noauto, which stops the boot sequence from trying to mount it automatically.
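Since a bad fstab is exactly what stops the VM booting, it is worth checking the entry before you reboot.  A rough sketch, assuming a POSIX shell with awk; fstab_has_opt is a hypothetical helper name, not a standard tool:

```shell
# fstab_has_opt: hypothetical helper, succeeds if the fstab entry for a
# mount point lists the given option (exact match, so "auto" != "noauto").
fstab_has_opt() {
    # $1 = fstab path, $2 = mount point, $3 = option to look for
    awk -v mp="$2" -v opt="$3" '
        $1 !~ /^#/ && $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == opt) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}
```

For example, fstab_has_opt /etc/fstab /mnt/printqueue noauto && echo "noauto is set" confirms the entry before you restart the guest.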

Make sure you set your service to start on boot:

[root@bear system]# chkconfig jdePrintQ on
Note: Forwarding request to 'systemctl enable jdePrintQ.service'.
Created symlink from /etc/systemd/system/multi-user.target.wants/jdePrintQ.service to /etc/systemd/system/jdePrintQ.service.


It’s really important that you are ready to rescue a VM’s boot disk, or any disk for that matter, on the ODA.  Remember that if a repo runs out of space, or there is an ACFS problem (which we had lots of) that prevents ODA_BASE from seeing or writing to the repo, you are probably going to get corrupt machines, or at least the need to fsck drives.  Make sure that you have the ability to clone a boot disk and mount it on a temporary guest so that you can fix any problems with /etc/fstab or other files that might be giving you problems on boot.  Perhaps you can stop services too.  I did NOT have a problem getting to the console of the machine.
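Once the cloned boot disk is mounted on the temporary guest, fixing the fstab can be as simple as commenting out the NFS entries.  A minimal sketch, assuming a POSIX shell with awk; comment_out_nfs is a hypothetical helper name and /mnt/rescue below is a hypothetical mount point for the cloned disk:

```shell
# comment_out_nfs: hypothetical helper, prints a copy of an fstab with every
# active nfs/nfs4 entry commented out; the input file is left untouched.
comment_out_nfs() {
    # $1 = path to the fstab on the rescued disk
    awk '
        $1 !~ /^#/ && ($3 == "nfs" || $3 == "nfs4") { print "#" $0; next }
        { print }
    ' "$1"
}
```

Run it as comment_out_nfs /mnt/rescue/etc/fstab > /tmp/fstab.new, review the result, and only then copy it back over the original.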
