Tuesday, 22 March 2016

Database class and throughput lessons for JDE in AWS

You gotta start somewhere…  We all know that, but where?

This is the question I faced when choosing the database instance class for a POC in AWS.  I needed to start somewhere, and I did not want to spend too much money – although I do pay by the hour…

AWS teaches you to be a little more efficient with things like IO, because you are paying for it and you are limited in about 10 different ways.

image

Look at the graph above: 60 MB/s every day of the week – all large reads.

image

The database is recording throughput at 60000 IOPS – but wait, I paid for 3000.

image

AWS is telling me that I’m only using 500 IOPS, but this is a hard ceiling, and it is limiting my performance…  How do I know?

image

This is how: look at my queue depth.

image

The above is also a little too “choppy” for my liking, so I need to smooth it out.

image

Across the rest of my architecture I don’t see much activity at all…  Not enough, in fact…  Classic bottleneck.
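Queue depth is the telltale here because of Little’s law: average queue depth ≈ I/O arrival rate × average I/O latency.  A quick sketch – the 500 IOPS figure is from the graphs above, but the latency numbers are purely illustrative assumptions:

```python
# Little's law: average queue depth = I/O arrival rate x average I/O latency.
def queue_depth(iops, latency_ms):
    """Average number of outstanding I/Os for a given rate and latency."""
    return iops * (latency_ms / 1000.0)

# A healthy volume servicing 500 IOPS at ~2 ms latency barely queues:
print(queue_depth(500, 2))    # 1.0 outstanding I/O on average

# The same 500 IOPS at 40 ms latency (a throttled volume) queues deeply:
print(queue_depth(500, 40))   # 20.0 outstanding I/Os on average
```

A deep, sustained queue at a modest IOPS rate means requests are waiting on a ceiling somewhere, not on the workload.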

So, I do a deep dive before I ask AWS to raise my IOPS.

DB Instance Classes Optimized for Provisioned IOPS

Instance Class    Dedicated EBS Throughput (Mbps)    Maximum 16k IOPS Rate    Max Bandwidth (MB/s)
db.m1.large       500                                4000                     62.5
db.m1.xlarge      1000                               8000                     125
db.m2.2xlarge     500                                4000                     62.5
db.m2.4xlarge     1000                               8000                     125
db.m3.xlarge      500                                4000                     62.5
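One way to read that table: given the throughput you are actually pushing, find the classes whose dedicated EBS bandwidth covers it.  A rough sketch – the class names and MB/s figures come from the table above, but the 20% headroom factor is my own assumption:

```python
# Max EBS bandwidth per instance class, in MB/s (from the table above,
# plus db.m3.2xlarge, which is one class up).
CLASS_BANDWIDTH_MB = {
    "db.m1.large":   62.5,
    "db.m1.xlarge":  125.0,
    "db.m2.2xlarge": 62.5,
    "db.m2.4xlarge": 125.0,
    "db.m3.xlarge":  62.5,
    "db.m3.2xlarge": 125.0,
}

def classes_for(required_mb_per_s, headroom=1.2):
    """Return the classes whose bandwidth covers the load plus headroom."""
    needed = required_mb_per_s * headroom
    return sorted(c for c, bw in CLASS_BANDWIDTH_MB.items() if bw >= needed)

# Sustaining 60 MB/s all day leaves no headroom on a 62.5 MB/s class:
print(classes_for(60))   # only the 125 MB/s classes qualify
```

Running flat out at 60 MB/s against a 62.5 MB/s cap is exactly the “hitting this pretty hard” situation below.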

You gotta start somewhere – remember, I started at the bottom line above…  Look at the Max Bandwidth column – that is ME!!  Adding reads and writes together, I’m hitting that limit pretty hard.

So I need to move my database server up to a class with more IO – in terms of MB/s – even to get close to my provisioned 3000 IOPS.  REMEMBER THIS!!  This is an important equation if you are looking at RDS: throughput (MB/s) = IOPS × average IO size.  Sure, 3000 IOPS might sound good when provisioning your server – but if you are limited by bandwidth, you are high and dry.  (Note that with an 8K block size, Oracle is doing large reads…  We are getting 500 IOPS and 60 MB/s, so Oracle seems to be doing ~120K reads – roughly 15 × 8K blocks per multiblock read.  That seemed strange at first, although I don’t doubt the MB/s.)
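The arithmetic behind that observation is just unit conversion – the 500 IOPS and 60 MB/s figures are the ones from the graphs above:

```python
# Throughput (MB/s) = IOPS x average I/O size: the equation that matters on RDS.
observed_iops = 500        # what AWS says I'm actually getting
observed_mb_per_s = 60.0   # what the database-side graph shows

# Average size of each I/O request, in KB:
avg_io_kb = observed_mb_per_s * 1024 / observed_iops
print(f"average I/O size: {avg_io_kb:.0f} KB")    # ~123 KB per read

# With an 8 KB Oracle block, that is roughly a 15-block multiblock read:
blocks_per_read = avg_io_kb / 8
print(f"~{blocks_per_read:.0f} x 8 KB blocks per read")
```

So the 500 IOPS and 60 MB/s figures are consistent with each other once multiblock reads are taken into account.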

I need a class above!

Instance Class    Dedicated EBS Throughput (Mbps)    Maximum 16k IOPS Rate    Max Bandwidth (MB/s)
db.m3.2xlarge     1000                               8000                     125

Instance Class    Price (per hour)
db.m3.xlarge      $1.680
db.m3.2xlarge     $3.360

image

Double memory, double cores and double disk throughput

Click a few buttons – and voilà!  We are modifying the database to a new instance class.

image

Note that you do lose the database for a short window during the change!  This was not made clear by the wizard…

Time (UTC+11)     Event

Mar 22 3:04 PM    DB instance restarted
Mar 22 3:04 PM    Multi-AZ instance failover completed
Mar 22 3:03 PM    Multi-AZ instance failover started
Mar 22 2:53 PM    Applying modification to database instance class
Mar 22 5:07 AM    Finished DB Instance backup
Mar 22 5:03 AM    Backing up DB instance

I had a SQL*Plus session open, and I got the following errors midway through:

Error starting at line : 12 in command -
select count(1) from proddta.f0911
Error at Command Line : 12 Column : 1
Error report -
SQL Error: No more data to read from socket

Error starting at line : 13 in command -
select count(1) from proddta.F4801
Error at Command Line : 13 Column : 1
Error report -
SQL Error: No more data to read from socket

Now look!!!  The statistics are vastly different.  The database now has enough memory for the SGA, so physical reads are significantly lower – they can be satisfied from the buffer cache and do not need to hit disk anymore.

image

All fixed with a different class of DB server.

And here are the default performance graphs of a system that is hitting some serious ceilings…

image

Above is the “before” performance…  All scripts run in about 250 seconds (give or take).

image

Here is the exact same payload with a properly sized DB server – the difference is AMAZING!  All scripts come in under 250 seconds on average.

