Friday, 14 May 2021

Load testing on Linux - access.log is pretty rich in content

I've been doing some more load testing using OATS and Linux machines.

This could be one of my last load tests using OATS, as it's not going to be supported going forward.  That's a bit of a shame to me, as I've always been able to completely load test a JD Edwards implementation solo - database, web servers, application servers - everything!  I know that we also had a couple of team members with the same level of competence.  It's nice when you can simulate a load on JDE and have complete confidence for go-live.

The site that I've been working on is looking to go live on 9.2 and has not changed its platform.  They run a pretty tight ship and are also running JDE on an ODA.  I have lots of war stories, but this machine is a beast.  768GB of memory per node and all SSD makes it pretty hard to get things wrong, to be honest...  No wait, there are a million things to get wrong - but you have a lot of real estate.

So, back to the JDE content.  This load test is somewhat different, as we are comfortable with the platform and the throughput of the hardware.  The load testing we want to do here is about stability and configuration.  We have changed the database to 19c, new WebLogic, new Linux and of course tools and apps - so there is a lot of change - but we want to ensure that the basic hardware and database configuration is going to be okay.

To do this, I find the best testing is simple long-running scripts hitting popular applications.  So, lots of sales order inquiries, item locations, pricing and more.

While running this load, I happened to get about 3% of sessions failing with a fairly strange error: 


The HTTP error (12152) is: The server returned an invalid or unrecognized response

So, OATS is getting the server to respond, but the response is invalid.  This is going to be a little harder to track down, because it's client side - and we don't get complete responses on the client.


Another annoying (or perhaps positive) observation is that the errors are occurring all over the place - they are not tied to a particular screen or a particular request.  You can also see that overall 36 failed out of 1100, but in the detail of any one page, 3 of 1100 are failing - a very small representation - but I do want to know more.

Also, I see that there is nothing in the WLS logs - I guess because it's client side.  I even went for the details in the access logging, hoping that this might lead to an observation...  But remember, my OATS scripts easily detect 404s or 500s - and they are NOT getting those.

[oracle@jdevpweb03 logs]$ cat access.log |grep '1" ' | awk '{print $(NF -1) }' | sort |uniq -c
   3425 200
      8 302
      1 404


But it's good to know that you can sum all of the errors / HTTP return codes from the access logs, and you can also see how much traffic is being generated.  This is pretty interesting too if you are looking for strange behavior or patterns from a security point of view.

I can also see this in the OATS logs:  C:\OracleATS\logs\java_agent.log

2021-05-14 13:01:32,541 ERROR [IteratingAgent 1648][oracle.oats.scripting.modules.basic.api.internal.IteratingAgent.1648] Iteration 13 failed at line (script.java:2182). Failed to read HTTP response header from server. 
Comparable WinInet error code: Error 12152: ERROR_HTTP_INVALID_SERVER_RESPONSE The server response could not be parsed. Caused by: java.net.SocketException occurred. Error Message:Connection reset
oracle.oats.scripting.modules.basic.api.exceptions.ScriptException: Failed to read HTTP response header from server. 


Interestingly, this message says that the connection was reset, but the script has not failed - only the individual page response.  OATS actually does not recover very well from this and we get orphaned sessions in Server Manager.  Eventually this will cause the JAS servers to have too many open sessions and crap out all of your testing.

It seems that I need to follow the bouncing ball, as the client is getting a "network disconnect".

Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:210)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
at sun.security.ssl.InputRecord.read(InputRecord.java:503)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)

I need to look at the load balancer (LB) to see if it is causing the problem.

I have a theory that this is a problem with the LB, so I'm going to change my configuration so that the load goes directly to each web server instead of through the LB.  If I get reliability in that test, then everything points back to the LB.  Incidentally, I have run a 24-hour test against the non-LB endpoints previously and did not get anything like the above (3% failure).

Guess what - all of the sessions that went direct to the web servers had 0 errors [after 2 hours].  We think that Darktrace might be killing the sessions.  So the LB is causing my problems, and maybe Darktrace.  Good lessons learned in a complex load testing environment.

Monday, 22 February 2021

UBE performance tracking part 8

I do blog about this a lot, I'm sorry.  UBE performance is cool though, and important.

No matter what type of performance tracking you are trying to do, using batch jobs as your benchmark is a very good place to start.  I've mentioned this a number of times, but batch jobs (UBEs) give you a fairly pure performance reading - as they're two tier.  Unless you've done something crazy like map UBE BSFNs to another server - I really hope you have not done this.

This is a good method of ensuring you have consistency too.  If you are considering any change of technology, consistent UBE performance is going to give you peace of mind.  Finally, UBE performance also gives you some great ideas of where to look for archive and purge opportunities - I've seen too many Z files grow so large that they have a negative impact on performance.
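
If you want a quick starting point for purge candidates, something like the following lists your biggest tables by size.  This is a minimal sketch only, assuming an Oracle database and a PRODDTA business data owner - adjust the owner and the row limit to suit:

-- Sketch: largest tables in the business data schema - usually your archive/purge candidates
-- Assumes Oracle and a PRODDTA owner; fetch first needs 12c or later (use rownum on older releases)
select segment_name as table_name,
       round(bytes / 1024 / 1024 / 1024, 2) as size_gb
from dba_segments
where owner = 'PRODDTA'
and segment_type = 'TABLE'
order by bytes desc
fetch first 20 rows only;

If your Z files (F0911Z1 and friends) are near the top of that list, you know where to start.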

You can do everything I'm going to show you here without ERPInsights, but our product ERPInsights just makes it all much easier.

You can use SQL over F986110 and F986114 to extract all of the trend data for you, but I'm going to demonstrate how I show clients their performance results.  I'm often engaged during load testing exercises to give really deep insights into queuing, runtime and rows processed data.  Performance testing requires detailed analysis of the batch queues, and you will generally run batch next to your interactive load.  Quite often the interactive performance will fluctuate more than batch - but this will give you really good pointers into what is going on.

I used to use SQL over F986110 and F986114, but now I just grab a copy of both files and upload them into BigQuery, where I can plug in my insights dashboards.  It only takes a couple of minutes to upload millions of WSJ rows and get a complete understanding of all your batch performance - both holistically and on a job-by-job and day-by-day basis.
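
If you do want to stay database-side, a query like the one below gives you a basic daily trend straight out of F986110.  Treat it as a sketch only - the schema (SVM920 here) and the column names are from memory and vary by release, and the submitted date is a JDE Julian date (CYYDDD) - so verify against your own tables before trusting the numbers:

-- Sketch only: jobs per day, per queue, per status from WSJ (F986110)
-- SVM920 is an assumption for the server map / system schema - yours may differ
-- JCSBMDATE is a JDE Julian date (CYYDDD), JCJOBQUE is the queue, JCJOBSTS is the status (D = done, E = error)
select jcsbmdate as submitted_date,
       jcjobque as job_queue,
       jcjobsts as job_status,
       count(*) as jobs
from svm920.f986110
group by jcsbmdate, jcjobque, jcjobsts
order by jcsbmdate, jcjobque;

Runtime and rows processed need the start and end stamps as well, which is exactly the fiddly bit that the dashboards below take care of.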




Quickly, the above shows me all of my batch jobs for this month, and how their load profile compares with the previous month.  The report defaults to one month, but this could be 1 week or 1 year.  You can compare rows processed, total runtime or average runtime quickly for any period you choose.  You can see that this sample data set has run nearly 400,000 UBEs in the last month, which is up 15% on the previous period.  We can then compare the average runtime for each individual job to look for changes.
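
If you have uploaded your own copy of the WSJ data, the period-over-period comparison is a simple piece of BigQuery SQL.  The sketch below assumes a hypothetical flattened table called jde_insights.wsj_history with report_name, run_date and runtime_seconds columns - your names will depend on how you load the data:

-- Sketch: last 30 days vs the 30 days before that, per report (hypothetical table and columns)
select report_name,
       countif(run_date >= date_sub(current_date(), interval 30 day)) as runs_this_period,
       countif(run_date < date_sub(current_date(), interval 30 day)) as runs_prior_period,
       round(avg(if(run_date >= date_sub(current_date(), interval 30 day), runtime_seconds, null)), 1) as avg_runtime_this_period,
       round(avg(if(run_date < date_sub(current_date(), interval 30 day), runtime_seconds, null)), 1) as avg_runtime_prior_period
from jde_insights.wsj_history
where run_date >= date_sub(current_date(), interval 60 day)
group by report_name
order by runs_this_period desc;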



I also have the ability to plot that data over the last month to see when things have changed.  I can see the rows processed (by the main select loop), the average runtime and the record count (number of jobs processed per day).  I just need to scroll down the list and look at the profile for each job.  This is nice, because I'm seeing the representation of 16,000 rows of data instantly, and understanding the load profile of this job completely.  Another really nice thing about something like the R47042 is that the rows processed data actually represents order volumes - so there is some empirical business data that you can extract from the above.  Of course, we don't mind if the job takes twice as long - especially if it's processing twice the data!

The reporting suite contains loads of insights out of the box, and takes moments to navigate.  None of the data is on premise anymore, so there is no performance degradation to JDE when you are getting runtime history for 1,000,000 batch jobs.




Some other beneficial reports are shown below:

Reports not selecting rows:

Very handy for making your job schedule more efficient.  This might be normal for some jobs, but generally it indicates that some improvements are needed in your schedule.  How about using Orchestrator Studio to check for data and then launch the report...  instead of just launching the report!
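
Using the same hypothetical BigQuery table as above (and assuming you have also loaded a rows_processed column), a sketch of this report is simply:

-- Sketch: scheduled reports that ran all month without ever selecting a row (hypothetical table and columns)
select report_name,
       count(*) as runs_last_month
from jde_insights.wsj_history
where run_date >= date_sub(current_date(), interval 30 day)
group by report_name
having countif(rows_processed = 0) = count(*)
order by runs_last_month desc;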


Single job focus

We can see below ALL 16,000 runs of the job in the last month, how many rows were processed and how long the jobs took.  We can definitely see that there was an issue on Feb 12, where all of the jobs spiked upwards.  We can check that against other jobs if we need to.


Queue Time and Runtime

You can look at any or all jobs at once and see if they are performing normally.  This sheet compares any date range with the previous month and graphs the entire month.  Therefore, after a package build and deploy, you can ensure that your oft-run and much-loved UBEs are running in the same performant way you expect!  You can go and pat the developers on the back and tell them what a great job they are doing.


Remember that all of these reports can be extracted with SQL - but these dashboards make it easy.  All of the information is in JDE, you just need to extract the right stuff.

If you want this at your fingertips, we have a job that you can schedule which updates a cloud datasource (at an interval that you specify).  Therefore you can have up-to-date dashboards like the above telling you exactly how your batch window is going.  If you like, reach out - or look up this page:  https://www.fusion5.com.au/solutions/enterprise-resource-planning/jd-edwards/erp-insights/ 

We have some really cool demos, and can show you your data very quickly - to ensure that it's all kept nice and real!  We can also just take a one-off dump of your data and allow you access to all of the reports to see how you are performing - no locked-in deals.

Wednesday, 17 February 2021

Archiving my JDE data off an AS/400 - backing tables up to CSV from a 400

Let's be honest, if you are running on a 400, your time is running out.  We have a number of very efficient platform migration mechanisms to get you on a more modern platform - and perhaps one that can move to the cloud.

I do like the AS/400 - I get on that thing and can do all the admin tasks I need - but it's done.  I have a client that wants to move all their AS/400 data to some "data lake" and then farm it with custom queries.  I do NOT recommend this as an approach, as you will spend all of your time reverse engineering the data and logic to understand what you had.

I recommend putting the data on ice, in a database that you've probably already paid for...  Then if you need to access or search the data, use the familiar JDE interface.  Cloud is perfect for this: your footprint in terms of $$ is miniscule and you can bring the system up as and when you need it.  This is the perfect choice for archival or retirement.

Anyway, back to the post.  This is for when I do it next time.

What I will tell you about the script below is that it works - yay!

Save it as a .vbs file (mine is called downloadDataPRODDTA) and run with

>cscript //E:vbscript downloadDataPRODDTA.vbs

The thing about this is that it was running on an old server (oh, did I mention the AS/400?) - the old server.

Note that my script is a little more complex in terms of the files that it uses for the download, as it only grabs the tables that actually contain data.  This is easy - you can search my blog for how to create the row counts table.


drop table proddta.rowcounts;

commit;

create table proddta.rowcounts (table_name varchar (120), AS400_count integer, oracle_count integer) ;

insert into proddta.rowcounts (
select table_name, 0, 0
from qsys2.systables  t1
where table_schema = 'PRODDTA');

Run the results of this:

select 'UPDATE PRODDTA.ROWCOUNTS set AS400_count = ( SELECT COUNT(1) FROM ' || 'PRODDTA.' || table_name || ' ' || ') WHERE table_name = ''' || table_name || ''';' 
from qsys2.systables t1
where table_schema = 'PRODDTA' ;

The following will give you a nice summary 

select table_name, simd, AS400_count
from proddta.rowcounts, ol7333.f9860
where ltrim(rtrim(siobnm)) = ltrim(rtrim(table_name))
and AS400_count > 0 order by AS400_count desc

Now you are ready to start the export of tables that have rows.  Note that some tables have BLOBs and they do not behave, so they have also been excluded.

1.5GB is taking about 15 minutes on this old dog - so it's doing okay.


    schema = "PRODDTA"
    server = "JDE400"
    connectString = "Driver={IBM i Access ODBC Driver};System=JDE400;Data Source=Business Data - PROD;User Id=JDE;Password=jde;"
    dim dbConnIn
    set dbConnIn = wscript.CreateObject("ADODB.Connection")
    dbConnIn.CommandTimeout = 60000
    dbConnIn.Open(connectString)
    dim fso
    set fso = wscript.CreateObject("Scripting.FileSystemObject")

        ' Get the list of tables that have rows (excluding the badly behaved BLOB tables)
        set rs = wscript.CreateObject("ADODB.RecordSet")
        rs.Open "SELECT T1.TABLE_NAME FROM QSYS2/SYSTABLES T1, PRODDTA/ROWCOUNTS T2, OL7333/F9860 T3 WHERE T1.TABLE_NAME = T2.TABLE_NAME AND ltrim(rtrim(t1.table_name)) = ltrim(rtrim(t3.siobnm)) AND  T2.AS400_count > 0 AND T1.TABLE_SCHEMA = '" & schema & "' AND t2.table_name not in ('F99410', 'F09505', 'F03B525') ORDER BY AS400_count asc", dbConnIn
        rs.MoveFirst

    tableName = ""
    

        ' Walk every table returned by the row counts query and dump it to its own CSV
        Do While Not rs.EOF
            tableName = rs.Fields("TABLE_NAME").Value
            folder = "E:\Myriad\datadump\" & schema
            If Not fso.FolderExists(folder) Then
                fso.CreateFolder(folder)
            End If
            'wscript.echo schema & "." & tableName
            Call getData(schema, tableName, dbConnIn, folder)
            rs.MoveNext
        Loop

        rs.close



FUNCTION getData(schema, tableName , dbConnIn , folder)

        If Not fso.FileExists("E:\Myriad\datadump\" & schema & "\" & tableName & ".csv"Then
            set rs1 = wscript.CreateObject("ADODB.RecordSet")
            rs1.Open "Select * From " & tableName, dbConnIn

            set fso = wscript.CreateObject("Scripting.FileSystemObject")
            ' 2 = ForWriting, True = create the file if it does not exist
            set ts = fso.OpenTextFile("E:\Myriad\datadump\" & schema & "\" & tableName & ".csv", 2, True)
            line = ""

            For Each tmp2 In rs1.Fields
                line = line & tmp2.Name & ","
            Next

            ts.WriteLine(Left(line, Len(line) - 1))

            While Not rs1.EOF
                line = ""
                For Each tmp1 In rs1.Fields
                    line = line & """" & Replace(tmp1.Value, """""""""") & ""","
                Next

                ts.WriteLine(Left(line, Len(line) - 1))
                rs1.MoveNext
            wend

            rs1.close
            ts.close
        End If

End function

Monday, 25 January 2021

AWR from Oracle SQL Developer, super simple


This is going to be a simple post, and a little regurgitation.

You cannot run the classic awrrpt.sql from SQL Developer, because it cannot access the database server's directories to find the rdbms scripts.  But it's easy to work around this.

The awrrpt.sql script actually just calls a function, which is used below (dbms_workload_repository.awr_report_html).  In this example, I want HTML output.

Therefore, if you are using SQL Developer and you want to run an AWR report, it's really easy.

First, get the snapshot IDs (snap_id) for the start and finish times, using this SQL:

select snap_id, begin_interval_time,end_interval_time from dba_hist_snapshot;

Easy, now run the following and press the "Run Statement" green arrow ->

set linesize 8000
set feedback off;
set heading off;
set verify off;
SELECT output FROM    TABLE(dbms_workload_repository.awr_report_html ((select dbid from v$database),(select instance_number from v$instance),1956,1957,8 ));
quit;

Then copy the output from the query result window into a new file and call it AWR.html



Open that in a browser and you are in AWR territory.  Note that you need to be on Enterprise Edition (with the Diagnostics Pack licensed) and you also need the right permissions.  Enjoy not having access to the database server.
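
If you'd rather have a plain-text report (easier to diff or attach to an SR), the same package has a text variant - a sketch only, reusing the snap IDs from the example above:

SELECT output FROM    TABLE(dbms_workload_repository.awr_report_text ((select dbid from v$database),(select instance_number from v$instance),1956,1957 ));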



Wednesday, 13 January 2021

Regex search and replace in Visual Studio Code

Where do I begin...

It always starts with a seemingly easy task - this one was uploading the F9860 into Google BigQuery.  Simple, right?

Move it into GCS and then create a table, easy...

But I kept having problems because the description field has commas in it.

Okay, there are a couple of fixes for this.  Firstly - just cleanse the data - super easy:

translate(SIMD, chr(10)||chr(11)||chr(13)||',', '    ')

The code above will change commas, tabs and new lines into a space - perfect...  That means that the CSV will have no dodgy commas.

But I was not running the SQL, so it was plan B - a post-SQL fix.

I needed a regex that would find commas inside quotes, and that was easy enough:

("[^",]+),([^"]+")

Then I needed to replace this in Visual Studio Code, using the two capture groups: $1$2


Note that you do this with the search and replace fields above - the .* toggle enables regex mode, which allows you to use capture groups in the replacement.  Note that I needed to run this a couple of times, as each pass was only fixing the first comma in a field.

Then there is a little hovering over the results to see the "Replace All" option - and we are done.  This did save me booting the virtual machine to do it at the command line.

Thursday, 17 December 2020

Performance patterns to analyse after upgrading your database.

19c is going to make everything faster, right?  Enterprise Edition with multiple processes per query is going to dramatically improve JDE performance, right?  Wrong.  In my experience you will receive very little in terms of performance gains by using EE, or by upgrading to the latest Oracle database release.  What you will get is patching and better responses to any issues that you might have.

The reason for this is fairly simple: JDE is no good at complex queries - period.  It generally joins a maximum of 3 tables together and rarely uses aggregate-based queries.  You can imagine that the relational operators that JDE employs on any database are tried and tested.  We are not really trying to push bitmap indexes or massively parallel queries.  I agree that you might get some benefits with some HUGE long-running batch, but I'd also like to see that in practice.

In my experience, enabling parallel query in an EE database for JDE caused more performance problems than benefits, as the query optimiser wanted to parallelise simple queries and slowed the simple stuff down.  This might have been me doing the wrong things, but I spent quite a bit of time on it...  Anyway, this blog entry is about how I justify to a JD Edwards customer the actual results of a database upgrade.
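
If you do want to sanity-check where parallelism is sneaking in, a quick sketch (assuming PRODDTA is your business data owner - adjust to suit) is to look for tables with a degree other than 1 and set any offenders back to serial:

-- Sketch: find tables with a parallel degree set (DEGREE is stored as a padded string)
select owner, table_name, degree
from dba_tables
where owner = 'PRODDTA'
and trim(degree) <> '1';

-- Example only - put an offending table back to serial
alter table proddta.f4211 noparallel;

The same check applies to dba_indexes.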

Batch Performance

Firstly, let's look at the performance from a batch perspective.  The new database has only been live for 2 business days, so the data that we are looking at is limited to 20,000 batch jobs.  Of course this is going to grow rapidly, and our comparisons will only get better.


Firstly, we can look at yesterday's performance, compare this with the last 30 days and get a % change in that performance.  So really, for these often-running jobs I know immediately if we have a problem or not.  I can also see that across all of the jobs we are actually 1.8% better off after the implementation of 19c over the weekend - when comparing the average runtime of nearly 18,000 jobs.  This is a positive result.

We can also drill down on any of these results and see how many times a day these jobs are running, how much data they are processing and of course – how long they are taking.



For example, for R47042 we can see that it ran 603 times on the 14th, it processed 6.8 rows on average and on average took 33 seconds to run...  This is slightly faster than the average of 42 seconds for the last 30 days.  So looking at the job that runs the most - the database upgrade is a great success.

Of course, we look through all of the jobs to ensure that there are not any exceptional results or slowdowns – which could mean missing statistics or missing indexes.



We are also able to quickly compare groups of jobs against the previous month to ensure that the performance is consistent.

The next place we look is the batch queue, to ensure that there has not been any excessive waiting or processing.




We can see from the above that QBATCH ran many more jobs than normal, but this is probably the scheduler catching up.

Looking at this data using runtime seconds, we can see that perhaps QBATCH2 was not normal, but not too bad either:



Queue time is very interesting:


We can see that the queues have been held on the 3rd and the 12th of December - note the number of seconds that concurrent jobs have queued in QBATCH2.

Although, in summary – everything is looking normal.



Note that we have the option of graphing any of the above metrics on the chart: jobs run, wait time, run time, rows processed or rows per second.  All of them help determine how things are processing and where you can make some improvements.

Generally I look at the most common jobs (the bread and butter jobs) and ensure that there is no negative impact.  Then some checking of the weekend integrities is also important.  Batch is generally a little easier to manage than the interactive side.

Note that I'm focused on the runtime, but you should also be looking at CPU utilisation and disk activity.

Interactive Analysis




On aggregate, we can see from the above that the performance might have been impacted negatively from an interactive perspective.  We can see that the last 5 days have been quite a bit slower than has been recorded previously.

You can see that when I choose the last 5 days, I get the following


An average page load time of more than 2 seconds!

If I compare with the previous 20 days, the average page load time was 1.8 seconds - so roughly a 10% slowdown - I guess that is not too bad.

We can begin to look at this data from a regional perspective too.


Potentially this single table could answer all of your questions about JD Edwards performance and how things are changing month on month or week on week.

We can see how user counts change, how activity changes, and then we can see all of the performance metrics also.

You can see here how many pages are being loaded, the average server response time, the average page load time and also the page download time.  Any changes to any of these metrics are very easy to find and very easy to report on.

Use this data to guarantee performance for your environment and regions.

Monday, 30 November 2020

Open API catalog import from JDE orchestration to Postman

I'm pretty sure I've blogged about how much I love Postman before - it's an amazing tool for kicking off and testing your automation projects.  Being able to see the cURL commands is totally awesome too.

This is a simple blog on how to import ALL of your orchestrations (or the ones that you have permission to see) into a single Postman collection.

There is probably a better way of doing this, but because JDE wants an authenticated call to the listing, I could not use the standard Postman import from a URL - as this only seemed to work unauthenticated.


As you can see, there is an option to import from a link, but you get a little error down the bottom right saying "Import failed, format not recognized".

So, plan B.

First, create a new GET request that looks something like this:

https://f5dv.mye1.com/jderest/v3/open-api-catalog


Once you have done this, ensure that you're authenticating properly to AIS.  I'm using (and allowing) basic authentication:




Once you send that request, hopefully you get a reply like the one above.

You can use the save button in the middle right of the screen to save this to a local JSON file.



Then, in your main Postman window, use the import again:



Choose the file that you just saved (the JSON definition)

Choose your file, then you will get this:

Perfect.  Once you hit OK, you'll get a new collection with ALL of your orchestrations.

You will know all of the JSON inputs and all of the standards.  Postman also does some nice stuff with variables and the relative paths for the calls.

This is a really neat way of getting all of your orchestration function definitions in one place.  Postman also makes the documentation process nice and easy!


As you can see above, if you create your scenarios and save the payloads, you can provide some amazing documentation for your orchestrations!