Performance Trouble Shooting Story No.1

Sometimes in my job I will have to troubleshoot some old technology. This is a case where I had a call that an application that run in CGI process spawned from an IIS website was running a lot slower that the same application in a test environment. This was for a single iteration of a request to the application. The customer was keen to find out why that although the test environment was identical to production there was such a performance difference. Luckily, we could take out one of the IIS servers from production so that we could us it for testing.

To start with we did some benchmark tests on the machine using tools like IOmeter and zCPU just to confirm the raw compute power was the same. As a separate thread various teams where checking to make sure the two environment where identical not only in terms of hardware and OS but security, IIS config etc. They all came back to say the production and test environments where identical. Next we wanted to see if the problem was in the application itself or the invocation of the application. Someone knocked up a CGI hello world application which we tested in test and production. This showed that the production environment was slower running the “Hello World” application and therefore the problem was around the invocation of the CGI process. Next I wrote a C# program that invoked a simple process and timed it. Running this in both test and prod showed no difference between the two environment. This lead to the problem being specific to the IIS/CGI.

The next step was to take a Process Monitor trace (Process Monitor – Windows Sysinternals | Microsoft Docs). The first thing we noticed was the trace size was significantly larger for the production trace. On the production server there are multiple calls made to Symantec. With the production trace showing multiple calls made to Symantec end point security before the start of the LoadImage event. At the start it only add 1 second to the trace before the image load but there are latter calls to the registry. With this information we asked the security team to check again the security settings and they discovered they where different this the exceptions set in test not set in production. The settings where reviewed and it was decided that the same exceptions rules could be applied across both environment and after this was done the retesting showed that both environment provided the same level of performance.

What is the lesson for this. Let data be your truth! I am not saying the people checking the security settings lied but comparing lots of settings is complex and often not automated. Add in the biases that people think they have configured the system correctly to start. leads to early conclusions that can be flawed.

Leave a Reply

Your email address will not be published. Required fields are marked *