Performance Trouble Shooting Story No.1

Sometimes in my job I will have to troubleshoot some old technology. This is a case where I had a call that an application that run in CGI process spawned from an IIS website was running a lot slower that the same application in a test environment. This was for a single iteration of a request to the application. The customer was keen to find out why that although the test environment was identical to production there was such a performance difference. Luckily, we could take out one of the IIS servers from production so that we could us it for testing.

To start with we did some benchmark tests on the machine using tools like IOmeter and zCPU just to confirm the raw compute power was the same. As a separate thread various teams where checking to make sure the two environment where identical not only in terms of hardware and OS but security, IIS config etc. They all came back to say the production and test environments where identical. Next we wanted to see if the problem was in the application itself or the invocation of the application. Someone knocked up a CGI hello world application which we tested in test and production. This showed that the production environment was slower running the “Hello World” application and therefore the problem was around the invocation of the CGI process. Next I wrote a C# program that invoked a simple process and timed it. Running this in both test and prod showed no difference between the two environment. This lead to the problem being specific to the IIS/CGI.

The next step was to take a Process Monitor trace (Process Monitor – Windows Sysinternals | Microsoft Docs). The first thing we noticed was the trace size was significantly larger for the production trace. On the production server there are multiple calls made to Symantec. With the production trace showing multiple calls made to Symantec end point security before the start of the LoadImage event. At the start it only add 1 second to the trace before the image load but there are latter calls to the registry. With this information we asked the security team to check again the security settings and they discovered they where different this the exceptions set in test not set in production. The settings where reviewed and it was decided that the same exceptions rules could be applied across both environment and after this was done the retesting showed that both environment provided the same level of performance.

What is the lesson for this. Let data be your truth! I am not saying the people checking the security settings lied but comparing lots of settings is complex and often not automated. Add in the biases that people think they have configured the system correctly to start. leads to early conclusions that can be flawed.

WAN emulator

This post talks about making a WAN emulator from a Raspberry Pi. As the Pi runs a derivative of the Debian Linux operating system which has native packet shaping features it was an ideal choice for making a WAN emulator. The aim was to connect the Pi between a client PC and the network and allow me to simulate things like packet delay or loss. I could then get the customer to repeat transaction on the client PC and we could observe/time the effect of different network characteristics. Not only a good tool for investigation but an ideal tool for demonstrating the effect of network latency for people considering data center moves or dismissive about complaints about poor performance for users in the “regions”

I had a RaspberryPI Model B but need a few things for the WAN emulation.

  • Additional Ethernet port – the Pi has only 1 ethernet port so I needed a Ethernet to USB device. A simple eBay purchase for a few quid
  • Screen – Again another ebay purchase for a 7 inch screen with separate PSU. The screen is a bit bulky compared to the rest of the kit and I have noticed recently that there are 5 inch screen that connect direct to the Pi with no PSU needed.
  • Keyboard – Another ebay purchase of a 7″ keyboard with MicroUSB and tablet case.
  • Finally I need a HDMI connector and a MicroUSB to USB converter

The kit is all connected together and can be seen in the picture below.


Next you have to create a bridge between the two Ethernet adapters. This is done with the follow commands, which I have in a .sh file and run once the Pi is booted. This turns the Pi into a transparent bridge between WAN and Client PC.
ifconfig eth0
ifconfig eth1
brctl addbr bridge0
brctl addif bridge0 eth0
brctl addif bridge0 eth1
ifconfig bridge0 up

Next you can use tc to inject delay and packet loss. For example to add a 50ms delay

tc qdisc add dev eth0 root netem delay 50ms

It has been a few months since I built this and I apologies if I have forgotten any steps around installation but I found a quick google solved any problems.

Extracting data from a LoadRunner results DB

I collegue wanted to record the resource usage from the weekly performance test automatically into an excel spreadsheet.  So this is how you do it. From excel choose Data -> Import External -> New Data Query and select a MS Access database from the dialogue box. Remember this is the access database created from running and saving the analysis, the default output.mdb produced at the end of a load test if for errors. Next you will need to open the access DB that was created when you saved the results analysis. After you have done this you will be able to use MS query to create the query.

You will need to join the Host, Event_map and Monitor_meter tables to construct the query. The equiry used is shown below, where it provides the resource average for an 1 hour of the test after the first 10 minutes.

SELECT Host.`Host Name`, Event_map.`Event Name`, Avg(Monitor_meter.Aminimum) AS ‘Avg ‘
FROM `C:\….\filename`.Event_map Event_map, `C:\….\filename`.Host Host, `C:\….\filename`.Monitor_meter Monitor_meter
WHERE Host.`Host ID` = Monitor_meter.`Host ID` AND Event_map.`Event ID` = Monitor_meter.`Event ID` AND ((Monitor_meter.`End Time`>=600 And Monitor_meter.`End Time`<=4200))
GROUP BY Host.`Host Name`, Event_map.`Event Name`

Once this is working in the query editor you can return back to excel and the data will be added to the spreadsheet

Python Packet Inspector for Network Captures

I recently was involved on a performance trouble shooting exercise for a company that uses Citrix to access their core ERP application. As part of the exercise individual users has taken packet captures using WireShark for key business transactions. The transactions types where the same for all the users but the users where are different locations. The common metric from the captures was the amount of data exchanged for the transaction and the session length. As there where many locations, transaction types and users the best way to do this was to automate the analysis.

I decided that I would use python as I was fairly familiar with the language and there where several libraries for WireShark analysis.

The chosen library was and a good tutorial on how to use this can be found here

The code imports, iterates through the files in the given directory (which should point to the wireshark capture files). When it finds a capture all the citrix packets are loaded into a capture array using a filter using the citrix port number
cap = pyshark.FileCapture(pcap_file,display_filter="tcp.port == 2598")

Once the cap array is populated then a loop iterates through the array summing the size of the payload data

#Iterate through the cap array
for i in cap:
# If the packet has payload data add that to the size counter
size = size + int(

Finally, once the array has been processed then the payload size and session time is printed out. The session time is the timestamp of the last frame in the array which is held in the iteration variable i minus the timestamp for the first frame in the array. I have used epoch time so the result is in seconds. print file, ": is :",size,": size and :",float(i.frame_info.time_epoch)-float(cap[0].frame_info.time_epoch)

The complete code with a bit of error processing is here:

import pyshark
import os
directory = "/home/andrew/Documents/CaptureDirectory"

#Iterate through every file in the directory
for file in os.listdir(directory):
#Analyse if it a wireshark capture files
if file.endswith(".pcapng"):
#Populate cap array with packets matching using the citrix port 2598
size = 0
pcap_file = (directory + "/" + file)
cap = pyshark.FileCapture(pcap_file,display_filter="tcp.port == 2598")

#Iterate through the capute array
for i in cap:
# If the packet has payload data add that to the size counter
size = size + int(
# Print out payload size and session duration
if size > 0:
print file, ": is :",size,": size and :",float(i.frame_info.time_epoch)-float(cap[0].frame_info.time_epoch)
print file," No Citrix"

An example of the output is

andrew@debian:~$ python
smith-tran12.pcapng : is : 89636 : size and : 47.6105160713
davies-tran1.pcapng : is : 267292 : size and : 62.6023669243
smith-tran11.pcapng : is : 242545 : size and : 37.8602318764
kirby-tran1.pcapng No Citrix

Using R to detect growth in perfmon resource metrics

I have started using the statistical package R to detect any trends in performance test data. In this example I am looking to detect windows perfmon metrics that increase over the duration of a performance test.

You can install R an open source statistical analysis package from here

The code below can be cut and pasted into the R gui command line but you will have to change the first line of the script to use the directory holding the data file (procs.csv).

The comments should give you an idea what it is doing..

setwd("C:\\Documents and Settings\\alee\\My Documents\\Projects\\youProject\\")

# Load the Data isetednto a Data Frame

pData <- read.csv("procs.csv",sep=",",header=TRUE)

NumOfCols <- length(names(pData))

# Create a metrix to hold the gradients

slope <- 1:NumOfCols

# Loop through the metrics and calculate the slope

for(i in 2:NumOfCols ) {
     x <- 1:length(pData[[i]])
     y <- pData[[i]]
     # Ignore any blank columns
     if ([[1]]) )
          {slope[[i]] <- NA }
          fit <- lm( y ~ x)
          slope[[i]] <- fit[[1]][[2]]


OrderResults <- results[order(-results$Co),]

#plot top 5 growing metrics

for(i in 1:5)
     title(main=names(pData[OrderResults[i,3]]), col.main="black", font.main=4)
     # Prompt for Enter so plot stays on screen long enough to be read
     readline(prompt = "Pause. Press to continue...")

Data points needed for Universal Scalability Curves

I did some testing for a single transaction type in to a SOA environment. For different numbers of threads (workload) in Jmeter I measured the throughput (transaction per second) for a 10 minute period. The results are from the graph below:

From looking at the results I wondered how well I could apply Neil Gunther’s
(Universal Scalability Law). The USL is an equation that allows you to take a sparse set of load measurements and from those determine how your application will scale under larger user loads than you may be able to generate in your test lab. This can all be done in a spreadsheet tool like Excel.

I was interested to see just how many data points I would need. So, I plugged my data into Excel. I did three predictions. The first was using all the 8 data points collected during the test. Next I used the first 4 collected during the test and finally I used 4 data points spread throughout the test. The graph below shows the predicted scalability curves.

Using the first 4 data points (which looked linear’ish) then it predicted the maximum throughput would be around 75 tps and the graph didn’t show the degradation at higher thread values. What was interesting was that the spreadout results are close to the predicted curve using all the measurements.

This was my first attempt to use the scalability curve. The spreadsheet from Neil website was easy to use and I am impressed in this example that with only a few datapoints the results were close to the prediction curve using all the data points. I think I will need to do more similar experiments before I am a convert! But it looks promising.