WAN emulator

This post describes building a WAN emulator from a Raspberry Pi. The Pi runs a derivative of the Debian Linux operating system, which has native packet shaping features, so it was an ideal choice for making a WAN emulator. The aim was to connect the Pi between a client PC and the network and use it to simulate things like packet delay or loss. I could then get the customer to repeat transactions on the client PC while we observed and timed the effect of different network characteristics. It is not only a good tool for investigation but also an ideal tool for demonstrating the effect of network latency to people considering data center moves, or to those dismissive of complaints about poor performance from users in the “regions”.

I had a Raspberry Pi Model B but needed a few things for the WAN emulation.

  • Additional Ethernet port – the Pi has only one Ethernet port, so I needed a USB-to-Ethernet adapter. A simple eBay purchase for a few quid.
  • Screen – again an eBay purchase, a 7 inch screen with a separate PSU. The screen is a bit bulky compared to the rest of the kit, and I have noticed recently that there are 5 inch screens that connect directly to the Pi with no PSU needed.
  • Keyboard – another eBay purchase, a 7″ keyboard with MicroUSB and a tablet case.
  • Finally, I needed an HDMI connector and a MicroUSB-to-USB converter.

The kit is all connected together and can be seen in the picture below.

 

Next you have to create a bridge between the two Ethernet adapters. This is done with the following commands, which I keep in a .sh file and run once the Pi has booted. This turns the Pi into a transparent bridge between the WAN and the client PC.
# Clear any IP addresses from the two physical interfaces
ifconfig eth0 0.0.0.0
ifconfig eth1 0.0.0.0
# Create the bridge and attach both interfaces to it
brctl addbr bridge0
brctl addif bridge0 eth0
brctl addif bridge0 eth1
# Bring the bridge up
ifconfig bridge0 up
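
You can check the bridge has come up with both interfaces attached by running:

brctl show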

Next you can use tc to inject delay and packet loss. For example, to add a 50ms delay:

tc qdisc add dev eth0 root netem delay 50ms
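
netem can also inject packet loss, and the options can be combined. Something like the following (the 1% is just an illustrative figure) changes the rule to drop packets as well as delay them, and the del form removes the shaping again:

tc qdisc change dev eth0 root netem delay 50ms loss 1%
tc qdisc del dev eth0 root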

It has been a few months since I built this, so apologies if I have forgotten any steps around installation, but I found a quick google solved any problems.

Extracting data from a LoadRunner results DB

A colleague wanted to record the resource usage from the weekly performance test automatically into an Excel spreadsheet, so this is how you do it. From Excel choose Data -> Import External -> New Data Query and select an MS Access database from the dialogue box. Remember this is the Access database created by running and saving the analysis; the default output.mdb produced at the end of a load test is for errors. Next you will need to open the Access DB that was created when you saved the results analysis. After you have done this you will be able to use MS Query to create the query.

You will need to join the Host, Event_map and Monitor_meter tables to construct the query. The query used is shown below; it returns the resource averages for one hour of the test after the first 10 minutes (End Time between 600 and 4200 seconds).

SELECT Host.`Host Name`, Event_map.`Event Name`, Avg(Monitor_meter.Aminimum) AS 'Avg'
FROM `C:\….\filename`.Event_map Event_map, `C:\….\filename`.Host Host, `C:\….\filename`.Monitor_meter Monitor_meter
WHERE Host.`Host ID` = Monitor_meter.`Host ID` AND Event_map.`Event ID` = Monitor_meter.`Event ID` AND ((Monitor_meter.`End Time`>=600 And Monitor_meter.`End Time`<=4200))
GROUP BY Host.`Host Name`, Event_map.`Event Name`

Once this is working in the query editor you can return to Excel and the data will be added to the spreadsheet.

Python Packet Inspector for Network Captures

I was recently involved in a performance troubleshooting exercise for a company that uses Citrix to access their core ERP application. As part of the exercise individual users had taken packet captures of key business transactions using Wireshark. The transaction types were the same for all the users, but the users were at different locations. The common metrics from the captures were the amount of data exchanged for the transaction and the session length. As there were many locations, transaction types and users, the best way to do this was to automate the analysis.

I decided that I would use Python as I was fairly familiar with the language and there were several libraries for analysing Wireshark captures.

The chosen library was pyshark (http://kiminewt.github.io/pyshark/) and a good tutorial on how to use it can be found at http://thepacketgeek.com/pyshark-using-the-packet-object/

The code iterates through the files in the given directory (which should point to the Wireshark capture files). When it finds a capture, all the Citrix packets are loaded into a capture array using a display filter on the Citrix port number (2598):
cap = pyshark.FileCapture(pcap_file,display_filter="tcp.port == 2598")

Once the cap array is populated, a loop iterates through the array summing the size of the payload data:

# Iterate through the cap array
for i in cap:
    # If the packet has payload data, add its length to the size counter
    try:
        if i.data:
            size = size + int(i.data.len)
    except AttributeError:
        # Packet has no DATA layer
        pass

Finally, once the array has been processed, the payload size and session time are printed out. The session time is the timestamp of the last frame in the array, which is held in the iteration variable i, minus the timestamp of the first frame in the array. I have used epoch time so the result is in seconds.

print file, ": is :",size,": size and :",float(i.frame_info.time_epoch)-float(cap[0].frame_info.time_epoch)

The complete code with a bit of error processing is here:


import pyshark
import os

directory = "/home/andrew/Documents/CaptureDirectory"

# Iterate through every file in the directory
for file in os.listdir(directory):
    # Only analyse Wireshark capture files
    if file.endswith(".pcapng"):
        # Populate the cap array with packets matching the Citrix port 2598
        size = 0
        pcap_file = (directory + "/" + file)
        cap = pyshark.FileCapture(pcap_file,display_filter="tcp.port == 2598")

        # Iterate through the capture array
        for i in cap:
            # If the packet has payload data, add its length to the size counter
            try:
                if i.data:
                    size = size + int(i.data.len)
            except AttributeError:
                # Packet has no DATA layer
                pass

        # Print out payload size and session duration
        if size > 0:
            print file, ": is :",size,": size and :",float(i.frame_info.time_epoch)-float(cap[0].frame_info.time_epoch)
        else:
            print file," No Citrix"

An example of the output is:

andrew@debian:~$ python summary.py
smith-tran12.pcapng : is : 89636 : size and : 47.6105160713
davies-tran1.pcapng : is : 267292 : size and : 62.6023669243
smith-tran11.pcapng : is : 242545 : size and : 37.8602318764
kirby-tran1.pcapng No Citrix

Using R to detect growth in perfmon resource metrics

I have started using the statistical package R to detect any trends in performance test data. In this example I am looking to detect windows perfmon metrics that increase over the duration of a performance test.

You can install R, an open source statistical analysis package, from http://www.r-project.org/

The code below can be cut and pasted into the R GUI command line, but you will have to change the first line of the script to use the directory holding the data file (procs.csv).
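
For reference, here is a minimal sketch of the layout the script assumes for procs.csv: a header row, the sample timestamp in the first column (which the script skips) and one perfmon counter per subsequent column. The counter names and values below are made up:

Time,\\server\Processor(_Total)\% Processor Time,\\server\Process(app)\Private Bytes
10:00:15,35.2,104857600
10:00:30,41.7,105906176
10:00:45,38.9,107954176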

The comments should give you an idea of what it is doing.

setwd("C:\\Documents and Settings\\alee\\My Documents\\Projects\\youProject\\")

# Load the data into a data frame

pData <- read.csv("procs.csv",sep=",",header=TRUE)

NumOfCols <- length(names(pData))

# Create a vector to hold the gradients

slope <- rep(NA, NumOfCols)

# Loop through the metrics and calculate the slope

for(i in 2:NumOfCols ) {
     x <- 1:length(pData[[i]])
     y <- pData[[i]]
     # Ignore any blank columns
     if ( is.na(y[[1]]) )
          {slope[[i]] <- NA }
     else
     {
          fit <- lm( y ~ x)
          slope[[i]] <- fit[[1]][[2]]
     }
}

results=data.frame(metric=names(pData),Co=slope,OrgOrder=1:NumOfCols)

OrderResults <- results[order(-results$Co),]

#plot top 5 growing metrics

for(i in 1:5)
{
     plot(pData[[OrderResults[i,3]]],type="o",col="blue")
     title(main=names(pData[OrderResults[i,3]]), col.main="black", font.main=4)
     # Prompt for Enter so the plot stays on screen long enough to be read
     readline(prompt = "Pause. Press to continue...")
}

Data points needed for Universal Scalability Curves

I did some testing for a single transaction type into a SOA environment. For different numbers of threads (workload) in JMeter I measured the throughput (transactions per second) over a 10 minute period. The results are shown in the graph below:

From looking at the results I wondered how well I could apply Neil Gunther’s Universal Scalability Law (USL). The USL is an equation that allows you to take a sparse set of load measurements and from those determine how your application will scale under larger user loads than you may be able to generate in your test lab. This can all be done in a spreadsheet tool like Excel.
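
For reference, the USL models relative capacity as C(N) = N / (1 + σ(N − 1) + κN(N − 1)), where N is the load (threads here), σ is the contention penalty and κ the coherency penalty; predicted throughput is then X(N) = X(1) × C(N). If you would rather fit the curve outside Excel, here is a minimal sketch using scipy's curve_fit (the thread/throughput numbers are made up, substitute your own measurements):

import numpy as np
from scipy.optimize import curve_fit

# USL throughput model: X(N) = X1 * N / (1 + sigma*(N-1) + kappa*N*(N-1))
def usl(n, x1, sigma, kappa):
    return x1 * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

# Hypothetical (threads, tps) measurements - replace with real test data
threads = np.array([1.0, 2, 4, 8, 12, 16, 24, 32])
tps = np.array([9.8, 19.1, 36.2, 61.0, 70.5, 74.0, 73.2, 69.8])

# Fit X(1), sigma and kappa to the measurements
(x1, sigma, kappa), cov = curve_fit(usl, threads, tps, p0=[10.0, 0.01, 0.001])
print "X(1)=%.2f sigma=%.4f kappa=%.6f" % (x1, sigma, kappa)

# Predict throughput at loads beyond what was tested
for n in (48, 64, 96):
    print n, "threads ->", round(usl(n, x1, sigma, kappa), 1), "tps"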

I was interested to see just how many data points I would need, so I plugged my data into Excel. I did three predictions: the first used all 8 data points collected during the test, the next used the first 4 collected during the test, and the last used 4 data points spread throughout the test. The graph below shows the predicted scalability curves.

Using the first 4 data points (which looked linear-ish) it predicted the maximum throughput would be around 75 tps, and the graph didn’t show the degradation at higher thread values. What was interesting was that the prediction from the spread-out points was close to the curve predicted using all the measurements.

This was my first attempt to use the scalability curve. The spreadsheet from Neil’s website was easy to use, and I am impressed that in this example only a few data points gave a prediction close to the curve produced using all the data points. I think I will need to do more similar experiments before I am a convert, but it looks promising.

Performance Engineering and Toilets

I was reading the paper on Saturday and noticed a snippet about the student Li Tingting occupying a men’s public toilet to protest about unequal waiting times. Local officials have promised to increase the number of ladies’ toilets by 50% to decrease waiting times. Strange to some, but the calculations we use in performance engineering can help work out whether 50% is enough to reduce waiting times.


Queueing theory, a branch of mathematics, allows us to calculate waiting times if we know the arrival rate and the time customers spend being “serviced”. There are other considerations when using these calculations, so google “queueing theory”.

We know the “service time” for ladies and gentlemen using toilets thanks to studies done in New Zealand, where providing a sufficient number of public toilets is a legal requirement: the average time taken by a man is 40 seconds, while a woman takes 90 seconds. The calculation below can be used for a single toilet and gives the time in the system for a known arrival rate and service rate.

T = 1 / (μ − λ), where λ is the arrival rate and μ is the service rate (the standard single-server M/M/1 result for the average time in the system).

For a male toilet we can service 90 men per hour (3600 seconds in an hour divided by 40 seconds per visit) and for a female toilet we can service 40 women per hour (3600 divided by 90). For a range of arrival rates, using the calculation above, we can calculate the average time in the system (waiting plus doing business) and plot this on the graph below:
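
If you want to reproduce the curves, here is a minimal sketch using the single-server formula above (arrival rates are visits per hour to one toilet):

# Average time in the system (hours) for a single-server M/M/1 queue
def time_in_system(arrival_rate, service_rate):
    return 1.0 / (service_rate - arrival_rate)

# Service rates: men 3600/40 = 90 per hour, women 3600/90 = 40 per hour
for arrivals in (10, 20, 30, 35, 39):
    print arrivals, "per hour:", \
        "men", round(3600 * time_in_system(arrivals, 90.0)), "s,", \
        "women", round(3600 * time_in_system(arrivals, 40.0)), "s"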

As you can see, as the arrival rate increases the time women spend increases more quickly. Also note how it gets worse as the arrival rate approaches the service rate.

The example above is just for a single toilet, but there are equations that can calculate the time for multiple service centres (in this case toilets) and can therefore be used to calculate whether a 50% increase in women’s toilets would reduce waiting times. The devil is in the details as to whether 50% is sufficient, based on the current arrival rates and the current number of male/female toilets, but I thought let’s try a few numbers. Let’s assume there are currently 10 toilets per sex, so we need to look at the time spent for a range of arrival rates for 10 male toilets and 15 female toilets. The graph below shows the times for different arrival rates.
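
Here is a minimal sketch of that multi-server calculation, assuming an M/M/c queue (the Erlang C formula is the standard way to model c identical service centres sharing one queue):

from math import factorial

def mmc_time_in_system(lam, mu, c):
    # Average time in the system for an M/M/c queue (Erlang C)
    # lam = arrival rate, mu = service rate per server, c = number of servers
    # Only stable while lam < c * mu
    a = lam / mu                  # offered load in Erlangs
    rho = a / c                   # utilisation per server
    top = a ** c / (factorial(c) * (1 - rho))
    p_wait = top / (sum(a ** k / factorial(k) for k in range(c)) + top)
    return p_wait / (c * mu - lam) + 1.0 / mu

# 10 male toilets at 90 visits/hour each vs 15 female toilets at 40 visits/hour each
for lam in (300.0, 400.0, 500.0, 550.0):
    print lam, "arrivals/hour:", \
        "men", round(3600 * mmc_time_in_system(lam, 90.0, 10)), "s,", \
        "women", round(3600 * mmc_time_in_system(lam, 40.0, 15)), "s"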

As you can see, the time increases significantly for the ladies well before it does for the men! 50% may NOT be enough.

OK, so why is an IT performance engineering blog talking about Chinese toilets? Well, it is just an example of how a few equations can be used to calculate things like response times where there are limited resources. Just as when considering “how many CPUs” you need for an application, the same maths can be used.