# Performance Engineering and Toilets

I was reading the paper on Saturday and noticed this snippet about student Li Tingting occupying a men’s public toilet to protest about unequal waiting times. Local Officials have promised to increase the number of Ladies toilets by 50% to decrease waiting times. Strange to some but some of the calculations we use in performance engineering can help calculate if 50% is enough to reduce waiting times.

Using queuing theory a branch of mathematics, that allows us to calculate the waiting times if we know the arrival rate and the time customers spend being “serviced.” There are other consideration when using these calcuations so google “queueing theory”.

We know the “service time” for ladies and gentlemen using toilets thanks to studies done in New Zealand where it is a legal requirement that there should be a sufficient number of public toilet that the average time taken by a man is 40 seconds while it takes 90 seconds by a woman. The calculation below can be used for a single toilet and gives the time in the system for a know arrival rate and service rate.

For a male toilet we can service 90 men per hour (3600 seconds in an hour divided by 40 seconds per visit) and for a female toilet we can service 40 women per hour (3600 seconds divided by 90). For a range of arrival rates using the calculation above we can calculate the average time in the system (waiting plus doing business) and plot this on the graph below:

As you can see as arrival rate increases the time women spend increases more quickly. Also note how it gets worse as the arrival rate approaches the service rate.

The example above is just for a single toilet example, but there are equations that can calculate the time for multiple service centers (in this case toilets) and therefore can be used to calculate if a 50% increase in women’s toilets would reduce waiting times. Well the devil is in the details as to whether 50% is sufficient based on current arrival rates, current number of male/female toilets but I thought let’s try a few numbers. Let’s assume there is currently 10 toilets per sex and therefore we need to look at the time spent for a range of arrival rates for 10 male toilets and 15 female toilets. The graph below shows the times for different arrival rates.

As you can see the time significantly increase for ladies well before that of the men! 50% May NOT be enough.

OK, so why is an IT performance engineering blog talking about Chinese toilets. Well it is just an example of how some equations can be used to calculate things like response times where there are limited resources. Just like when considering “how many CPUs” you need for applications the same math can used.

# Why do performance tests fail?

I have just been thinking recently about some of the reasons performance testing fails to stop performance problems occuring in the production environment. Below is a list of some of the reasons why performance testing can fail to spot these problems. Hopefully, the list below will provide, as a reminder of things to check next time you have to write a performance test plan. However, we must remember that like all testing, performance testing is about reducing the risk of failure and can never prove 100% that there will be no production performance problems. Indeed it may be more cost effective for some problems to occur in production than during test! Though your customer may not fully appreciate this approach.

So here is my list:

1) Ignoring the client processing time, performance test tools a designed to test the performance of the backend servers by emultating the network traffic coming from clients. They do not consider the delay induced by the client such as rendering and script execution.

2) Ignoring the WAN, again test labs often inject the load across a LAN ignoring any outside the data center network delays. This is a particular problem for chatty application when it comes to network traffic.

3) Load test scripts that do not check for valid responses, performance testing is not functional testing but it is important that for the test script you write they check they are receiving correct responses back. The classic problem has been tools that just check that a valid HTML code is returned. The problem with this is that the “We are busy” page has the same valid code as the normal page.

4) Poor workload modeling. If we can not estimate the user workload correctly the load test will never be right. You might do a great test testing for 10,000 users but that is no real help if 20,000 user arrive on day one. Don’t under estimate the need to get a good workload model.

5) Assuming perfect users, alas users are not perfect and they make mistakes, cancel order before committing and forget to log off. This leads to a very different workload than if all the users where perfect, putting a different load on the environment.

6) Bad Test Environments, a test environment should be as representative as the production environment as possible. I have seen failures particularly when the test environment has been undersized but also where is has not been configured in a similar fashion to production.

7) Neglecting Go-live+10 days performance issues, Performance testing typically focuses on testing the peak hour and a soak test. What is difficult to do in a performance test is to represent how the system will be after several days of operations. Systems can ground to a halt as logs build up and nobody has got round to running the clean up scripts or transactions slow as SQL cannot cope with the increased rows in tables.

8) Unexpected user behavior, Very difficult to mitigate this one as it unexpected! However, in many cases a lack of end user training has resulted in users doing the unexpected like the car part salesman that didn’t know how to use the system and did a wild card search to return the complete part catalogue and then scrolled through it to find the part manually each time! Caused a killer performance issue.

9) Lack of statistical rigor. You don’t need to statistical guru to run a performance test but you should at least run the test long enough and enough times to be confident that the results are repeatable.

10) Poor test data, like the test environment the test data should be as representative as possible. Logging in all the virtual users with the same user id may put a different load on the system then if each had their own user id.

# How does Citrix Improve Response Times

I was asked the other day how does Citrix improve response times. The simple answer is that it cuts down on the number of times the user has to wait for information to be transfered across the network. For example the diagram below shows an application on a client PC communicating with the DB. In this case it is Oracle forms 6i and this can take between 20-60 network hops to get the data needed to display a screen.

If you are on the same LAN as the DB then you may not notice the delay but it you access across a WAN then the time to cross the WAN all adds up to a slow response time.

With Citrix the citrix server is placed in the datacentre close to the DB. The client then makes one request to the citrix server, the citrix server makes all the requests to the DB and then when the screen is complete send a picture of the screen back to the user. As can be seen in the diagram below.

As the citrix server and DB are closely located then the network hops needed to get the data to build the screen happen quickly and the user has to suffer the delay across the WAN only once. Where applications cross the WAN many times and the delay from the users to the server is high then Citrix is likely to help improve performance. However, you need to also consider the bandwidth of the pipe between the client and the server.

To do this you can create a performance model. The following presentation contain I performance model I built for determining when to deploy citrix to various locations as part of a large upgrade project. In that project the application made 7 trips across the network to generate the display for the user. Pages 12-21 provide the model and some results.

# Seven Stages of Performance Testing Denial

As you may know, many of the ancient religions have such doctrines as “the 5 pillars of wisdom” or the “4 noble truths” that lead humble pilgrims to true enlightenment. Although I am not suggesting we start a new Performance Test religion that perhaps worships the god “Mercury”, I have noticed that there are the 7 levels of denial that developers/system architects/managers (aka pilgrims) seem to have to go through before they realise or admit that they have performance problems (i.e.  true enlightenment).

Like following a religion, this is a personal journey and a unique path is followed by each pilgrim, with no two making the same realisations or decisions at the same point in the project lifecycle, some having to repeat parts of the journey several times (as I am writing this, Mike is explaining again to another set of pilgrims how LoadRunner works).

My Experience suggests there are 7 levels of denial, as follows:

1) The load test tool must be wrong – you may be using the industrial standard performance test tool costing £100K, but the quick test the pilgrims did with the free tool downloaded from the internet was better. When you ask whether HTTP 500 status messages were trapped or if the data returned was validated the pilgrims look confused. So, take a deep breath, explain the benefits of a proper tool and move on.

2) The performance scripts or workload model must be wrong – you have only been a performance tester for 5 years and worked on countless projects, so its nice to be told you are stupid. Take a deep breath, walk them through the code and enjoy their look of surprise as they suddenly realise what correlation is.

3) The system is not finished so doesn’t need to be tested – pilgrims can believe that the last 5% of functionality will increase performance so that a poorly performing system will obviously get better in a minute.. Explain politely how adding new code won’t magically improve the performance of the old.

4) It’s not our system, it’s the network etc- blaming the supplier of the components is a common area of denial. To correct this misconception, feign a look of surprise and then arrange a test to show that the offending component runs super fast.

5) We JUST need to configure parameter X – the pilgrim often has the belief that the correct setting of a single magic parameter will solve any problem. What is annoying is the condescending tone often in which the pilgrim states that this is surely the problem and that you the tester must be a complete donkey for not setting this. Of course you smile politely, say lets give it a go and, when nothing changes be dutifully diplomatic. Often you will iterate on “number 5” as several pilgrims in the development team search for the “silver bullet” (or is it “Rocking horse pooh”). An obvious attraction to pilgrims of this approach is that the solution is “only a test cycle away”. Many a manager pilgrim has followed this route.

6) Throw hardware at the problem – although this does often have an effect, adding an extra processor to a DB server that is crippled by too many stored procedures doing table scans is as useful as a chocolate tea pot. Take another breath, particularly when you are told the lead time to order the components and re-install the software. Just stay alert because the pilgrim will be very happy believing they are solving the problem and can relax while they await the new hardware.

7) We JUST need to tune a small part of the system – there is often a hope that only a small store procedure or code element, once tuned, will yield the magical performance improvement or that all the performance problems can be found in one part of the architecture. At last the pilgrims are getting somewhere on their journey allowing you to progress too – some measurements at least have to be taken to identify the bad boy item. You can smile now; the journey’s nearly over..

Journey’s End – Wow! we do have a performance problem – at last your bunch of pilgrims have made the journey to true enlightenment. Like any religious journey it is full of self-doubt and distractions along the way, but at last you can finally start to solve the problem. Just hope now that a new project manager doesn’t get involved and you have to start at step 1 again.

Some projects have less potential areas for deviating from the true path, and some have more, but each pilgrim has to find his own way. Your job is to guide and educate!

# Performance Test Best Practise

An old colleague asked if there are any standards identifying best practise in performance testing. I could not think of any but it started me thinking about what is best practise. Here are my thoughts on some areas of best practise in Performance Testing. They are NOT in any order of importance and the list is NOT exhaustive.

1.) Have a defined process and constantly refine it.
Before you start you should have a process defined and you should make sure you review this process to add improvements. The process needs to be flexible in order to accomodate different types of projects, from benchmarking a core application through to making sure an e-commerce site can handle the Christmas rush.

2.) Define the Goals up front.
This seems obvious, but you need to understand why are you testing and what the performance goals of the system under test are. (Note I use the word goals not requirements). Here, the move to ITIL may help where service design packages developed early on should include the performance requirements.

3.) Let Risk guide you.
The performance risk and consequences of failure should guide the type and amount of performance testing you do. Don’t just test what is easy to test.

4.) Don’t be afraid to say no.
If you are given responsibility for signing off on the performance of the system, you are the expert. If, subsequently, you are not given enough time or the correct tools then be prepared to say that you cannot test the system adequately. Remember that the caveats you place in your final report may never make it into the summary presented to the management board!

If you don’t test the system with the correct workload it won’t matter if everything else is perfect – the results will be wrong. This means you need to understand user behaviours and their frequency. Don’t forget to include error scenarios as well.

6.) Develop Quality Scripts.
Make sure your scripts emulate user behaviour as much as possible and remember that users make mistakes, leave processes early and have comfort breaks! Also, make sure your script check what is returned to the user is what is expected.

7.) Select an appropriate test environment.
Is best practise using a production-sized test environment? Not sure, but make sure your test environment is sized and up to the job involved. Make sure you can collect the necessary data about the performance of that environment during the load test.

8.) Run your performance tests for long enough and often enough.
Make sure your tests are repeatable and that the results they produce are statistically valid.

9.) Participation.
Get all the people that need to be involved in the performance test working togther. Unless you are superhuman and multi-skilled you will need DBAs, administrators, developers, Project Managers, etc., to assist in the test. Remember, for the best results get these stakeholders involved in the process early on.

10.) Remember, people want results not data.
Don’t just present the canned report from the performance test tool; you need to analyse the results and present the key facts of the load test. And remember that different people will want different results from the load test – a manager will want to know if it passed whereas the DBA will want to know if the SGA is sized correctly, for example.

# Small vs Large Scale Performance Test Environments

I have just added to the website a presentation that looks at sizing and extrapolation techniques for people considering building a small scale performance test environment instead of a large full scale performance test environment. In the paper several approaches are considered.
Factoring – This is where the architecture is easily scaled and therefore the performance test can be undertaken on a subset of the hardware.

Dimensioning – The architecture has known bottlenecks that drive the performance such as a central DB. The performance test environment must contain the bottleneck component but other components may not need to be representative of a full sized environment.

Modelling – This examines the use of modelling to take results from a small scale environment and predict the results for a larger scale environment.

Flipping – This looks at creating test environment that can be have the correct amount of resources allocated to them for a “full scale” performance test for example during off hours and then revert to a smaller scale performance test environment at other times.

Full Scale – Finally the advantages and disadvantages of a full scale performance test environment are discussed.

Finally the caveat for these techniques is that for any testing on a small scale performance test environment does not guarantee that all performance problems will be discovered due to application/scalability constrains that may only appear in the full sized environment!

# Calculating Concurrency from Performance Test Results

So you are on a performance test engagement and your boss asks how many people concurrently executing certain transactions like buying a book or doing a search. He wants is a measure of active concurrency – how many people are doing certain transaction. This should not be confused with Passive concurrency like how many people are logged in. Before we go anyfuther lets clarify that in this example a transaction is a request to the test system and a response back it does not include any think time. Now before you start getting out the virtual terminal server and incrementing counters at the start of the transaction and decrementing counters at the end. There is an easier way.

You can work this all out from your performance test results, without the need for code. Using a mathematical formula (it’s very simple so don’t panic) called Littles Law. Littles Law was first used to analyse the performance of telephone exchanges in 1969 by John Little.
Little’s law allows us to relate the mean number of items in the system in our case concurrent users with the mean time in the system (response time) as follows:

Number of Items in the system = Arrival Rate x Response Time

There is one rule to remember before you use little law you must make sure the system is balanced. That is the arrival rate into the system is the same at the exit rate.

I will begin with a none computer example the “Black Horse Pub” has a mean arrival rate of 5 customers per hour that stay for on average half an hour. Using Little’s law we can calculate the mean number of customers in the pub as Arrival Rate x Response Time = 5 x 0.5 = 2.5 customers.

To apply little law to a performance test we must first make sure that we are taking measurements from when the system under test is balanced. Remember a balanced system the rate of work entering the system matches the rate of work leaving the system. This for a typical load testing tool is after the ramp up period and the number of virtual users remains constant and response times have stabilised and the transaction per second graph is level. To capture this period of time in LoadRunner for example you would need to select the time period in the Summary report filter or under the Tools -> Options.
So record the average response time for the transaction of interest and the number of times per second the transaction is executed.

So from the example above the response time is 43.613 seconds. The arrival rate is the number of transactions executed divided by the duration. The duration for this example was a 10 minute period as can be confirm by the LoadRunner summary below.

This gives you an arrival rate of 2.005 calculated by taking the count 1203 divided by the duration 600.

So the concurrent number of users waiting for a search to return is 87.44
There you go from your performance test results you can easily calculate the concurrency for a particular transaction.

# How do you know if your Load Test has a bottleneck

The bottleneck in a system may not be obvious. (Life would be easier but less fun if there where always easy to find). This is because there are two types “hard” and “soft”. Hard bottlenecks are the ones where a resource such as a CPU is working flat out which limits the ability of the system to process more transaction. While a soft bottleneck is some internal limit such at number of threads or connections that once all used limit the ability to process more transaction. Therefore, how do you find know if you have a bottleneck. If you are looking at the results from a single load test you may not know you will need to run multiple load tests at different numbers of virtual users and then see if you number of transactions per second increase with each increase in virtual users. The results can be seen in the two graphs below. The first shows how the throughput (transaction per seconds) increases and levels off when saturated and the second shows the response time. You will probably have heard the express below the knee of the curve and this is an the point that is to the left of the bend in the response time graph.

Throughput Graph

The graphs above where actually generated using a spreadsheet model for the performance of a closed loop model. This is like LoadRunner and other testing tools where the are a fixed number of users that use the system then wait and return to the system. The reality is that the performance graphs may look different from the expected norm. An example is shown below from a LoadRunner test the first graph shows how the number of VUser where increased during the test and the second graph shows the increase in response times. In this case the jump in response time is dramatic. However, in some cases the increase in response time will be less dramatic as the system will start to error at high loads which will distort the response time figures.

Having discovered there is a bottleneck in the system then you have to start looking for it.

# Scalability

Scalability can be defined in many ways. However, in general it is the relationship of how an output increases with a change in input. Typically we may think of how throughput changes as we increase the number of CPUs. In a perfect world we would like to have linear scaling. . I came across a good example of non-linear scaling. It is from a presentation presented by Peter Hughes. It is where you are having a dinner party and have 1 meter square tables each table seats 4 people. As it is a dinner party you want to have everybody facing each other as much as possible. So with one table you can sit 4 people.

To increase the number of guests you need 4 tables but you can now only sit 8 people.

To increase the number of guests again you need 9 tables but you can now only sit 12 people.

If you plot the relationship between guests and tables on a graph is looks like the one below.