At The Mercy Of The Arrival Rate...
Article 01/30/96
The best-managed and best-tuned transaction systems still reject transactions for "mysterious" reasons. As system managers, we try to size for the maximum transaction rate that the system will be expected to handle. This is sizing for throughput. There is another issue which we must size for, and that is simultaneity.
Unfortunately, transaction systems must deal with the real world. The real world is not totally predictable. Requests coming from ATMs or POS devices or from other origins cannot be controlled or scheduled as we wish. It is possible, indeed probable, that at some instant during the day N people will walk up to N ATMs and submit N transactions at exactly the same moment.
Will our system be capable of handling that flood of requests?
If we have properly monitored our system, we know the maximum sustained transaction rate that our system is normally required to handle. We know this number from monitoring our system over some time period. For the sake of our discussion, let's say that we are sized for a 10 TPS rate, having observed that our peak half hour last year was about 7 TPS.
A lot of things can happen in a half hour. We saw an average of 7 TPS. What does that mean? An average of 7 TPS means that we processed 12,600 transactions in the time period. Did they arrive evenly during the period, with every minute processing 420 transactions? Did they arrive all at once, with one minute processing 12,600 transactions and the other 29 minutes processing zero transactions? Both scenarios result in an average of 7 TPS. The answer is probably no to both questions.
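The arithmetic above can be checked with a short sketch. The per-minute lists below are just the two extreme scenarios from the text, and the function name is illustrative.

```python
def average_tps(per_minute_counts):
    """Average transactions per second over the whole period."""
    return sum(per_minute_counts) / (len(per_minute_counts) * 60)

even = [420] * 30            # every minute carries 420 transactions
burst = [12600] + [0] * 29   # everything arrives in the first minute

print(average_tps(even))     # 7.0
print(average_tps(burst))    # 7.0

# The busiest minute tells a very different story: 7.0 versus 210.0 TPS.
print(max(even) / 60, max(burst) / 60)
```

The average hides the shape of the arrivals entirely, which is why the interval used for measurement matters.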
A large ATM system I work with will frequently hit 20 TPS for a half-hour period. During that same half-hour period its peak TPS (1 second interval) can easily go over 60. None of us wants to size the throughput of a system for its peak 1-second interval. It would simply be too expensive.
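One way to see the gap between the half-hour average and the 1-second peak on your own system is to bucket arrival timestamps by second. This is a minimal sketch that assumes you can extract arrival times (in seconds from the start of the period) from a transaction log; the function name and the sample data are hypothetical.

```python
from collections import Counter

def tps_profile(arrival_times, period_seconds):
    """Return (average TPS, peak 1-second TPS) for a list of arrival times.

    arrival_times: seconds since the start of the period (assumed input format).
    """
    per_second = Counter(int(t) for t in arrival_times)
    average = len(arrival_times) / period_seconds
    peak = max(per_second.values(), default=0)
    return average, peak

# Five arrivals in a 10-second window, three of them inside the same second.
sample = [0.1, 0.2, 0.3, 5.0, 9.9]
print(tps_profile(sample, 10))   # (0.5, 3)
```

Even in this tiny sample the peak second runs at six times the average rate.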
Two issues are involved here. The first is the throughput of the application system: how quickly a set of incoming requests can be processed before they time out. The second, which is related, is how many transactions can be "in flight" at a given time.
A typical transaction system consists of clients and servers, with queues between them. The clients control the devices that originate the transactions. How are they configured? Can they simultaneously manage one transaction (or more) in progress from each device? If they are controlling a merchant POS network, configuring the clients to do this is impractical. It may be possible to do this for an ATM network.
When the client places the transaction in the queue to the server, will it be accepted? What is the maximum queue depth? What happens if N customers submit their requests at the same time?
At the server, what happens? Are there enough servers to process N transactions at a time? Probably not. How many transactions can the server process (have "in flight") simultaneously? If the server is talking to a host or another set of servers, what is that host or server set's capability to process simultaneous transactions?
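A deliberately simplified model shows what happens to a burst at a single stage. It assumes an idle system, a fixed number of server processes, and a bounded queue in front of them. A real system has several such stages in series, and the names and numbers here are hypothetical.

```python
def burst_outcome(n_requests, servers, queue_depth):
    """Split a simultaneous burst into (in service, queued, rejected).

    Assumes the stage is idle when the burst arrives (a simplification).
    """
    in_service = min(n_requests, servers)
    queued = min(n_requests - in_service, queue_depth)
    rejected = n_requests - in_service - queued
    return in_service, queued, rejected

# 60 requests arrive at once; 10 servers and a queue depth of 20 (made-up values).
print(burst_outcome(60, 10, 20))   # (10, 20, 30)
```

In this example half of the burst is rejected even though the stage may have ample throughput averaged over the minute.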
How do we know what our system is handling now? We can find out from a combination of the TPS rate and the response time. To use averages, if we are currently running an average of 7 TPS, and our response time is an average of 2 seconds, then we are handling an average of 14 simultaneous transactions at any one time.
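This relationship is commonly known as Little's law: the average number of items in a system equals the arrival rate multiplied by the average time each item spends there. A minimal sketch using the figures from the text (the function name is illustrative):

```python
def in_flight(tps, response_seconds):
    """Average number of simultaneous transactions (Little's law)."""
    return tps * response_seconds

print(in_flight(7, 2))   # 14
```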
What happens if our servers slow down? For example, what happens if the host experiences some problem, or if the comm line to the host starts taking hits and we have to retransmit the transaction? If the average response time doubles, then the number of transactions in flight will double. Is our system configured to handle this?
If our system experiences frequent slowdowns, such as host response going from 2 seconds to 10 seconds for a significant period of time, then the number of in-flight transactions that we have to size for increases in proportion: 5-fold in our example.
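The same relationship can be turned around to ask how slow the host can get before a configured limit is reached. The sketch below assumes a hypothetical in-flight limit of 42. That number is only an illustration, not a figure from any particular system.

```python
def required_in_flight(tps, response_seconds):
    """In-flight transactions needed at a given rate and response time."""
    return tps * response_seconds

def saturating_response_time(configured_limit, tps):
    """Response time (seconds) at which in-flight work reaches the limit."""
    return configured_limit / tps

limit = 42  # hypothetical configured in-flight limit

print(required_in_flight(7, 10))            # 70
print(saturating_response_time(limit, 7))   # 6.0
```

With these assumed numbers, a host slowdown to 10 seconds needs room for 70 transactions, and the system starts rejecting work once response time passes 6 seconds.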
Another of my clients is a cash-management system. A couple of times a year their host will have problems, and the average session length will increase manyfold. The number of simultaneous users logged in and trying to pull reports will also increase, and occasionally hits the configured maximum.
Here are my recommendations for managing these issues:
1) Monitor your TPS and response times on the system on a half-hour or shorter basis. The larger the time frame for taking an average, the less meaningful it is.
2) Compute the average value for the number of simultaneous transactions by multiplying the TPS rate by the average response time. For non-transaction systems such as reporting systems, use some direct means to discover the average number of users online simultaneously. The accounting log or the syserr_log is a potential source of this information.
3) Configure your system to handle at least triple the average simultaneous transaction or user count. By this, I don't mean design for triple the throughput. Throughput is a different issue. I am referring to the buffer counts, queue depths, server counts, communication pipeline depth, and host server counts. We don't want to reject a transaction because there wasn't a buffer or queue to receive it.
4) If a transaction may be sent to one of several destinations, configure each destination to handle the expected in-flight maximum for that destination. For example, if 50% of our transactions at 7 TPS with a 2 second response time go to the host, configure the host interface to handle at least 21 in-flight transactions (7 TPS x 2 seconds / 2 x 3) by adjusting its buffers, queues, or comm pipeline depth.
5) When testing or benchmarking an application, always use multiple input sources to test the number of transactions the application can correctly handle simultaneously. This kind of test frequently finds unexpected lock conflict problems.
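Recommendations 2 through 4 reduce to a small calculation, sketched here. The function and parameter names are illustrative, and the factor of 3 is the "at least triple" guideline from recommendation 3, not a universal constant.

```python
import math

def size_in_flight(tps, response_seconds, share=1.0, safety_factor=3):
    """In-flight capacity to configure for one destination.

    share: fraction of all transactions routed to this destination.
    safety_factor: multiple of the average to size for (3 per the text).
    """
    average = tps * share * response_seconds
    return math.ceil(average * safety_factor)

print(size_in_flight(7, 2))              # 42 for the system as a whole
print(size_in_flight(7, 2, share=0.5))   # 21 for a host taking half the traffic
```

Apply the result to buffer counts, queue depths, server counts, and pipeline depth for each destination, and recompute it whenever the monitored TPS or response time changes.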
