Fundamentals of Capacity Management
(Part III) Relating demand to usage.
As we discussed in the last article in this series, you must always design for your system's peak half-hour. By that we mean the peak half-hour as measured by the demand on the system: for most on-line systems, the transaction peak.
The ugly fact is that usage of the system may not always be related to demand. In fact, on most systems, usage deviates significantly from demand over the course of the day. Lots of things interfere: two unrelated applications may compete with each other; periodic data extracts and data transmissions must occur on their own schedule; operations must run backups and other maintenance jobs; and there's always the daily batch processing.
So, how do you get a handle on what's going on? We recommend that you do two things. Look at the peak half-hour, and investigate a heavy day.
Remember when we talked last time about capturing data from the peak half-hour? "You also will want to know the system's usage for that peak half-hour. Note CPU use, memory use, cache hit ratio, disk busy for each disk, and the throughput on all of the transaction-related comm lines during that half-hour." You should be keeping an 18-month history of this as well. But keep it in a particular format: Keep each number as a ratio to the transaction rate. For example, we recommend that you keep the CPU and disk busy numbers as a "percent per 1 TPS." The table below shows a relatively well-behaved system. We call it well-behaved since the ratios are not changing that much.
| Activity / 1 TPS | TPS | CPU % | Intr % | PgFlts | Disk I/O |
|---|---|---|---|---|---|
| 12:02 Sa 10/31/98 | 42 | 10.44 | 1.93 | 0.00 | 10.22 |
| 17:02 Fr 09/25/98 | 41 | 10.43 | 1.98 | 0.00 | 11.56 |
| 17:02 Fr 08/28/98 | 38 | 11.88 | 2.36 | 0.14 | 12.18 |
| 12:02 Fr 07/31/98 | 37 | 11.23 | 2.20 | 0.03 | 9.07 |
| 12:02 Fr 06/19/98 | 32 | 10.76 | 2.16 | 0.01 | 9.48 |
| 17:02 Fr 05/29/98 | 36 | 10.85 | 2.20 | 0.02 | 10.49 |
| 12:02 Fr 04/17/98 | 27 | 11.16 | 2.38 | 0.01 | 10.62 |
| 12:02 Fr 03/13/98 | 27 | 10.95 | 2.35 | 0.01 | 10.37 |
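One way to put a number on "not changing that much" is to look at how widely a per-TPS ratio ranges across the history. The sketch below uses the CPU column from the table above; the helper names and the choice of (max - min) / mean as the measure are illustrative assumptions, not something prescribed by this series.

```python
from statistics import mean


def per_tps(measurement, tps):
    """Normalize a raw peak-half-hour measurement to 'per 1 TPS'."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return measurement / tps


def relative_spread(ratios):
    """Range of the ratios as a fraction of their mean (assumed measure)."""
    return (max(ratios) - min(ratios)) / mean(ratios)


# CPU % per 1 TPS for each recorded peak half-hour, newest first,
# copied from the table above.
cpu_per_tps = [10.44, 10.43, 11.88, 11.23, 10.76, 10.85, 11.16, 10.95]

spread = relative_spread(cpu_per_tps)
print(f"CPU per TPS ranges over {spread:.1%} of its mean")
```

A small spread suggests the cost of a transaction is holding steady as volume grows. What counts as "small" is a judgment call for your own system.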
A less well-behaved system is below. Look at the CPU/Sess/Minute column and note the bad day that they had on 9/28. Clearly this requires investigation to see what added CPU processing to the peak half-hour.
| Peak 1/2 Hr. | Sessions | Sessions/Minute | CPU % Busy | CPU/Session/Minute |
|---|---|---|---|---|
| 08:00 Th 10/15/98 | 1,630 | 54.3 | 56.5% | 1.0% |
| 08:00 Mo 09/28/98 | 1,645 | 54.8 | 77.1% | 1.4% |
| 08:00 Tu 08/25/98 | 1,498 | 49.9 | 48.8% | 1.0% |
| 08:00 Mo 07/06/98 | 1,827 | 60.9 | 48.4% | 0.8% |
| 08:00 We 07/01/98 | 1,544 | 51.5 | 48.1% | 0.9% |
| 08:00 We 05/06/98 | 1,792 | 59.7 | 53.9% | 0.9% |
A few other things are worth noting in the table above. The heaviest half-hour was on 7/6, and it also has the best (lowest) ratio of CPU/Sess/Minute. This is not unusual. Either (as sometimes happens) the system is more efficient as it gets busier, or there is some background processing component which becomes proportionally less important as transaction counts increase.
Also note that the CPU/Sess figure is creeping upward, from 0.9% in May to 1.0% in October. As a systems manager, you need to keep an eye on this: Is the application getting less efficient? Is more background processing taking a larger share of CPU?
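The derived columns in the session table can be reproduced from the two raw measurements, and the bad day can be picked out mechanically. This is a minimal sketch: the 25% tolerance over the median is an assumed threshold chosen for illustration, and the names are hypothetical.

```python
from statistics import median

HALF_HOUR_MINUTES = 30

# (date, sessions in the peak half-hour, CPU % busy), from the table above.
peaks = [
    ("10/15/98", 1630, 56.5),
    ("09/28/98", 1645, 77.1),
    ("08/25/98", 1498, 48.8),
    ("07/06/98", 1827, 48.4),
    ("07/01/98", 1544, 48.1),
    ("05/06/98", 1792, 53.9),
]


def cpu_per_session_minute(sessions, cpu_busy):
    """CPU % busy divided by the session rate (sessions per minute)."""
    sessions_per_minute = sessions / HALF_HOUR_MINUTES
    return cpu_busy / sessions_per_minute


ratios = {day: cpu_per_session_minute(s, cpu) for day, s, cpu in peaks}

# Assumption: a day is "bad" if its ratio is more than 25% above the median.
TOLERANCE = 1.25
typical = median(ratios.values())
flagged = [day for day, r in ratios.items() if r > TOLERANCE * typical]

for day, r in ratios.items():
    marker = "  <-- investigate" if day in flagged else ""
    print(f"{day}  {r:.1f}%{marker}")
```

Comparing against the median rather than the mean keeps a single bad day from dragging the baseline up toward itself.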
Now, by contrast, do the same comparison for a full day (by half-hour). You will see the interference of all the other processing that must be done. The table below has been edited for space.
| Half-hour | tps | cp/tps | cpu | dk/tps | disk | trans |
|---|---|---|---|---|---|---|
| 02:00 Mo 10/26/98 | 0.4 | 38.0 | 13.8 | 92.8 | 33.7 | 654 |
| 06:00 Mo 10/26/98 | 6.3 | 3.3 | 21.0 | 9.0 | 56.3 | 11310 |
| 11:00 Mo 10/26/98 | 10.6 | 2.1 | 21.7 | 3.6 | 38.6 | 19053 |
| 11:30 Mo 10/26/98 | 10.7 | 1.9 | 20.8 | 3.5 | 37.1 | 19301 |
| 12:00 Mo 10/26/98 | 11.4 | 3.8 | 43.5 | 14.8 | 169.3 | 20540 |
| 13:00 Mo 10/26/98 | 10.9 | 2.4 | 25.8 | 9.4 | 102 | 19554 |
| 13:30 Mo 10/26/98 | 10.7 | 4.5 | 48.4 | 60.8 | 648.3 | 19193 |
| 14:00 Mo 10/26/98 | 11.0 | 2.2 | 23.8 | 4.1 | 45.2 | 19838 |
| 17:30 Mo 10/26/98 | 14.6 | 1.8 | 25.9 | 3.1 | 45.5 | 26349 |
| 18:00 Mo 10/26/98 | 13.4 | 3.3 | 44.2 | 13.1 | 175.5 | 24077 |
| 18:30 Mo 10/26/98 | 11.1 | 2.5 | 28.3 | 6.6 | 72.8 | 20002 |
| 21:30 Mo 10/26/98 | 4.3 | 2.9 | 12.7 | 4.9 | 21.3 | 7807 |
| 22:00 Mo 10/26/98 | 3.3 | 8.4 | 28.0 | 29.0 | 96.8 | 6012 |
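The same normalization can be applied to every half-hour of the day to show where usage departs from demand. The sketch below recomputes tps and cp/tps from the cpu and trans columns above, then lists the half-hours whose CPU cost per TPS is well above that of the demand peak. The 1.5x factor is an assumption chosen for illustration, as are the names.

```python
SECONDS_PER_HALF_HOUR = 30 * 60

# (half-hour, cpu, transactions in the half-hour), from the table above.
day = [
    ("02:00", 13.8, 654),
    ("06:00", 21.0, 11310),
    ("11:00", 21.7, 19053),
    ("11:30", 20.8, 19301),
    ("12:00", 43.5, 20540),
    ("13:00", 25.8, 19554),
    ("13:30", 48.4, 19193),
    ("14:00", 23.8, 19838),
    ("17:30", 25.9, 26349),
    ("18:00", 44.2, 24077),
    ("18:30", 28.3, 20002),
    ("21:30", 12.7, 7807),
    ("22:00", 28.0, 6012),
]


def cpu_per_tps(cpu, trans):
    """CPU figure divided by the half-hour's transactions per second."""
    tps = trans / SECONDS_PER_HALF_HOUR
    return cpu / tps


ratios = {time: cpu_per_tps(cpu, trans) for time, cpu, trans in day}

# The demand peak is the half-hour with the most transactions.
peak_time = max(day, key=lambda row: row[2])[0]

# Assumption: flag any half-hour costing more than 1.5x the peak's CPU per TPS.
FACTOR = 1.5
flagged = [t for t, r in ratios.items() if r > FACTOR * ratios[peak_time]]

print("demand peak:", peak_time)
print("out of line with demand:", ", ".join(flagged))
```

Using the demand peak as the baseline fits the design rule from earlier in this series, since that is the half-hour the system is sized for.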
So what's causing the variance? Background processing. Is there an opportunity here? Tune in next time.