Disks and Disk Cache
Article 4/95
This month the System Manager's Corner will deal with disk activity. We will try to help you find the answers to these questions:
- Are your disks too busy?
- Did you assign enough memory for disk cache?
Disks are probably the most important potential bottleneck on the system. Transaction-based systems frequently retrieve data from disks for each transaction, and invariably record each transaction on the disk. If a disk becomes too busy, response times for the disk and subsequently for the entire transaction system can become unacceptably long.
Disks are usually the reason for the "VOS mini-hang" phenomenon, where the system goes away for 10 to 20 seconds, and then comes back. Take a look at Paul Green's article on this subject in the September, 1994 VOS Corner.
Hot Disk List:
Paging Disks: If there is a lot of process start/stop, or program chaining, the disks with your paging partitions and your command libraries will be heavily used. Paging uses the page partition for unshared program memory, and the ".pm" file for shared program memory. To fix, spread your paging partitions and command libraries across multiple disks.
process_dir: Temporary files, and especially sort work files, live on this disk. The process_dir can be moved to a less busy disk, or you can use the -work_dir parameter on sort and create_index.
Logs: Application log files, especially indexed ones, can make a disk very busy. Over-frequent use of RUNOUT can compound the problem.
When you try to balance disks, there are several important factors to keep in mind.
1) Never allow your programmers to "hard code" a pathname in the program. Always make that path relative to the module's master disk. Always include the file's hierarchy in the pathname. If you do this, at any time you can move the file and use links to point to the correct disk. For example, code: (master_disk)prodcashwirewireswire_950401_123317 This way, to balance the disks, any part of the path "prodcashwirewires" can be a link to a directory anywhere on the disk farm.
2) Writes are more expensive than reads. If you have verify on, writes are three to four times more expensive than a read operation. Remember that a read takes one physical operation. A write requires a physical write to each disk of the pair, and then a verify of each disk of the pair. No, I don't recommend turning verify off.
3) Write (or update) operations to indexed files are very expensive. You may end up doing one logical write to the data file, and then more than N additional writes to the index blocks of N indexes. I once had a client who couldn't understand why his system was slow. I showed him that for every 120-byte record he added to the file, he ended up modifying over 1.2 megabytes of index blocks.
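The arithmetic behind points 2 and 3 can be sketched in a few lines of Python. This is only a rough lower-bound model, not measured VOS behavior: the function name is hypothetical, and it assumes exactly one index-block write per index, where the text above notes that the real count is often higher.

```python
def physical_ops_per_write(num_indexes=0, duplexed=True, verify=True):
    """Rough lower bound on physical disk operations for one logical write.

    Assumptions (illustrative only):
    - each block written costs one physical write per disk of the pair,
      plus one verify per disk when verify is on;
    - each index costs at least one additional block write.
    """
    disks = 2 if duplexed else 1
    ops_per_block = disks * (2 if verify else 1)
    blocks_written = 1 + num_indexes  # data block plus one block per index
    return blocks_written * ops_per_block


READ_OPS = 1  # a read takes one physical operation

print(physical_ops_per_write())               # 4: plain file, duplexed, verify on
print(physical_ops_per_write(num_indexes=3))  # 16: same write with three indexes
```

Even under these optimistic assumptions, one record added to a file with three indexes costs sixteen times as much as a single read.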
Disk activity and disk cache cannot be investigated separately, because disk activity depends heavily on the cache hit ratio. More precisely, disk activity is directly proportional to the cache miss ratio: a cache miss usually means a physical I/O to a disk.
The first question we need to ask is: Is there an artificial limit being imposed on the cache_manager? I have seen quite a few systems that have been choked by unnecessarily restricting cache memory. To check on your system, log in as privileged and type:
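To see why the miss ratio is the number that matters, here is a minimal Python sketch. The function name and the request count of 100,000 are made up for illustration, and it assumes every cache miss costs exactly one physical I/O.

```python
def physical_reads(logical_reads, hit_ratio):
    """Estimate physical disk reads, assuming one physical I/O per cache miss."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return logical_reads * (1.0 - hit_ratio)


# Hypothetical workload of 100,000 logical reads at three hit ratios.
for hit_ratio in (0.95, 0.90, 0.80):
    print(hit_ratio, round(physical_reads(100_000, hit_ratio)))
```

Under these assumptions, a drop in hit ratio from 95% to 90% looks small but doubles the physical disk activity, from roughly 5,000 reads to roughly 10,000.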
display_tuning_parameters
You should see the report below. The line containing the "max buffers" value is what we are interested in. If this number is less than 2,048 someone may have deliberately restricted your cache. Go look in your module_start_up.cm for the set_tuning_parameters command that is restricting cache buffers. Ask around to find out why it is there.
The tuning parameters on %sys#m1 are:
max events per process 64
cache_manager, max buffers 8192
cache_manager, min buffers 32
cache_manager, max virtual pages 8192
cache_manager, modified grace time 60 seconds
cache_manager, unreferenced grace time 300 seconds
cache_manager, referenced grace time 60 seconds
cache_manager, free grace time 300 seconds
max processes per module 1023
recover_disk priority 1
scheduler short wait timeout 0.000 seconds
unused directory timeout 120 seconds
max events per task 64
max events per module 0
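If you capture this report to a file, the 2,048 check can be automated. The Python sketch below is an illustration only: the function name is hypothetical, and it assumes the "cache_manager, max buffers" wording shown in the report above.

```python
import re


def max_cache_buffers(report_text):
    """Return the 'cache_manager, max buffers' value, or None if not found."""
    match = re.search(r"cache_manager,\s+max buffers\s+(\d+)", report_text)
    return int(match.group(1)) if match else None


# A few lines copied from the report above.
sample = """\
The tuning parameters on %sys#m1 are:
max events per process 64
cache_manager, max buffers 8192
cache_manager, min buffers 32
"""

value = max_cache_buffers(sample)
if value is None:
    print("max buffers line not found in the report")
elif value < 2048:
    print(f"max buffers = {value}: cache may have been deliberately restricted")
else:
    print(f"max buffers = {value}: no obvious restriction")
```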
Restricting the amount of cache was occasionally done in the past as a supposed method of decreasing the probability that modified disk cache blocks would be caught in memory during a crash, thereby causing corrupted indexes. VOS has advanced quite a bit since those early years, and this method should not be used. It has a high negative performance impact, and is not a substitute for runout or transaction protection.
If the timers have been modified from the default tuning_parameters, pay attention. Doing this incorrectly can contribute to the VOS mini-hang phenomenon. I typically don't change them.
Now we need to do a little bit of research and gather some data on your system. The command macro shown below is designed to capture all the relevant statistics that we will need. Once started, it will log cache, page, and disk meters every thirty minutes to a log called "mon_disks.(date)".
Type in the macro "mon_disks.cm".
&begin_parameters
INTV minutes:number,required,min(10),=30
&end_parameters priv
&
&echo command_lines
&
&label TOP
!create_file mon_disks.(date)
!set_implicit_locking mon_disks.(date)
!attach_default_output mon_disks.(date) -append
&
!display_tuning_parameters
&
&set_string DATE (date)
&
&attach_input
!analyze_system
&
&label AS_TOP
&
&set TIME (time)
&display_line *** dump_cache_info &TIME&
dump_cache_info
&
&set TIME (time)
&display_line *** cache_meters -report &TIME&
cache_meters
&
&set TIME (time)
&display_line reset cache_meters &TIME&
cache_meters -reset
&
&set TIME (time)
&display_line *** page_meters -report &TIME&
page_meters -report -reset
&
&set TIME (time)
&display_line *** disk_meters -report &TIME&
disk_meters -report -reset -brief
&
!sleep -until (substr (date_time + &INTV& minutes) 1 13)0
&
&set_string NDATE (date)
&if &NDATE& = &DATE& &then &goto AS_TOP
&
!quit
&
!detach_default_output
&goto TOP
&
& ***********************************************
Start it with the command "start_process 'mon_disks 30' -privileged". FYI, this process takes less than 0.4% of an M210-class processor at a 30-minute interval.
I recommend that you run this macro for at least a day, if not continuously. After it has run for a while, we are ready to get the answers to some questions:
First Question: Are the disks too busy, or are they unbalanced?
Answer: Issue the command "display mon_disks.(date) -match m1_d". The results should look like this (I fudged the heading for readability). Each repetition of the same disk in the displayed lines is for a different time interval.
busy   con  con  ret  Nrm  Rcv  da  dv  fa  dm  devID     drive
   5   650   33    0    0    0   0   0   0   0  26/01/01  m1_d01.0.pri
   3  1472   49    0    0    0   0   0   0   0  26/02/01  m1_d02.0.pri
   3  1446   48    0    0    0   0   0   0   0  27/02/01  m1_d02.0.sec
  25    97   24    0    0    0   0   0   0   0  27/01/01  m1_d01.0.sec
Pay attention only to the first column, which is the disk % busy value. Jon's rule of thumb: If either disk of a pair is over 60% busy during peak transaction time, it's too busy. I prefer to keep all the disks that are involved in the transaction processing at or below 30% busy. On the other hand, during batch time a disk can and should go to 100% busy. Yes, it will slow down transaction processing, but hopefully you are running batch at your slack transaction time.
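Once the log has a day or more of samples, scanning it by eye gets tedious. The following Python sketch applies the rule of thumb to the matched lines. It is an illustration only: the function name is hypothetical, and it assumes the twelve-column layout of the sample above (busy % first, drive name last), which may not match the output on every release.

```python
def busy_disks(lines, threshold=60):
    """Return (drive, busy_percent) for each sample busier than the threshold."""
    flagged = []
    for line in lines:
        fields = line.split()
        # Assumed layout: 12 columns, busy % in the first, drive name in the last.
        if len(fields) != 12 or not fields[0].isdigit():
            continue  # skip headings and anything that is not a data row
        busy, drive = int(fields[0]), fields[-1]
        if busy > threshold:
            flagged.append((drive, busy))
    return flagged


# The four sample rows shown above.
sample = [
    "5 650 33 0 0 0 0 0 0 0 26/01/01 m1_d01.0.pri",
    "3 1472 49 0 0 0 0 0 0 0 26/02/01 m1_d02.0.pri",
    "3 1446 48 0 0 0 0 0 0 0 27/02/01 m1_d02.0.sec",
    "25 97 24 0 0 0 0 0 0 0 27/01/01 m1_d01.0.sec",
]

print(busy_disks(sample, threshold=60))  # [] : nothing over the 60% limit
print(busy_disks(sample, threshold=20))  # [('m1_d01.0.sec', 25)]
```

The threshold of 20 in the second call is deliberately low, just to show what a flagged sample looks like. Use 60 for the hard limit and 30 for the preferred ceiling.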
If your disks are too busy, you can:
a) Increase your disk cache, maybe (see below).
b) Move files and directories around to spread the workload to less-busy disks. (Preferred method.)
c) Add members to the logical volume. This costs dollars, but it can be a very effective way to increase the throughput of a logical disk. If and when you do this, offload, empty, and reload the entire logical volume. Just adding a disk to a set that is already full will merely move the new work to the new disk, and the other disks will not share equally in the workload unless the files are distributed equally across the members.
Second Question: Is the cache under pressure, and would there be a benefit if I gave the system more disk cache memory?
Answer: Issue the command "display mon_disks.(date) -match phys:". (Don't forget the colon ":".) The results should look like this (I fudged the heading for readability). Each line displayed is for a different time period.
                         real   mem
        max   cur   min   max   max
phys:  8192   997    32  1939  1939
phys:  8192  1011    32  1939  1939
phys:  8192  1011    32  1939  1939
phys:  8192   929    32  1939  1939
How to read this: The first column after "phys:" shows the set_tuning_parameters limit for physical cache buffers. This is the artificial constraint, if there is one. The next column, "cur", shows what the cache manager has allocated at the time of the sample. The fourth column, "real max", is the limit the cache manager imposes on itself, based on the memory configuration of the computer and operating system.
If the "cur" column goes to the "real max" value and stays there, your cache is "under pressure." That means that if you gave it more memory, the cache manager would be able to use it. Conversely, if the "cur" column is consistently below "real max", the cache manager would not use more cache even if you allocated it. In the case shown above, cache is not under pressure.
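The same test can be sketched in Python for a long log. This is an illustration under assumptions: the function name is hypothetical, the five numbers after "phys:" are taken to be max, cur, min, real max, and mem max as in the sample above, and "stays there" is read strictly as "every sample is at real max", which you may want to relax.

```python
def cache_under_pressure(lines):
    """Return True if 'cur' sits at 'real max' in every phys: sample."""
    samples = []
    for line in lines:
        fields = line.split()
        # Assumed layout: phys: max cur min real_max mem_max
        if len(fields) == 6 and fields[0] == "phys:":
            _max_buf, cur, _min_buf, real_max, _mem_max = map(int, fields[1:])
            samples.append((cur, real_max))
    # No samples means no evidence of pressure.
    return bool(samples) and all(cur >= real_max for cur, real_max in samples)


# The four samples shown above.
sample = [
    "phys: 8192 997 32 1939 1939",
    "phys: 8192 1011 32 1939 1939",
    "phys: 8192 1011 32 1939 1939",
    "phys: 8192 929 32 1939 1939",
]

print(cache_under_pressure(sample))  # False: cur stays well below real max
```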
DON'T try to force up the amount of memory allocated to cache, yet. This could have dire consequences for paging and memory available for programs and data. In our next column we'll talk about the memory indicators that will show if you have enough spare memory to give to the cache_manager.