 
SYSTEMS MANAGER CORNER
 



Disk Cache and Memory

Article 7/95

Last time we talked about how to determine how much memory is allocated to the disk cache manager. Many sites find that their disk cache has been artificially restricted by someone placing a set_tuning_parameters command in their module_start_up.cm. Before you remove this restriction, you need to confirm that enough memory is available and that you will not adversely affect the memory set aside for the system, programs, and data.

Page faults are not a good indicator of whether memory is available. Most systems run with memory available, and also with significant page faulting. Page faults are caused by three events:

1) Loading a new program into memory. VOS loads an initial subset of the program into the process address space and then begins the program's execution. This initial subset is minimal, so as the program begins to execute it causes page faults whenever it references code that is not yet in memory.

2) Allocation of data space. As a program runs, it may call the VOS allocate function to acquire memory for data structures. The operation of acquiring and accessing this data can cause page faults.

3) Contention for memory, or thrashing. If too many processes are demanding memory, the system will "thrash." This means that memory pages in use by one process must be flushed out to disk so that a competing process can run. This particular type of page faulting is known as a "force-evict" page fault, or "eviction."

Evictions are bad. They cause significant overhead and noticeably degrade system performance. There is only one way to detect evictions, and that is through the analyze_system page_meters command.

Last month I gave you the macro mon_disks.cm. If you have been running it, you have been gathering data on the state of your cache, disks, and memory. The numbers we want to look at are in the page_meters report. Examine the "Free pages:" and "Force evict:" lines in the sample below.

Total metering time: 356:55:53

Total pages:                10237
Wired pages:                  542
Temp-wired pages:            3113
Free pages:                   682
Kernel in-memory pages:      5903
Paged in-use pages:          5900

                            Total    ATB/msec
Page faults:               344024        3735
Kernel page faults:          1313      978638
Average page fault time:  3.59 msec

                            Total    ATB/msec
New pages:                 212913        6035
Reads:                     483279        2658
Disk:                      115769       11099
Null:                      228589        5621
No I/O:                    138921        9249
Writes:                      1298      989948
PC read/write waits:            0
Posts:                     117067       10976
Posts queued:                1466      876502
Max post queue depth:          14

Memory Management           Total    ATB/msec
Get memory:                437211        2938
Free taken:                437211        2938
Force evict:                    0
Evict free taken:               0
Evictions:                      0
Laps:                           0
---- etc. ----

Free pages is an indication of how much slack we have to work with. The first thing to say is that a value of zero on this line doesn't necessarily mean your module is in trouble. I know systems that run consistently, and run well, with zero free pages. Force-evict page faults will tell you if you have a problem.

If free pages is non-zero, its value indicates how much memory is available for use. Note that this number is neither a high-water nor a low-water mark, but rather a snapshot. A lot can happen in a 30-minute interval.

Force evict will tell us if we have run out of memory at any time during the interval. If this value is zero, then we are fine. If it is non-zero, then we know that we are occasionally out of memory and are thrashing. If this is the case, we should be very careful about doing anything that would increase memory usage.

Jon's rule of thumb: Force-evict page faults are okay up to a point. If the ATB (average time between, in milliseconds) value for the force-evict page faults is 2000 (one eviction every 2 seconds) or larger, then the module is probably okay. Anything smaller than that is a cause for concern. If the ATB number is consistently below 500 (more than 2 evictions per second), your module doesn't have enough memory.
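The rule of thumb can be sketched in a few lines of Python. This is only an illustration, not part of mon_disks.cm or any VOS tool: the function names are hypothetical, and it assumes that ATB is simply the metering time in milliseconds divided by the event count. That assumption agrees with the sample report, where 344024 page faults over 356:55:53 work out to about 3735 msec. Because the rule speaks of values that are "consistently" low, a single sample should be read as a hint rather than a verdict.

```python
from typing import Optional


def atb_msec(metering_time: str, count: int) -> Optional[float]:
    """Average time between events in msec, or None if there were no events.

    metering_time uses the report's hours:minutes:seconds form, e.g. "356:55:53".
    Assumption: ATB = total metering time / number of events.
    """
    hours, minutes, seconds = (int(part) for part in metering_time.split(":"))
    total_msec = (hours * 3600 + minutes * 60 + seconds) * 1000
    return total_msec / count if count else None


def classify_evictions(atb: Optional[float]) -> str:
    """Apply the rule-of-thumb thresholds to one force-evict ATB sample."""
    if atb is None:          # zero force evicts during the interval
        return "fine"
    if atb >= 2000:          # one eviction every 2 seconds, or rarer
        return "probably okay"
    if atb >= 500:           # between 0.5 and 2 evictions per second
        return "cause for concern"
    return "not enough memory"  # only if the value is consistently this low


if __name__ == "__main__":
    # The sample report shows zero force evicts.
    print(classify_evictions(atb_msec("356:55:53", 0)))
```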

If your disk cache is "under pressure" (see last month's Systems Manager Corner), you have free memory and a zero or very low eviction rate, and the max_buffers value allowed to the cache manager is artificially restricted, you can increase the max_buffers parameter and give more memory to the cache manager. Unless the "free pages" value is large and you have zero evictions, I recommend that you increase the max_buffers parameter in steps of 500 to 1,000 pages, checking each time that the eviction rate has not increased.
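The stepwise approach might be planned like this. It is a minimal sketch under stated assumptions: the function names, the target value, and the idea of comparing one ATB sample before and after each step are all illustrative, and nothing here issues a VOS command. An eviction rate goes up when the force-evict ATB goes down, so the check compares rates derived from ATB.

```python
from typing import List, Optional


def plan_steps(current: int, target: int, step: int = 500) -> List[int]:
    """List the intermediate max_buffers values between current and target.

    The step is kept within the 500 to 1,000 page range recommended above.
    """
    if not 500 <= step <= 1000:
        raise ValueError("step should be between 500 and 1,000 pages")
    values = []
    while current < target:
        current = min(current + step, target)
        values.append(current)
    return values


def eviction_rate_increased(atb_before: Optional[float],
                            atb_after: Optional[float]) -> bool:
    """True if evictions became more frequent; None means zero evictions."""
    rate_before = 0.0 if atb_before is None else 1000.0 / atb_before
    rate_after = 0.0 if atb_after is None else 1000.0 / atb_after
    return rate_after > rate_before


if __name__ == "__main__":
    # Hypothetical values: raise the limit from 2000 to 4200 pages.
    for value in plan_steps(2000, 4200, step=1000):
        print("try max_buffers =", value, "then recheck page_meters")
```

If eviction_rate_increased returns True after a step, the cautious reading of the advice above is to stop and go back to the previous value.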

More on Multi-Member Logical Disks

One way to help a disk that is too busy is to add a member to the logical disk. The workload is then spread across another set of physical drives and, ideally, all the drives share it equally. If your site chooses to do this, however, you need to be aware of the following issue.

Two of my clients discovered that their logical disks weren't balanced properly, with the result that only one member of the logical disk set was handling all the workload. This resulted in a slowdown of the online application. In both cases the cause was the same.

VOS allocates files equally across all members of a logical disk. To state this differently, as a file grows the blocks that are allocated for that file are spread equally across all members of a logical disk. This occurs unless a particular member runs out of space. In that case, the blocks that are allocated are placed on the members that still have space remaining. A very busy file could end up having all of its active blocks reside on a single member, if the other members are out of space.

When you add a member to an existing logical disk, offload, empty, and reload the entire logical volume. Simply adding a disk to a set that is already full will merely move the new work to the new disk, and the other disks will not share equally in the workload unless the files are distributed equally across the members.

Both clients had logical disks whose members were of unequal size. One set consisted of 1.5, 1.5, and 3.2 GB disks. They started having problems when the total space allocated exceeded 4.5 GB, which filled up the two smaller members. This forced all new file allocations onto the 3.2 GB member, with the resultant slowdown.

The solution is either to delete some files so that space becomes available on all the disks (the total allocated drops below 4.5 GB), or to make all members of a logical disk the same size.
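The 4.5 GB threshold follows from the arithmetic: with three members sharing allocations equally, the 1.5 GB members are full once each member holds 1.5 GB, or 3 x 1.5 = 4.5 GB in total. The toy simulation below illustrates this. It assumes a simple round-robin placement that skips full members, which is only a stand-in for whatever VOS actually does internally, and it counts space in tenths of a gigabyte to keep the numbers whole.

```python
from typing import List


def allocate(capacities: List[int], units: int) -> List[int]:
    """Spread `units` of space round-robin across members, skipping full ones.

    Returns the space used on each member. Round-robin is an assumed model
    of "spread equally", not a description of the real VOS allocator.
    """
    used = [0] * len(capacities)
    member = 0
    for _ in range(units):
        # Advance past full members; give up if every member is full.
        for _ in range(len(capacities)):
            if used[member] < capacities[member]:
                break
            member = (member + 1) % len(capacities)
        else:
            raise ValueError("logical disk is full")
        used[member] += 1
        member = (member + 1) % len(capacities)
    return used


if __name__ == "__main__":
    members = [15, 15, 32]            # 1.5, 1.5 and 3.2 GB in tenths of a GB
    print(allocate(members, 45))      # at 4.5 GB the small members are full
    print(allocate(members, 50))      # beyond that, only the large one grows
```

In this model, every unit allocated past 4.5 GB lands on the third member, which is the imbalance both clients ran into.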

 
©Copyright 2009