"We show you how to process the future".
 
SYSTEMS MANAGER CORNER
 



 


The System Manager's Column

Article 1/95

This is the first in a series of columns that will attempt to deal with issues facing the manager of a high-availability computing complex. We'll discuss topics such as performance, availability, disk management, communications management, and applications design.

I'm hoping that the column will start an exchange of information. I'd be delighted to try to respond to any questions you may have. I've published my fax and email address at the bottom of the column, so please send me your suggestions, questions, and comments.

Those of you who know me know that I prefer to work on the larger issues relating to the successful operation of a high-availability computing complex. Spending time on the nanosecond issues is a requirement during systems development. But I've found that the successful efforts of an experienced development team can be totally negated by the improper operation of the system. Operations and the outside world can destroy the responsiveness and availability of a computer system. In this column we'll suggest ways to try to bring those forces under control, or at least help you find out what's going on.

In the online world, we must always design for the worst case. Many of our systems are designed to handle that daily 10-minute peak when the market opens, or when the West Coast customers log in for their morning cash position report. If our systems are not designed for that peak, they should be.

The VOS Scheduler, or, Who's Really On First?

The priority scheduler within VOS is configurable. What you may not realize is that the scheduler as delivered by Stratus is configured so that there is not a lot of prioritizing going on. Depending on the number of processors in your module, and how busy they are, you may or may not have a problem. If you have lots of processors in your module, and they never go above 70% busy, you probably don't have a problem. If you have a single processor, and it's pretty busy, pay close attention...

As you know, the priorities in VOS go from 9 to 0, with 0 being the lowest. The scheduler maps these priorities into "quantums" (the actual priority). Quantums go from 0 to 255, with 0 being the highest(!), just the opposite of the VOS priorities. There are two sets of maps, one for the login ("interactive") processes and sub-processes, and one for the background ("batch") or started processes. The use of "batch" here is unnecessarily confusing. All started processes use the "batch" priority mapping, including the VOS system processes and communications processes.

The table below shows the actual priorities for the batch scheduler:

Batch Priority:    9   8   7   6   5   4   3   2   1   0
Actual Quantums:   2   4   6   8  10  10  10  12  12  14

Note that there is no real difference between priorities 5-4-3, and again between 2 & 1.

This table shows the initial actual priorities for the interactive scheduler:

Login Priority:    9   8   7   6   5   4   3   2   1   0
Actual Quantums:   2   2   2   2   2   2   2   2   4   4

What this means is, at least for the initial time slice, login priorities 9 through 2 are all equal to batch priority 9!! At a typical site, that means that a login user at any priority between 9 and 2 will be competing equally for CPU time against all the background processes, including VOS system processes and application requesters and servers.
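To make the overlap concrete, here is a minimal Python sketch that loads the two default tables above and reports which priorities end up tied. The helper names (group_by_quantum, login_ties_with_batch) are made up for illustration; they are not VOS commands, and the numbers are simply copied from the tables in this column.

```python
# Default quantum maps copied from the tables above.
# A lower quantum means a higher actual priority.
PRIORITIES = list(range(9, -1, -1))  # VOS priorities 9 (highest) down to 0
BATCH_QUANTUMS = [2, 4, 6, 8, 10, 10, 10, 12, 12, 14]
LOGIN_QUANTUMS = [2, 2, 2, 2, 2, 2, 2, 2, 4, 4]  # initial time slice only

batch = dict(zip(PRIORITIES, BATCH_QUANTUMS))
login = dict(zip(PRIORITIES, LOGIN_QUANTUMS))


def group_by_quantum(table):
    """Group VOS priorities that share a quantum (hypothetical helper)."""
    groups = {}
    for priority, quantum in table.items():
        groups.setdefault(quantum, []).append(priority)
    return groups


def login_ties_with_batch(batch_priority):
    """Login priorities whose initial quantum equals the given batch priority's quantum."""
    return [p for p in PRIORITIES if login[p] == batch[batch_priority]]


if __name__ == "__main__":
    print("Batch groups:", group_by_quantum(batch))
    print("Login priorities tied with batch 9:", login_ties_with_batch(9))
    print("Login priorities tied with batch 8:", login_ties_with_batch(8))
```

Under the default maps, login priorities 9 through 2 tie with batch priority 9, and login priorities 1 and 0 tie with batch priority 8.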

This graph shows the relative priorities of the batch and interactive scheduler as delivered by the VOS default configuration.

[default scheds]

The scheduler is not a preemptive scheduler. This means that if a process becomes ready to execute, and all the processors in the module are busy, that process will have to wait until a time slice expires in a busy processor. The default scheduler has time slices up to 2 seconds long. If there are five login users on a single-processor module, the background processes could end up waiting a long time before they get service. The result: possible communication protocol problems and uneven system response time.
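As a rough back-of-the-envelope check of the five-user example, the sketch below computes an upper bound on how long a newly ready process might wait on a single-processor module. It assumes, purely for illustration, that every competing process is compute-bound, sits ahead in the queue at an equal actual priority, and uses its full time slice. The function name is hypothetical, and real behavior depends on your workload.

```python
def worst_case_wait(competing_processes, time_slice_seconds):
    """Upper bound on the wait for a newly ready process on ONE processor.

    Assumption (illustrative only): each competing process is ahead in the
    queue and runs for its entire time slice without blocking.
    """
    return competing_processes * time_slice_seconds


if __name__ == "__main__":
    # Five compute-bound login users with the 2-second default slice.
    print(worst_case_wait(5, 2.0))   # 10.0 seconds
    # The same five users with the 0.25-second slice recommended later in this column.
    print(worst_case_wait(5, 0.25))  # 1.25 seconds
```

A ten-second stall is more than enough to upset a communications protocol with tight timers, which is the problem described above.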

Note: Scheduler changes have been made for the XA/R and Continuum architectures. Consult the VOS SRBs for further information.

Jon's recommended scheduler:

A caution: Applications may have gotchas which are hidden by the default VOS scheduler. Be sure to test the new scheduler settings before installing them in production.

1) Set all time slices to the same value. I recommend 0.25 seconds times the number of processors in the module. If you have a three-processor machine, set all the times to 0.75. This scheme gives you an average wait time of 0.25 seconds, if the processors are maxed. Refer to analyze_system sched_meters to see what your system is really doing.

2) Set the priorities to be meaningful and proportional to each other:

VOS Priority:                         9   8   7   6   5   4   3   2   1   0
Batch Quantums:                       2   4   6   8  10  12  14  16  18  20
Login Quantums (initial slice):       2   4   6   8  10  12  14  16  18  20
Login Quantums (subsequent slice):   12  14  16  18  20  22  24  26  30  32

The subsequent slice drops the priority of login processes that go compute-bound to something significantly lower than the batch priority (remember that a higher quantum value means a lower actual priority). I don't see a need for more than two quantum steps for each interactive priority.
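The two recommendations can be sanity-checked with a short Python sketch. It computes the time slice from the processor count and confirms that the proportional tables keep every priority distinct. The function names are hypothetical and this is not a VOS configuration tool; the quantum values are copied from the table above.

```python
# Recommended quantum maps copied from the table above (lower quantum = higher priority).
PRIORITIES = list(range(9, -1, -1))  # VOS priorities 9 down to 0
BATCH = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
LOGIN_INITIAL = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
LOGIN_SUBSEQUENT = [12, 14, 16, 18, 20, 22, 24, 26, 30, 32]


def time_slice(processors, seconds_per_processor=0.25):
    """Recommendation 1: the same slice everywhere, 0.25 s times the processor count."""
    return seconds_per_processor * processors


def is_strictly_ordered(quantums):
    """True if each lower VOS priority gets a strictly larger quantum."""
    return all(a < b for a, b in zip(quantums, quantums[1:]))


def batch_equivalent(login_priority):
    """Highest batch priority that a compute-bound login process no longer outranks.

    Returns None if the login process falls below every batch priority.
    """
    quantum = LOGIN_SUBSEQUENT[PRIORITIES.index(login_priority)]
    for priority, batch_quantum in zip(PRIORITIES, BATCH):
        if batch_quantum >= quantum:
            return priority
    return None


if __name__ == "__main__":
    print(time_slice(3))                      # 0.75 for a three-processor module
    print(is_strictly_ordered(BATCH))         # True: no two batch priorities tie
    print(batch_equivalent(9))                # 4: a compute-bound login 9 ties with batch 4
    print(batch_equivalent(4))                # None: below every batch priority
```

With these settings, even an emergency priority-9 login that goes compute-bound drops to the level of batch priority 4, so it no longer competes with the protocol and system processes.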

This graph shows the relative priorities of the batch and interactive scheduler under the proportional settings recommended above.

[relative sched]

Jon's suggested process priorities:

The priorities below assume an application with a classic requester-server structure. If your application is primarily login, this probably won't apply.

Batch:
9: X25, RJE, SNA, and other protocols
8: VOS Overseers
7: VOS Maintenance Servers
6: Application clients (terminal handlers)
5: Application Data Base Servers
0: True batch processes

Interactive:
9: Emergency SysAdmin Login
5: Default SysAdmin
4: Operations
3: Development/Test
0: Compiles, backups, etc.

Your site will, of course, be different.

 
©Copyright 2009