"We show you how to process the future".
 
BAN BOTTLENECKS
 


Overview

The Service

Summary

Ban Bottlenecks® is a performance and capacity audit service. For organizations that lack the talent, the tools, or, most importantly, the time to carry out a performance/capacity review, the Ban Bottlenecks service performs it with minimal impact on your systems and staff.

Ban Bottlenecks is a thorough "black-box" analysis of your critical systems. It is secure because it requires neither physical nor network access to your systems.

It provides:
  • Our free Toolkit for connectionless, high-resolution, low-overhead data collection.
  • Coverage of all important objects, including processes and communications ports.
  • Business transaction traffic analysis with service level reporting by service provider, transaction originator, and transaction authorizer.
  • Automatic daily reports covering the business, CPUs, memory, paging, disks, processes, and communications, suitable for feeding into SAS.
  • Automatic weekly peak and problem analysis reports.
  • Near-real-time ad-hoc analysis reports.
  • Analyst-scored, easily navigated, superbly detailed, customized Ban Bottlenecks reports with 24-month historical and forward-looking perspectives.
  • "Needle in the haystack" problem determination across multiple systems, tiers, and/or architectures.
  • Web conferences to review the report, discuss our findings, understand your technical plans, and learn your business plans so that we can plan for the future with you.

The ITIL Model: Continuous, Closed-Loop Process

Ban Bottlenecks follows a process closely modeled on the IT Infrastructure Library (ITIL) definition of Capacity Management. It includes:

  • Continuous measurement of the system factors.
  • Capture of the business traffic and service levels received and provided by the system.
  • Correlation and projection of the business and system factors.
  • Frequent review of the resulting information with the operations, applications, capacity/performance, and systems teams.
  • Review of the business traffic and service levels with the business representative, and incorporation of the projected business plans.
  • Monthly repetition of these steps.

Architectures Supported

  • HP NonStop (Tandem) running the NSK operating system and OSS.
  • Stratus Continuum and V-series running the proprietary VOS operating system.
  • UNIX variants: AIX, HP-UX, Solaris (Sun), and Red Hat Linux.
  • Windows 2000 and later.

Features

The Ban Bottlenecks® Report

The Ban Bottlenecks report is a customized audit of a client's computer systems. It contains a tremendous amount of information on each system and is designed to meet the needs of management and of the Business, Operations, Systems Administration, Capacity Planning, and Application departments within an organization.

The analysts' scoring and commentary allow the reader to quickly find and focus on the issues affecting the systems.

The report is HTML-based and is delivered by email, FTP, or CD/DVD. It includes the following sections:

Executive Summary

The Executive Summary is a top-level view of all the systems. It highlights problems or issues with the systems, and provides top-level links and explanations. It may contain customized reports and charts documenting system and application data including transaction rates and service levels.

Group Diagnostics

Group diagnostic reports compare and contrast sets of systems, which may be grouped by architecture and/or application. These reports are excellent at finding problem areas or devices across large populations of systems, disks, communications interfaces, and application sets.

Emphasis Reporting

Some clients have clearly defined processing windows, such as the "market open," "trading day," or batch processing periods. In such cases, additional reports and charts can be created for each window.

Individual System or "Cluster" Audits

Each system or cluster of systems will receive an individual comprehensive audit report.

Main Report

The main report begins with a Report Card where the system is assigned a grade by our analysts against a dozen or so system and application factors. Each factor is graded A-F according to our grading scale:

A - Adequate
B - Basically adequate
C - Cause for concern
D - Deficient - should be fixed
F - Failure - caused a problem or outage

Where necessary, the Report Card links to an In-Depth or Incident Report.

Checkup Report

The Checkup Report is an inventory and configuration report of the system.

Diagnostic Report

The Diagnostic Report contains a thorough analysis of the system performance and capacity factors. Each factor is usually analyzed in at least three reports:

  • Worst (high/low) or heaviest five days for the preceding month
  • Worst or heaviest day each month for the last 24 months
  • Average day for the last 24 months
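To make these three views concrete, here is a minimal Python sketch, assuming the daily values for one factor are held in a dict keyed by `datetime.date`. The function names and the `higher_is_worse` flag are hypothetical illustrations, not part of the TDI Toolkit; the flag would be set to False for a factor such as service level, where a low value is the bad one.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def worst_days(daily, year, month, n=5, higher_is_worse=True):
    """Return the n worst (date, value) pairs within one calendar month."""
    in_month = [(d, v) for d, v in daily.items()
                if d.year == year and d.month == month]
    # Sort so the worst values come first: descending if a high value is bad.
    in_month.sort(key=lambda dv: dv[1], reverse=higher_is_worse)
    return in_month[:n]


def monthly_worst_and_average(daily, higher_is_worse=True):
    """Map (year, month) to the worst single day and the average-day value."""
    by_month = defaultdict(list)
    for d, v in daily.items():
        by_month[(d.year, d.month)].append((d, v))
    pick = max if higher_is_worse else min
    summary = {}
    for ym in sorted(by_month):
        days = by_month[ym]
        summary[ym] = {
            "worst_day": pick(days, key=lambda dv: dv[1]),
            "average_day": mean(v for _, v in days),
        }
    return summary


# Illustrative values only: two months of daily CPU-busy percentages.
daily = {date(2009, 1, d): d for d in range(1, 32)}
daily.update({date(2009, 2, d): 100 - d for d in range(1, 29)})
print(worst_days(daily, 2009, 1, n=3))
print(monthly_worst_and_average(daily)[(2009, 2)])
```

Running the same functions over 24 months of daily values gives the "worst day each month" and "average day" series directly.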

The factors which usually appear in the report are:

  • Application traffic
  • Peak application half-hour
  • CPU
  • Memory
  • Paging rates
  • Paging space
  • Disk I/O
  • Disk space
  • TCP/IP traffic

Application traffic is summarized by:

  • 24-month history of the monthly total, the average day, and the peak day of each month
  • Year-to-date growth, month-to-month growth, and 12-month growth
  • 24-month history of the peak half-hour traffic and growth
  • Worst service level days
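The growth figures can be defined in more than one way. The Python sketch below assumes one common set of definitions, which may differ from the ones used in the actual reports: month-to-month growth compares the latest month with the month before it, 12-month growth compares it with the same month a year earlier, and year-to-date growth compares this year's months so far with the same months of last year. The input layout and function names are hypothetical.

```python
def pct_growth(current, previous):
    """Percentage change from previous to current."""
    return (current / previous - 1.0) * 100.0


def growth_summary(monthly):
    """monthly: chronological, gap-free list of (year, month, total) tuples
    that includes every month of the previous calendar year."""
    year, month, latest = monthly[-1]
    totals = {(y, m): t for y, m, t in monthly}
    # Month-to-month: latest month versus the month before it.
    mom = pct_growth(latest, monthly[-2][2])
    # 12-month: latest month versus the same month one year earlier.
    twelve = pct_growth(latest, totals[(year - 1, month)])
    # Year-to-date: this year's months so far versus the same months last year.
    ytd_now = sum(totals[(year, m)] for m in range(1, month + 1))
    ytd_prev = sum(totals[(year - 1, m)] for m in range(1, month + 1))
    return {
        "month_to_month": mom,
        "twelve_month": twelve,
        "year_to_date": pct_growth(ytd_now, ytd_prev),
    }


# Illustrative totals: a flat 2008, then growth in early 2009.
history = [(2008, m, 100) for m in range(1, 13)]
history += [(2009, 1, 110), (2009, 2, 110), (2009, 3, 121)]
print(growth_summary(history))
```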

Peak Reports

The Peak Reports, in conjunction with the In-Depth Reports, provide an additional level of historical detail on the operation of the system. The Peak Reports contain a 24-month history of the peak half-hour of each month for the following factors:

  • Application traffic
  • CPU
  • Disk usage
  • TCP/IP traffic in and out
  • OS transaction statistics when available
  • File operations when available

Each report includes:

  • 24-month history and month-to-month and 12-month growth analysis
  • 24-month history of the computer profile during the peak
  • 24-month history of the computer profile per TPS or transaction
  • 24-month history of the utilization by processor
  • Disk busy/queuing/cache utilization analysis of current peak
  • TCP/IP throughput analysis of the current peak
  • Legacy/proprietary protocol throughput analysis of the current peak
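Selecting the peak half-hour amounts to bucketing fine-grained samples into half-hour intervals and keeping the busiest interval in each month. The Python sketch below assumes per-minute `(datetime, count)` samples for a single factor such as application traffic; the function names are hypothetical, and this is an illustration of the idea rather than the Toolkit's own method.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def half_hour_start(ts):
    """Truncate a timestamp to the start of its half-hour interval."""
    return ts.replace(minute=0 if ts.minute < 30 else 30, second=0, microsecond=0)


def peak_half_hour_by_month(samples):
    """samples: iterable of (datetime, count).
    Returns {(year, month): (interval_start, interval_total)}."""
    totals = defaultdict(float)
    for ts, count in samples:
        totals[half_hour_start(ts)] += count
    peaks = {}
    for start, total in totals.items():
        key = (start.year, start.month)
        # Keep the busiest half-hour seen so far in each month.
        if key not in peaks or total > peaks[key][1]:
            peaks[key] = (start, total)
    return peaks


# Illustrative data: one hour of per-minute samples, busier after 09:30.
base = datetime(2009, 1, 5, 9, 0)
samples = [(base + timedelta(minutes=i), 10 if i < 30 else 20) for i in range(60)]
print(peak_half_hour_by_month(samples))
```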

In-Depth Reports

In-Depth Reports show the system in extreme detail during the peaks described above. They are provided in HTML and CSV format and show every process, processor, memory, disk, communications interface, etc. during the selected peak half-hour.

Incident Reports

Incident Reports discuss problems discovered during the analyst's review of the system. Each report describes the problem or issue, provides additional data through links to reports or charts, and may include further analysis of the system and processes similar to an In-Depth Report. It may also include a recommendation for fixing the problem or avoiding it in the future.

Last Month Charts

The Last Month charts are a suite of charts showing daily totals, averages, and distributions for the various system factors for the preceding month.

Historical Charts

The Historical charts are a suite of charts showing daily totals, averages, and distributions for the various system factors for the preceding 18 months.

Detailed Charts

The Detailed charts are a suite of charts showing totals, averages, and distributions for the various system factors for a selected week each month. The Detailed charts typically show the week by half-hour intervals.

The Ban Bottlenecks® Web Conference

System management and capacity planning cannot exist in a vacuum. For this reason we schedule a monthly web conference with our clients to discuss the report and the business. A typical agenda may include:

  • A review of the report, with focus on business growth and any problems or issues which we may have found.
  • A discussion with the client to elicit any problems or issues they may be concerned about.
  • A discussion with the client about plans for the system and application.
  • A discussion with the client about business plans such as marketing promotions, mergers/acquisitions/divestitures, and any other factor which could affect the business traffic through the system.

The Ban Bottlenecks® Process

One-Time or Ongoing

Ban Bottlenecks is available as a one-time process. To get a representative view of the system under investigation, we recommend collecting data over at least a one-month period. During this time we typically produce two to three reports (audits), with web conferences to discuss our findings. Most of our clients, however, work with us on a continuing basis, with reports on an agreed schedule. This enables us to collect and report a 24-month history of the system, with comprehensive growth analysis.

We don't:

  • Modify the OS or application.
  • Fix anything. We're strictly an advisory service.

Basic and Full Report Schedules

The Ban Bottlenecks report is available in Basic and Full versions. Basic reports do not include the Peak Reports, the Incident/In-Depth Reports, or the Historical or Detailed chart sets, and they are not reviewed by an analyst. While most of our clients request Full reports each month, others elect to receive as few as four Full reports per year. Clients with multiple systems frequently set up a rolling schedule specifying which systems receive a Full report each month. Every client receives at least a Basic report on each system each month.

Pricing

Ban Bottlenecks is a fixed-price service. It is extremely affordable, and in many cases it costs less than competing software-only products. Any cost justification of Ban Bottlenecks should also account for the significant reduction in staff time that it provides.

The Ban Bottlenecks service includes a license of the TDI Toolkit® data collection/reporting software at no extra charge.

Ban Bottlenecks pricing depends on the number and type of systems, the operating systems used, and the size of the systems. There may be an additional charge for custom Executive Summary reports and application-specific reports.

Data Collection Agent: TDI Toolkit®

The TDI Toolkit is the proprietary agent installed on each system under review. It is provided at no additional cost as part of the service. It runs as a process or service on the box and does its work with minimal impact, without requiring modification of the OS or the application.

"Don't become part of the problem!" is a rule which we strictly observe.

Data Objects Collected

We require the use of the proprietary TDI Toolkit because we collect data on more objects, in greater detail, and at a higher frequency than most other agents. This gives us the ability to quickly discover the cause and impact of transient or intermittent problems or software "features."

Data collection on the objects listed below is architecture-dependent. Collection intervals are chosen to provide the most information with the least overhead: the most important objects may be sampled every minute, while others may be sampled as infrequently as once every thirty minutes.

  • System CPU
  • Individual processors/cores/processor threads
  • Physical memory
  • Virtual memory
  • Paging activity and paging space
  • Physical disk activity and space
  • Logical disk activity and space
  • Disk cache size and usage
  • File system activity and space
  • File activity
  • Ethernet traffic by individual physical port (packets, bytes, errors)
  • Legacy communications by individual physical port (BSC, X.25/LAPB, SNA/SDLC, Token Ring, Expand, ServerNet) (packets, bytes, errors)
  • All processes, including transients
  • Threads
  • Process queues (HP NonStop and Stratus VOS)
  • OS statistics such as logins, system data transfer rates, context switches, system/exec calls, processor and disk queuing, forks
  • SQL Server

On-System Reports

The TDI Toolkit includes a daily batch job that creates on-box daily summary and detailed reports in text and/or HTML and CSV formats. These reports are not the Ban Bottlenecks report; they summarize what occurred on the box during the previous day. The daily job runs at low priority, usually in the early morning, when it has minimal impact on the system, and takes from 1 to 10 minutes to complete. It can be configured to copy the reports to a shared disk for access by support teams.

Application data reporting is provided for custom and industry-standard applications. As a matter of best practice, we always attempt to relate the business traffic to the system usage, and whenever possible we also collect and report service levels and success rates. We do not modify the application; instead, we provide report programs that read the previous day's application log. Supported applications include:

  • Banking and point of sale (Base24, ON/2, Open/2, Connix)
  • Securities industry (ticker plant and trading systems)
  • Messaging (Omnimessaging, Network Express)
  • Data transfer (Data Express)
  • Webserver traffic
  • Warehouse management

Weekly Extract

Once a week, the daily job runs a series of additional reports and then creates an extract file (packed or zipped), which must be sent to us by email or FTP. The extract includes:

  • The tdi_checkup report, an inventory and status report of the system
  • An extract of the daily summary CSV reports
  • A set of In-Depth reports showing the system during various peak events. These peak events may include:
    -- Application traffic peak
    -- Ethernet traffic peak
    -- CPU peak
    -- Disk I/O peak
    -- Page fault peak
    -- Page eviction peak
    -- TMF peak
  • An extract of the detailed CSV reports for a selected week, typically the week for the application peak

Ad-hoc reports

The TDI Toolkit is also available to client staff for ad-hoc reporting. Its utilities can be used to investigate issues, including (for most architectures) issues that occurred within the last few minutes.

Unattended operation

The TDI Toolkit is designed for unattended operation. This means:

  • The batch processing schedules itself
  • Files are cleaned up automatically
  • The Toolkit restarts itself after a system reboot

For some architectures we request that the Toolkit be configured as a "persistent" process, and/or be included in normal site monitoring to ensure that it is up. The only attention required is the transmission of the weekly extract.

Benefits

Ban Bottlenecks is intended for organizations that need a stronger system management program for their most critical systems. Most organizations have departments tasked with functions similar to those we provide through Ban Bottlenecks. However, with hundreds or thousands of systems to manage, these departments find themselves short of time or other resources when challenged to implement a complete, proactive program based on business traffic.

Typically, customer-facing or high-dollar-volume systems such as banking ATM, retail POS, customer loyalty, pharmacy, and brokerage systems require the attention to detail and forward-looking orientation that Ban Bottlenecks provides as a service.

This results in:

Problem identification

Because of what we do and how we do it, we find problems even in supposedly mature, stable systems. Our techniques examine every minute of every day, looking for unusual and out-of-pattern use of the various system factors. Because we capture and analyze data down to the individual disk, process, and communications interface, we find issues such as:

  • Intermittent, self-correcting application issues which may cause short disruptions and be hard to find.
  • Operations problems or errors, such as running backups or batch at the wrong time.
  • "Window" issues, such as batch jobs beginning to encroach on the peak online periods.
  • Service level problems. These problems may be:
    -- Internal, caused by conflicts with other applications on the system
    -- External, caused by poor service from back-ends
    -- Growth- or capacity-related, as traffic starts to hit a bottleneck
  • Customer-induced problems, such as broken or looping scripted customer interactions.

Business trend analysis

Whenever possible we capture statistics directly related to the business traffic being processed through the system. We do this without modifying the application. We provide a program which examines "yesterday's" application log. When possible, we extract and analyze:

  • Transaction (as defined by the application) counts and rates
  • Service levels and response times
  • Success or denial rates
  • Transactions and response by transaction originator and/or authorizer
  • Transactions by type or product
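As an illustration of the per-originator summary listed above, the Python sketch below assumes each log entry has already been parsed into a hypothetical `Txn` record holding an originator, an approved/denied flag, and a response time. Real log layouts vary by application and are not described here, and the 2000 ms service-level threshold is an arbitrary placeholder.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Txn:
    originator: str      # hypothetical: who sent the transaction (ATM, POS, ...)
    approved: bool       # hypothetical: approved (True) or denied (False)
    response_ms: float   # hypothetical: end-to-end response time in milliseconds


def summarize_by_originator(txns, sla_ms=2000.0):
    """Per originator: transaction count, approval rate, and the share of
    transactions answered within the service-level threshold."""
    groups = defaultdict(list)
    for t in txns:
        groups[t.originator].append(t)
    report = {}
    for originator in sorted(groups):
        ts = groups[originator]
        n = len(ts)
        report[originator] = {
            "count": n,
            "approval_rate": sum(t.approved for t in ts) / n,
            "within_sla": sum(t.response_ms <= sla_ms for t in ts) / n,
        }
    return report


txns = [Txn("ATM", True, 500.0), Txn("ATM", False, 2500.0), Txn("POS", True, 100.0)]
print(summarize_by_originator(txns))
```

Grouping by an authorizer or a transaction type instead of the originator only changes the key used to build `groups`.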

Capacity planning

The bottom line is that Ban Bottlenecks gives the client a forward-looking view of its systems. In practice, we give three to four months' advance warning if we discover that a system is approaching a bottleneck. We identify the bottleneck and recommend action, and after the action is taken we confirm that the bottleneck has been removed.
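One simple way to produce this kind of advance warning, shown here only as an assumption and not as the method Ban Bottlenecks uses, is to fit a least-squares straight line to the monthly peak values of a factor and estimate how many months remain before the trend crosses a chosen threshold. The function name and the 68% threshold in the example are hypothetical.

```python
def months_until_threshold(history, threshold):
    """Fit a least-squares line to monthly values (oldest first) and estimate
    how many months after the latest one the trend reaches `threshold`.
    Returns None if the trend is flat or falling. Needs at least two values."""
    n = len(history)
    x_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(history))
    slope = sxy / sxx
    if slope <= 0:
        return None
    fitted_latest = y_mean + slope * ((n - 1) - x_mean)
    # A trend already at or over the threshold counts as zero months of warning.
    return max(0.0, (threshold - fitted_latest) / slope)


# Illustrative peak CPU busy (%) for six months, growing 2 points per month.
print(months_until_threshold([50, 52, 54, 56, 58, 60], threshold=68))
```

A straight line is the simplest choice; seasonal or accelerating growth would call for a different model.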

Continuous service improvement

The ongoing Ban Bottlenecks report service implements a best practice of discovery, repair, and confirmation of the effect of the repair. In practice, the collaboration between our analysts and the client teams produces systems that run cleanly and predictably, implementing "no surprises" processing.

High availability

Proactive capacity planning combined with continuous service improvement results in improved uptime.

Increased staff effectiveness

The Ban Bottlenecks process relieves the client staff of many time-consuming chores. Performance/capacity data collection, presentation, and analysis are at best tedious and at worst so time-consuming that it may be impossible to motivate the team to do the job as effectively as required. Our experience is that client performance and systems teams are happy to have this "dirty work" done for them, so that they can concentrate on their core work of system management and improvement.

Capital budget planning

Lastly, Ban Bottlenecks provides all the information that a client needs to plan its capital budget. The proactive view that Ban Bottlenecks provides allows the client to anticipate growth and to plan for just-in-time system improvements. The Ban Bottlenecks report enables the client to relate system, device, and communications usage to the business traffic.

Contact us for a sample web conference and report.

 
 
©Copyright 2009