SLA - Downtime, Uptime Explained

Reference Number: AA-00787 Last Updated: 08-30-2023 03:24 PM

This document explains the meaning of Downtime, Uptime, and how the values are computed for Service Level Agreements.

The Work Hours Calendar, Service Level Agreement, and Time span shown below are used in the examples.

Work Hours Calendar 

The Work Hours Calendar below specifies that the Service Level Agreement is in effect Monday through Friday between 8 AM and 5 PM Pacific Time. Incidents that match the Service Level Agreement conditions and occur during the Work Hours Calendar period count toward Downtime hours. This calendar is used in the examples below.

  
Figure 1. Work Hours Calendar

Be sure to always specify the time zone for your Work Hours Calendars. If the Time Zone is not specified, then UTC time is implied. 


Service Level Agreement

The Service Level Agreement (SLA) highlighted below is used in the following examples. An SLA is in effect between the Starts On and Ends On dates. In this case, "Critical" alerts are covered by the SLA. If a critical alert occurs during the Work Hours Calendar, then it counts toward the Downtime hours. Critical alerts that occur outside the Work Hours Calendar do not count toward Downtime. The hpslacalendar shown in Figure 1 specifies that if a critical alert occurs between 8 AM and 5 PM Monday through Friday, then it will affect the Downtime hours.


Figure 2. Service Level Agreement (SLA)

Time Span 

The Time span is used to show data for the period specified. It can be configured to show a custom time as in the example below.


Figure 3. Time span

In Figure 3, the Time span has been set to a custom period between January 6, 2015 8:00 AM and January 9, 2015 5:00 PM. The total time between these two dates/times is 81 hours (3 full days plus 9 hours). This is the period for which data is examined in the examples below.
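The 81-hour figure can be checked with a few lines of Python. This is only an arithmetic sketch; the variable names are illustrative and are not part of SiteAudit.

```python
from datetime import datetime

# Custom Time span from the example (both timestamps in the same time zone).
span_start = datetime(2015, 1, 6, 8, 0)
span_end = datetime(2015, 1, 9, 17, 0)

# Elapsed wall-clock hours between the two timestamps.
span_hours = (span_end - span_start).total_seconds() / 3600
print(span_hours)  # 81.0
```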

Downtime

Downtime indicates the number of hours that a device is in violation of the Service Level Agreement during the Work Hours Calendar over the specified Time span.  Using Figures 1, 2, and 3, if a critical alert occurred on a device on January 7, 2015 at 10 AM and is resolved the same day at 10 PM, the following can be concluded:

  1. The alert lasted for 12 hours
  2. The alert started during the period specified in the Work Hours Calendar
  3. The alert ended after the period specified in the Work Hours Calendar
  4. The alert existed for 7 hours during the period specified in the Work Hours Calendar so this time counts toward Downtime
  5. The alert existed for 5 hours after the period specified in the Work Hours Calendar so this time does not count toward Downtime
  6. The Downtime for this alert is 7 hours
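The reasoning in the list above can be sketched as an interval-overlap calculation. The function name and constants below are hypothetical and only mirror the Monday through Friday, 8 AM to 5 PM calendar from Figure 1. The sketch assumes all timestamps are already expressed in the calendar's time zone.

```python
from datetime import datetime, time, timedelta

WORK_START = time(8, 0)    # 8 AM, per the example calendar
WORK_END = time(17, 0)     # 5 PM, per the example calendar
WORK_DAYS = {0, 1, 2, 3, 4}  # Monday=0 ... Friday=4

def downtime_hours(alert_start, alert_end):
    """Hours of the alert that fall inside the work-hours windows."""
    total = timedelta()
    day = alert_start.date()
    while day <= alert_end.date():
        if day.weekday() in WORK_DAYS:
            window_start = datetime.combine(day, WORK_START)
            window_end = datetime.combine(day, WORK_END)
            # Overlap of the alert with this day's work window.
            overlap_start = max(alert_start, window_start)
            overlap_end = min(alert_end, window_end)
            if overlap_end > overlap_start:
                total += overlap_end - overlap_start
        day += timedelta(days=1)
    return total.total_seconds() / 3600

# Alert from 10 AM to 10 PM on January 7, 2015 (a Wednesday).
print(downtime_hours(datetime(2015, 1, 7, 10), datetime(2015, 1, 7, 22)))  # 7.0
```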

It is possible for more than one incident matching the Service Level Agreement to overlap. For example, suppose a critical alert occurs on a device at 9 AM and is not resolved until 4 PM. A second critical alert occurs on the same device at 1 PM and is resolved at 6 PM. The times to resolve the two incidents are 7 hours and 5 hours respectively. However, the Work Hours Calendar in Figure 1 specifies that the SLA is in effect between 8 AM and 5 PM. Since the first incident started at 9 AM and the second incident was resolved at 6 PM, only 8 hours are counted toward Downtime (9 AM to 5 PM).
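One way to reproduce the 8-hour result is to clip each incident to the work window and then merge overlapping intervals so shared time is counted once. This is a sketch that matches the example above, not a description of SiteAudit internals. The function name is hypothetical, and times are given as hours of a single working day for simplicity.

```python
def merged_downtime(incidents, work_start=8, work_end=17):
    """Hours covered by at least one incident inside the work window.

    incidents: list of (start_hour, end_hour) tuples on the same working day.
    """
    # Clip every incident to the work window, then sort by start time.
    clipped = sorted((max(s, work_start), min(e, work_end)) for s, e in incidents)
    total = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if e <= s:
            continue  # incident lies entirely outside the work window
        if cur_end is None or s > cur_end:
            # Gap before this incident: close the previous merged interval.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping incident: extend the current merged interval.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        total += cur_end - cur_start
    return total

# First incident 9 AM to 4 PM, second incident 1 PM to 6 PM.
print(merged_downtime([(9, 16), (13, 18)]))  # 8
```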

When does an incident count toward Downtime?

All criteria below must be true for an incident to count toward Downtime:

  • The incident matches the Service Level Agreement conditions
  • The incident is active between the Service Level Agreement Starts On and Ends On date
  • The incident is active during the Work Hours Calendar
  • The incident is active after the date that the device was discovered by SiteAudit
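The four criteria can be expressed as a single predicate. The sketch below is illustrative only: the `Incident` class, the function names, and the SLA dates used in the usage lines are hypothetical, and the calendar is assumed to be the Monday through Friday, 8 AM to 5 PM one from Figure 1.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta

@dataclass
class Incident:
    severity: str
    start: datetime
    end: datetime

def active_in_work_hours(start, end):
    """True if the interval overlaps any Mon-Fri 8 AM-5 PM window."""
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            w_start = datetime.combine(day, time(8))
            w_end = datetime.combine(day, time(17))
            if max(start, w_start) < min(end, w_end):
                return True
        day += timedelta(days=1)
    return False

def counts_toward_downtime(inc, sla_severity, sla_starts_on, sla_ends_on, discovered):
    # Ignore time before the SLA starts or before the device was discovered,
    # and time after the SLA ends.
    effective_start = max(inc.start, sla_starts_on, discovered)
    effective_end = min(inc.end, sla_ends_on)
    return (
        inc.severity == sla_severity
        and effective_start < effective_end
        and active_in_work_hours(effective_start, effective_end)
    )

# Hypothetical SLA period; the discovery date is taken from the later example.
sla_from, sla_to = datetime(2015, 1, 1), datetime(2015, 12, 31)
discovered = datetime(2015, 1, 5)
alert = Incident("Critical", datetime(2015, 1, 7, 10), datetime(2015, 1, 7, 22))
print(counts_toward_downtime(alert, "Critical", sla_from, sla_to, discovered))  # True
```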

In-Time Incidents

Each incident of an alert that is active during the Work Hours Calendar and matches the criteria specified in the Service Level Agreement is counted toward In-Time Incidents.  In-Time Incidents count toward Downtime.

Out-Time Incidents

Out-Time Incidents include all of the In-Time Incidents plus incidents that match the Service Level Agreement criteria but are active outside the Work Hours Calendar period. The number of Out-Time Incidents is always equal to or greater than the number of In-Time Incidents.
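The relationship between the two counters can be sketched as follows. The record layout (`matches_sla`, `active_in_work_hours`) is a hypothetical simplification in which each incident has already been classified.

```python
def count_incidents(incidents):
    """Return (in_time, out_time) counts for pre-classified incidents."""
    matching = [i for i in incidents if i["matches_sla"]]
    # In-Time: matches the SLA and is active during the Work Hours Calendar.
    in_time = sum(1 for i in matching if i["active_in_work_hours"])
    # Out-Time: every SLA-matching incident, inside or outside work hours.
    out_time = len(matching)
    return in_time, out_time

sample = [
    {"matches_sla": True, "active_in_work_hours": True},
    {"matches_sla": True, "active_in_work_hours": False},
    {"matches_sla": False, "active_in_work_hours": True},
]
print(count_incidents(sample))  # (1, 2)
```

Because every In-Time Incident is also counted as an Out-Time Incident, the second value can never be smaller than the first.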

Example

The following is an example of a device incident shown in the Incident History view. The goal of this example is to show how Uptime and Downtime are calculated using the Work Hours Calendar, SLA, and Time span shown in Figures 1, 2, and 3.


Figure 4. Incident History showing an incident that matches SLA conditions

The Incident History Time span is configured to show 4 days in which the SLA is in effect. The total time between 1/6/2015 8:00 AM and 1/9/2015 5:00 PM is 81 hours. The Work Hours Calendar over this period is in effect between 8 AM and 5 PM each day (9 hours) for a total of 36 hours during this period.
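The 36-hour total is simply the number of working days in the Time span multiplied by 9 hours. A minimal sketch, assuming the Time span starts and ends on work-window boundaries as it does here (the function name is hypothetical):

```python
from datetime import date, timedelta

def calendar_hours(first_day, last_day, hours_per_day=9):
    """Work-hours total for whole Mon-Fri working days in an inclusive date range."""
    working_days = 0
    day = first_day
    while day <= last_day:
        if day.weekday() < 5:  # Monday=0 ... Friday=4
            working_days += 1
        day += timedelta(days=1)
    return working_days * hours_per_day

# January 6 (Tuesday) through January 9 (Friday), 2015.
print(calendar_hours(date(2015, 1, 6), date(2015, 1, 9)))  # 36
```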

The Response Time shows the number of hours the incident has been active during the Work Hours Calendar. In this case, the incident was active during the entire 36 hours so all of this time is shown in Response Time. Once an incident is resolved, the Response Time value no longer increments.

Note that the device was Discovered on 1/5/2015 and the Duration is 41.16:53:30, which is 41 days, 16 hours, 53 minutes, and 30 seconds. This represents how long the incident has been active on this device. However, the device was not discovered until 1/5/2015 so no time prior to this date is included in the Response Time.
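For readers who export these values, a Duration string can be converted to a time interval as sketched below. The helper name is hypothetical, and the sketch assumes the value always has the `days.hours:minutes:seconds` layout shown above.

```python
from datetime import timedelta

def parse_duration(text):
    """Parse a 'd.hh:mm:ss' Duration string into a timedelta."""
    days, clock = text.split(".")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return timedelta(days=int(days), hours=hours, minutes=minutes, seconds=seconds)

duration = parse_duration("41.16:53:30")
print(duration)  # 41 days, 16:53:30
```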

Thus far, we know that the period specified in the Time span is 81 hours and the Response Time for this incident is 36 hours.

Figure 5 shows the same device in the Service Level Agreement view. Note that the Downtime is equivalent to the Response Time shown for the device in Incident History, 36.00 hours. If there were multiple incidents for a device, the Response Time would be accumulated for each incident and these times would be aggregated for Downtime.


Figure 5. Service Level Agreement showing Uptime and Downtime

The Uptime indicates how long a device was available during the Work Hours Calendar. In this example, the time span is set to a 5-day period from July 15 through July 19. The Work Hours Calendar for the service level agreement is in effect between 8 AM and 5 PM Monday through Friday. This is 9 hours each day for the 5-day period specified in the time span, for a total of 45 hours. During this interval, the device was in violation of the service level agreement for 27 hours. Therefore, the Uptime is calculated as follows:

((45 - 27) / 45) * 100 = 40%

Note that there are 3 In-Time Incidents that have occurred during this period. This means that three separate incidents occurred that violated the service level agreement rules. The downtime for the three incidents combined is 27 hours.

Uptime

Uptime is shown in the Service Level Agreement view and represents the percentage of time that a device is not in violation of the Service Level Agreement. Downtime hours affect the value of Uptime.

The formula to compute the Uptime value is:

((x - Downtime) / x) * 100     where x = the number of hours specified in the Work Hours Calendar.
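The formula translates directly into code. In this sketch the function name is hypothetical, and the multiplication is done before the division, which is algebraically the same as the formula above.

```python
def uptime_percent(calendar_hours, downtime_hours):
    """Uptime as a percentage of the Work Hours Calendar hours."""
    if calendar_hours <= 0:
        raise ValueError("calendar_hours must be positive")
    return (calendar_hours - downtime_hours) * 100 / calendar_hours

print(uptime_percent(45, 27))  # 40.0
print(uptime_percent(36, 36))  # 0.0
```

The second call corresponds to an incident that is active for the entire 36 work hours of a Time span, which leaves no Uptime at all.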

Legacy Uptime

The Legacy Uptime column, introduced in version 6.2, is a percentage value that is calculated similarly to the Uptime value. However, its value is based on the time span instead of the Work Hours Calendar.

There is a total of 120 hours specified in the time span shown in Figure 5. The Legacy Uptime value is calculated as follows:

((120 - 27) / 120) * 100 = 77.5%

((x - Downtime) / x) * 100     where x = the number of hours specified in the Time Span.
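The only difference from Uptime is the denominator. The sketch below (hypothetical function name) applies the same 27 hours of Downtime to the 120-hour time span and, for comparison, to the 45 Work Hours Calendar hours.

```python
def percent_available(total_hours, downtime_hours):
    """Shared formula: share of total_hours not spent in violation."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return (total_hours - downtime_hours) * 100 / total_hours

downtime = 27
legacy_uptime = percent_available(120, downtime)  # x = hours in the time span
uptime = percent_available(45, downtime)          # x = Work Hours Calendar hours

print(legacy_uptime)  # 77.5
print(uptime)         # 40.0
```

Since the time span is never shorter than the work hours it contains, Legacy Uptime is never lower than Uptime for the same Downtime.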