Detailed view of Services associated with a Host
The Service Detail View linked to a Device allows the user to directly access comprehensive and up-to-date information about the selected service. All this information is displayed in five sections:
Status
Events
Metrics
BP Trace (only in services belonging to a Business Process)
Performance
In the top bar, next to the device name, there is a selector listing the services associated with the device in question. It gives direct access to the detailed view of the chosen service: selecting a service updates the panels and graphs in the interface to show that service's data.
Next to the name of each service, the system shows the identifying icon of the Monitoring Pack to which it belongs.
As seen in the following image, this selector will only be visible if a host has more than one associated service. Otherwise, only the name of the service will appear next to the host it depends on.
To access this view, go to the Hosts Inventory where the monitoring services associated with each Host are listed. By clicking on the name of the service of interest, the detailed view will be displayed.
Status
The Status tab provides precise information about the status and configuration of the selected service. The data is displayed through six panels and is collected over the last 24 hours.
Service Status
This first panel indicates the operational status of the service and the monitoring checks carried out on it. It also provides other complementary data:
- ✓ Name of the service
Located above the status circle.
- ✓ Current operational status of the service
It is indicated inside the central circle. The possible states of a service are explained in detail in Statements of Assets in WOCU.
The rapid alternation of state changes in a service is known as Flapping. It is also represented on the panel by a yellow half-circle with a flashing grey arrow inside it.
- ✓ State active time
Indicates how long the current state has been active. In the image above, the service has been in the ‘UP’ state for one minute and three seconds.
- ✓ Monitoring check
WOCU-Monitoring performs checks to assess the operational status of the services associated with the device. The semi-circular gauge on the panel indicates whether the check option is enabled or disabled (“On active checks” or “Off active checks”). Additionally, the text located to the right of the gauge informs about the next check to be performed. In the previous image, this functionality is activated, and the check will be performed in three minutes and fifty-four seconds (“Next check: 3 m 54 s”).
The option to deactivate service monitoring checks is accessible from the Actions field, located in the Hosts Inventory. To access the available options, simply click on the actions icon, and a dropdown menu will appear showing the Disable Active Checks option.
Related to the check task, this panel provides complementary information about the types of statuses obtained and their level of severity, using the following labels:
Soft: assigned when the service status obtained is not yet definitive, since it may be reverted on the next check attempt. If the predefined number of attempts is exceeded with negative results, the severity level is raised to HARD. The aim is to avoid false alarms caused by transient problems.
Hard: assigned when the erroneous status of the service persists without being corrected, that is, when the service returns a negative status on the first attempt and on every subsequent check until the predefined number of attempts is exceeded. This new situation is notified to the contact user.
- ✓ Check Now
Button for refreshing and updating the data displayed in the panel. Clicking it will force an immediate refresh without waiting for the refresh scheduled in User preferences. This way you get a real-time view of the status of the service.
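The SOFT and HARD labels described above can be pictured as a small state tracker. The following is only a minimal sketch and not WOCU-Monitoring code: the class `ServiceState`, its fields and the rule that the state becomes HARD as soon as the attempt counter reaches `max_attempts` are assumptions made for illustration.

```python
from dataclasses import dataclass

OK = "OK"  # assumed label for a healthy check result


@dataclass
class ServiceState:
    """Tracks the SOFT/HARD state type of one service (illustrative only)."""

    max_attempts: int         # failed checks needed to reach HARD
    status: str = OK
    state_type: str = "HARD"  # assumption: a stable OK counts as HARD
    attempt: int = 0

    def register_check(self, result: str) -> bool:
        """Record a check result; return True when the contact should be notified."""
        if result == OK:
            self.status, self.state_type, self.attempt = OK, "HARD", 0
            return False
        was_hard_problem = self.status != OK and self.state_type == "HARD"
        self.status = result
        self.attempt += 1
        if self.attempt >= self.max_attempts:
            self.state_type = "HARD"
            return not was_hard_problem  # notify once, on the change to HARD
        self.state_type = "SOFT"
        return False


service = ServiceState(max_attempts=2)
for result in ["CRITICAL", "CRITICAL", "OK"]:
    notify = service.register_check(result)
    print(result, service.state_type, service.attempt, notify)
```

With `max_attempts=2`, the first failed check leaves the service in a SOFT state and only the second one raises it to HARD, which is the point at which this sketch reports that a notification is due.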
Last check information
This panel provides information related to the last check performed on the service in question. The data offered are:
✓ Last checked at: indicates how long ago the last check of the service was performed.
✓ Check status: indicates the status resulting from the executed check.
✓ Check attempt: indicates the number of attempts that must be made (obtaining an erroneous result) for the service to pass from SOFT to HARD error level. For example, in the previous image, it is defined that after two failed checks, the error severity level is raised from SOFT to HARD.
✓ Check latency: is the difference between the scheduled check time and the actual execution time, i.e. it indicates the check delay time.
✓ Check duration: time taken by the server to give a check response.
✓ Information box: Plugin Output: shows the literal text returned by the check.
✓ Check Command: specifies the complete Check Command that has been executed to determine the status of the Service. This information is only visible after enabling the Show full check command option in the User preferences.
Attention
Due to the sensitive data this command may display, the parameter will be disabled by default in the User preferences.
It may happen that the text of the Plugin Output and Check Command elements exceeds the available space. In this case, by clicking on either of the two blocks, a new information box will pop up showing the full text. The basic function of copying the text to the clipboard (📋) for future use is also included. This action does not make any changes.
Service availability (Last 24 hours)
This panel has two graphical elements that provide information related to the level of availability of the service in question.
Pie chart
It represents the percentage of service availability reached in the last 24 hours. The percentage value is indicated inside the graph, together with the Threshold predefined by the user, which is also indicated on the graph with a dashed line.
A green “✓” appears in front of the availability percentage when the availability exceeds the set threshold, and a red “X” when it does not.
In this example, the percentage obtained (62.96%) is below the established threshold (80%).
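As an illustration of how such a percentage relates to the threshold, the sketch below computes availability from a list of status intervals. The function names and the interval data are hypothetical; the durations were chosen only so that the result matches the 62.96% of the example, and treating a value equal to the threshold as non-compliant is an assumption.

```python
def availability_pct(intervals):
    """Return the share of time spent available, as a percentage.

    `intervals` is a list of (seconds, is_available) pairs covering the window.
    """
    total = sum(seconds for seconds, _ in intervals)
    if total == 0:
        return 0.0
    up = sum(seconds for seconds, available in intervals if available)
    return 100.0 * up / total


def meets_threshold(pct, threshold):
    """True when availability exceeds the threshold (green check mark)."""
    return pct > threshold


# Hypothetical 24-hour window (86400 s): up, down, up again.
window = [(30000, True), (32000, False), (24400, True)]
pct = availability_pct(window)
print(f"{pct:.2f}%", "✓" if meets_threshold(pct, 80.0) else "X")
```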
Time bar
This time bar provides a history of the availability of the service over the last 24 hours, showing the exact moment or period of time when it was not operational.
The complementary legend informs about the period of time (in days, hours and seconds) of availability (green colour) and non-availability (red colour) of the service.
There is also the possibility that the system has not collected enough data to determine the initial state of a service. In this case, the user can assume and assign a starting status. This new status and its duration are also recorded and displayed in the time bar and in the legend.
Note
The user can configure the initial state of the service in the User preferences, specifically in the filter Status Initial SLA Service.
Monitoring Events (Last 24 hours)
This panel reports, through a bar chart, the four most frequent Events in the last 24 hours for the selected service.
The legend indicates the type of event to which each bar refers, and the height of each bar indicates the number of times that event has occurred. Placing the cursor over a bar displays a pop-up message with the type of event and its total number of occurrences.
The different types of Events are explained in detail in the Event field of the Monitoring Event List Fields section.
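The selection of the four most frequent events can be sketched as a simple frequency count. The event type names used below are placeholders and not necessarily the labels WOCU-Monitoring uses; `top_events` is a hypothetical helper.

```python
from collections import Counter


def top_events(event_types, limit=4):
    """Return the `limit` most frequent event types with their counts."""
    return Counter(event_types).most_common(limit)


# Placeholder event types for one service over the last 24 hours.
events = [
    "ALERT", "NOTIFICATION", "ALERT", "FLAPPING",
    "DOWNTIME", "ALERT", "NOTIFICATION", "ACKNOWLEDGE",
]

for event_type, count in top_events(events):
    print(f"{event_type}: {count}")
```

Each resulting pair corresponds to one bar of the chart: the event type shown in the legend and the number of occurrences shown in the pop-up message.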
Events
The Events of a particular service are stored and presented in this tab. All monitoring event messages produced in the last 24 hours related to the service in question are displayed.
This listing provides the user with information similar to that offered in the Events section for a specific realm. Therefore, the fields available in this table are explained in detail in the Monitoring Event List Fields section.
Metrics
The Metrics tab provides precise information about the evolution and operation of the selected Service, through performance values collected by WOCU-Monitoring and stored in metrics during a check. They are displayed through graphs that show data collected over specific time periods, while also indicating defined thresholds for Warning and Critical states.
The selectors available on the graph allow:
Select a date range for data display
By default it is set to 1 day, but there are other time criteria to choose from, such as: 4 hours, 1 week, 1 month, 1 year and Custom Range.
To set a specific period of time, use the Custom Range option. Configuring the time frame requires a start date and an end date. Clicking on a day selects that date and marks it with a blue background. In addition to the day, a specific time can be set for that day using the drop-down menus in the hour, minute and second boxes.
Attention
It is not possible to choose start or end dates later than the current date, nor to set an end date earlier than the start date.
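The restrictions on a Custom Range can be expressed as a short validation routine. This is only a sketch of the rules stated above; `validate_custom_range` and its error messages are hypothetical and do not reflect how the interface is implemented.

```python
from datetime import datetime, timedelta


def validate_custom_range(start, end, now=None):
    """Return a list of problems found in a custom date range (empty if valid)."""
    now = now or datetime.now()
    errors = []
    if start > now:
        errors.append("start date is in the future")
    if end > now:
        errors.append("end date is in the future")
    if end < start:
        errors.append("end date is before start date")
    return errors


now = datetime.now()
print(validate_custom_range(now - timedelta(days=7), now - timedelta(days=1)))
print(validate_custom_range(now - timedelta(days=1), now - timedelta(days=7)))
```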
Select the metric whose information you want to visualise
Unlike the fixed metrics of a Host (RTA and PL), in the case of Services these will vary depending on the type. See in the following example how for the HTTP Service, the available metrics to select are: Time and Size.
Note
Thanks to the legend, it is easy to identify the values of the metric, its series and the state change thresholds: Warning (yellow color) and Critical (red color). Additionally, alongside the legend, the maximum (max), average (avg) and latest (last) registered values are provided.
On the other hand, when no metric data is available for specific points in time, the graph shows a dashed line that makes those gaps easy to identify.
Monitor all the series of a metric on a graph
The captured series of the same metric are represented in a single graph. This facilitates monitoring resources on a broader scale, as it allows comparing and analyzing all series related to a metric in one place.
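The max, avg and last values shown alongside the legend, and the gaps drawn as dashed lines, can be derived for each series roughly as follows. This is a sketch under assumptions: the series names are invented, missing data is represented by `None`, and "last" is taken to be the most recent non-missing value.

```python
def summarize(points):
    """Return max, avg and last of a series, ignoring missing (None) points."""
    values = [v for v in points if v is not None]
    if not values:
        return None
    return {"max": max(values), "avg": sum(values) / len(values), "last": values[-1]}


def gap_segments(points):
    """Return (first_index, last_index) pairs for each run of missing points."""
    gaps, start = [], None
    for i, value in enumerate(points):
        if value is None and start is None:
            start = i
        elif value is not None and start is not None:
            gaps.append((start, i - 1))
            start = None
    if start is not None:
        gaps.append((start, len(points) - 1))
    return gaps


# Hypothetical series of a single metric; None marks a point without data.
series = {
    "series_a": [2.0, 4.0, None, None, 6.0, None],
    "series_b": [10.0, 10.0, 10.0, 10.0, 10.0, 10.0],
}

for name, points in series.items():
    print(name, summarize(points), gap_segments(points))
```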
Dynamic thresholds
This parameter sets the value (in percentage) of the minimum threshold of service level that is considered adequate or acceptable.
The metric graphs of Services accept Dynamic thresholds, which means that this value is no longer fixed and can be dynamically adjusted within a predefined range. This feature provides greater flexibility and adaptability in managing SLA levels.
In the following graph, you can observe how the Warning and Critical states are now configured with this new function, represented according to the defined ranges for each of them.
Actions on the graph
Below are the detailed actions applicable to the graph:
Interval selection of a metric in the graph itself
In addition to being able to select (pre-set) time ranges for particular metrics, it is possible to manually select intervals and sub-intervals in the data series and to display a graph of these intervals and sub-intervals.
The selection will be made directly on the graph itself using the mouse. To do this, it is necessary to place the mouse in the area of the generated graph. By clicking and dragging with the left mouse button, a sub-interval within the interval we are visualising will be selected.
Once the range has been selected, we can release the left mouse button, which will result in the graph being updated to show only the defined range.
In addition, the upper date range bar shows the time period corresponding to the manual selection applied.
To go back to the initial graph (according to the previously chosen range), the filter has to be re-applied and the displayed data will be restored.
Note
This action can be repeated indefinitely, each time obtaining a smaller selected range than the previous one.
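The repeated narrowing described above can be pictured as successive filters over the same data points. The sketch below is purely illustrative: `zoom` is a hypothetical helper and the timestamps are plain integers rather than real dates.

```python
def zoom(points, start, end):
    """Keep only the (timestamp, value) points inside [start, end]."""
    return [(t, v) for t, v in points if start <= t <= end]


# Hypothetical data: one point every 10 time units.
points = [(t, t * 10) for t in range(0, 60, 10)]

first = zoom(points, 10, 40)   # first manual selection
second = zoom(first, 20, 30)   # sub-interval of the previous selection
print(first)
print(second)
```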
Exporting graphs to PDF
The Export button downloads a report in PDF format containing all the metric graphs of the respective service. If the number of graphs is large, they will span multiple pages.
After clicking on the Export button, the file is automatically downloaded to the hard disk for further processing or later use.
BP Trace
The BP Trace tab, visible only in services belonging to a Business Process, shows the user a tree traced from the previously defined BP Rule. Thanks to the representation with nodes and logical relationships, in addition to knowing the state of the Business Process, the user will be able to analyse and locate the root cause of an anomalous monitoring state.
Remember
Having established the Business Rule, WOCU-Monitoring will first evaluate the status of each Business Process element. Then, taking into consideration these individual states and the logical operators that link and relate the elements of the Business Process, the system will calculate and determine a state for the Business Process.
On the other hand, the system only considers states of type HARD to determine the overall state of the node. Therefore, any internal changes of type SOFT will be rejected and will not affect the monitoring state calculation.
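The evaluation described above can be sketched as a recursive walk over the rule tree. The data layout, the severity ordering and the meaning given to the operators (AND takes the worst child state, OR the best) are assumptions made for illustration; the actual operators and their semantics are those defined by the BP Rule in WOCU-Monitoring.

```python
SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}  # assumed ordering


def evaluate(node):
    """Return the state of a Business Process node from HARD states only."""
    if "op" not in node:
        # Leaf element: SOFT changes are ignored, so only the last HARD
        # state takes part in the calculation.
        return node["last_hard_state"]
    states = [evaluate(child) for child in node["children"]]
    if node["op"] == "AND":
        return max(states, key=SEVERITY.__getitem__)  # worst child state
    if node["op"] == "OR":
        return min(states, key=SEVERITY.__getitem__)  # best child state
    raise ValueError(f"unknown operator: {node['op']}")


# Hypothetical rule: db AND (web1 OR web2)
rule = {"op": "AND", "children": [
    {"name": "db", "last_hard_state": "OK", "soft_state": "CRITICAL"},
    {"op": "OR", "children": [
        {"name": "web1", "last_hard_state": "CRITICAL"},
        {"name": "web2", "last_hard_state": "OK"},
    ]},
]}

print(evaluate(rule))
```

In this example the SOFT change recorded for `db` does not alter the result, because only its last HARD state is read.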
Important
See the BP Trace section for a detailed description of this view.
Performance
The Performance view collects the monitoring metrics generated after the checks that the service launches on the Appliance on which it depends. Each metric records performance and capacity values, allowing a deeper analysis of the service, and consequently of the Appliance in terms of availability.
Remember
A Pack monitors a Host through services, which in turn generate monitoring metrics. The metrics take their thresholds and ranges from the values of the pack’s configuration macros.
See the following example: the LINUX-SNMP pack generates the Disk service, which after each check obtains the monitoring metrics /run_used_pct, /_used, /boot_used, etc.
The set of values is presented in a tabular format, where an entry is included for each metric, which facilitates their individualised study.
The data are distributed in the following columns, classified in three blocks:
1. Monitoring metrics data:
Name: name of the monitoring metric.
Value: last performance value recorded in that metric.
2. Measurement thresholds:
Minimum: defined minimum threshold that the metric can reach. It is identified by a green vertical line positioned (normally) at the beginning of the bar. If the metric value recorded is below the threshold, the minimum bar will be moved to the far right.
Warning: defined threshold above which the metric will reach a warning or alarm state. It is identified by a vertical orange line.
Critical: defined threshold above which the metric will reach a critical state. It is identified by a vertical red line.
Maximum: maximum defined threshold that the metric can reach. It is identified by a vertical black line positioned (normally) at the end of the bar.
Attention
Metric thresholds are defined or modified in a generic way in the configuration of the Monitoring Pack, by configuring specific macros. The system uses the default values of the pack when the user has not defined them.
3. Graphing the metric:
Graph: the current value of the metric (column Value) is represented graphically by a rectangular bar of length proportional to the recorded value. As can be seen, there will be a graph for each of the metrics of the monitoring service.
Its behaviour is very simple: when the bar advances and exceeds the established threshold, the colour will change according to the margin reached.
Thresholds act as indicators and are represented by vertical lines along the bar. Each threshold has an identifying colour associated with it.
The colour distinction is as follows:
Light grey colour: indicates the absence of metric data (value = 0).
Dark grey colour: indicates that the value is lower than the set minimum.
Blue colour: indicates the existence of a recorded value but no alert thresholds. This situation generates some uncertainty about the actual status of the metric.
Green colour: indicates that the value is above the minimum and below the WARNING threshold.
Orange colour: indicates that the value is above the WARNING threshold but does not exceed the CRITICAL threshold.
Red colour: indicates that the value is above the CRITICAL threshold.
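The colour rules listed above can be summarised in a small decision function. This is a sketch only: `bar_colour` is a hypothetical name, and both the order in which the rules are checked and the handling of values exactly equal to a threshold are assumptions.

```python
def bar_colour(value, minimum=None, warning=None, critical=None):
    """Return the bar colour for a metric value and its thresholds."""
    if value is None or value == 0:
        return "light grey"   # no metric data
    if minimum is not None and value < minimum:
        return "dark grey"    # below the defined minimum
    if warning is None and critical is None:
        return "blue"         # value recorded, but no alert thresholds
    if critical is not None and value > critical:
        return "red"
    if warning is not None and value > warning:
        return "orange"
    return "green"


# Hypothetical percentage metric with minimum 0, warning 80 and critical 90.
for value in (None, 50, 85, 95):
    print(value, bar_colour(value, minimum=0, warning=80, critical=90))
```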