Before starting

Concepts

Before starting to use this tool and in order to facilitate the understanding of this User Manual, a simple explanation of some key concepts in the WOCU-Monitoring environment is included below.

Assets in WOCU

All the elements monitored in WOCU-Monitoring can be generically called Assets. This term includes all the elements without distinguishing between their different types (Hosts, Services, Contacts, Host Groups, Business Processes, etc.).

Types of Assets in WOCU-Monitoring.

Term

Icon

Description

Host

../../_images/2_030c_aggregator_realm_assets_hosts-logo_0-60.png

A Host is any element connected to the network (PC, server, firewall, network equipment, etc.) that WOCU-Monitoring monitors in order to supervise its availability, calculate its status and record its events.

Host Group

A Host Group is a logical grouping of certain Hosts to make it easier to extract statistics and availability data from the group in WOCU-Monitoring.

Services

../../_images/2_030e_aggregator_realm_assets_services-logo_0-60.png

A Service is each of the checks that WOCU-Monitoring performs on a Host. Each check is assigned a status once it has been evaluated. These checks allow a deeper analysis in terms of availability.

Business Process Hosts (BP Hosts)

../../_images/2_030d_aggregator_realm_assets_bphost-logo_0-60.png

A Business Process Host is a logical definition created by the user in WOCU-Monitoring that groups one or more logical and/or physical Hosts, i.e. other Business Process Hosts and/or physical Hosts.

Once defined, it is treated like any other Device monitored in WOCU-Monitoring and appears in the Asset inventories. The system also monitors this logical asset and determines its state through the rules applied to it (Business Rules).

Business Process Services (BP Services)

../../_images/2_030f_aggregator_realm_assets_bpservices-logo_0-60.png

A Business Process Service is a logical definition created by the user in WOCU-Monitoring that groups one or more logical and/or physical Services.

Realms in WOCU

Realms are completely independent monitoring systems that are organised and managed through WOCU-Monitoring. Each Realm has its own monitored assets, for which the tool maintains isolated inventories, alarms, event logs, etc.

Types of Realms in WOCU-Monitoring.

Typology

Icon

Description

Standard Realm

../../_images/realm-standard-icon.jpg

These types of Realms are made up of Assets manually configured by the WOCU Administrator. The elements are manually selected and become part of one of these monitoring systems. The absence of a specific type icon informs the user that this is a standard Realm.

Hostgroups

../../_images/realm-HG-icon.jpg

A quick and easy way to create a Realm and provide it with Assets is to associate it with one or more Host Groups. In this case, the WOCU Administrator includes in the new Realm the elements that already belong to one or more Host Groups (Hostgroups), which are logical sets that group together different Assets.

Multirealm

../../_images/multirealm-icon.jpg

A Realm that results from including other existing Realms within it, i.e. an aggregation of Realms. In this case, instead of individual Assets or groups of Assets, the WOCU Administrator directly includes entire monitoring systems, aggregating their members to create a larger Realm.
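
The three Realm types differ only in how their Assets are selected. The following Python sketch illustrates that idea under stated assumptions: the `Realm` class and its fields (`assets`, `host_groups`, `sub_realms`) are hypothetical names chosen for this example and do not describe the internal data model of WOCU-Monitoring.

```python
from dataclasses import dataclass, field


@dataclass
class Realm:
    """Hypothetical model of a Realm; not the real WOCU-Monitoring schema."""
    name: str
    assets: set = field(default_factory=set)          # Standard: chosen manually
    host_groups: list = field(default_factory=list)   # Hostgroups: sets of Assets
    sub_realms: list = field(default_factory=list)    # Multirealm: other Realms

    def members(self):
        """Return every Asset that ends up monitored in this Realm."""
        result = set(self.assets)
        for group in self.host_groups:
            result |= set(group)
        for realm in self.sub_realms:
            result |= realm.members()
        return result


standard = Realm("office", assets={"pc-01", "fw-01"})
by_group = Realm("datacenter", host_groups=[{"srv-01", "srv-02"}, {"sw-01"}])
multi = Realm("company", sub_realms=[standard, by_group])

print(sorted(multi.members()))
```

In this sketch a Multirealm owns no Assets of its own: its membership is simply the union of the Realms it aggregates.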

States of Assets in WOCU

Each of the assets monitored in WOCU-Monitoring always has a status associated with it that defines its situation, from an availability point of view, over time. In other words, the status of an asset is dynamic. WOCU-Monitoring is responsible, through its various checks, for evaluating the present situation of each asset in order to calculate its status and update it if there is a change.

It is important to understand this concept well as statuses are recurrently present in WOCU-Monitoring listings, inventories, graphs and reports.

In WOCU-Monitoring there are four basic states:

State types in WOCU-Monitoring.

Typology

Icon

Description

Down/Critical

../../_images/2_001_aggregator_status-down_0-60.png

The Down/Critical status indicates a total loss of availability of the asset, so its severity is the highest. The term Down is used for Hosts that have completely lost availability, while Critical is the equivalent term for Services, used when a Service becomes unavailable or when some parameter of the monitored Service exceeds a certain threshold (the value of this threshold is set in the configuration of the host when it is registered).

In WOCU-Monitoring, the red colour and/or the down-pointing arrow icon are associated with the Down/Critical Status.

Warning

../../_images/2_002_aggregator_status-warning_0-60.png

Warning status applies only to Services. It indicates some kind of malfunction, although the asset is still in service, so its severity is lower than Critical status.

In WOCU-Monitoring, the yellow colour and/or the exclamation icon are associated with the Warning status.

Unreachable/Unknown

../../_images/2_003_aggregator_status-unknown_0-60.png

The Unreachable/Unknown status reports a loss of contact with the asset. It is not known whether the asset is still in service, so this status indicates some uncertainty about the actual situation that deserves consideration. In that sense the severity is a priori lower than the previous ones. The term Unreachable is used for Hosts with which WOCU-Monitoring has completely lost connectivity, although a total loss of service by the Host has not been verified. Unknown is the equivalent term used for Services.

In WOCU-Monitoring, the blue colour and/or the question mark icon are associated with the Unreachable/Unknown status.

Up/OK

../../_images/2_004_aggregator_status-ok_0-60.png

The Up/OK state is the normal operational state of an asset when it is in service. It is therefore the desirable state for all monitored elements and any change from this state to another is considered an anomaly that will need to be addressed. The term Up is used with Hosts to describe their normal operating state, while OK is the term used for Services to describe this normal operation.

In WOCU-Monitoring, the green colour and/or the tick icon are associated with the Up/OK status.
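
The four states, their Host and Service terms, and their colours and icons can be summarised as a small lookup structure. The Python sketch below is illustrative only: the `Status` enum, the numeric severity ranks and the `worst` helper are assumptions introduced for this example. The ranks simply follow the order in which the table presents the states.

```python
from enum import Enum


class Status(Enum):
    # value = (Host term, Service term, colour, icon, severity rank)
    # The numeric ranks are hypothetical; only their order comes from the text.
    DOWN_CRITICAL = ("Down", "Critical", "red", "down-pointing arrow", 3)
    WARNING = (None, "Warning", "yellow", "exclamation", 2)
    UNREACHABLE_UNKNOWN = ("Unreachable", "Unknown", "blue", "question mark", 1)
    UP_OK = ("Up", "OK", "green", "tick", 0)

    def label(self, asset_type):
        """Return the term used for a 'host' or a 'service' (None if unused)."""
        host_term, service_term = self.value[0], self.value[1]
        return host_term if asset_type == "host" else service_term

    @property
    def colour(self):
        return self.value[2]

    @property
    def severity(self):
        return self.value[4]


def worst(statuses):
    """Pick the most severe status from an iterable of Status members."""
    return max(statuses, key=lambda s: s.severity)


print(Status.DOWN_CRITICAL.label("service"), Status.DOWN_CRITICAL.colour)
print(worst([Status.UP_OK, Status.WARNING]).label("service"))
```

Warning has no Host term because, as stated in the table, it applies only to Services.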

Alarms and Events in WOCU

Two other concepts that are closely related to the monitoring and operation of networks and systems are the concepts of alarms and events. It is important to understand their meaning and difference when using WOCU-Monitoring in your daily tasks.

Term

Description

Problems

Alarms are alert messages generated by WOCU-Monitoring indicating an abnormal status for a Host or Service. The tool performs different monitoring tasks for each monitored element in order to determine its status at all times. A change from the Up/OK state to any other state will trigger an Alarm in WOCU-Monitoring. In addition, WOCU-Monitoring attaches further information to the status to indicate the nature of the detected anomaly.

Events

Events are messages about the functioning and operation of the different monitored Hosts that are collected and analysed by WOCU-Monitoring. Events are used by WOCU-Monitoring as an additional element in the calculation of the status of the assets and also offer the opportunity to analyse the causes or circumstances of an incident, since a good technical analysis must always be supported by the log messages of the Hosts involved.
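
The rule that a change from the Up/OK state to any other state triggers an Alarm can be sketched in a few lines of Python. This is a simplified illustration, not the tool's implementation: the `alarms` function and the status strings are hypothetical, and transitions between two abnormal states are deliberately ignored.

```python
NORMAL_STATES = {"UP", "OK"}


def alarms(history):
    """Yield (index, state) each time the asset leaves the Up/OK state.

    `history` is a list of status strings in chronological order.
    """
    previous = None
    for index, state in enumerate(history):
        left_normal = previous in NORMAL_STATES and state not in NORMAL_STATES
        if left_normal:
            yield index, state
        previous = state


history = ["OK", "OK", "WARNING", "CRITICAL", "OK", "UNKNOWN"]
print(list(alarms(history)))
```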

Business Rules

Business process monitoring provides a detailed view of the status of a service, based on a representation of the interrelationships and the status of the different components required for it to work correctly. Rule creation is highly flexible: a rule can be complex enough to span the hardware, operating system, network and application layers, up to the user experience. It can also be simplified and based only on the states of a single device.

The system can monitor Business Processes of Devices and Services through Business Rules (hereafter BR). A BR is a logical function that is applied to the state of a set of assets and yields an alarm state. These rules are an integral part of the management and monitoring of these processes.

../../_images/2_041b_aggregator_realm_assets_modal-host-BP-trace-ok_0-43.jpg

Business Processes defined by rules composed solely of Devices will have the status of Device Business Process (BP Hosts) and will be identified with the following icon:

../../_images/2_030d_aggregator_realm_assets_bphost-logo_0-60.png

They can therefore generate alarms of the following types:

Typology

Icon

Description

Down

../../_images/2_001_aggregator_status-down_0-60.png

Indicates a total loss of availability of the Device, so its severity is the highest.

Unreachable

../../_images/2_003_aggregator_status-unknown_0-60.png

Reports a loss of contact with the Device. It is not known whether the asset is still providing service, so this status indicates some uncertainty about the real situation that deserves consideration. In that sense its severity is, a priori, lower than that of the previous status.

Up

../../_images/2_004_aggregator_status-ok_0-60.png

The Up state is the normal operating state of an asset when it is providing its service without incident. It is therefore the desirable state for all monitored elements, and any change from this state to another is considered an anomaly that will have to be attended to.

Likewise, Business Processes defined by rules composed solely of Services will have the status of Service Business Process (BP Services) and will be identified with the following icon:

../../_images/2_030f_aggregator_realm_assets_bpservices-logo_0-60.png

They can therefore generate alarms of the following types:

Typology

Icon

Description

Critical

../../_images/2_001_aggregator_status-down_0-60.png

Indicates a total loss of availability of the Service, so its severity is the highest. It is used when a Service becomes unavailable or when any parameter of the monitored Service exceeds a certain threshold (the value of this threshold is set in the configuration of the device when it is registered).

Unknown

../../_images/2_003_aggregator_status-unknown_0-60.png

Reports a loss of contact with the asset. It is not known whether the Service is operational, so this status indicates some uncertainty about the real situation that deserves consideration. In that sense its severity is, a priori, lower than that of the previous status.

OK

../../_images/2_004_aggregator_status-ok_0-60.png

The OK state is the normal operating state of an asset when it is providing its service. It is therefore the desirable state for all monitored items, and any change from this state to another is considered an anomaly that will have to be addressed.

Warning

../../_images/2_002_aggregator_status-warning_0-60.png

Warning status applies only to Services. It indicates some kind of malfunction, although the asset is still in service, so its severity is lower than Critical status.

Before creating a BR, the relevant :doc_url:`Monitoring Packs <docs/packs/packs.html#pack-categories>` must be deployed to monitor the Devices or Services that make up the environment. With this data, and by applying the logical operators described in this section, the BRs are built.

Next, the system checks the state of each component element of the BR. Then, taking into consideration these individual states and the logical operators that link and relate the elements of the Business Process, the system determines the availability status of the Business Process itself.

Attention

The system only considers states of type HARD to determine the overall state of the asset or node. Therefore, any internal changes of type SOFT will be discarded and will not affect the calculation of monitoring states.

Remember

SOFT: assigned when the service status obtained is not definitive, as it may or may not be reverted on the next check attempt. If the predefined number of attempts is exceeded with negative states, the error severity level is raised to HARD. The objective is to avoid false alarms caused by transient problems.

HARD: assigned when the service status obtained is continuously wrong, without being corrected. That is, when the service returns a negative status on the first attempt and also on subsequent checks, exceeding the predefined number of attempts. This new situation is then notified to the contact user.
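
The escalation from SOFT to HARD can be pictured as a counter of consecutive negative results. The Python sketch below is a simplified illustration: the `CheckState` class and the `max_attempts` parameter are hypothetical names, and treating the threshold as inclusive (HARD on the third failure when `max_attempts` is 3) is an assumption, since the text does not say whether the limit is inclusive.

```python
class CheckState:
    """Tracks whether a negative check result is still SOFT or already HARD.

    `max_attempts` is a hypothetical name for the predefined number of
    attempts mentioned in the text.
    """

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self.failures = 0  # consecutive negative results so far

    def record(self, ok):
        """Record one check result and return 'OK', 'SOFT' or 'HARD'."""
        if ok:
            self.failures = 0  # a positive result ends the failure streak
            return "OK"
        self.failures += 1
        # Assumption: the state becomes HARD once the streak reaches the limit.
        return "HARD" if self.failures >= self.max_attempts else "SOFT"


tracker = CheckState(max_attempts=3)
results = [True, False, False, False, True]
print([tracker.record(r) for r in results])
```

Only the result marked HARD would be taken into account when a BR is evaluated; the two SOFT results before it are the transient phase that the mechanism is designed to filter out.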

Construction and examples of Business Rules

The definition of a BR will always begin with the command:

bp_rule!

Simple BR

If you want to create a Business Process made up of a single element, for example, the Device with name HostOne, the BR will be the following:

bp_rule!(HostOne)

../../_images/2_098_aggregator_BP_rule_example_0-59.png

BR with Services

To include a Service in the rule, you must enter the name of the Device and the name of the Service separated by a comma (,).

Continuing with the previous example, this time the Business Process is built from Services of the Device named HostOne, such as the status of its CPU Service. The Business Rule shown here refers to the cpu and partition Services of HostOne:

bp_rule!(HostOne,cpu|HostOne,partition)

../../_images/2_098a_aggregator_BP_rule_example_0-59.png

BR with logical operators

By using logical expressions (AND, OR, NOT), a relationship is also established between the different component elements. This allows WOCU-Monitoring to calculate the status of the asset by analysing and evaluating the operational states of its member elements (physical and/or logical assets).

The names or identifiers of the assets whose state the system will examine when executing the rule are then entered. The syntax that must be respected is the following:

bp_rule!(Host_1 op Host_2 op Host_3)

Where “op” is the logical operator, which can take the following values:

Operator    Value
&           AND
|           OR
!           NOT
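
To make the operator semantics concrete, here is a small Python sketch that evaluates a rule string against known element states. It is an illustration under stated assumptions, not the parser used by WOCU-Monitoring: the `tokenize` and `evaluate` functions are hypothetical, each element is reduced to a boolean (True meaning a positive state), whitespace inside element names is ignored, and the conventional precedence NOT, then AND, then OR is assumed because the text does not specify one.

```python
import re

# One token is either an operator/parenthesis or a run of other characters.
TOKEN = re.compile(r"\s*([&|!()]|[^&|!()]+)")


def tokenize(rule):
    """Split a rule into operators, parentheses and element names."""
    body = rule.strip()
    if body.startswith("bp_rule!"):
        body = body[len("bp_rule!"):]
    tokens = []
    for tok in TOKEN.findall(body):
        tok = "".join(tok.split())  # drop whitespace, e.g. "Host, svc"
        if tok:
            tokens.append(tok)
    return tokens


def evaluate(rule, states):
    """Evaluate `rule` against `states`, a dict of element name -> bool."""
    tokens = tokenize(rule)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            right = parse_and()  # parse first so no token is skipped
            value = value or right
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            right = parse_not()
            value = value and right
        return value

    def parse_not():
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of rule")
        if tok == "!":
            take()
            return not parse_not()
        if tok == "(":
            take()
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return value
        return states[take()]  # element name: "Host" or "Host,Service"

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result


states = {"Host_1": True, "Host_2": False, "Host_3": True}
print(evaluate("bp_rule!(Host_1 & Host_2 | Host_3)", states))
```

Because the assumed precedence may differ from the real one, grouping sub-expressions with parentheses is the safest way to write a rule that mixes operators.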

Below are other, more complex examples that employ logical operators:

Operator OR


In professional environments it is common to find scenarios with redundant elements that ensure the availability of the services they provide. Imagine a web page hosted on two redundant web servers, one acting as the active server and the other as the backup server. A Business Process intended to ensure the availability of the web page, composed of the two web servers WebServerActive and WebServerBackup and checking that at least one of them provides service, will have a rule like the following:

bp_rule!(WebServerActive|WebServerBackup)

In this case, as can be seen, the logical operator OR has been used. It is represented by the character “|” and gives a positive value if at least one of the elements of the logical relationship is positive.

../../_images/2_098b_BP_rule_example_OR_0-60.png

Operator AND


Even more complexity can be added to a BR with the use of other logical operators and expressions. Suppose that the previous web example also involves a database hosted on two database servers, DBServerActive and DBServerBackup, and we want the Business Rule to determine the availability of the website based on the availability of at least one of the web servers and at least one of the DB servers. The rule would then be:

bp_rule!(WebServerActive|WebServerBackup) & (DBServerActive|DBServerBackup)

As you can see, in this case the logical operator AND has been used. It is represented by the character “&” and gives a positive value only if both elements of the logical relationship are positive.

../../_images/2_098c_BP_rule_example_AND_0-60.png
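
The effect of combining OR and AND in this rule can be checked by enumerating every combination of server states. The Python sketch below is only an illustration: `website_available` is a hypothetical function that mirrors the rule with plain booleans, where True stands for a positive state.

```python
from itertools import product


def website_available(web_active, web_backup, db_active, db_backup):
    """(WebServerActive|WebServerBackup) & (DBServerActive|DBServerBackup)"""
    return (web_active or web_backup) and (db_active or db_backup)


# Enumerate the 16 combinations of positive (True) and negative (False) states.
table = [
    (combo, website_available(*combo))
    for combo in product([True, False], repeat=4)
]
available = [combo for combo, ok in table if ok]
print(len(available), "of", len(table), "combinations keep the website available")
```

The website is unavailable only when both web servers or both DB servers are negative at the same time, which is the redundancy the rule is meant to express.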

Operator NOT


Let’s now look at the use of another logical operator. Consider a scenario in which a router, Router, provides an Internet connection to a site via two independent dedicated lines (ADSL and ISDN) through two interfaces. We want to monitor the availability of the connection through the main line with a Business Rule. Taking into account that the interface connected to the ISDN backup line (ifISDN) will only be active when the main ADSL line (ifADSL) drops, the Rule to create would be:

bp_rule!(Router,ifADSL & !Router,ifISDN)

../../_images/2_098d_BP_rule_example_NOT_0-60.png
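
The four possible combinations of the two interfaces can be listed with a short Python sketch. The function name `main_line_ok` is hypothetical, and each interface Service is reduced to a boolean where True means a positive state.

```python
def main_line_ok(adsl_up, isdn_up):
    """Router,ifADSL & !Router,ifISDN"""
    return adsl_up and not isdn_up


for adsl_up in (True, False):
    for isdn_up in (True, False):
        result = main_line_ok(adsl_up, isdn_up)
        print(f"ifADSL={adsl_up!s:5} ifISDN={isdn_up!s:5} -> {result}")
```

The rule is positive in a single case: the ADSL interface is up and the ISDN backup interface is not active.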

Operator OF


Next, consider the following scenario: a website is served by three web servers (WebServer1, WebServer2 and WebServer3) and three DB servers (DBServer1, DBServer2 and DBServer3). You need a rule that considers the web service to be working correctly when at least two of the three web servers and at least two of the three DB servers work correctly. The rule would be like this:

bp_rule!(2 of: WebServer1|WebServer2|WebServer3) & (2 of: DBServer1|DBServer2|DBServer3)

Using the of: operator preceded by a number or a percentage sets the minimum number of elements that must meet the condition.

../../_images/2_098e_BP_rule_example_OF_0-60.png
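
The threshold behaviour of the of: operator can be sketched in Python as a simple count. This is an illustration under stated assumptions: `at_least` is a hypothetical helper, element states are reduced to booleans, and rounding a percentage up to the next whole element is an assumption that the text does not confirm.

```python
import math


def at_least(threshold, states):
    """Return True if enough elements of `states` (a list of bools) are positive.

    `threshold` is an int such as 2, or a percentage string such as "50%".
    """
    if isinstance(threshold, str) and threshold.endswith("%"):
        # Assumption: a percentage is rounded up to a whole number of elements.
        required = math.ceil(len(states) * float(threshold[:-1]) / 100)
    else:
        required = int(threshold)
    return sum(states) >= required


web = [True, False, True]   # WebServer1, WebServer2, WebServer3
db = [True, False, False]   # DBServer1, DBServer2, DBServer3
print(at_least(2, web) and at_least(2, db))
```

With these sample states the web tier satisfies its threshold but the DB tier does not, so the combined rule is negative.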