Before starting

Concepts

Before you start using the tool, and to make this User Manual easier to follow, this section briefly explains some key concepts of the WOCU-Monitoring environment.

Assets in WOCU

All the elements monitored in WOCU-Monitoring can be generically called Assets. This term includes all the elements without distinguishing between their different types (Hosts, Services, Contacts, Host Groups, Business Processes, etc.).

Types of Assets in WOCU-Monitoring.

Term

Icon

Description

Host

../../_images/2_030c_aggregator_realm_assets_hosts-logo_0-36.png

A Host is any element connected to the network (PC, server, firewall, network electronics, etc.) and which is monitored by WOCU-Monitoring to supervise its availability, calculate its status and record its events.

Host Group

A Host Group is a logical grouping of certain Hosts to make it easier to extract statistics and availability data from the group in WOCU-Monitoring.

Services

../../_images/2_030e_aggregator_realm_assets_services-logo_0-36.png

A Service is each of the checks that WOCU-Monitoring performs on a Host. Each check is assigned a status once it has been evaluated. These checks allow a deeper analysis of availability.

Business Process Hosts (BP Hosts)

../../_images/2_030d_aggregator_realm_assets_bphost-logo_0-36.png

A Business Process Host is a logical definition created by the user in WOCU-Monitoring, which groups one or more logical and/or physical Hosts, i.e. other Business Process Hosts and/or physical Hosts.

Once defined, it is treated like any other Host monitored in WOCU-Monitoring and appears in your Asset inventories. The system also monitors this logical asset and determines its state through the rules applied to it (Business Rules).

Business Process Services (BP Services)

../../_images/2_030f_aggregator_realm_assets_bpservices-logo_0-36.png

A Business Process Service is a logical definition created by the user in WOCU-Monitoring, which groups one or more logical and/or physical Services.

Realms in WOCU

The Realms are completely independent monitoring systems managed in WOCU-Monitoring. The typology of Realms is determined by the manner in which each Realm has been constituted.

Each Realm will have its own monitored assets, over which the tool will maintain isolated inventories, alarms, event logs, etc.

Types of Realms in WOCU-Monitoring.

Typology

Icon

Description

Standard Realm

../../_images/realm-standard-icon.jpg

These types of Realms are made up of Assets manually configured by the WOCU Administrator. The elements are manually selected and become part of one of these monitoring systems. The absence of a specific type icon informs the user that this is a standard Realm.

Hostgroups

../../_images/realm-HG-icon.jpg

A quick and easy way to create a Realm and provide it with Assets is to associate it with one or more Host Groups. In this case, the WOCU Administrator includes in the new Realm the elements that already make up one or more Host Groups (Hostgroups), which are logical sets that group together different Assets.

Multirealm

../../_images/multirealm-icon.jpg

A Realm that results from including other existing Realms within it, i.e. an aggregation of Realms. In this case, instead of individual Assets or groups of Assets, the WOCU Administrator directly includes entire monitoring systems, aggregating their members to create a larger Realm.

States of Assets in WOCU

Each of the assets monitored in WOCU-Monitoring always has a status associated with it that defines its situation, from an availability point of view, over time. In other words, the status of an asset is dynamic. WOCU-Monitoring is responsible, through its various checks, for evaluating the present situation of each asset in order to calculate its status and update it if there is a change.

It is important to understand this concept well as statuses are recurrently present in WOCU-Monitoring listings, inventories, graphs and reports.

In WOCU-Monitoring there are four basic states:

State types in WOCU-Monitoring.

Typology

Icon

Description

Down/Critical

../../_images/2_001_aggregator_status-down_0-60.png

The Down/Critical status indicates a total loss of availability of the asset; therefore, its severity is maximum. The term Down is used to describe the status of Hosts that have completely lost availability, while Critical is the equivalent term used for Services when one of them becomes unavailable or when some parameter of the monitored Service exceeds a certain threshold (the value of this threshold is set in the configuration of the host when it is registered).

In WOCU-Monitoring, the red colour and/or the down-pointing arrow icon are associated with the Down/Critical Status.

Warning

../../_images/2_002_aggregator_status-warning_0-60.png

Warning status applies only to Services. It indicates some kind of malfunction, although the asset is still in service, so its severity is lower than Critical status.

In WOCU-Monitoring, the yellow colour and/or the exclamation icon are associated with the Warning status.

Unreachable/Unknown

../../_images/2_003_aggregator_status-unknown_0-60.png

The Unreachable/Unknown status reports a loss of contact with the asset. It is not known whether the asset is still in service, so this status indicates some uncertainty about the actual situation that deserves consideration. In that sense, its severity is a priori lower than that of the previous statuses. The term Unreachable is used to describe the status of Hosts with which WOCU-Monitoring has completely lost connectivity, although a total loss of service by the Host has not been verified. Unknown is the equivalent term used for Services.

In WOCU-Monitoring, the blue colour and/or the question mark icon are associated with the Unreachable/Unknown status.

Up/OK

../../_images/2_004_aggregator_status-ok_0-60.png

The Up/OK state is the normal operational state of an asset when it is in service. It is therefore the desirable state for all monitored elements and any change from this state to another is considered an anomaly that will need to be addressed. The term Up is used with Hosts to describe their normal operating state, while OK is the term used for Services to describe this normal operation.

In WOCU-Monitoring, the green colour and/or the tick icon are associated with the Up/OK status.
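
The four basic states described above can be summarised as a small lookup table. The following Python sketch is illustrative only: the names `BasicState`, `STATES` and `term_for` are hypothetical and are not part of WOCU-Monitoring; the terms, colours and icons are the ones given in the descriptions above.

```python
from typing import NamedTuple, Optional


class BasicState(NamedTuple):
    host_term: Optional[str]  # None when the state does not apply to Hosts
    service_term: str
    colour: str
    icon: str


# Hypothetical lookup table built from the descriptions in this section.
STATES = {
    "Down/Critical": BasicState("Down", "Critical", "red", "down-pointing arrow"),
    "Warning": BasicState(None, "Warning", "yellow", "exclamation"),
    "Unreachable/Unknown": BasicState("Unreachable", "Unknown", "blue", "question mark"),
    "Up/OK": BasicState("Up", "OK", "green", "tick"),
}


def term_for(state: str, asset_kind: str) -> Optional[str]:
    """Return the term used for a state on a 'host' or on a 'service'."""
    entry = STATES[state]
    return entry.host_term if asset_kind == "host" else entry.service_term


print(term_for("Down/Critical", "host"))  # Down
print(term_for("Warning", "host"))        # None, Warning applies only to Services
```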

Alarms and Events in WOCU

Two other concepts that are closely related to the monitoring and operation of networks and systems are the concepts of alarms and events. It is important to understand their meaning and difference when using WOCU-Monitoring in your daily tasks.

Term

Description

Problems

Alarms are alert messages generated by WOCU-Monitoring indicating an abnormal status for a Host or Service. The tool performs different monitoring tasks for each monitored element in order to determine its status at all times. A change from the Up/OK state to any other state will trigger an Alarm in WOCU-Monitoring. In addition, WOCU-Monitoring will attach additional information to the status to indicate the nature of the detected anomaly.

Events

Events are messages about the functioning and operation of the different monitored Hosts that are collected and analysed by WOCU-Monitoring. Events are used by WOCU-Monitoring as an additional element in the calculation of the status of the assets and also offer the opportunity to analyse the causes or circumstances of an incident, since a good technical analysis must always be supported by the log messages of the Hosts involved.
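
As a rough illustration of the alarm condition described above (a change from Up/OK to any other state), the following Python sketch scans a sequence of observed states and reports the points at which an Alarm would be raised. The function name `alarm_transitions` and the sample history are hypothetical, and the sketch models only the transition named in the text, not the full behaviour of WOCU-Monitoring.

```python
def alarm_transitions(states):
    """Yield (index, new_state) each time an asset leaves the Up/OK state."""
    normal = {"Up", "OK"}
    previous = None
    for index, current in enumerate(states):
        # An alarm is modelled only for the Up/OK -> other transition.
        if previous in normal and current not in normal:
            yield index, current
        previous = current


history = ["OK", "OK", "Warning", "Critical", "OK", "Unknown"]
print(list(alarm_transitions(history)))  # [(2, 'Warning'), (5, 'Unknown')]
```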

Business Rule

Business process monitoring provides a detailed view of the status of a service, based on the representation of the interrelations and the status of the various components necessary for its proper functioning. The creation of rules is highly flexible and can be as complex as encompassing layers from hardware, operating system, network, and applications, to user experiences. It can also be simplified, based solely on the statuses of a host.

The system is capable of monitoring Business Processes of Hosts and Services through Business Rules (hereafter BR). A BR is a logical function applied to the status of a set of assets, resulting in an alarm state. These rules are an integral part of the management and monitoring of these processes.

../../_images/2_041b_aggregator_realm_assets_modal-host-BP-trace-ok_0-43.png

Business Processes that are defined by rules made up solely of Hosts are classed as Business Process Hosts (BP Hosts) and are identified by the following icon:

../../_images/2_030d_aggregator_realm_assets_bphost-logo_0-36.png

They can therefore generate alarms of the following types:

Typology

Icon

Description

Down

../../_images/2_001_aggregator_status-down_0-60.png

It indicates a total loss of availability of the Host; therefore, its severity is maximum.

Unreachable

../../_images/2_003_aggregator_status-unknown_0-60.png

Reports a loss of contact with the device. It is not known whether the asset is still providing service, so this status indicates some uncertainty about the real situation that deserves to be taken into consideration. In that sense, its severity is, a priori, lower than the previous one.

Up

../../_images/2_004_aggregator_status-ok_0-60.png

The Up state is the normal operating state of an asset when it is providing its service without incident. Therefore, it is the desirable state for all monitored elements, and any change from this state to another is considered an anomaly that will have to be attended to.

Likewise, Business Processes that are defined by rules made up solely of Services are classed as Business Process Services (BP Services) and are identified by the following icon:

../../_images/2_030f_aggregator_realm_assets_bpservices-logo_0-36.png

They can therefore generate alarms of the following types:

Typology

Icon

Description

Critical

../../_images/2_001_aggregator_status-down_0-60.png

Indicates a total loss of availability of the Service; therefore, its severity is maximum. It is used when a Service becomes unavailable or when any parameter of the monitored Service exceeds a certain threshold (the value of this threshold is set in the configuration of the host when it is registered).

Unknown

../../_images/2_003_aggregator_status-unknown_0-60.png

Reports a loss of contact with the asset. It is not known whether the Service is operational, so this status indicates some uncertainty about the real situation that deserves to be taken into consideration. In that sense, its severity is, a priori, lower than the previous one.

OK

../../_images/2_004_aggregator_status-ok_0-60.png

The OK state is the normal operating state of an asset when it is providing its service. Therefore, it is the desirable state for all monitored items, and any change from this state to another is considered an anomaly that will have to be addressed.

Warning

../../_images/2_002_aggregator_status-warning_0-60.png

Warning status applies only to Services. It indicates some kind of malfunction, although the asset is still in service, so its severity is lower than Critical status.

Before creating a BR, the relevant Monitoring Packs must be deployed to monitor the Hosts or Services that make up the environment. With this data, and by applying logical operators, the BRs are built.

Next, the system checks the state of each component element of the BR. Then, taking into consideration these individual states and the logical operators that link and relate the elements of the Business Process, the system determines its own availability status.

Attention

The system only considers states of type HARD to determine the overall state of the asset or node. Therefore, any internal SOFT-type changes are ignored and do not affect the calculation of monitoring states.

Remember

SOFT: assigned when the service status obtained is not definitive, as it may or may not revert on the next check attempt. If the predefined number of attempts is exceeded with negative results, the state is raised to HARD. The aim is to avoid false alarms caused by transient problems.

HARD: assigned when the service status obtained remains wrong without being corrected, that is, when the service returns a negative status on the first attempt and also on the subsequent checks, exceeding the predefined number of attempts. This new situation is then notified to the contact user.
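
The SOFT/HARD distinction can be pictured as a counter of consecutive failed checks. The Python sketch below is a simplified model, not the actual WOCU-Monitoring logic: the class `CheckTracker`, the default of three attempts and the decision to switch to HARD as soon as the failure count reaches that number are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CheckTracker:
    """Tracks consecutive failed checks for one Service (illustrative only)."""

    max_attempts: int = 3  # hypothetical value; the real number is predefined in the configuration
    failures: int = 0

    def record(self, check_ok: bool) -> str:
        """Record one check result and return 'OK', 'SOFT' or 'HARD'."""
        if check_ok:
            self.failures = 0  # a successful check reverts any SOFT problem
            return "OK"
        self.failures += 1
        # Below the threshold the problem may still be transient.
        if self.failures < self.max_attempts:
            return "SOFT"
        return "HARD"


tracker = CheckTracker(max_attempts=3)
results = (False, False, True, False, False, False)
history = [tracker.record(ok) for ok in results]
print(history)  # ['SOFT', 'SOFT', 'OK', 'SOFT', 'SOFT', 'HARD']
```

In this model only the final entry would be taken into account when calculating the state of a Business Process, since the earlier failures never became HARD.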

Construction and examples of Business Rules

The definition of a BR will always begin with the command:

bp_rule!

Simple BR

If you want to create a Business Process made up of a single element, for example, the Host with name HostOne, the BR will be the following:

bp_rule!(HostOne)
../../_images/2_098_aggregator_BP_rule_example_0-59.png

BR with Services

To include a Service in the rule, you must enter the name of the Host and the name of the Service separated by a comma (,).

Continuing with the previous example, suppose the Business Process must now be based on Services of the Host named HostOne, for example the status of its CPU Service. The rule below references the cpu and partition Services of HostOne, joined by the “|” character. The Business Rule will be as follows:

bp_rule!(HostOne,cpu|HostOne, partition)
../../_images/2_098a_aggregator_BP_rule_example_0-59.png

BR with logical operators

By using logical expressions (AND, OR, NOT), a relationship is also established between the different component elements, which makes it easier for WOCU-Monitoring to calculate the status of the asset by analysing and evaluating the operational states of its member elements (physical and/or logical assets).

The names or identifiers of the assets whose state the system will examine when executing the rule are then entered. The syntax that must be respected is therefore the following:

bp_rule!(Host_1 op Host_2 op Host_3)

Where “op” is the operator, which can take the following values (note that NOT is written as a prefix to a single element, as in the NOT example further on):

Operator    Value
&           AND
|           OR
!           NOT

Below are other more complex examples that employ logical operators:

Operator OR


In professional environments it is common to find scenarios with redundant elements that ensure the availability of the services they provide. Imagine a web page hosted on two redundant web servers, one acting as the active server and the other as the backup server. A Business Process intended to ensure the availability of the web page, composed of the two web servers WebServerActive and WebServerBackup and checking that at least one of them provides service, will have a rule like the following:

bp_rule!(WebServerActive|WebServerBackup)

In this case, as can be seen, the logical operator OR has been used, represented by the character “|”, which gives a positive value if at least one of the elements of the logical relationship is positive.

../../_images/2_098b_BP_rule_example_OR_0-60.png
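
The outcome of the OR rule can be tabulated for every combination of server states. The Python sketch below is only an approximation: it reduces each server to a boolean (True when it provides service), and the function name `web_page_available` is hypothetical rather than anything defined by WOCU-Monitoring.

```python
from itertools import product


def web_page_available(active_up: bool, backup_up: bool) -> bool:
    """Boolean mirror of bp_rule!(WebServerActive|WebServerBackup)."""
    return active_up or backup_up


# Print the truth table for the two redundant web servers.
for active_up, backup_up in product([True, False], repeat=2):
    print(active_up, backup_up, "->", web_page_available(active_up, backup_up))
```

Only the combination in which both servers are unavailable yields a negative result.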

Operator AND


Even more complexity can be added to a BR by using other logical operators and expressions. Suppose that, in the previous web example, a database hosted on two database servers, DBServerActive and DBServerBackup, is also involved, and we want the Business Rule to determine the availability of the website based on the availability of at least one of the web servers and at least one of the DB servers. The rule would then be:

bp_rule!(WebServerActive|WebServerBackup) & (DBServerActive|DBServerBackup)

As you can see, in this case the logical operator AND has been used, represented by the character “&”, which gives a positive value if both elements of the logical relationship are positive.

../../_images/2_098c_BP_rule_example_AND_0-60.png
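
To see how the two OR groups combine under AND, the rule can be mirrored with booleans. This Python sketch assumes each server is simply available (True) or not (False); the function `website_available` and the sample dictionary are hypothetical and do not reflect how WOCU-Monitoring stores states internally.

```python
def website_available(up: dict) -> bool:
    """Boolean mirror of
    bp_rule!(WebServerActive|WebServerBackup) & (DBServerActive|DBServerBackup).
    """
    web_tier = up["WebServerActive"] or up["WebServerBackup"]
    db_tier = up["DBServerActive"] or up["DBServerBackup"]
    return web_tier and db_tier


sample = {
    "WebServerActive": False,
    "WebServerBackup": True,   # the web tier survives on the backup server
    "DBServerActive": False,
    "DBServerBackup": False,   # the DB tier is completely lost
}
print(website_available(sample))  # False
```

Losing one server in each tier leaves the result positive, whereas losing both servers of either tier makes it negative.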

Operator NOT


Let’s now look at another logical operator. Consider a scenario in which a router, Router, provides an Internet connection to a site via two independent dedicated lines (ADSL and ISDN) through two interfaces. We want to monitor the availability of the connection through the main line with a Business Rule. Taking into account that the interface connected to the ISDN backup line (if_ISDN) will only be active when the main ADSL line (if_ADSL) goes down, the Rule to create would be:

bp_rule!(Router,ifADSL & !Router,ifISDN)
../../_images/2_098d_BP_rule_example_NOT_0-60.png
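
The NOT rule above is positive only while the main interface is working and the backup interface is not active. A minimal Python sketch of that condition follows, assuming each interface Service is reduced to a boolean; the function and parameter names are hypothetical.

```python
def main_line_in_use(adsl_active: bool, isdn_active: bool) -> bool:
    """Boolean mirror of bp_rule!(Router,ifADSL & !Router,ifISDN)."""
    return adsl_active and not isdn_active


print(main_line_in_use(adsl_active=True, isdn_active=False))   # True: normal operation
print(main_line_in_use(adsl_active=False, isdn_active=True))   # False: traffic has moved to the backup line
```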

Operator OF


Next, consider the following scenario: a website is served by three web servers (WebServer1, WebServer2 and WebServer3) and three DB servers (DBServer1, DBServer2 and DBServer3). You need a rule that considers the web service to be working correctly when at least two of the three web servers and at least two of the three DB servers are working correctly. The rule would be like this:

bp_rule!(2 of: WebServer1 | WebServer2 | WebServer3 ) & (2 of: DBServer1|DBServer2|DBServer3)

Using the of: operator preceded by a number or a percentage, you establish the minimum number of elements that must meet the condition.

../../_images/2_098e_BP_rule_example_OF_0-60.png
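
The minimum-count idea behind of: can be sketched in Python as follows. This is an illustration under stated assumptions, not the WOCU-Monitoring implementation: the helper `at_least` is hypothetical, each server is reduced to a boolean, and a percentage threshold is assumed to be rounded up to the next whole element, which the manual does not specify.

```python
import math
from typing import Sequence, Union


def at_least(threshold: Union[int, str], members_ok: Sequence[bool]) -> bool:
    """Return True when enough members meet the condition.

    threshold is either a count (2) or a percentage string ("50%").
    Rounding percentages up is an assumption made for this sketch.
    """
    if isinstance(threshold, str) and threshold.endswith("%"):
        required = math.ceil(len(members_ok) * float(threshold[:-1]) / 100)
    else:
        required = int(threshold)
    return sum(members_ok) >= required


web_servers = [True, True, False]   # WebServer1..3: two of three working
db_servers = [True, False, False]   # DBServer1..3: only one of three working

# Boolean mirror of the "2 of:" rule shown above.
print(at_least(2, web_servers) and at_least(2, db_servers))  # False
```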