High Availability (HA) scenarios

This section answers common questions about the High Availability (hereafter HA) options of WOCU-Monitoring:

1) Assume a deployment scenario like the one in the image below. If the VPN fails in either data centre, does the satellite continue to collect metrics and then send them for indexing once the connection is re-established?

../../_images/7_039_HA-WOCU_0-48.png

The events produced and sent from the client infrastructure to the data backends in the cloud are as follows:

⇨ Monitoring metrics.

  • They are stored in InfluxDB, a time series database.

  • The monitoring engine includes a plugin that keeps the generated metrics in an in-memory buffer while there are connectivity problems.

  • Sending of metrics resumes when the connection is restored.

  • The size of the buffer is configurable.
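The buffering behaviour described above can be illustrated with a minimal sketch. This is not the actual WOCU-Monitoring plugin: the class name `BufferedMetricSender`, the `send` callable and the `max_size` parameter are hypothetical, and dropping the oldest metric when the buffer is full is an assumption, since the source does not state the overflow policy.

```python
from collections import deque
from typing import Callable, Deque


class BufferedMetricSender:
    """Keeps metrics in memory while the backend is unreachable (illustrative only)."""

    def __init__(self, send: Callable[[str], None], max_size: int = 1000) -> None:
        self._send = send
        # Assumption: a bounded deque silently drops the oldest entry when full.
        self._buffer: Deque[str] = deque(maxlen=max_size)

    def submit(self, metric: str) -> None:
        """Queue the metric, then try to drain the queue in arrival order."""
        self._buffer.append(metric)
        self.flush()

    def flush(self) -> None:
        """Send buffered metrics until the buffer is empty or a send fails."""
        while self._buffer:
            try:
                self._send(self._buffer[0])
            except ConnectionError:
                return  # still offline: keep everything buffered
            self._buffer.popleft()  # remove only after a successful send

    @property
    def pending(self) -> int:
        """Number of metrics still waiting to be sent."""
        return len(self._buffer)
```

In this sketch `max_size` plays the role of the configurable buffer size: a larger value tolerates longer outages at the cost of memory on the satellite.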

⇨ Monitoring events.

  • Events generated by the monitoring engine itself are stored in MongoDB, a document database.

  • The MongoDB client used for sending events supports retryable writes in case of communication failure. More information can be found in the MongoDB documentation on retryable writes.
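As a hedged illustration of how retryable writes are usually requested, MongoDB connection strings accept a `retryWrites` option. The helper `build_mongo_uri` and the host name below are hypothetical and do not reflect the real WOCU-Monitoring configuration.

```python
from urllib.parse import urlencode


def build_mongo_uri(host: str, port: int = 27017, retry_writes: bool = True) -> str:
    """Build a MongoDB connection string with the retryWrites option set explicitly."""
    options = urlencode({"retryWrites": "true" if retry_writes else "false"})
    return f"mongodb://{host}:{port}/?{options}"


# "mongo.example.internal" is a placeholder host, not a real WOCU endpoint.
uri = build_mongo_uri("mongo.example.internal")
print(uri)
```

The resulting string can be handed to a MongoDB driver. Note that retryable writes require a replica set or sharded cluster on the server side.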

⇨ Logs and traps

  • Any other type of events external to the monitoring engine, such as system logs (syslog), application logs, SNMP traps, etc., are stored in Elasticsearch, a search engine specialised in data analytics.

  • Events are sent through the td-agent/Fluentd event collector (similar to Logstash).

  • This collector can write the events it cannot send to disk, and it flushes them once the connection is recovered.

  • The size of the dump buffer is configurable.
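The disk-backed behaviour of the collector can be sketched conceptually as follows. This is not Fluentd code or configuration: `DiskSpool`, its `max_bytes` limit and the policy of rejecting new events when the spool is full are assumptions made only for illustration.

```python
import json
import os
from typing import Callable, List


class DiskSpool:
    """Appends undeliverable events to a file and replays them later (illustrative only)."""

    def __init__(self, path: str, send: Callable[[dict], None], max_bytes: int = 1_000_000) -> None:
        self._path = path
        self._send = send
        self._max_bytes = max_bytes  # hypothetical stand-in for the configurable buffer size

    def emit(self, event: dict) -> bool:
        """Send the event, or spool it on failure. Returns False only if it was dropped."""
        try:
            self._send(event)
            return True
        except ConnectionError:
            return self._spool(event)

    def _spool(self, event: dict) -> bool:
        line = json.dumps(event) + "\n"
        current = os.path.getsize(self._path) if os.path.exists(self._path) else 0
        if current + len(line.encode("utf-8")) > self._max_bytes:
            return False  # assumption: a full spool rejects the new event
        with open(self._path, "a", encoding="utf-8") as handle:
            handle.write(line)
        return True

    def flush(self) -> int:
        """Replay spooled events in order and return how many were delivered."""
        if not os.path.exists(self._path):
            return 0
        with open(self._path, encoding="utf-8") as handle:
            events: List[dict] = [json.loads(line) for line in handle if line.strip()]
        sent = 0
        try:
            for event in events:
                self._send(event)
                sent += 1
        except ConnectionError:
            pass  # connection dropped again: keep the rest on disk
        remaining = events[sent:]
        if remaining:
            with open(self._path, "w", encoding="utf-8") as handle:
                handle.writelines(json.dumps(item) + "\n" for item in remaining)
        else:
            os.remove(self._path)
        return sent
```

Unlike an in-memory buffer, a spool of this kind survives a restart of the collector process, which is the main reason to buffer logs and traps on disk.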


2) What is the maximum number of administrator and query users I can create on the platform?

There is no stipulated limit.

Currently there are clients with thousands of registered users and around 150 to 200 concurrent users.


3) Can WOCU-Monitoring satellites be deployed in HA mode while keeping their configurations synchronised?

WOCU-Monitoring satellites operate in HA by default.

If one satellite loses connectivity, the remaining satellites take over its work, and the load is redistributed among them so that monitoring continues to operate normally.
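The takeover idea can be sketched with a simple round-robin assignment. The source does not describe the scheduler's actual algorithm, so the function `distribute_hosts` and the host and satellite names below are purely hypothetical.

```python
from typing import Dict, Iterable, List


def distribute_hosts(hosts: Iterable[str], satellites: Iterable[str]) -> Dict[str, List[str]]:
    """Assign monitored hosts to the reachable satellites in round-robin order."""
    alive = sorted(satellites)
    if not alive:
        raise ValueError("no satellites available")
    plan: Dict[str, List[str]] = {name: [] for name in alive}
    for index, host in enumerate(sorted(hosts)):
        plan[alive[index % len(alive)]].append(host)
    return plan


hosts = ["h1", "h2", "h3", "h4", "h5", "h6"]

# All three satellites reachable: two hosts each.
normal_plan = distribute_hosts(hosts, ["sat-a", "sat-b", "sat-c"])

# sat-b loses connectivity: its hosts are spread over the remaining two.
failover_plan = distribute_hosts(hosts, ["sat-a", "sat-c"])
```

Recomputing the plan from the list of reachable satellites means no host is left unmonitored as long as at least one satellite remains available.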

The following picture shows an overview of how the satellites and the scheduler work.

../../_images/7_041_satelite.png

4) Is WOCU-Monitoring a distributed installation, and where can it be installed?

WOCU-Monitoring is composed of several software components that communicate with each other. They can be distributed on a single node or on separate nodes to gain flexibility or performance.

It can also be installed on physical machines, virtualised environments, containers, cloud platforms, etc.

The following image shows the general architecture of WOCU-Monitoring and its components:

../../_images/7_040_software-architecture.png