Monitoring and alerting

Monitoring is essential to keeping a service or system secure and working reliably. Combined with load balancing (scaling), backups, firewalling and encryption, it helps keep a setup stable with nearly constant uptime, and it can help predict and prevent many failures and outages ahead of time.

Monitoring and alerting can be as simple as observing server resources (system load, RAM, disk and CPU usage), which some cloud providers offer out of the box. Sometimes, however, more granular metrics are required, such as:

  • I/O (disk) operations
  • network traffic and load
  • per-process resource usage and uptime
  • custom or dedicated metrics per protocol or application
  • last activity or value for a metric
  • expiration or outage alerting
  • and so on
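
The simplest form of the resource checks listed above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a replacement for a real monitoring stack: the threshold values, the function names and the alert message format are assumptions made up for this example.

```python
import os
import shutil

# Hypothetical thresholds, chosen only for illustration.
DISK_USED_PCT_LIMIT = 90.0
LOAD_PER_CPU_LIMIT = 1.5


def disk_used_percent(path="/"):
    """Return the percentage of used space on the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100.0


def check(name, value, limit):
    """Return an alert string when value exceeds limit, otherwise None."""
    if value > limit:
        return f"ALERT {name}: {value:.1f} exceeds limit {limit:.1f}"
    return None


def collect_alerts(path="/"):
    """Read disk usage and (where available) system load, return alert strings."""
    readings = {"disk_used_pct": (disk_used_percent(path), DISK_USED_PCT_LIMIT)}
    try:
        # os.getloadavg() exists only on Unix-like systems and may raise OSError.
        load_1min = os.getloadavg()[0]
        cpus = os.cpu_count() or 1
        readings["load_per_cpu"] = (load_1min / cpus, LOAD_PER_CPU_LIMIT)
    except (AttributeError, OSError):
        pass  # load average unavailable on this platform; skip that reading
    alerts = []
    for name, (value, limit) in readings.items():
        message = check(name, value, limit)
        if message is not None:
            alerts.append(message)
    return alerts


if __name__ == "__main__":
    for line in collect_alerts() or ["OK: all readings within limits"]:
        print(line)
```

A script like this would typically be run on a schedule, with its output forwarded to whatever notification channel is in use. Dedicated tools add history, graphs and alert routing on top of the same basic idea.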

A properly set up monitoring and alerting policy, with well-chosen rules, can help plan new solutions and resolve issues ahead of time with minimal or no downtime. Monitoring is essential for keeping systems and services operational and end clients happy.

Our expertise

The monitoring solution we recommend depends on the technologies in use and the business logic implemented. We have experience with both simple and complex systems, and with standard and custom cases and solutions such as:

  • resource monitoring (CPU, RAM, I/O, network and so on)
  • system process and service monitoring and alerting
  • backup success/failure monitoring
  • different monitoring solutions (Nagios, Icinga2, Prometheus, Alertmanager, Grafana and so on)
  • specific service or case monitoring (e.g. EMQx, HAProxy, Let's Encrypt/TLS certificate expiration)
  • scheduled task execution success/failure monitoring
  • pending security and non-critical updates alerting
  • single-host and replicated (multi-copy) setups
  • and more
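
As one concrete illustration of the TLS certificate expiration case mentioned in the list, the following Python sketch classifies a certificate by the number of days left before it expires. It uses only the standard `ssl`, `socket` and `time` modules. The 14-day warning window, the function names and the status strings are assumptions for this example, not a description of any particular tool's behaviour.

```python
import socket
import ssl
import time

WARN_DAYS = 14  # hypothetical warning window, for illustration only


def days_until_expiry(not_after, now=None):
    """Days left until a certificate's notAfter time.

    not_after uses the format returned by getpeercert(),
    e.g. "Jan  5 09:34:43 2018 GMT". now is a Unix timestamp;
    it defaults to the current time.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400.0


def certificate_status(not_after, now=None, warn_days=WARN_DAYS):
    """Classify a certificate as "EXPIRED", "WARNING" or "OK"."""
    remaining = days_until_expiry(not_after, now)
    if remaining < 0:
        return "EXPIRED"
    if remaining < warn_days:
        return "WARNING"
    return "OK"


def fetch_not_after(host, port=443, timeout=5.0):
    """Connect to host over TLS and return its certificate's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For a live check, `certificate_status(fetch_not_after(host))` could be called with the hostname to monitor. Note that the default context verifies the certificate, so an already expired or otherwise invalid certificate makes the connection itself fail with an `ssl.SSLError`, which a caller would also want to treat as an alert.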

