Using Nagios exclusively to monitor the YubiCloud


Nagios is a very important part of our infrastructure. We rely on it to provide extensive and detailed checks on all our machines and services. We don’t just monitor whether a service is up or down, but also many other metrics that let us quickly understand the origin of a problem and, hopefully, fix it before the external service is affected.

Until recently we also used Pingdom alongside our Nagios infrastructure. Pingdom made monitoring an external service easy, and it provided graphs we could display here on this blog.

Unfortunately, for various reasons, we have decided to stop using Pingdom and rely solely on Nagios instead.

For the time being, this means that we cannot display uptime statistics for our api* endpoints as transparently as before. We are investigating alternative ways to render this data from the Nagios monitoring system.

Below are uptime statistics for each of our YubiCloud endpoints for the past year.

You may have noticed that api2 and api5 have considerably higher downtime than the other endpoints. These two machines are currently hosted with Linode, which went through several DDoS attacks between December 25th 2015 and January 5th 2016. We are considering moving one of these machines to another hosting provider.

The rest of the outages were caused by machine restarts for updates and security patches. Please note that the YubiCloud as a whole has still had 100% uptime since its inception.
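
To illustrate how individual endpoints can show downtime while the service as a whole does not, here is a minimal Python sketch. It assumes the service counts as down only during periods when every endpoint is down at the same time; that definition, the endpoint names, the dates, and the helper functions are all hypothetical and are not taken from our Nagios setup.

```python
from datetime import datetime, timedelta


def merge(intervals):
    """Sort (start, end) outage intervals and merge any that overlap."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a, b):
    """Return the overlap of two merged, sorted interval lists."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def uptime_percent(outages, window_start, window_end):
    """Percentage of the window not covered by any outage interval."""
    down = timedelta()
    for start, end in merge(outages):
        # Clip each outage to the reporting window.
        start, end = max(start, window_start), min(end, window_end)
        if start < end:
            down += end - start
    return 100.0 * (1 - down / (window_end - window_start))


def service_outages(per_endpoint):
    """Periods in which every endpoint was down simultaneously (assumed definition)."""
    lists = [merge(o) for o in per_endpoint.values()]
    if not lists:
        return []
    common = lists[0]
    for other in lists[1:]:
        common = intersect(common, other)
    return common


# Hypothetical outage data for two made-up endpoints over a ten-day window.
window_start = datetime(2016, 1, 1)
window_end = datetime(2016, 1, 11)
outages = {
    "endpoint_a": [(datetime(2016, 1, 2), datetime(2016, 1, 3))],
    "endpoint_b": [(datetime(2016, 1, 5), datetime(2016, 1, 5, 12))],
}

for name, intervals in outages.items():
    print(name, uptime_percent(intervals, window_start, window_end))
print("service", uptime_percent(service_outages(outages), window_start, window_end))
```

Because the two made-up outages never overlap, the combined outage list is empty and the service-level figure stays at 100% even though each endpoint reports less.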
