Delayed Notifications

NodePing now offers delayed notifications for your uptime monitoring. This is a powerful new feature that will help make your notifications actionable. There are two primary use cases for delayed notifications: flapping services and escalating notifications.

Flapping Services
Not all services or networks are rock solid. Sometimes three or even two nines is “good enough”. Some locations have inherently lower expectations for availability or are just prone to frequent, short-lived outages. When a check often fails but recovers by itself quickly (flaps), it’s difficult to get actionable notifications.

Adjusting the check sensitivity setting down is useful to give your check more time to recover, but if unassisted recovery takes longer than a minute, the check will still likely fail. Use delayed notifications for flapping checks so you receive alerts only when services are ‘really’ down. For example, you can configure NodePing to send alerts only if your check remains down after 5 minutes. Set the delay (from 1 minute to 1 hour) to match your tolerance, and you’ll receive alerts only when human intervention is required.
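To make the effect of a delay concrete, here is a small sketch, not NodePing’s implementation. It assumes you already know how long each outage lasted (in minutes) and treats an outage as alerting when it was still ongoing once the delay had elapsed. The function name `outages_that_alert` is hypothetical.

```python
# Hypothetical sketch: which outages would still alert once a delay is applied.
# Durations are in minutes; NodePing's UI lets you pick delays of 1 to 60 minutes.
from __future__ import annotations


def outages_that_alert(outage_minutes: list[float], delay_minutes: float) -> list[float]:
    """Return the outages that were still ongoing when the delay elapsed.

    An outage that recovers before the delay elapses never produces an alert.
    """
    return [d for d in outage_minutes if d >= delay_minutes]


if __name__ == "__main__":
    # A day of flapping: mostly short blips plus one real outage.
    day = [0.5, 1.5, 2.0, 0.75, 12.0, 1.0]
    print("no delay:", outages_that_alert(day, 0))     # every blip alerts
    print("5 min delay:", outages_that_alert(day, 5))  # only the 12-minute outage
```

With a 5-minute delay, six alerts shrink to one, and that one needs a human.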

Escalating Notifications
Not everyone needs to know about every outage right away. If the sysadmin on call can get the site back up within a few minutes, there’s no need to involve senior staff or inform the help desk. If an outage lasts longer, however, you may need to let your boss know things are still offline or warn the help desk that there are issues with the website and to expect some calls. Use delayed notifications to escalate alerts to others if an outage continues.

You can even escalate alerts to yourself. I have several checks set to email me immediately and then send me an SMS if they’re still failing after 5 minutes and a voice call if the outage lasts longer than 10 minutes.
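The escalation above can be modeled as a list of (contact, delay) rules, one per notification line on the check. This is a minimal sketch under that assumption. `NotificationRule` and `contacts_notified` are hypothetical names, and a delay of 0 stands for “notify immediately”.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class NotificationRule:
    """One line in a check's Notifications section (hypothetical model)."""
    contact: str        # illustrative contact name, e.g. "me via email"
    delay_minutes: int  # 0 = notify immediately


def contacts_notified(rules: list[NotificationRule], minutes_down: float) -> list[str]:
    """Contacts that have been alerted after `minutes_down` minutes of an ongoing outage."""
    ordered = sorted(rules, key=lambda r: r.delay_minutes)
    return [r.contact for r in ordered if minutes_down >= r.delay_minutes]


if __name__ == "__main__":
    # The escalation described above: email now, SMS at 5 min, voice call at 10 min.
    rules = [
        NotificationRule("me via email", 0),
        NotificationRule("me via SMS", 5),
        NotificationRule("me via voice call", 10),
    ]
    for t in (1, 6, 15):
        print(t, contacts_notified(rules, t))
```

Adding a rule such as `NotificationRule("boss via email", 30)` extends the chain without touching the earlier steps.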

Setting Notification Delays
When editing a check, you’ll see the contact method dropdown in the ‘Notifications’ section. Choose a contact method, and the ‘Delay’ and ‘Schedule’ dropdowns will appear. To set different delays for the same contact method, add additional lines with that contact method.

Actionable Alerts
Delayed notifications can help make all your alerts more actionable. If your contacts receive too many alerts that don’t need their attention, they’ll succumb to alert fatigue, start ignoring NodePing notifications, and eventually miss a truly important one.

If you need any help tuning your checks to avoid flapping or adjusting your notifications to make them more actionable, please reach out to us at support@nodeping.com. We really are happy to help.

If you aren’t using NodePing for uptime monitoring yet, please sign up for our 15-day, free trial and let us help you increase your uptime.

The Best of Times, the Worst of Times

As 2016 is coming to a close, we decided to look back on the past 4 years of data to see if we could find any interesting patterns in the ‘down’ events our customers experience.

Downtimes caused by timeouts have fallen each year, from 85% of all ‘down’ events in 2012 to 67% this year, while website 500 errors are on the rise, up nearly 80% year over year. I suspect this is due to the increased use of CDNs like CloudFlare, which now return 503s instead of the timeouts the origin servers are showing.

Looking at our numbers, if your servers go down, there’s a slightly higher chance it will happen around 3:20 UTC on a Wednesday. Alerts will be most quiet around 16:45 UTC on Sundays.

From our data, a sysadmin’s worst day is around Nov 10, and the best is easily January 1.

So brace yourself for this Wednesday and hang in there… New Year’s is right around the corner.

Disable Monitoring

Last week we added several features aimed at making our monitoring service easier to manage and use. One of those features adds new ways to easily disable checks, either in the UI or using the API. Disabling a check stops all monitoring and notifications for it, which can be useful to keep scheduled maintenance or downtime from affecting your uptime statistics. Below, I’ll explain the various ways you can disable monitoring within NodePing.

Disabling a Single Check
There are two ways to disable a single check within the NodePing web interface. The first is in the check details drawer. If you click the name of a check in the list, the details drawer will slide out. There you can see the last 5 results, links to the various reports, and a toggle to disable the check. Simply click on the “toggle” link next to the text “This check is currently active”. To re-enable, click the toggle again.

The second way to disable a single check is on the check edit screen. Click the “Edit” button to open the check edit modal, uncheck the “Enable Check:” box, then click “Save”. To re-enable the check, edit it again and check that same box.

Sometimes you may need to disable all the checks to a particular datacenter or for a particular server or service. You can do it one at a time as described above but that can be a click-fest if you have a large number of checks to disable. Our new features allow you to easily disable all of your checks at once, or to disable a group of them based on some powerful filtering capability (described more below).

Disabling All Checks
You can disable and re-enable all your checks with just a couple of clicks. In the “Account Settings” – “General Settings” tab, you’ll see a link for “Disable All Checks”. Click on it and all your currently enabled checks will be immediately disabled. To re-enable the checks, click on the new link that appeared that says “Re-enable Checks”. Please note that this will only re-enable checks that have been disabled using the “Disable All Checks” link. If you disabled a check using one of the methods described above in the “Disabling a Single Check” section of this post, the check will remain disabled.
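The re-enable behavior only restores checks that the bulk action turned off. One way to get that behavior is to record which checks the bulk action disabled. The sketch below is a hypothetical model, not how NodePing stores check state. The `bulk_disabled` flag and both function names are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Check:
    label: str
    enabled: bool = True
    # Hypothetical marker: set only when "Disable All Checks" turned the check off.
    bulk_disabled: bool = False


def disable_all(checks: list[Check]) -> None:
    """Disable every currently enabled check and remember that we did it."""
    for c in checks:
        if c.enabled:
            c.enabled = False
            c.bulk_disabled = True


def reenable_checks(checks: list[Check]) -> None:
    """Re-enable only the checks that disable_all() turned off."""
    for c in checks:
        if c.bulk_disabled:
            c.enabled = True
            c.bulk_disabled = False
```

A check you disabled individually never gets the marker, so it stays disabled after “Re-enable Checks”.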

Disabling all checks can be useful to silence monitoring and notifications during major outages, planned maintenance, or to quiet logs when troubleshooting.

Disabling Multiple Checks
Our new disable feature has some powerful filtering that can help you disable all checks whose label, type, and/or target are similar. Clicking the “Show optional filters” link in the “Account Settings” – “General Settings” tab will display the available filter fields: “Type”, “Label”, and “Target”. After choosing a type from the dropdown and/or typing your desired filters into the fields, click the “Disable All Checks” link, and NodePing will disable all currently enabled checks that match your filters.

If you’d like to disable all your HTTP Content checks, you can choose it from the “Type” drop down.

If all the checks you need to disable are named similarly (example: “Server1: website A”, “Server1: website B”, “Server1: website C”), you can disable all of them by putting “Server1” in the “Label” field. The matching works on any part of the label.

If a particular server is failing, you can disable all checks (no matter what type or label) that point to that server by entering the hostname or IP address those checks use in the “Target” field. For example, I can disable all checks that point to all nodeping.com hosts by typing “nodeping.com” in that field. It will disable checks to ‘smtp.nodeping.com’ and ‘www.nodeping.com’, no matter what the check type is.

The filters are additive, so if you choose the “HTTP” type and type something in the “Label” field, only HTTP checks that match that label will be disabled.
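The filter rules above can be modeled in a few lines: type must match exactly, label and target match on any part of the string, and every filter you fill in must match. This is an illustrative model of the described behavior, not NodePing’s code. The `Check` fields, the type names, and whether matching is case-sensitive (here it is) are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Check:
    label: str
    type: str    # illustrative type names, e.g. "HTTP", "HTTP Content", "SMTP"
    target: str


def matches(check: Check, type_: Optional[str] = None,
            label: Optional[str] = None, target: Optional[str] = None) -> bool:
    """True if the check satisfies every filter that was supplied (filters are ANDed)."""
    if type_ is not None and check.type != type_:
        return False
    # Label and target filters match on any part of the string (substring match).
    if label is not None and label not in check.label:
        return False
    if target is not None and target not in check.target:
        return False
    return True


def select(checks: list[Check], **filters: Optional[str]) -> list[Check]:
    """Checks that a filtered "Disable All Checks" would act on."""
    return [c for c in checks if matches(c, **filters)]


if __name__ == "__main__":
    checks = [
        Check("Server1: website A", "HTTP", "http://a.example.com"),
        Check("Server1: website B", "HTTP Content", "http://b.example.com"),
        Check("Mail", "SMTP", "smtp.nodeping.com"),
        Check("Site", "HTTP", "www.nodeping.com"),
    ]
    print([c.label for c in select(checks, target="nodeping.com")])
    print([c.label for c in select(checks, type_="HTTP", label="Server1")])
```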

Use the “Re-enable Checks” link to re-enable checks that were previously disabled using the “Disable All Checks” link and filters.

These filters have superpowers too, thanks to regex. You can geek out and provide a valid JavaScript regular expression for each filter in our UI or API. Run a curl one-liner against our API before your maintenance starts to disable some checks, and then re-enable them when it’s done. See our API reference for details.
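Regex filters help when a plain substring is too broad. For example, “Server1” also matches “Server10”. The sketch below shows the difference using Python’s `re` module. Python regex syntax is close to JavaScript’s but not identical, so test your actual pattern in the NodePing UI. It also assumes the pattern can match anywhere in the label, as the plain filters do. The API call itself is not shown here; see the API reference for that.

```python
from __future__ import annotations

import re


def select_by_regex(labels: list[str], pattern: str) -> list[str]:
    """Return the labels the pattern matches anywhere (re.search, not a full match)."""
    rx = re.compile(pattern)
    return [label for label in labels if rx.search(label)]


if __name__ == "__main__":
    labels = ["Server1: website A", "Server1: website B", "Server10: mail", "Server2: website C"]
    print(select_by_regex(labels, "Server1"))    # also catches Server10
    print(select_by_regex(labels, r"^Server1:"))  # anchored: Server1 only
```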

NodePing is committed to bringing you more functionality like these new disable check features, available in all accounts now. If you don’t yet have a NodePing server monitoring account, we encourage you to sign up for a free, 15-day trial and see how our fast, accurate uptime monitoring can help you keep your services up and available.

NTP Monitoring

Host clock synchronization is important for server clusters and many other services. Having a node with the system clock drifting can cause all kinds of hard-to-troubleshoot issues (I’m looking at you, Cassandra). Thankfully NTP (Network Time Protocol) has been there since before 1985 to help us keep our clocks within a few milliseconds of each other.

If you run your own NTP servers or use someone else’s for mission-critical services, you need to monitor that they are up and running. NodePing’s new NTP check can make sure the NTP services you rely on are available and responding and will send you actionable notifications when they aren’t.
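For a sense of what “available and responding” involves at the protocol level, here is a minimal SNTP client sketch. It sends the standard 48-byte client request over UDP port 123 and reads the server’s transmit timestamp. This is only an illustration of the protocol, not how NodePing’s check works, and the offset it reports is rough. Full NTP uses four timestamps to cancel network delay.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_OFFSET = 2_208_988_800


def build_request() -> bytes:
    """48-byte SNTP client request: LI=0, version 3, mode 3 (client) -> 0x1B."""
    return bytes([0x1B]) + bytes(47)


def server_transmit_time(packet: bytes) -> float:
    """Return the reply's transmit timestamp (bytes 40-47) as Unix time."""
    if len(packet) < 48:
        raise ValueError("reply too short to be an NTP packet")
    if packet[0] & 0x07 != 4:  # low 3 bits are the mode; 4 = server
        raise ValueError("reply is not an NTP server response")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_UNIX_OFFSET + fraction / 2**32


def clock_offset(host: str, timeout: float = 5.0) -> float:
    """Rough offset (server minus local clock, seconds); raises OSError if no reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t0 = time.time()
        sock.sendto(build_request(), (host, 123))
        reply, _ = sock.recvfrom(1024)
        t3 = time.time()
    # Compare against the midpoint of the round trip.
    return server_transmit_time(reply) - (t0 + t3) / 2
```

Calling `clock_offset("your.ntp.server")` either returns an offset in seconds or raises `OSError` (for example, a timeout) when the server doesn’t answer.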

Alternatively, if you have a private NTP server that should not be reachable from the relentless interwebs, we can monitor its expected silence as well. If your private NTP server starts responding to the world, we’ll send you an alert that the dog got out of the yard.
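Monitoring “expected silence” just inverts the pass condition. The check passes when the server does *not* answer. A minimal sketch, assuming a simple UDP probe and hypothetical function names:

```python
import socket


def ntp_answers(host: str, timeout: float = 5.0) -> bool:
    """True if the host replies to a minimal SNTP client request on UDP port 123."""
    request = bytes([0x1B]) + bytes(47)  # LI=0, version 3, mode 3 (client)
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(request, (host, 123))
            sock.recvfrom(1024)
        return True
    except OSError:  # timeouts, unreachable hosts, and DNS failures all land here
        return False


def check_result(answered: bool, should_answer: bool) -> str:
    """'up' when the server's behavior matches expectations, otherwise 'down'."""
    return "up" if answered == should_answer else "down"
```

For a private server, you would evaluate `check_result(ntp_answers(host), should_answer=False)`. An unexpected reply then reports “down”.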

NTP monitoring is available on all account plans today. If you don’t have a NodePing account yet, sign up for your free, 15-day trial today and we’ll keep an eye on your NTP servers for you.

Probe Server IP Address Change – [DE]

Our probe in Frankfurt, Germany will be changing IP addresses on 2016/11/02:
Frankfurt, Germany (DE) is changing from 62.113.242.111 to 5.1.70.107.

Please adjust your firewalls appropriately so your checks do not fail because of the probe IP address change.

An always up-to-date list of all the IP addresses for our probe servers can be found in the FAQ.

[UPDATE – 2016-11-02 06:55GMT-4] – IP change complete.

Site update and feature releases Oct 2016

We’ve rolled out some UI and feature updates for NodePing today. We hope you find them helpful. I’ll summarize the changes here; future posts will go into more detail on each.

Delayed Notifications:
You can now set a delay on ‘down’ alerts. This will help make your notifications more actionable for frequently flapping services. This new feature can also be used to escalate alerts or notify support/management if services remain offline. This feature has been available in our API for a while but wasn’t in our documentation; it has now been added to our UI as well. See the ‘Delay’ dropdown in the Notification section of your check.

Check Cloning:
You can now clone an existing check, with all its settings, in our UI to create a new check. This will help reduce “clickty-clickty” syndrome when setting up a lot of checks with similar settings. Click the label of the check you want to clone to open its details, which reveals the ‘Clone Check’ link on the far right.

Notification Dependencies:
When an edge router or server fails, it’s assumed that all the services that depend on it will also fail. It’s not helpful to receive hundreds of alerts for dependent services. You can now set another check as a notification dependency on each check. If the check it depends on is already failing, notifications will be suppressed. Use this to avoid alert floods when bottleneck services fail. You can find the ‘Dependency’ dropdown in the Notification section of each check.
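The suppression rule can be sketched as a single decision per check: notify only if this check is failing and the check it depends on is not. This is a hypothetical model of the behavior described above, not NodePing’s code. The `Check` fields and `should_notify` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Check:
    name: str
    failing: bool
    dependency: Optional[Check] = None  # the upstream check this one depends on


def should_notify(check: Check) -> bool:
    """Send a 'down' alert only for failures not explained by a failing dependency."""
    if not check.failing:
        return False
    # Suppress when the upstream check is already failing; it has already alerted.
    if check.dependency is not None and check.dependency.failing:
        return False
    return True


if __name__ == "__main__":
    router = Check("edge router", failing=True)
    sites = [Check(f"site {i}", failing=True, dependency=router) for i in range(100)]
    alerts = [c.name for c in [router, *sites] if should_notify(c)]
    print(alerts)  # only the router alerts, not 101 checks
```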

Disable All Notifications:
There is now a link in the Contacts tab to “Disable notifications”. Use this to suppress all alerts until you re-enable them using the same link. It’s another way to help avoid the distraction of alert floods during big outages.

Disable Checks:
Now you can disable multiple checks with one click. You’ll find the “Disable All Checks” link in the Account Settings – General Settings tab. You can also apply filters based on label, target, or check type to, for example, disable all PING checks or all checks pointing to “example.com”. Use this to disable checks during planned outages/maintenance or to quiet down your logs when troubleshooting.

All the above new features, except check cloning, are also available via our API. If you have any questions about these new features, reach out to support@nodeping.com; we’re happy to help.

Probe Server Changes – [UT,RO,FL,AM]

The following probe servers will be changing IP addresses on 2016/10/04:
Ogden, Utah, USA (UT) will change from 192.154.111.10 to 192.154.102.130
Bucharest, Romania (RO) will change from 77.81.108.115 to 89.45.10.135
Miami, Florida (FL) will change from 107.161.178.251 to 162.254.202.35

We’ll also be adding the following new probe in our East Asia/Oceania region on 2016/10/04:
Melbourne, Australia (AM) will be added – 103.207.28.11

We’ll also be removing the following probe in our Latin America region on 2016/10/04:
Curico, Chile (CL) will be removed – 190.114.254.203

Please adjust your firewalls appropriately so your checks do not fail because of the probe IP address changes.
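If you keep a firewall allowlist of probe addresses, a short script can apply these changes. This sketch uses only the addresses listed in this post and assumes your allowlist is a plain list of IPv4 addresses. Each old address is swapped for its replacement, Melbourne is added, and Curico is dropped. The FAQ list remains the authoritative source.

```python
from __future__ import annotations

import ipaddress

# Changes announced in this post for 2016/10/04.
REPLACED = {
    "192.154.111.10": "192.154.102.130",   # Ogden, Utah, USA (UT)
    "77.81.108.115": "89.45.10.135",       # Bucharest, Romania (RO)
    "107.161.178.251": "162.254.202.35",   # Miami, Florida (FL)
}
ADDED = ["103.207.28.11"]      # Melbourne, Australia (AM)
REMOVED = ["190.114.254.203"]  # Curico, Chile (CL)


def update_allowlist(allowlist: list[str]) -> list[str]:
    """Return a sorted allowlist with the announced probe changes applied."""
    ips = {ipaddress.ip_address(a) for a in allowlist}
    for old, new in REPLACED.items():
        ips.discard(ipaddress.ip_address(old))
        ips.add(ipaddress.ip_address(new))
    ips.update(ipaddress.ip_address(a) for a in ADDED)
    for gone in REMOVED:
        ips.discard(ipaddress.ip_address(gone))
    return [str(ip) for ip in sorted(ips)]
```

Addresses not mentioned in the announcement pass through unchanged.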

An always up-to-date list of all the IP addresses for our probe servers can be found in the FAQ.

Edit 2016-09-28 – Fixed the incorrectly listed current IP for the Florida probe (FL).

[UPDATE – 2016-10-04 23:15GMT-4] – changes complete.