Notification Dependencies

Oh no! A power supply failure has taken your website server offline and here come 120 HTTP ‘down’ notifications from NodePing. When a major outage hits, the last thing you need is an alert flood for all the checks you already know are bound to fail.

When a check depends on other services or networks, you don’t need more notifications that it’s failing when you already know the service that it depends on is failing. NodePing recently released a new feature called ‘Notification Dependency’ to help mitigate that unhelpful alert flood.

Set a ‘Notification Dependency’ on your checks when you want to suppress notifications for checks that depend on another check for availability. Web sites on the same web server can all be set to have their HTTP checks dependent on the server PING check. Or, you can set server PING checks to be dependent on the network router PORT check. If the check that a check depends on is failing, no notifications will be sent for the dependent check. The checks will still fail; only the alerts for those failures are suppressed.
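To make the suppression rule concrete, here is a minimal sketch in Python. It is not NodePing's implementation; the `Check` class, its fields, and `should_notify` are hypothetical names we made up to model the behavior described above.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Check:
    label: str
    is_down: bool
    dependency: Optional[str] = None  # label of the check this one depends on (hypothetical field)


def should_notify(check: Check, checks: Dict[str, Check]) -> bool:
    """Return True if a 'down' alert should go out for this check."""
    if not check.is_down:
        return False
    parent = checks.get(check.dependency) if check.dependency else None
    # Suppress the alert when the check we depend on is itself failing.
    if parent is not None and parent.is_down:
        return False
    return True


checks = {
    "server PING": Check("server PING", is_down=True),
    "site A HTTP": Check("site A HTTP", is_down=True, dependency="server PING"),
    "site B HTTP": Check("site B HTTP", is_down=True, dependency="server PING"),
}
alerts = [c.label for c in checks.values() if should_notify(c, checks)]
print(alerts)
```

In this example all three checks are failing, but only the server PING check produces an alert. The two HTTP checks still record their failures; they just stay quiet.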

Choose the check to depend on from the ‘Dependency’ dropdown in the ‘Notifications’ section of the check edit modal and then ‘Save’ your changes for that check.

Notification dependencies are another way to help you receive only actionable alerts for your uptime monitoring and are available to all NodePing customers.

If you aren’t using NodePing server monitoring yet, sign up for your free, 15-day trial today.

Delayed Notifications

NodePing now offers delayed notifications for your uptime monitoring. This is a powerful new feature that will help make your notifications actionable. There are two primary use cases for delayed notifications: flapping services and escalating notifications.

Flapping Services
Not all services or networks are rock solid. Sometimes three or even two nines is “good enough”. Some locations have inherently lower expectations for availability or are just prone to frequent, short-lived outages. When a check often fails but recovers by itself quickly (flaps), it’s difficult to get actionable notifications.

Adjusting the check sensitivity setting down is useful to give your check more time to recover, but if unassisted recovery takes longer than a minute the check will still likely fail. Use delayed notifications for flapping checks to receive alerts only when services are ‘really’ down. You can configure NodePing to send an alert if your check remains down after, say, 5 minutes. Set the delay (from 1 minute to 1 hour) to your tolerance and receive alerts only when human intervention is required.

Escalating Notifications
Not everyone needs to know about every outage right away. If the sysadmin on call can get the site back up within a few minutes, senior staff don’t need to act and the help desk doesn’t need to be informed. If an outage lasts longer, however, you may need to let your boss know things are still offline or give a heads up to the help desk that there are issues on the website and to expect some calls. Use delayed notifications to set escalating alerts to others if an outage continues.

You can even escalate alerts to yourself. I have several checks set to email me immediately and then send me an SMS if they’re still failing after 5 minutes and a voice call if the outage lasts longer than 10 minutes.

Setting Notification Delays
When editing a check, you’ll see the contact method drop down in the ‘Notifications’ section of each check. Choose a contact method and the ‘Delay’ and ‘Schedule’ dropdowns will also appear. You can set different delays on the same contact method by adding additional lines with the same contact method.
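As a rough illustration of how delays turn into escalation, here is a small Python sketch. The `ESCALATION` table mirrors the email, SMS, and voice example above; the function name and the "fire once the delay has elapsed" rule are our assumptions for the sketch, not a description of NodePing internals.

```python
from typing import Dict, List, Set, Tuple

# (contact method, delay in minutes), mirroring the self-escalation example in the text
ESCALATION: List[Tuple[str, int]] = [
    ("email", 0),
    ("sms", 5),
    ("voice", 10),
]


def due_notifications(minutes_down: int, already_sent: Set[str]) -> List[str]:
    """Contact methods whose delay has elapsed and that have not fired yet."""
    return [
        method
        for method, delay in ESCALATION
        if minutes_down >= delay and method not in already_sent
    ]


# Simulate an outage that lasts 11 minutes, evaluated once per minute.
sent: Set[str] = set()
timeline: Dict[int, List[str]] = {}
for minute in range(0, 12):
    fired = due_notifications(minute, sent)
    if fired:
        timeline[minute] = fired
        sent.update(fired)
print(timeline)
```

If the check had recovered at minute 3, only the immediate email would have gone out. The same idea covers flapping services: a single contact line with a 5 minute delay never fires for an outage that clears itself in 2 minutes.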

Actionable Alerts
Delayed notifications can be useful to make all your alerts more actionable. If your contacts keep receiving NodePing notifications they can’t act on, they’ll succumb to alert fatigue and eventually ignore a truly important notification.

If you need any help tuning your checks to avoid flapping or adjusting your notifications to make them more actionable, please reach out to us at support@nodeping.com. We really are happy to help.

If you aren’t using NodePing for uptime monitoring yet, please sign up for our 15-day, free trial and let us help you increase your uptime.

Disable Monitoring

Last week we added several features aimed at making our monitoring service easier to manage and use. One of those features adds new ways to easily disable checks, either in the UI or using the API. Disabling a check stops all monitoring and notifications. It can be useful to keep scheduled maintenance or downtime from affecting your uptime statistics. I’ll explain the various ways you can disable monitoring within NodePing.

Disabling a Single Check
There are two ways to disable a single check within the NodePing web interface. The first is in the check details drawer. If you click the name of a check in the list, the details drawer will slide out. There you can see the last 5 results, links to the various reports, and a toggle to disable the check. Simply click on the “toggle” link next to the text “This check is currently active”. To re-enable, click the toggle again.

The second way to disable a single check is within the check edit screen. Click on the “Edit” button to call up the check edit modal and uncheck the “Enable Check:” box, then click on “Save”. To re-enable the check, edit it again and check that same box.

Sometimes you may need to disable all the checks to a particular datacenter or for a particular server or service. You can do it one at a time as described above but that can be a click-fest if you have a large number of checks to disable. Our new features allow you to easily disable all of your checks at once, or to disable a group of them based on some powerful filtering capability (described more below).

Disabling All Checks
You can disable and re-enable all your checks with just a couple of clicks. In the “Account Settings” – “General Settings” tab, you’ll see a link for “Disable All Checks”. Click on it and all your currently enabled checks will be immediately disabled. To re-enable the checks, click on the new link that appeared that says “Re-enable Checks”. Please note that this will only re-enable checks that have been disabled using the “Disable All Checks” link. If you disabled a check using one of the methods described above in the “Disabling a Single Check” section of this post, the check will remain disabled.

Disabling all checks can be useful to silence monitoring and notifications during major outages, planned maintenance, or to quiet logs when troubleshooting.

Disabling Multiple Checks
Our new disable feature has some powerful filtering that can help you disable all checks where the label, type, and/or target are similar. Clicking on the “Show optional filters” link in the “Account Settings” – “General Settings” tab will display the available filter fields of “Type”, “Label”, and “Target”. After choosing from the dropdown or typing your desired filters in the fields there, click on the “Disable All Checks” link and NodePing will disable all currently enabled checks that match your filters.

If you’d like to disable all your HTTP Content checks, you can choose it from the “Type” drop down.

If all the checks you need to disable are named similarly (example: “Server1: website A”, “Server1: website B”, “Server1: website C”), you can disable all of them by putting “Server1” in the “Label” field. The matching works on any part of the label.

If a particular server is failing, you can disable all checks (no matter what type or label) that point to that server by putting the name or IP address used in the check “Target” field. For example, I can disable all checks that point to all nodeping.com hosts by typing “nodeping.com” in that field. It will disable checks to ‘smtp.nodeping.com’ and ‘www.nodeping.com’, no matter what the check type is.

The filters are additive so if you choose the “HTTP” type and type something in the “Label” field, only HTTP checks that match that label will be disabled.

Use the “Re-enable Checks” link to re-enable checks that were previously disabled using the “Disable All Checks” link and filters.

These filters have superpowers too, thanks to regex. You can geek out and provide a valid JavaScript regular expression for each filter in our UI or API. Run a curl one-liner against our API before your maintenance starts to disable some checks, and then re-enable them when it’s done. See our API reference for details.
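Here is a local Python model of the filter behavior described in this post, for anyone who wants to reason about what a given set of filters will hit. It does not call the NodePing API. The `Check` class, the `bulk_disabled` flag, and both function names are hypothetical, we assume the type filter is an exact match, and Python's `re` module stands in for JavaScript regex (the two dialects differ in places).

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Check:
    label: str
    type: str
    target: str
    enabled: bool = True
    bulk_disabled: bool = False  # set only by disable_all(), never by a manual disable


def _matches(pattern: Optional[str], value: str) -> bool:
    # No filter matches everything; otherwise match any part of the value.
    return pattern is None or re.search(pattern, value) is not None


def disable_all(checks: Iterable[Check], check_type: Optional[str] = None,
                label: Optional[str] = None, target: Optional[str] = None) -> List[str]:
    """Disable every currently enabled check that matches ALL given filters."""
    hit = []
    for c in checks:
        type_ok = check_type is None or c.type == check_type
        if c.enabled and type_ok and _matches(label, c.label) and _matches(target, c.target):
            c.enabled = False
            c.bulk_disabled = True
            hit.append(c.label)
    return hit


def reenable(checks: Iterable[Check]) -> List[str]:
    """Re-enable only the checks that disable_all() turned off."""
    restored = []
    for c in checks:
        if c.bulk_disabled:
            c.enabled = True
            c.bulk_disabled = False
            restored.append(c.label)
    return restored


checks = [
    Check("Server1: website A", "HTTP", "www.nodeping.com"),
    Check("Server1: website B", "HTTP", "shop.example.com"),
    Check("Server1: mail", "SMTP", "smtp.nodeping.com"),
    Check("Server2: website C", "HTTP", "www.example.org", enabled=False),  # disabled by hand earlier
]
disabled = disable_all(checks, check_type="HTTP", label=r"^Server1")
restored = reenable(checks)
print(disabled, restored)
```

The SMTP check is untouched because the filters are combined, and the check that was disabled by hand stays disabled after the re-enable, which matches the behavior described for the “Re-enable Checks” link.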

NodePing is committed to bringing you more functionality like these new disable check features, available in all accounts now. If you don’t yet have a NodePing server monitoring account, we encourage you to sign up for a free, 15-day trial and see how our fast, accurate uptime monitoring can help you keep your services up and available.

NTP Monitoring

Host clock synchronization is important for server clusters and many other services. Having a node with the system clock drifting can cause all kinds of hard-to-troubleshoot issues (I’m looking at you, Cassandra). Thankfully NTP (Network Time Protocol) has been there since before 1985 to help us keep our clocks within a few milliseconds of each other.

If you run your own NTP servers or use someone else’s for mission critical services, you need to monitor that they are up and running. NodePing’s new NTP check can make sure the NTP services you rely on are available and responding and will send you actionable notifications when they aren’t.

Alternatively, if you have a private NTP server that should not be available to the relentless interwebs, we can monitor its expected silence as well. If your private NTP server starts responding to the world, we’ll send you an alert that the dog got out of the yard.
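If you’re curious what “responding” means at the protocol level, here is a minimal Python sketch of a client-mode NTP query using only the standard library. It is our illustration, not the code behind NodePing’s NTP check; the function names and the `expect_response` flag are hypothetical.

```python
import socket
import struct

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 (NTP epoch) and 1970-01-01 (Unix epoch)


def build_request() -> bytes:
    # First byte 0x1b: leap indicator 0, version 3, mode 3 (client). The other 47 bytes are zero.
    return b"\x1b" + 47 * b"\x00"


def parse_transmit_time(packet: bytes) -> float:
    """Return the server's transmit timestamp (bytes 40-47) as Unix time."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def ntp_responds(host: str, port: int = 123, timeout: float = 3.0) -> bool:
    """True if the host answers an NTP query within the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_request(), (host, port))
            packet, _ = sock.recvfrom(512)
            parse_transmit_time(packet)
            return True
        except (OSError, ValueError):
            # Timeouts and DNS failures are OSError subclasses; both count as "no answer".
            return False


def should_alert(responded: bool, expect_response: bool) -> bool:
    """Alert when reality differs from expectation, in either direction."""
    return responded != expect_response
```

For a public server you would call `should_alert(ntp_responds(host), expect_response=True)`. For the private server that is supposed to stay silent, pass `expect_response=False` and the alert fires when it starts answering.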

NTP monitoring is available on all account plans today. If you don’t have a NodePing account yet, sign up for your free, 15-day trial today and we’ll keep an eye on your NTP servers for you.

Site update and feature releases Oct 2016

We’ve rolled out some UI and feature updates for NodePing today. We hope you find them helpful. I’ll summarize the changes here. Look here for future posts, which will go into more details for each.

Delayed Notifications:
You can now set a delay on ‘down’ alerts. This will help make your notifications more actionable for frequently flapping services. This new feature can also be used to escalate alerts or notify support/management if services remain offline. This feature has been available for a while in our API, but hasn’t been in our documentation, and has now been added to our UI as well. See the ‘Delay’ drop down in the Notification section of your check.

Check Cloning:
You can now clone an existing check, with all its settings, in our UI to create a new check. This will help reduce “clickty-clickty” syndrome when setting up a lot of checks with similar settings. Click on the label of the check you want to clone to display its details, which reveals the ‘Clone Check’ link on the far right.

Notification Dependencies:
When an edge router or server fails, it’s assumed that all the services that depend on it will also fail. It’s not helpful to receive hundreds of alerts for dependent services. You can now set another check as a notification dependency on each check. If the check it depends on is already failing, notifications will be suppressed. Use this to avoid alert floods when bottleneck services fail. You can find the ‘Dependency’ drop down in the Notification section of each check.

Disable All Notifications:
There is now a link in the Contacts tab to “Disable notifications”. Use this to suppress all alerts until you re-enable them using the same link. It’s another way to help avoid the distraction of alert floods during big outages.

Disable Checks:
Now you can disable multiple checks with one click. You’ll find the “Disable All Checks” link in the Account Settings – General Settings tab. You can also apply filters based on label, target, or check type to, for example, disable all PING checks or all checks pointing to “example.com”. Use this to disable checks during planned outages/maintenance or to quiet down your logs when troubleshooting.

All the above new features, except check cloning, are also available via our API. If you have any questions about these new features, reach out to support@nodeping.com; we’re happy to help.

Public Status Report Update

NodePing’s public status report feature allows you to create an uptime report for your sites or services in your own domain. It’s a popular part of our website and server monitoring service, and is available on all NodePing accounts. Today we added a couple of (hopefully very useful) enhancements to the public status reports.

The report now has a column on the right side that shows the uptime for each service over the past 30 days. It also allows you to display a column to show the check type, which can be turned on and off in the report’s configuration page. Plus, we’ve also tweaked the filtering on the title field, which has opened it up to a wider degree of customization. For example, you can include image tags and style tags in this field, which allows you to add your logo, as well as having significant control over the overall look of the report.

The report already gave you the ability to set which checks should appear on the report, and to set a custom URL for the report (so, for example, you could have it on the status subdomain of your own domain, so the URL would be status.example.com). And if you have public reports turned on for individual checks, those reports will automatically be linked from the status report.

We hope that these enhancements, on top of the features we already had for the status report, will make this report very useful to all of our customers. We put a lot of emphasis on feedback from our users, so please let us know what other features would help you make the most of our monitoring service.

If you run web sites or other Internet services and haven’t tried out our monitoring service, give us a try with our 15-day free trial.

How to integrate PagerDuty into NodePing

Many of our customers are also big PagerDuty fans. What’s not to like? PagerDuty offers great escalation and on-call hand-off capabilities as well as flexible voice, SMS, and even pajama alerts.

To make it easier for you to integrate your already existing PagerDuty workflow, we’ve added a new contact notification type to NodePing. The ‘PagerDuty’ type accepts a ‘Service API Key’. You can find information on how to set up a PagerDuty generic API service at their support site.

Our system will send a ‘trigger’ event on each failure and a ‘resolve’ event on each recovery. Add an entry in your contact record by specifying your PagerDuty ‘Service API Key’ (they kind of look like a big random string “47b3a13848514c3fa3def842464eeaa8”) and selecting ‘PagerDuty’ in the notification type drop down. Then specify that contact when you edit or create your NodePing checks.
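For a feel of what a ‘trigger’ and ‘resolve’ pair looks like, here is a small Python sketch that builds the two payloads. The field names follow PagerDuty’s generic events API as we understand it; the exact payload NodePing sends is not shown in this post, so treat `build_event`, the `incident_key` format, and the description text as assumptions.

```python
import json


def build_event(service_key: str, check_label: str, is_down: bool) -> dict:
    """Build a trigger event for a failure or a resolve event for a recovery."""
    return {
        "service_key": service_key,
        "event_type": "trigger" if is_down else "resolve",
        # Reusing the same incident_key lets the resolve close the matching trigger.
        "incident_key": f"check:{check_label}",
        "description": f"{check_label} is {'down' if is_down else 'up'}",
    }


# The service key below is the sample string from the text, not a real key.
key = "47b3a13848514c3fa3def842464eeaa8"
down_event = build_event(key, "www.example.com HTTP", is_down=True)
up_event = build_event(key, "www.example.com HTTP", is_down=False)
print(json.dumps(down_event, indent=2))
```

The point of the shared `incident_key` is that the recovery resolves the same incident the failure opened, instead of leaving it for a human to close.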


You can specify as many different PagerDuty contacts as you like. This allows you to use multiple ‘Services’ with NodePing and have full control of your PagerDuty escalations and notifications.

We strive to bring you the best solutions for your monitoring needs. We’ve set our eyes on Android and iOS push notifications next so follow this blog for that notification. We’d also love to hear from you. What notification types or other features would you like to see in NodePing?

If you’re not a NodePing customer yet, you can sign up for a free 15-day trial and kick the tires for yourself. We’re confident you’ll like what you find.

Minor API enhancements added today

We have a couple of updates to our API.

You can obtain the current status of your checks using /api/1/results/current. This returns a list of checks that currently have an “event,” which means that the check is currently disabled or is listed as “down.” The information returned will include a timestamp when the event started. Checks not listed in the results for this call are currently “up.”

We’re also adding a couple of convenience tweaks. When you are getting a list of checks, you can add a “current” parameter in order to have any current events added to the check information. This basically mixes the information from the “current” call mentioned above in with the list of checks.

Additionally, when you are getting a single check, you can add a “lastresult” parameter to the request and get the most recent result for that check along with the check information.
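One practical consequence of the “current” call is that a missing check means “up”. Here is a minimal Python sketch of that rule. The response shape used here (a mapping from check id to an event with `type` and `start` fields) is a hypothetical stand-in, as are the check ids; see the API Reference for the real format.

```python
from typing import Dict, Iterable


def summarize(all_check_ids: Iterable[str], current_events: Dict[str, dict]) -> Dict[str, str]:
    """Map every check id to 'up', 'down', or 'disabled'."""
    status = {}
    for check_id in all_check_ids:
        event = current_events.get(check_id)
        if event is None:
            # Checks absent from the current-events result are up.
            status[check_id] = "up"
        else:
            status[check_id] = event.get("type", "down")
    return status


# Hypothetical parsed response from /api/1/results/current
current_events = {
    "check-2": {"type": "down", "start": 1476230400000},
    "check-3": {"type": "disabled", "start": 1476144000000},
}
status = summarize(["check-1", "check-2", "check-3"], current_events)
print(status)
```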

All three of these changes are included in our API Reference documentation. Hopefully these minor enhancements will be of help. Feedback is welcome here or at support@nodeping.com.

Webhooks for Business Plan Customers

At NodePing we are constantly striving to provide the best server monitoring service at absolutely the best price we can. Our goal is to be as useful as we can to meet your day to day monitoring needs for both web and server monitoring, at a price that removes all barriers for best practice monitoring for every service, everywhere.

In line with that goal, today we are changing the Business plan to add unlimited webhook notifications. Previously those were only available in the Provider plan.

Webhooks allow you to trigger actions on other sites based on events from NodePing’s monitoring system. This allows you to automate actions on your servers when a web site goes up or down. Common uses include changing DNS settings when a server goes offline and restarting a database when it fails.

Our hope is that including webhooks in the Business plan will make NodePing’s most popular plan even more useful for businesses the world over. A lot of what we do is based on feedback, so please continue to let us know how we can continue to make our service fit your needs even better.

Email monitoring done right

For several years before starting NodePing I worked in a number of different roles in IT, including system administration, project management, infrastructure and network management, and development. A sizable chunk of that time was spent at an organization that ran email servers in a number of different countries scattered around the world. Making sure that all of those email systems were working properly and generating useful reporting was a huge challenge, and involved a lot of repetitive manual steps.

NodePing’s monitoring services were largely motivated by the desire to make widespread monitoring of web sites and other Internet accessible services as simple and automatic as possible. One of the reasons I’m so excited about our suite of email monitoring checks is that I know from personal experience how important these tools are, both from a sysadmin’s point of view as well as from technical management roles.

The core of this set of tools is SMTP monitoring. This check has several options that allow you to check the remote SMTP server in a variety of ways. At its most basic, it can be used to check that the server is operating and answering to SMTP connections and is accessible. It can also watch the SSL/TLS certificates, and notify you in advance of when certificates will expire. The check also can be used to monitor if the SMTP server accepts or denies specific email addresses, which can be used for open relay monitoring. Authentication verification can make sure that the server is logging people in properly. This is particularly important when email servers are integrated with separate directory services, such as an LDAP service or Active Directory.

SMTP server monitoring should also be paired with RBL monitoring. This checks the server’s address against a number of different RBL services, and can notify you if the server has been blacklisted. Any experienced email administrator knows that staying off of these lists is critically important, and it is possible to get on a black list without doing anything outside of normal business practices. When it happens you need to know quickly so you can remedy or clarify the situation and get off of the black list before it negatively impacts business.
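For readers who haven’t looked under the hood of an RBL lookup, most lists are queried over DNS: reverse the octets of the IPv4 address, append the list’s zone, and see whether the name resolves. Here is a minimal Python sketch of that convention. It is not NodePing’s RBL check, and `rbl.example.org` is a placeholder zone, not a real list.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up: reversed IPv4 octets followed by the list's zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """True if the zone returns an A record for this address."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No record. Note this sketch cannot tell "not listed" from a DNS failure.
        return False


print(dnsbl_query_name("192.0.2.10", "rbl.example.org"))
```

A real monitor would run this against several lists and treat resolver errors separately from a clean “not listed” answer.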

The IMAP and POP checks go hand in hand with the SMTP check to ensure that your customers and employees can retrieve mail from their inboxes. Like the SMTP check, these checks not only monitor that the server is accepting connections, but can verify authentication and warn you in advance if an SSL certificate is nearing expiration.

The final piece of the email service monitoring tool set is monitoring the web interface. Here NodePing’s HTTP Content check can be used to make sure that the service is responding with the proper web page, and the SSL check can verify that the web interface’s SSL certificate is in place and working properly, as well as warn of a nearing expiration date.
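The expiration warning used by the SMTP, IMAP, POP, and SSL checks comes down to comparing a certificate’s “not after” date with today. Here is a minimal Python sketch using the standard `ssl` module. The function names and the 14 day threshold in the usage note are our own choices for illustration, not NodePing settings.

```python
import socket
import ssl


def days_until_expiry(not_after: str, now: float) -> float:
    """Days between `now` (Unix time) and a certificate's notAfter string."""
    expires = ssl.cert_time_to_seconds(not_after)
    return (expires - now) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect with TLS and return the server certificate's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


# Offline example with fixed dates, so no network access is needed.
now = ssl.cert_time_to_seconds("Jan  1 00:00:00 2030 GMT")
remaining = days_until_expiry("Jan 31 00:00:00 2030 GMT", now)
print(remaining)
```

Against a live server you might write `days_until_expiry(fetch_not_after("www.example.com"), time.time()) < 14` (after `import time`) to decide whether to warn. Note that `fetch_not_after` covers implicit TLS ports such as 443, 993, and 995; a STARTTLS service needs the protocol handshake first.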

These checks together provide a full complement of tools for monitoring email services. For most systems, we’d suggest a full set of checks:

  • The SMTP service is operating properly on port 25, accepts a STARTTLS command, accepts authentication, and accepts a given address for relay from an authenticated user. All of this, with verification of the TLS certificate, can be done with one check.
  • The SMTP service is listening and accepting SSL-based connections on port 587.
  • The SMTP service rejects open relay requests.
  • The SMTP service accepts a local address from non-authenticated hosts.
  • The server is not on any RBLs.
  • The IMAP server is operating properly on port 143 and authenticating properly.
  • The IMAP server is operating properly on port 993 and the SSL certificate is good.
  • The POP server is operating properly on port 110 and authenticating properly.
  • The POP server is operating properly on port 995 and the SSL certificate is good.
  • The web interface is operating properly on port 80 (if that is supported).
  • The web interface is operating properly on port 443 and the certificate is good.

This is a long way from a check that just monitors if a port is listening somewhere. It is the full set of checks that together help to ensure a healthy email system. We continue to extend our monitoring service and make our checks smarter, with the goal to take as much of the manual busy work out of the hands of busy administrators and allow them to focus on tasks that use their actual skills.

If you are responsible for email servers and haven’t added NodePing’s monitoring to your tool set yet, sign up for our free trial and give it a try!