Using NodePing with Ansible

Ansible is a configuration management and application deployment tool that is designed to help automate IT. NodePing offers a module that allows our customers who use Ansible in their infrastructure to automate tasks such as managing checks and creating ad-hoc and scheduled maintenance with our maintenance feature. For example, you can include setting up monitoring in your Ansible playbook, so new servers or virtual machines are automatically added to your monitoring.  Or, if you have a playbook that automates maintenance, you can have Ansible set ad-hoc maintenance on your monitoring for the affected host before it runs the rest of the playbook.

 

Getting Started

To get started with the NodePing Ansible modules, download them and copy them into your Ansible modules directory. That path is configured via the library variable in ansible.cfg and by default is /usr/share/my_modules/, so you may have to edit your ansible.cfg. You can download the zip file here with the Ansible modules and extract it on your computer. In the unzipped folder you should find two Python files: nodeping.py and nodeping_maintenance.py. Copy these two files into your Ansible library (modules) folder. There are also a couple of example playbooks that show snippets using the nodeping and nodeping_maintenance modules.

The NodePing Ansible module depends on the nodeping-api library for Python. This can be installed using the pip package manager, like so:

# python2
pip install nodeping-api

# python3
pip3 install nodeping-api

# Alternate python3
python3 -m pip install nodeping-api

# If installed for the user
python3 -m pip install --user nodeping-api
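
If you would rather install the library from a playbook than by hand, a minimal sketch using Ansible's pip module might look like the following. It assumes pip is already available for the Python interpreter Ansible uses on that host; the choice of localhost here is only an illustration, so swap in your own host group if you run the NodePing tasks elsewhere.

```yaml
---
- hosts: localhost

  tasks:
    # Installs the same package as the pip commands above
    - name: Install the nodeping-api Python library
      pip:
        name: nodeping-api
```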

Creating Checks

You can create checks with the NodePing module, and if you need to access any of the values from the result after creation, you can register the result and access it elsewhere in your playbook. Here is an example:


---
- hosts: test
  
  vars:
    nodeping_api_token: secret-token-here
    
  tasks:
    - name: Create a NodePing check for target host
      delegate_to: localhost
      nodeping:
        action: create
        checktype: PING
        target: "{{ ansible_default_ipv4.address }}"
        label: mytest ping
        enabled: False
        interval: 1
        token: "{{ nodeping_api_token }}"
        notifications:
        - group: My Contact Group
          notifydelay: 2
          notifyschedule: All the time
        - contact: 4QT82
          notifydelay: 0
          notifyschedule: All the time
      register: result

Note that the checktype is in all caps. This will be necessary when creating a check. Our API documentation provides a list of check types as well as parameters for creating the check. It is recommended that you delegate the task to localhost if you can. That way your deployment server is the only server that needs the Python library installed. If you wish to create your checks on the target server, be sure to install the nodeping-api package via the pip module. The returned value is registered to the variable result and stored as a dictionary, so you can access the values easily. In this next example, the check id is used to get the check contents from NodePing:


- name: Get a check, run from localhost
  delegate_to: localhost
  nodeping:
    action: get
    checkid: "{{ result.message._id }}"
    token: "{{ nodeping_api_token }}"

Here you can see we queried result.message._id of the result we registered earlier. This is an example of how you can use data returned from NodePing through the rest of your playbook for whatever your needs may be. You can also get check info by providing a label, but note that if you have many checks with the same label, it will grab only the first one.
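
As a rough sketch of both ideas, the tasks below first print the ID from the result registered earlier, then look a check up by its label. The label parameter name used with action: get is an assumption based on the description above, and label_result is just an illustrative variable name, so confirm both against the module's own documentation before relying on them.

```yaml
- name: Show the ID of the check created earlier
  debug:
    msg: "Created check {{ result.message._id }}"

# Assumption: the module accepts `label` together with action: get
- name: Get a check by label, run from localhost
  delegate_to: localhost
  nodeping:
    action: get
    label: mytest ping
    token: "{{ nodeping_api_token }}"
  register: label_result

# Dump the whole registered value to see what came back
- name: Inspect what the label lookup returned
  debug:
    var: label_result
```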

 

Maintenance

The Maintenance functionality lets you disable a list of checks while you do work on your server. That way, planned operations do not count against your uptime. For example, you can run the nodeping_maintenance module to disable your checks before the work begins. It will take about 30 seconds for the maintenance schedule to start once created, so hold off on stopping services or rebooting servers until the changes propagate across our distributed service and all of your checks are disabled. At that point you can take your services offline without affecting your uptime metrics. Once the set duration is complete, NodePing will automatically enable those checks again. Here you can see an example of creating an ad-hoc maintenance that lasts for 30 minutes.

tasks:
  - name: Create ad-hoc maintenance
    delegate_to: localhost
    nodeping_maintenance:
      token: "{{ nodeping_api_token }}"
      name: ad-hoc maintenance
      duration: 30
      scheduled: False
      checklist:
        - 201911191441YC6SJ-4S9OJ78G
        - 201911191441YC6SJ-XB5HUTG6

  - name: Pause 30 seconds to ensure checks are disabled
    pause:
      seconds: 30

# ...do stuff

You can also set scheduled to True, and provide a cron-syntax schedule to create a recurring maintenance.
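
A recurring maintenance might be sketched as follows. Note that the cron parameter name is an assumption made for illustration (the post does not name the option that carries the schedule), as are the maintenance name and the schedule itself, so check the module's documentation for the actual option.

```yaml
- name: Create recurring maintenance
  delegate_to: localhost
  nodeping_maintenance:
    token: "{{ nodeping_api_token }}"
    name: weekly maintenance
    duration: 30
    scheduled: True
    # Assumption: `cron` is a hypothetical parameter name for the schedule
    cron: "0 2 * * 0"  # 02:00 every Sunday
    checklist:
      - 201911191441YC6SJ-4S9OJ78G
```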

Life is Easier with Automation

Pairing your Ansible automation with NodePing monitoring is a great way to automate processes, making them easier while increasing reliability and trust in your systems. We hope this module and other tools for integrating NodePing into your infrastructure management will make life easier and help you get the job done. If you aren’t using NodePing yet, you can sign up for a free, 15-day trial to test out monitoring your services today and take advantage of integrating NodePing monitoring and maintenance in your Ansible playbooks.

PUSH Client Wizard

Last year, we introduced a new feature called PUSH Checks. This check type allows your server to push numeric metrics into our system, track the metrics, send a heartbeat, and receive alerts based on the results. This is a powerful tool, and we use it internally at NodePing to monitor system load, backup processes, gather metrics from logs, and a variety of other things. We’re also glad to hear about customers using this feature in interesting ways.

However, until now setting up a PUSH check could be challenging. You would have to create the check, download a copy of the client, configure it with the Check ID and Checktoken, and configure the metrics. So today we’re releasing a PUSH Client Wizard (available on GitHub) that makes PUSH Checks really easy to configure and deploy across your systems using an interactive command line wizard. This Python 3 client runs on any system with Python 3.5 or newer, and has been tested on Linux, Windows 10, and FreeBSD.

Features

So what can it do? The wizard lets you list your existing PUSH checks, create new PUSH checks, and delete PUSH checks you no longer want.

When listing checks, it will show information such as:

  • Your check’s label
  • ID
  • Checktoken
  • If the check will fail when its results are old
  • PASS/FAIL status
  • If it’s enabled/disabled
  • Run Interval

When creating a check you can configure all sorts of information for the check such as:

  • The client you will use (POSIX, Python, Python3, PowerShell)
  • Information about the check (Label, interval, enabled, public reports, fail when old)
  • Metrics to gather for the check (or none for basic heartbeat functionality) and values for pass/fail
  • Contacts and their notification schedules
  • Client configuration
  • Remote/local deployment

Configuring the client through the wizard is optional; you can skip it if you would rather do it yourself. When configuring the client, you have the ability to deploy the new PUSH check client locally or remotely over SSH! Once the client has been configured, the wizard provides the cron job or Windows Task Scheduler event information at the end so you can simply copy and paste it.

This tool will allow you to quickly and easily manage your PUSH checks so you can monitor your systems with PUSH checks in less time.

Give the wizard a try today!

We encourage pull requests for new features so if you make changes you think others would find useful, please do share.

If you aren’t using NodePing yet, you can sign up for a free, 15-day trial and test out our new PUSH checks yourself and give the new wizard a try.

The Best of Times, the Worst of Times

As 2016 is coming to a close, we decided to look back on the past 4 years of data to see if we could find any interesting patterns in the ‘down’ events our customers experience.

Downtimes caused by timeouts have fallen each year, from 85% of all ‘down’ events in 2012 to 67% this year, while website 500 errors are on the rise, up nearly 80% year over year. I suspect it’s due to the increased use of CDNs like CloudFlare that are now kicking back 503s rather than the timeouts the source servers are showing.

Looking at our numbers, if your servers go down, there’s a slightly higher chance it will happen around 3:20 UTC on a Wednesday. Alerts will be most quiet around 16:45 UTC on Sundays.

From our data, a sysadmin’s worst day is around Nov 10 and the best is easily January 1.

So brace yourself for this Wednesday and hang in there… New Year’s is right around the corner.