Finding the Best Server Providers

Great services need great boxes to run on. How do we know if a server or VPS host is performant and reliable?

We use dozens of different hosts for NodePing, and our standards for performance and reliability are really high. Many SaaS companies host only on AWS. Putting all your eggs in one basket is nice for billing, but it would make our service fragile and vendor-dependent. We spread our boxes around to make the service resilient and to better represent the Internet’s disparate architecture for monitoring.

We have to take new boxes out and put them through their paces; kick the tires and make sure they’re solid. This is how we test out a new provider before we use a dedicated server or VPS.

Blacklisted

As soon as we have our IP assignments from the provider, we check to make sure the IPs aren’t listed in any spam blacklists using NodePing RBL checks. Most of our hosts don’t send any actual email, but our public probes make a lot of SMTP connections to ensure our customers’ mail servers are functioning properly. If the IPs are blacklisted, we’ll either get a clean IP from the provider or cancel and look elsewhere.

We’ll leave this RBL check running once an hour to make sure the IP doesn’t get listed halfway through our testing period.
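For readers who want to see what a blacklist lookup involves, here is a minimal sketch of the standard DNSBL query convention: reverse the address labels, append the blocklist zone, and look the name up in DNS. This is not NodePing's implementation. The function names are ours, and `dnsbl.example.org` is a placeholder zone you would swap for a real list. The sketch also treats any resolution failure as "not listed", which is a simplification.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL lookup name: reversed address labels + blocklist zone."""
    reverse = ipaddress.ip_address(ip).reverse_pointer
    # reverse_pointer ends in ".in-addr.arpa" (IPv4) or ".ip6.arpa" (IPv6);
    # keep only the reversed address labels.
    for suffix in (".in-addr.arpa", ".ip6.arpa"):
        if reverse.endswith(suffix):
            reverse = reverse[: -len(suffix)]
    return f"{reverse}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the blocklist zone answers for this IP (needs network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # Assumption: a failed lookup means "not listed". A broken resolver
        # would look the same, so a real check should tell the two apart.
        return False


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address; the zone is a placeholder.
    print(dnsbl_query_name("192.0.2.10", "dnsbl.example.org"))
```

Running something like `is_listed()` from cron once an hour would roughly mirror the hourly check described above, though a hosted check also gives you notifications and history.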

Blacklisted IPs can be a good indicator of provider quality even if the server won’t be sending any email. A provider that can’t keep spammers out of their service is unlikely to be able to keep a reliable network.

Incoming Traffic

Solid networks can be hard to find. We test for inbound packet-loss and routing issues using NodePing PING checks. We’ll sometimes test from a few different geographical regions to ensure global routing is stable. Anything less than 100% uptime for 30 days is unacceptable for us. If the provider had announced planned maintenance well in advance, we’d use NodePing’s maintenance feature to ensure the uptime stats remained accurate despite planned outages. In our decade-plus experience, a network that sees even one episode of packet-loss or route failure is going to continue to see them and isn’t stable enough for our use.
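To make the "anything less than 100%" rule concrete, here is a rough sketch that pulls the loss percentage out of a `ping` summary line and applies a zero-tolerance pass/fail over a set of samples. It assumes the summary wording used by common Linux and macOS `ping` builds ("N% packet loss"). The function names are ours, not part of any NodePing tooling.

```python
import re

# Matches the summary line printed by common ping implementations,
# e.g. "10 packets transmitted, 9 received, 10% packet loss, time 9012ms".
_LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def packet_loss_percent(ping_output: str) -> float:
    """Extract the packet loss percentage from ping's summary output."""
    match = _LOSS_RE.search(ping_output)
    if match is None:
        raise ValueError("no packet loss summary found in ping output")
    return float(match.group(1))


def passes_trial(loss_samples) -> bool:
    """True only if there is at least one sample and every sample shows 0% loss."""
    samples = list(loss_samples)
    return bool(samples) and all(loss == 0.0 for loss in samples)


if __name__ == "__main__":
    sample = "10 packets transmitted, 9 received, 10% packet loss, time 9012ms"
    print(packet_loss_percent(sample))
```

A single lossy sample fails the whole trial here, which matches the strict standard described above. A more forgiving policy would need a different rule.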

We’ll do the same for IPv6 addresses, as routing and packet-loss can be independent of the IPv4 stack. Some providers have a hard time keeping their IPv6 blocks announced, and we’ve seen IPv6 completely fail while IPv4 continued to function normally.

We enable automated diagnostics for all our PING checks so we can see where on the route the packet-loss or routing failure is happening. Getting immediate MTRs can show us the weak links in a network, and if we see issues with some of the usual suspects, we will for sure dump the provider. Yes, I’m looking at you, Cogent!

Outbound Traffic

Sometimes a network issue seems to only impact outbound routing. We use the AGENT functionality to assign additional PING checks that originate from the server being tested and target some of the other servers it would connect to if it’s moved into production. The AGENT software will run NodePing checks just like the public probes but originating from our test host. It’s a great way to detect outbound packet-loss and routing issues from the server. Again, anything less than 100% uptime on this test and the service isn’t going to pass muster.

System Load

The performance of a VPS can be greatly impacted by issues outside our control. Two of the most frequent system load issues we’ve seen on VPS are noisy neighbors and host server backups.

A good provider won’t oversell their VPS host servers and will suspend anyone who is using more than their fair share of resources. If we end up on a box with noisy neighbors, the system load on our VPS will likely spike, starving our processes of the CPU, memory, networking, or storage I/O they need to function properly.

We’ve also come across providers where we saw system load rise every Saturday around midnight (GMT) for 30 mins or so. Turned out their backup process was overwhelming the disks and causing load issues on all the VPS on the host.

These types of issues are simple to find using PUSH checks that monitor the system load. Since we aren’t using these boxes for anything yet, we have to set the thresholds pretty low to detect load issues caused by resource starvation. This is the one test where we’ll give a provider a bit of slack if it fails, though. Noisy neighbors or hungry backups can happen to any provider, and we’ll give them a chance to find and address the cause. If it keeps happening, though, pull the plug on that provider. It’ll just be worse once you start using the machine, and it’ll be an ongoing headache trying to get their support to do anything about it.
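Here is a small sketch of the kind of load sampling such a check could be built on: read the 1-minute load average, normalize it by CPU count, and compare it to a deliberately low threshold for an idle box. The `0.2` threshold and the result field names are illustrative assumptions, not NodePing's PUSH payload format.

```python
import os


def load_per_cpu(load_1min: float, cpu_count: int) -> float:
    """Normalize the load average so boxes with different core counts compare fairly."""
    return load_1min / max(cpu_count, 1)


def load_is_suspicious(load_1min: float, cpu_count: int, threshold: float = 0.2) -> bool:
    """On an idle test box, even a modest per-CPU load hints at resource starvation."""
    return load_per_cpu(load_1min, cpu_count) > threshold


def sample() -> dict:
    """Take one reading from the local machine (os.getloadavg is Unix-only)."""
    load_1min, _, _ = os.getloadavg()
    cpus = os.cpu_count() or 1
    # Hypothetical result layout, for illustration only.
    return {
        "load1": load_1min,
        "cpus": cpus,
        "suspicious": load_is_suspicious(load_1min, cpus),
    }


if __name__ == "__main__":
    print(sample())
```

Once the box is doing real work, the threshold would need to go up, since normal production load would otherwise trip it constantly.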

If a server can keep humming along for 30 days without any of the checks above failing, there’s a pretty good chance that provider and network are going to be solid and reliable. I hope this look into our vetting process will help you with your provider search for those elusive reliable networks and servers.

If you don’t yet use NodePing, please sign up for our free, 15-day trial and see for yourself how our monitoring can increase your uptime.

MTR Check to Monitor Packet Loss

Packet loss and routing issues can impact any provider. Our newest check type, MTR, can help you detect and pinpoint the root of the problem. Faster detection and troubleshooting means less downtime for your websites and services.

The MTR command line tool has been around since 1997. Ask any graybeard sysadmin; they’re sure to be familiar with it. It’s great for revealing the presence of packet loss on a host and where along the route that packet loss starts.
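Reading "where the loss starts" out of an MTR report takes a little care, so here is a sketch of one common rule of thumb: loss at a middle hop that doesn't carry through to the destination is often just a router deprioritizing ICMP replies, so we look for the first hop of the run of lossy hops that reaches the final hop. The `host` and `Loss%` keys are modelled on the fields in mtr's JSON report, but treat that as an assumption and check the output of your mtr version.

```python
def loss_starts_at(hubs):
    """Return the host where end-to-end loss begins, or None if the destination is clean.

    `hubs` is an ordered list of hops, each a dict with "host" and "Loss%".
    """
    if not hubs or float(hubs[-1]["Loss%"]) == 0.0:
        # No loss at the destination: any mid-route loss did not persist.
        return None
    start = hubs[-1]
    # Walk backwards from the destination while hops keep showing loss.
    for hop in reversed(hubs):
        if float(hop["Loss%"]) > 0.0:
            start = hop
        else:
            break
    return start["host"]


if __name__ == "__main__":
    # Made-up hops using documentation addresses.
    route = [
        {"host": "192.0.2.1", "Loss%": 0.0},
        {"host": "198.51.100.7", "Loss%": 20.0},  # isolated loss, not persistent
        {"host": "198.51.100.9", "Loss%": 0.0},
        {"host": "203.0.113.5", "Loss%": 10.0},
        {"host": "203.0.113.80", "Loss%": 12.0},
    ]
    print(loss_starts_at(route))
```

In the made-up route, the second hop's 20% loss is ignored because the next hop is clean, and the function points at `203.0.113.5`, where loss begins and continues to the destination.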

Since routing is different for IPv6 than IPv4, you’ll want to create 2 MTR checks per host – one for your IPv4 address and another for the IPv6 address. You can either force IPv6 DNS resolution on your FQDN or use the IPv6 address itself as the check target.
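If you want to try the same pair of reports by hand, this sketch builds one `mtr` command per address family using the tool's `-4`/`-6`, `--report`, and `--report-cycles` options. The helper names and the `example.com` target are placeholders, and it assumes `mtr` is installed and permitted to send probes on your machine.

```python
import subprocess


def mtr_command(target: str, ipv6: bool = False, cycles: int = 10) -> list:
    """Build an mtr report command forced to one address family."""
    family = "-6" if ipv6 else "-4"
    return ["mtr", family, "--report", "--report-cycles", str(cycles), target]


def run_both_families(target: str, cycles: int = 10) -> dict:
    """Run one IPv4 and one IPv6 report and return their text output."""
    reports = {}
    for label, ipv6 in (("ipv4", False), ("ipv6", True)):
        result = subprocess.run(
            mtr_command(target, ipv6=ipv6, cycles=cycles),
            capture_output=True,
            text=True,
        )
        # Keep stderr when mtr fails (for example, no IPv6 route available).
        reports[label] = result.stdout if result.returncode == 0 else result.stderr
    return reports


if __name__ == "__main__":
    for family, report in run_both_families("example.com").items():
        print(f"--- {family} ---")
        print(report)
```

Forcing the family on the command line plays the same role as forcing IPv6 resolution on the FQDN: the same hostname gets tested over both stacks.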

MTR results from our probes are only half the story though. To get the full picture, you may need to run an MTR from your server. Use NodePing AGENT software to run MTR, PING, and nearly all our other check types from your server. It’s like having your own private NodePing probes. Results are quickly pushed to NodePing for processing and notifications.

No other network tool is more widely used among sysadmins for troubleshooting connectivity issues than MTR. Now you can automate it on both sides of the network using NodePing’s new MTR check.

If you don’t yet have a NodePing account, please sign up for our free, 15-day trial. Your graybeards will thank you.

Diagnostic Tools

“Why is my check failing?”

It isn’t always obvious what’s causing the failure when a check goes ‘down’, and additional information about what our probes are experiencing can be helpful. For example, if your website is timing out, is it the web server, a DNS problem, or maybe packet loss on the network?

Our new diagnostic tools allow you to run several utilities on our probes and give visibility to what our probes are seeing to help you troubleshoot a failing service. These tools can be useful to narrow down where the failure is so you can get things fixed and services restored as quickly as possible.

Tools available:

  • Ping
  • Traceroute
  • MTR
  • Dig
  • Page Load (browser loading with page speed – HAR viewer)
  • Screenshot

More information about the tools and some troubleshooting advice can be found in our documentation.

You can find these tools on the “Diagnostic Tools” tab when you log in to your NodePing account. If you don’t yet have a NodePing account, you can create one and try out these tools with our 15-day, free uptime monitoring trial.

What other tools would be helpful on that page? Let us know in the comments.