Finding the Best Server Providers
2023/08/01
Great services need great boxes to run on. How do we know if a server or VPS host is performant and reliable?
We use dozens of different hosts for NodePing and our standards for performance and reliability are really high. Many SaaS providers host only on AWS. Putting all your eggs in one basket is nice for billing, but it would make our service fragile and vendor-dependent. We spread our boxes around to make the service resilient and to better represent the Internet’s disparate architecture for monitoring.
We have to take new boxes out and put them through their paces; kick the tires and make sure they’re solid. This is how we test out a new provider before we use a dedicated server or VPS.
Blacklisted
As soon as we have our IP assignments from the provider, we check to make sure the IPs aren’t listed in any spam blacklists using NodePing RBL checks. Most of our hosts don’t send any actual email, but our public probes make a lot of SMTP connections to ensure our customers’ mail servers are functioning properly. If the IPs are blacklisted, we’ll either get a clean IP from the provider or cancel and look elsewhere.
We’ll leave this RBL check running once an hour to make sure the IP doesn’t get listed halfway through our testing period.
Blacklisted IPs can be a good indicator of provider quality even if the server won’t be sending any email. A provider that can’t keep spammers out of their service is unlikely to be able to keep a reliable network.
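For readers curious what an RBL lookup does under the hood, here is a minimal Python sketch. It is not how NodePing’s RBL checks are implemented; it only illustrates the common DNSBL convention of reversing the address and querying it under the list’s zone. The zone `dnsbl.example.org` and both function names are placeholders for illustration.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query for `ip` in the blacklist `zone`."""
    addr = ipaddress.ip_address(ip)
    # reverse_pointer gives "1.2.0.192.in-addr.arpa" for IPv4 and a
    # nibble-reversed name ending in "ip6.arpa" for IPv6. Drop the last
    # two labels and append the blacklist zone instead.
    reversed_part = addr.reverse_pointer.rsplit(".", 2)[0]
    return f"{reversed_part}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Treat any A record answer as "listed" (a simplification)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # The name did not resolve. This sketch does not distinguish
        # "not listed" from a resolver failure.
        return False


if __name__ == "__main__":
    # Placeholder zone; substitute the lists you actually care about.
    print(is_listed("192.0.2.1", "dnsbl.example.org"))
```

Some lists use specific return codes to signal errors or rate limiting, so a real check should read the list’s documentation rather than treat every answer as a listing.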
Incoming Traffic
Solid networks can be hard to find. We test for inbound packet-loss and routing issues using NodePing PING checks. We’ll sometimes test from a few different geographical regions to ensure global routing is stable. Anything less than 100% uptime for 30 days is unacceptable for us. If the provider announces planned maintenance well in advance, we use NodePing’s maintenance feature so the uptime stats remain accurate despite planned outages. In our decade-plus of experience, a network that sees even one episode of packet-loss or route failure is going to keep seeing them and isn’t stable enough for our use.
We’ll do the same for IPv6 addresses, as IPv6 routing and packet-loss can be independent of the IPv4 stack. Some providers have a hard time keeping their IPv6 blocks announced and we’ve seen IPv6 completely fail while IPv4 continued to function normally.
We enable automated diagnostics for all our PING checks so we can see where on the route the packet-loss or routing failure is happening. Getting immediate MTRs can show us the weak links in a network and if we see issues with some of the usual suspects, we will for sure dump the provider. Yes, I’m looking at you, Cogent!
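To make the 100%-for-30-days rule and the maintenance exclusion concrete, here is a small Python sketch of the arithmetic. The record format (a timestamp plus a pass/fail flag) and the function names are assumptions for illustration, not NodePing’s data model.

```python
from datetime import datetime, timedelta


def uptime_percent(results, maintenance=()):
    """Return uptime as a percentage, ignoring checks run during maintenance.

    results:     iterable of (timestamp, ok) pairs, one per check run
    maintenance: iterable of (start, end) windows; start inclusive, end exclusive
    Returns None when no check falls outside the maintenance windows.
    """
    counted = [
        ok
        for ts, ok in results
        if not any(start <= ts < end for start, end in maintenance)
    ]
    if not counted:
        return None
    return 100.0 * sum(counted) / len(counted)


def passes_vetting(results, maintenance=()):
    """A provider passes only if every counted check succeeded."""
    return uptime_percent(results, maintenance) == 100.0


if __name__ == "__main__":
    start = datetime(2023, 8, 1)
    # Four one-minute checks; the third one fails.
    runs = [(start + timedelta(minutes=i), i != 2) for i in range(4)]
    window = [(start + timedelta(minutes=2), start + timedelta(minutes=3))]
    print(uptime_percent(runs), uptime_percent(runs, window))
```

Without the maintenance window the single failure counts against the provider; with it, the failed check is excluded and the remaining runs still have to be perfect.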
Outbound Traffic
Sometimes a network issue seems to only impact outbound routing. We use the AGENT functionality to assign additional PING checks that originate from the server being tested and target some of the other servers it would connect to if it’s moved into production. The AGENT software runs NodePing checks just like the public probes do, but originating from our test host. It’s a great way to detect outbound packet-loss and routing issues from the server. Again, anything less than 100% uptime on this test and the service isn’t going to pass muster.
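If you want a rough manual version of an outbound test before setting up an AGENT, something like the following Python sketch can be run on the test host. It shells out to the system `ping` and reads the loss figure from the summary line. It assumes a Unix-style `ping` that accepts `-c` and prints an "N% packet loss" summary; the target address is a placeholder, and this is not how the AGENT software works internally.

```python
import re
import subprocess

# Matches the loss figure in summaries such as
# "10 packets transmitted, 9 received, 10% packet loss, time 9012ms".
_LOSS = re.compile(r"([\d.]+)% packet loss")


def parse_packet_loss(ping_output: str) -> float:
    """Extract the packet-loss percentage from ping's summary output."""
    match = _LOSS.search(ping_output)
    if match is None:
        raise ValueError("no packet loss summary found in ping output")
    return float(match.group(1))


def outbound_loss(host: str, count: int = 10) -> float:
    """Ping `host` from this machine and return the loss percentage."""
    proc = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True,
        text=True,
        check=False,  # ping exits non-zero on loss; we still want its output
    )
    return parse_packet_loss(proc.stdout)


if __name__ == "__main__":
    # Placeholder target; use the servers this box would talk to in production.
    print(outbound_loss("192.0.2.10"))
```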
System Load
The performance of a VPS can be greatly impacted by issues outside our control. Two of the most frequent system load issues we’ve seen on VPS are noisy neighbors and host server backups.
A good provider won’t oversell their VPS host servers and will suspend anyone who is using more than their fair share of resources. If we end up on a box with noisy neighbors, the system load on our VPS will likely spike, starving our processes of the CPU, memory, networking, or storage I/O they need to function properly.
We’ve also come across providers where we saw system load rise every Saturday around midnight (GMT) for 30 minutes or so. It turned out their backup process was overwhelming the disks and causing load issues on all the VPS on the host.
These types of issues are simple to find using PUSH checks that monitor the system load. Since we aren’t using these boxes for anything yet, we have to set the thresholds pretty low to detect load issues caused by resource starvation. This is the one test where we’ll give a provider a bit of slack if it fails, though. Noisy neighbors or hungry backups can happen to any provider and we’ll give them a chance to find and address the cause. If it keeps happening though, pull the plug on that provider. It’ll only get worse once you start using the machine, and getting their support to do anything about it will be an ongoing headache.
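As a rough illustration of a low load threshold on an idle box, here is a Python sketch that reads the load averages and flags anything above a small per-core limit. The 0.25 per-core default, the report fields, and the function name are all assumptions for illustration; the actual PUSH check payload and thresholds are defined by NodePing and are not reproduced here.

```python
import json
import os


def load_report(loadavg, cores, max_per_core=0.25):
    """Compare 1/5/15-minute load averages against a low per-core limit.

    An idle test box should sit near zero, so the limit is deliberately
    tight. Any of the three averages exceeding it marks the report as
    not ok.
    """
    one, five, fifteen = loadavg
    limit = max_per_core * cores
    return {
        "load1": one,
        "load5": five,
        "load15": fifteen,
        "limit": limit,
        "ok": max(one, five, fifteen) <= limit,
    }


if __name__ == "__main__":
    # os.getloadavg() is available on Unix-like systems only.
    report = load_report(os.getloadavg(), os.cpu_count() or 1)
    # Hypothetical report shape; adapt it to whatever your monitor expects.
    print(json.dumps(report))
```

Run from cron every minute or so, a report like this is enough to spot the Saturday-midnight style spikes described above.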
If a server can keep humming along for 30 days without any of the checks above failing, there’s a pretty good chance that provider and network are going to be solid and reliable. I hope this look into our vetting process will help you with your provider search for those elusive reliable networks and servers.
If you don’t yet use NodePing, please sign up for our free, 15-day trial and see for yourself how our monitoring can increase your uptime.