5 Basic Questions About Web Site Monitoring

The other day I was looking for a place to get some good Mexican food. That’s fairly easy in my part of the world, but I was looking for somewhere I hadn’t eaten before. I found a place the same way I always do, on the web. I typed my search into a search engine, pulled up a map of my area, and started clicking on web sites. I looked through the menu for each place I found and picked based on my impression of the restaurant from the web site.

I do this same kind of thing for all kinds of businesses at least several times a week. Plumbing parts, accountants, property management companies, mechanics, toys, web design, banks… pretty much everything. If I am looking for a business, I find it on the web. If it’s not on the web, I probably won’t find it.

Increasingly, business relies on the Internet, even for non-Internet businesses. If your site is down, you lose business. If your email doesn’t go through, you lose business. If your business is technology or web-related, this is doubly crucial. People’s impression of you depends largely on how you come across on the Internet, and if your site doesn’t work, you are in trouble.

That seems obvious, but how do you ensure that everything works all the time? If you are a large enough business to have a highly skilled IT department, they are using monitoring tools or services. If you are not a large enough business to keep IT staff on the payroll, this largely falls to you. How do you make sure your Internet presence is a plus for your business, bringing in new customers instead of driving them away? How do you do that without spending too much money and too much time?

Monitoring services exist specifically for this purpose. Technology professionals use monitoring services to make sure that the services they are responsible for are always available. This has been the normal way of doing business for tech professionals for many years. There are sophisticated tools to help with this job that can monitor all kinds of things and notify someone immediately if there are problems. Until recently, doing this inexpensively and without spending a lot of time was out of reach for most people. Not any more.

A few years ago, monitoring as a service started to pop up on the Internet. Now there are a number of companies out there providing these types of services. Some of them are easy to use, some are not. Many of them cost a lot, but a few do not. Some of them sell snake oil and fancy gadgets that don’t really tell you what you need to know. Increasingly, smart IT people are realizing that they can save time and money by using outside services for things they previously had to do themselves, and these same services are available to everybody, without requiring a significant investment or a lot of knowledge.

If you are not a tech professional, and you are thinking about finding a way to make sure that your Internet presence is always there when your customers are looking for you, you might be asking questions like these:

  1. Is it easy? There is absolutely no reason monitoring should be hard to set up or use. If it is hard or takes you more than a few minutes to get going, choose a different service provider. You don’t need to know a lot to use a good monitoring service. In most cases all you need is the address of your web site or email service.
  2. How does it work? Most monitoring services work about the same way. You log in to the web site and create checks for the monitoring service. Setting up a check generally consists of typing in the address of your web site and how often you want it checked. Sometimes there are a few other simple questions, but it doesn’t need to be more complicated than that. You also enter the email addresses or phone numbers to notify when your site or service is down. Typically the service takes it from there and starts monitoring right away. Monitoring services just connect to your site and log what happened; there is a simple sketch of what that looks like after this list. It’s all automated.
  3. How do I know what I need? Just about any monitoring service will do the checks that most businesses need. If you have specific monitoring needs, that might be something to shop around for. However, most businesses need HTTP checks, the basic check that makes sure a web site is up, and SMTP checks, which make sure email services are working. Just about all monitoring companies do HTTP checks, and most of them do SMTP.
  4. How often should it check? This is up to you, but if you are using a service that checks every 10 or 15 minutes, your site could be down for up to a quarter of an hour before you know about it. The better services check as frequently as every minute. This is not much more expensive for the service to provide, and it should not cost you a lot more either.
  5. How much will it cost? This is currently the biggest differentiator in the monitoring business. Some monitoring services cost a lot, especially if you have more than a couple of web sites to watch. It doesn’t have to be expensive. NodePing costs a flat rate of $10 per month to check up to a thousand sites or services every minute. If you’re paying more than that, you’re paying too much. If a service needs a special calculator or a talk with a sales person to tell you how much it will cost, it’s too much. If prices are per check, read the fine print. It should be inexpensive, and it should be simple.
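For the curious, here is a minimal sketch of what an HTTP check boils down to, written in Node.js. The URL is just an example, and a real monitoring service adds timeouts, retries, scheduling, and notifications on top of this:

// Fetch a URL and report whether the site looks up or down.
var http = require('http');

function checkSite(url, callback) {
  http.get(url, function (res) {
    res.resume(); // drain the response so the socket is freed
    // Treat any response below 400 (including redirects) as "up"
    callback(null, res.statusCode < 400);
  }).on('error', function (err) {
    // No response at all: the site is down
    callback(err, false);
  });
}

checkSite('http://www.example.com/', function (err, up) {
  console.log((up ? 'UP' : 'DOWN') + ' at ' + new Date().toUTCString());
});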

If you are not doing website and email monitoring yet, you should start today. We think NodePing is a great choice, but there are other good providers out there. Shop around. It is important to your success, it is easy, and it is inexpensive. You just have to do it.

Using iptables to balance Node.js processes

One of the challenges in building a web application on any platform is making sure it can handle enough visitors. That’s a fairly well known and understood challenge for apps hosted on Apache, which is where most of our experience has been. It’s a new ballgame when we’re talking about Node.js based apps.

The NodePing web application is all Node.js on the server, with lots of jQuery-driven Ajax on the client. We actually have two different request patterns within the web app. The application piece is used by customers managing their monitoring checks and accounts. That part is a single page app (SPA). The “static” content pages are simpler. From a request and response point of view they are more traditional web pages, in part so that they’ll be easily crawlable. The components that actually run checks and process results for the NodePing services are a whole different thing, which we’ll write a post about later. This post is just about the web application.

Early on we started looking for information about how fast we should be able to go with Node.js. Good information was hard to find. Most of the benchmark-type articles on the net are very simple test cases. They show Node.js doing very well on requests per second, but these typically are responding with an almost empty response. Of course, they were comparing it to other servers handling similarly simple requests, so those results are fine for what they are trying to do, but they are not really applicable to us. What happens when you throw in the real-world queries and processing that we’ll see in our web application?

The real answer is we don’t know, and even if the published benchmarks included more data and more processing to handle the requests, we still wouldn’t know, because they aren’t running our code on our setup. We need to get Slashdotted to find out for sure, and/or get large numbers of customers so we have thousands of real requests to the single page web app. Both of those would be interesting days. We have run a bunch of tests with ab and siege. I’m not going to report numbers, because they won’t be much more useful than the benchmarks I found. The fact is you have to build something and see how it works in your particular case. Feel free to help us get lots of customers, and we’ll report more on how we were able to handle the load in Node.js. It has to be real customers. We already know how we do under simulated load.

What we found in our early testing is that we were running out of performance in our app well before we wanted to. On most servers with at least moderate amounts of memory, this was a matter of not enough processing power. We’d hit the host with an ab or siege test, and the processor would peg almost immediately.

After looking at various options (mostly making sure we weren’t wasting processing in the code), we concluded we just needed to throw more processing at the app. Node.js typically runs in a single process. We needed to be able to utilize more cores. With Node.js, the most obvious way to do that is to start multiple processes and then balance the requests between the processes.

In our case we’re dealing with a web application, and the logic it needs to run isn’t very intensive. It is mostly serving up data. Each individual request doesn’t need access to more processing. So we don’t need interprocess communications, we just need to be able to run more of them. Also, most page content is in Redis or databases, which are shared, so we don’t even care if requests within a session hit the same process.

The first thing we looked at was various proxy front ends. There are several that might work. One of my favorites is node-http-proxy from nodejitsu.

In the end, we decided the simplest and fastest approach was using iptables to split the requests between multiple processes. We were already forwarding traffic to the server’s port so that we could run the service on a high port and easily run it as a non-root user. I needed to get this going quickly, so I just copied my main index.js file (which basically just starts the server and loads modules) to several files, with the port for each process hard coded in each file. It would have been fairly trivial to do this dynamically, or to accept the port as a command line parameter (there is a sketch of that variant below), but this was quick and it would work well for scripted starts. I ended up with files called something like index8001.js, index8002.js, and so on, with one file for each process I wanted to run.
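For illustration, the command line parameter version might look something like this. This is a stripped-down stand-in, not our actual index.js, which loads the real application modules:

// index.js: start the HTTP server on the port given as an argument,
// e.g. node index.js 8001
var http = require('http');

var port = parseInt(process.argv[2], 10);
if (!port) {
  console.error('Usage: node index.js <port>');
  process.exit(1);
}

http.createServer(function (req, res) {
  // The real app would dispatch to its routes here
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello from the process on port ' + port + '\n');
}).listen(port);

console.log('Listening on port ' + port);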

All that’s left is the iptables bit. We are going to redirect the traffic using the iptables statistic module. This could be done randomly, which should end up spreading requests fairly evenly. Or it could be done using the nth mode, which matches every nth request that reaches the rule. I opted for the nth mode approach.

Figuring out how to do this was a little tricky, because the statistic module has evolved and there is a lot of old information out there, plus some that is just wrong.

The rules we needed looked something like this:
-A PREROUTING -p tcp --dport 80 -m statistic --mode nth --every 3 -j REDIRECT --to-ports 8001
-A PREROUTING -p tcp --dport 80 -m statistic --mode nth --every 2 -j REDIRECT --to-ports 8002
-A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 8003
-A PREROUTING -p tcp --dport 443 -m statistic --mode nth --every 3 -j REDIRECT --to-ports 8101
-A PREROUTING -p tcp --dport 443 -m statistic --mode nth --every 2 -j REDIRECT --to-ports 8102
-A PREROUTING -p tcp --dport 443 -j REDIRECT --to-ports 8103

Most of this is just normal port forwarding. In this example we are forwarding traffic on port 80 to ports 8001, 8002, and 8003. Traffic on port 443 we forward to ports 8101, 8102, and 8103. We want a third of the traffic on each destination port to go to each process. The actual number of ports will vary based on how many processes we want to use on a specific host, which depends in part on how many cores we want to use.
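One detail worth calling out: REDIRECT is only valid in the nat table, so these are nat-table rules. Entered at a shell rather than in an iptables-save file, the port 80 rules would look something like this:

iptables -t nat -A PREROUTING -p tcp --dport 80 -m statistic --mode nth --every 3 -j REDIRECT --to-ports 8001
iptables -t nat -A PREROUTING -p tcp --dport 80 -m statistic --mode nth --every 2 -j REDIRECT --to-ports 8002
iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 8003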

These are terminating targets. That is, once we’ve redirected a given request we’re done and ready for the next one. We never redirect to both port 8001 and port 8002. So in the first rule we take every third request. The other two requests fall through to the next line. There, we want to redirect every other request, since 1/3 are already being handled by the previous line. The third line for this destination port doesn’t need the statistic module at all, since it only ever sees every third request and should always redirect all of the port 80 traffic that reaches it.

Some examples on the Internet list similar setups with ‘--every 3’ on each of the three lines for port 80. That dumps a chunk of the requests out on the floor of the data center. The first line picks off one third of the requests, leaving 2/3 to pass to the next line. If that line is also set to every 3, it picks off a third of those 2/3 (2/9 of the total). The last line then picks off a third of what’s left (4/27 of the total). That leaves 8/27, roughly 30%, of our requests unhandled. That is bad.
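The random mode needs the same kind of cascading arithmetic. A version of the port 80 rules using it might look something like this, with the first rule taking a third of the requests and the second taking half of the remaining two thirds:

-A PREROUTING -p tcp --dport 80 -m statistic --mode random --probability 0.33333 -j REDIRECT --to-ports 8001
-A PREROUTING -p tcp --dport 80 -m statistic --mode random --probability 0.5 -j REDIRECT --to-ports 8002
-A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 8003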

This is not a high availability solution; it just spreads load between processes. We’re using Forever to run the individual processes. That works fairly well, and we don’t really need to be concerned about failover between processes on a single server. Load balancing between servers needs to preserve sessions, and is a different scenario from what I’ve described here. Watching the traffic come into this setup, we see requests spread across all of the processes, effectively using all of our cores. With two processes we approximately double the number of requests we can handle per second. Four processes can handle roughly four times as many. That is good.
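For completeness, with one index file per port as described above, bringing the processes up with Forever looks something like this (Forever restarts a process if it dies, which is all we need on a single server):

forever start index8001.js
forever start index8002.js
forever start index8003.js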

Someone should create a service like that!

Some time ago Shawn and I were lamenting what a pain network and service monitoring can be. There are some very good open source applications out there for doing this sort of thing. We’ve both used several versions of Nagios, and it works really well. If you need to run your own monitoring or write your own custom plugins (which we’ve done in the past), that’s a good option. If what you want is to monitor a bunch of services easily, without having to put up and maintain another server just for that, a service that does it for you is more attractive.

There are a number of services out there that do pings, HTTP checks, and a variety of other checks with notification. Some of them even start out low cost or free, but if you have more than a handful of hosts and ports to monitor, they get pricey fast, or they don’t let you check very often, or they have some other catch that makes them just not do what you want. For some of them, you need a graduate degree from MIT to understand the pricing.

So we were trying to figure out how to get monitoring done reliably and cost effectively for a set of services we were responsible for at the time, and saying to each other “Someone should create a service that is easy, just does what you need it to do reliably, and doesn’t cost a lot.” Someone, as it turns out, was us.

More recently Shawn and I were once again chatting about the kinds of things geeks talk about, and one of those things was Node.js. I had been working on a few Node.js projects, just as proofs of concept. It was clear that Node.js has some real strengths for writing scalable asynchronous services. In the course of the conversation, it occurred to us that we could create a service that would scale to many thousands of checks with very low incremental cost. If someone wanted to check thirty or fifty hosts every minute, the cost would be very similar to checking three sites every fifteen minutes. NodePing was born.

The name NodePing stuck with us, not because the service uses Node.js (although it does, and we’re proud of that fact), but because “node” refers generically to something on the network. Of course, the service goes much wider than that, and the most common checks turn out not to be pings. We think the name NodePing conveys “checks on things on the ‘net” well, even beyond pinging nodes. Our goal is to let you check what you want, when you want, for not much money.

As we wrap up our initial testing (with thanks to our beta testers for some great feedback) and move towards taking on customers in real quantity, I recognize that getting here has been quite a process from that first conversation about how lousy the options for monitoring were. I wish this service had been available when I was responsible for a range of web and email services years ago. It would have made my life easier, and it would have been a great value at the price. We hope you see it that way too.

What do you want from a monitoring service? We are creating the service we wish we’d had. What would you add? Is there something about monitoring services that has frustrated you, that you just wish someone would fix already? Let us know!