US SMS? Yeah, we got that…

You asked for it…

For our US customers, NodePing now offers unlimited SMS as part of the $10/month plan.  And you thought we couldn't pack any more value in there!

All you need to do is add your 10-digit phone number to your contact record and to your check notifications, and those important UP/DOWN messages will flow directly to your phone.

For customers outside the US, we’re working on direct SMS integration and hope to implement it in the near future.  Until then, please continue to use the email-to-SMS gateway addresses provided by your mobile carrier.  If your mobile operator isn’t listed at the link provided, please contact them directly and ask what your email-to-SMS address is.

Eight things you could do with monitoring checks on 1000 targets

With NodePing, you get checks on up to 1000 targets or services for one flat rate. NodePing's 1000-service limit is designed to take the lid off the kinds of limitations you might face with other service providers that charge more for adding checks or services. When adding checks doesn't cost you more, what could you do with them? Every once in a while one of our customers has that moment where they realize how much they can do with NodePing that they couldn't do when adding check targets raised the price. Here are eight things 1000 check targets allow you to do that you might not do on other services.

  1. Monitor all your web sites and all basic services. OK, this one isn't very creative, but it has to be said. If you are responsible for your business's web sites, then you need to know if they are down. A web site that is down is not generating revenue, or, if it is an internal site, it is not enabling your business to operate. If a site is worth having, then it is worth monitoring. This is the main reason people use monitoring in the first place. And it goes beyond web sites: if you are responsible for making sure a service is available to customers or employees, you should monitor it so that you know immediately if it is unavailable, before someone complains.
  2. Our tongue-in-cheek tag line is "All your nodes are pinged by us," but why not? With NodePing, now you can ping them all. If you don't need notifications on all of them, just turn those off on a host-by-host basis, but you'll still have availability and uptime stats on everything.
  3. Monitor that a web page is showing the right information. This is called a web content check. Some web applications and content systems don't return a proper 404 error, so to a normal HTTP check the page might appear to be up. An HTTP Content check makes sure the site is up by checking that it contains what you are expecting it to contain (see the sketch after this list). It is often good to set the content to be checked to something that appears in all your pages, such as your copyright statement. That way, if the text on the page changes during the normal course of business, your check will still pass.
  4. Monitor that the wrong text isn’t appearing on the page. Some web pages contain dynamic text. This is particularly the case for pages that show feeds, or your most recent news items. We’ve all gone to a site that should have a page with a list of articles or posts, but instead shows a database error or some kind of “No articles found” message. If that’s not what you want people to see, but you don’t know what text to check for because you don’t know what articles will appear, a check that makes sure the page does not contain specific text is the way to go.
  5. Along the same lines, since you have plenty of checks you might want more than one check on the same URL. If you need to watch for more than one error message, or check that multiple widgets or blocks on the page are populating correctly, why not check them all?
  6. Simple cron replacement. Web applications often have a process that needs to run every so often, maybe every hour or every minute, and these are often accessible by hitting a URL. This is commonly done with curl or wget in a cron job, but it is easier to set up a check to hit the URL at the right interval. We use this to keep CouchDB views fresh. Similarly, it can be used to replace Drupal's cron job requirements.
  7. Check APIs and other HTTP interfaces. These often don't get monitored, but they can be a key piece of your business. The HTTP Content check doesn't care what kind of body the response has, and it will happily check for your text in JSON or XML as well as in HTML. You can monitor that a CouchDB server is saying "Welcome," for example, or hit a URL that returns a reduced view and look for the value you expect in the results. The same idea applies to SOAP interfaces as well.
  8. Monitor other monitoring. Many systems have a status page that says how services on that host are doing. Frequently they’ll have an OK message, or an ERROR message will appear when things go wrong. HTTP Content checks can be used to watch these pages and send notifications if the wrong thing appears or does not appear on those pages. Both the “Contains” and “Does not contain” options for content checks are useful on this one.
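
To make the content checks concrete, here is a minimal sketch of the idea in Node.js. This is hypothetical illustration code, not NodePing's actual implementation: fetch the page, then pass or fail based on whether the body contains the text.

var http = require('http');

// Hypothetical sketch of an HTTP content check (not NodePing's actual code).
// shouldContain = true mimics "Contains"; false mimics "Does not contain".
function contentCheck(url, text, shouldContain, callback) {
  http.get(url, function (res) {
    var body = '';
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
      var found = body.indexOf(text) !== -1;
      callback(shouldContain ? found : !found);
    });
  }).on('error', function () { callback(false); });
}

// Pass while the copyright line is present ("Contains")...
contentCheck('http://www.example.com/', 'Copyright', true, console.log);
// ...and fail if an error message ever shows up ("Does not contain").
contentCheck('http://www.example.com/articles', '0 articles found', false, console.log);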

There are many more things you could do with 1000 checks that you might not even consider doing with other services. We plan to add more check types to increase the utility of the service even more. What else could you think of doing if you weren't limited by artificial constraints imposed by services that charge by the target service or URL?

Multi-site checks added to NodePing

This week we’ve launched pinghosts in two additional locations with automatic rechecking across locations. So if the New Jersey pinghost reports a monitored site is down, we now automatically recheck it from Texas and California before we send notifications.

How many times we do this depends on your “Sensitivity” setting for the check. The default setting of High rechecks once from each of the other two locations. These rechecks take a few seconds each, and the notification will be sent off in around 30 seconds. A setting of “Very Low” rechecks the service ten times, and with the extra rechecking the notification gets out in around 2 minutes. A setting of “Very High” doesn’t do a recheck at all, so if any check shows the service is not responding as expected we send the notice immediately.
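
As a rough sketch of how that plays out (hypothetical names, not our production code), the sensitivity setting simply maps to a recheck count:

// Hypothetical sketch of the recheck flow; notifyContacts and
// recheckFromOtherLocations stand in for the real machinery.
var rechecksBySensitivity = {
  veryhigh: 0,  // notify immediately, no rechecks
  high: 2,      // default: one recheck from each of the other two locations
  verylow: 10   // recheck ten times; notification goes out in around 2 minutes
};

function onCheckFailed(check) {
  var rechecks = rechecksBySensitivity[check.sensitivity];
  if (rechecks === 0) return notifyContacts(check);
  recheckFromOtherLocations(check, rechecks, function (stillDown) {
    if (stillDown) notifyContacts(check);
  });
}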

Multi-site rechecking is one of the features we get asked about most often, and we're very happy to get it rolled out to our production service. We'd love to hear from you about additional features that you'd like to see in our service.

Monitoring Services Are Poised for a Shake Up

Server monitoring and website monitoring services cost too much and are overly complicated.

Over the past several months we have built and launched NodePing’s site and server monitoring service. Part of that process involved looking at the other companies in this market niche, and finding the opportunities for offering a service that fills a gap in what is currently being provided to customers. What we have found has confirmed our original reasons for starting NodePing.

There are a lot of companies offering site and server monitoring services. However, our experience as consumers of these services was that it was hard to find a provider that did what we needed at a reasonable price, and I think our experience is probably typical. Where’s the disconnect? We wanted a service that would allow us to watch twenty to thirty sites and services for a reasonable price. It is easy for a small to medium business to get to a couple of dozen services needing monitoring. Most companies have at least one or two web sites that need to be available all the time for their customers. Many also have two or three web sites used internally for collaboration and sharing or publishing information to employees (Intranets). Throw in a DNS server or two, a mail service, an accounting system, a key router or two, and you are quickly into double digits on the number of services that need to be checked.

IT departments used to run software like Nagios for this type of thing, and that is still a good option in many cases. Nagios provides a wider set of checks than a typical SaaS monitoring service, there are lots of specialized plugins available, and it is not all that difficult to write custom plugins. If you need specialized checks, a system like Nagios is probably the best bet. On the other hand, while Nagios is free software, running it is not free. It requires a server to run on. Typically you want monitoring to run on separate infrastructure from your normal servers, which often means leasing a server or using a VPS service. Doing this inexpensively typically runs $50-100 a month, and it takes a non-trivial amount of technical expertise and work to set up, tune, and maintain the system. That's not a huge amount of money, but it is not free.

External providers offer similar services. The majority of companies need HTTP, SMTP, and PING checks. These are the primary checks provided by the bulk of the monitoring-as-a-service industry. These types of services don't cost much to run. With today's opportunities to build and deploy cloud-based services in cost-effective ways that scale well, the cost of these types of services should be fairly low. That's not currently the case.

A quick search turns up a lot of companies offering these services. Many of them offer "free" or inexpensive services. "Free" monitoring is typically provided for one to five URLs, often with fifteen- to thirty-minute intervals. That is basically useless. If it is OK for a service to be down for 30 minutes without getting a notification, you probably don't need monitoring. In my opinion, a price "plan" isn't a serious offer unless it includes check intervals of five minutes or less at that price. Getting beyond that unhelpful "Free" level, many providers start charging by the URL or address you want to monitor. One company prominently advertises checks starting at $1, but again that's one URL at thirty-minute intervals, and it costs $11 for that URL check if you want to check it every minute. Paying per check or per URL quickly gets expensive. It is not uncommon to find special price calculators on the sites of this kind of provider, which is itself a hint that the pricing is too complicated. At these prices, a fairly typical small to mid-sized company could easily find itself spending hundreds of dollars a month on monitoring.

There are more competitive options out there. These companies typically charge $40-$60 a month for a reasonable number of addresses and services. These prices probably save you money compared to running monitoring yourself using something like Nagios. Plus, you don't need to deal with setting up and maintaining the software. That's a pretty good deal.

However, it still doesn’t need to cost that much. With modern hosting and technology, the cost per check and even per customer to run these types of services is very low. In fact, just about the only cost of running a service like this that is attributable to an individual account is the credit card processing. All the rest of the costs scale, and are spread in ways that actually decrease per account as you scale up. Unless they are just running very inefficient systems, the total overhead for the companies charging $40-$60 per month (not to mention the ones costing hundreds) should be less than $4 per customer. Of course, the companies advertising “Free” services are also spending dollars a click to get those accounts, and that easily becomes the biggest expense. Meanwhile, allowing their customers to add additional checks or URLs to an account costs the provider pennies. Pricing based on adding checks or URLs is a model completely detached from the economics of running the service.

Experience running IT departments and talking to system administrators tells us that there are a lot of services that best practices say should be checked but that aren't getting checked. Many companies that use external providers check their company's primary site, but when adding checks means adding overhead costs (or just work load), secondary and internal sites don't get checked. This means that there are millions of services that should be monitored but aren't getting monitored at all. Companies are just reacting to complaints when something goes down.

To us, this smelled like opportunity. It is not simple to set up a solid monitoring service. However, once the technology, infrastructure, and processes are in place, it is a service that scales. The margin stays fairly stable even if you let customers use it as much as they need. This calls for a flat rate model.

Our biggest problem is that we have entered a market that is saturated by misinformation. Buyers assume that this type of service costs at least $40 for a reasonable level of monitoring, and often lots more. They expect to see low entry prices that don't really meet anybody's needs, followed by much higher prices for the real service. This becomes a marketing challenge. To someone shopping for these services, NodePing's price of a flat $10 for monitoring sounds like one of the entry-point bait ads. We say "$10 to monitor up to 1000 services at 1 minute intervals" and people ask "Yes, but what do we really get, and what does it cost if we actually need to do real world monitoring?"

NodePing's services really cost $10 a month. Period. There are no add-ons, no "X is available at additional cost". We set 1000 services as the maximum because we don't want to monitor IBM's network (no offense to IBM intended). Our target is small to medium sized businesses, and we want them to monitor everything they want to monitor for one reasonable price. If this model works, maybe others will also move to flat rates. That's great. We'd be happy to help make the monitoring world make more sense and be more cost effective for businesses. We think we have a solid technology stack and a great service, and we can do quite well even if other providers compete with us directly on price. For now, there are few if any major providers that really deliver the services our customers need anywhere close to our price.

Monitoring services cost too much and are too complicated. We think this market is set for a change, similar to how the cloud has impacted other technology services. This shift will be a significant benefit to small and medium sized companies that need these services, and it is a fantastic opportunity for providers poised to deliver what customers need at truly competitive scale and rates. NodePing has positioned itself to provide the services businesses need at a fantastic, flat-rate price.

Website monitoring with a backflip

A standard website monitoring check will fail when the page isn’t returned at all or the web server reports a page missing. What happens when your site is running but there is a problem with dynamic content, like a feed is missing or a list of recent posts is empty?

In those cases the page might be "working" from the web server's point of view (and so not reported as a 404 or 500 error), but not displaying what you want. You don't want your visitors to see messages like "Error establishing a database connection" or "0 articles found".

The NodePing HTTP Content check tests whether particular text shows up on a given webpage.  Use the 'Contains' setting to be alerted when specific text does NOT appear on a page.  In this case, though, we want to receive an alert when our error messages DO appear on the page.  Use the 'Does not contain' setting with the error message text as the search term to be notified when that happens.

For example, suppose you have a dynamic article list, so you never know exactly what is going to show up there, but you know something is wrong if the text '0 articles found' appears.  Maybe the database is offline or you haven't written anything recently enough.  Either way, you'll want to receive an alert.

Simply configure an HTTP Content check for the page, switch the text setting to 'Does not contain', and add '0 articles found' to the text area.  The check will pass as long as the webpage does NOT contain the words '0 articles found'.  If that text ever shows up, the check will fail and you'll receive an alert, as expected.

There are a thousand other uses for the HTTP Content check.  Get creative and make sure you're alerted when errors happen.

5 Basic Questions About Web Site Monitoring

The other day I was looking for a place to get some good Mexican food. That’s fairly easy in my part of the world, but I was looking for somewhere I hadn’t eaten before. I found a place the same way I always do, on the web. I typed my search into a search engine, pulled up a map of my area, and started clicking on web sites. I looked through the menu for each place I found and picked based on my impression of the restaurant from the web site.

I do this same kind of thing for all kinds of businesses at least several times a week. Plumbing parts, accountants, property management companies, mechanics, toys, web design, banks… pretty much everything. If I am looking for a business, I find them on the web. If it’s not on the web, I will probably not find it.

Increasingly, business relies on the Internet, even for non-Internet businesses. If your site is down, you lose business. If your email doesn't go through, you lose business. If your business is technology or web related, this is doubly crucial. People's impressions depend largely on how you come across on the Internet, and if your presence doesn't work, you are in trouble.

That seems obvious, but how do you ensure that everything works all the time? If you are a large enough business to have a highly skilled IT department, they are using monitoring tools or services. If you are not a large enough business to keep IT staff on the payroll, this largely falls to you. How do you make sure your Internet presence is a plus for your business, bringing in new customers instead of driving them away? How do you do that without spending too much money and too much time?

Monitoring services exist specifically for this purpose. Technology professionals use monitoring services to make sure that the services they are responsible for are always available. This has been the normal way to do business for tech professionals for many years. There are sophisticated tools to help with this job that can monitor all kinds of things and notify someone immediately if there are problems. Until recently, doing this inexpensively without spending a lot of time was out of reach for most people. Not any more.

A few years ago monitoring as a service started to pop up on the Internet. Now there are a number of companies out there providing these types of services. Some of them are easy to use, some are not. Many of them cost a lot, but a few do not. Some of them sell snake oil and fancy gadgets that don’t really tell you what you need to know. Increasingly, smart IT technical people are realizing that they can save time and money by using outside services to do things they had to do before themselves, and these same services are available to everybody without requiring a significant investment or a lot of knowledge.

If you are not a tech professional, and you are thinking about finding a way to make sure that your Internet presence is always there when your customers are looking for you, you might be asking questions like these:

  1. Is it easy? There is absolutely no reason monitoring should be hard to set up or use. If it is hard or takes you more than a few minutes to get going, choose a different service provider. You don't need to know a lot to use a good monitoring service. In most cases all you need is the address of your web site or email service.
  2. How does it work? Monitoring services mostly all work the same way. You log in to the web site and create checks for the monitoring service. Setting up a check generally consists of typing in the addresses of your web sites and how often you want them checked. Sometimes there are a few other simple questions, but it doesn't need to be more complicated than that. You also enter the email addresses or phone numbers to notify when your site or service is down. Typically the service takes it from there and starts monitoring right away. Monitoring services just connect to your site and log what happened. It's all automated.
  3. How do I know what I need? Just about any monitoring service will do the checks that most businesses need. If you have specific needs in your monitoring, this might be something to shop around for. However, most businesses need HTTP checks, which is the basic check that makes sure a web site is up, and SMTP, which checks email services. Just about all monitoring companies do HTTP checks, and most of them do SMTP.
  4. How often should it check? This is up to you, but if you are using a service that checks every 10 or 15 minutes, your site could be down for several minutes before you know about it. The better services check as frequently as every minute. This is not a lot more expensive for the service to provide, and it should not cost you a lot more either.
  5. How much will it cost? This is currently the biggest differentiator in the monitoring business. Some monitoring services cost a lot, especially if you have more than a couple of web sites to watch. It doesn’t have to be expensive. NodePing costs a flat rate of $10 per month to check up to a thousand sites or services every minute. If you’re paying more than that, you’re paying too much. If a service needs a special calculator or a talk with a sales person to tell you how much it will cost, it’s too much. If prices are per check, read the fine print. It should be inexpensive, and it should be simple.

If you are not doing website and email monitoring yet, you should start today. We think that NodePing is a great choice, but there are other good providers out there. Shop around. It is important to your success, it is easy, and it is inexpensive. You just have to do it.

Why we chose Node.js for server monitoring

NodePing's server monitoring service was built, from the front-end webapp to the backend SMTP requests, in 100% Node.js.  For those who may not be familiar with it, Node.js is server-side JavaScript using Google's famed V8 engine.  It's that engine that makes your Chrome browser so fast at JavaScript processing and NodePing so efficient at service monitoring.

Arguably, Node.js' most interesting feature is the performance of its evented, asynchronous, non-blocking IO. In JavaScript fashion, the vast majority of IO functions use callbacks to handle the results. This allows the logic of NodePing to branch out in several directions without IO processes blocking one another. This applies when talking to databases, reading and writing files, and talking to other machines via network protocols.

Asynchronous, non-blocking network chatter sounds like something a server monitoring service could use. So instead of running 1500 checks in series, one after another, each taking maybe hundreds of milliseconds to complete, we're able to start hundreds of checks, one after another, without having to wait for the return results. For example, we may start an HTTPS request, move on to start 3 PINGs, 5 SMTP checks, and hundreds of other checks before the first HTTPS response has returned with the status code and a block of data from the webpage we requested. At that point Node.js processes the return information using a callback that we fed into the function when we started the request. That's the magic of Node.js.
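
Here is a stripped-down sketch of that pattern (illustration only, not our production code): start every check immediately and let callbacks handle the results as they trickle in.

var https = require('https');

// Hypothetical example targets; a real run would pull these from the check list.
var targets = ['https://example.com/', 'https://example.org/', 'https://example.net/'];

targets.forEach(function (url) {
  var started = Date.now();
  https.get(url, function (res) {
    // Runs whenever this response comes back, long after we have
    // moved on to start the other checks.
    console.log(url, res.statusCode, (Date.now() - started) + 'ms');
  }).on('error', function (err) {
    console.log(url, 'FAIL', err.message);
  });
});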

One limitation of Node.js is that all that branching within a single process is bound to a single CPU.  A single Node.js script is unable to leverage the hardware of today's multi-core, multi-CPU servers.  But we're able to use Node.js' "spawn" command to create multiple instances of our service checking processes, one for each CPU on the server, and then balance our check load across the multiple running processes to make full use of the hardware.
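
A minimal sketch of that approach (the checker.js file name and arguments are hypothetical, and the real system also balances the check load across the workers):

var spawn = require('child_process').spawn;
var numCpus = require('os').cpus().length;

// Start one checker process per CPU core.
for (var i = 0; i < numCpus; i++) {
  var child = spawn('node', ['checker.js', '--worker=' + i]);
  child.stdout.pipe(process.stdout); // surface each worker's output
}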

Having non-blocking network IO allows our check servers to run thousands more checks than our competitors' with fewer resources.  Fewer resources means fewer and cheaper servers, which means less overhead.  That's how we're able to charge only $10/month for 1 minute checks on 1000 target services.  You won't find a better deal anywhere – you can thank the folks over at the Node.js community for that.

I'm sure some will be quick to point out that there are other languages that can do the same thing, some of them probably better at one particular thing or another than Node.js, and I won't argue with most of them.  We think the way Node.js handles network IO makes it a great choice for a server monitoring service, and if you give NodePing's 15-day, risk-free trial a shot, we think you'll agree.

10 Common Server Monitoring Mistakes

Server monitoring is an essential part of any business environment that depends on services.  Even if you don't have your own servers and use cloud-based services, you'll want to know about downtime.  You don't want to find out your web site is down from customers, and you don't want your boss to be the one to point out the email server has wandered off into the weeds.  Done properly, server monitoring alerts those responsible for the services the minute they're unavailable, allowing them to respond quickly and get things back up and running.

David and I have been responsible for servers and server monitoring for years and have probably made nearly all the mistakes possible while trying to do it properly.  So listen to the war stories from a couple of guys with scars and learn from our mistakes.

Here are 10 common server monitoring mistakes we’ve made.

1. Not checking all my servers

Yeah, it seems like a no-brainer, but with so many irons in the fire, it's hard to remember to configure server monitoring for all of them.  Some of the more commonly forgotten servers are:

  • Secondary DNS and MX servers.  This ‘B’ squad of servers usually gets in the game when the primary servers are offline for maintenance or have failed.  If I don’t keep my eye on them too, they may not be working when I need them the most.
  • New servers.  Ah, the smell of fresh pizza boxes from Dell!  After all the fun stuff (OS install, configuration, hardening, testing, etc.), the two most forgotten 'must-haves' on a new server are the asset tag (anybody still use those?) and setting up server monitoring.
  • Temporary/Permanent servers.  You know the ones I'm talking about.  The 'proof of concept' development box that was thrown together from retired hardware and has suddenly been dubbed 'production'.  It needs monitoring too.

2. Not checking all services on a host

We know many failures take the whole box down, but if I don't watch each service on a host, I could have a running website while FTP has flatlined.

The most common one I forget is to check both HTTP and HTTPS.  Sure, it's the same 'service', but the Apache configuration is separate, the firewall rules are likely separate, and of course HTTPS needs a valid SSL certificate.  I've gotten the embarrassing calls about the site being 'down' only to find out that the cert had expired.  Oh, yeah… I was supposed to renew that, wasn't I?

3. Not checking often enough

Users and bosses have very little tolerance for downtime.  That's a lesson I learned when trying to use a cheap monitoring service that only provided 10 minute check intervals.  That's up to 9.96 minutes of risk (pretty good math, huh?) that my server might be down before I'm alerted.  Configure 1 minute check intervals on all services.  Even if I don't need to respond to them right away (a development box that goes down in the middle of the night), I'll know when it went down to within 60 seconds, which could be helpful information when slogging through the logs for root cause analysis later.

4. Not checking HTTP content

A standard HTTP check is good… but the default 'under construction' Apache server page has given me that happy 200 response code and a green 'PASS' in my monitoring service, just like my real site would.  Choose something in the footer of the page that doesn't change and do an HTTP content matching check on that.  Don't use the domain name though – that may show up in the 'default' page too and make the check less useful.

5. Not setting the correct timeout

Timeouts for a service are very subjective and should be configurable in your monitoring service.  Web guys tell me our public website should load in under 2 seconds or our visitors will go elsewhere. If my HTTP service check is taking 3.5 seconds, that should be considered a FAIL result and someone should be notified.  Likewise, if I had a 4 second 'helo' delay configured in my sendmail, I'd want to move that timeout above that.

Timeouts set too high let performance issues go unnoticed; timeouts set too low just increase my notification noise. It takes time to tweak these on a per-service level.
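
As a toy illustration (hypothetical code, not how any particular service implements it), a timeout just turns a slow response into a FAIL:

var http = require('http');

// Hypothetical: fail the check if the full response takes longer than timeoutMs.
function timedCheck(url, timeoutMs, callback) {
  var started = Date.now();
  var done = false;
  function finish(pass) {
    if (done) return;               // report each check exactly once
    done = true;
    callback(pass, Date.now() - started);
  }
  var timer = setTimeout(function () { finish(false); }, timeoutMs);
  var req = http.get(url, function (res) {
    res.on('data', function () {}); // drain the body
    res.on('end', function () { clearTimeout(timer); finish(true); });
  });
  req.on('error', function () { clearTimeout(timer); finish(false); });
}

timedCheck('http://www.example.com/', 2000, function (pass, ms) {
  console.log(pass ? 'PASS' : 'FAIL', ms + 'ms');
});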

6. Not realizing external and internal monitoring are different

When having an external monitoring service watch servers behind my firewalls, I may need to punch some holes in said firewall for that monitoring to work properly.  This can be a real challenge sometimes, as many monitoring services use multiple locations and then dynamically pick one to monitor my servers, making it hard to maintain a whitelist of their IPs or hostnames to let into my network.

Another gotcha I’ve run into is resolution of internal and external DNS views.  If these aren’t configured properly, you’ll most likely get lots of ‘down’ notifications for hosts that are simply unreachable.

7. Sensitivity too low/high

Some servers or services seem more prone to little hiccups that don't take the server down but may intermittently cause checks to fail due to traffic or routing or maybe the phase of the moon. Nothing's more annoying than a 3AM 'down' SMS for a host that really isn't down.  Some folks call this a false positive or flapping; I call it a nuisance.  Of course I should jump every time a single ping loses its way around the interwebs and every SMTP helo goes unanswered – but reality sets in, and a more dangerous condition might occur: I may be tempted to start ignoring notifications because of all the false positives.

A good monitoring service handles this nicely by allowing me to adjust the sensitivity of each check.  Set this too low and my notifications for legitimate down events take too long to reach me, but set it too high and I'm swamped with useless false positive notifications.  Again, this is something that should be configured per service and will take time to tweak.

8. Notifying the wrong person

Nothing ruins a vacation like a 'host down' notification.  Sure, I've got backup sysadmins covering it, but I forgot to change the service so notifications get delivered to them and not me.

Another thing I've forgotten to take into consideration is notification time windows.  John's always the first in the office at 6AM, so he should get the alerts until Billy shows up at 9AM, because we all know Billy is useless until he's had that first hit of coffee.

9. Not choosing the correct notification type

Quick on the heels of #8 is knowing which type of notification to send. Yeah, I've made the mistake of configuring it to send email alerts when the email server is down.  Critical server notifications should almost always be sent via SMS.

10. Not whitelisting the notification system’s email address

Quick on the heels of #9 (we've got lots of heels around here) is recognizing that if I don't whitelist the monitoring service's email address, it may end up in the bit bucket.  Mental note – dang, all out of mental note paper.

Bonus!

11. Paying too much

I’ve paid hundreds of dollars a month for a mediocre monitoring service for a couple dozen servers before.  That’s just stupid.  NodePing costs $10 a month for 1000 servers/services at 1 minute intervals and we’re not the only cost effective monitoring service out there.  Be sure to shop around to find one that fits your needs well.  Know that most services are charging way too much though.

They say a wise man learns from his mistakes but a wiser man learns from the mistakes of the wise man.  Nuff said, true believer.

Using iptables to balance Node.js processes

One of the challenges in building a web application on any platform is making sure it can handle enough visitors. That’s a fairly well known and understood challenge for apps hosted on Apache, which is where most of our experience has been. It’s a new ballgame when we’re talking about Node.js based apps.

The NodePing web application is all Node.js on the server, with lots of jQuery-driven Ajax on the client. We actually have two different request patterns within the web app. The application piece is used by customers managing their monitoring checks and accounts; that part is a single page app (SPA). The "static" content pages are simpler. From a request and response point of view they are more traditional web pages, in part so that they'll be easily crawlable. The components that actually run checks and process results for the NodePing service are a whole different thing, which we'll write a post about later. This post is just about the web application.

Early on we started looking for information about how fast we should be able to go with Node.js. Good information was hard to find. Most of the benchmark-type articles on the net are very simple test cases. They show Node.js doing very well on requests per second, but these typically are just responding with an almost empty response. Of course, they were comparing it to other servers handling similarly simple requests, so those results are fine for what they are trying to do, but not really applicable. What happens when you start throwing in the real-world queries and processing that we'll see in our web application?

The real answer is we don’t know, and even if the published benchmarks included more data with more processing to handle the requests we still wouldn’t know because they aren’t running our code on our setup. We need to get Slashdotted to find out for sure, and/or get large numbers of customers so we have thousands of real requests to the single page web app. Both of those would be interesting days. We have run a bunch of tests with ab and siege. I’m not going to report numbers, because they won’t be much more useful than the benchmarks I found. The fact is you have to build something and see how it works in your particular case. Feel free to help us get lots of customers, and we’ll report more on how we were able to handle the load in Node.js. It has to be real customers. We already know how we do under simulated load.

What we found in our early testing is that we were running out of performance in our app well before we wanted to. On most servers with at least moderate amounts of memory, this was a matter of not enough processing power. We'd hit the host with an ab or siege test, and the processor would peg almost immediately.

After looking at various options (mostly making sure we weren’t wasting processing in the code), we concluded we just needed to throw more processing at the app. Node.js typically runs in a single process. We needed to be able to utilize more cores. With Node.js, the most obvious way to do that is to start multiple processes and then balance the requests between the processes.

In our case we’re dealing with a web application, and the logic it needs to run isn’t very intensive. It is mostly serving up data. Each individual request doesn’t need access to more processing. So we don’t need interprocess communications, we just need to be able to run more of them. Also, most page content is in Redis or databases, which are shared, so we don’t even care if requests within a session hit the same process.

The first thing we looked at was various proxy front ends. There are several that might work. One of my favorites is node-http-proxy from nodejitsu.

In the end, we decided the simplest and fastest approach was using iptables to split the requests between multiple processes. We were already forwarding traffic to the server’s port so that we could run the service on a high port and easily run it as a non-root user. I needed to get this going quickly, so I just copied my main index.js file (which basically just starts the server and loads modules) to several files, with the port for each process in each file. It would have been fairly trivial to do this dynamically, or accept it as a command line parameter, but this was quick and it would work well for scripted starts. I ended up with files called something like index8001.js, index8002.js, and so on, with one file for each process I wanted to run.
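
For illustration, the command-line-parameter version would look something like this (a sketch, not our actual index.js):

var http = require('http');

// Hypothetical sketch: take the port as an argument, e.g. `node index.js 8001`,
// instead of keeping one copy of the file per port.
var port = parseInt(process.argv[2], 10) || 8001;

http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('served from the process on port ' + port + '\n');
}).listen(port);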

All that's left is the iptables bit. We are going to redirect the traffic using the iptables statistic module. This could be done randomly, which should end up spreading requests fairly evenly, or it could be done using the nth mode, which matches every nth packet that reaches the rule. I opted for the nth mode approach.

Figuring out how to do this was a little tricky, because the statistic module has evolved and there is a lot of old information out there, plus some that is just wrong.

The rules we needed, which live in the nat table's PREROUTING chain, looked something like this:
-A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 8001 -m statistic --mode nth --every 3
-A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 8002 -m statistic --mode nth --every 2
-A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 8003
-A PREROUTING -p tcp --dport 443 -j REDIRECT --to-ports 8101 -m statistic --mode nth --every 3
-A PREROUTING -p tcp --dport 443 -j REDIRECT --to-ports 8102 -m statistic --mode nth --every 2
-A PREROUTING -p tcp --dport 443 -j REDIRECT --to-ports 8103

Most of this is just normal forwarding. In this example we are forwarding traffic to port 80 to ports 8001, 8002, and 8003. Traffic to port 443 we forward to ports 8101, 8102, and 8103. We want a third of the traffic to each destination port to go to each process. Our actual number of ports will vary based on how many processes we want to use on a specific host, which depends in part on how many cores we want to use.

These are terminating targets. That is, once we've forwarded a given request, we're done and ready for the next one. We never forward to both port 8001 and port 8002. So in the first rule we want every third request. The other two requests pass to the next line. There, we want to forward every other request, since 1/3 are already being handled by the previous line. The third line for this destination port doesn't need the statistic module at all, since it only ever sees every third request and should always forward all the traffic for port 80 that reaches it.

Some examples on the Internet show similar setups with '--every 3' on each of the three lines for port 80. That dumps a few of the requests out on the floor of the data center. The first line picks off one third of the requests, leaving 2/3 of the requests to pass to the next line. If that line is also set to every 3, it picks off a third of those 2/3. The last line would then pick off a third of what's left from that. In total, only 1/3 + 2/9 + 4/27 = 19/27 of the requests get forwarded, leaving something like 30% of our requests unhandled. That is bad.

This is not a high-availability solution; it is just to spread load between processes. We're using Forever to run the individual processes. That works fairly well, and we don't really need to be concerned about failover between processes on the single server. Load balancing between servers needs to preserve sessions and is a different scenario from what I've described here. Watching the traffic come into this setup, we see the requests spread across all of the processes, effectively using all of our cores. With two processes we handle approximately double the number of requests per second; four processes handle roughly four times as many. That is good.

Dear PHP, I’m leaving and yes, she’s sexier

Dear PHP,

I know this letter won't come as much of a surprise to you.  We've been growing apart for a while now, but today we officially part ways.

It wasn't easy to write this.  You and I have a lot of history.  Hard to believe it was over 10 years ago when you welcomed me into your arms.  You were young, sexy, and a breath of fresh air compared to my ex, Perl (shudder – let's not go there).  It didn't take long for our relationship to start paying the bills.  In fact, every job I've had in the last decade had you on a pedestal.

We had plenty of good times.  Remember how we survived a front-page CNN link and pulled in $500k in 14 days?  And all the dynamic PDF creation over the years still brings a smile to my face.

But we had our rough times too.  I could go the rest of my life without ever hearing the words 'register_globals' again.  And you know quite well that I still bear the scars from creating SOAP clients with you.  While neither of us has been truly faithful (whatever happened to V6 and UTF-8?), we've always been able to work out our differences up to now.

But starting tomorrow, for the first time in 10 years, my 'day job' won't include you.  I'm leaving you for node.js.  We were introduced by our mutual friend, jQuery.  At first I thought she was just the latest flavor of the month, popular among the guys on the mailing lists, but now I've fallen victim to her async charm and really think we have a future together.

When you and I started having fun on the couch a year ago, I thought maybe our relationship would just keep going.  But then node and I spent some time on that couch and – oh, what she can do with JSON makes my toes curl.  To be perfectly honest, you just can’t compete with her.  She’s all that you used to be – young, sexy, and fast.  I’m sure some of your other boyfriends will probably argue with me about it but I’ve been smitten.  While they’re fighting with you over namespacing, she and I will be branching in non-blocking ways and spawning like crazy.

I’m not saying we’ll never see each other again – heck, you’re even serving this blog.  But I’m moving on and I hope you will too.  If you want to see what node.js and I are up to, stop by some time.  Maybe we can even help you keep an eye on all those fatal exceptions of yours.

Sincerely,
Shawn