Monitoring Memory Usage in Node Applications

I recently started a new node.js project that stored a good bit of data in memory, and talked to clients over web sockets as well as providing a REST interface. The combination of components meant there were several spots that I wanted to monitor.

For the WebSockets, I hooked up NodePing’s WebSocket check. In this case, I didn’t need it to pass any data, just connect and make sure the WebSocket interface was accessible. This was a quick win, and in seconds I was monitoring the WebSockets interface of my new app.

One of our concerns with this particular application was how much memory it would use over time, since it retains quite a bit of data in memory for as long as the server is running. I was curious about how much memory it would end up using in real use. I was also concerned about any memory leaks, both from the data caching and the use of buffers as the system interacted with WebSockets clients. This called for monitoring memory usage over time and getting notifications if it passed certain thresholds.

To accomplish this, I used NodePing’s HTTP Parse check. In my node app, I created a route in the REST interface that would return various statistics, including record counts in the cache and a few other things I was curious about. The key piece for monitoring purposes, though, was the output from the node.js process.memoryUsage() call. This gives me a nice JSON object with rss, heapTotal, and heapUsed numbers. This was in combination with some other stats I wanted to capture, and the output looked something like this:

{
  "server": {
    "osuptime": 22734099.6647312,
    "loadavg": [
      0.17041015625,
      0.09814453125,
      0.12451171875
    ],
    "freemem": 1883095040,
    "processmem": {
      "rss": 77959168,
      "heapTotal": 63371520,
      "heapUsed": 30490000
    }
  }
}
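
A stats route along these lines can be sketched with nothing but Node core modules. This is a minimal sketch, not the actual application code: the names buildStats and createStatsServer and the /stats path are assumptions made for illustration, and only the fields shown in the sample output are included.

```javascript
const http = require('http');
const os = require('os');

// Collect the same fields shown in the sample output.
function buildStats() {
  return {
    server: {
      osuptime: os.uptime(),            // seconds since the OS booted
      loadavg: os.loadavg(),            // 1, 5 and 15 minute load averages
      freemem: os.freemem(),            // free system memory in bytes
      processmem: process.memoryUsage() // rss, heapTotal, heapUsed, ...
    }
  };
}

// Hypothetical route: GET /stats returns the statistics as JSON.
function createStatsServer() {
  return http.createServer((req, res) => {
    if (req.method === 'GET' && req.url === '/stats') {
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(buildStats()));
    } else {
      res.writeHead(404);
      res.end();
    }
  });
}
```

Calling createStatsServer().listen(8001) would expose the route on port 8001 (the port is just an example). Newer Node versions return a few extra fields from process.memoryUsage(), such as external, alongside the three shown here.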

Next I added an HTTP Parse check in NodePing, and added the rss and heapUsed fields to the check. For the check setup, we use JSONPath syntax, so the full fields looked like this:
server.processmem.rss
server.processmem.heapUsed
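
To make it concrete which values those two fields pick out, here is a small sketch that resolves a dotted path against the JSON response. The getField helper is hypothetical and only illustrates the idea. It is not how NodePing evaluates fields, and it handles only plain dotted paths rather than full JSONPath.

```javascript
// Hypothetical helper: walk an object along a dotted path such as
// "server.processmem.rss". Returns undefined if any step is missing.
function getField(obj, path) {
  return path.split('.').reduce(
    (value, key) => (value != null ? value[key] : undefined),
    obj
  );
}

// Trimmed copy of the sample response shown earlier.
const sample = {
  server: { processmem: { rss: 77959168, heapTotal: 63371520, heapUsed: 30490000 } }
};

console.log(getField(sample, 'server.processmem.rss'));      // 77959168
console.log(getField(sample, 'server.processmem.heapUsed')); // 30490000
```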

At first I was a little startled by the results of this. The rss figure is consistently a good bit higher than the heapUsed number, and it climbs slowly over time. At first glance this looks like the system has a memory leak. However, it turns out this is normal for node.js applications. Node manages the memory used internally, and the rss figure shows what’s been allocated at some point and is still reserved, but not what’s actually in use. The heapUsed figure, on the other hand, does reflect Node’s periodic garbage collection.

I found the HTTP Parse check to be perfect for watching memory usage and checking for a memory leak in my Node application. The key was capturing the heapUsed as reported by Node. In my case I had the check grab this information once a minute, and I quickly had a handy chart showing the total memory usage of my application over time. As a result, trends become quickly apparent and I can see how my memory usage grows and shrinks as Node manages its memory usage.

Since I was hitting the REST interface of my application once a minute to collect the memory information, this had the side benefit of notifying me if the REST interface ever goes down. If I wanted to chart the REST interface’s response time, I’d add a separate HTTP check.

At NodePing, a lot of what we’ve built originated in our own experiences building and supporting Internet-based services. This is another example of how we have used our own monitoring systems internally to help us build and maintain our own systems. If you haven’t tried out NodePing’s monitoring, check out our 15-day free trial.

BigCouch 0.4 on Ubuntu 12.04

NodePing uses CouchDB and its big brother, BigCouch, extensively for our database needs. After getting some shiny new hardware from the great guys over at Codero, we wanted to install BigCouch on the fresh new Ubuntu 12.04 release. Alas, the fine folks at Cloudant haven’t yet updated their repository for Ubuntu 12.04, so we took a swing at building from source.

First, we had to install some dependencies:

apt-get install erlang libicu48 libicu-dev libcurl4-openssl-dev zip autoconf2.13 runit

This took a while since the official Ubuntu repos are still pretty slow from the new release last week.

Download and install SpiderMonkey (the JavaScript engine CouchDB uses). BigCouch requires version 1.9.2, so the older version in the Ubuntu repos won’t work.

wget http://ppa.launchpad.net/commonjs/ppa/ubuntu/pool/main/s/spidermonkey/spidermonkey_1.9.2.orig.tar.gz
tar zxvf spidermonkey_1.9.2.orig.tar.gz 
cd spidermonkey-1.9.2/src 
autoconf2.13 
./configure 
make 
make install

Grab the BigCouch code from git. At the time of writing, it’s 0.4.

git clone git://github.com/cloudant/bigcouch.git
cd bigcouch
./configure
make
make install

Now BigCouch is installed at /opt/bigcouch

We create a non-root user to run BigCouch under. We’ll use the same username that the official BigCouch binaries use so we can steal some start up scripts.

useradd bigcouch
chown bigcouch:bigcouch /opt/bigcouch -R

I stole the sv startup files/folders from an older Ubuntu BigCouch release, copied them to my home directory, and then dropped them in place.

cp -r /home/mysupersecrethomefoldername/bigcouch /etc/sv/
ln -s /etc/sv/bigcouch /etc/service/bigcouch

Edit your config files at /opt/bigcouch/etc/vm.args and /opt/bigcouch/etc/local.ini, following the instructions in the ‘Configure your nodes’ section here. Then you should be able to start up BigCouch.

sv start bigcouch

Big thanks to BigCouch and CouchDB developers for their great work. NodePing couldn’t do what it does without you doing what you do!

Server Monitoring from Europe

NodePing is happy to announce we’ve added a new region to our check locations. You can now choose to run your checks from our North American or new European regions… or both!


We’ve heard from many of our customers with a presence in Europe that a check location on that side of the pond would be a huge benefit. Thanks to our great providers, IntroVPS and RobHost, we added 4 new probe servers in the following locations:

  • London, England
  • Amsterdam, Netherlands
  • Falkenstein, Germany
  • Bucharest, Romania
Check out our FAQ for IP addresses and more information on our check locations.

We’ve also introduced the idea of a ‘default region’ for your NodePing account. New customers set their default region when they sign up. Current customers have their default region set to ‘North America’. The default region can be changed in the ‘Scheduling’ tab. When you create checks, they will automatically be run from your default region. You may, of course, decide to run any check from any region (including multiple regions and ‘worldwide’) by simply choosing the desired region from the check configuration when you create it. You can also change an existing check by selecting a different region in the check configuration at any time.

When you assign your checks to a region, they will be run from a random server in that region. If an ‘up/down’ event is detected, NodePing will immediately and automatically recheck from other servers in that same region. The number of rechecks is based on your configuration of the check – the default is 2 rechecks. After verifying the ‘up/down’ status on other servers in the same region, we’ll send out your configured email and SMS notifications.
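
The recheck flow described above could be sketched roughly like this. It is an illustration under stated assumptions, not NodePing's actual code: detectChange and runCheck are hypothetical names, runCheck(server) is assumed to resolve to 'up' or 'down', and the sketch assumes every recheck has to agree before notifications go out.

```javascript
// Hypothetical sketch of the recheck flow. `runCheck(server)` is an assumed
// function that resolves to 'up' or 'down' for the monitored target.
async function detectChange(lastState, servers, runCheck, rechecks = 2) {
  // Run the check from a random server in the region.
  const pool = servers.slice();
  const index = Math.floor(Math.random() * pool.length);
  const [first] = pool.splice(index, 1);
  const observed = await runCheck(first);
  if (observed === lastState) {
    return { notify: false, state: lastState };
  }

  // The state changed: recheck from other servers in the same region.
  // Assumption: every recheck must agree before notifications go out.
  for (const server of pool.slice(0, rechecks)) {
    if ((await runCheck(server)) !== observed) {
      return { notify: false, state: lastState };
    }
  }
  return { notify: true, state: observed };
}
```

With the default of 2 rechecks, a change is reported only after three servers in the region have seen the same result.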

Everything is included with NodePing so the new European region checking is already part of your flat-rate $10 a month subscription which includes your 1000 site/server checks, unlimited logins, contacts, contact groups, emails and international SMS. If you’re not already a customer, sign up for your free 15-day trial.

We’re already planning to add even more check regions to NodePing – Oceania maybe? South America? East Asia? Let us know in the comments what new region you’d like to see added next.

Why we chose Node.js for server monitoring

NodePing’s server monitoring service was built, from the front-end webapp to the backend SMTP requests, in 100% Node.js. For those who may not be familiar with it, Node.js is server-side JavaScript using Google’s famed V8 engine. It’s that engine that makes your Chrome browser so fast at JavaScript processing and NodePing so efficient at service monitoring.

Arguably, Node.js’ most interesting feature is the performance of its evented, asynchronous, non-blocking IO. In JavaScript fashion, the vast majority of IO functions use callbacks to handle the ‘results’. This allows the logic of NodePing to branch out in several directions without IO processes blocking others. This handling works when talking to databases, reading and writing files, and talking to other machines via network protocols.

Asynchronous, non-blocking, network chatter sounds like something a server monitoring service could use. So instead of running 1500 checks in series, one after another, each taking maybe hundreds of milliseconds to complete, we’re able to start hundreds of checks, one after another, without having to wait for the return results. For example, we may start an HTTPS request, move on to start 3 PINGs, 5 SMTP checks, and hundreds of other checks before the first HTTPS response has returned with the status code and a block of data from the webpage we requested. At that point Node.js processes the return information using a callback that we fed into the function when we started the request. That’s the magic of Node.js.
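
The pattern is easy to see in a toy example. In this sketch, setTimeout stands in for real network IO, and fakeCheck and runAll are made-up names for illustration. All three checks are started immediately, and the callbacks fire in the order the results come back, not the order the checks were started.

```javascript
// Simulated check: setTimeout stands in for real network IO.
function fakeCheck(name, ms, callback) {
  setTimeout(() => callback(null, name), ms);
}

// Start every check right away; collect names in the order they finish.
function runAll(checks, done) {
  const finished = [];
  let pending = checks.length;
  if (pending === 0) return done(finished);
  checks.forEach(({ name, ms }) => {
    fakeCheck(name, ms, (err, result) => {
      finished.push(result);
      if (--pending === 0) done(finished);
    });
  });
}

runAll(
  [{ name: 'https', ms: 300 }, { name: 'ping', ms: 20 }, { name: 'smtp', ms: 100 }],
  (order) => console.log(order) // [ 'ping', 'smtp', 'https' ]
);
```

The whole run takes about as long as the slowest check (300 ms here) rather than the sum of all three.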

One limitation of Node.js is that all of that branching happens in a single process, which is bound to a single CPU. A single Node.js script is unable to leverage the hardware of today’s multi-core, multi-CPU servers. But we’re able to use Node.js’ “spawn” command to create multiple instances of our service checking processes, one for each CPU on the server, and then balance our check load across the running processes to make full use of the hardware.
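
As a rough sketch of that idea, the following uses child_process.spawn and os.cpus() from Node core to start one process per CPU. The names startWorkers and workerFor, the checker.js script name in the comment, and the round-robin assignment are assumptions for illustration. The post does not say how NodePing actually hands checks to each process.

```javascript
const { spawn } = require('child_process');
const os = require('os');

// Start `count` Node processes, one per CPU by default. `nodeArgs` are the
// arguments passed to each node process, e.g. ['checker.js'] (hypothetical).
function startWorkers(nodeArgs, count = os.cpus().length) {
  const workers = [];
  for (let i = 0; i < count; i++) {
    // process.execPath is the node binary running this script.
    workers.push(spawn(process.execPath, nodeArgs, { stdio: 'inherit' }));
  }
  return workers;
}

// Simple round-robin: pick the worker that should take the nth check.
function workerFor(workers, n) {
  return workers[n % workers.length];
}
```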

Having non-blocking network IO allows our check servers to run thousands more checks than our competitors with fewer resources. Fewer resources means fewer and cheaper servers, which means less overhead. That’s how we’re able to charge only $10/month for 1 minute checks on 1000 target services. You won’t find a better deal anywhere, and you can thank the folks over at the Node.js community for that.

I’m sure some will be quick to point out that there are other languages that can do the same thing, some of them probably better at one particular thing or another than Node.js, and I won’t argue with most of them. We think the way Node.js handles network IO makes it a great choice for a server monitoring service, and if you give NodePing’s 15-day, risk-free trial a shot, we think you’ll agree.

Using iptables to balance Node.js processes

One of the challenges in building a web application on any platform is making sure it can handle enough visitors. That’s a fairly well known and understood challenge for apps hosted on Apache, which is where most of our experience has been. It’s a new ballgame when we’re talking about Node.js based apps.

The NodePing web application is all Node.js on the server, with lots of jQuery driven Ajax on the client. We actually have two different request patterns within the web app. The application piece is used by customers managing their monitoring checks and accounts. That part is a single page app (SPA). The “static” content pages are simpler. From a request and response point of view they are a more traditional web page, in part so that they’ll be easily crawlable. The components that actually run checks and process results for the NodePing services are a whole different thing, which we’ll write a post about later. This post is just about the web application.

Early on we started looking for information about how fast we should be able to go with Node.js. Good information was hard to find. Most of the benchmark type articles on the net are very simple test cases. They show Node.js doing very well on requests per second, but these typically are just responding with an almost empty response. Of course, they were comparing it to other servers handling similarly simple requests, so those results are fine for what they are trying to do but not really applicable. What happens when you start throwing in real world queries and processing that we’ll see in our web application?

The real answer is we don’t know, and even if the published benchmarks included more data with more processing to handle the requests we still wouldn’t know because they aren’t running our code on our setup. We need to get Slashdotted to find out for sure, and/or get large numbers of customers so we have thousands of real requests to the single page web app. Both of those would be interesting days. We have run a bunch of tests with ab and siege. I’m not going to report numbers, because they won’t be much more useful than the benchmarks I found. The fact is you have to build something and see how it works in your particular case. Feel free to help us get lots of customers, and we’ll report more on how we were able to handle the load in Node.js. It has to be real customers. We already know how we do under simulated load.

What we found in our early testing is that we were running out of performance in our app well before we wanted to. On most servers with at least moderate amounts of memory this was a matter of not enough processing power. We’d hit the host with an ab or siege test, and the processor would peg very shortly.

After looking at various options (mostly making sure we weren’t wasting processing in the code), we concluded we just needed to throw more processing at the app. Node.js typically runs in a single process. We needed to be able to utilize more cores. With Node.js, the most obvious way to do that is to start multiple processes and then balance the requests between the processes.

In our case we’re dealing with a web application, and the logic it needs to run isn’t very intensive. It is mostly serving up data. Each individual request doesn’t need access to more processing. So we don’t need interprocess communications, we just need to be able to run more of them. Also, most page content is in Redis or databases, which are shared, so we don’t even care if requests within a session hit the same process.

The first thing we looked at was various proxy front ends. There are several that might work. One of my favorites is node-http-proxy from nodejitsu.

In the end, we decided the simplest and fastest approach was using iptables to split the requests between multiple processes. We were already forwarding traffic to the server’s port so that we could run the service on a high port and easily run it as a non-root user. I needed to get this going quickly, so I just copied my main index.js file (which basically just starts the server and loads modules) to several files, with the port for each process in each file. It would have been fairly trivial to do this dynamically, or accept it as a command line parameter, but this was quick and it would work well for scripted starts. I ended up with files called something like index8001.js, index8002.js, and so on, with one file for each process I wanted to run.
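
For comparison, the command line parameter variant mentioned above might look something like this. It is a sketch only: portFromArgs and the fallback port 8001 are illustrative choices, not what the application actually does.

```javascript
// Read the listening port from the command line, e.g. `node index.js 8002`.
// Falls back to a default when the argument is missing or not a valid port.
function portFromArgs(argv, fallback = 8001) {
  const port = parseInt(argv[2], 10);
  return Number.isInteger(port) && port > 0 && port < 65536 ? port : fallback;
}

const port = portFromArgs(process.argv);
console.log('would listen on port ' + port);
```

A single index.js could then be started once per process with a different port argument each time, instead of keeping one copy of the file per port.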

All that’s left is the iptables bit. We are going to redirect the traffic using the iptables statistic module. This could be done randomly, which should end up passing requests fairly evenly. Or it could be done using the nth mode, which forwards the nth request it sees that matches the rule. I opted for the nth mode approach.

Figuring out how to do this was a little tricky, because the statistics module has evolved and there is a lot of old information out there, plus some that is just wrong.

The rules we needed looked something like this:
-A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 8001 -m statistic --mode nth --every 3
-A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 8002 -m statistic --mode nth --every 2
-A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 8003
-A PREROUTING -p tcp --dport 443 -j REDIRECT --to-ports 8101 -m statistic --mode nth --every 3
-A PREROUTING -p tcp --dport 443 -j REDIRECT --to-ports 8102 -m statistic --mode nth --every 2
-A PREROUTING -p tcp --dport 443 -j REDIRECT --to-ports 8103

Most of this is just normal forwarding. These rules go in the nat table, which is where the REDIRECT target is valid. In this example we are forwarding traffic to port 80 to ports 8001, 8002, and 8003. Traffic to port 443 we forward to ports 8101, 8102, and 8103. For each destination port, we want a third of the traffic to go to each process. Our actual number of ports will vary based on how many processes we want to use on a specific host, which depends in part on how many cores we want to use.

These are terminating targets. That is, once we’ve forwarded a given request we’re done and ready for the next one. We never forward to port 8001 and also port 8002. So on the first rule we want every third request. The other two out of every three requests pass to the next line. Here, we want to forward every other request, since 1/3 are already being handled by the previous line. The third line for this destination port doesn’t need the statistic module at all, since it only ever sees the remaining third of the requests and should always forward all the traffic for port 80 that reaches this rule.

Some examples on the Internet list similar setups with ‘--every 3’ on each of the three lines for port 80. That dumps a few of the requests out on the floor of the data center. The first line picks off one third of the requests, leaving the remaining 2/3 to pass to the next line. If that line is set to every 3, it picks off a third of those 2/3. The last line would then pick off a third of what’s left from that. That leaves 8/27, or something like 30%, of our requests unhandled. That is bad.
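
The arithmetic can be checked with a small simulation. This is a simplified model of the nth mode, not the kernel's implementation: each rule keeps its own counter of the requests it sees and matches the first one and every nth one after that. The simulate function and its null convention for a rule with no statistic match are made up for this sketch.

```javascript
// Simplified model: each rule counts the requests that reach it and takes
// one out of every `every`. A null entry is a rule with no statistic match,
// which takes everything that reaches it.
function simulate(everyValues, total) {
  const counters = everyValues.map(() => 0);
  const handled = everyValues.map(() => 0);
  let unhandled = 0;
  for (let i = 0; i < total; i++) {
    let matched = false;
    for (let r = 0; r < everyValues.length && !matched; r++) {
      const every = everyValues[r];
      if (every === null || counters[r]++ % every === 0) {
        handled[r]++;
        matched = true; // terminating target: stop at the first match
      }
    }
    if (!matched) unhandled++;
  }
  return { handled, unhandled };
}

console.log(simulate([3, 2, null], 27000)); // handled: 9000, 9000, 9000; unhandled: 0
console.log(simulate([3, 3, 3], 27000));    // handled: 9000, 6000, 4000; unhandled: 8000
```

With 3, 2, and no match on the last rule, each process gets exactly a third. With 3 on every line, 8000 of 27000 requests (8/27) fall through all three rules.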

This is not a high availability solution; it is just meant to spread load between processes. We’re using Forever to run the individual processes. That works fairly well, and we don’t really need to be concerned about failover between processes on the single server. Load balancing between servers needs to preserve sessions, and is a different scenario from what I’ve described here. Watching the traffic come into this setup, we see the requests spread across all of the processes, effectively using all of our cores. On two processes we approximately double the number of requests we can handle per second. Four processes can handle roughly four times the number of requests. That is good.