Mongoose / MongoDB performance enhancements and tweaks (Part 3)

[Image: Bush "doing it wrong"]

Holy crap. MongoDB native drivers are SO much faster than updates through mongoose’s ORM.  Initially, when I set out on this quest to enhance mongoDB performance with node.js, I thought that modifying my queries and limiting the number of results returned would be sufficient to scale. I was wrong (thanks Bush for the imagery). It turns out that the overhead that mongoose adds for wrapping mongoDB documents within mongoose’s ORM is tremendous. Well, I should say tremendous for our use case.

In my previous two posts on tweaking mongoDB / mongoose for performance (Part 1 and Part 2), I discussed optimizing queries and making simple in-place writes instead of read-then-write updates. Those were worthwhile improvements and the speed gains added up to significant chunks, but I had no idea that moving to the native driver would give me these kinds of improvements (see below).

Example #1: ~400 streams, insertion times.
These numbers were taken after the initial tweaks from Part 1. Unfortunately I don't have a great printout from mongotop, but this gives you an idea. Look at the write times for the streams and packets collections at a rate of ~400 streams, i.e. 400 sources of packets, all of which get written and persisted. Here the write time to streams is 193 ms / 400 streams, or 48.25 ms per 100 streams. Likewise, packet writes come out to 7.25 ms per 100 streams. (You can mostly ignore the read times; those come from data aggregation and computing analytics.) Compare these with the results below:

    ns         total    read    write
    streams    193ms    0ms     193ms
    packets    30ms     1ms     29ms
    devices    9ms      9ms     0ms

Example #2: ~1000 streams, insertion times.
You can see here that write time has dropped significantly. Writes to the packets collection hover at around 1.7 ms per 100 streams, and writes to the streams collection hover at around 7.6 ms per 100 streams. That's better than a 4x and a 6x improvement, respectively, in write time to the packets and streams collections (the arithmetic is spelled out after the table below). And don't forget, I had already begun optimizing mongoose: even after the tweaks from Part 2, these numbers still represent a better-than-2x improvement in query times. Huge, right?

    ns         total    read    write
    packets    186ms    169ms   17ms
    devices    161ms    159ms   2ms
    streams    97ms     21ms    76ms
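
Spelling out the arithmetic behind those figures (all numbers come straight from the two mongotop snapshots above):

    Example #1 (~400 streams):   streams  193 ms / 4   = 48.25 ms per 100 streams
                                 packets   29 ms / 4   =  7.25 ms per 100 streams
    Example #2 (~1000 streams):  streams   76 ms / 10  =  7.6  ms per 100 streams
                                 packets   17 ms / 10  =  1.7  ms per 100 streams

    Improvement:  48.25 / 7.6 ≈ 6.35x (streams),  7.25 / 1.7 ≈ 4.26x (packets)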

I knew using the mongoDB native drivers would be faster, but I hadn’t guessed that they would be this much faster.  

To make these changes, I updated mongoose to the latest version, 3.8.14, which lets queries go through the native mongoDB driver released by 10gen (github here: https://github.com/mongodb/node-mongodb-native) via Model.collection methods. These in turn call methods defined in node_modules/mongodb/lib/mongodb/collection/core.js, which essentially just execute raw commands against mongo. Using these native commands, one can take advantage of things like bulk inserts.
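
For example, here's a rough sketch of what a bulk insert of raw packet documents looks like through Model.collection. The Packet model name and the packetDocs array are placeholders for illustration, and note that no mongoose casting, validation or middleware runs on this path:

    var Packet = mongoose.model('Packet');  //hypothetical model for the packets collection

    //packetDocs is an array of plain json documents; the native driver writes
    //the whole batch in a single call, bypassing mongoose entirely
    Packet.collection.insert(packetDocs, {safe: true}, function(err, docs) {
        if (err) return console.error(err);
        //the documents are persisted exactly as given
    });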

I still like mongoose, because it guarantees you instantiate the same shape of object whenever you need to create and save something. If a field isn't defined in the mongoose.Schema, it won't get persisted to mongoDB either. Furthermore, mongoose can still be tuned to be reasonably quick, so it all depends on the use case. It just so happens that when you're inserting raw json into mongoDB and don't need the validation and other middleware that mongoose provides, you can use the mongoDB native driver while still using mongoose for the good stuff. That's cool.
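
To illustrate the difference, here's a contrived sketch (the 'bogus' field is made up, and it assumes the default strict schema behavior): a field that isn't in the schema is dropped when you save through mongoose, but goes straight to disk through the native collection:

    var Stream = mongoose.model('Stream');

    //through mongoose: 'bogus' is stripped because it isn't defined in the schema
    Stream.create({ 'endTime': Date.now(), 'bogus': true }, function(err, doc) { /* ... */ });

    //through the native driver: the raw json, 'bogus' included, is persisted as-is
    Stream.collection.insert({ 'endTime': Date.now(), 'bogus': true }, {safe: true}, function(err) { /* ... */ });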

Here’s what the new improvements look like:

    var async = require('async');
    var mongoose = require('mongoose');
    var Stream = mongoose.model('Stream');

    //query1/query2, query1set/query2set, packet, cb and pushNewStream come from the enclosing scope
    async.waterfall([
        //Native mongoDB update returns the # of docs updated; by default update modifies 1 document:
        function (callback) {
            Stream.collection.update(query1, query1set, {safe: true}, function(err, writeResult) {
                if (err) return callback(err);
                if (writeResult == 1) {
                    callback('Found and updated stream @ Query1');
                } else {
                    callback(null);  //no match, fall through to the next query
                }
            });
        },
        function (callback) {
            Stream.collection.update(query2, query2set, {safe: true}, function(err, writeResult) {
                if (err) return callback(err);
                if (writeResult == 1) {
                    callback('Found and updated stream @ Query2');
                } else {
                    pushNewStream(packet, cb);
                    callback('No stream found.  Pushing new stream.');
                }
            });
        }
    ], function(err, results) {});

Mongoose / MongoDB speed improvements (Part 2)

In a previous post, MongoDB performance enhancements and tweaks, I described some techniques I've found to speed up mongoose inserts and updates on large volumes of performance statistic data. I was still seeing performance bottlenecks in MongoDB, especially after running a node cluster for our data analytics. Since node is now spread across multiple cores (scaling "horizontally", to be described in another post), the writes reach MongoDB much faster. The main question for me was whether it was mongoose, the node-mongoDB driver, or mongo itself that was slowing me down.

The problem:

When we receive performance data, we start tracking it by creating a "stream" object. The stream object is first created with a unique identifier, and subsequent packets that come in update the stream: each stream records the highest packet number seen so far and updates its timestamp with the incoming packet's timestamp. Later on, when we stop seeing packets flow for a particular stream, we time it out so analytics can be computed over all the packets that arrived between the beginning and the end of the stream.
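
For reference, here is roughly what the Stream schema looks like, reconstructed from the fields used in the queries below; the exact types are assumptions on my part:

    var mongoose = require('mongoose');

    //sketch of the Stream schema, inferred from the query fields used in this post
    var streamSchema = new mongoose.Schema({
        from: { ID: String, IP_ADDRESS: String, highestPacketNum: Number },
        to:   { ID: String, IP_ADDRESS: String, highestPacketNum: Number },
        stream_ended: { from: Boolean, to: Boolean },
        endTime: Number,                 //set from packet.timestamp
        metadata: { lastUpdated: Date }
    });
    mongoose.model('Stream', streamSchema);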

My initial implementation made a series of read queries using mongoose find(query), returned all potential matching streams, then updated the matching stream. The source code looked something like this.

function updateStream(packet, cb) {
   var stream = mongoose.model('Stream');
   var query1 = {
        $and: [{
            'from.ID': packet.fromID
        }, {
            'to.ID': packet.toID
        }, {
            'stream_ended.from': false
        }, {
            'from.IP_ADDRESS': packet.IP
        }, {
            'from.highestPacketNum': {
                $lt: packet.highestPacketNum
            }
        }]
    };

   //Since streams are bilateral, we have to do two query reads in order to find the matching stream
   var query2 = {
        $and: [{
            'to.ID': packet.toID
        }, {
            'from.ID': packet.fromID
        }, {
            'stream_ended.to': false
        }, {
            'to.IP_ADDRESS': packet.IP
        }, {
            'to.highestPacketNum': {
                $lt: packet.highestPacketNum
            }
        }]
    };  

   async.waterfall([
      function (callback) {
         stream.find(query1).exec(function(err, calls) {
            if (err) return callback(err);
            if (calls.length > 1) {  //there should only be 1 matching stream
               throw new Error('There should only be one stream with this unique identifier');
            } else if (calls.length == 1) {
               //update and persist calls[0], then stop the waterfall
               calls[0].save(cb);
               callback('Found and updated stream @ Query1');
            } else {
               callback(null);  //no match, continue down the waterfall
            }
         });
      },
      function (callback) {
         stream.find(query2).exec(function(err, calls) {
            if (err) return callback(err);
            if (calls.length > 1) {  //there should only be 1 matching stream
               throw new Error('There should only be one stream with this unique identifier');
            } else if (calls.length == 1) {
               //update and persist calls[0], then stop the waterfall
               calls[0].save(cb);
               callback('Found and updated stream @ Query2');
            } else {
               callback('Query2 yielded no results');
            }
         });
      }
   ], function(err, results) {});
}

You can see that this source code was highly inefficient, because mongoose returned every possible match for the stream. This was due to a limitation in our packet simulator, which at the time did not spoof unique IDs in the packets it sent. At this point we were capped at around 250 simultaneous streams, running in 1 node process.

Result: ~250 simultaneous streams of data

Improvement #1: Limit the search to the first object found, update it and persist it.

Essentially the source code remained the same, but the mongoose find queries changed from find(query1).exec() to find(query1).limit(1).exec(). With this change we saw an improvement of around 50%, since mongoose would return after finding the first match. At this point the bottleneck shifted back to node.js, and I noticed that past a certain point the event queue would back up from handling too many events. At the time, node.js was responsible for doing aggregation, decoding the packets, invoking mongoose to store them, and running our REST API as well as serving up static data. I saw my poor little node.js process pegged at 100% trying to churn through all the data. One good thing I noticed, though, is that even though node.js was maxed out on resources, it still continued to churn through the data, eventually processing everything given enough time.
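
The change itself is a one-liner; a minimal sketch (query1 is the same as above, and handleResults is just a placeholder for the callback logic):

    //before: returns every matching stream document
    stream.find(query1).exec(handleResults);

    //after: mongoose stops after the first match
    stream.find(query1).limit(1).exec(handleResults);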

Result: ~350 simultaneous streams of data, query time improved by about 50%

Improvement #2: Clustering node.js

This deserves a post of its own, since it required breaking the various services handled by the single node process into multiple forked processes, each doing its own thing. Suffice it to say, I eventually decided to fork the stream processors into N instances, where N is the number of cores on the machine divided by 2, with basic load balancing. Node now wrote to mongoDB much faster, which caused delays that eventually bogged mongoDB down. Thus the pendulum swung back to mongoDB.
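
Here's a minimal sketch of that forking scheme using node's built-in cluster module; startStreamProcessor is a stand-in for our actual worker entry point, and the load-balancing details are elided:

    var cluster = require('cluster');
    var os = require('os');

    //fork N stream processors, where N = number of cores on the machine / 2
    var numWorkers = Math.max(1, Math.floor(os.cpus().length / 2));

    if (cluster.isMaster) {
        for (var i = 0; i < numWorkers; i++) {
            cluster.fork();
        }
    } else {
        //each worker decodes packets and persists streams to mongoDB
        startStreamProcessor();
    }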

Result: ~400 simultaneous streams of data, query time remains the same, mongoDB topped out.

Improvement #3: mongoDB updates in place

Finally, I realized that there was no reason to actually fetch the stream object in our source code, since I was just making updates to it. I had also noticed that it was actually the read time in mongotop that spiked as the number of streams increased. This was because mongoose's find() returns fully hydrated, mongoose-wrapped mongoDB documents, so the update does not happen in place. For simple updates without much logic, there is no point in fetching the object back, even when using the .lean() option to get plain json. Here was the update:

function updateStream(packet, cb) {
    //queries remain the same as above ...
    var Stream = mongoose.model('Stream');

    async.waterfall([
        function(callback) {
            Stream.findOneAndUpdate(query1, {
                $set: { 'endTime': packet.timestamp,
                        'metadata.lastUpdated': new Date(),
                        'from.highestPacketNum': packet.highestPacketNum
                    }
            }, {new: false}).exec(function(err, doc) {
                if (doc) {
                    cb(err, doc);
                    callback('Found and updated stream @ Query1');
                } else {
                    callback(null);  //no match, fall through to query2
                }
            });
        },

        //No match yet with the IP, so try the swapped SSRCs and update that one
        function(callback) {
            Stream.findOneAndUpdate(query2, {
                $set: { 'to.IP_ADDRESS': packet.IP,
                        'endTime': packet.timestamp,
                        'metadata.lastUpdated': new Date(),
                        'to.highestPacketNum': packet.highestPacketNum
                    }
            }, {new: false}).exec(function(err, doc) {
                if (doc) {
                    cb(err, doc);
                    callback('Found and updated stream @ Query2');
                } else {
                    createNewStream(packet, cb);
                    callback('No streams found.  Pushing new stream.');
                }
            });
        }
    ], function(err, results) {});
}

It turns out that after this improvement, I saw in mongotop that read time dropped to 0ms per second, and write time slightly spiked. However, this was by far the biggest improvement in overall query time.

Result: ~600 simultaneous streams of data, query time improved by another 2 – 3x (when including read time dropping to 0). This, combined with the stream processors running on multiple cores, looked like a scalable solution that would significantly increase our capacity. But I noticed that at around 600 simultaneous streams of data, stream creation would suddenly spike and keep increasing.

Improvement #4: MongoDB upgrade to 2.6.x and query improvement using $max field update operator

For all you readers who like mysteries, can you guess what was causing the stream creation to spike? I spent a long time thinking about it. For one, mongotop didn’t seem to be topped out in terms of the total query time on the streams collection. I noticed spikes of up to 400 or so ms for total query time, but it seemed by and large fine. Node.js was running comfortably at around 40% – 50% cpu usage per core on each of the four cores. So if everything seemed fine, why was stream creation spiking?

The answer, it turns out, was a race condition: the second packet of a stream could be processed before the first packet had finished instantiating the new stream object. At a certain point, when enough simultaneous streams were incoming, the delay in creating a new stream would exceed the interval between the first and second packets of that same stream. Past that point, every such packet created yet another new stream.

I thought for a while about a solution, but I kept getting hung up on either processing the incoming packets synchronously, which would significantly decrease our throughput, or using a "store and forward" approach where a reconciliation process would go back and match up streams. To be fair, I still have this problem, but I was able to reduce how often it occurs to a significant extent. Because there is no guarantee that we handle packets in order, I updated our query to use the $max field update operator, which only updates the stream's highest packet number if a packet with the same IDs and a higher packet number comes in. This also let me drop the lower-packet-number condition from the query itself, and the simpler query significantly reduced total query time on the collection while helping with the race condition at the same time.

function updateStream(packet, cb) {
    var Stream = mongoose.model('Stream');
    var query1 = {
        $and: [{
            'from.ID': packet.fromID
        }, {
            'to.ID': packet.toID
        }, {
            'stream_ended.from': false
        }, {
            'from.IP_ADDRESS': packet.IP
        }]
    };

    var query2 = {
        $and: [{
            'to.ID': packet.fromID
        }, {
            'from.ID': packet.toID
        }, {
            'stream_ended.to': false
        }]
    };

    async.waterfall([

        //First see if there is a matching stream object
        function(callback) {
            Stream.findOneAndUpdate(query1, {
                $set: { 'endTime': packet.timestamp, 
                        'metadata.lastUpdated': new Date()
                    },
                $max: {
                    'from.highestPacketNum': packet.highestPacketNum
                }
            }, {new: false}).exec(function(err, doc) {
                if (doc) {
                    cb(err, doc);
                    callback('Found and updated stream @ Query1');
                } else {
                    callback(null);
                }
            });
        },

        function(callback) {
            Stream.findOneAndUpdate(query2, {
                $set: { 'to.IP_ADDRESS': packet.IP, 
                        'endTime': packet.timestamp, 
                        'metadata.lastUpdated': new Date()
                    },
                $max: {
                    'to.highestPacketNum': packet.highestPacketNum
                }
            }, {new: false}).exec(function(err, doc) {
                if (doc) {
                    cb(err, doc);
                    callback('Found and updated stream @ Query2');
                } else {
                    pushNewStream(packet, cb);
                    callback('No stream found.  Pushing new stream.');
                }
            });
        }
    ], function(err, results) {

    });
    return true;
}

Note that this approach only raises the threshold for the race condition; it does not solve it completely. If enough streams come in, I'll still eventually hit the case where the stream has not been instantiated before its second packet is processed. I'm still not quite sure what the best solution is here, but as with everything, improvement is an incremental process.

Result: ~1000 simultaneous streams, query time roughly halved again, 4 cores running at 40% – 50% cpu.