MongoDB 5.2 Improvements for Analytics

MongoDB initially gained traction as a backend for web applications, in which it was mostly concerned with so-called “CRUD” operations—creating, reading, updating, and deleting documents. Since then, MongoDB has broadened its capabilities remarkably, but it is still rather as an operational database than an analytic DB or data warehouse.

The ability for MongoDB to support rapid development, DevOps and modern mobile applications makes it a good choice for a very wide range of deployments. However, compared with comparable relational databases, it has historically lagged in its analytics capabilities.

In the SQL world, the venerable SELECT statement contains an extremely rich set of operators for aggregating, transforming and analyzing information. There is very little data manipulation that you can’t do with SQL.

In MongoDB, the find statement provides only basic data retrieval operations. The MongoDB BI connector allows one to run SQL against a Mongo database, albeit with significant functionality and performance limitations.

It is the MongoDB aggregation framework that provides the richest set of analytic capabilities. The aggregation framework allows developers to construct “pipelines” of data operations, including grouping, transforming and joining data. Aggregation frameworks are powerful, though hard to use for very complex operations.

All releases of MongoDB over the last couple of years have included significant enhancements to the aggregation framework, and version 5.0 is no exception.

The most significant new aggregation feature in MongoDB 5.0 is probably the introduction of windowing functions. Windowing functions were introduced in SQL in the SQL:2003 standard and are now found in virtually every modern SQL implementation. A window function partitions the documents in a result set and creates a sort of “virtual collection” that the function works with. The function operates on a “window” of documents around the current document, allowing you to access trends or group information.

For instance, using a window function, we can get the average of measurement over the current year and express the difference between the current measurement and that average. We can also access data in previous measurements and create a moving average or delta value.

In MongoDB 5.0, windowing functions are provided by the $setWindowsFields aggregation operator.

MongoDB 5.2 also introduces some useful enhancements to aggregation. The new $top and $bottom operators give you quick access to the highest and lowest elements within a sorted list. $maxN and $min provide a similar capability: you can retrieve the top “N” elements in a sorted list.

The $sortArray operator allows you to sort an embedded array within a pipeline without having to “unwind” that array into individual documents.

As always, I welcome these new additions to the aggregation framework while still hoping that one day MongoDB will provide a strong native SQL-based query interface. SQL may not be perfect, but I think it still provides an easier way to express analytic queries than some of the very complex aggregation pipelines required by MongoDB. While the BI connector allows for some SQL capabilities, its performance and functionality limitations make it unsuitable for serious analytics.

A final “under the hood” enhancement for analytics is the ability to perform long-running snapshot queries. A snapshot query, as its name implies, works on an immutable snapshot of data at some point in time. A snapshot query is unaffected by updates that might be going on while the query is running, and this is essential to get a consistent view of data. MongoDB 5.2 allows snapshot queries to extend to 5 minutes and even longer if you are prepared to adjust some internal parameters.

Almost all data has value beyond its use within an operational system. Being able to analyze data for predictive analytics and machine learning is an essential capability required by any serious database management system. With 5.0 and 5.2, MongoDB has continued to enhance its capabilities in this area.

Leave a Comment