DocumentDB

Azure DocumentDB is a fully managed NoSQL database service built for fast and predictable performance, high availability, automatic scaling, and ease of development. Its flexible data model, consistent low latencies, and rich query capabilities make it a great fit for web, mobile, gaming, IoT, and many other applications that need seamless scale.

Why DocumentDB

DocumentDB is designed for situations where you don’t know in advance exactly what kind of data you will be working with or how it will be structured, and where the data is used by a variety of clients.

The DocumentDB Data Model :

DocumentDB’s data model is simple: all data is stored in JSON documents.

For example, suppose you’re creating a DocumentDB application that works with customers. Information about each of those customers would typically be described in its own JSON document, and so the document for the customer Contoso might look like this:

The document for the customer Fabrikam would likely be similar, but it needn’t be identical. It might look like this:

 

Notice that although the two customer documents above are similar, their structure isn’t identical. This is fine; DocumentDB doesn’t enforce any schema.
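As an illustration of two such documents, the following Python sketch builds a pair of customer records and serializes them to JSON. Only the "name" and "salesYTD" fields are implied by the text (they appear in the query later on); every other field and all of the values are invented for the example.

```python
import json

# Hypothetical customer documents. Only "name" and "salesYTD" come from the
# article; the remaining fields and all values are made up for illustration.
contoso = {
    "name": "Contoso",
    "country": "Germany",
    "contacts": [{"role": "admin", "email": "admin@contoso.example"}],
    "salesYTD": 49003.23,
}
fabrikam = {
    "name": "Fabrikam",
    "country": "USA",
    "contacts": [
        {"role": "ceo", "email": "ceo@fabrikam.example"},
        {"role": "cto", "email": "cto@fabrikam.example"},
    ],
    "starRating": 5,  # extra field that the Contoso document does not have
    "salesYTD": 3399.78,
}

# Both serialize to plain JSON; nothing checks them against a shared schema.
for doc in (contoso, fabrikam):
    print(json.dumps(doc, indent=2))

# Fields present in one document but missing from the other.
only_in_fabrikam = sorted(set(fabrikam) - set(contoso))
print(only_in_fabrikam)
```

The two records share most fields, yet one carries a field the other lacks, which is the kind of difference a schema-free store accepts.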

Working with Data :

DocumentDB clients can be written in multiple languages, including C#, JavaScript, Java, and Python. Whatever choice a developer makes, the client accesses DocumentDB through RESTful access methods. A developer can use these to work with documents in a collection in a few different ways. The options are:

  • Using these access methods directly for create/read/update/delete (CRUD) operations.
  • Submitting requests expressed in DocumentDB SQL.
  • Defining and executing logic that runs inside DocumentDB, including stored procedures, triggers, and user-defined functions (UDFs).

RESTful Access Methods :

If an application has the necessary permissions, it can use DocumentDB’s RESTful access methods to perform CRUD operations on documents and other resources. Like every RESTful interface, DocumentDB uses the standard HTTP verbs:

  • A GET request returns the value of a resource, such as a document.
  • A PUT request replaces a resource.
  • A POST request creates a new resource. POSTs are also used to send DocumentDB SQL requests and to create new stored procedures, triggers, and UDFs.
  • A DELETE request removes a resource.
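The mapping from HTTP verbs to CRUD operations can be sketched in Python by building, but not sending, one request per verb. The account name, database, collection, document id, and URL layout are illustrative assumptions, and the authorization headers a real call needs are omitted.

```python
from urllib.request import Request

# Assumed, illustrative names: account, database, collection and document id.
BASE = "https://myaccount.documents.azure.com"
COLLECTION = BASE + "/dbs/salesdb/colls/customers"
DOCUMENT = COLLECTION + "/docs/fabrikam"


def build_request(verb, url, body=None):
    """Build (but do not send) an HTTP request for the given verb."""
    data = body.encode("utf-8") if body is not None else None
    return Request(url, data=data, method=verb)


crud_requests = [
    # Create: POST a new document to the collection's document feed.
    build_request("POST", COLLECTION + "/docs", '{"id": "fabrikam", "name": "Fabrikam"}'),
    # Read: GET the document by its address.
    build_request("GET", DOCUMENT),
    # Update: PUT a full replacement of the document.
    build_request("PUT", DOCUMENT, '{"id": "fabrikam", "name": "Fabrikam Inc."}'),
    # Delete: remove the document.
    build_request("DELETE", DOCUMENT),
]

for req in crud_requests:
    print(req.get_method(), req.full_url)
```

Nothing is sent over the network here; the sketch only shows which verb goes with which operation.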

A developer using this interface is free to construct requests manually; it’s just REST. But to make life easier, DocumentDB provides several client libraries.

DocumentDB SQL :

A DocumentDB client can read and write data using the service’s RESTful access methods. But a real database needs a real query language, something that lets applications work with data in more complex ways. This is what DocumentDB SQL provides.

This language is an extended subset of SQL, a technology that many developers already know. For example, suppose the simple JSON documents shown earlier are contained in a collection called customers. Here’s a query on that collection:

SELECT c.salesYTD FROM customers c WHERE c.name = "Fabrikam"

As anybody who knows SQL can probably figure out, SELECT requests the value of the element salesYTD, FROM indicates that the query should be executed against documents in the customers collection, and WHERE specifies the condition that documents within that collection should meet. The query’s result is year-to-date sales for Fabrikam formatted as JSON data:
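To make the roles of SELECT, FROM, and WHERE concrete, here is a minimal Python emulation of the Fabrikam query over an in-memory list. It is not how DocumentDB executes queries; the sales figures are invented, and the list-of-objects result shape is an assumption.

```python
import json

# Stand-in for the "customers" collection; the sales values are invented.
customers = [
    {"name": "Contoso", "salesYTD": 49003.23},
    {"name": "Fabrikam", "salesYTD": 3399.78},
]


def select_sales_ytd(collection, name):
    """Emulate: SELECT c.salesYTD FROM customers c WHERE c.name = <name>."""
    return [
        {"salesYTD": c["salesYTD"]}   # SELECT: project a single element
        for c in collection           # FROM: iterate over the collection
        if c.get("name") == name      # WHERE: keep only matching documents
    ]


result = select_sales_ytd(customers, "Fabrikam")
print(json.dumps(result))
```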

Executing Logic in the Database :

DocumentDB SQL lets a client issue a request that’s parsed and executed when it’s received. But there are plenty of situations where it makes more sense to run logic stored in the database itself. DocumentDB provides several ways to do this, including stored procedures (commonly called sprocs), triggers, and user-defined functions (UDFs). A collection can contain any or all of them.

Stored Procedures :

Stored procedures implement logic, which means they must be written in some programming language. Relational databases commonly create their own language for doing this, such as SQL Server’s T-SQL. But what should this language look like for a database that stores JSON documents? The answer is obvious: stored procedures should be written in JavaScript, which is exactly what DocumentDB does.

Every sproc is wrapped in an atomic transaction. If the sproc ends normally, all of the changes it has made to documents in this collection will be committed. If it throws an exception, however, all of the changes it has made to these documents will be rolled back. And while the sproc is executing, its work is isolated: no other requests to this database will see partial results.
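The commit-or-roll-back behavior can be sketched in Python. This only emulates the semantics on an in-memory list; real sprocs are written in JavaScript and run inside DocumentDB, and the helper names below are hypothetical.

```python
import copy


def run_in_transaction(collection, procedure):
    """Run procedure on a working copy; commit only if it returns normally."""
    working = copy.deepcopy(collection)
    procedure(working)          # an exception here skips the commit below
    collection.clear()          # commit: swap in the modified documents
    collection.extend(working)


def add_bonus(docs):
    # Well-behaved procedure: updates every document and returns normally.
    for d in docs:
        d["salesYTD"] += 100.0


def failing_update(docs):
    # Changes one document, then throws; the change must not survive.
    docs[0]["salesYTD"] = 0.0
    raise ValueError("abort")


customers = [{"name": "Contoso", "salesYTD": 100.0}]

run_in_transaction(customers, add_bonus)           # committed
try:
    run_in_transaction(customers, failing_update)  # rolled back
except ValueError:
    pass

print(customers)
```

Because the failing procedure only ever touched the working copy, the collection still holds the state committed by the first call.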

Triggers :

DocumentDB triggers are similar in some ways to stored procedures: they’re invoked via a POST request, and they’re written in JavaScript. They also materialize JSON documents into JavaScript variables and are automatically wrapped in an atomic transaction. Unlike sprocs, however, a trigger runs when a specific event happens, such as data being created, changed, or deleted.

User-Defined Functions :

Like stored procedures and triggers, user-defined functions are written in JavaScript, and they run within DocumentDB itself. UDFs can’t make changes to the database, however—they’re read-only. Instead, a UDF provides a way to extend DocumentDB SQL with custom code.
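As a rough sketch of the idea, the Python below passes a custom function into a query-like projection. The function name, the tier threshold, and the sales values are hypothetical; a real UDF would be JavaScript registered with the collection.

```python
import copy

# Stand-in collection with invented sales values.
customers = [
    {"name": "Contoso", "salesYTD": 49003.23},
    {"name": "Fabrikam", "salesYTD": 3399.78},
]


def sales_tier(sales_ytd):
    """Hypothetical user-defined function: classify a customer by sales."""
    return "gold" if sales_ytd >= 10000 else "standard"


def project_with_udf(collection, udf):
    # Build new result rows; the stored documents are never modified.
    return [{"name": c["name"], "tier": udf(c["salesYTD"])} for c in collection]


before = copy.deepcopy(customers)
tiers = project_with_udf(customers, sales_tier)
print(tiers)
```

The function computes a value for each row of the result but leaves the stored documents untouched, mirroring the read-only nature of UDFs.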

DocumentDB vs. MongoDB

Data Model :

MongoDB and DocumentDB both use a flexible data structure. MongoDB uses a data format named ‘BSON’, a binary encoding of JSON-like data, and DocumentDB uses the JSON format to save data. In both databases, the data is stored as documents. Both reserve a field that uniquely identifies a record: MongoDB’s is named ‘_id’ (an ObjectId by default), whereas DocumentDB’s is named ‘id’ (a string, for which the SDKs generate a GUID when the application does not supply a value).

Hosting Features :

MongoDB is available in both on-premises and cloud-hosted options. DocumentDB is hosted only on the Azure cloud.

Scalability :

MongoDB scales by adding nodes to a replica set, typically through scripts. Once they are added, one of the servers is treated as the primary, which supports read-write operations, and the secondary nodes are used for read operations. An odd number of servers is generally configured; in a three-member set, for example, one acts as the primary, another as a secondary, and the third as an arbiter. The arbiter takes part in the election that promotes one of the secondary servers to primary when the primary server goes down.
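The preference for an odd number of servers can be illustrated with a little majority arithmetic, assuming a new primary is chosen by a majority vote of the voting members. The function names are hypothetical.

```python
def majority(voting_members):
    """Smallest number of votes that is more than half of the members."""
    return voting_members // 2 + 1


def tolerated_failures(voting_members):
    """How many members can be down while a majority can still be formed."""
    return voting_members - majority(voting_members)


for n in (3, 4, 5):
    print(f"{n} members: majority={majority(n)}, tolerated failures={tolerated_failures(n)}")
```

Under this assumption a fourth member adds no fault tolerance over three, which is why a lightweight arbiter is often used to bring the count to an odd number.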

DocumentDB is hosted on the Azure Cloud and all the servers support read-write operations. The cluster is managed by using the Azure hosting methods.

Both support vertical scaling as well as horizontal scaling, that is, sharding. MongoDB implements sharding through ‘sharded clusters’.

Query Types Supported :

MongoDB provides built-in methods and operators for operations such as aggregation and filtering. Its ‘find’ method takes a filter criterion and a list of fields to return. Filters can use operators like ‘$in’, ‘$gt’, and ‘$lt’, and can search nested structures. The equivalents of SQL’s GROUP BY and HAVING clauses are built from operators like ‘$group’ and ‘$match’, and ‘$sum’ and ‘$avg’ are available within ‘$group’ operations. The ‘$geoNear’ operator takes advantage of geospatial indexes.

DocumentDB supports the creation of stored procedures, triggers, and user-defined functions (UDFs) using JavaScript. It uses SQL-like statements to retrieve data, and it supports joins within a document that has a nested structure so that filters can be applied to the nested data. The major disadvantage is that it doesn’t provide a GROUP BY option or aggregate functions like sum and average. Users have to write custom logic to achieve this.
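The kind of custom logic this implies can be sketched in Python as a client-side group-by that computes a sum and an average over documents already fetched from a query. The field names and values are invented for the example.

```python
from collections import defaultdict


def group_sum_avg(docs, key, field):
    """Group documents by key and compute the sum and average of field."""
    groups = defaultdict(list)
    for d in docs:
        groups[d[key]].append(d[field])
    return {
        k: {"sum": sum(values), "avg": sum(values) / len(values)}
        for k, values in groups.items()
    }


# Hypothetical query results; in practice these would come back from the service.
orders = [
    {"country": "USA", "amount": 10.0},
    {"country": "USA", "amount": 30.0},
    {"country": "Germany", "amount": 5.0},
]

totals = group_sum_avg(orders, "country", "amount")
print(totals)
```

Doing the aggregation on the client means every matching document has to be transferred first, so this approach suits small result sets better than large ones.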

Consistency & Availability :

MongoDB provides ACID guarantees at the document level. These guarantees ensure that a document is updated safely; in case of any errors, the operation is rolled back.

DocumentDB also provides ACID guarantees at the document level. In addition, it defines several consistency levels (strong, bounded staleness, session, and eventual) that determine what a read can observe after a write operation is performed.

MongoDB is designed to make a secondary server the primary if the primary server goes down. This is done automatically and requires no manual intervention. DocumentDB relies on the Azure platform to manage the availability of servers.

Management & Operations :

Azure provides a web interface to manage and monitor a DocumentDB account. It offers usage charts for monitoring and lets you adjust the displayed metrics to suit your needs.

In MongoDB, Ops Manager enables monitoring. It provides charts, dashboards, and customized alerts for tracking usage, and it also supports customized metrics.

Native REST interface :

What DocumentDB and MongoDB have in common is developer support in the form of SDKs for multiple programming languages.

In addition to working through a language driver, DocumentDB can be accessed through a native REST interface. In fact, the client drivers are largely wrappers around the REST interface.

By contrast, MongoDB does not have a native REST interface. It can be a little confusing because while there is no native support, there are third-party open source wrappers written in other languages (Node, Python, but sadly no .NET).

Data Interchange Format :

The term “Data Interchange Format” is merely a formal way of describing the format used to represent data during transmission between the client and the database server. Just to clarify, for most database platforms, how data is physically stored on disk is typically different from how it is managed in memory and transmitted over the wire.

DocumentDB uses JSON to represent documents. JSON was originally derived from the JavaScript language and is currently described by two competing standards, RFC 7159 and ECMA-404.

For the most part, a DocumentDB document is just your typical JSON data fragment. One thing to look for is the _self property; this field is present in every document in DocumentDB, and its value is unique to that document. The purpose of _self is to act as an immutable primary key field. Searching on _self is the fastest way to retrieve a resource.

MongoDB uses BSON, which is a binary-encoded extension (or superset) of JSON. While there is no official standard, a specification can be found online. The primary takeaway is that BSON has support for data types beyond the standard JSON types.

A brief comparison of the merits of JSON vs. BSON:

  • JSON has a slightly smaller footprint (size-wise) than BSON, so in scenarios where disk, memory, or bandwidth is constrained, JSON is a better fit.
  • BSON has a richer set of types, which allows for more flexible querying involving dates, timestamps, numbers, JavaScript objects, etc.
  • There is a slight performance penalty associated with serializing and deserializing BSON and JSON to native objects when working with strongly typed languages like C#, Java, etc. However, with JSON (DocumentDB) there is no such penalty when working with the JavaScript driver, since JSON is native to JavaScript (not the case with BSON).
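The point about types can be seen with Python’s standard json module alone: JSON has no date type, so a timestamp must travel as a string (or a number) and be converted back by the application. The field name and the date are invented for the example.

```python
import json
from datetime import datetime, timezone

# Hypothetical timestamp field on a document.
created = datetime(2016, 1, 15, 9, 30, tzinfo=timezone.utc)

# The JSON encoder rejects a native datetime because JSON has no date type.
try:
    json.dumps({"created": created})
    json_accepts_datetime = True
except TypeError:
    json_accepts_datetime = False

# Common workaround: store the date as an ISO 8601 string.
encoded = json.dumps({"created": created.isoformat()})
decoded = json.loads(encoded)

# After decoding, the value is a plain string until the application parses it.
round_tripped = datetime.fromisoformat(decoded["created"])

print(json_accepts_datetime, type(decoded["created"]).__name__, round_tripped == created)
```

A format with a native date type avoids this string round trip, which is the flexibility the second bullet refers to.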

In similar fashion to DocumentDB’s _self field, MongoDB enforces the use of a primary key field called _id (although client drivers can map it to a differently named property in application code). Being a key field, it is automatically indexed and can be used for fast retrieval of a single document.