What is MapReduce?

Put simply, MapReduce is a framework for processing vast amounts of unstructured data. The simplicity of that definition should not obscure how broad MapReduce is.

There are two steps to process data using MapReduce:

a. The user specifies a map function that processes a key/value pair to generate a set of intermediate key/value pairs.

b. The user then specifies a reduce function that merges all intermediate values associated with the same intermediate key.

In the classic word-count example, the map function emits each word plus an associated count of occurrences (just '1' in this simple example). The reduce function sums together all counts emitted for a particular word.
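To make the word-count idea concrete, here is a minimal sketch in plain JavaScript. It is not tied to any real MapReduce framework: the names mapWords, reduceCounts and wordCount are made up for illustration, and the small in-memory driver only stands in for the grouping that a real framework would do across many machines.

```javascript
// Map: emit a [word, 1] pair for every word in one line of text.
function mapWords(line) {
  return line
    .toLowerCase()
    .split(/\W+/)
    .filter(function (w) { return w.length > 0; })
    .map(function (w) { return [w, 1]; });
}

// Reduce: sum all counts emitted for a single word.
function reduceCounts(word, counts) {
  return counts.reduce(function (sum, c) { return sum + c; }, 0);
}

// Hypothetical in-memory driver: groups the intermediate pairs by key,
// then calls the reduce function once per key.
function wordCount(lines) {
  var groups = new Map();
  lines.forEach(function (line) {
    mapWords(line).forEach(function (pair) {
      var word = pair[0];
      if (!groups.has(word)) {
        groups.set(word, []);
      }
      groups.get(word).push(pair[1]);
    });
  });

  var totals = new Map();
  groups.forEach(function (counts, word) {
    totals.set(word, reduceCounts(word, counts));
  });
  return totals;
}

// Usage: "the" appears in both lines, every other word once.
var totals = wordCount(["the quick fox", "the lazy dog"]);
console.log(totals.get("the")); // 2
```

The map step never needs to know about other lines, and the reduce step never needs to know about other words, which is what lets a real framework run both steps in parallel.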

Details of the above process are in Section 3.1 in the pdf at:


MapReduce implementation in MongoDB:


a. Create a collection in MongoDB and insert some records.

  • Created a collection named “mapReduceCollection” and inserted four records with three fields: “cust_id”, “amount” and “status”.

b. Apply the MapReduce function to the collection.

  • Applied the “MapReduce” function to “mapReduceCollection”. It first executes the query (status: “A”) to filter the data. Each record in the result set is then passed to the map function, and the key/value pairs it emits are grouped by key, reduced, and written to the “order_totals” collection.

c. See the aggregated result
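Steps a to c can be sketched in the MongoDB shell roughly as follows. The collection and field names come from the steps above, but everything else is an assumption: the four record values are invented, the names records, mapFn and reduceFn are illustrative, and the choice to total “amount” per “cust_id” is only inferred from the name of the “order_totals” output collection. The database calls are guarded so that the snippet also loads outside the shell, where no global db exists.

```javascript
// Hypothetical sample records; the original values are not given in the text.
var records = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" }
];

// Map: called once per matching document, with `this` bound to the document.
// `emit` is provided by MongoDB while the map-reduce job runs.
var mapFn = function () {
  emit(this.cust_id, this.amount);
};

// Reduce: merges all amounts emitted for one cust_id into a single total.
var reduceFn = function (key, values) {
  return values.reduce(function (sum, v) { return sum + v; }, 0);
};

// Only touch the database when running inside the MongoDB shell.
if (typeof db !== "undefined") {
  // Step a: inserting the records creates the collection if it is missing.
  db.mapReduceCollection.insertMany(records);

  // Step b: filter on status "A", then map and reduce into order_totals.
  db.mapReduceCollection.mapReduce(mapFn, reduceFn, {
    query: { status: "A" },
    out: "order_totals"
  });

  // Step c: one document per key, shaped { _id: <cust_id>, value: <total> }.
  printjson(db.order_totals.find().toArray());
}
```

With these assumed records, the record with status “D” is dropped by the query before the map function ever sees it. Note that MongoDB 5.0 and later mark mapReduce as deprecated in favor of the aggregation pipeline, so this sketch is best read as an illustration of the steps rather than a recommendation.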