What is MapReduce?

In simple terms, MapReduce is a programming model and framework for processing very large amounts of data, often unstructured, in parallel. This short definition should not hide how broadly MapReduce applies.

There are two steps to processing data with MapReduce:

a. The user specifies a map function that processes a key/value pair to generate a set of intermediate key/value pairs.

b. The user then specifies a reduce function that merges all intermediate values associated with the same intermediate key.
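The two steps can be sketched as a tiny single-process driver in Python. This is an illustrative assumption, not Google's implementation: the hypothetical `map_reduce` helper runs the map function over every input pair, groups the intermediate values by key (the shuffle the framework performs between the two steps), and calls the reduce function once per key. For simplicity it returns one merged value per key, whereas the paper's reduce may produce a list of values.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, Tuple, TypeVar

K1 = TypeVar("K1")
V1 = TypeVar("V1")
K2 = TypeVar("K2", bound=Hashable)
V2 = TypeVar("V2")


def map_reduce(
    records: Iterable[Tuple[K1, V1]],
    map_fn: Callable[[K1, V1], Iterable[Tuple[K2, V2]]],
    reduce_fn: Callable[[K2, List[V2]], V2],
) -> Dict[K2, V2]:
    """Hypothetical driver: map every pair, group by intermediate key, reduce each group."""
    groups: Dict[K2, List[V2]] = defaultdict(list)
    for key, value in records:
        # Map step: one input pair may yield zero or more intermediate pairs.
        for inter_key, inter_value in map_fn(key, value):
            # Grouping (shuffle): collect all values that share an intermediate key.
            groups[inter_key].append(inter_value)
    # Reduce step: merge the values collected for each intermediate key.
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}


def parity_map(key: str, value: int) -> Iterator[Tuple[str, int]]:
    # Hypothetical example mapper: key each number by whether it is even or odd.
    yield ("even" if value % 2 == 0 else "odd", value)


def sum_reduce(key: str, values: List[int]) -> int:
    # Hypothetical example reducer: add up all values for one key.
    return sum(values)


result = map_reduce([("a", 1), ("b", 2), ("c", 3), ("d", 4)], parity_map, sum_reduce)
print(result)  # {'odd': 4, 'even': 6}
```

The user only writes `map_fn` and `reduce_fn`; the grouping in between is the framework's job, which is what lets a real implementation distribute both steps across many machines.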

In the classic word-count example from the original paper, the map function emits each word together with an associated count of occurrences (just “1” in this simple example), and the reduce function sums all counts emitted for a particular word.
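The word-count example can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes words are separated by whitespace and counted case-sensitively; the paper's own pseudocode is written in C++-like syntax and runs distributed, not in one process.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def map_words(doc_name: str, contents: str) -> Iterator[Tuple[str, int]]:
    # Map: emit (word, 1) for every word occurrence in the document.
    for word in contents.split():
        yield (word, 1)


def reduce_counts(word: str, counts: List[int]) -> int:
    # Reduce: sum all counts emitted for this word.
    return sum(counts)


def word_count(documents: Dict[str, str]) -> Dict[str, int]:
    # Group the intermediate (word, 1) pairs by word, then reduce each group.
    intermediate: Dict[str, List[int]] = defaultdict(list)
    for name, text in documents.items():
        for word, one in map_words(name, text):
            intermediate[word].append(one)
    return {word: reduce_counts(word, counts) for word, counts in intermediate.items()}


docs = {"doc1": "the quick brown fox", "doc2": "the lazy dog and the fox"}
print(word_count(docs))
```

Here “the” is emitted three times with a count of 1 each, and the reduce step turns those three 1s into a total of 3.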

Details of the above process are in Section 3.1 of the original MapReduce paper (OSDI 2004), available as a PDF at:

http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/mapreduce-osdi04.pdf

MapReduce implementation in MongoDB:

a. Create a collection in MongoDB and insert some records.

  • Created a collection named “mapReduceCollection” and inserted four records (documents), each with three fields: “cust_id”, “amount” and “status”.
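Since the original screenshots of the inserted records are not available, here is a minimal in-memory stand-in for the collection in Python, so the later steps can be followed without a MongoDB server. The field names come from the post; the values, the `insert_many` helper, and the sequential integer `_id` values are hypothetical (a real MongoDB server assigns an ObjectId to each document's `_id`).

```python
from typing import Any, Dict, List

Document = Dict[str, Any]

# In-memory stand-in for the "mapReduceCollection" collection.
map_reduce_collection: List[Document] = []


def insert_many(collection: List[Document], docs: List[Document]) -> List[int]:
    """Hypothetical helper: append copies of docs, giving each a sequential _id."""
    ids = []
    for doc in docs:
        new_id = len(collection) + 1
        collection.append({"_id": new_id, **doc})
        ids.append(new_id)
    return ids


# Hypothetical sample values; only the field names are taken from the post.
ids = insert_many(map_reduce_collection, [
    {"cust_id": "A123", "amount": 500, "status": "A"},
    {"cust_id": "A123", "amount": 250, "status": "A"},
    {"cust_id": "B212", "amount": 200, "status": "A"},
    {"cust_id": "A123", "amount": 300, "status": "D"},
])
print(ids)  # [1, 2, 3, 4]
```

Against a real server, the same four documents could be inserted from the mongo shell with `db.mapReduceCollection.insertMany([...])`.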

b. Apply the mapReduce function to the collection.

  • Applied the mapReduce function to “mapReduceCollection”. It first runs the query {status: “A”}, so only documents with that status are processed. The map function is then applied to each matching document to emit key/value pairs, the values that share a key are merged by the reduce function, and the results are written to the “order_totals” collection.
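In the mongo shell this step takes the form `db.mapReduceCollection.mapReduce(mapFunction, reduceFunction, { query: { status: "A" }, out: "order_totals" })`. The post does not show its map and reduce functions, so the Python simulation below assumes the common pattern of emitting `cust_id` as the key and `amount` as the value, then summing the amounts. It mirrors the three stages described above (query, map, reduce) and produces output documents shaped like MongoDB's `{_id, value}`; the data and function names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Document = Dict[str, Any]


def map_reduce(
    collection: List[Document],
    map_fn: Callable[[Document], Iterable[Tuple[Any, Any]]],
    reduce_fn: Callable[[Any, List[Any]], Any],
    query: Document,
) -> List[Document]:
    """Simulate query -> map -> reduce and return {_id, value} output documents."""
    # 1. Query: keep documents whose fields equal every value in `query`.
    matching = [d for d in collection if all(d.get(f) == v for f, v in query.items())]

    # 2. Map: gather every emitted value under its key.
    grouped: Dict[Any, List[Any]] = defaultdict(list)
    for doc in matching:
        for key, value in map_fn(doc):
            grouped[key].append(value)

    # 3. Reduce: one output document per key. Like MongoDB, reduce is
    #    skipped when a key has only a single emitted value.
    output = []
    for key, values in grouped.items():
        result = values[0] if len(values) == 1 else reduce_fn(key, values)
        output.append({"_id": key, "value": result})
    return output


# Hypothetical sample data; field names follow the post.
map_reduce_collection = [
    {"cust_id": "A123", "amount": 500, "status": "A"},
    {"cust_id": "A123", "amount": 250, "status": "A"},
    {"cust_id": "B212", "amount": 200, "status": "A"},
    {"cust_id": "A123", "amount": 300, "status": "D"},
]


def map_order(doc: Document) -> Iterable[Tuple[Any, Any]]:
    # Assumed map function, equivalent to emit(this.cust_id, this.amount).
    yield (doc["cust_id"], doc["amount"])


def reduce_order(key: Any, amounts: List[Any]) -> Any:
    # Assumed reduce function: total the amounts for one customer.
    return sum(amounts)


order_totals = map_reduce(map_reduce_collection, map_order, reduce_order, {"status": "A"})
print(order_totals)  # [{'_id': 'A123', 'value': 750}, {'_id': 'B212', 'value': 200}]
```

The document with status “D” never reaches the map step. Because MongoDB may also call reduce again on partially reduced results for the same key, a real reduce function must return a value of the same type as the emitted values; summing amounts satisfies that.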

c. View the aggregated result in the “order_totals” collection.
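In the mongo shell the output collection can be listed with `db.order_totals.find()`. One way to sanity-check what it contains is to recompute the totals directly and compare. The Python sketch below does that for hypothetical data: `order_totals` here is an assumed copy of the output in mapReduce's `{_id, value}` shape, not a real result, and `direct_totals` and `matches` are illustrative helpers.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]

# Hypothetical source documents (field names follow the post).
docs: List[Document] = [
    {"cust_id": "A123", "amount": 500, "status": "A"},
    {"cust_id": "A123", "amount": 250, "status": "A"},
    {"cust_id": "B212", "amount": 200, "status": "A"},
    {"cust_id": "A123", "amount": 300, "status": "D"},
]

# Hypothetical contents of "order_totals" in {_id, value} form.
order_totals: List[Document] = [
    {"_id": "A123", "value": 750},
    {"_id": "B212", "value": 200},
]


def direct_totals(documents: List[Document], status: str) -> Dict[str, int]:
    # Filter by status and sum amounts per customer without map/reduce.
    totals: Dict[str, int] = {}
    for d in documents:
        if d["status"] == status:
            totals[d["cust_id"]] = totals.get(d["cust_id"], 0) + d["amount"]
    return totals


def matches(output: List[Document], expected: Dict[str, int]) -> bool:
    # Compare the {_id, value} documents against a plain key -> total mapping.
    return {d["_id"]: d["value"] for d in output} == expected


for doc in order_totals:
    print(f'{doc["_id"]}: {doc["value"]}')
print(matches(order_totals, direct_totals(docs, "A")))  # True
```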

References: