What is MapReduce?
Put simply, MapReduce is a programming model, with an associated framework, for processing vast amounts of data, including unstructured data, in parallel. The definition is short, but it covers a wide range of large-scale data processing tasks.
There are two steps to process data using MapReduce:
a. The user specifies a map function that processes a key/value pair to generate a set of intermediate key/value pairs.
b. The user then specifies a reduce function that merges all intermediate values associated with the same intermediate key.
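As an example, consider counting the number of occurrences of each word in a large collection of documents. The map function emits each word plus an associated count of occurrences (just “1” in this simple example). The reduce function sums together all counts emitted for a particular word.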
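The word-count idea above can be sketched in a few lines of plain JavaScript. This is a minimal single-process illustration, not a distributed implementation: the names `mapWords`, `reduceCounts` and `wordCount`, as well as the sample documents, are assumptions made up for this sketch.

```javascript
// Map: emit a [word, 1] pair for every word found in one document.
function mapWords(docName, contents) {
  return contents
    .toLowerCase()
    .split(/\W+/)
    .filter((word) => word.length > 0)
    .map((word) => [word, 1]);
}

// Reduce: sum all counts that were emitted for one word.
function reduceCounts(word, counts) {
  return counts.reduce((sum, count) => sum + count, 0);
}

// Tiny driver standing in for the framework: it runs the map step,
// groups the intermediate pairs by key, then runs the reduce step.
function wordCount(docs) {
  const groups = new Map();
  for (const [name, contents] of Object.entries(docs)) {
    for (const [word, count] of mapWords(name, contents)) {
      if (!groups.has(word)) groups.set(word, []);
      groups.get(word).push(count);
    }
  }
  const totals = {};
  for (const [word, counts] of groups) {
    totals[word] = reduceCounts(word, counts);
  }
  return totals;
}

// Hypothetical input documents.
const wordTotals = wordCount({
  "a.txt": "the quick fox",
  "b.txt": "the lazy dog the end",
});
console.log(wordTotals); // "the" is counted 3 times, every other word once
```

In a real MapReduce system the grouping step in `wordCount` is done by the framework, and the map and reduce calls run in parallel on many machines.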
Details of the above process are in Section 3.1 in the pdf at:
MapReduce implementation in MongoDB:
a. Create a collection in MongoDB and insert some records.
- Created a collection named “mapReduceCollection” and inserted four records with three fields, “cust_id”, “amount” and “status”.
b. Apply the mapReduce function to the collection.
- Applied the “mapReduce” function to “mapReduceCollection”. It first executes the query (status: “A”) to filter the data. The filtered documents are then passed through the map function, which emits key/value pairs. Finally, the values emitted for each key are reduced to a single value, and the results are written to the “order_totals” collection.
c. View the aggregated result.
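The three steps above can be sketched as follows. The original records are not shown here, so the four sample documents are hypothetical, and the choice of “cust_id” as the key with “amount” summed per key is an assumption based on the field names. `mapFn` and `reduceFn` are written the way MongoDB expects them (`this` is the current document, and `emit` produces a key/value pair); in the mongo shell they would presumably be passed as `db.mapReduceCollection.mapReduce(mapFn, reduceFn, { query: { status: "A" }, out: "order_totals" })`. The `runMapReduce` helper is only an in-memory stand-in so the sketch runs without a database.

```javascript
// Hypothetical sample data: four records with cust_id, amount and status.
const mapReduceCollection = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

// Map: emit (cust_id, amount) for the current document (`this`).
function mapFn() {
  emit(this.cust_id, this.amount);
}

// Reduce: sum all amounts emitted for one cust_id.
function reduceFn(key, values) {
  return values.reduce((total, amount) => total + amount, 0);
}

// --- In-memory stand-in for MongoDB (assumption, not the real server) ---
let emitted; // Map of key -> array of emitted values

function emit(key, value) {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
}

function runMapReduce(docs, map, reduce, query) {
  emitted = new Map();
  docs
    // Step 1: keep only documents matching every field in the query.
    .filter((doc) => Object.entries(query).every(([k, v]) => doc[k] === v))
    // Step 2: run the map function with `this` bound to the document.
    .forEach((doc) => map.call(doc));
  // Step 3: reduce the values collected for each key.
  return [...emitted].map(([key, values]) => ({
    _id: key,
    value: reduce(key, values),
  }));
}

const orderTotals = runMapReduce(mapReduceCollection, mapFn, reduceFn, {
  status: "A",
});
console.log(orderTotals);
// [ { _id: 'A123', value: 750 }, { _id: 'B212', value: 200 } ]
```

The record with status “D” is dropped by the query before the map step, so it does not contribute to the total for “A123”. With a real database, the result could be inspected with `db.order_totals.find()`.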