Introducing Hadoop in 20 pages.

We have been working with Hadoop for the last couple of years, but we still find it tough to get other people in our company started on it. I wrote this document as a starting point and it turned out to be quite popular internally, so I am moving it here now.

Introducing Hadoop in 20 pages is a concise document that gives just the right information, in the right amount, before you start exploring this field in depth. It is intended as a first and shortest guide to both understanding and using Map-Reduce to build distributed data processing applications.

Topics covered:

  1. Introduction to Hadoop.
  2. What is Map-Reduce and how does it work? (With an example of how to write an algorithm.)
  3. What is Hadoop streaming? (A great tool for a newbie.)
  4. What is HDFS and where is it most suitable?
  5. Serialization in Hadoop: how to go about it, and why not use Java serialization?
  6. Distributed cache.
  7. Job scheduling in Hadoop.

Appendix 1A: Avro serialization and its benefits over standard techniques.

Appendix 1B: Documented examples from the Hadoop repository.

Download or view the PDF here: Introduction to Hadoop in 20 pages.
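For a quick taste of the Map-Reduce idea from topic 2, here is a minimal word-count sketch in plain Python. It is not taken from the PDF and does not use Hadoop at all: the names `mapper`, `reducer` and `run_job` are made up for illustration, and the in-memory sort only stands in for the sort-and-group step that Hadoop performs between the map and reduce phases.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce step: sum all the counts that were emitted for one word."""
    return word, sum(counts)


def run_job(lines):
    """Local stand-in for a job: map, sort by key, then reduce each group."""
    pairs = [pair for line in lines for pair in mapper(line)]
    # Sorting by key mimics the shuffle: all pairs for a word become adjacent.
    pairs.sort(key=itemgetter(0))
    return dict(
        reducer(word, (count for _, count in group))
        for word, group in groupby(pairs, key=itemgetter(0))
    )


if __name__ == "__main__":
    print(run_job(["hadoop is big", "hadoop is distributed"]))
```

The point of the split is that `mapper` only ever sees one line and `reducer` only ever sees one word, so both steps can in principle run on many machines at once.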

Patches

Accepted
1 MAPREDUCE-3360
2 MAPREDUCE-3532
3 MAPREDUCE-3316
4 MAPREDUCE-3708
5 MAPREDUCE-3723
6 MAPREDUCE-3212
7 HADOOP-7971

Available
8 MAPREDUCE-2493
9 MAPREDUCE-3504
10 MAPREDUCE-3115
11 MAPREDUCE-3131
12 MAPREDUCE-3686
13 HDFS-2725

Involved
14 MAPREDUCE-3140
15 MAPREDUCE-3494
16 MAPREDUCE-3070
17 MAPREDUCE-3354
18 MAPREDUCE-3193
19 MAPREDUCE-3204
20 HADOOP-7726

If you enjoyed this post, please consider leaving a comment or subscribing to the feed to get future articles delivered to your feed reader.

Comments

[...] Introducing Hadoop in 20 pages by Prashant Sharma. Getting started with Hadoop is a non-trivial task for a newbie: with so much material available, a significant amount of effort goes into figuring out where and how to start exploring this field. Introducing Hadoop in 20 pages is a concise document that gives just the right information, in the right amount, before starting out in depth in this field. This document is intended to be used as a first and shortest guide to both understand and use Map-Reduce for building distributed data processing applications. [...]

[...] Prashant Sharma, December 15, 2011 [...]

I will go through all the Hadoop pages. Thanks for the post. Hadoop can also be used for OLTP as well as OLAP transactions.
