Introducing Hadoop in 20 pages.

We have been working with Hadoop for the last couple of years (see the Patches list below), but we still find it tough to get other people in our company started on it. I came up with this document as a starting point, and it was fairly popular internally, so I am moving it here now.

Introducing Hadoop in 20 pages is a concise document that introduces just the right information, in the right amount, before you start exploring this field in depth. It is intended to be used as a first and shortest guide to both understanding and using Map-Reduce for building distributed data processing applications.

Topics covered:

  1. Introduction to Hadoop.
  2. What is Map-Reduce and how does it work? (With an example of how to write an algorithm.)
  3. What is Hadoop streaming? (A great tool for a newbie.)
  4. What is HDFS and where is it most suitable?
  5. Serialization in Hadoop: how to go about it, and why not use Java serialization?
  6. Distributed cache.
  7. Job scheduling in Hadoop.

Appendix 1A: Avro serialization and its benefits over standard techniques.

Appendix 1B: Documented examples from the Hadoop repository.
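
To give a flavour of the Map-Reduce model that topic 2 covers, here is a minimal word-count sketch in plain Python. It is not taken from the PDF and it does not run on Hadoop: it only imitates the map, shuffle and reduce phases inside a single process. All function names (map_words, shuffle, reduce_counts, word_count) are made up for this illustration.

```python
from collections import defaultdict


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group all values by key.

    On a real cluster the framework does this between map and reduce;
    here it is imitated with an in-memory dictionary.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce phase: collapse all the counts for one word into a total."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(pairs)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    sample = ["hadoop stores data", "hadoop processes data"]
    for word, total in sorted(word_count(sample).items()):
        print(word, total)
```

The point of the split is that the map and reduce functions only ever see one line or one key at a time, which is what lets a framework spread them across many machines.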

Download or view the PDF here: Introduction to hadoop in 20 pages.

Patches


Accepted
1 MAPREDUCE-3360
2 MAPREDUCE-3532
3 MAPREDUCE-3316
4 MAPREDUCE-3708
5 MAPREDUCE-3723
6 MAPREDUCE-3212
7 HADOOP-7971
8 MAPREDUCE-3686
9 HDFS-2725
10 MAPREDUCE-3952

Available
11 MAPREDUCE-3504
12 MAPREDUCE-3115
13 MAPREDUCE-3131
14 MAPREDUCE-3870
15 MAPREDUCE-2493

Involved
16 MAPREDUCE-3070
17 MAPREDUCE-3354
18 MAPREDUCE-3193
19 MAPREDUCE-3204
20 HADOOP-7726
21 MAPREDUCE-3140
22 MAPREDUCE-3494


Comments

[...] Introducing Hadoop in 20 pages by Prashant Sharma. Getting started in hadoop for a newbie is a non trivial task, with amount of knowledge base available a significant amount of effort is gone in figuring out, where and how should one start exploring this field. Introducing hadoop in 20 pages is a concise document to briefly introduce just the right information in right amount, before starting out in-depth in this field. This document is intended to be used as a first and shortest guide to both understand and use Map-Reduce for building distributed data processing applications. [...]

[...] Prashant Sharma, December 15, 2011 [...]

it is a very good document. Can you pl create one with Hadoop latest developments. thanks

Excellent work!!

Very informative with right amount of information
