Introducing Hadoop in 20 pages.


We have been working with Hadoop for the last couple of years, but we still find it tough to get other people in our company started on it. I wrote this document as a starting point, and it turned out to be fairly popular internally, so I am moving it here now.

"Introducing Hadoop in 20 pages" is a concise document that presents just the right information, in the right amount, before you dive deeper into the field. It is intended as a first and shortest guide to both understanding and using Map-Reduce for building distributed data processing applications.

Topics covered:

  1. Introduction to Hadoop.
  2. What is Map-Reduce and how does it work? (With an example of how to write an algorithm.)
  3. What is Hadoop streaming? (A great tool for a newbie.)
  4. What is HDFS and where is it most suitable?
  5. Serialization in Hadoop: how to go about it, and why not use Java serialization?
  6. Distributed cache.
  7. Job scheduling in Hadoop.

Appendix 1A: Avro serialization and its benefits over standard techniques.

Appendix 1B: Documented examples from the Hadoop repository.
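To give a flavour of the Map-Reduce topic before you open the PDF, here is a minimal sketch of the map, shuffle and reduce flow as a word count. It is plain Python running in a single process, not Hadoop's API and not an excerpt from the document; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework in a real cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values seen for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["hadoop is big", "big data is big"]))
```

In a real job the map and reduce functions run on many machines and the grouping step happens across the network, but the shape of the program stays roughly the same.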

Download or view the PDF here: Introduction to hadoop in 20 pages.

Patches


 Accepted
1 MAPREDUCE-3360
2 MAPREDUCE-3532
3 MAPREDUCE-3316
4 MAPREDUCE-3708
5 MAPREDUCE-3723
6 MAPREDUCE-3212
7 HADOOP-7971
8 MAPREDUCE-3686
9 HDFS-2725
10 MAPREDUCE-3952

Available    
11 MAPREDUCE-3504
12 MAPREDUCE-3115
13 MAPREDUCE-3131
14 MAPREDUCE-3870
15 MAPREDUCE-2493

Involved
16 MAPREDUCE-3070
17 MAPREDUCE-3354
18 MAPREDUCE-3193
19 MAPREDUCE-3204
20 HADOOP-7726
21 MAPREDUCE-3140
22 MAPREDUCE-3494
