Introducing Hadoop in 20 pages.
We have been working with Hadoop for the last couple of years, but we still find it tough to get other people in our company started on it. I came up with this blog as a starting point; it was kinda popular internally, so I am moving it here now.
Introducing Hadoop in 20 pages is a concise document that introduces just the right information in the right amount before you dive deeper into this field. It is intended as a first and shortest guide to both understanding and using Map-Reduce to build distributed data processing applications.
Topics covered:
- Introduction to Hadoop.
- What is Map-Reduce and how does it work? (With an example of how to write an algorithm.)
- What is Hadoop streaming? (A great tool for a newbie.)
- What is HDFS and where is it most suitable?
- Serialization in Hadoop – “how to go about it” and why not use Java serialization?
- Distributed cache.
- Job scheduling in Hadoop.
Appendix 1A: Avro serialization and its benefits over standard techniques.
Appendix 1B: Documented examples from the Hadoop repository.
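To give a flavour of the Map-Reduce topic listed above, here is a minimal word-count sketch in plain Python. It is not taken from the document and does not use Hadoop at all: the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are made up for illustration, and the whole thing runs in a single process, whereas a real job would have the framework distribute the map, shuffle and reduce phases across machines.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group emitted values by key (done by the framework in a real job)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(word, counts):
    """Reduce step: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(pairs)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    sample = ["hadoop map reduce", "map reduce"]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

The point of the split is that the map and reduce functions only ever see one line or one key at a time, which is what lets a framework run many copies of them in parallel.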
Patches
Accepted: MAPREDUCE-3360, MAPREDUCE-3532, MAPREDUCE-3316, MAPREDUCE-3708, MAPREDUCE-3723, MAPREDUCE-3212, HADOOP-7971
Available: MAPREDUCE-2493, MAPREDUCE-3504, MAPREDUCE-3115, MAPREDUCE-3131, MAPREDUCE-3686, HDFS-2725
Involved: MAPREDUCE-3140, MAPREDUCE-3494, MAPREDUCE-3070, MAPREDUCE-3354, MAPREDUCE-3193, MAPREDUCE-3204, HADOOP-7726
Comments
I will go through all the Hadoop pages. Thanks for the post. Hadoop can also be used for OLTP as well as OLAP transactions.

[...] Introducing Hadoop in 20 pages by Prashant Sharma. Getting started in Hadoop is a non-trivial task for a newbie; with the amount of material available, a significant amount of effort goes into figuring out where and how to start exploring this field. Introducing Hadoop in 20 pages is a concise document that introduces just the right information in the right amount before you dive deeper into this field. It is intended as a first and shortest guide to both understanding and using Map-Reduce to build distributed data processing applications. [...]