Hadoop Development
Introduction to Big Data and Hadoop
- What is Big Data?
- What are the challenges for processing big data?
- What technologies support big data?
- Distributed systems
- What is Hadoop?
- Why Hadoop?
- History of Hadoop
- Use Cases of Hadoop
- Hadoop ecosystem
- HDFS
- MapReduce
- Statistics
Understanding the Cluster
- Typical workflow
- Writing files to HDFS
- Reading files from HDFS
- Rack Awareness
- The five daemons: NameNode, Secondary NameNode, DataNode, JobTracker, TaskTracker
Developing a MapReduce Application
- Configuring the development environment – Eclipse
- Writing unit tests
- Running locally
- Running on a cluster
- MapReduce workflows
How MapReduce Works
- Anatomy of a MapReduce job run
- Failures
- Job Scheduling
- Shuffle and Sort
- Task Execution
MapReduce Types and Formats
- MapReduce Types
- Input Formats – input splits & records, text input, binary input, multiple inputs & database input
- Output Formats – text output, binary output, multiple outputs, lazy output & database output
MapReduce Features
- Counters
- Sorting
- Joins – Map Side and Reduce Side
- Side Data Distribution
- MapReduce Combiner
- MapReduce Partitioner
- MapReduce Distributed Cache
Hive and Pig
- Fundamentals
- When to use Pig and Hive
- Concepts
HBase
- CAP Theorem
- HBase architecture and concepts
- Programming