Hadoop Course Content
Introduction
The Motivation for Hadoop
· Problems with Traditional Large-Scale Systems
· Requirements for a New Approach
· Introducing Hadoop
Hadoop: Basic Concepts
· The Hadoop Project and Hadoop Components
· The Hadoop Distributed File System
· Hands-On Exercise: Using HDFS
· How MapReduce Works
· Hands-On Exercise: Running a MapReduce Job
· How a Hadoop Cluster Operates
· Other Hadoop Ecosystem Projects
Writing a MapReduce Program
· The MapReduce Flow
· Basic MapReduce API Concepts
· Writing MapReduce Drivers, Mappers and Reducers in Java
· Writing Mappers and Reducers in Other Languages Using the Streaming API
· Speeding Up Hadoop Development by Using Eclipse
· Hands-On Exercise: Writing a MapReduce Program
· Differences Between the Old and New MapReduce APIs
Unit Testing MapReduce Programs
· Unit Testing
· The JUnit and MRUnit Testing Frameworks
· Writing Unit Tests with MRUnit
· Hands-On Exercise: Writing Unit Tests with the MRUnit Framework
Delving Deeper into the Hadoop API
· Using the ToolRunner Class
· Decreasing the Amount of Intermediate Data with Combiners
· Hands-On Exercise: Writing and Implementing a Combiner
· Setting Up and Tearing Down Mappers and Reducers by Using the Configure and Close Methods
· Writing Custom Partitioners for Better Load Balancing
· Hands-On Exercise: Writing a Partitioner
· Accessing HDFS Programmatically
· Using the Distributed Cache
· Using the Hadoop API’s Library of Mappers, Reducers and Partitioners
Practical Development Tips and Techniques
· Strategies for Debugging MapReduce Code
· Testing MapReduce Code Locally by Using LocalJobRunner
· Writing and Viewing Log Files
· Retrieving Job Information with Counters
· Determining the Optimal Number of Reducers for a Job
· Creating Map-Only MapReduce Jobs
· Hands-On Exercise: Using Counters and a Map-Only Job
Data Input and Output
· Creating Custom Writable and WritableComparable Implementations
· Saving Binary Data Using SequenceFile and Avro Data Files
· Implementing Custom Input Formats and Output Formats
· Issues to Consider When Using File Compression
· Hands-On Exercise: Using SequenceFiles and File Compression
Common MapReduce Algorithms
· Sorting and Searching Large Data Sets
· Performing a Secondary Sort
· Indexing Data
· Hands-On Exercise: Creating an Inverted Index
· Computing Term Frequency-Inverse Document Frequency
· Calculating Word Co-Occurrence
· Hands-On Exercise: Calculating Word Co-Occurrence
· Hands-On Exercise: Implementing Word Co-Occurrence with a Custom WritableComparable
Joining Data Sets in MapReduce Jobs
· Writing a Map-Side Join
· Writing a Reduce-Side Join
Integrating Hadoop into the Enterprise Workflow
· Integrating Hadoop into an Existing Enterprise
· Loading Data from an RDBMS into HDFS by Using Sqoop
· Hands-On Exercise: Importing Data with Sqoop
· Managing Real-Time Data Using Flume
· Accessing HDFS from Legacy Systems with FuseDFS and HttpFS
Machine Learning and Mahout
· Introduction to Machine Learning
· Using Mahout
· Hands-On Exercise: Using a Mahout Recommender
An Introduction to Hive and Pig
· The Motivation for Hive and Pig
· Hive Basics
· Hands-On Exercise: Manipulating Data with Hive
· Pig Basics
· Hands-On Exercise: Using Pig to Retrieve Movie Names from Our Recommender
· Choosing Between Hive and Pig
An Introduction to Oozie
· Introduction to Oozie
· Creating Oozie Workflows
· Hands-On Exercise: Running an Oozie Workflow
Thanks
Naresh
Learning Hub
2nd Floor, Above HDFC Bank
Next to Noble Polyclinic
MAGARPATTA CITY
PUNE - 411013
PH: 9325793756
Skype ID: learning.hub01
Email: learninghub01@gmail.com