Big Data Hadoop Developer

Course Code: 0112



Cognixia’s Big Data Hadoop Developer training program highlights the key ideas and aims to build proficiency in managing Big Data with Apache’s open-source platform, Hadoop. The course is designed to impart hands-on knowledge of MapReduce, Hadoop architecture, Pig and Hive, the Oozie workflow scheduler, and Flume. Participants will also be familiarized with HBase, ZooKeeper, and Sqoop while working on industry-based use cases and projects, and will learn best practices for designing Big Data environments that are secure and cost-effective.


Course Delivery

This course is available in the following formats:

Live Classroom
Duration: 14 days

Live Virtual Classroom
Duration: 14 days

What You'll Learn

  • Learn to write complex MapReduce code on both MRv1 & MRv2 (YARN)
  • Understand Hadoop architecture
  • Perform analytics and learn high-level scripting frameworks Pig & Hive
  • Build an advanced understanding of the Hadoop ecosystem, including Flume and the Oozie workflow scheduler
  • Gain familiarity with other concepts, such as HBase, ZooKeeper, and Sqoop
  • Get hands-on expertise with the various configuration settings of a Hadoop cluster
  • Learn about optimization & troubleshooting
  • Acquire in-depth knowledge of Hadoop architecture and the Hadoop Distributed File System (HDFS 1.0 & HDFS 2.0)
  • Work on real-life projects


  • Introduction/ installation of Virtual Box and the Big Data VM
  • Introduction to Linux
  • Why Linux?
  • Windows and the Linux equivalents
  • Different flavors of Linux
  • Unity Shell (Ubuntu UI)
  • Basic Linux commands
  • 3V (Volume, Variety, Velocity) characteristics
  • Structured and unstructured data
  • Application and use cases of Big Data
  • Limitations of traditional large scale systems
  • How a distributed way of computing is superior (cost and scale)
  • Opportunities and challenges with Big Data
  • HDFS overview and architecture
  • Deployment architecture
  • Name Node
  • Data Node and Checkpoint Node (aka Secondary Name Node)
  • Safe mode
  • Configuration files
  • HDFS Data Flows (Read/Write)
  • CRC Checksum
  • Data Replication
  • Rack awareness and block placement policy
  • Small file problems
  • Command-Line Interface
  • File Systems
  • Administrative
  • Web Interfaces
  • Load Balancer
  • DistCp (Distributed Copy)
  • HDFS Federation
  • HDFS High Availability
  • Hadoop Archives
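
The replication and block-placement topics above lend themselves to quick back-of-the-envelope math. Here is a minimal sketch, assuming the common defaults of 128 MB blocks and a replication factor of 3 (both are configurable per cluster, and the function name is illustrative):

```python
import math

def hdfs_storage(file_size_mb, block_size_mb=128, replication=3):
    """Estimate how many HDFS blocks a file splits into, and the raw
    storage consumed once every block is replicated."""
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    raw_storage_mb = file_size_mb * replication
    return num_blocks, raw_storage_mb

# A 1 GB file with default settings:
blocks, raw = hdfs_storage(1024)
print(blocks, raw)  # 8 blocks, 3072 MB of raw storage
```

Note that even a tiny file still occupies a full block entry in the NameNode's metadata, which is the root of the small-file problem listed above.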
  • MapReduce overview
  • Functional Programming paradigms
  • How to think in a MapReduce way
  • Legacy MR v/s Next Generation MapReduce (YARN/ MRv2)
  • Slots v/s Containers
  • Schedulers
  • Shuffling, Sorting
  • Hadoop Data Types
  • Input and Output Formats
  • Input Splits – Partitioning (Hash Partitioner v/s Custom Partitioner)
  • Configuration files
  • Distributed Cache
  • Standalone mode (in Eclipse)
  • Pseudo Distributed mode (as in the Big Data VM)
  • Fully Distributed mode (as in Production)
  • MR API
  • Old and the New MR API
  • Java Client API
  • Hadoop data types
  • Custom Writable
  • Different input and output formats
  • Saving Binary Data using Sequence Files and Avro Files
  • Hadoop Streaming (developing and debugging non-Java MR programs in Ruby and Python)
  • Speculative execution
  • Combiners
  • JVM Reuse
  • Compression
  • Sorting
  • Term Frequency
  • Inverse Document Frequency
  • Student Database
  • Max Temperature
  • Different ways of joining data
  • Word Co-occurrence
  • Click Stream Analysis using Pig and Hive
  • Analyzing the Twitter data with Hive
  • Further ideas for data analysis
  • HBase Data Modeling
  • Bulk loading data in HBase
  • HBase Co-processors – Endpoints (similar to Stored Procedures in RDBMS)
  • HBase Co-processors – Observers (similar to Triggers in RDBMS)
  • PageRank
  • Inverted Index
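
The use cases above (term frequency, word count, co-occurrence) all follow the same map → shuffle/sort → reduce flow. A pure-Python sketch of that flow, with no Hadoop required and purely illustrative function names:

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Emit a (word, 1) pair for every word, as a streaming mapper would.
    for word in line.lower().split():
        yield (word, 1)

def shuffle_sort(pairs):
    # The framework sorts by key and groups all values for the same key.
    return groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))

def reducer(word, group):
    # Sum the 1s emitted for each word.
    return (word, sum(count for _, count in group))

lines = ["the quick brown fox", "the lazy dog"]
pairs = [p for line in lines for p in mapper(line)]
result = dict(reducer(word, group) for word, group in shuffle_sort(pairs))
print(result)  # {'brown': 1, 'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```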
  • Introduction and Architecture
  • Different modes of executing Pig constructs
  • Data Types
  • Dynamic Invokers
  • Pig streaming and macros
  • Pig Latin language Constructs (LOAD, STORE, DUMP, SPLIT, etc.)
  • User-Defined Functions
  • Use Cases
  • NoSQL Databases – 1 (Theoretical Concepts)
  • NoSQL Concepts
  • Review of RDBMS
  • Need for NoSQL
  • Brewer's CAP Theorem
  • ACID v/s BASE
  • Schema on Read vs. Schema on Write
  • Different levels of consistency
  • Bloom filters
  • Key-Value
  • Columnar
  • Document
  • Graph
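
Of the concepts above, Bloom filters are the easiest to sketch: a bit array plus k hash functions gives fast membership tests with no false negatives but occasional false positives. The sizes and salted-hash scheme below are illustrative choices, not how any particular store implements it:

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k bit positions from salted MD5 digests of the item.
        for i in range(self.num_hashes):
            digest = hashlib.md5(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter()
bf.add("row-key-42")
print(bf.might_contain("row-key-42"))   # True
print(bf.might_contain("row-key-999"))  # almost certainly False
```

HBase uses Bloom filters in exactly this spirit: to skip reading store files that cannot contain a requested row key.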
  • HBase Architecture
  • Master and the Region Server
  • Catalog tables (ROOT and META)
  • Major and Minor Compaction
  • Configuration Files
  • HBase v/s Cassandra
  • Java API
  • Client API
  • Filters
  • Scan Caching and Batching
  • Command Line Interface
  • Introduction to RDD
  • Installation and Configuration of Spark
  • Spark Architecture
  • Different interfaces to Spark
  • Sample Python programs in Spark
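
A key RDD idea is laziness: transformations only describe a pipeline, and nothing runs until an action forces it. A loose pure-Python analogy using generators (PySpark is not required here; the comments mirror the RDD API only informally):

```python
data = range(1, 6)

# "Transformations": build a lazy pipeline; nothing is computed yet.
mapped = (x * x for x in data)            # like rdd.map(lambda x: x * x)
filtered = (x for x in mapped if x > 5)   # like .filter(lambda x: x > 5)

# "Action": forcing the pipeline triggers the actual computation.
result = sum(filtered)                    # like .reduce(lambda a, b: a + b)
print(result)  # 9 + 16 + 25 = 50
```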
  • Use-case of YARN
  • YARN Architecture
  • YARN Demo
  • Use-case of Oozie
  • Oozie Architecture
  • Oozie Demo
  • Use-case of Flume
  • Flume Architecture
  • Flume Demo
  • Use-case of Sqoop
  • Sqoop Architecture
  • Sqoop Demo
  • Cloudera Hadoop cluster on the Amazon Cloud (Practice)
  • Using EMR (Elastic MapReduce)
  • Using EC2 (Elastic Compute Cloud)
  • Stand-alone mode (Theory)
  • Distributed mode (Theory)
  • Pseudo-distributed
  • Fully-distributed
  • Hadoop industry solutions
  • Importing/exporting data across RDBMS and HDFS using Sqoop
  • Getting real-time events into HDFS using Flume
  • Creating workflows in Oozie
  • Introduction to Graph processing
  • Graph processing with Neo4J
  • Using the Mongo Document Database
  • Using the Cassandra Columnar Database
  • Distributed Coordination with Zookeeper

Prerequisites


To pursue Cognixia’s Big Data Hadoop Developer course, it would be beneficial for participants to have a basic understanding of core Java. However, it is not mandatory.

Who Should Attend

The Big Data Hadoop Developer training program is best suited for professionals in IT, business analytics, and data management, as well as anyone looking to build a career in Big Data. It is highly recommended for current and aspiring:

  • Software developers and architects
  • Analytics professionals
  • Senior IT professionals
  • Testing and Mainframe professionals
  • Data management professionals
  • Business intelligence professionals
  • Project managers
  • Aspiring data scientists
  • Graduates looking to build a career in Big Data Analytics

Interested in this course? Let’s connect!

Certification


Participants will be awarded an exclusive certificate upon successful completion of the program. Every learner is evaluated on their attendance in the sessions, their scores in the course assessments, their projects, and more. The certificate is recognized by organizations worldwide and lends strong credibility to your resume.