Computer and Information Technology

Organise our Workshop / Training?

Big Data and Hadoop

Hadoop is a 100% open-source framework that pioneered a fundamentally new way of storing and processing data. Instead of relying on expensive, proprietary hardware and separate systems to store and process data, Hadoop enables distributed parallel processing of huge amounts of data across inexpensive, industry-standard servers that both store and process the data, and it scales out by adding more servers to the cluster. In today’s hyper-connected world, where more and more data is created every day, these advantages mean that businesses and organizations can now find value in data that was until recently considered useless. Students will work on a real-life project in Big Data analytics and gain hands-on experience.

Day 1: Session 1

Fundamentals of Big data

Big Data Management
Distributed file systems for big data storage, access, and analytics
Frameworks and tools for big data cyber security analytics
Performance modeling, simulation and analysis
Big data applications in cyber security
Parallel and distributed algorithms for big data analytics
Big data case studies and applications

Day 1: Session 2

Different Components of Hadoop
Introduction to Apache Pig
MapReduce vs. Apache Pig
SQL vs. Apache Pig
Different Data Types in Pig
Modes of Execution in Pig
Execution Mechanism
Transformations in Pig
How to Write a Simple Pig script
UDFs in Pig

HDFS (Hadoop Distributed File System)
Significance of HDFS in Hadoop
Features of HDFS
5 daemons of Hadoop

Day 2: Session 1

Hive Introduction
Hive Architecture
Hive Metastore
Hive Integration with Hadoop

MapReduce Architecture
MapReduce Programming Model
Different Phases of MapReduce Algorithm
Different Data Types in MapReduce
How to Write a Basic MapReduce Program
Joining Datasets In MapReduce Jobs – Map Joins and Reduce Joins
Creating Input and Output Formats in MapReduce Jobs
How to Debug MapReduce Jobs in Local and Pseudo Cluster Mode
Data Localization in MapReduce
Combiner (Mini Reducer) and Partitioner
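
As a preview of the MapReduce topics listed above, the phases (map, combine, partition, shuffle, reduce) can be sketched as a single-process word count. This is only an illustrative simulation in plain Python, not Hadoop's Java API; the function names `map_phase`, `combine`, `partition`, and `run_job` are hypothetical, and the character-sum partitioner is an assumption standing in for a real hash partitioner.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-aggregate one mapper's output locally ("mini reducer")."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partition(key, num_reducers):
    """Partitioner: choose which reducer receives a key.

    A deterministic character sum stands in for a real hash function
    (an assumption made so the example is reproducible).
    """
    return sum(ord(ch) for ch in key) % num_reducers


def run_job(lines, num_reducers=2):
    # Shuffle: group combined values by key inside each reducer's partition.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:  # each line plays the role of one input split
        for key, value in combine(map_phase(line)):
            partitions[partition(key, num_reducers)][key].append(value)

    # Reduce: sum the grouped values for every key.
    result = {}
    for part in partitions:
        for key, values in part.items():
            result[key] = sum(values)
    return result


counts = run_job(["big data big cluster", "data node data"])
print(sorted(counts.items()))
# prints [('big', 2), ('cluster', 1), ('data', 3), ('node', 1)]
```

The combiner reduces the number of pairs that cross the shuffle step, and the partitioner guarantees that every occurrence of a key reaches the same reducer, so the final counts do not depend on the number of reducers.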

Day 2: Session 2

HBase Shell
HBase General Commands

What Sqoop Does
Data Imports
Parallel Data Transfer
Fast Data Copies

Hardware Kit: This workshop does not include any hardware kit.

- A working laptop/PC with a minimum of 4 GB RAM, a 100 GB HDD, and an Intel i3 or better processor
- OS: Linux, or Windows with VMware
- A seminar hall with seating capacity for all participants, along with charging points and proper ventilation
- Projector, collar mic and speakers

- Digital toolkit of PPTs and study material for all participants
- Certificate of Participation for every participant.
- A competition will be organized at the end of the workshop, and the winners will be awarded a Certificate of Excellence.