Start Date: 05/04/2022
Course Overview
Companies today can collect large amounts of data. Handling that volume requires technologies that can collect, cleanse, process and store significant amounts of information effectively.
Many companies have concluded that leaving this collected data unused means losing large amounts of money. The Big Data market is estimated to surpass $200 billion this year.
There is tremendous business value in Big Data, and with the right methodologies and tools this raw data can be made available for use.
This course provides the basis for the Big Data and NoSQL DB environment, architecture, processes and available tools. The course also presents Big Data methodologies and deployment recommendations.
Who should attend?
Project Managers, Product people and Managers, Developers and Architects who want to know about Big Data.
Prerequisite:
None
Course Outline:
1. Introduction
• Definition: Big Data, NoSQL
• The need for Big Data technology
• Traditional technologies vs. Big Data technologies
• Big Data project requirements
• Big Data Project workflow
2. Big Data Architecture
• Big Data project definitions
• Data sources & development resources
• Big Data technologies evaluation
– The need for POC
3. Data Collection & Ingestion
• Streaming Concept
– REST API
• Apache Kafka
– AWS Kinesis, Azure Event Hubs
• Apache Flume
• ELK package
– Logz.io
4. Hadoop – Introduction
• What is Hadoop?
• Hadoop Architecture
• Hadoop File System (HDFS)
– Architecture
– NameNode & DataNode
• Hadoop MapReduce
• Apache YARN
• Apache Oozie, ZooKeeper
• Project non-functional support
– Sentry, Tez, Ambari, Knox, Falcon
5. Project decision – Hadoop deployment
• Hadoop Distribution
– Examples: Cloudera, Hortonworks
• Hadoop as a service
• Can Big Data project switch environments?
• Hadoop deployment requirements
• Hadoop Performance Best Practices
6. Hadoop project POC
• POC environment
• Using Apache Pig & Apache Sqoop for POC
• Apache Storm
• Apache Spark
– Concept & Architecture
– Programming with Spark
– Spark Streaming
– Spark SQL
– MLlib
– GraphX
8. Project development cycle & deployment – Spark
• Big Data – Development methodologies
– Waterfall vs. Agile
• ETL development cycle & deployment
• Test cycle
9. Big Data DB types
• Key-Value Stores
– Redis
• Column Family Stores (Wide Column Stores)
– Apache HBase
– Apache Cassandra
• Document Databases
– MongoDB
• Graph Databases
– Mathematical Graph as a DB
10. Big Data & RDBMS
• Product logic
• Apache Hive
– Architecture – Batch Processing
• Apache Impala
– Massively Parallel Processing (MPP)
11. Big Data Northbound Interfaces
• Big Data to OLAP
• BI Visualization
• Scaling BI over Big Data
12. Big Data – system ATP
13. Trends & Conclusions