Modern Data Warehousing

Course Details

Today’s businesses need distributed decision-making, where every stakeholder, from a business analyst to a store manager to a frontline operator, needs to analyze data and make decisions. Those decisions involve analyzing vast amounts of data that originate from a wide variety of data sources, such as transactions, CRM, marketing, mobile and web.

Amazon Redshift is a fully managed cloud data warehouse that scales seamlessly, with high performance and throughput, to analyze vast amounts of data. You can use it to build powerful business intelligence reports or to derive operational analytics from your business events.

In this workshop, you will learn how to use Amazon Redshift to analyze your data and derive business intelligence. You will learn about the distributed architecture of Amazon Redshift, how to bring data into the data warehouse, best practices for performance, and the managed capabilities that reduce your operational overhead. You will also learn how Redshift extends seamlessly to your data lake, allowing you to analyze all the data stored there.

Course Outline

DAY 1

Module 1: Overview of Amazon Redshift

Course Introduction
Introduction to Data Warehousing
Amazon Redshift architecture, its components and features

Module 2: Table Design Concepts

Deep Dive into Distribution Styles and Sort Keys
Understanding Data Compression
How to choose distribution styles and sort keys for different workloads
Loading data into the cluster

Lab 1: Launching a Redshift cluster, loading data and running queries

Module 3: Managing your Redshift cluster

Choosing Redshift node types
Pausing, resuming and elastically resizing your cluster
Backups and Disaster Recovery for your cluster

DAY 2

Module 4: Managing Workloads on your Cluster

Managing different workloads in your cluster
Automatic and Manual Workload Management
Short Query Acceleration and assigning queries to queues
Concurrency Scaling

Module 5: Extend to your data lake using Redshift Spectrum

Overview of Redshift Spectrum and its architecture
Best practices for Redshift Spectrum performance

Lab 2: Using Redshift Spectrum to query the data lake

Module 6: Maintaining your cluster

Monitoring query performance and analyzing workload performance
Redshift Advisor
System tables to analyze cluster performance

Module 7: Security

Data Protection
Managing access to cluster
Infrastructure & Network Security

Quiz and Course Wrap Up

Quiz
Summary of the two days
Further learning resources

Course Duration

2 Days

Key Takeaways

Understanding Amazon Redshift’s distributed architecture for scale and performance
How Amazon Redshift integrates seamlessly with data lakes
Processing structured and semi-structured data
Best practices for performance and cost

Key Services You Will Learn

Amazon Redshift
Amazon Redshift Spectrum
AWS Glue (Data Catalog)

Prerequisites

Beginner-level knowledge of the AWS platform and its key services, such as EC2 and S3
Basic hands-on experience with the AWS Management Console is a plus
Prior experience with data warehousing concepts and with any data warehousing product is an advantage