Course Details
Today’s businesses need distributed decision making, where every stakeholder, from a business analyst to a store manager to a frontline operator, needs to analyze data and make decisions. Those decisions involve analyzing vast amounts of data that originate from a wide variety of sources, such as transactions, CRM, marketing, mobile and web.
Amazon Redshift is a fully managed cloud data warehouse that scales with high performance and throughput to analyze vast amounts of data, so that you can build powerful reports for your business intelligence or derive operational analytics from your business events.
In this workshop, you will learn how to use Amazon Redshift to analyze your data and derive business intelligence. You will learn the distributed architecture of Amazon Redshift, how to bring data into the data warehouse, best practices for performance, and the managed capabilities that reduce your operational overhead. You will also learn how Redshift extends to your data lake, allowing you to analyze all the data available there.
Course Outline
DAY 1
Module 1: Overview of Amazon Redshift
- Course Introduction
- Introduction to Data Warehousing
- Amazon Redshift architecture, its components and features
Module 2: Table Design Concepts
- Deep Dive into Distribution Styles and Sort Keys
- Understanding Data Compression
- How to choose distribution styles and sort keys for different workloads
- Loading data into the Cluster
Lab 1: Launching a Redshift cluster, loading data and running queries
Module 3: Managing your Redshift cluster
- Choosing Redshift node types
- Pause, Resume and Elastic Resize for your cluster
- Backups and Disaster Recovery for your cluster
DAY 2
Module 4: Managing Workloads on your Cluster
- How to manage different workloads in your cluster
- Automatic and Manual Workload Management
- Short Query Acceleration and Assigning queries to queues
- Concurrency Scaling
Module 5: Extend to your data lake using Redshift Spectrum
- Overview of Redshift Spectrum and its architecture
- Best practices for Redshift Spectrum performance
Lab 2: Using Redshift Spectrum to query your data lake
Module 6: Maintaining your cluster
- Monitoring query performance and analyzing workload performance
- Redshift Advisor
- System tables to analyze cluster performance
Module 7: Security
- Data Protection
- Managing access to your cluster
- Infrastructure & Network Security
Quiz and Course Wrap Up
- Quiz
- Summary of the 2 days
- Further learning resources
Course Duration
2 Days
Key Takeaways
- Understand Amazon Redshift's distributed architecture for scale and performance
- How Amazon Redshift integrates with data lakes
- Processing structured and semi-structured data
- Best practices for performance and cost
Key Services that you will learn
- Amazon Redshift
- Amazon Redshift Spectrum
- AWS Glue (Data Catalog)
Prerequisites
- Beginner-level knowledge of the AWS platform and its key services, such as EC2 and S3
- Basic hands-on experience with the AWS Management Console will be a plus
- Prior experience with data warehousing concepts and any data warehousing product will be an advantage
Intended Audience
- Data / Database Architects, developers and engineers
- Data Analysts / Scientists
- Solution Architects
- Data platform owners