Achieve Scalability and Cost Savings: Migrating Teradata to Amazon Redshift Serverless for Transient Workloads

As data volumes grow exponentially, traditional data warehousing solutions often struggle to meet the increasing demands for scalability, performance, and advanced analytics. 

Shifting to Amazon Redshift provides organizations with the potential for superior price-performance, enhanced data processing, faster query speeds, and seamless integration with technologies like machine learning (ML) and artificial intelligence (AI). 

As organizations face increasing pressure to modernize their data infrastructure and harness the power of advanced analytics, migrating from traditional on-premises systems to cloud-based solutions has become a critical step. One key decision is transitioning from legacy platforms like Teradata to modern cloud data warehouses such as Amazon Redshift. The migration from Teradata to Redshift enables organizations to overcome the limitations of traditional systems by improving scalability, performance, and cost-efficiency.

Using AWS tools like the Schema Conversion Tool (SCT), we aim to seamlessly convert the schema, code, and data to ensure compatibility with Redshift while maintaining data accuracy and minimizing disruption to business operations. 

Why migrate from Teradata to Amazon Redshift?

Cost-Efficiency: Redshift eliminates the high operational costs associated with Teradata. While Teradata can also run in the cloud, Redshift is more cost-effective because it is a native AWS service, offering flexible pricing and better integration with other AWS services to optimize costs for transient workloads.

Performance and Scalability: Redshift's improved performance and scalability cater to growing data volumes and user loads effectively.

Vendor Independence: Migration liberates organizations from vendor lock-in, allowing strategic decision-making based on their unique needs. 

As part of your modern data architecture transition strategy, the goal of migrating to an Amazon Redshift based platform is to take advantage of Redshift's scalability, performance, cost optimization, and lake house capabilities, improving the existing data consumption experience.

Key Considerations Between Teradata and Redshift

Architecture
Teradata: Primarily designed as a traditional on-premises data warehouse, Teradata uses a shared-nothing architecture in which each node in the cluster operates independently.
Redshift: A cloud-based data warehouse that uses a columnar storage model and is optimized for OLAP workloads. It is built on a modified PostgreSQL engine and leverages cloud resources for scalability.

Scalability
Teradata: Scaling typically requires significant hardware investment and can lead to longer lead times for capacity expansion.
Redshift: Supports elastic scaling, allowing users to increase or decrease compute and storage resources independently and on demand, accommodating fluctuating workloads more effectively.

Storage
Teradata: Row-based (traditionally).
Redshift: Columnar storage.

Performance
Teradata: Uses advanced indexing, partitioning, and join strategies, but performance tuning may require more manual intervention and expertise.
Redshift: Employs columnar storage, data compression, and query optimization features such as distribution styles and sort keys to enhance query performance automatically.

Cost
Teradata: Higher total cost of ownership (TCO).
Redshift: Generally lower, with pay-as-you-go pricing.

Security and Compliance
Teradata: Provides robust security features, but organizations must manage them themselves.
Redshift: Offers built-in security features such as encryption at rest and in transit, along with compliance with various regulations.

Assessment and Planning Stage:

What We Achieved Through the Teradata to Redshift Migration:

The main achievement of this migration is that the data now serves an archival process, with users accessing it only minimally during business hours.

Since the project involves an archival process and the ETL (Extract, Transform, Load) data is approximately 5 TB, we planned to conduct the migration in two phases: Static Schema and Online Schema.

Why We Chose Manual Migration from Teradata to Redshift Without CDC

We analyzed the version compatibility between the current source Teradata version and the relevant Redshift versions. This analysis was a crucial prerequisite for our planning process.

This migration does not use a Change Data Capture (CDC) method, because AWS DMS does not support Teradata as a source. Instead, the data was migrated manually using the AWS Schema Conversion Tool (SCT).

ETL jobs are executed on both Static and Online schemas. Static schemas refer to tables that do not undergo changes, while Online schemas include tables that experience transactional changes.

Initially, we migrated all the static tables, and at the time of cutover, we migrated the online tables, which minimized the overall migration timeline.

The architecture diagram below outlines the flow for planning the migration from on-premises Teradata to Amazon Redshift.

[Architecture diagram: migration flow from on-premises Teradata to Amazon Redshift]

Migration Phase: 

The migration phase is where the actual transfer of data and schema occurs. During this stage, leveraging the right tools and strategies is crucial to ensure that the migration is efficient, minimizes downtime, and prevents data loss. We migrated the data in two phases: Static schema and Online schema.

 

Challenges Faced During Data Migration to Redshift: 

Data type lengths in the source did not match the specifications in Redshift, causing potential truncation or errors. 

Mismatched collation settings between the source and Redshift led to inconsistencies in string comparisons. 

Issues with the JDBC driver resulted in connectivity problems during the migration process. 

The Schema Conversion Tool (SCT) became unresponsive at critical points during migration. 

The migration process halted due to session timeouts in the background, leading to incomplete data transfers. 

Migrating large tables, approximately 20GB in size, caused significant delays and stalls during the process. 

Some non-nullable fields had missing data after migration due to mapping issues or null values in the source data. 

The overall migration was highly time-consuming.

Fixes for the Migration Challenges

We identified an issue in the Schema Conversion Tool (SCT) related to data type lengths. To resolve it, we modified the string lengths twice: first to match the source DDL and then to conform to the target DDL requirements.
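Because Redshift VARCHAR lengths count bytes, multibyte characters coming from Teradata UNICODE columns may need wider target columns. As a minimal sketch, assuming the redshift_connector Python driver and hypothetical endpoint, table, and column names, a VARCHAR column can be widened in place:

    import redshift_connector

    # Hypothetical Redshift Serverless endpoint and credentials.
    conn = redshift_connector.connect(
        host="my-wg.123456789012.us-east-1.redshift-serverless.amazonaws.com",
        database="dev", user="admin", password="***")
    cur = conn.cursor()

    # A Teradata VARCHAR(100) UNICODE column can occupy up to ~400 bytes as
    # UTF-8, so the Redshift target column is widened accordingly.
    # (Redshift allows increasing, but not decreasing, VARCHAR sizes.)
    cur.execute(
        "ALTER TABLE sales.customers ALTER COLUMN customer_name TYPE VARCHAR(400)")
    conn.commit()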

To avoid collation errors, we changed the database/schema to be case-insensitive. 
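Redshift fixes collation at creation time (per database or per column), so the case-insensitive database has to exist before any data is loaded. A minimal sketch, assuming redshift_connector and a hypothetical database name:

    import redshift_connector

    conn = redshift_connector.connect(
        host="my-wg.123456789012.us-east-1.redshift-serverless.amazonaws.com",
        database="dev", user="admin", password="***")
    conn.rollback()          # end the driver's implicit transaction...
    conn.autocommit = True   # ...CREATE DATABASE cannot run inside one
    cur = conn.cursor()

    # All string comparisons in this database will ignore case, matching the
    # source behavior and avoiding collation mismatches during validation.
    cur.execute("CREATE DATABASE migration_db COLLATE CASE_INSENSITIVE")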

For large tables exceeding 20GB, we utilized the virtual partitioning option and applied a date column filter to effectively migrate the data. 
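SCT's virtual partitioning is configured in its UI, but the underlying idea is simply to split one large extract into independent, date-bounded slices that can run in parallel. A rough sketch of the same idea for a manual extract, assuming the teradatasql driver and a hypothetical table with an order_date column:

    from datetime import date, timedelta

    import teradatasql

    TABLE, DATE_COL = "sales.orders_history", "order_date"  # hypothetical

    def month_ranges(start, end):
        """Yield (first_day, first_day_of_next_month) pairs covering [start, end)."""
        current = start
        while current < end:
            nxt = (current.replace(day=1) + timedelta(days=32)).replace(day=1)
            yield current, min(nxt, end)
            current = nxt

    con = teradatasql.connect(host="teradata.example.com", user="dbc", password="***")
    for lo, hi in month_ranges(date(2020, 1, 1), date(2024, 1, 1)):
        cur = con.cursor()
        # Each slice is an independent unit of work that can be extracted,
        # staged, and loaded into Redshift in parallel.
        cur.execute(
            f"SELECT COUNT(*) FROM {TABLE} WHERE {DATE_COL} >= ? AND {DATE_COL} < ?",
            (lo.isoformat(), hi.isoformat()))
        print(f"{lo} .. {hi}: {cur.fetchone()[0]} rows in this slice")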

In Windows, we disabled session timeouts to ensure uninterrupted data migration during background processes. 

To reduce the time required for migration, we provisioned three additional high-resource instances, allowing us to migrate data in parallel and meet the project timeline. 

We adjusted the minimum and maximum memory values in the SCT configuration file, which increased the speed of the data migration process.
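For reference, this is done in the JavaOptions section of the SCT configuration file (on Windows, under the SCT installation folder); the exact path varies by install, and the values below are purely illustrative:

    [JavaOptions]
    -Xms2048M
    -Xmx8192M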

Validation Phase:

Validate the data in the Redshift database against the source Teradata database by comparing the row count of every table to ensure consistency.
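A minimal row-count comparison script, assuming the teradatasql and redshift_connector Python drivers; hostnames, credentials, and the table list are hypothetical:

    import teradatasql
    import redshift_connector

    TABLES = ["sales.orders", "sales.customers"]  # hypothetical table list

    td = teradatasql.connect(host="teradata.example.com", user="dbc", password="***")
    rs = redshift_connector.connect(
        host="my-wg.123456789012.us-east-1.redshift-serverless.amazonaws.com",
        database="dev", user="admin", password="***")

    td_cur, rs_cur = td.cursor(), rs.cursor()
    for table in TABLES:
        td_cur.execute(f"SELECT COUNT(*) FROM {table}")
        rs_cur.execute(f"SELECT COUNT(*) FROM {table}")
        src, tgt = td_cur.fetchone()[0], rs_cur.fetchone()[0]
        status = "OK" if src == tgt else "MISMATCH"
        print(f"{table}: teradata={src} redshift={tgt} {status}")

Row counts catch gross load failures; for stronger guarantees, the same loop can compare per-column aggregates or checksums.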

Cut-over Phase:

Plan a maintenance window to perform the final data migration and switch over to Amazon Redshift as the production database.

Update application connection strings and configurations to point to the new Redshift cluster.
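For a JDBC-based application, that change typically means swapping the driver URL; the hostnames below are placeholders:

    Before: jdbc:teradata://td-prod.example.com/DATABASE=sales
    After:  jdbc:redshift://my-wg.123456789012.us-east-1.redshift-serverless.amazonaws.com:5439/dev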

Optimization Strategies for Teradata to Redshift Migration:

Leverage Columnar Storage and Compression: Redshift uses columnar storage, which is highly efficient for analytical queries. Take advantage of Redshift’s automatic data compression to reduce storage costs and improve query performance by minimizing I/O.
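A minimal sketch with redshift_connector and a hypothetical archive table: newly created tables default to ENCODE AUTO, and ANALYZE COMPRESSION reports per-column encoding recommendations for data that is already loaded:

    import redshift_connector

    conn = redshift_connector.connect(
        host="my-wg.123456789012.us-east-1.redshift-serverless.amazonaws.com",
        database="dev", user="admin", password="***")
    conn.rollback()
    conn.autocommit = True
    cur = conn.cursor()

    # ENCODE AUTO (the default) lets Redshift choose and adapt column encodings.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS sales.orders_archive (
            order_id   BIGINT,
            order_date DATE,
            amount     DECIMAL(12, 2),
            status     VARCHAR(32)
        ) ENCODE AUTO
    """)

    # Report suggested encodings for a table that already holds data.
    cur.execute("ANALYZE COMPRESSION sales.orders_archive")
    for row in cur.fetchall():
        print(row)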

Optimize Data Distribution and Sort Keys: To enhance performance, strategically define distribution keys based on frequently joined tables and apply sort keys to optimize query patterns. This helps in reducing data movement during query execution and speeds up the process.
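A hedged example with a hypothetical fact table: the DISTKEY co-locates rows that share a join key on the same compute slice, and the SORTKEY lets range-restricted scans skip blocks:

    import redshift_connector

    conn = redshift_connector.connect(
        host="my-wg.123456789012.us-east-1.redshift-serverless.amazonaws.com",
        database="dev", user="admin", password="***")
    cur = conn.cursor()

    # Distribute on the most common join column; sort on the most common
    # filter column so date-bounded queries read fewer blocks.
    cur.execute("""
        CREATE TABLE sales.order_items (
            order_id   BIGINT,
            product_id BIGINT,
            quantity   INTEGER,
            sale_date  DATE
        )
        DISTKEY (order_id)
        SORTKEY (sale_date)
    """)
    conn.commit()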

Utilize Redshift Spectrum for Data Lakes: For large, infrequently accessed datasets, consider using Redshift Spectrum to query data directly from Amazon S3 without having to load it into Redshift. This optimizes both storage costs and performance for big data queries.
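A sketch of the Spectrum setup, assuming a hypothetical Glue Data Catalog database, IAM role, and S3 prefix:

    import redshift_connector

    conn = redshift_connector.connect(
        host="my-wg.123456789012.us-east-1.redshift-serverless.amazonaws.com",
        database="dev", user="admin", password="***")
    conn.rollback()
    conn.autocommit = True  # CREATE EXTERNAL TABLE cannot run in a transaction
    cur = conn.cursor()

    # Map a Glue Data Catalog database into Redshift as an external schema.
    cur.execute("""
        CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum_archive
        FROM DATA CATALOG
        DATABASE 'archive_db'
        IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
        CREATE EXTERNAL DATABASE IF NOT EXISTS
    """)

    # Define an external table over Parquet files that stay in S3.
    cur.execute("""
        CREATE EXTERNAL TABLE spectrum_archive.events (
            event_id   BIGINT,
            event_time TIMESTAMP,
            payload    VARCHAR(1024)
        )
        STORED AS PARQUET
        LOCATION 's3://my-archive-bucket/events/'
    """)

    # Query S3-resident data without loading it into the warehouse.
    cur.execute("SELECT COUNT(*) FROM spectrum_archive.events")
    print(cur.fetchone()[0])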

Concurrency Scaling: To manage variable workloads efficiently, enable Redshift’s concurrency scaling feature. It automatically adds more capacity during peak times, ensuring high performance without over-provisioning resources during regular loads.

Workload Management (WLM) Tuning: Use Redshift’s workload management (WLM) to prioritize queries and optimize system resources. By defining different queues for various workloads, you can allocate system resources effectively, improving query throughput and response times.
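For a provisioned cluster, WLM (including per-queue concurrency scaling, described above) is driven by the wlm_json_configuration parameter. A sketch using boto3, with a hypothetical parameter group and queue layout:

    import json

    import boto3

    redshift = boto3.client("redshift")

    # Manual WLM: an ETL queue with more memory, a BI queue that can burst via
    # concurrency scaling, and the mandatory default queue last.
    wlm = [
        {"user_group": ["etl"], "query_concurrency": 3,
         "memory_percent_to_use": 50, "concurrency_scaling": "off"},
        {"user_group": ["bi"], "query_concurrency": 5,
         "memory_percent_to_use": 30, "concurrency_scaling": "auto"},
        {"query_concurrency": 5, "memory_percent_to_use": 20},  # default queue
    ]

    redshift.modify_cluster_parameter_group(
        ParameterGroupName="my-redshift-params",  # hypothetical parameter group
        Parameters=[{
            "ParameterName": "wlm_json_configuration",
            "ParameterValue": json.dumps(wlm),
        }])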

Automatic Table Maintenance: Enable automatic table vacuuming and analyze functions to optimize query performance. Regularly vacuum and analyze your tables to remove outdated rows and update query statistics.
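Automatic vacuum and analyze run in the background, but a manual pass after a large archival load is still useful; a sketch with a hypothetical table:

    import redshift_connector

    conn = redshift_connector.connect(
        host="my-wg.123456789012.us-east-1.redshift-serverless.amazonaws.com",
        database="dev", user="admin", password="***")
    conn.rollback()
    conn.autocommit = True  # VACUUM cannot run inside a transaction block
    cur = conn.cursor()

    # Reclaim space from deleted rows and restore the table's sort order.
    cur.execute("VACUUM FULL sales.orders_archive")
    # Refresh planner statistics so the optimizer sees current row counts.
    cur.execute("ANALYZE sales.orders_archive")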

Monitor with Amazon CloudWatch and Redshift Advisor: Use CloudWatch for real-time monitoring of your cluster’s performance and Redshift Advisor to get automated insights and recommendations on optimizing your database setup, schema design, and workload execution.
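A small boto3 sketch that pulls the last hour of cluster CPU utilization from CloudWatch; the cluster identifier is a placeholder:

    from datetime import datetime, timedelta, timezone

    import boto3

    cw = boto3.client("cloudwatch")
    now = datetime.now(timezone.utc)

    resp = cw.get_metric_statistics(
        Namespace="AWS/Redshift",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "ClusterIdentifier", "Value": "my-cluster"}],
        StartTime=now - timedelta(hours=1),
        EndTime=now,
        Period=300,               # one datapoint per 5 minutes
        Statistics=["Average"])

    for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], f'{point["Average"]:.1f}%')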

Conclusion:

Migrating from on-premises Teradata to Amazon Redshift Serverless provides a transformative opportunity for organizations to modernize their data infrastructure, reduce costs, and unlock new analytical capabilities. By leveraging Redshift’s cloud-native architecture, advanced performance features, and seamless integration with AWS services, organizations can achieve superior scalability, faster insights, and more efficient data management. As you embark on this migration journey, careful planning, optimization, and continuous monitoring will ensure a successful transition that maximizes the benefits of a cloud-first data strategy.

Contact 1CloudHub today for expert guidance on automating your infrastructure, reducing costs, and enhancing performance. Let’s elevate your cloud strategy!

Written by

Sujay V
Lead Database Administrator

Rakesh Kumar K
Associate Database Administrator

Umashankar N
Chief Technology Officer (CTO) and AWS Ambassador

Updated on October 28, 2024

