Industry
Media
Offering
Cloud Advisory Services
Workload
DC, Servers & Data
Cloud
AWS
Project scope
- Data lake architecture design
- Data transformation and storage in data lake
- Customized reports in Power BI
About the client
The client is a leading media production and broadcasting company and a subsidiary of a global media conglomerate. They have over 30 television channels, a digital business and a movie production business, reaching over 700 million viewers in India.
Business challenge
As part of their digital strategy, our client wanted to optimise user experience across channels — iOS and Android apps, Fire TV, web, and so on — based on user behaviour and preferences. This required a deeper understanding of customer behavioural patterns across platforms.
At the time, they were using Segment to collect around 6.5 billion records (20 TB of raw data) of behavioural data every month from their 30 million online viewers across sources.
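To put these volumes in perspective, the short sketch below derives the average record size and the number of records per viewer from the figures quoted above. It assumes decimal units (1 TB = 10^12 bytes), which the source does not specify, and the function names are illustrative only.

```python
# Monthly figures quoted in the case study.
RECORDS_PER_MONTH = 6_500_000_000
RAW_BYTES_PER_MONTH = 20 * 10**12  # assumption: 1 TB = 10**12 bytes
VIEWERS_PER_MONTH = 30_000_000


def average_record_bytes(total_bytes: int, records: int) -> float:
    """Average size of one raw record in bytes."""
    return total_bytes / records


def records_per_viewer(records: int, viewers: int) -> float:
    """Average number of behavioural records per viewer."""
    return records / viewers


avg_bytes = average_record_bytes(RAW_BYTES_PER_MONTH, RECORDS_PER_MONTH)
per_viewer = records_per_viewer(RECORDS_PER_MONTH, VIEWERS_PER_MONTH)

# prints: ~3.1 KB per record, ~217 records per viewer per month
print(f"~{avg_bytes / 1000:.1f} KB per record, "
      f"~{per_viewer:.0f} records per viewer per month")
```

In other words, each viewer generates a couple of hundred small events a month, which is why storage layout and indexing matter more than the size of any single record.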
In order to deliver a user-focussed digital viewing experience, the client needed:
- Reliable storage, with protection against data corruption and other types of data loss
- Security against unauthorised data access
- Ease of finding a single record among billions (through efficient indexing of data)
- An advanced analytics engine to help them derive and visualise meaningful insights from their high-volume, high-variety data
- All of this serving as their single source of truth
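The case study does not say how the indexing requirement was met. One common approach in S3-based data lakes is to partition files by date and platform, so that a lookup scans only the relevant prefix rather than the full data set. The sketch below illustrates that idea; the `events` root, the key names and the function are hypothetical and not taken from the client's setup.

```python
from datetime import datetime, timezone


def partition_prefix(event_time: datetime, platform: str, root: str = "events") -> str:
    """Build a Hive-style (key=value) partition prefix for one event.

    The layout is an illustrative assumption, not the client's actual scheme.
    """
    ts = event_time.astimezone(timezone.utc)  # normalise to UTC before partitioning
    return (
        f"{root}/year={ts.year:04d}/month={ts.month:02d}/"
        f"day={ts.day:02d}/platform={platform.strip().lower()}/"
    )


example = partition_prefix(datetime(2019, 3, 7, 18, 30, tzinfo=timezone.utc), "iOS")
print(example)  # events/year=2019/month=03/day=07/platform=ios/
```

With a layout like this, a query for one viewer's activity on a given day and platform touches a single partition instead of a month of raw data.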
Solution
We, at 1CloudHub, enabled an enterprise data lake for all of the client’s data to reside in one place — preserving accuracy and timeliness of the data.
Leveraging our client’s existing mechanism to collect and feed data into the data lake, we created a pipeline with EMR (Elastic MapReduce) for data crunching or ETL (Extract, Transform, Load) and Power BI for self-service visualisation.
Our approach
01. Understand
- In collaboration with the client’s development team, we outlined the volume, velocity, veracity and variety of data.
02. Define
- We worked with the client’s business teams and domain experts to define reports in Power BI for the 18 use cases the client had identified.
03. Design
- We mapped data to corresponding reports and planned data transformation.
- Based on these, we designed and architected the data lake and pipeline necessary for Power BI.
- With the client’s sign-off, we deployed the solution on AWS cloud.
04. Transform
- Once the infrastructure was in place, our data engineering team performed the necessary ETL steps such as cleaning and consolidation to derive value from the raw data.
- We stored the transformed data in an S3 bucket as Parquet-formatted files.
- We imported the transformed data as data marts into Amazon Redshift, to be used for Power BI reports.
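The Transform steps above can be sketched in miniature as follows. This is a plain-Python illustration of cleaning and consolidation, not the EMR job that was actually run: the event fields, sample records, table name, bucket path and IAM role are all hypothetical. The last helper only builds the text of a Redshift `COPY ... FORMAT AS PARQUET` command, which is one way Parquet files in S3 can be loaded into Redshift; the source does not state which load mechanism was used.

```python
from collections import Counter

# Hypothetical raw events; field names are illustrative, not the client's schema.
RAW_EVENTS = [
    {"event_id": "e1", "user_id": "u1", "platform": "iOS", "date": "2019-03-07"},
    {"event_id": "e1", "user_id": "u1", "platform": "iOS", "date": "2019-03-07"},  # duplicate
    {"event_id": "e2", "user_id": None, "platform": "web", "date": "2019-03-07"},  # no user
    {"event_id": "e3", "user_id": "u2", "platform": " Web ", "date": "2019-03-07"},
    {"event_id": "e4", "user_id": "u1", "platform": "ios", "date": "2019-03-08"},
]


def clean(events):
    """Drop records without a user, remove duplicate event IDs, normalise platform names."""
    seen = set()
    cleaned = []
    for event in events:
        if not event.get("user_id") or event["event_id"] in seen:
            continue
        seen.add(event["event_id"])
        cleaned.append({**event, "platform": event["platform"].strip().lower()})
    return cleaned


def consolidate(events):
    """Aggregate cleaned events into a small data-mart table: events per (date, platform)."""
    counts = Counter((e["date"], e["platform"]) for e in events)
    return [
        {"date": date, "platform": platform, "events": n}
        for (date, platform), n in sorted(counts.items())
    ]


def redshift_copy_statement(table, s3_prefix, iam_role):
    """Build the text of a Redshift COPY command that loads Parquet files from S3."""
    return f"COPY {table} FROM '{s3_prefix}' IAM_ROLE '{iam_role}' FORMAT AS PARQUET;"


cleaned_events = clean(RAW_EVENTS)
data_mart = consolidate(cleaned_events)
print(data_mart)

# Placeholder table, bucket and role names for illustration only.
copy_sql = redshift_copy_statement(
    "mart.daily_platform_events",
    "s3://example-data-lake/marts/daily_platform_events/",
    "arn:aws:iam::123456789012:role/example-redshift-load",
)
print(copy_sql)
```

Keeping the cleaning and consolidation steps as separate functions mirrors the pipeline described above: raw data is cleaned once, and each report-specific data mart is derived from the cleaned set.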
05. Completion and reporting
- We delivered a summary of findings and recommendations for production deployment to bring the PoC to a meaningful closure.
Outcomes
Better
We enabled advanced analytics on up to a year of data, compared with the three months of data originally agreed, to deliver the meaningful insights the business teams sought.
Faster
We crunched over 12 million records in under an hour, running more than 100 VMs concurrently in a cluster.
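As a rough sanity check, the figures above translate into the throughput numbers below. The sketch takes the quoted values at face value (exactly 12 million records, 100 VMs and one hour), so the results are approximate reference points rather than measured rates.

```python
# Nominal figures from the case study, taken at face value (an assumption:
# the source says "over 12 million", "more than 100 VMs", "under an hour").
RECORDS = 12_000_000
VMS = 100
HOURS = 1.0

records_per_second = RECORDS / (HOURS * 3600)  # cluster-wide rate
records_per_vm_hour = RECORDS / (VMS * HOURS)  # share handled by each VM per hour

# prints: 3,333 records/s overall, 120,000 records per VM-hour
print(f"{records_per_second:,.0f} records/s overall, "
      f"{records_per_vm_hour:,.0f} records per VM-hour")
```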
Cheaper
We delivered each report at a cost of $70, an excellent price-to-performance ratio driven by the Spot Fleet instances we used and the on-demand, pay-as-you-use cloud model.
A similar setup on-premises in a data centre would have cost the client 12,000 times more.
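The arithmetic behind this comparison can be written out as below. It assumes the 12,000x multiplier applies directly to the $70 per-report cost, which is one reading of the statement above; the source does not break the on-premises estimate down further.

```python
CLOUD_COST_PER_REPORT_USD = 70
ON_PREM_MULTIPLIER = 12_000  # assumption: multiplier applies to the per-report cost

on_prem_equivalent_usd = CLOUD_COST_PER_REPORT_USD * ON_PREM_MULTIPLIER
saving_fraction = 1 - 1 / ON_PREM_MULTIPLIER

# prints: On-premises equivalent: $840,000 per report (saving of 99.99%)
print(f"On-premises equivalent: ${on_prem_equivalent_usd:,} per report "
      f"(saving of {saving_fraction:.2%})")
```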
Looking forward
We are delighted to have helped the client create a centralised, analytics-ready repository for their big data, and we look forward to helping them meet their strategic goals using our cloud capabilities.