January 19, 2024

Using Onehouse CDC Streaming for Real-Time Retargeting

Operational databases such as PostgreSQL and MySQL are often ill-suited for large-scale analytics due to cost, storage limitations, and performance issues. A common use of Onehouse is therefore to replicate operational data for analytics via change data capture (CDC). In online advertising, for instance, this lets marketers retarget potential customers across channels and draw on near real-time behavioral insights to understand customers better and personalize offers.

Onehouse is a cloud-native, fully managed universal data lakehouse platform. It offers data warehouse-like update capabilities together with the scalability, cost advantages, security benefits, and flexibility of a data lake. Onehouse natively integrates with streaming services such as Kafka and Debezium, enabling near real-time data ingestion into the lakehouse. It automates data management services such as ingestion, performance tuning, indexing, observability, and data quality management, which accelerates data lake setup and reduces operational effort and cost, all while keeping data in open table formats: Apache Hudi, Apache Iceberg, and Delta Lake. Customers can achieve read-write interoperability across all three table formats using the open source OneTable project.

E-commerce Retargeting Use Case

Let's introduce Company A, a fictional e-commerce enterprise, to illustrate the use case. Company A operates a dynamic website that attracts potential customers worldwide. The company tracks online user interactions, such as clicks and actions, in a PostgreSQL database. Company A's objective is to enhance ad purchasing efficiency through real-time analytics and use optimized online ad retargeting methods to immediately deliver personalized product ads with tailored content.

In the e-commerce sector, retargeting is a vital advertising strategy that allows brands to present ads based on users' past online behavior, purchase intentions, and preferences. When a user visits Company A's website, they engage in various actions, such as adding items to the shopping cart and viewing various offers. These interactions provide valuable insights into the customer’s needs and interests, enabling the company to fine-tune ads to these users. 

From a business standpoint, the choice of online retargeting campaigns like those undertaken by Company A is guided by specific objectives. Defining these goals is a crucial initial step, followed by continuous real-time monitoring of campaign performance to ensure it delivers the intended outcomes.

Synergy of Retargeting and Onehouse CDC Streaming

Onehouse's near real-time continuous streaming Change Data Capture (CDC) technology ensures that Company A can reduce data latency between their source database and the data lakehouse to as low as a minute. This capability paves the way for highly personalized marketing efforts, enabling Company A to deliver precisely tailored offers at the right moment, in the right context, and with a compelling message.

Through the use of Onehouse's technology, Company A has reduced data latency from 10+ hours to a minute and observed a substantial increase in the impact of their retargeting ads. These ads have not only significantly boosted the conversion rate but have also demonstrated the potential to substantially increase the revenue of their e-commerce business. Improved responsiveness can also impact customer satisfaction, time on site, and likelihood of making larger and repeat purchases. 

Our years of experience in digital advertising suggest that this effectiveness extends beyond e-commerce to sectors such as online travel, classified ads, and price comparison websites.
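To make the latency point concrete, here is a minimal Python sketch of how a retargeting audience might be selected from replicated interaction data. Everything in it is an assumption made for illustration: the event fields (user_id, action, ts), the action names, the 30-minute window, and the cart_abandoners helper are hypothetical and are not part of Onehouse or of Company A's actual schema.

```python
from datetime import datetime, timedelta, timezone


def cart_abandoners(events, now, window=timedelta(minutes=30)):
    """Return sorted user IDs whose latest add-to-cart falls inside `window`
    and has not been followed by a purchase."""
    last_add = {}
    last_purchase = {}
    for event in events:
        user, ts = event["user_id"], event["ts"]
        if event["action"] == "add_to_cart":
            last_add[user] = max(ts, last_add.get(user, ts))
        elif event["action"] == "purchase":
            last_purchase[user] = max(ts, last_purchase.get(user, ts))

    cutoff = now - window
    return sorted(
        user
        for user, added in last_add.items()
        if added >= cutoff
        and (user not in last_purchase or last_purchase[user] < added)
    )


# Hypothetical interaction events, as they might look once replicated.
now = datetime(2024, 1, 19, 12, 0, tzinfo=timezone.utc)
events = [
    {"user_id": "u1", "action": "add_to_cart", "ts": now - timedelta(minutes=5)},
    {"user_id": "u2", "action": "add_to_cart", "ts": now - timedelta(minutes=10)},
    {"user_id": "u2", "action": "purchase", "ts": now - timedelta(minutes=2)},
    {"user_id": "u3", "action": "add_to_cart", "ts": now - timedelta(hours=11)},
]

# With fresh data, the recent cart abandoner is found.
print(cart_abandoners(events, now))  # ['u1']

# With a copy that lags by 10 hours, only older events are visible,
# so nobody qualifies for the 30-minute retargeting window.
stale = [e for e in events if e["ts"] <= now - timedelta(hours=10)]
print(cart_abandoners(stale, now))  # []
```

The second call shows why freshness matters: the selection logic is identical, but a replica that is hours behind never sees the users who are still within the retargeting window.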

Onehouse CDC Streaming Solutions

To support Company A's continuous replication of their PostgreSQL database into an analytics-ready data lakehouse, Onehouse offers several CDC streaming options tailored to their specific needs, including:

  • PostgreSQL CDC: You can continuously stream data directly from a PostgreSQL database to a Onehouse-managed lakehouse without having to manage streaming technologies such as Kafka. This is an excellent option to kickstart your lakehouse journey, as Onehouse manages both Debezium and Amazon MSK to implement streaming from the source database to the analytics data table. For more details, refer to the Onehouse blog post, Instantly Unlock Your CDC PostgreSQL Data on the Lakehouse using Onehouse.
  • Confluent CDC: Use your Confluent cluster to stream CDC data from relational databases such as PostgreSQL into a data lakehouse powered by Onehouse. Simply provide the details of your relational database and Confluent Cloud cluster, and Onehouse will handle ingestion by provisioning and managing Debezium within your Confluent Cloud account. Explore more at The Ultimate Data Lakehouse for Streaming Data using Onehouse Confluent.
  • Confluent Kafka: Continuously stream data directly from Confluent Kafka into your Onehouse-managed lakehouse. Here, you manage Debezium yourself and stream PostgreSQL CDC data into Confluent Kafka; Onehouse then applies the PostgreSQL CDC transformation when you set up a stream capture. Delve deeper into this approach at Powering Real-time Analytics with Confluent Kafka and Onehouse.

When you choose any Onehouse CDC streaming option, Onehouse automatically uses services provided within your cloud to leverage Debezium, Kafka, and Apache Spark, all while shielding you from the complexity, ongoing effort, and risk of human error associated with setting up and maintaining such a complex solution. 
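For readers new to CDC, the following Python sketch illustrates the general idea behind replaying change events onto a target table. It assumes a simplified form of the Debezium event envelope (an op code plus before and after row images) and uses an in-memory dictionary as a stand-in for a lakehouse table. The apply_change_event helper, the id key column, and the sample rows are hypothetical; this is not how Onehouse implements ingestion, which this post does not describe at that level.

```python
def apply_change_event(table, event, key="id"):
    """Apply one simplified Debezium-style change event to `table`,
    a dict mapping primary key -> row."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # create, snapshot read, or update: upsert the new row image
        row = event["after"]
        table[row[key]] = row
    elif op == "d":
        # delete: the key is taken from the old row image
        table.pop(event["before"][key], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")
    return table


# Hypothetical change events for a user-interactions table.
changes = [
    {"op": "c", "before": None,
     "after": {"id": 1, "user_id": "u1", "action": "view_offer"}},
    {"op": "u",
     "before": {"id": 1, "user_id": "u1", "action": "view_offer"},
     "after": {"id": 1, "user_id": "u1", "action": "add_to_cart"}},
    {"op": "c", "before": None,
     "after": {"id": 2, "user_id": "u2", "action": "add_to_cart"}},
    {"op": "d",
     "before": {"id": 2, "user_id": "u2", "action": "add_to_cart"},
     "after": None},
]

table = {}
for change in changes:
    apply_change_event(table, change)

print(table)  # {1: {'id': 1, 'user_id': 'u1', 'action': 'add_to_cart'}}
```

Real Debezium events also carry schema information, source metadata, and timestamps, and a production pipeline must handle ordering, retries, and schema changes. That is the kind of complexity a managed service is meant to absorb.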

Figure 1: Onehouse architecture for the CDC streaming solution.

Whether you're currently designing PostgreSQL CDC ingestion pipelines or seeking to escape the burdens of maintenance and on-call monitoring, Onehouse can deliver a seamless experience. To determine the optimal option based on your infrastructure, skillset, and preferences, please consult the following table:

Table 1: Responsibility matrix for Onehouse CDC solutions.

Conclusion

In this blog post, we've demonstrated the value of Onehouse's change data capture (CDC) streaming feature through a fictional e-commerce organization, Company A. The use case illustrates how businesses in e-commerce and other industries can use CDC streaming to achieve near real-time data latency in a modern data lakehouse. With this technology, businesses can reach individuals at the optimal moment, in the right context, and with a tailored message, which is the foundation of successful real-time retargeting campaigns.

If you are ready to give Onehouse a try, or want to learn more, please visit the Onehouse listing on the AWS Marketplace or sign up for a Onehouse free trial.
