Resources
Webinars
Automated Performance Tuning for Apache Hudi™ on Amazon EMR
Learn from Hudi experts how to optimize and accelerate your data lakehouse on Amazon EMR.
Implementing the fastest, most open data lakehouse for Snowflake ETL/ELT
Learn how you can leverage the most open data lakehouse to ingest, store, and transform data for Snowflake at a fraction of the cost, while enabling end-users to access data in Apache Iceberg, Apache Hudi, and Delta Lake formats.
Deliver an Open Data Architecture with a Fully Managed Data Lakehouse
Join this webinar to learn how Conductor and other organizations have rearchitected their data stacks with an open data platform to ensure that they can always work with any upstream data source and downstream query engine, future-proofing their stack to support the current wave of GenAI use cases - and whatever comes next.
Vector Embeddings in the Lakehouse: Bridging AI and Data Lake Technologies
Join NielsenIQ and Onehouse to explore the crucial role of vector embeddings in AI, and discover how Onehouse makes it easier than ever to generate and manage vector embeddings directly from your data lake.
Introducing Onehouse LakeView and Table Optimizer - Power Tools for the Data Lakehouse
Learn about LakeView, a free data lakehouse observability and management tool, and Table Optimizer, a managed service to optimize your data lakehouse tables in production.
Iceberg for Snowflake: Implementing the fastest, most open data lakehouse for Snowflake ETL/ELT
Learn how to ingest, store and transform your Snowflake data faster and for a fraction of the cost using fully-managed Iceberg tables.
NOW Insurance Uses Data and AI to Revolutionize an Industry
With Onehouse, NOW Insurance is Harnessing Data, Cutting Costs, and Driving Innovation
Universal Data Lakehouse: User Journeys from the World's Largest Data Lakehouse Users
Streaming Ingestion at Scale: Kafka to the lakehouse
Scale data ingestion like the world’s most sophisticated data teams, without the engineering burden
Implementing End-to-End CDC to the Universal Data Lakehouse
Learn how to replicate operational databases to the data lakehouse in a way that is easy, fast, and cost-efficient, and that opens your data to multiple downstream engines
The Onehouse Universal Data Lakehouse Demo and Q&A
Learn all about the benefits of the universal data lakehouse architecture and see Onehouse in action for use cases such as Postgres change data capture!
OneTable Introduction and Live Demo
Tired of making tradeoffs between data lake formats? Learn how OneTable opens your data to any - or all - formats including Apache Hudi™, Delta Lake and Apache Iceberg, and see a live demo!
Open Source Data Summit
Join thought leaders from Onehouse, AWS, Confluent, Uber, Walmart, Tesla, Netflix and more as they discuss how open source projects have taken over as the standard for data architectures at companies of all sizes
Hello World! The Onehouse Universal Data Lakehouse™ Demo and Q&A
Join Onehouse Founder and CEO Vinoth Chandar for an overview and demo of the Onehouse platform
Hudi 0.14.0 Deep Dive: Record Level Index
Nadine Farah, an Apache Hudi™ Contributor, and Prashant Wason, the release manager for Hudi 0.14.0, delve deep into the groundbreaking record-level index feature
Deep Dive: Hudi, Iceberg, and Delta Lake
Join Onehouse Head of Product Management Kyle Weller as he discusses the ins and outs of the most popular open source lakehouse projects
White Papers
Apache Hudi: The Definitive Guide
Whether you've been using Hudi for years or you're new to its capabilities, the early-release chapters of this O'Reilly guide will help you build robust, open, and high-performing data lakehouses.
Apache Hudi: From Zero to One
Apache Hudi™ helped power Uber to global leadership. Organizations large and small, from Amazon to Walmart, have joined in, helping to create one of the liveliest and most effective open source projects. Learn about Hudi's storage format, its flexible read and write capabilities, its robust table services, and Hudi Streamer.
NOW Insurance’s Data Journey with Onehouse: Streamlining for Growth
NOW Insurance is a rapidly growing pioneer in the insurtech space. Read about why they chose the Universal Data Lakehouse architecture - and partnered with Onehouse to make it happen, fast.
The Journey to the Universal Data Lakehouse
Apna, Notion, Uber, Walmart, Zoom. What do these companies have in common? Aside from their businesses generating massive volumes of data - at high velocity - all of their teams have chosen the universal data lakehouse as a core component of their data stack and pipelines.
Building a Universal Data Lakehouse
You shouldn’t have to move copies of data around, never knowing which is the real source of truth for different applications such as reporting, AI/ML, data science, and analytics. Learn how the universal data lakehouse architecture is reshaping how businesses like Uber, Walmart, and TikTok handle vast and diverse data with unparalleled efficiency.
Hudi vs. Delta Lake vs. Iceberg Comparison
The data lakehouse is gaining strong interest from organizations looking to build a centralized data platform. Many are struggling to choose between the three popular lakehouse projects: Hudi™, the original data lakehouse developed at Uber; Iceberg, developed at Netflix; and Delta Lake, an open source version of the Databricks lakehouse. Learn about the goals and differences of each project.
Data Sheets
Onehouse Data Integration Guide
Explore Onehouse’s comprehensive guide to discover a wealth of data integration and connector options tailored to your needs.
Introducing the Onehouse Universal Data Lakehouse
Combine the scalability and flexibility of data lakes with the stability and accessibility of data warehouses, and open it to your entire ecosystem.
Onehouse + Confluent = Limitless Real-Time Workloads
Build real-time workloads in minutes to power use cases across your entire ecosystem, including change data capture, analytics, AI and ML, and more.
Events
Demos
Video Overview: Introducing the Onehouse Universal Data Lakehouse
Onehouse builds on the data lakehouse architecture with a universal approach that makes data from all your favorite sources - streams, databases, and cloud storage, for example - available to all the common query engines, languages, and data lakehouse formats your data consumers use every day.
Ingest PostgreSQL CDC data into the lakehouse with Onehouse and Confluent
See how you can replicate Postgres tables into the lakehouse using Onehouse's new Confluent CDC source. This demo showcases the fully automated integration between Onehouse and Confluent, which provisions and manages resources in Confluent to facilitate CDC data ingestion into the lakehouse.
Solution Guides
Bring Your Own Kafka for SQL Server CDC with Onehouse
This guide describes how to implement fully-managed change data capture (CDC) from a SQL Server database to a data lakehouse, using Confluent Cloud’s managed Kafka Connect, Confluent Schema Registry, and Onehouse.
Onehouse Managed Lakehouse Table Optimizer Quick Start
Read this guide to learn how to leverage integrated table services from Onehouse for optimizing read and write performance of Apache Hudi™ tables. Compaction, clustering, and cleaning are supported out of the box for Hudi tables optimized by Onehouse.
Synchronize PostgreSQL and your Lakehouse: CDC with Onehouse on AWS
In this guide, we will show you how to set up the Change Data Capture (CDC) feature of Onehouse to enable continuous synchronization of data from an OLTP database into a data lakehouse. We will be using Amazon RDS PostgreSQL as our example source database.
Integrate Snowflake with Onehouse
This guide helps users integrate the Snowflake Data Cloud with a fully managed data lakehouse from Onehouse.
Integrate Amazon Athena with your Onehouse Lakehouse
This guide shows how to seamlessly integrate Amazon Athena with your Onehouse Managed Lakehouse. This will allow you to power serverless analytics at scale on top of the data in your Lakehouse.
Ingest Data from your DynamoDB Tables into the Lakehouse using Kafka
This guide will show how to ingest data from DynamoDB tables into your Onehouse Managed Lakehouse using Onehouse's deep integration with Kafka.
Cross-region Hudi Disaster Recovery using Savepoints
This guide provides a pattern for creating a cross-region disaster recovery solution for Hudi™ using savepoints - enabling highly resilient lakehouses.
AWS Lake Formation and Onehouse Integration Guide
Gain maximum value from your data lakehouse while ensuring robust security and tailored access control.
Build a SageMaker ML Model on a Onehouse Data Lakehouse
Integrate the Onehouse Universal Data Lakehouse with Amazon SageMaker to build machine learning models in near real-time.
Ingest PostgreSQL CDC Data into the Data Lakehouse using Onehouse
Replicate your operational PostgreSQL database to the Onehouse Universal Data Lakehouse with up-to-the-minute data.
Database Replication into the Lakehouse with Onehouse's Confluent CDC Source
Integrate Confluent and Onehouse to seamlessly replicate operational databases in near real-time.
Workshop
Building an Open Data Lakehouse on AWS S3 with Apache Hudi & Presto
The workshop uses a 10 GB TPC-DS dataset to demonstrate the various read and write capabilities of Hudi™ and Presto. The dataset will be made available at a common S3 location accessible to workshop attendees.
Case Studies
Olameter Harnesses the Power of a Fully Managed Universal Data Lakehouse
Olameter turned to Onehouse to bring years of historical XML data into a fully managed, open data lakehouse. They cut data ingestion and processing times by 10x, powering ML models and adding value to their data pipelines.
NOW Insurance’s Data Journey With Onehouse: Streamlining For Growth
NOW Insurance is a rapidly growing pioneer in the insurtech space. Read about why they chose the Universal Data Lakehouse architecture - and partnered with Onehouse to make it happen, fast.
Overhauling Data Management at Apna
Apna is the largest and fastest-growing site for professional opportunities in India. Read on to learn how they rearchitected their data infrastructure to move from daily batch workloads to near real-time insights about their business while reducing costs.
The Journey To The Universal Data Lakehouse
Apna, Notion, Uber, Walmart, Zoom. What do these companies have in common? Aside from their businesses generating massive volumes of data - at high velocity - all of their teams have chosen the universal data lakehouse as a core component of their data stack and pipelines.