Download Early Release Chapters
Apache Hudi™: The Definitive Guide
Whether you’ve been using Hudi for years or are new to its capabilities, this guide will help you build robust, open, and high-performing data lakehouses.

Apache Hudi™ enables you to create and manage a data lakehouse, bringing database-like capabilities (including efficient upserts, deletions, and incremental data processing) to the data lake. It is a revolutionary open source framework that transforms the way data engineers and data scientists interact with large-scale datasets.
These capabilities are implemented using metadata kept alongside data lake file storage. Read these early release chapters from the upcoming O’Reilly book to learn what Apache Hudi is (chapter 1 in the full book), how to get started with Hudi (chapter 2), how to write to Hudi (chapter 3), how to read from Hudi efficiently (chapter 4), and how to use Hudi Streamer for data ingestion (chapter 8).
Coverage in the full ebook includes:
- How to write to Hudi
- Distributed query engines and the query lifecycle
- Snapshot and time travel queries
- Incremental queries in latest-state and change-data-capture modes