Download Early Release Chapters

Apache Hudi™: The Definitive Guide

Whether you've been using Hudi for years, or you’re new to Hudi’s capabilities, this guide will help you build robust, open, and high-performing data lakehouses.

Apache Hudi™ enables you to create and manage a data lakehouse, bringing database-like capabilities (including efficient upserts, deletions, and incremental data processing) to the data lake. It is a revolutionary open source framework that transforms the way data engineers and data scientists interact with large-scale datasets.

These capabilities are implemented using metadata stored alongside the data lake's files. Read these early release chapters from the upcoming O’Reilly book to learn what Apache Hudi is (chapter 1 in the full book), how to get started with Hudi (chapter 2), how to write to Hudi (chapter 3), how to read from Hudi efficiently (chapter 4), and how to use Hudi Streamer for data ingestion (chapter 8).

Coverage in the full ebook includes:

  • How to write to Hudi
  • Distributed query engines and the query lifecycle
  • Snapshot and time travel queries
  • Incremental queries in latest-state and change-data-capture modes