Apache Hudi™: Zero to One

Whether you are a Hudi pro or new to the project, this guide collects the key considerations for taking your data lakehouse to the next level.

Apache Hudi is a revolutionary open-source framework that transforms the way data engineers and data scientists interact with large-scale datasets. Hudi supports database-like capabilities, such as efficient upserts, deletes, and incremental data processing, by creating and managing metadata alongside the files in data lake storage.

Layering these data management capabilities on top of data lake storage produces what is known as a data lakehouse. Read this eBook from Hudi PMC Member Shiyan Xu to learn about the design concepts behind Hudi, key considerations, and the optimizations you can achieve when using Hudi.

Coverage includes:

Hudi's storage format
Versatile reads and writes
Table services: compaction, cleaning, indexing, and clustering
Incremental processing, for extremely efficient updates
Hudi Streamer, for rapid ingestion
Ground-breaking features in the upcoming Hudi 1.0 release