Apache Hudi: Zero to One

Whether you are a Hudi pro or new to the project, this guide collects important considerations to help you take your data lakehouse to the next level.

Apache Hudi is a revolutionary open source framework that transforms the way data engineers and data scientists work with large-scale datasets. Hudi brings database-like capabilities, such as efficient upserts, deletes, and incremental data processing, to the data lake by creating and managing metadata alongside the data files in lake storage.

Layering these cutting-edge data management capabilities on top of data lake storage produces what is known as a data lakehouse. Read this eBook from Hudi PMC member Shiyan Xu to learn about the design concepts behind Hudi, key considerations for using it, and the optimizations it makes possible.

Coverage includes:

  • Hudi's storage format
  • Versatile reads and writes
  • Table services: compaction, cleaning, indexing, and clustering
  • Incremental processing, for extremely efficient updates
  • Hudi Streamer, for rapid ingestion
  • Ground-breaking features in the upcoming Hudi 1.0 release