Download Early Release Chapters
Apache Hudi™: The Definitive Guide
Whether you've been using Hudi for years, or you’re new to Hudi’s capabilities, this guide will help you build robust, open, and high-performing data lakehouses.
Apache Hudi™ enables you to create and manage a data lakehouse, bringing database-like capabilities (including efficient upserts, deletions, and incremental data processing) to the data lake. It is a revolutionary open source framework that transforms the way data engineers and data scientists interact with large-scale datasets.
These capabilities are implemented using metadata alongside data lake file storage. Read these two early release chapters from the upcoming O’Reilly book to learn how to efficiently read from Hudi (Chapter 4 in the full book) and how to use Hudi Streamer for data ingestion (Chapter 8 in the full book).
Coverage in the full ebook includes:
- How to write to Hudi
- Distributed query engines and the query lifecycle
- Snapshot and time travel queries
- Incremental queries in latest-state and change-data-capture modes
Like what you read? Give Onehouse a test drive.