December 11, 2024

Introducing LakeView Insights and New Deployment Models

Written by:

Andy Walner

We are thrilled by the initial adoption of the free LakeView product, with leaders in the Apache Hudi community using the product to analyze, debug, and optimize their data lakehouse deployments. Simply put, if you manage Hudi tables, you should be using LakeView to monitor and optimize your data.

Today we are announcing several new LakeView features that make it easier to get started and that deliver actionable insights for optimizing your Apache Hudi tables!

LakeView Insights

Let’s face it – managing a data platform is stressful when your product and business rely on your data. As a data engineer, you’re constantly monitoring hundreds to thousands of tables, seeking answers to questions like:

“Are my data ingestion volumes higher or lower than expected? Are there spikes or drops in specific tables?”
“How is my ingestion latency, and are any tables lagging more than expected?”
“Are there issues in my storage layout that could lead to poor query performance?”

LakeView now answers these questions directly in your email inbox. Simply upload your table metadata, and you’ll receive regular updates on trends, problems, and optimization opportunities across all your Apache Hudi tables.

New Deployment Models

We have released new deployment models to make it easier than ever to get started with LakeView.

[NEW] Pull Model

🎬 See the Pull Model demo here.

The Pull Model is the easiest way to share your Hudi metadata with LakeView. Simply specify the folder paths for your data lake, and LakeView will generate an IAM template that you can use to grant read access to the Hudi metadata files. LakeView then pulls the latest metadata automatically, so your metrics are always up to date.
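To give a feel for what metadata-only read access can look like on AWS, here is a minimal Python sketch that builds a read-only S3 policy scoped to each table's `.hoodie/` folder, which is where Hudi keeps its table metadata. This is an illustration under our own assumptions, not the template LakeView generates: the function name `hudi_metadata_read_policy` and the example bucket and paths are hypothetical, and the sketch assumes each path you pass in is a Hudi table base path on S3.

```python
import json
from urllib.parse import urlparse


def hudi_metadata_read_policy(table_paths):
    """Build a read-only IAM policy document limited to each table's .hoodie/ folder.

    Hypothetical helper for illustration; not the template LakeView generates.
    Assumes every entry in table_paths is a Hudi table base path on S3.
    """
    object_arns = []
    prefixes_by_bucket = {}
    for path in table_paths:
        parsed = urlparse(path)
        if parsed.scheme not in ("s3", "s3a") or not parsed.netloc:
            raise ValueError(f"expected an s3:// or s3a:// path, got {path!r}")
        bucket = parsed.netloc
        base = parsed.path.strip("/")
        # Hudi stores table metadata under <base path>/.hoodie/
        meta_prefix = f"{base}/.hoodie/" if base else ".hoodie/"
        object_arns.append(f"arn:aws:s3:::{bucket}/{meta_prefix}*")
        prefixes_by_bucket.setdefault(bucket, []).append(f"{meta_prefix}*")

    # Object reads are allowed only for keys under the metadata folders.
    statements = [
        {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": object_arns}
    ]
    # Listing is a bucket-level action, so it is narrowed with a prefix condition.
    for bucket, prefixes in prefixes_by_bucket.items():
        statements.append(
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": prefixes}},
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    # Example bucket and table paths are made up for this sketch.
    policy = hudi_metadata_read_policy(
        ["s3://my-bucket/lake/orders/", "s3://my-bucket/lake/customers/"]
    )
    print(json.dumps(policy, indent=2))
```

Because the policy names only keys under `.hoodie/`, the base data files that sit alongside that folder stay out of reach of the role it is attached to.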

[IMPROVED] Push Model

🎬 See the Push Model demo here.

LakeView will continue to support the Push Model for folks who prefer to push metadata to LakeView on a recurring schedule with a self-managed process. You can still install the metadata extractor as a JAR, Docker image, or Kubernetes Helm package, and run it in any AWS or GCP environment. We’ve also upgraded the LakeView UI to guide you through setting up the Push Model.

[NEW] LakeView SyncTool

🎬 See the LakeView SyncTool demo here.

The LakeView SyncTool makes it easy to upload your Hudi metadata as part of an existing process, such as the Hudi Streamer. This functions similarly to Hudi catalog integrations such as the DataHub SyncTool. Simply install and run a LakeView SyncTool JAR in your existing Hudi jobs to push metadata to LakeView.

Keeping Privacy Front and Center

Data privacy continues to be central to LakeView – in all three deployment models, LakeView analyzes only your Hudi metadata files. Base data files containing records are never accessed and never leave your private cloud. And we’ve added a new security brief to explain how LakeView preserves data privacy.

Try LakeView Today

Thanks to the Apache Hudi community for your continued support and feedback on LakeView. We are excited to continue adding new insights, and look forward to expanding support for additional table formats such as Apache Iceberg and Delta Lake.

If you’re interested in using LakeView for free, sign up here!

Authors

Andy Walner

Product Manager

Andy is a Product Manager at Onehouse, designing the next-generation data platform to power analytics & AI. Before Onehouse, Andy developed ads and MLOps products at Google, and served as the Founding Product Manager for an AI startup backed by Y Combinator. He graduated from the University of Michigan with a degree in Computer Science & Engineering.
