Today we are announcing that our multi-catalog synchronization feature now integrates with the Snowflake catalog, Databricks Unity Catalog, and Google Data Catalog. With the world’s most open data lakehouse - what we call the Universal Data Lakehouse - Onehouse is making it possible for anyone to query a single copy of data from almost any cloud query engine of their choice.
With this announcement, the Onehouse multi-catalog synchronization feature now syncs table metadata to all the following catalogs:
This news has made, well, the news; Datanami has a solid write-up, posted this morning.
Onehouse is dedicated to making data open. Many of today’s popular data platforms are fully integrated analytics systems that lock user data into their systems for data storage, management, processing, and querying. Cloud data warehouses are good examples of these integrated systems. Snowflake, for instance, was built from the ground up to tightly integrate storage, management, and processing with its SQL layer.
Every day, customers and prospects share with us their desire to avoid lock-in to individual tools and platforms. More importantly, customers are increasingly investing in data use cases outside of traditional business analytics and reporting. For example, SiliconAngle highlighted last year that nearly 40% of Snowflake customers are also running Databricks, and nearly 50% of Databricks customers are also Snowflake customers. This overlap indicates that a significant number of organizations are ultimately duplicating their data across multiple proprietary platforms, with all the associated governance and maintenance burdens.
“The increasing overlap between Snowflake and Databricks can be seen as a response to these companies’ realization that to extract maximum value from their data, they need to address both business intelligence and AI/ML workloads,” Dave Vellante and George Gilbert wrote at the time.
To enable best-of-breed data tools for such diverse data workloads, organizations need an open data platform that seamlessly interoperates with different data formats, catalogs, and compute engines. With an open platform, they can unlock universal data access from any of the many new and emerging downstream engines, while avoiding painful proprietary data silos. They can reuse data across use cases and processing frameworks without duplicating it, and they can migrate in and out of commercial offerings. Imagine using Snowflake for BI and reporting, Databricks for AI/ML, and Google Cloud Platform for data engineering - all on a single copy of data that is transformed, managed, and optimized in a single “source of truth.”
Without this approach, governing and maintaining pipelines and silos across multiple platforms becomes very complex.
Open data architectures require the use of open table formats such as Apache Hudi, Apache Iceberg, or Delta Lake. These open table formats, along with open file formats such as Apache Parquet, are what free data from proprietary storage formats and allow the use of multiple query engines.
Yet, while open formats are necessary, they are not sufficient:
Multi-catalog synchronization is an ideal solution to complement the open format, open data services, and catalog interoperability principles on which Onehouse is built. In short, it makes it simple for Onehouse users to set up a data pipeline once, while making the data from that pipeline accessible across a number of query engines. This is a significant differentiator against other ETL/ELT solutions, which often integrate with only a single catalog, limiting the data owner's choices.
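To make the idea concrete, here is a minimal conceptual sketch in Python of what "set up once, sync to many catalogs" means. It is not the Onehouse API: the `TableMetadata` and `InMemoryCatalog` classes, the `sync_to_catalogs` function, and the storage path are all hypothetical names invented for illustration, and real catalogs are reached through their own interfaces. The point it illustrates is only that each catalog receives a pointer to the same table, while the data itself is stored once.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TableMetadata:
    """Metadata for one table whose data lives once in cloud storage (hypothetical)."""
    name: str
    location: str       # hypothetical storage path for the single copy of data
    table_format: str   # e.g. "hudi", "iceberg", or "delta"
    schema: tuple       # (column_name, type) pairs


@dataclass
class InMemoryCatalog:
    """Stand-in for a real catalog; an assumption made purely for illustration."""
    name: str
    tables: dict = field(default_factory=dict)

    def register(self, meta: TableMetadata) -> None:
        # Only the metadata is recorded here; no data files are copied.
        self.tables[meta.name] = meta


def sync_to_catalogs(meta, catalogs):
    """Register the same table metadata in every catalog and return the names synced."""
    synced = []
    for catalog in catalogs:
        catalog.register(meta)
        synced.append(catalog.name)
    return synced


orders = TableMetadata(
    name="orders",
    location="s3://example-bucket/lake/orders",  # hypothetical path
    table_format="hudi",
    schema=(("order_id", "string"), ("amount", "double")),
)
catalogs = [
    InMemoryCatalog("snowflake"),
    InMemoryCatalog("unity_catalog"),
    InMemoryCatalog("google_data_catalog"),
]
synced = sync_to_catalogs(orders, catalogs)
```

After the call, every catalog in the sketch holds an entry for `orders` that points at the same storage location, which is the property that lets different query engines read one copy of the data.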
Once you onboard to Onehouse, you can leverage the Multi-Catalog Sync Tool with only a few clicks.
Hopefully it’s clear that Onehouse is committed to open data across all three critical pillars: format interoperability, catalog interoperability, and open data services. And we are committed to making an open data architecture widely available and easy to use. Working with data should be about creating insights - not about padding vendors’ pockets. Interested in experiencing an open alternative to vendor lock-in? Give Onehouse a try.