At this year’s Open Source Data Summit, Vinoth Chandar, founder and CEO of Onehouse, originator of the data lakehouse architecture, and a long-time expert in open source data infrastructure, shared insights on the rising trend of “unbundling” data platforms. This shift lets organizations build modular, interoperable data ecosystems tailored to their own needs. Drawing on his experience with data infrastructure during the hyper-growth stages of companies like LinkedIn and Uber, Vinoth discussed the limitations of traditional data systems and the advantages of an open data lakehouse approach, and highlighted companies that have embraced it, such as Walmart and Notion.
The Unbundling Movement in Data Platforms
Unbundling means breaking a product’s independent modules apart into separate products, each specialized in its own function. The pattern has reshaped industries before: broad service platforms such as Craigslist saw individual categories taken over by tailored products, like Zillow for real estate and Upwork for freelancing. Unbundling a data platform, Vinoth explained, “means decoupling storage, compute, query engines, and all the different data tools that you're using to interoperate seamlessly with one another.”
“It's like building your data platform with Lego blocks—each block representing a trend or technology you should consider to create a flexible, custom fit.” - Vinoth Chandar, Founder of Onehouse
This modular approach is a major shift from the traditional, all-in-one “bundled” data warehouse model, which typically groups storage, compute, and query features into a single, proprietary platform. By decoupling these components, data platforms can reduce or eliminate dependence on proprietary vendor offerings, allowing organizations to select the best tools for each use case.
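To make the decoupling concrete, here is a minimal Python sketch, not taken from the talk, in which one copy of a table is written once to shared storage and two independent “engines” query it in place. Everything in it is an illustrative assumption: `write_table`, `sql_engine_total`, and `python_engine_by_region` are hypothetical names, a CSV file in a temporary directory stands in for Parquet files plus table-format metadata on cloud storage, and in-memory SQLite stands in for a real query engine.

```python
import csv
import os
import sqlite3
import tempfile


def write_table(root: str, name: str, rows: list[dict]) -> str:
    """Hypothetical storage layer: write one copy of a table in an open,
    engine-neutral format (CSV stands in for Parquet plus table metadata)."""
    path = os.path.join(root, f"{name}.csv")
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path


def sql_engine_total(path: str) -> float:
    """Hypothetical engine 1: a SQL engine (in-memory SQLite) that reads the
    shared file at query time and keeps no copy of the data afterwards."""
    with open(path, newline="") as f:
        rows = [(r["region"], float(r["amount"])) for r in csv.DictReader(f)]
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE orders (region TEXT, amount REAL)")
        con.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        (total,) = con.execute("SELECT SUM(amount) FROM orders").fetchone()
        return total
    finally:
        con.close()


def python_engine_by_region(path: str) -> dict[str, float]:
    """Hypothetical engine 2: plain Python aggregating the same file in place,
    with no ingestion into a store of its own."""
    totals: dict[str, float] = {}
    with open(path, newline="") as f:
        for r in csv.DictReader(f):
            totals[r["region"]] = totals.get(r["region"], 0.0) + float(r["amount"])
    return totals


# Write the data once, then let two unrelated engines query the same copy.
with tempfile.TemporaryDirectory() as root:
    path = write_table(root, "orders", [
        {"region": "us", "amount": 10.0},
        {"region": "eu", "amount": 5.5},
        {"region": "us", "amount": 4.5},
    ])
    total = sql_engine_total(path)
    by_region = python_engine_by_region(path)

print(total)      # 20.0
print(by_region)  # {'us': 14.5, 'eu': 5.5}
```

Neither engine owns the data, so swapping SQLite for another engine, or adding a third, requires no migration or re-ingestion of the stored table. In a real lakehouse, a table format such as Apache Hudi or Apache Iceberg adds the transactional metadata that makes concurrent reads and writes safe, which this toy does not attempt.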
The Problem With Bundled Data Platforms
But if bundled data platforms worked fine for years, why are they no longer sufficient? Put bluntly, bundled data platforms lock you in.
It wouldn’t be so bad if they were perfect for every use case. Unfortunately, they are not. For example, the data warehouse has decades of optimizations built in for traditional reporting. But if your use case is around real-time analytics, machine learning, or GenAI, you will need a platform optimized for fresher data at a much larger scale.
To cover these use cases, some organizations end up running several bundled data platforms side by side. It is not unusual, for example, to see an engineering or IT organization working with a data warehouse such as Snowflake, a real-time analytics platform such as ClickHouse, a vector database such as Pinecone, and a catch-all data lake on Amazon S3.
“With significant investments in AI today, choosing a data platform that doesn't scale well for vector embeddings and requires constant updating can quickly burn a big hole in your budget.” - Vinoth Chandar, Founder of Onehouse.
In large organizations where each team brings its own query and analytics tools, relying on bundled platforms leads to duplicate pipelines, extra integration work, and significant management overhead. Bundled platforms also carry the risk of vendor lock-in: proprietary solutions can tie an organization to a single query engine and storage platform, making it hard to adopt new tools as needs evolve.
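A back-of-the-envelope model shows why duplicate pipelines pile up. The sketch below is a simplified, worst-case assumption rather than anything presented in the talk: in the bundled case every source is ingested into, and stored by, every platform; in the unbundled case each source is ingested once into shared open tables that all engines read in place. The `Footprint` type and `footprint` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    ingestion_pipelines: int
    data_copies: int


def footprint(sources: int, platforms: int, unbundled: bool) -> Footprint:
    """Simplified, worst-case model (an assumption, not measured data)."""
    if unbundled:
        # Each source is ingested once into shared open tables; every engine
        # queries those tables in place, so adding engines adds no pipelines.
        return Footprint(ingestion_pipelines=sources, data_copies=sources)
    # Bundled: each platform owns its storage, so in the worst case every
    # source is ingested into, and stored by, every platform.
    return Footprint(ingestion_pipelines=sources * platforms,
                     data_copies=sources * platforms)


# Three source systems feeding a four-platform stack (warehouse, real-time
# analytics, vector database, data lake).
bundled = footprint(sources=3, platforms=4, unbundled=False)
unbundled = footprint(sources=3, platforms=4, unbundled=True)
print(bundled)    # Footprint(ingestion_pipelines=12, data_copies=12)
print(unbundled)  # Footprint(ingestion_pipelines=3, data_copies=3)
```

The point of the model is its shape, not the specific numbers: in the bundled case the work grows with the product of sources and platforms, while in the unbundled case it grows only with the number of sources.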
An open, unbundled platform lets teams choose among open table formats (e.g., Apache Hudi, Apache Iceberg) and compute environments (e.g., Kubernetes), so organizations can adopt new engines or frameworks as they become available and as future offerings require them.
Blueprint for the Next Generation of Data Platforms
Vinoth outlined a blueprint for an unbundled internal data platform with several defining goals:
Building an Unbundled Data Platform
The flexibility of an unbundled data platform allows companies to build custom data infrastructure, combining tools based on specific needs and optimizing cost and performance without heavy dependence on any one vendor. Vinoth broke down the specific components that make up an unbundled platform, from storage to the analytics engine.
Unbundling in the Real World
Several prominent companies illustrate the benefits of unbundling data platforms through the use of an open data lakehouse architecture:
The Path Forward: Creating Future-Ready Data Platforms
The future of data platforms will see more support for unstructured data and catalog interoperability. Onehouse, for example, is developing efficient data storage formats optimized for new AI needs. “We need to blend and add a lot of support for unstructured data formats. And I think that'll complete the picture and make this layer, the storage layer, support all kinds of data,” Vinoth explained.
Vinoth encouraged attendees to embrace open data lakehouses for the next generation of their data platform.
“If you're considering your next data platform, build it on an open data lakehouse. Unbundle your storage from any one engine. Engines will change all the time. Provide a modular architecture where the single source of truth on cloud storage is accessible from any engine that you may need now or in the future.” - Vinoth Chandar, Founder of Onehouse.
The shift to unbundling is about creating a foundation that supports innovation and interoperability without sacrificing flexibility or incurring unnecessary costs. With unbundled platforms, companies can take advantage of best-in-class tools for each of their diverse use cases, while maintaining the adaptability to evolve with the data landscape. “It's very important to invest the time early on,” Vinoth emphasized, “so we can advance the technology, building the data platform without going through peaks and troughs.”