November 16, 2023

The First Open Source Data Summit is a Hit!

Written by:

Floyd Smith

We are all still abuzz from the first-ever Open Source Data Summit, a live virtual event that attracted thousands of registrants from around the world. More than 30 speakers covered the past, present, and future use of open source in innovating with, and extracting value from, the ever-growing tsunami of data that is sweeping across – and increasingly helping to define – our world, from run-of-the-mill business reporting to the latest innovations in generative AI.

*Open Source Data Summit was more robust than your typical 1.0 release.*

As the founding co-chair of the Summit, we here at Onehouse believe that open source software will increasingly power the core infrastructure on which next-generation data services are built. The world in which data engineers, data scientists, data analysts, and many others do their work will be an increasingly open world, and we helped launch this new event to make that happen faster and better. As part of our shared commitment to this proposition, the Summit starred many open source leaders, committers, and contributors.

Don’t worry if you missed the conference live; recordings are coming soon at https://opensourcedatasummit.com! On the day, up to three breakout sessions ran at a time, so no one could possibly have seen everything live. All the sessions are available online for you to watch at your own pace.

Kicking Off with an Overview of OSS

Onehouse Founder and CEO Vinoth Chandar led off the day with an overview of the role of open source in data infrastructure. “As someone who fumbled his way through building Uber's early data infrastructure…,” he shared, and brought the audience with him through a description of the growing role of OSS. Visit Vinoth's keynote.

*Onehouse is strongly involved in promoting open data software and services.*

Gathering at the OneTable

The Summit saw the first public discussion of OneTable, a recently open sourced project that allows omni-directional interoperability across Apache Hudi, Apache Iceberg, and Delta Lake. (For more on lakehouse projects, visit our comparison blog post.) OneTable was built and is currently co-owned across a partnership of Onehouse, Microsoft and Google. Click to view the OneTable panel discussion.

Journalists had the opportunity to interview senior leaders of these companies. Here is some of what they had to say, as quoted in the VentureBeat article:

Vinoth Chandar, CEO Onehouse

"Throughout this year, we’ve been working with our customers as well as with Google and Microsoft and a bunch of different folks to broaden the idea and bring more form and shape to it."

Raghu Ramakrishnan, CTO Azure Data

"Ultimately, my real hope here is that together, we can create an ecosystem where customers can go to whatever is the best solution without being shackled by the underlying data."

Gerrit Kazmaier, VP/GM Analytics Google

“There are free and open formats like Iceberg, but then there may be other workloads running that depend on a different format that is not your chosen primary file format. That’s where OneTable helps; it’s kind of like a Babelfish.”

The fan favorite live quote during the OSDS session was from Tim Brown at Onehouse:

“You don’t want to be left wondering what your life would have been if you had chosen the other format.”

*OneTable brings all the major data lakehouse formats to one, well, table.*

What a Panel!

The leadership panel assembled open source pioneers from – wait for it – Confluent, Google, LinkedIn, Microsoft, Onehouse, Starburst, and Uber. Chaired by Onehouse CEO Vinoth Chandar, the panel discussed “The Growing Role of Open Source Technology in Today’s Data Architectures.” Click to visit the leadership panel.

A few key quotes:

Raghu Ramakrishnan, CTO for Data at Microsoft:
“Open source is the best way we have of creating standards. Customers love it because it insulates them from vendor lock-in.”
Kapil Surlaker, VP Engineering at LinkedIn:
“As an engineer, you benefit (on OSS projects) from exposure to a broader community.”
Praveen Neppalli Naga, VP of Engineering & Data Science at Uber:
“During the financial crisis of 2009, we could hardly keep the (LinkedIn) site up… that’s how Jay came up with Kafka.”
Justin Borgman, Chairman and CEO at Starburst:
“A lot of customers end up using data lakes to store most of their data, using Hudi, for example. We end up creating an alternative data warehouse.”
Jay Kreps, Co-Founder and CEO at Confluent:
In evaluating what to build vs. buy, “Think about where your best engineers are spending their time.”
Justin Levandoski, Director of Engineering at Google:
“BigQuery started its life as the original query engine for Google. We started hearing from customers that they wanted to build around open source formats.”

Near the end of the discussion, Raghu Ramakrishnan shared a conclusion: “We are evolving as an industry toward a risk architecture for a unified analytic portal.” We believe that people will be playing back this session on repeat for a long time to come.

Link to session coming soon.

Previewing Apache Hudi 1.0

Apache Hudi, launched at Uber in 2016, is growing up – soon to reach the 1.0 milestone. Bhavani Sudha Saktheeswaran (widely known as Sudha) and Sagar Sumit, software engineers at Onehouse, led a session titled “Apache Hudi 1.0 preview: A database experience on the data lake.”

As one attendee put it in the comments, “Some of these terms / features are only associated with databases and have never been heard of before in the data lakes / lakehouse ecosystem. Hudi… leading the pack in such innovations!” Click to visit the panel discussion.

*No, it’s not the Brady Bunch - it’s a selection of speakers from Open Source Data Summit.*

From Giant Companies to Up-and-Comers

Including the digital leaders in the opening panel, the mega-companies (well) represented by presenters and panelists were Amazon, Google, Intuit, Microsoft, Netflix, Tesla, Uber, and Walmart. Our stellar speaker line-up also included data folk from up-and-coming and established companies including Acryl Data, Apna, Confluent, DataStax, Eastern Bank, InfluxData, JobTarget, Lyra, Quix, Robinhood, Starburst, Tecton, and Wayfair.

And special thanks to our fellow sponsors: Acryl Data, ClickHouse, DataStax, InfluxData, Starburst, and Tecton.

*And now for a few words from our sponsors.*

Implementation - with Less Lamentation

Several sessions focused on putting the lakehouse to work; click to view a session:

Scaling and governing Robinhood's data lakehouse. Speakers: Balaji Varadarajan, Senior Staff Software Engineer @ Robinhood and Pritam Dey, Technical Lead @ Robinhood. Financial services companies are well-known for tech innovation, and Robinhood is a lively and current example.
Enabling Walmart's data lakehouse with Apache Hudi. Speakers: Ankur Ranjan, Data Engineer III @ Walmart and Ayush Bijawat, Senior Data Engineer @ Walmart. This Fortune 1 company has a big open source Hudi installation? Heck yeah. As Walmart founder Sam Walton famously said, in the true spirit of open source, “One person seeking glory doesn't accomplish very much.”
Overhauling data management at Apna. Speakers: Sarfaraz Hussain, Senior Data Engineer @ Apna and Ronak Shah, Head of Data & Product @ Apna. Onehouse customer Apna transformed their data management operations (and a lot of data) with the help of Onehouse, creating a whole new data platform. As Apna describes, “Onehouse automatically streams raw data into your Hudi files on Amazon S3 or Google Cloud storage. Now is the time to make it queryable for downstream users. Then we make a silver table, and we use Onehouse streams to do that.”

*Apna puts the Onehouse data-lakehouse-as-a-service at the center of their data universe.*

Pushing Forward at the Cutting Edge

Several sessions concerned pushing technology, and the community, forward; click to view a session:

A petabyte-scale vector store for generative AI. Speaker: Patrick McFadin, VP Developer Relations @ DataStax. Ever wonder how those GPT things work? Now’s your chance to peek behind the curtain at the data infrastructure that powers these new wonders.
Diving into Uber's cutting-edge data infrastructure. Speaker: Girish Baliga, Director of Engineering @ Uber. A fresh voice from where the data lakehouse began, back in the mega-growth days of Uber, tells how the company puts it all together.
Panel: A discussion on batch vs. streaming vs. real-time data processing. Speakers: Eric Gonzalez, VP, Business Intelligence Architecture @ Eastern Bank; Vaishnavi Muralidhar, Data Engineer @ Intuit; Michael Del Balso, CEO & Co-Founder @ Tecton. Three speakers with three critically different perspectives on how to move data around, including a focus on cost.
Panel: A discussion about contributing to open source projects. Speakers: Nadine Farah, Head of Developer Relations @ Onehouse; Manfred Moser, Trino Contributor; David Anderson, Apache Flink Committer; Bhavani Sudha Saktheeswaran, Apache Hudi PMC. This event gave back to the community in many ways, not least with this insightful session on how to contribute to open source projects. Manfred Moser described the joy of bringing a new contributor into an open source project: “You see the glow on their face when they get their first pull request merged.” And Bhavani Sudha Saktheeswaran, speaking from her long experience on Hudi, added, “Hudi gives you the freedom to explore.”

*Start here to contribute to open source!*

Conclusion

Creating and helping to put on an event like this is always a ton of work – and several tons of fun. The event organizers, Solution Monday, expertly walked all involved through the process. You can visit Open Source Data Summit, view Solution Monday's previous events, or reach out by email at astronaut@solutionmonday.com to participate in their future events.

Now that we've closed on this exciting day, we expect a raft of questions about our managed service offering, also called Onehouse. Onehouse is powered by Apache Hudi, and amplified by OneTable. If you’d like to know more, visit our website or contact us.
