Apache Parquet
Entry is same as: Parquet
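Apache Parquet is an open source file format that stores data by column rather than by row. Because an analytics query usually touches only a few columns of a wide table, a columnar layout lets the query engine read just those columns and skip the rest. This is in contrast to row-based formats, such as Avro, which keep all the fields of a record together and are therefore better suited to record-keeping and transactional workloads.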
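The difference between the two layouts can be illustrated with a minimal sketch in plain Python. This is not how Parquet is implemented or encoded on disk; the field names (user_id, country, amount) and the to_columns helper are made up for illustration only.

```python
# Toy records in row-based form; the field names are hypothetical.
rows = [
    {"user_id": 1, "country": "US", "amount": 20.0},
    {"user_id": 2, "country": "DE", "amount": 35.5},
    {"user_id": 3, "country": "US", "amount": 12.25},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: summing one field still walks every whole record.
total_from_rows = sum(record["amount"] for record in rows)

# Column layout: the same sum reads one list and ignores the other fields.
total_from_columns = sum(columns["amount"])
```

Both totals are the same; what changes is how much unrelated data has to be visited to compute them.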
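Columnar file formats such as Parquet usually compress well, because the values in a column share a data type and often repeat. This makes encodings such as dictionary encoding and run-length encoding effective, and general-purpose compression can be applied on top of them, often yielding large savings in file size. Parquet files also contain metadata, such as the minimum and maximum values of each column within each group of rows (a row group), so that a query engine can skip row groups that cannot contain matching data.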
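How minimum and maximum metadata helps a query can be sketched as follows. This is a simplified model under stated assumptions, not Parquet's actual reader logic: the RowGroup class, build_row_groups, and scan_greater_than are hypothetical names, and a single integer column stands in for a real file.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    """One chunk of a column plus its min/max statistics (toy model)."""
    values: list
    minimum: int
    maximum: int


def build_row_groups(values, group_size):
    """Split a column into fixed-size groups and record min/max for each."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def scan_greater_than(groups, threshold):
    """Return values > threshold and how many groups had to be read."""
    matches, groups_read = [], 0
    for group in groups:
        if group.maximum <= threshold:
            continue  # statistics prove no value here can match; skip it
        groups_read += 1
        matches.extend(v for v in group.values if v > threshold)
    return matches, groups_read


groups = build_row_groups(list(range(1, 13)), group_size=4)
matches, groups_read = scan_greater_than(groups, threshold=9)
```

In this sketch only one of the three groups is actually scanned; the other two are ruled out from their statistics alone.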
The major data lakehouse projects, including Apache Hudi, Apache Iceberg, and Delta Lake, use Parquet as a primary file format, because smaller files reduce storage and I/O costs and columnar files suit many analytical queries.
Related terms: Avro; data lakehouse; metadata