Engineering Lakehouses with Open Table Formats

£26.99

Build scalable and efficient lakehouses with Apache Iceberg, Apache Hudi, and Delta Lake

Authors: Dipankar Mazumdar, Vinoth Govindarajan

Language: English

Published by: Packt Publishing

Published on: 26th December 2025

Format: LCP-protected ePub

ISBN: 9781836207221


Jump-start your journey toward mastering open data architectural patterns by learning the fundamentals and applications of open table formats

Key Features

Build lakehouses with open table formats using compute engines such as Apache Spark, Flink, and Trino, as well as Python-based tools

Optimize lakehouses with techniques such as pruning, partitioning, compaction, indexing, and clustering

Find out how to enable seamless integration, data management, and interoperability using Apache XTable

Purchase of the print or Kindle book includes a free PDF eBook

Book Description

Engineering Lakehouses with Open Table Formats provides detailed insights into lakehouse concepts and dives deep into the practical implementation of open table formats such as Apache Iceberg, Apache Hudi, and Delta Lake. You’ll explore the internals of a table format and learn in detail about the transactional capabilities of lakehouses. You’ll also get hands-on with each table format through exercises using popular compute engines, such as Apache Spark, Flink, and Trino, as well as Python-based tools. The book addresses advanced topics, including performance optimization techniques and interoperability among different formats, equipping you to build production-ready lakehouses. With step-by-step explanations, you’ll get to grips with the key components of lakehouse architecture and learn how to build, maintain, and optimize them. By the end of this book, you’ll be proficient in evaluating and implementing open table formats, optimizing lakehouse performance, and applying these concepts to real-world scenarios, so that you can make informed decisions when selecting the right architecture for your organization’s data needs.

What you will learn

Explore lakehouse fundamentals, such as table formats, file formats, compute engines, and catalogs

Gain a complete understanding of data lifecycle management in lakehouses

Learn how to systematically evaluate and choose the right lakehouse table format

Optimize performance with sorting, clustering, and indexing techniques

Use data stored in open table formats with ML frameworks and tools such as TensorFlow and MLflow

Interoperate across different table formats with Apache XTable and UniForm

Secure your lakehouse with access controls and ensure regulatory compliance

Who this book is for

This book is for data engineers, software engineers, and data architects who want to deepen their understanding of open table formats, such as Apache Iceberg, Apache Hudi, and Delta Lake, and see how they are used to build lakehouses. It is also valuable for professionals working with traditional data warehouses, relational databases, and data lakes who wish to transition to an open data architectural pattern. Basic knowledge of databases, Python, Apache Spark, Java, and SQL is recommended for a smooth learning experience.