Practical lakehouse architecture

This concise yet comprehensive guide explains how to adopt a data lakehouse architecture to implement modern data platforms. It reviews the design considerations, challenges, and best practices for implementing a lakehouse and provides key insights into the ways that using a lakehouse can impact you...

Bibliographic Details
Main Author: Thalpati, Gaurav Ashok
Format: eBook
Language: English
Published: Sebastopol, CA: O'Reilly Media, Inc., 2024
Edition: First edition
Collection: O'Reilly - Collection details see MPG.ReNa
Table of Contents:
  • Intro
  • Copyright
  • Table of Contents
  • Preface
  • Who Should Read This Book?
  • Why I Wrote This Book
  • Navigating This Book
  • O'Reilly Online Learning
  • Conventions Used in This Book
  • How to Contact Us
  • Acknowledgments
  • Chapter 1. Introduction to Lakehouse Architecture
  • Understanding Data Architecture
  • What Is Data Architecture?
  • How Does Data Architecture Help Build a Data Platform?
  • Core Components of a Data Platform
  • Why Do We Need a New Data Architecture?
  • Lakehouse Architecture: A New Pattern
  • The Lakehouse: Best of Both Worlds
  • Administration and Management
  • Business Outcomes
  • Lakehouse Architecture: The Default Choice for Future Data Platforms?
  • Key Takeaways
  • References
  • Chapter 3. Storage: The Heart of the Lakehouse
  • Lakehouse Storage: Key Concepts
  • Row Versus Columnar Storage
  • Storage-based Performance Optimization
  • Lakehouse Storage Components
  • Cloud Object Storage
  • File Formats
  • Table Formats
  • Key Design Considerations
  • Ecosystem Support
  • Community Support
  • Supported File Formats
  • Supported Compute Engines
  • Supported Features
  • Commercial Product Support
  • Understanding Lakehouse Architecture
  • Lakehouse Architecture Characteristics
  • Lakehouse Architecture Benefits
  • Key Takeaways
  • References
  • Chapter 2. Traditional Architectures and Modern Data Platforms
  • Traditional Architectures: Data Lakes and Data Warehouses
  • Data Warehouse Fundamentals
  • Data Lake Fundamentals
  • Modern Data Platforms
  • Finding Answers in the Cloud
  • Standalone Approach
  • Combined Approach
  • Expectations of Modern Data Platforms
  • Comparison: Data Warehouse, Data Lake, Lakehouse
  • Capabilities and Limitations
  • Implementation Activities
  • Implementing a Data Catalog: Key Design Considerations and Options
  • Using Hive metastore
  • Using AWS Services
  • Using Azure Services
  • Using GCP Services
  • Using Databricks
  • Key Takeaways
  • References
  • Chapter 5. Compute Engines for Lakehouse Architectures
  • Data Computation Benefits of Lakehouse Architecture
  • Independent Scaling
  • Cross-region, Cross-account Access
  • Unified Batch and Real-Time Processing
  • Enhanced BI Performance
  • Freedom to Choose Different Engine Types
  • Cross-zone Analysis
  • Compute Engine Options for Lakehouse Platforms
  • Open Source Tools
  • Current and Future Versions
  • Performance Benchmarking
  • Comparisons
  • Sharing Features
  • Key Takeaways
  • References
  • Chapter 4. Data Catalogs
  • Understanding Metadata
  • Technical Metadata
  • Business Metadata
  • How Metastores and Data Catalogs Work Together
  • Features of a Data Catalog
  • Search, Explore, and Discover Data
  • Data Classification
  • Data Governance and Security
  • Data Lineage
  • Unified Data Catalog
  • Challenges of Siloed Metadata Management
  • What Is a Unified Data Catalog?
  • Benefits of a Unified Data Catalog