Practical lakehouse architecture
This concise yet comprehensive guide explains how to adopt a data lakehouse architecture to implement modern data platforms. It reviews the design considerations, challenges, and best practices for implementing a lakehouse and provides key insights into the ways that using a lakehouse can impact you...
Main Author: | |
---|---|
Format: | eBook |
Language: | English |
Published: | Sebastopol, CA: O'Reilly Media, Inc., 2024 |
Edition: | First edition |
Subjects: | |
Online Access: | |
Collection: | O'Reilly - Collection details see MPG.ReNa |
Table of Contents:
- Intro
- Copyright
- Table of Contents
- Preface
- Who Should Read This Book?
- Why I Wrote This Book
- Navigating This Book
- O'Reilly Online Learning
- Conventions Used in This Book
- How to Contact Us
- Acknowledgments
- Chapter 1. Introduction to Lakehouse Architecture
- Understanding Data Architecture
- What Is Data Architecture?
- How Does Data Architecture Help Build a Data Platform?
- Core Components of a Data Platform
- Why Do We Need a New Data Architecture?
- Lakehouse Architecture: A New Pattern
- The Lakehouse: Best of Both Worlds
- Understanding Lakehouse Architecture
- Lakehouse Architecture Characteristics
- Lakehouse Architecture Benefits
- Key Takeaways
- References
- Chapter 2. Traditional Architectures and Modern Data Platforms
- Traditional Architectures: Data Lakes and Data Warehouses
- Data Warehouse Fundamentals
- Data Lake Fundamentals
- Modern Data Platforms
- Finding Answers in the Cloud
- Standalone Approach
- Combined Approach
- Expectations of Modern Data Platforms
- Comparison: Data Warehouse, Data Lake, Lakehouse
- Capabilities and Limitations
- Implementation Activities
- Administration and Management
- Business Outcomes
- Lakehouse Architecture: The Default Choice for Future Data Platforms?
- Key Takeaways
- References
- Chapter 3. Storage: The Heart of the Lakehouse
- Lakehouse Storage: Key Concepts
- Row Versus Columnar Storage
- Storage-based Performance Optimization
- Lakehouse Storage Components
- Cloud Object Storage
- File Formats
- Table Formats
- Key Design Considerations
- Ecosystem Support
- Community Support
- Supported File Formats
- Supported Compute Engines
- Supported Features
- Commercial Product Support
- Current and Future Versions
- Performance Benchmarking
- Comparisons
- Sharing Features
- Key Takeaways
- References
- Chapter 4. Data Catalogs
- Understanding Metadata
- Technical Metadata
- Business Metadata
- How Metastores and Data Catalogs Work Together
- Features of a Data Catalog
- Search, Explore, and Discover Data
- Data Classification
- Data Governance and Security
- Data Lineage
- Unified Data Catalog
- Challenges of Siloed Metadata Management
- What Is a Unified Data Catalog?
- Benefits of a Unified Data Catalog
- Implementing a Data Catalog: Key Design Considerations and Options
- Using Hive metastore
- Using AWS Services
- Using Azure Services
- Using GCP Services
- Using Databricks
- Key Takeaways
- References
- Chapter 5. Compute Engines for Lakehouse Architectures
- Data Computation Benefits of Lakehouse Architecture
- Independent Scaling
- Cross-region, Cross-account Access
- Unified Batch and Real-Time Processing
- Enhanced BI Performance
- Freedom to Choose Different Engine Types
- Cross-zone Analysis
- Compute Engine Options for Lakehouse Platforms
- Open Source Tools