MPG.eBooks - Table of Contents: Delta Lake: up and running


Delta Lake: up and running: modern data Lakehouse architectures with Delta Lake

With the surge in big data and AI, organizations can rapidly create data products. However, the effectiveness of their analytics and machine learning models depends on the data's quality. Delta Lake's open source format offers a robust lakehouse framework over platforms like Amazon S3, ADL...


Bibliographic Details
Main Authors:	Haelen, Bennie; Davis, Dan (Author)
Format:	eBook
Language:	English
Published:	Sebastopol, California: O'Reilly Media, Inc., 2023
Edition:	First edition
Subjects:	Computer Network Architectures / http://id.loc.gov/authorities/subjects/sh86007468; Cloud Computing / fast; Storage Area Networks (Computer Networks) / http://id.loc.gov/authorities/subjects/sh2001003093; Réseaux d'ordinateurs / Architectures; Storage Area Networks (Computer Networks) / fast; Réseaux de stockage (informatique); Computer Network Architectures / fast; Infonuagique; Cloud Computing / http://id.loc.gov/authorities/subjects/sh2008004883
Online Access:	https://learning.oreilly.com/library/view/~/978109...
Collection:	O'Reilly - Collection details see MPG.ReNa

Table of Contents:

Intro
Copyright
Table of Contents
Preface
How to Contact Us
Conventions Used in This Book
Using Code Examples
O'Reilly Online Learning
Acknowledgment
Chapter 1. The Evolution of Data Architectures
A Brief History of Relational Databases
Data Warehouses
Data Warehouse Architecture
Dimensional Modeling
Data Warehouse Benefits and Challenges
Introducing Data Lakes
Data Lakehouse
Data Lakehouse Benefits
Implementing a Lakehouse
Delta Lake
The Medallion Architecture
The Delta Ecosystem
Delta Lake Storage
Delta Sharing
Delta Connectors
Conclusion
Chapter 2. Getting Started with Delta Lake
Getting a Standard Spark Image
Using Delta Lake with PySpark
Running Delta Lake in the Spark Scala Shell
Running Delta Lake on Databricks
Creating and Running a Spark Program: helloDeltaLake
The Delta Lake Format
Parquet Files
Writing a Delta Table
The Delta Lake Transaction Log
How the Transaction Log Implements Atomicity
Breaking Down Transactions into Atomic Commits
The Transaction Log at the File Level
Scaling Massive Metadata
Conclusion
Chapter 3. Basic Operations on Delta Tables
Creating a Delta Table
Creating a Delta Table with SQL DDL
The DESCRIBE Statement
Creating Delta Tables with the DataFrameWriter API
Creating a Delta Table with the DeltaTableBuilder API
Generated Columns
Reading a Delta Table
Reading a Delta Table with SQL
Reading a Table with PySpark
Writing to a Delta Table
Cleaning Out the YellowTaxis Table
Inserting Data with SQL INSERT
Appending a DataFrame to a Table
Using the OverWrite Mode When Writing to a Delta Table
Inserting Data with the SQL COPY INTO Command
Partitions
User-Defined Metadata
Using SparkSession to Set Custom Metadata
Using the DataFrameWriter to Set Custom Metadata
Conclusion
Chapter 4. Table Deletes, Updates, and Merges
Deleting Data from a Delta Table
Table Creation and DESCRIBE HISTORY
Performing the DELETE Operation
DELETE Performance Tuning Tips
Updating Data in a Table
Use Case Description
Updating Data in a Table
UPDATE Performance Tuning Tips
Upsert Data Using the MERGE Operation
Use Case Description
The MERGE Dataset
The MERGE Statement
Analyzing the MERGE Operation with DESCRIBE HISTORY
Inner Workings of the MERGE Operation
Conclusion
Chapter 5. Performance Tuning
Data Skipping
Partitioning
Partitioning Warnings and Considerations
Compact Files
Compaction
OPTIMIZE
ZORDER BY
ZORDER BY Considerations
Liquid Clustering
Enabling Liquid Clustering
Operations on Clustered Columns
Liquid Clustering Warnings and Considerations
Conclusion
Chapter 6. Using Time Travel
Delta Lake Time Travel
Restoring a Table
Restoring via Timestamp
Time Travel Under the Hood
