From flat files to deconstructed databases: the evolution and future of the big data ecosystem

Bibliographic Details
Main Author: Le Dem, Julien
Format: eBook
Language: English
Published: [Place of publication not identified]: O'Reilly Media, 2019
Collection: O'Reilly - Collection details see MPG.ReNa
Description
Summary: "Julien Le Dem (WeWork) discusses the key open source components of the big data ecosystem--including Apache Calcite, Parquet, Arrow, Avro, and Kafka as well as batch and streaming systems--and explains how they relate to each other and how they make the ecosystem more of a database and less of a filesystem. (Parquet is the columnar data layout to optimize data at rest for querying. Arrow is the in-memory representation for maximum throughput execution and overhead-free data exchange. Calcite is the optimizer to make the most of our infrastructure capabilities.) Julien also explores the emerging components that are still missing or haven't become standard yet to fully materialize the transformation to an extremely flexible database that lets you innovate with your data. This session was recorded at the 2019 O'Reilly Strata Data Conference in San Francisco."--Resource description page
Item Description: Title from resource description page (Safari, viewed January 31, 2020)
Physical Description: 1 streaming video file (43 min., 49 sec.)