Summary: | "Julien Le Dem (WeWork) discusses the key open source components of the big data ecosystem--including Apache Calcite, Parquet, Arrow, Avro, and Kafka as well as batch and streaming systems--and explains how they relate to each other and how they make the ecosystem more of a database and less of a filesystem. (Parquet is the columnar data layout to optimize data at rest for querying. Arrow is the in-memory representation for maximum throughput execution and overhead-free data exchange. Calcite is the optimizer to make the most of our infrastructure capabilities.) Julien also explores the emerging components that are still missing or haven't become standard yet to fully materialize the transformation to an extremely flexible database that lets you innovate with your data. This session was recorded at the 2019 O'Reilly Strata Data Conference in San Francisco."--Resource description page
|