MPG.eBooks - Table of Contents: Mastering Hadoop

Mastering Hadoop: go beyond the basics and master the next generation of Hadoop data processing platforms

Bibliographic Details
Main Author:	Karanth, Sandeep
Format:	eBook
Language:	English
Published:	Birmingham, England: Packt Publishing, 2014
Series:	Community Experience Distilled
Subjects:	Application software / Development (FAST; http://id.loc.gov/authorities/subjects/sh95009362); Computers / Software Development & Engineering / General (BISAC); Apache Hadoop (FAST; http://id.loc.gov/authorities/names/n2013024279)
Online Access:	https://learning.oreilly.com/library/view/~/978178...
Collection:	O'Reilly (for collection details, see MPG.ReNa)

Table of Contents:

Cover; Copyright; Credits; About the Author; Acknowledgments; About the Reviewers; www.PacktPub.com; Table of Contents; Preface

Chapter 1: Hadoop 2.X; The inception of Hadoop; The evolution of Hadoop; Hadoop's genealogy; Hadoop-0.20-append; Hadoop-0.20-security; Hadoop's timeline; Hadoop 2.X; Yet Another Resource Negotiator (YARN); Architecture overview; Storage layer enhancements; High availability; HDFS Federation; HDFS snapshots; Other enhancements; Support enhancements; Hadoop distributions; Which Hadoop distribution?; Performance; Scalability; Reliability; Manageability; Available distributions; Cloudera Distribution of Hadoop (CDH); Hortonworks Data Platform (HDP); MapR; Pivotal HD; Summary

Chapter 2: Advanced MapReduce; MapReduce input; The InputFormat class; The InputSplit class; The RecordReader class; Hadoop's "small files" problem; Filtering inputs; The Map task; The dfs.blocksize attribute; Sort and spill of intermediate outputs; Node-local Reducers or Combiners; Fetching intermediate outputs: Map-side; The Reduce task; Fetching intermediate outputs: Reduce-side; Merge and spill of intermediate outputs; MapReduce output; Speculative execution of tasks; MapReduce job counters; Handling data joins; Reduce-side joins; Map-side joins; Summary

Chapter 3: Advanced Pig; Pig versus SQL; Different modes of execution; Complex data types in Pig; Compiling Pig scripts; The logical plan; The physical plan; The MapReduce plan; Development and debugging aids; The DESCRIBE command; The EXPLAIN command; The ILLUSTRATE command; The advanced Pig operators; The advanced FOREACH operator; The FLATTEN operator; The nested FOREACH operator; The COGROUP operator; The UNION operator; The CROSS operator; Specialized joins in Pig; The Replicated join; Skewed joins; The Merge join; User-defined functions; The evaluation functions; The aggregate functions; The filter functions; The load functions; The store functions; Pig performance optimizations; The optimization rules; Measurement of Pig script performance; Combiners in Pig; Memory for the Bag data type; Number of reducers in Pig; The multiquery mode in Pig; Best practices; The explicit usage of types; Early and frequent projection; Early and frequent filtering; The usage of the LIMIT operator; The usage of the DISTINCT operator; The reduction of operations; The usage of Algebraic UDFs; The usage of Accumulator UDFs; Eliminating nulls in the data; The usage of specialized joins; Compressing intermediate results; Combining smaller files; Summary

Chapter 4: Advanced Hive; The Hive architecture; The Hive metastore; The Hive compiler; The Hive execution engine; The supporting components of Hive; Data types; File formats; Compressed files; ORC files; The Parquet files; The data model; Dynamic partitions; Semantics for dynamic partitioning; Indexes on Hive tables; Hive query optimizers; Advanced DML; The GROUP BY operation