Hadoop and Spark performance for the Enterprise: ensuring quality of service in multi-tenant environments

Virtually every enterprise depends on big data analysis, but distributed computing environments such as Hadoop and Spark are complicated, to say the least. Multiple users, business units, and workload types often compete for valuable computing resources. Monitoring tools are not well equipped to han...

Bibliographic Details
Main Author: Oram, Andrew
Format: eBook
Language: English
Published: Sebastopol, CA: O'Reilly Media, 2016
Edition: First edition
Collection: O'Reilly - Collection details see MPG.ReNa
LEADER 03307nmm a2200397 u 4500
001 EB001923250
003 EBX01000000000000001086152
005 00000000000000.0
007 cr|||||||||||||||||||||
008 210123 ||| eng
050 4 |a QA76.9.D5 
100 1 |a Oram, Andrew 
245 0 0 |a Hadoop and Spark performance for the Enterprise  |b ensuring quality of service in multi-tenant environments  |c Andy Oram 
250 |a First edition 
260 |a Sebastopol, CA  |b O'Reilly Media  |c 2016 
300 |a 1 volume  |b illustrations 
653 |a Technologie de l'information / Gestion 
653 |a Big data / fast 
653 |a Traitement réparti 
653 |a Spark (Electronic resource : Apache Software Foundation) / fast 
653 |a Information technology / Management / fast 
653 |a Big data / http://id.loc.gov/authorities/subjects/sh2012003227 
653 |a Electronic data processing / Distributed processing / fast 
653 |a Spark (Electronic resource : Apache Software Foundation) / http://id.loc.gov/authorities/names/no2015027445 
653 |a Données volumineuses 
653 |a Electronic data processing / Distributed processing / http://id.loc.gov/authorities/subjects/sh85042293 
653 |a Apache Hadoop / fast 
653 |a Apache Hadoop / http://id.loc.gov/authorities/names/n2013024279 
653 |a Information technology / Management / http://id.loc.gov/authorities/subjects/sh2008006980 
041 0 7 |a eng  |2 ISO 639-2 
989 |b OREILLY  |a O'Reilly 
776 |z 9781491963197 
856 4 0 |u https://learning.oreilly.com/library/view/~/9781492048985/?ar  |x Verlag  |3 Volltext 
082 0 |a 658 
082 0 |a 000 
520 |a Virtually every enterprise depends on big data analysis, but distributed computing environments such as Hadoop and Spark are complicated, to say the least. Multiple users, business units, and workload types often compete for valuable computing resources. Monitoring tools are not well equipped to handle this level of complexity, and typically provide only very high-level and historical information. The lack of fine-grained visibility for making real-time adjustments to running workloads means that high-priority jobs can easily be pushed aside by lower-priority jobs. It's time to bring Quality of Service (QoS) to distributed processing in multi-tenant Hadoop environments. This O'Reilly report explains how QoS allows operators to assign priorities to jobs, ensuring that higher-priority tasks get the resources needed to meet critical deadlines. Author Andy Oram examines the critical role of performance in the evolution of operating systems, data warehouses, and distributed processing. He also discusses Quasar (part of Mesos) and Pepperdata, two tools that can help improve performance in distributed computing environments. You'll discover how tools that help ensure QoS can help distributed environments evolve to accommodate: multiple users contending for resources, such as those on operating systems; jobs that grow or shrink in hardware usage, so they don't strain at resource limits or let resources go to waste; jobs of different priorities, including soft real-time requirements that allow them to override lower-priority or ad hoc jobs; and performance guarantees, similar to service-level agreements (SLAs).