Big Data Now: Current Perspectives from O'Reilly Radar

This collection represents the full spectrum of data-related content we've published on O'Reilly Radar over the last year. Mike Loukides kicked things off in June 2010 with "What is data science?" and from there we've pursued the various threads and themes that naturally eme...


Bibliographic Details
Format: eBook
Language: English
Published: Sebastopol, CA: O'Reilly Media, 2011
Edition: 1st ed.
Collection: O'Reilly (for collection details, see MPG.ReNa)
Notes: Includes bibliographical references and index
Table of Contents:
  • Foreword
  • Chapter 1. Data Science and Data Tools; What is data science?; What is data science?; Where data comes from; Working with data at scale; Making data tell its story; Data scientists; The SMAQ stack for big data; MapReduce; Hadoop MapReduce; Other implementations; Storage; Hadoop Distributed File System; HBase, the Hadoop Database; Hive; Cassandra and Hypertable; NoSQL database implementations of MapReduce; Integration with SQL databases; Integration with streaming data sources; Commercial SMAQ solutions; Query; Pig; Hive; Cascading, the API Approach; Search with Solr; Conclusion; Scraping, cleaning, and selling big data; Data hand tools; Hadoop: What it is, how it works, and what it can do; Four free data tools for journalists (and snoops); WHOIS; Blekko; bit.ly; Compete; The quiet rise of machine learning; Where the semantic web stumbled, linked data will succeed; Social data is an oracle waiting for a question; The challenges of streaming real-time data
  • Chapter 2. Data Issues; Why the term "data science" is flawed but useful; It's not a real science; It's an unnecessary label; The name doesn't even make sense; There's no definition; Time for the community to rally; Why you can't really anonymize your data; Keep the anonymization; Acknowledge there's a risk of de-anonymization; Limit the detail; Learn from the experts; Big data and the semantic web; Google and the semantic web; Metadata is hard: big data can help; Big data: Global good or zero-sum arms race?; The truth about data: Once it's out there, it's hard to control
  • Chapter 3. The Application of Data: Products and Processes; How the Library of Congress is building the Twitter archive; Data journalism, data tools, and the newsroom stack; Data journalism and data tools; The newsroom stack; Bridging the data divide; The data analysis path is built on curiosity, followed by action; How data and analytics can improve education; Data science is a pipeline between academic disciplines; Big data and open source unlock genetic secrets; Visualization deconstructed: Mapping Facebook's friendships; Mapping Facebook's friendships; Static requires storytelling; Data science democratized
  • Chapter 4. The Business of Data; There's no such thing as big data; Big data and the innovator's dilemma; Building data startups: Fast, big, and focused; Setting the stage: The attack of the exponentials; Leveraging the big data stack; Fast data; Big analytics; Focused services; Democratizing big data; Data markets aren't coming: They're already here; An iTunes model for data; Data is a currency; Big data: An opportunity in search of a metaphor; Data and the human-machine connection