Big data analytics with Java: big data analytics - massive, predictive, social and self-driving

The first part is an introduction that will help the readers get acquainted with big data environments, whereas the second part will contain a hardcore discussion on all the concepts in analytics on big data. It will take you from data analysis and data visualization to the core concepts and advanta...

Bibliographic Details
Main Author: Mehta, Rajat
Format: eBook
Language: English
Published: Birmingham, UK: Packt Publishing, 2017
Collection: O'Reilly - Collection details see MPG.ReNa
Table of Contents:
  • Sigmoid neuron
  • Multi-layer perceptrons
  • Accuracy of multi-layer perceptrons
  • Deep learning
  • Advantages and use cases of deep learning
  • Flower species classification using multi-layer perceptrons
  • Deeplearning4j
  • Handwritten digit recognition using CNN
  • Diving into the code
  • Summary
  • Index
  • How to make histograms using JFreeChart?
  • Line charts
  • Scatter plots
  • Box plots
  • Advanced visualization technique
  • Prefuse
  • IVTK Graph toolkit
  • Other libraries
  • Summary
  • Chapter 4: Basics of Machine Learning
  • What is machine learning?
  • Real-life examples of machine learning
  • Types of machine learning
  • A small sample case study of supervised and unsupervised learning
  • Steps for machine learning problems
  • Choosing the machine learning model
  • What are the feature types that can be extracted from the datasets?
  • How do you select the best features to train your models?
  • How do you run machine learning analytics on big data?
  • Getting and preparing data in Hadoop
  • Training and storing models on big data
  • Apache Spark machine learning API
  • Summary
  • Chapter 5: Regression on Big Data
  • Linear regression
  • What is simple linear regression?
  • Where is linear regression used?
  • Logistic regression
  • Which mathematical functions does logistic regression use?
  • Where is logistic regression used?
  • Predicting heart disease using logistic regression
  • Summary
  • Chapter 6: Naive Bayes and Sentiment Analysis
  • Conditional probability
  • Bayes theorem
  • Naïve Bayes algorithm
  • Advantages of naïve Bayes
  • Disadvantages of naïve Bayes
  • Sentiment analysis
  • Concepts for sentiment analysis
  • Tokenization
  • Stop words removal
  • Stemming
  • N-grams
  • Term presence and term frequency
  • TF-IDF
  • Bag of words
  • Dataset
  • Data exploration of text data
  • Sentiment analysis on this dataset
  • SVM or Support Vector Machine
  • Summary
  • Chapter 7: Decision Trees
  • What is a decision tree?
  • Building a decision tree
  • Choosing the best features for splitting the datasets
  • Dataset
  • Data exploration
  • Cleaning and munging the data
  • Training and testing the model
  • Summary
  • Cover
  • Copyright
  • Credits
  • About the Author
  • About the Reviewers
  • www.PacktPub.com
  • Customer Feedback
  • Table of Contents
  • Preface
  • Chapter 1: Big Data Analytics with Java
  • Why data analytics on big data?
  • Big data for analytics
  • Big data - a bigger pay package for Java developers
  • Basics of Hadoop - a Java sub-project
  • Distributed computing on Hadoop
  • HDFS concepts
  • Design and architecture of HDFS
  • Main components of HDFS
  • HDFS simple commands
  • Apache Spark
  • Concepts
  • Transformations
  • Actions
  • Spark Java API
  • Spark samples using Java 8
  • Loading data
  • Data operations - cleansing and munging
  • Analyzing data - count, projection, grouping, aggregation, and max/min
  • Actions on RDDs
  • Paired RDDs
  • Saving data
  • Collecting and printing results
  • Executing Spark programs on Hadoop
  • Apache Spark sub-projects
  • Spark machine learning modules
  • Mahout - a popular Java ML library
  • Deeplearning4j - a deep learning library
  • Summary
  • Chapter 2: First Steps in Data Analysis
  • Datasets
  • Data cleaning and munging
  • Basic analysis of data with Spark SQL
  • Building SparkConf and context
  • Dataframe and datasets
  • Load and parse data
  • Analyzing data - the Spark-SQL way
  • Spark SQL for data exploration and analytics
  • Market basket analysis
  • Apriori algorithm
  • Implementation of the Apriori algorithm in Apache Spark
  • Efficient market basket analysis using FP-Growth algorithm
  • Running FP-Growth on Apache Spark
  • Summary
  • Chapter 3: Data Visualization
  • Data visualization with Java JFreeChart
  • Using charts in big data analytics
  • Time series chart
  • All India seasonal and annual average temperature series dataset
  • Simple single Time Series chart
  • Multiple Time Series on a single chart window
  • Bar charts
  • Histograms
  • When would you use a histogram?
  • Chapter 8: Ensembling on Big Data
  • Ensembling
  • Types of ensembling
  • Bagging
  • Boosting
  • Advantages and disadvantages of ensembling
  • Random forests
  • Gradient boosted trees (GBTs)
  • Classification problem and dataset used
  • Data exploration
  • Training and testing our random forest model
  • Training and testing our gradient boosted tree model
  • Summary
  • Chapter 9: Recommendation Systems
  • Recommendation systems and their types
  • Content-based recommendation systems
  • Dataset
  • Content-based recommender on MovieLens dataset
  • Collaborative recommendation systems
  • Advantages
  • Disadvantages
  • Alternating least squares - collaborative filtering
  • Summary
  • Chapter 10: Clustering and Customer Segmentation on Big Data
  • Clustering
  • Types of clustering
  • Hierarchical clustering
  • K-means clustering
  • Bisecting k-means clustering
  • Customer segmentation
  • Dataset
  • Data exploration
  • Clustering for customer segmentation
  • Changing the clustering algorithm
  • Summary
  • Chapter 11: Massive Graphs on Big Data
  • Refresher on graphs
  • Representing graphs
  • Common terminology on graphs
  • Common algorithms on graphs
  • Plotting graphs
  • Massive graphs on big data
  • Graph analytics
  • GraphFrames
  • Building a graph using GraphFrames
  • Graph analytics on airports and their flights
  • Datasets
  • Graph analytics on flights data
  • Summary
  • Chapter 12: Real-Time Analytics on Big Data
  • Real-time analytics
  • Big data stack for real-time analytics
  • Real-time SQL queries on big data
  • Real-time data ingestion and storage
  • Real-time data processing
  • Real-time SQL queries using Impala
  • Flight delay analysis using Impala
  • Apache Kafka
  • Spark Streaming
  • Trending videos
  • Summary
  • Chapter 13: Deep Learning Using Big Data
  • Introduction to neural networks
  • Perceptron
  • Problems with perceptrons