Data Analytics: A Theoretical and Practical View from the EDISON Project

Building upon the knowledge introduced in The Data Science Framework, this book provides a comprehensive and detailed examination of each aspect of Data Analytics, from both a theoretical and a practical standpoint. The book explains representative algorithms associated with different techniques, from...


Bibliographic Details
Main Authors: Cuadrado-Gallego, Juan J., Demchenko, Yuri (Author)
Format: eBook
Language: English
Published: Cham, Springer International Publishing, 2023
Edition: 1st ed. 2023
Collection: Springer eBooks 2005-
Table of Contents:
  • 4.6 Anomaly detection exercises solved with R
  • C. Anomaly detection exercises solved
  • 4.7 Handmade exercises
  • 4.8 Exercises solved in R
  • Chapter 5. Unsupervised Classification
  • Juan J. Cuadrado-Gallego, Yuri Demchenko, Abdelhamid Tayebi
  • A. Theory
  • 5.1 Introduction
  • 5.2 Unsupervised classification based on distances. K-Means algorithm
  • 5.3 Agglomerative hierarchical clustering
  • B. Computer Based Solving
  • 5.4 RStudio
  • 5.5 Unsupervised classification exercises solved with R
  • C. Unsupervised classification exercises solved
  • 5.6 Handmade exercises
  • 5.7 Exercises solved in R
  • Chapter 6. Supervised Classification
  • Juan J. Cuadrado-Gallego, Yuri Demchenko, Josefa Gómez
  • A. Theory
  • 6.1 Introduction
  • 6.2 Decision tree
  • 6.2.1 Optimizing the construction of a decision tree: ID3 Algorithm
  • 6.2.2 Optimizing the construction of a decision tree: CART Algorithm
  • 6.2.3 Optimizing the construction of a decision tree: Error Algorithm
  • 6.3 Neural Network
  • 6.4 Naïve Bayes
  • 6.5 Regression functions
  • 6.5.1 Linear regression of polynomial events
  • 6.5.2 Linear regression of polynomial for three events
  • 6.5.3 Linear regression of polynomial for K events
  • 6.5.4 Nonlinear regression of polynomial for two events
  • 6.5.5 Nonlinear regression of non-polynomial for two events
  • 6.5.6 Linear regression validity analysis
  • B. Computer based solving
  • C. Supervised classification analysis exercises solved
  • 6.6 Handmade Exercises
  • 6.7 Exercises solved in R
  • Chapter 7. Association
  • A. Theory
  • 7.1 Introduction
  • 7.2 Analysis of association of events composed by a single elementary event
  • 7.2.1 Support
  • 7.2.2 Confidence
  • 7.2.3 Contingency
  • 7.2.4 Correlation
  • 7.3 Analysis of association of events composed by more than one elementary event. Apriori algorithm
  • B. Computer based solving
  • C. Association analysis exercises solved
  • 7.4 Handmade Exercises
  • 2.6 Mean
  • 2.6.1 Definition of Mean
  • 2.6.2 Arithmetic Mean
  • 2.6.3 Variance and Standard Deviation
  • 2.7 Median
  • 2.7.1 Range
  • 2.7.2 Median
  • 2.7.3 Quantiles
  • 2.7.4 Quantiles range
  • B. Computer Based Solving
  • 2.8 R project
  • 2.9 R graphical user interface
  • 2.10 Data exercises solved with R
  • C. Data exercises solved
  • 2.11 Handmade exercises
  • 2.12 Exercises solved in R
  • Annex. Data Extended Concepts
  • 2.A.1 Frequency
  • 2.A.2 Mean
  • Chapter 3. Probability
  • A. Theory
  • 3.1 Introduction
  • 3.2 Event
  • 3.3 Set theory actions and operations
  • 3.4 Laplace or classical probability
  • 3.5 Bayesian Probability
  • 3.6 Probability distribution of random variables
  • 3.6.1 Random Variable
  • 3.6.2 Probability distribution
  • 3.6.3 Discrete probability distributions
  • 3.6.3.1 Bernoulli Probability distribution
  • 3.6.3.2 Binomial Probability distribution
  • 3.6.3.3 Geometric Probability distribution
  • 7.5 Exercises solved in R
  • 3.6.3.4 Poisson Probability distribution
  • 3.6.4 Continuous probability distribution
  • 3.6.4.1 Normal Distribution
  • 3.6.4.2 Pearson chi-square distribution
  • 3.6.4.3 Student's t distribution
  • 3.6.4.4 Fisher's F distribution
  • B. Computer Based Solving
  • C. Probability exercises solved
  • 3.7 Handmade exercises
  • 3.8 Exercises solved in R
  • Annex. Probability extended concepts
  • Chapter 4. Anomaly Detection
  • Juan J. Cuadrado-Gallego, Yuri Demchenko, Josefa Gómez, Abdelhamid Tayebi
  • A. Theory
  • 4.1 Introduction
  • 4.2 Anomaly detection based on statistics
  • 4.2.1 Anomaly detection based on the mean and the standard deviation
  • 4.2.2 Anomaly detection based on the quartiles
  • 4.2.3 Anomaly detection based on errors of the residuals
  • 4.3 Anomaly detection based on proximity. K-nearest neighbor algorithm
  • 4.4 Anomaly detection based on density. Simplified local outlier factor algorithm
  • B. Computer based solving
  • 4.5 R packages
  • Chapter 1. Introduction to data science and data analytics
  • 1.1 About Data Science
  • 1.2 About the EDISON Project and Data Science Framework
  • 1.2.1 The EDISON project
  • 1.2.2 The EDISON Data Science Framework
  • 1.3 About Data Analytics
  • 1.3.1 Data Analytics Competences
  • 1.3.2 Data Analytics Body of Knowledge
  • 1.3.3 Data Analytics Model Curriculum Approach
  • 1.3.4 Data Analytics Professional Profiles
  • 1.4 About this Book
  • Chapter 2. Data
  • A. Theory
  • 2.1 Introduction
  • 2.2 Characteristic
  • 2.2.1 Definition of characteristic
  • 2.2.2 Types of characteristics
  • 2.3 Data
  • 2.3.1 Definition of Data
  • 2.3.2 Types of data from their nature
  • 2.3.3 Types of data from their storage
  • 2.4 Available Data
  • 2.4.1 Experiment
  • 2.4.2 Data population
  • 2.4.3 Data Sample
  • 2.4.4 Data Quality
  • 2.5 Frequency
  • 2.5.1 Definition of frequency
  • 2.5.2 Types of frequency
  • 2.5.3 Frequency of grouped Data
  • 2.5.4 Mode