Machine learning for imbalanced data: tackle imbalanced datasets using machine learning and deep learning techniques

As machine learning practitioners, we often encounter imbalanced datasets, in which one class has considerably fewer instances than the other. Many machine learning algorithms assume an equilibrium between majority and minority classes, leading to suboptimal performance on imbalanced data.


Bibliographic Details
Main Authors: Abhishek, Kumar; Abdelaziz, Mounir
Format: eBook
Language: English
Published: Birmingham, UK: Packt Publishing Ltd., 2023
Edition: [First edition]
Collection: O'Reilly - Collection details see MPG.ReNa
Table of Contents:
  • Includes bibliographical references and index
  • Cover
  • Copyright
  • Contributors
  • Table of Contents
  • Preface
  • Chapter 1: Introduction to Data Imbalance in Machine Learning
  • Technical requirements
  • Introduction to imbalanced datasets
  • Machine learning 101
  • What happens during model training?
  • Types of dataset and splits
  • Cross-validation
  • Common evaluation metrics
  • Confusion matrix
  • ROC
  • Precision-Recall curve
  • Relation between the ROC curve and PR curve
  • Challenges and considerations when dealing with imbalanced data
  • When can we have an imbalance in datasets?
  • Why can imbalanced data be a challenge?
  • When to not worry about data imbalance
  • Introduction to the imbalanced-learn library
  • General rules to follow
  • Summary
  • Questions
  • References
  • Chapter 2: Oversampling Methods
  • Technical requirements
  • What is oversampling?
  • Random oversampling
  • Problems with random oversampling
  • SMOTE
  • How SMOTE works
  • Problems with SMOTE
  • SMOTE variants
  • Borderline-SMOTE
  • ADASYN
  • Working of ADASYN
  • Categorical features and SMOTE variants (SMOTE-NC and SMOTEN)
  • Model performance comparison of various oversampling methods
  • Guidance for using various oversampling techniques
  • When to avoid oversampling
  • Oversampling in multi-class classification
  • Summary
  • Exercises
  • References
  • Chapter 3: Undersampling Methods
  • Technical requirements
  • Introducing undersampling
  • When to avoid undersampling the majority class
  • Fixed versus cleaning undersampling
  • Undersampling approaches
  • Removing examples uniformly
  • Random UnderSampling
  • ClusterCentroids
  • Strategies for removing noisy observations
  • ENN, RENN, and AllKNN
  • Tomek links
  • Neighborhood Cleaning Rule
  • Instance hardness threshold
  • Strategies for removing easy observations
  • Condensed Nearest Neighbors
  • One-sided selection
  • Combining undersampling and oversampling
  • Model performance comparison
  • Summary
  • Exercises
  • References
  • Chapter 4: Ensemble Methods
  • Technical requirements
  • Bagging techniques for imbalanced data
  • UnderBagging
  • OverBagging
  • SMOTEBagging
  • Comparative performance of bagging methods
  • Boosting techniques for imbalanced data
  • AdaBoost
  • RUSBoost, SMOTEBoost, and RAMOBoost
  • Ensemble of ensembles
  • EasyEnsemble
  • Comparative performance of boosting methods
  • Model performance comparison
  • Summary
  • Questions