Multimodal Scene Understanding: Algorithms, Applications and Deep Learning

Multimodal Scene Understanding: Algorithms, Applications and Deep Learning presents recent advances in multi-modal computing, with a focus on computer vision and photogrammetry. It provides the latest algorithms and applications that involve combining multiple sources of information and describes th...


Bibliographic Details
Other Authors: Yang, Michael Ying (Editor), Rosenhahn, Bodo (Editor), Murino, Vittorio (Editor)
Format: eBook
Published: London : Academic Press, 2019
Collection: O'Reilly (collection details: see MPG.ReNa)
Table of Contents:
  • Includes bibliographical references and index
  • Includes bibliographical references and index
  • Front Cover; Multimodal Scene Understanding; Copyright; Contents; List of Contributors
  • 1 Introduction to Multimodal Scene Understanding: 1.1 Introduction; 1.2 Organization of the Book; References
  • 2 Deep Learning for Multimodal Data Fusion: 2.1 Introduction; 2.2 Related Work; 2.3 Basics of Multimodal Deep Learning: VAEs and GANs; 2.3.1 Auto-Encoder; 2.3.2 Variational Auto-Encoder (VAE); 2.3.3 Generative Adversarial Network (GAN); 2.3.4 VAE-GAN; 2.3.5 Adversarial Auto-Encoder (AAE); 2.3.6 Adversarial Variational Bayes (AVB); 2.3.7 ALI and BiGAN; 2.4 Multimodal Image-to-Image Translation Networks; 2.4.1 Pix2pix and Pix2pixHD; 2.4.2 CycleGAN, DiscoGAN, and DualGAN; 2.4.3 CoGAN; 2.4.4 UNIT; 2.4.5 Triangle GAN; 2.5 Multimodal Encoder-Decoder Networks; 2.5.1 Model Architecture; 2.5.2 Multitask Training; 2.5.3 Implementation Details; 2.6 Experiments; 2.6.1 Results on NYUDv2 Dataset; 2.6.2 Results on Cityscape Dataset; 2.6.3 Auxiliary Tasks; 2.7 Conclusion; References
  • 3 Multimodal Semantic Segmentation: Fusion of RGB and Depth Data in Convolutional Neural Networks: 3.1 Introduction; 3.2 Overview; 3.2.1 Image Classification and the VGG Network; 3.2.2 Architectures for Pixel-level Labeling; 3.2.3 Architectures for RGB and Depth Fusion; 3.2.4 Datasets and Benchmarks; 3.3 Methods; 3.3.1 Datasets and Data Splitting; 3.3.2 Preprocessing of the Stanford Dataset; 3.3.3 Preprocessing of the ISPRS Dataset; 3.3.4 One-channel Normal Label Representation; 3.3.5 Color Spaces for RGB and Depth Fusion; 3.3.6 Hyper-parameters and Training; 3.4 Results and Discussion; 3.4.1 Results and Discussion on the Stanford Dataset; 3.4.2 Results and Discussion on the ISPRS Dataset; 3.5 Conclusion; References
  • 4 Learning Convolutional Neural Networks for Object Detection with Very Little Training Data: 4.1 Introduction; 4.2 Fundamentals; 4.2.1 Types of Learning; 4.2.2 Convolutional Neural Networks; Artificial neuron; Artificial neural network; Training; Convolutional neural networks; 4.2.3 Random Forests; Decision tree; Random forest; 4.3 Related Work; 4.4 Traffic Sign Detection; 4.4.1 Feature Learning; 4.4.2 Random Forest Classification; 4.4.3 RF to NN Mapping; 4.4.4 Fully Convolutional Network; 4.4.5 Bounding Box Prediction; 4.5 Localization; 4.6 Clustering; 4.7 Dataset; 4.7.1 Data Capturing; 4.7.2 Filtering; 4.8 Experiments; 4.8.1 Training and Test Data; 4.8.2 Classification; 4.8.3 Object Detection; 4.8.4 Computation Time; 4.8.5 Precision of Localizations; 4.9 Conclusion; Acknowledgment; References
  • 5 Multimodal Fusion Architectures for Pedestrian Detection: 5.1 Introduction; 5.2 Related Work; 5.2.1 Visible Pedestrian Detection; 5.2.2 Infrared Pedestrian Detection; 5.2.3 Multimodal Pedestrian Detection; 5.3 Proposed Method; 5.3.1 Multimodal Feature Learning/Fusion; 5.3.2 Multimodal Pedestrian Detection; Baseline DNN model