Transformers for natural language processing: build, train, and fine-tune deep neural network architectures for NLP with Python, PyTorch, TensorFlow, BERT, and GPT-3

Transformers are a game-changer for natural language understanding (NLU) and have become one of the pillars of artificial intelligence. Transformers for Natural Language Processing, 2nd Edition, investigates deep learning for machine translations, speech-to-text, text-to-speech, language modeling, q...


Bibliographic Details
Main Author: Rothman, Denis
Other Authors: Gulli, Antonio (writer of foreword)
Format: eBook
Language: English
Published: [Birmingham, United Kingdom]: Packt Publishing, 2022
Edition: Second edition
Series:Expert insight
Collection: O'Reilly - Collection details see MPG.ReNa
Table of Contents:
  • Intro
  • Copyright
  • Foreword
  • Contributors
  • Table of Contents
  • Preface
  • Chapter 1: What are Transformers?
  • The ecosystem of transformers
  • Industry 4.0
  • Foundation models
  • Is programming becoming a sub-domain of NLP?
  • The future of artificial intelligence specialists
  • Optimizing NLP models with transformers
  • The background of transformers
  • What resources should we use?
  • The rise of Transformer 4.0 seamless APIs
  • Choosing ready-to-use API-driven libraries
  • Choosing a Transformer Model
  • The role of Industry 4.0 artificial intelligence specialists
  • Summary
  • Questions
  • References
  • Chapter 2: Getting Started with the Architecture of the Transformer Model
  • The rise of the Transformer: Attention is All You Need
  • The encoder stack
  • Input embedding
  • Positional encoding
  • Sublayer 1: Multi-head attention
  • Sublayer 2: Feedforward network
  • The decoder stack
  • Output embedding and position encoding
  • The attention layers
  • The FFN sublayer, the post-LN, and the linear layer
  • Training and performance
  • Transformer models in Hugging Face
  • Summary
  • Questions
  • References
  • Chapter 3: Fine-Tuning BERT Models
  • The architecture of BERT
  • The encoder stack
  • Preparing the pretraining input environment
  • Pretraining and fine-tuning a BERT model
  • Fine-tuning BERT
  • Hardware constraints
  • Installing the Hugging Face PyTorch interface for BERT
  • Importing the modules
  • Specifying CUDA as the device for torch
  • Loading the dataset
  • Creating sentences, label lists, and adding BERT tokens
  • Activating the BERT tokenizer
  • Processing the data
  • Creating attention masks
  • Splitting the data into training and validation sets
  • Converting all the data into torch tensors
  • Selecting a batch size and creating an iterator
  • BERT model configuration
  • Loading the Hugging Face BERT uncased base model
  • Optimizer grouped parameters
  • The hyperparameters for the training loop
  • The training loop
  • Training evaluation
  • Predicting and evaluating using the holdout dataset
  • Evaluating using the Matthews Correlation Coefficient
  • The scores of individual batches
  • Matthews evaluation for the whole dataset
  • Summary
  • Questions
  • References
  • Chapter 4: Pretraining a RoBERTa Model from Scratch
  • Training a tokenizer and pretraining a transformer
  • Building KantaiBERT from scratch
  • Step 1: Loading the dataset
  • Step 2: Installing Hugging Face transformers
  • Step 3: Training a tokenizer
  • Step 4: Saving the files to disk
  • Step 5: Loading the trained tokenizer files
  • Step 6: Checking resource constraints: GPU and CUDA
  • Step 7: Defining the configuration of the model
  • Step 8: Reloading the tokenizer in transformers
  • Step 9: Initializing a model from scratch
  • Exploring the parameters
  • Step 10: Building the dataset
  • Step 11: Defining a data collator
  • Step 12: Initializing the trainer
  • Step 13: Pretraining the model
  • Step 14: Saving the final model (+tokenizer + config) to disk
  • Step 15: Language modeling with FillMaskPipeline
  • Next steps
  • Summary
  • Questions
  • References
  • Chapter 5: Downstream NLP Tasks with Transformers
  • Transduction and the inductive inheritance of transformers
  • The human intelligence stack
  • The machine intelligence stack
  • Transformer performances versus Human Baselines
  • Evaluating models with metrics
  • Accuracy score
  • F1-score
  • Matthews Correlation Coefficient (MCC)
  • Benchmark tasks and datasets
  • From GLUE to SuperGLUE
  • Introducing higher Human Baselines standards
  • The SuperGLUE evaluation process
  • Defining the SuperGLUE benchmark tasks
  • BoolQ
  • Commitment Bank (CB)
  • Multi-Sentence Reading Comprehension (MultiRC)
  • Reading Comprehension with Commonsense Reasoning Dataset (ReCoRD)
  • Recognizing Textual Entailment (RTE)
  • Words in Context (WiC)
  • The Winograd schema challenge (WSC)
  • Running downstream tasks
  • The Corpus of Linguistic Acceptability (CoLA)
  • Stanford Sentiment TreeBank (SST-2)
  • Microsoft Research Paraphrase Corpus (MRPC)
  • Winograd schemas
  • Summary
  • Questions
  • References
  • Chapter 6: Machine Translation with the Transformer
  • Defining machine translation
  • Human transductions and translations
  • Machine transductions and translations
  • Preprocessing a WMT dataset
  • Preprocessing the raw data
  • Finalizing the preprocessing of the datasets
  • Evaluating machine translation with BLEU
  • Geometric evaluations
  • Applying a smoothing technique
  • Chencherry smoothing
  • Translation with Google Translate
  • Translations with Trax
  • Installing Trax
  • Creating the original Transformer model
  • Initializing the model using pretrained weights
  • Tokenizing a sentence
  • Decoding from the Transformer
  • De-tokenizing and displaying the translation
  • Summary
  • Questions
  • References
  • Chapter 7: The Rise of Suprahuman Transformers with GPT-3 Engines
  • Suprahuman NLP with GPT-3 transformer models
  • The architecture of OpenAI GPT transformer models
  • The rise of billion-parameter transformer models
  • The increasing size of transformer models
  • Context size and maximum path length
  • From fine-tuning to zero-shot models
  • Stacking decoder layers
  • GPT-3 engines
  • Generic text completion with GPT-2
  • Step 9: Interacting with GPT-2
  • Training a custom GPT-2 language model
  • Step 12: Interactive context and completion examples
  • Running OpenAI GPT-3 tasks
  • Running NLP tasks online
  • Getting started with GPT-3 engines
  • Running our first NLP task with GPT-3
  • NLP tasks and examples
  • Comparing the output of GPT-2 and GPT-3
  • Fine-tuning GPT-3
  • Preparing the data
  • Step 1: Installing OpenAI
  • Step 2: Entering the API key
  • Step 3: Activating OpenAI's data preparation module
  • Fine-tuning GPT-3
  • Step 4: Creating an OS environment
  • Step 5: Fine-tuning OpenAI's Ada engine
  • Step 6: Interacting with the fine-tuned model
  • The role of an Industry 4.0 AI specialist
  • Initial conclusions
  • Summary
  • Questions
  • References
  • Chapter 8: Applying Transformers to Legal and Financial Documents for AI Text Summarization
  • Designing a universal text-to-text model
  • The rise of text-to-text transformer models
  • A prefix instead of task-specific formats
  • The T5 model
  • Text summarization with T5
  • Hugging Face
  • Hugging Face transformer resources
  • Initializing the T5-large transformer model
  • Getting started with T5
  • Exploring the architecture of the T5 model
  • Summarizing documents with T5-large
  • Creating a summarization function
  • A general topic sample
  • The Bill of Rights sample
  • A corporate law sample
  • Summarization with GPT-3
  • Summary
  • Questions
  • References
  • Chapter 9: Matching Tokenizers and Datasets
  • Matching datasets and tokenizers
  • Best practices
  • Step 1: Preprocessing
  • Step 2: Quality control
  • Continuous human quality control
  • Word2Vec tokenization
  • Case 0: Words in the dataset and the dictionary
  • Case 1: Words not in the dataset or the dictionary
  • Case 2: Noisy relationships
  • Case 3: Words in the text but not in the dictionary
  • Case 4: Rare words
  • Case 5: Replacing rare words
  • Case 6: Entailment
  • Standard NLP tasks with specific vocabulary
  • Generating unconditional samples with GPT-2
  • Generating trained conditional samples
  • Controlling tokenized data
  • Exploring the scope of GPT-3
  • Summary
  • Questions
  • References
  • Chapter 10: Semantic Role Labeling with BERT-Based Transformers
  • Getting started with SRL
  • Defining semantic role labeling
  • Visualizing SRL
  • Running a pretrained BERT-based model
  • The architecture of the BERT-based model
  • Setting up the BERT SRL environment
  • SRL experiments with the BERT-based model
  • Basic samples
  • Sample 1
  • Sample 2
  • Sample 3
  • Difficult samples
  • Sample 4
  • Sample 5
  • Sample 6
  • Questioning the scope of SRL
  • The limit of predicate analysis
  • Redefining SRL
  • Summary
  • Questions
  • References
  • Chapter 11: Let Your Data Do the Talking: Story, Questions, and Answers
  • Methodology
  • Transformers and methods
  • Method 0: Trial and error
  • Method 1: NER first
  • Using NER to find questions
  • Location entity questions
  • Person entity questions
  • Method 2: SRL first
  • Question-answering with ELECTRA
  • Project management constraints
  • Using SRL to find questions
  • Next steps
  • Exploring Haystack with a RoBERTa model
  • Exploring Q&A with a GPT-3 engine
  • Summary
  • Questions
  • References
  • Chapter 12: Detecting Customer Emotions to Make Predictions
  • Getting started: Sentiment analysis transformers
  • The Stanford Sentiment Treebank (SST)
  • Sentiment analysis with RoBERTa-large
  • Predicting customer behavior with sentiment analysis
  • Sentiment analysis with DistilBERT
  • Sentiment analysis with Hugging Face's models' list
  • DistilBERT for SST
  • MiniLM-L12-H384-uncased
  • RoBERTa-large-mnli
  • BERT-base multilingual model
  • Sentiment analysis with GPT-3
  • Some Pragmatic I4.0 thinking before we leave
  • Investigating with SRL
  • Investigating with Hugging Face
  • Investigating with the GPT-3 playground
  • GPT-3 code
  • Summary
  • Questions
  • References
  • Chapter 13: Analyzing Fake News with Transformers
  • Emotional reactions to fake news