Text mining with R: a tidy approach

Much of the data available today is unstructured and text-heavy, making it challenging for analysts to apply their usual data wrangling and visualization tools. With this practical book, you'll explore text-mining techniques with tidytext, a package that authors Julia Silge and David Robinson d...

Bibliographic Details
Main Authors: Silge, Julia; Robinson, David (Author)
Format: eBook
Language: English
Published: Sebastopol, CA: O'Reilly Media, 2017
Edition: First edition
Collection: O'Reilly - Collection details see MPG.ReNa
Table of Contents:
  • Includes bibliographical references and index
  • Copyright; Table of Contents; Preface; Outline; Topics This Book Does Not Cover; About This Book; Conventions Used in This Book; Using Code Examples; O'Reilly Safari; How to Contact Us; Acknowledgements; Chapter 1. The Tidy Text Format; Contrasting Tidy Text with Other Data Structures; The unnest_tokens Function; Tidying the Works of Jane Austen; The gutenbergr Package; Word Frequencies; Summary; Chapter 2. Sentiment Analysis with Tidy Data; The sentiments Dataset; Sentiment Analysis with Inner Join; Comparing the Three Sentiment Dictionaries; Most Common Positive and Negative Words
  • Wordclouds; Looking at Units Beyond Just Words; Summary; Chapter 3. Analyzing Word and Document Frequency: tf-idf; Term Frequency in Jane Austen's Novels; Zipf's Law; The bind_tf_idf Function; A Corpus of Physics Texts; Summary; Chapter 4. Relationships Between Words: N-grams and Correlations; Tokenizing by N-gram; Counting and Filtering N-grams; Analyzing Bigrams; Using Bigrams to Provide Context in Sentiment Analysis; Visualizing a Network of Bigrams with ggraph; Visualizing Bigrams in Other Texts; Counting and Correlating Pairs of Words with the widyr Package
  • Counting and Correlating Among Sections; Examining Pairwise Correlation; Summary; Chapter 5. Converting to and from Nontidy Formats; Tidying a Document-Term Matrix; Tidying DocumentTermMatrix Objects; Tidying dfm Objects; Casting Tidy Text Data into a Matrix; Tidying Corpus Objects with Metadata; Example: Mining Financial Articles; Summary; Chapter 6. Topic Modeling; Latent Dirichlet Allocation; Word-Topic Probabilities; Document-Topic Probabilities; Example: The Great Library Heist; LDA on Chapters; Per-Document Classification; By-Word Assignments: augment; Alternative LDA Implementations
  • Chapter 7. Case Study: Comparing Twitter Archives; Getting the Data and Distribution of Tweets; Word Frequencies; Comparing Word Usage; Changes in Word Use; Favorites and Retweets; Summary; Chapter 8. Case Study: Mining NASA Metadata; How Data Is Organized at NASA; Wrangling and Tidying the Data; Some Initial Simple Exploration; Word Co-ocurrences and Correlations; Networks of Description and Title Words; Networks of Keywords; Calculating tf-idf for the Description Fields; What Is tf-idf for the Description Field Words?; Connecting Description Fields to Keywords; Topic Modeling
  • Casting to a Document-Term Matrix; Ready for Topic Modeling; Interpreting the Topic Model; Connecting Topic Modeling with Keywords; Summary; Chapter 9. Case Study: Analyzing Usenet Text; Preprocessing; Preprocessing Text; Words in Newsgroups; Finding tf-idf Within Newsgroups; Topic Modeling; Sentiment Analysis; Sentiment Analysis by Word; Sentiment Analysis by Message; N-gram Analysis; Summary; Bibliography; Index; About the Authors; Colophon