Table of Contents:
  • Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Introduction to NLP; What is NLP?; Why use NLP?; Why is NLP so hard?; Survey of NLP tools; Apache OpenNLP; Stanford NLP; LingPipe; GATE; UIMA; Overview of text processing tasks; Finding parts of text; Finding sentences; Finding people and things; Detecting parts of speech; Classifying text and documents; Extracting relationships; Using combined approaches; Understanding NLP models; Identifying the task; Selecting a model; Building and training the model
  • Verifying the model; Using the model; Preparing data; Summary; Chapter 2: Finding Parts of Text; Understanding the parts of text; What is tokenization?; Uses for tokenizers; Simple Java tokenizers; Using the Scanner class; Specifying the delimiter; Using the split method; Using the BreakIterator class; Using the StreamTokenizer class; Using the StringTokenizer class; Java core tokenization performance considerations; NLP tokenizer APIs; Using the OpenNLPTokenizer; Using the SimpleTokenizer class; Using the WhitespaceTokenizer class; Using the TokenizerME class; Using the Stanford tokenizer
  • Using the PTBTokenizer class; Using the DocumentPreprocessor class; Using a pipeline; Using LingPipe tokenizers; Training a tokenizer to find parts of text; Comparing tokenizers; Understanding normalization; Converting to lowercase; Removing stopwords; Creating a StopWords class; Using LingPipe to remove stopwords; Using stemming; Using the Porter Stemmer; Stemming with LingPipe; Using lemmatization; Using the StanfordLemmatizer class; Using lemmatization in OpenNLP; Normalizing using a pipeline; Summary; Chapter 3: Finding Sentences; The SBD process; What makes SBD difficult?
  • Understanding SBD rules of LingPipe's HeuristicSentenceModel class; Simple Java SBDs; Using regular expressions; Using the BreakIterator class; Using NLP APIs; Using OpenNLP; Using the SentenceDetectorME class; Using the sentPosDetect method; Using the Stanford API; Using the PTBTokenizer class; Using the DocumentPreprocessor class; Using the StanfordCoreNLP class; Using LingPipe; Using the IndoEuropeanSentenceModel class; Using the SentenceChunker class; Using the MedlineSentenceModel class; Training a Sentence Detector model; Using the Trained model
  • Evaluating the model using the SentenceDetectorEvaluator class; Summary; Chapter 4: Finding People and Things; Why NER is difficult?; Techniques for name recognition; Lists and regular expressions; Statistical classifiers; Using regular expressions for NER; Using Java's regular expressions to find entities; Using LingPipe's RegExChunker class; Using NLP APIs; Using OpenNLP for NER; Determining the accuracy of the entity; Using other entity types; Processing multiple entity types; Using the Stanford API for NER; Using LingPipe for NER; Using LingPipe's name entity models