Data analytics in the AWS cloud: building a data platform for BI and predictive analytics on AWS

A comprehensive and accessible roadmap to performing data analytics in the AWS cloud. In Data Analytics in the AWS Cloud: Building a Data Platform for BI and Predictive Analytics on AWS, accomplished software engineer and data architect Joe Minichino delivers an expert blueprint to storing, processin...

Bibliographic Details
Main Author: Minichino, Joe
Format: eBook
Language: English
Published: Hoboken, NJ: John Wiley & Sons, Inc., 2023
Collection: O'Reilly (for collection details, see MPG.ReNa)
Table of Contents:
  • Cover
  • Title Page
  • Copyright Page
  • About the Author
  • About the Technical Editor
  • Acknowledgments
  • Contents at a Glance
  • Contents
  • Introduction
  • What Is a Data Lake?
  • When You Do Not Need a Data Lake
  • When Do You Need Analytics?
  • When Do You Need a Data Lake for Analytics?
  • How About an Analytics Team?
  • The Data Platform
  • The End of the Beginning
  • Chapter 1 AWS Data Lakes and Analytics Technology Overview
  • Why AWS?
  • What Does a Data Lake Look Like in AWS?
  • Analytics on AWS
  • Skills Required to Build and Maintain an AWS Analytics Pipeline
  • Chapter 2 The Path to Analytics: Setting Up a Data and Analytics Team
  • The Data Vision
  • Support
  • DA Team Roles
  • Early Stage Roles
  • Team Lead
  • Data Architect
  • Data Engineer
  • Data Analyst
  • Maturity Stage Roles
  • Data Scientist
  • Cloud Engineer
  • Business Intelligence (BI) Developer
  • Machine Learning Engineer
  • Business Analyst
  • Niche Roles
  • Analytics Flow at a Process Level
  • Workflow Methodology
  • The DA Team Mantra: "Automate Everything"
  • Analytics Models in the Wild: Centralized, Distributed, Center of Excellence
  • Centralized
  • Distributed
  • Center of Excellence
  • Summary
  • Chapter 3 Working on AWS
  • Accessing AWS
  • Everything Is a Resource
  • S3: An Important Exception
  • IAM: Policies, Roles, and Users
  • Policies
  • Identity-Based Policies
  • Resource-Based Policies
  • Roles
  • Users and User Groups
  • Summarizing IAM
  • Working with the Web Console
  • The AWS Command-Line Interface
  • Installing AWS CLI
  • Linux Installation
  • macOS Installation
  • Windows
  • Configuring AWS CLI
  • A Note on Region
  • Setting Individual Parameters
  • Using Profiles and Configuration Files
  • Final Notes on Configuration
  • Using the AWS CLI
  • Using Skeletons and File Inputs
  • Cleaning Up!
  • Infrastructure-as-Code: CloudFormation and Terraform
  • CloudFormation
  • CloudFormation Stacks
  • CloudFormation Template Anatomy
  • CloudFormation Changesets
  • Getting Stack Information
  • Cleaning Up Again
  • CloudFormation Conclusions
  • Terraform
  • Coding Style
  • Modularity
  • Limitations
  • Terraform vs. CloudFormation
  • Infrastructure-as-Code: CDK, Pulumi, Cloudcraft, and Other Solutions
  • AWS CDK
  • Pulumi
  • Cloudcraft
  • Infrastructure Management Conclusions
  • Chapter 4 Serverless Computing and Data Engineering
  • Serverless vs. Fully Managed
  • AWS Serverless Technologies
  • AWS Lambda
  • Pricing Model
  • Laser Focus on Code
  • The Lambda Paradigm Shift
  • Virtually Infinite Scalability
  • Geographical Distribution
  • A Lambda Hello World
  • Lambda Configuration
  • Runtime
  • Container-Based Lambdas
  • Architectures
  • Memory
  • Networking
  • Execution Role
  • Environment Variables
  • AWS EventBridge
  • AWS Fargate
  • AWS DynamoDB
  • AWS SNS
  • Amazon SQS
  • AWS CloudWatch
  • Amazon QuickSight
  • AWS Step Functions
  • Amazon API Gateway
  • Amazon Cognito
  • AWS Serverless Application Model (SAM)
  • Ephemeral Infrastructure
  • AWS SAM Installation
  • Configuration
  • Creating Your First AWS SAM Project
  • Application Structure
  • SAM Resource Types
  • SAM Lambda Template
  • !! Recursive Lambda Invocation !!
  • Function Metadata
  • Outputs
  • Implicitly Generated Resources
  • Other Template Sections
  • Lambda Code
  • Building Your First SAM Application
  • Testing the AWS SAM Application Locally
  • Deployment
  • Cleaning Up
  • Summary
  • Chapter 5 Data Ingestion
  • AWS Data Lake Architecture
  • Serverless Data Lake Architecture Structure
  • Ingestion
  • Storage and Processing
  • Cataloging, Governance, and Search
  • Security and Monitoring
  • Consumption
  • Sample Processing Architecture: Cataloging Images into DynamoDB
  • Use Case Description
  • SAM Application Creation
  • S3-Triggered Lambda
  • Adding DynamoDB
  • Lambda Execution Context
  • Inserting into DynamoDB
  • Cleaning Up
  • Serverless Ingestion
  • AWS Fargate
  • AWS Lambda
  • Example Architecture: Fargate-Based Periodic Batch Import
  • The Basic Importer
  • ECS CLI
  • AWS Copilot CLI
  • Clean Up
  • AWS Kinesis Ingestion
  • Example Architecture: Two-Pronged Delivery
  • Fully Managed Ingestion with AppFlow
  • Operational Data Ingestion with Database Migration Service
  • DMS Concepts
  • DMS Instance
  • DMS Endpoints
  • DMS Tasks
  • Summary of the Workflow
  • Common Use of DMS
  • Example Architecture: DMS to S3
  • DMS Instance
  • DMS Endpoints
  • DMS Task
  • Summary
  • Chapter 6 Processing Data
  • Phases of Data Preparation
  • What Is ETL? Why Should I Care?
  • ETL Job vs. Streaming Job
  • Overview of ETL in AWS
  • ETL with AWS Glue
  • ETL with Lambda Functions
  • ETL with Hadoop/EMR
  • Other Ways to Perform ETL
  • ETL Job Design Concepts
  • Source Identification
  • Destination Identification
  • Mappings
  • Validation
  • Filter
  • Join, Denormalization, Relationalization
  • AWS Glue for ETL
  • Really, It's Just Spark
  • Visual
  • Spark Script Editor
  • Python Shell Script Editor
  • Jupyter Notebook
  • Connectors
  • Creating Connections
  • Creating Connections with the Web Console
  • Creating Connections with the AWS CLI
  • Creating ETL Jobs with AWS Glue Visual Editor
  • ETL Example: Format Switch from Raw (JSON) to Cleaned (Parquet)
  • Job Bookmarks
  • Transformations
  • Apply Mapping
  • Filter
  • Other Available Transforms
  • Run the Edited Job
  • Visual Editor with Source and Target Conclusions
  • Creating ETL Jobs with AWS Glue Visual Editor (without Source and Target)
  • Creating ETL Jobs with the Spark Script Editor
  • Developing ETL Jobs with AWS Glue Notebooks
  • What Is a Notebook?
  • Notebook Structure
  • Step 1: Load Code into a DynamicFrame
  • Step 2: Apply Field Mapping
  • Step 3: Apply the Filter
  • Step 4: Write to S3 in Parquet Format
  • Example: Joining and Denormalizing Data from Two S3 Locations
  • Conclusions for Manually Authored Jobs with Notebooks
  • Creating ETL Jobs with AWS Glue Interactive Sessions
  • It's Magic
  • Development Workflow
  • Streaming Jobs
  • Differences with a Standard ETL Job
  • Streaming Sources
  • Example: Process Kinesis Streams with a Streaming Job
  • Streaming ETL Jobs Conclusions
  • Summary
  • Chapter 7 Cataloging, Governance, and Search
  • Cataloging with AWS Glue
  • AWS Glue and the AWS Glue Data Catalog
  • Glue Databases and Tables
  • Databases
  • The Idea of Schema-on-Read
  • Tables
  • Create Table Manually
  • Creating a Table from an Existing Schema
  • Creating a Table with a Crawler
  • Summary on Databases and Tables
  • Crawlers
  • Updating or Not Updating?
  • Running the Crawler
  • Creating a Crawler from the AWS CLI
  • Retrieving Table Information from the CLI
  • Classifiers
  • Classifier Example
  • Crawlers and Classifiers Summary
  • Search with Amazon Athena: The Heart of Analytics in AWS
  • A Bit of History
  • Interface Overview
  • Creating Tables Manually
  • Athena Data Types
  • Complex Types
  • Running a Query
  • Connecting with JDBC and ODBC
  • Query Stats
  • Recent Queries and Saved Queries
  • The Power of Partitions
  • Athena Pricing Model
  • Automatic Naming
  • Athena Query Output
  • Athena Peculiarities (SQL and Not)
  • Computed Fields Gotcha and WITH Statement Workaround
  • Lowercase!
  • Query Explain
  • Deduplicating Records
  • Working with JSON, Flattening, and Unnesting
  • Athena Views
  • CREATE TABLE AS SELECT (CTAS)
  • Saving Queries and Reusing Saved Queries
  • Running Parameterized Queries
  • Athena Federated Queries
  • Athena Lambda Connectors
  • Note on Connection Errors
  • Performing Federated Queries
  • Creating a View from a Federated Query
  • Governing: Athena Workgroups, Lake Formation, and More
  • Athena Workgroups
  • Fine-Grained Athena Access with IAM
  • Recap of Athena-Based Governance
  • AWS Lake Formation
  • Registering a Location in Lake Formation
  • Creating a Database in Lake Formation
  • Assigning Permissions in Lake Formation
  • LF-Tags and Permissions in Lake Formation
  • Data Filters
  • Governance Conclusions
  • Summary
  • Chapter 8 Data Consumption: BI, Visualization, and Reporting
  • QuickSight
  • Signing Up for QuickSight
  • Standard Plan
  • Enterprise Plan
  • Users and User Groups
  • Managing Users and Groups
  • Managing QuickSight
  • Users and Groups
  • Your Subscriptions
  • SPICE Capacity
  • Account Settings
  • Security and Permissions
  • VPC Connections
  • Mobile Settings
  • Domains and Embedding
  • Single Sign-On
  • Data Sources and Datasets
  • Creating an Athena Data Source
  • Creating Other Data Sources
  • Creating a Data Source from the AWS CLI
  • Creating a Dataset from a Table
  • Creating a Dataset from a SQL Query
  • Duplicating Datasets
  • Note on Creating Datasets
  • QuickSight Favorites, Recent, and Folders
  • SPICE
  • Manage SPICE Capacity
  • Refresh Schedule
  • QuickSight Data Editor
  • QuickSight Data Types
  • Change Data Types
  • Calculated Fields
  • Joining Data
  • Excluding Fields
  • Filtering Data
  • Removing Data
  • Geospatial Hierarchies and Adding Fields to Hierarchies
  • Unsupported Format Dates
  • Visualizing Data: QuickSight Analysis
  • Adding a Title and a Description to Your Analysis
  • Renaming the Sheet
  • Your First Visual with AutoGraph
  • Field Wells
  • Visual Types
  • Saving and Autosaving
  • A First Example: Pie Chart
  • Renaming a Visual
  • Filtering Data
  • Adding Drill-Downs
  • Parameters
  • Actions
  • Insights
  • ML-Powered Insights
  • Sharing an Analysis