Lesson 38: Text data analysis (foundation)

Focus: Getting text ready for ML - preprocessing and exploration

| Topic | Purpose |
| --- | --- |
| Text preprocessing | Tokenization, normalization, stemming, lemmatization, stopword removal |
| Data exploration | Word frequency, distributions, word clouds |
| Basic classification | Naive Bayes text classification (no deep learning) |
| Rule-based sentiment | TextBlob, VADER (lexicon-based, not learned embeddings) |

Key concepts: Cleaning and understanding text data before vectorization
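The cleaning steps above can be sketched with the standard library alone. This is a minimal illustration, not a production pipeline: the tiny stopword set and the `crude_stem` suffix-stripper are made-up stand-ins for NLTK's stopword lists and Porter stemmer.

```python
import re
from collections import Counter

# Tiny illustrative stopword set; real pipelines use NLTK/spaCy lists.
STOPWORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in"}

def preprocess(text):
    """Lowercase, tokenize on letter runs, drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def crude_stem(token):
    """Naive suffix stripping -- a stand-in for real Porter stemming."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

docs = [
    "The cats are chasing the mice in the garden.",
    "A cat chased a mouse.",
]
# Word-frequency exploration on the cleaned, stemmed tokens.
stems = [crude_stem(t) for d in docs for t in preprocess(d)]
freq = Counter(stems)
print(freq.most_common(3))
```

Note how "cats" and "cat" collapse to one count after stemming, which is exactly why frequency analysis is done after cleaning rather than on raw text.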


Lesson 39: Text vectorization (sparse to dense intro)

Focus: Converting text to numbers - from simple to sophisticated

| Topic | Purpose |
| --- | --- |
| One-hot encoding | Simplest representation, limitations (sparsity) |
| Bag-of-Words | Word counts, introduces CountVectorizer |
| TF-IDF | Weighted importance, better than raw counts |
| Word2Vec intro | Concept of dense embeddings, training basics |

Builds on 38: Uses preprocessed text as input

Key concepts: Progression from sparse to dense representations
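The sparse representations can be built by hand in a few lines. This sketch uses the classic idf = log(N / df) formulation; sklearn's `TfidfVectorizer` adds smoothing and l2-normalizes each row, so its exact numbers differ.

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
tokenized = [d.split() for d in docs]

# Bag-of-Words: vocabulary plus per-document term counts
# (conceptually what CountVectorizer builds as a sparse matrix).
vocab = sorted({t for doc in tokenized for t in doc})
counts = [Counter(doc) for doc in tokenized]
bow = [[c[t] for t in vocab] for c in counts]

# TF-IDF: term frequency weighted by inverse document frequency,
# so words appearing in many documents get down-weighted.
N = len(docs)
df = {t: sum(1 for doc in tokenized if t in doc) for t in vocab}
idf = {t: math.log(N / df[t]) for t in vocab}
tfidf = [[c[t] * idf[t] for t in vocab] for c in counts]

# "the" appears in 2 of 3 docs, "cat" in only 1, so "cat" scores higher.
print(dict(zip(vocab, tfidf[0])))
```

The progression is visible in the code: `bow` keeps raw counts, while `tfidf` reweights them by rarity, which is the "weighted importance" idea from the table.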


Lesson 40: Distributed representations (embeddings in practice)

Focus: Using and extending word embeddings

| Topic | Purpose |
| --- | --- |
| Pre-trained Word2Vec | Loading Google News vectors, similarity |
| Document vectors | Averaging word embeddings for documents |
| Doc2Vec | Learning document-level embeddings directly |

Builds on 39: Applies Word2Vec concept at scale

Key concepts: Pre-trained embeddings, word-to-document extension
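The averaging idea can be shown without downloading anything. In practice the vectors would come from a pre-trained model (e.g. gensim's 300-dimensional Google News Word2Vec); the toy 3-d embeddings below are invented purely to keep the sketch self-contained.

```python
import math

# Toy 3-d embeddings standing in for real pre-trained word vectors.
emb = {
    "cat":  [0.9, 0.1, 0.0],
    "dog":  [0.8, 0.2, 0.0],
    "car":  [0.0, 0.1, 0.9],
    "road": [0.1, 0.0, 0.8],
}

def doc_vector(tokens):
    """Average the embeddings of known words -- the simplest document vector."""
    known = [emb[t] for t in tokens if t in emb]
    return [sum(dim) / len(known) for dim in zip(*known)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

pets = doc_vector(["cat", "dog"])
traffic = doc_vector(["car", "road"])
print(cosine(pets, traffic))               # low: different topics
print(cosine(pets, doc_vector(["dog"])))   # high: shared topic
```

Averaging loses word order, which is the limitation Doc2Vec addresses by learning a document-level vector jointly with the word vectors.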


Lesson 41: Sequence-to-sequence models (machine translation)

Focus: Applying embeddings to sequence tasks

| Topic | Purpose |
| --- | --- |
| Sequence preprocessing | Integer-encoding tokens and padding to a fixed length for neural networks |
| Encoder-decoder intro | Basic seq2seq architecture |
| Translation task | English-to-French translation with an LSTM |

Builds on 40: Uses embeddings as input to sequence models

Key concepts: Sequence-to-sequence architecture, encoder-decoder pattern
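The preprocessing step can be sketched in plain Python. This assumes pre-padding to a fixed length, in the style of Keras' `pad_sequences`; the vocabulary ids below are illustrative, not from any real dataset.

```python
# Map tokens to integer ids, then pad every sequence to the same length
# so a batch can be fed to an embedding layer + LSTM.
PAD = 0

def build_vocab(sentences):
    vocab = {"<pad>": PAD}
    for sent in sentences:
        for tok in sent.split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode_and_pad(sentences, vocab, maxlen):
    batch = []
    for sent in sentences:
        ids = [vocab[t] for t in sent.split()][:maxlen]  # truncate if too long
        batch.append([PAD] * (maxlen - len(ids)) + ids)  # pre-pad if too short
    return batch

sents = ["i am cold", "she is very cold today"]
vocab = build_vocab(sents)
print(encode_and_pad(sents, vocab, maxlen=5))
```

Reserving id 0 for padding matters downstream: embedding layers and loss functions are typically told to mask that id so the padding does not contribute to training.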


Lesson 42: Sequence models

Focus: RNN architectures for sequences

| Topic | Purpose |
| --- | --- |
| RNN fundamentals | Recurrent connections, vanishing gradients |
| LSTM | Gates, memory cells |
| Bi-directional RNNs | Context from both directions |

Builds on 41: Deeper dive into the LSTM used in the encoder-decoder

Key concepts: Recurrence, memory, handling variable-length sequences
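The recurrence itself can be shown with a scalar toy cell (real RNNs use weight matrices and vector hidden states; the weights here are arbitrary illustrative numbers). Watch how the first input's influence fades step by step when the recurrent weight is below 1 -- the intuition behind vanishing gradients that LSTM gates were designed to fix.

```python
import math

def rnn_forward(inputs, w_x=0.5, w_h=0.8, h0=0.0):
    """Bare-bones recurrent cell: the hidden state is a running summary
    of the sequence, updated with the SAME weights at every step."""
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h)  # recurrence: h_t depends on h_{t-1}
        states.append(h)
    return states

# A single impulse at t=0, then silence: its trace decays every step.
states = rnn_forward([1.0, 0.0, 0.0, 0.0])
print(states)
```

Because the same `w_h` multiplies the state at every step, signals (and gradients) shrink geometrically over long sequences, which is why plain RNNs struggle with long-range dependencies.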


Lesson 43: Attention mechanism (Transformers and BERT)

Focus: Modern architectures that replaced RNNs

| Topic | Purpose |
| --- | --- |
| Attention concept | Why attention improves seq2seq |
| Transformer architecture | Self-attention, positional encoding |
| Pre-trained transformers | BERT, GPT - contextual embeddings |
| Fine-tuning | Classification with transformers |

Builds on 42: Attention solves RNN limitations

Key concepts: Self-attention, positional encoding, transfer learning with transformers
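Scaled dot-product attention, the core operation inside self-attention, fits in a few lines. The 2-d query, keys, and values below are toy inputs chosen so the result is easy to read; real models use learned projections of high-dimensional token embeddings.

```python
import math

def softmax(xs):
    m = max(xs)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    """score every key against the query, softmax the scores into
    weights, and return the weight-averaged values."""
    d_k = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
              for key in keys]
    weights = softmax(scores)
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return out, weights

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
out, weights = attention([1.0, 0.0], keys, values)
print(weights)  # the first key matches the query, so it gets more weight
```

Unlike the RNN in Lesson 42, every position attends to every other position in one step, so no information has to survive a long chain of recurrent updates -- the limitation attention was introduced to solve.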