Activities

This section contains hands-on activities for practicing prompting techniques and LLM application development.

Note: Activity files are available in the GitHub repository and include detailed instructions and starter code.

Activity 1: LLM word problems

Practice basic prompting techniques and chain-of-thought reasoning by solving word problems with an LLM through the Gradio web interface.

Duration: 30-45 minutes

Skills practiced:

  • Basic prompting strategies

  • Chain-of-thought reasoning

  • System prompt experimentation

  • Comparing different prompting approaches

Prerequisites:

  • Completed Quickstart setup

  • Familiarity with Lesson 46 (Prompting fundamentals)

What you’ll use:

  • Gradio chatbot web interface (demos/chatbots/gradio_chatbot.py)

  • System prompt customization

Location: activities/activity_1_word_problems.md
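
The prompting comparison at the heart of this activity can be sketched in plain Python. The prompt wordings and the chat-message format below are illustrative assumptions, not the demo's exact prompts; the point is the contrast between a direct system prompt and a chain-of-thought variant sent through the same message structure.

```python
# Two system prompts for the same word problem: a direct prompt and a
# chain-of-thought (CoT) variant that asks the model to reason step by step.
DIRECT_PROMPT = "You are a helpful math tutor. Answer with only the final number."

COT_PROMPT = (
    "You are a helpful math tutor. Work through the problem step by step, "
    "showing each intermediate calculation, then state the final answer on "
    "a line starting with 'Answer:'."
)

WORD_PROBLEM = (
    "A train leaves at 9:15 and arrives at 11:45. "
    "How many minutes does the trip take?"
)

def build_messages(system_prompt: str, problem: str) -> list[dict]:
    # Assemble the chat-format message list a chatbot backend typically sends.
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": problem},
    ]
```

Swapping `DIRECT_PROMPT` for `COT_PROMPT` while keeping the user message fixed is the cleanest way to compare approaches, since only one variable changes.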

Activity 2: Text summarization

Build a practical text summarization script that applies various prompting techniques.

Duration: 45-60 minutes

Skills practiced:

  • Text preprocessing and chunking

  • Prompt engineering for summarization

  • Handling long documents

  • Iterative refinement

Prerequisites:

  • Completed Activity 1

  • Python programming experience

  • Understanding of basic file I/O

Location: activities/activity_2_text_summarization.md
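
The chunking step for long documents can be sketched as follows. This is a minimal character-based splitter with overlap, plus a summarization prompt builder; the chunk size, overlap, and prompt wording are assumptions, not the activity's required values.

```python
def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100) -> list[str]:
    # Split text into overlapping character-based chunks so that context
    # spanning a chunk boundary is not lost entirely.
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks

def build_summary_prompt(chunk: str) -> str:
    # One prompt per chunk; chunk summaries can then be summarized again
    # in a second pass (the "map-reduce" pattern for long documents).
    return f"Summarize the following passage in 2-3 sentences:\n\n{chunk}"
```

A word- or sentence-aware splitter gives cleaner boundaries; character-based splitting is shown here only because it is the simplest starting point to iterate from.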

Activity 3: Building LangChain chains

Build practical LangChain applications using prompt templates, output parsers, and chains.

Duration: 60-75 minutes

Skills practiced:

  • Creating reusable prompt templates with variables

  • Structured data extraction with Pydantic schemas

  • JSON parsing with JsonOutputParser

  • Composing multi-step chains

  • Error handling and debugging chains

Prerequisites:

  • Completed Activities 1 and 2

  • Python programming experience

  • Understanding of Lesson 48 (LangChain basics)

  • Familiarity with Demo 5 (LangChain demo)

What you’ll do:

  • Build a template-based translator chain

  • Create a structured book information extractor

  • Compose a multi-step review analysis pipeline

  • Learn debugging and best practices

Location: activities/activity_3_langchain_chains.md
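
The JSON-parsing half of the book extractor can be sketched with the standard library alone. This is not LangChain's `JsonOutputParser`, just a stand-in showing what such a parser must handle: replies wrapped in code fences, plus validation against the keys the extraction prompt asked for. The `title`/`author`/`year` schema is an illustrative assumption.

```python
import json

REQUIRED_KEYS = {"title", "author", "year"}  # illustrative book schema

def parse_book_json(raw: str) -> dict:
    # Parse a model reply into a dict, tolerating ```json fences around
    # the payload, then check that the expected keys are present.
    text = raw.strip()
    if text.startswith("```"):
        # Drop the opening fence (with optional language tag) and the closing fence.
        text = text.split("\n", 1)[1].rsplit("```", 1)[0]
    data = json.loads(text)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```

In the activity itself, a Pydantic schema plus `JsonOutputParser` replaces this hand-rolled version; the manual sketch mainly shows why the parser and the prompt's format instructions must agree.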

Activity 4: Extending the ReAct agent

Enhance the ReAct agent chatbot by adding custom tools and testing multi-step reasoning.

Duration: 45-60 minutes

Skills practiced:

  • Creating LangChain tools with the @tool decorator

  • Understanding agent decision-making and tool selection

  • Debugging tool execution and agent behavior

  • Multi-step problem solving with tool chaining

Prerequisites:

  • Completed Activities 1-3

  • Python programming experience

  • Understanding of Lesson 49 (LangChain advanced - agents)

  • Familiarity with the ReAct agent demo

What you’ll do:

  • Study existing tool implementations

  • Create a new tool (temperature converter, text analyzer, or custom)

  • Register your tool with the agent

  • Test single-tool and multi-tool reasoning

  • Debug and improve your implementation

Location: activities/activity_4_react_agent_tools.md
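
The registration pattern behind the `@tool` decorator can be sketched without LangChain. LangChain's real decorator wraps a function into a tool object carrying a name, description, and argument schema; the deliberately minimal stand-in below only registers the function by name, which is enough to show how an agent maps a chosen tool name to a callable. The temperature-converter tool matches one of the suggested options above.

```python
TOOLS: dict[str, object] = {}

def tool(fn):
    # Minimal stand-in for LangChain's @tool decorator: register the
    # function under its own name; the docstring serves as the tool
    # description an agent would read when deciding which tool to call.
    TOOLS[fn.__name__] = fn
    return fn

@tool
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a temperature from Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32

# An agent picks a tool by name and calls it with arguments parsed
# from the model's output:
result = TOOLS["celsius_to_fahrenheit"](100.0)
```

A precise docstring matters more than the implementation: the agent selects tools based on their descriptions, so vague docstrings are a common cause of wrong tool choices.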

Activity 5: Extending the RAG knowledge system

Add a new document source to the RAG demo by implementing the BaseIngestor interface.

Duration: 45-60 minutes

Skills practiced:

  • Implementing a Python interface (BaseIngestor)

  • Using LangChain document loaders (WebBaseLoader)

  • Chunking documents with RecursiveCharacterTextSplitter

  • Registering new components in a running Gradio application

Prerequisites:

  • Completed Activities 1-4

  • Python programming experience

  • Understanding of Lesson 49 (LangChain advanced - RAG pipeline)

  • Familiarity with the RAG system demo

What you’ll do:

  • Explore the existing WikipediaIngestor and BaseIngestor interface

  • Implement URLIngestor using WebBaseLoader

  • Register it in the demo and verify it appears in the UI

Location: activities/activity_5_rag_sources.md
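
The interface-implementation pattern can be sketched as below. The method names (`source_name`, `ingest`) and the injected `fetch` callable are assumptions for illustration; the demo's actual `BaseIngestor` signature may differ, and the real `URLIngestor` would wrap `WebBaseLoader` and `RecursiveCharacterTextSplitter` rather than the crude slicing used here.

```python
from abc import ABC, abstractmethod

class BaseIngestor(ABC):
    # Sketch of an ingestor interface: every source must name itself
    # and turn a query (here, a URL) into text chunks for embedding.

    @abstractmethod
    def source_name(self) -> str: ...

    @abstractmethod
    def ingest(self, query: str) -> list[str]: ...

class URLIngestor(BaseIngestor):
    def __init__(self, fetch=None):
        # A fetch callable is injected so the sketch stays offline-testable;
        # the activity uses WebBaseLoader to do the actual fetching.
        self._fetch = fetch or (lambda url: "")

    def source_name(self) -> str:
        return "URL"

    def ingest(self, url: str) -> list[str]:
        text = self._fetch(url)
        # Crude fixed-size split standing in for RecursiveCharacterTextSplitter.
        size = 500
        return [text[i:i + size] for i in range(0, len(text), size)]
```

Because the Gradio app discovers sources through the interface, registering the new ingestor should be the only change needed for it to appear in the UI.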

Activity 6: LoRA fine-tuning experiment

Fine-tune a small language model on a custom instruction dataset using LoRA, then compare its behavior against the original base model.

Duration: 60-90 minutes

Skills practiced:

  • Loading and prompting base models directly with transformers

  • Building a small SFT dataset formatted as ChatML

  • Configuring and attaching a LoRA adapter with peft

  • Training with SFTTrainer from trl

  • Comparing base vs. fine-tuned outputs and measuring adapter size

Prerequisites:

  • Completed Activities 1-5

  • Python programming experience

  • Understanding of Lesson 50 (Fine-tuning, RLHF, and model alignment)

  • Familiarity with the fine-tuning demo (demos/finetuning/finetuning_demo.py)

What you’ll do:

  • Run test prompts through the base model to record a baseline

  • Write 10-20 (instruction, response) pairs in a consistent style of your choosing

  • Format the dataset as ChatML and load it with HuggingFace datasets

  • Attach a LoRA adapter (r=8, targeting q_proj/v_proj) and train for ~5 epochs

  • Compare fine-tuned responses to the baseline using the same test prompts

  • Save the adapter, measure its size, and reload it via PeftModel.from_pretrained()

Extension challenges: rank sweep (r=8/16/32), overfitting demonstration (20 epochs), expanding target modules, and merge_and_unload() to export a standalone checkpoint.

Location: activities/activity_6_finetuning.md
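
The ChatML formatting step can be sketched in plain Python. The `<|im_start|>`/`<|im_end|>` markers are the standard ChatML delimiters, but the exact template (e.g., whether a system turn is included) is model-specific, so treat this as an assumed layout to adapt to the demo's tokenizer.

```python
def to_chatml(instruction: str, response: str) -> str:
    # Format one (instruction, response) pair as a ChatML training example.
    return (
        "<|im_start|>user\n"
        f"{instruction}<|im_end|>\n"
        "<|im_start|>assistant\n"
        f"{response}<|im_end|>\n"
    )

# Illustrative pair; the activity asks for 10-20 in a consistent style.
pairs = [
    ("What is LoRA?",
     "LoRA freezes the base weights and trains small low-rank adapter "
     "matrices added to selected layers."),
]

# Records shaped like this load directly with datasets.Dataset.from_list,
# matching the {"text": ...} format SFTTrainer consumes.
dataset = [{"text": to_chatml(i, r)} for i, r in pairs]
```

Keeping every pair in exactly the same format matters: with only 10-20 examples, the model learns the template at least as strongly as the content, and inconsistencies show up directly in the fine-tuned outputs.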

Activity 7: Evaluating LLM outputs

Measure LLM output quality using automated text metrics and an LLM-as-judge rubric, then interpret the results critically.

Duration: 45-60 minutes

Skills practiced:

  • Computing ROUGE-1/2/L, BLEU, and BERTScore with HuggingFace evaluate

  • Interpreting metric scores and understanding their limitations

  • Designing a threshold-based quality gate

  • Writing a structured LLM-as-judge rubric prompt

  • Parsing structured JSON output from an LLM

  • Identifying verbosity bias and prompt sensitivity in judge systems

Prerequisites:

  • Completed Activities 1-6

  • Python programming experience

  • Understanding of Lesson 51 (Benchmarking and evaluating LLMs)

  • Familiarity with the evaluation demo (demos/evaluation/evaluation_demo.py)

What you’ll do:

  • Compute metrics on three text pairs (exact match, paraphrase, factual error) and compare results

  • Build a quality gate function using ROUGE and BERTScore thresholds

  • Implement an LLM-as-judge function that returns rubric scores as parsed JSON

  • Evaluate three candidate answers across factual accuracy, relevance, and completeness

  • Detect verbosity bias by comparing concise vs. padded answers

Location: activities/activity_7_evaluation.md
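
The metric-plus-threshold idea can be sketched with a simplified unigram-overlap score. This is not the ROUGE-1 implementation from HuggingFace `evaluate` (which counts n-gram matches with multiplicities); it is a set-based approximation that is close enough to illustrate the quality-gate pattern, with an assumed threshold of 0.5.

```python
def rouge1_f1(candidate: str, reference: str) -> float:
    # Simplified unigram-overlap F1: precision and recall over the sets
    # of lowercase tokens, combined as a harmonic mean.
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    overlap = len(set(cand) & set(ref))
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def quality_gate(candidate: str, reference: str, threshold: float = 0.5) -> bool:
    # Pass only when the overlap score clears the threshold. Note what
    # this misses: a fluent paraphrase can score low, and a factual
    # error can score high -- exactly the limitation the activity probes.
    return rouge1_f1(candidate, reference) >= threshold
```

Running this on the activity's three text pairs (exact match, paraphrase, factual error) makes the limitation concrete: only the exact match is guaranteed a high score, which is why the activity pairs lexical metrics with BERTScore and an LLM-as-judge rubric.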