# Activities

This section contains hands-on activities for practicing prompting techniques and LLM application development.

> **Note:** Activity files are available in the [GitHub repository](https://github.com/gperdrizet/llms-demo/tree/main/activities) and include detailed instructions and starter code.

## Activity 1: LLM word problems

Practice basic prompting techniques and chain-of-thought reasoning by solving word problems with an LLM using the Gradio web interface.

**Duration:** 30-45 minutes

**Skills practiced:**

- Basic prompting strategies
- Chain-of-thought reasoning
- System prompt experimentation
- Comparing different prompting approaches

**Prerequisites:**

- Completed [Quickstart](quickstart.md) setup
- Familiarity with Lesson 46 (Prompting fundamentals)

**What you'll use:**

- Gradio chatbot web interface (`demos/chatbots/gradio_chatbot.py`)
- System prompt customization

**Location:** `activities/activity_1_word_problems.md`

## Activity 2: Text summarization

Build a practical text summarization script applying various prompting techniques.

**Duration:** 45-60 minutes

**Skills practiced:**

- Text preprocessing and chunking
- Prompt engineering for summarization
- Handling long documents
- Iterative refinement

**Prerequisites:**

- Completed Activity 1
- Python programming experience
- Understanding of basic file I/O

**Location:** `activities/activity_2_text_summarization.md`

## Activity 3: Building LangChain chains

Build practical LangChain applications using prompt templates, output parsers, and chains.
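The template → model → parser composition at the heart of these chains can be sketched without LangChain installed. The stub model and JSON shape below are placeholders for illustration, not the demo's actual code; Activity 3 builds the real thing with LangChain's prompt templates and `JsonOutputParser`.

```python
import json
from string import Template

# A library-free sketch of the template -> model -> parser chain pattern.
# Template variables play the role of LangChain prompt template inputs.
PROMPT = Template("Translate the following text to $language:\n\n$text")

def stub_model(prompt: str) -> str:
    """Placeholder for an LLM call; returns canned JSON for illustration."""
    return '{"translation": "Bonjour le monde"}'

def run_chain(text: str, language: str) -> dict:
    # Step 1: fill the prompt template with variables
    prompt = PROMPT.substitute(text=text, language=language)
    # Step 2: call the model (stubbed here)
    raw = stub_model(prompt)
    # Step 3: parse structured output, as JsonOutputParser does in LangChain
    return json.loads(raw)

result = run_chain("Hello world", "French")
print(result["translation"])
```

In LangChain the same three steps become `prompt | model | parser`; the sketch only shows why each stage exists.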
**Duration:** 60-75 minutes

**Skills practiced:**

- Creating reusable prompt templates with variables
- Structured data extraction with Pydantic schemas
- JSON parsing with JsonOutputParser
- Composing multi-step chains
- Error handling and debugging chains

**Prerequisites:**

- Completed Activities 1 and 2
- Python programming experience
- Understanding of Lesson 48 (LangChain basics)
- Familiarity with Demo 5 (LangChain demo)

**What you'll do:**

- Build a template-based translator chain
- Create a structured book information extractor
- Compose a multi-step review analysis pipeline
- Learn debugging and best practices

**Location:** `activities/activity_3_langchain_chains.md`

## Activity 4: Extending the ReAct agent

Enhance the ReAct agent chatbot by adding custom tools and testing multi-step reasoning.

**Duration:** 45-60 minutes

**Skills practiced:**

- Creating LangChain tools with the `@tool` decorator
- Understanding agent decision-making and tool selection
- Debugging tool execution and agent behavior
- Multi-step problem solving with tool chaining

**Prerequisites:**

- Completed Activities 1-3
- Python programming experience
- Understanding of Lesson 49 (LangChain advanced - agents)
- Familiarity with the ReAct agent demo

**What you'll do:**

- Study existing tool implementations
- Create a new tool (temperature converter, text analyzer, or custom)
- Register your tool with the agent
- Test single-tool and multi-tool reasoning
- Debug and improve your implementation

**Location:** `activities/activity_4_react_agent_tools.md`

## Activity 5: Extending the RAG knowledge system

Add a new document source to the RAG demo by implementing the `BaseIngestor` interface.
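The shape of that interface can be sketched as an abstract base class with one required method. This is a hypothetical sketch only: the repo's actual `BaseIngestor` signature may differ, and the real `URLIngestor` you build in the activity fetches pages with `WebBaseLoader` rather than accepting raw text.

```python
from abc import ABC, abstractmethod

# Hypothetical ingestor interface; the demo's BaseIngestor may define
# additional methods (e.g. a display name for the UI).
class BaseIngestor(ABC):
    @abstractmethod
    def ingest(self, source: str) -> list[str]:
        """Return a list of text chunks ready for embedding."""

class URLIngestor(BaseIngestor):
    def __init__(self, chunk_size: int = 500):
        self.chunk_size = chunk_size

    def ingest(self, source: str) -> list[str]:
        # In the activity, WebBaseLoader fetches the page and
        # RecursiveCharacterTextSplitter does the chunking; here we accept
        # raw text and slice it naively so the sketch stays self-contained.
        text = source
        return [text[i:i + self.chunk_size]
                for i in range(0, len(text), self.chunk_size)]

chunks = URLIngestor(chunk_size=10).ingest("a" * 25)
print(len(chunks))  # 3 chunks: two of 10 characters, one of 5
```

Implementing the interface is what lets the demo treat every source (Wikipedia, URLs, files) uniformly when registering it in the Gradio UI.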
**Duration:** 45-60 minutes

**Skills practiced:**

- Implementing a Python interface (`BaseIngestor`)
- Using LangChain document loaders (`WebBaseLoader`)
- Chunking documents with `RecursiveCharacterTextSplitter`
- Registering new components in a running Gradio application

**Prerequisites:**

- Completed Activities 1-4
- Python programming experience
- Understanding of Lesson 49 (LangChain advanced - RAG pipeline)
- Familiarity with the RAG system demo

**What you'll do:**

- Explore the existing `WikipediaIngestor` and `BaseIngestor` interface
- Implement `URLIngestor` using `WebBaseLoader`
- Register it in the demo and verify it appears in the UI

**Location:** `activities/activity_5_rag_sources.md`

## Activity 6: LoRA fine-tuning experiment

Fine-tune a small language model on a custom instruction dataset using LoRA, then compare its behavior against the original base model.

**Duration:** 60-90 minutes

**Skills practiced:**

- Loading and prompting base models directly with `transformers`
- Building a small SFT dataset formatted as ChatML
- Configuring and attaching a LoRA adapter with `peft`
- Training with `SFTTrainer` from `trl`
- Comparing base vs. fine-tuned outputs and measuring adapter size

**Prerequisites:**

- Completed Activities 1-5
- Python programming experience
- Understanding of Lesson 50 (Fine-tuning, RLHF, and model alignment)
- Familiarity with the fine-tuning demo (`demos/finetuning/finetuning_demo.py`)

**What you'll do:**

- Run test prompts through the base model to record a baseline
- Write 10-20 `(instruction, response)` pairs in a consistent style of your choosing
- Format the dataset as ChatML and load it with HuggingFace `datasets`
- Attach a LoRA adapter (`r=8`, targeting `q_proj`/`v_proj`) and train for ~5 epochs
- Compare fine-tuned responses to the baseline using the same test prompts
- Save the adapter, measure its size, and reload it via `PeftModel.from_pretrained()`

**Extension challenges:** rank sweep (r=8/16/32), overfitting demonstration (20 epochs), expanding target modules, and `merge_and_unload()` to export a standalone checkpoint.

**Location:** `activities/activity_6_finetuning.md`

## Activity 7: Evaluating LLM outputs

Measure LLM output quality using automated text metrics and an LLM-as-judge rubric, then interpret the results critically.
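Before reaching for the library, the intuition behind one of those metrics can be shown in plain Python. The function below is a simplified approximation of ROUGE-1 recall (unigram overlap over the reference length); the activity itself uses HuggingFace `evaluate`, which handles n-grams, stemming, and the full ROUGE variants properly.

```python
from collections import Counter

def rouge1_recall(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 recall: fraction of reference unigrams
    that also appear in the candidate (with multiplicity)."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum(min(ref[w], cand[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)

print(rouge1_recall("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
print(rouge1_recall("the cat sat on the mat", "a dog ran"))               # 0.0
```

Note what this metric cannot see: a paraphrase with different words scores low even when the meaning matches, which is exactly the limitation BERTScore and the LLM-as-judge rubric are meant to address.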
**Duration:** 45-60 minutes

**Skills practiced:**

- Computing ROUGE-1/2/L, BLEU, and BERTScore with HuggingFace `evaluate`
- Interpreting metric scores and understanding their limitations
- Designing a threshold-based quality gate
- Writing a structured LLM-as-judge rubric prompt
- Parsing structured JSON output from an LLM
- Identifying verbosity bias and prompt sensitivity in judge systems

**Prerequisites:**

- Completed Activities 1-6
- Python programming experience
- Understanding of Lesson 51 (Benchmarking and evaluating LLMs)
- Familiarity with the evaluation demo (`demos/evaluation/evaluation_demo.py`)

**What you'll do:**

- Compute metrics on three text pairs (exact match, paraphrase, factual error) and compare results
- Build a quality gate function using ROUGE and BERTScore thresholds
- Implement an LLM-as-judge function that returns rubric scores as parsed JSON
- Evaluate three candidate answers across factual accuracy, relevance, and completeness
- Detect verbosity bias by comparing concise vs. padded answers

**Location:** `activities/activity_7_evaluation.md`
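The JSON-parsing step of the judge function is where these pipelines most often break, since models frequently wrap their scores in extra prose. A minimal sketch, assuming the three rubric keys named in the activity (`factual_accuracy`, `relevance`, `completeness`) and a judge that emits a single JSON object somewhere in its reply:

```python
import json
import re

def parse_judge_output(raw: str) -> dict:
    """Extract and validate the first JSON object in a judge response."""
    # Judges often surround the JSON with commentary; pull out the object
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in judge output")
    scores = json.loads(match.group(0))
    # Fail loudly if the judge skipped part of the rubric
    expected = {"factual_accuracy", "relevance", "completeness"}
    missing = expected - scores.keys()
    if missing:
        raise ValueError(f"judge omitted rubric keys: {sorted(missing)}")
    return scores

raw = 'Here are my scores:\n{"factual_accuracy": 4, "relevance": 5, "completeness": 3}'
print(parse_judge_output(raw))
```

Validating the keys up front makes judge failures visible as errors rather than silently skewing your averages, which matters when you later compare concise and padded answers for verbosity bias.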