Activities

This section contains hands-on activities for practicing prompting techniques and LLM application development.

Note: Activity files are available in the GitHub repository and include detailed instructions and starter code.

Activity 1: LLM word problems

Practice basic prompting techniques and chain-of-thought reasoning by solving word problems with an LLM through the Gradio web interface.

Duration: 30-45 minutes

Skills practiced:

  • Basic prompting strategies

  • Chain-of-thought reasoning

  • System prompt experimentation

  • Comparing different prompting approaches

Prerequisites:

  • Completed Quickstart setup

  • Familiarity with Lesson 46 (Prompting fundamentals)

What you’ll use:

  • Gradio chatbot web interface (demos/chatbots/gradio_chatbot.py)

  • System prompt customization

Location: activities/activity_1_word_problems.md
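
The prompting comparison at the heart of this activity can be sketched in plain Python. The prompt wordings and the chat-message format below are illustrative assumptions, not the demo's exact prompts; the point is the contrast between a direct system prompt and a chain-of-thought variant sent through the same message structure.

```python
# Two system prompts for the same word problem: a direct prompt and a
# chain-of-thought (CoT) variant that asks the model to reason step by step.
DIRECT_PROMPT = "You are a helpful math tutor. Answer with only the final number."

COT_PROMPT = (
    "You are a helpful math tutor. Work through the problem step by step, "
    "showing each intermediate calculation, then state the final answer on "
    "a line starting with 'Answer:'."
)

WORD_PROBLEM = (
    "A train leaves at 9:15 and arrives at 11:45. "
    "How many minutes does the trip take?"
)

def build_messages(system_prompt: str, problem: str) -> list[dict]:
    # Assemble the chat-format message list a chatbot backend typically sends.
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": problem},
    ]
```

Swapping `DIRECT_PROMPT` for `COT_PROMPT` while keeping the user message fixed is the cleanest way to compare approaches, since only one variable changes.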

Activity 2: Text summarization

Build a practical text summarization script that applies various prompting techniques.

Duration: 45-60 minutes

Skills practiced:

  • Text preprocessing and chunking

  • Prompt engineering for summarization

  • Handling long documents

  • Iterative refinement

Prerequisites:

  • Completed Activity 1

  • Python programming experience

  • Understanding of basic file I/O

Location: activities/activity_2_text_summarization.md
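
The chunking step for long documents can be sketched as follows. This is a minimal character-based splitter with overlap, plus a summarization prompt builder; the chunk size, overlap, and prompt wording are assumptions, not the activity's required values.

```python
def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100) -> list[str]:
    # Split text into overlapping character-based chunks so that context
    # spanning a chunk boundary is not lost entirely.
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks

def build_summary_prompt(chunk: str) -> str:
    # One prompt per chunk; chunk summaries can then be summarized again
    # in a second pass (the "map-reduce" pattern for long documents).
    return f"Summarize the following passage in 2-3 sentences:\n\n{chunk}"
```

A word- or sentence-aware splitter gives cleaner boundaries; character-based splitting is shown here only because it is the simplest starting point to iterate from.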

Activity 3: Building LangChain chains

Build practical LangChain applications using prompt templates, output parsers, and chains.

Duration: 60-75 minutes

Skills practiced:

  • Creating reusable prompt templates with variables

  • Structured data extraction with Pydantic schemas

  • JSON parsing with JsonOutputParser

  • Composing multi-step chains

  • Error handling and debugging chains

Prerequisites:

  • Completed Activities 1 and 2

  • Python programming experience

  • Understanding of Lesson 48 (LangChain basics)

  • Familiarity with Demo 5 (LangChain demo)

What you’ll do:

  • Build a template-based translator chain

  • Create a structured book information extractor

  • Compose a multi-step review analysis pipeline

  • Learn debugging and best practices

Location: activities/activity_3_langchain_chains.md
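
The JSON-parsing half of the book extractor can be sketched with the standard library alone. This is not LangChain's `JsonOutputParser`, just a stand-in showing what such a parser must handle: replies wrapped in code fences, plus validation against the keys the extraction prompt asked for. The `title`/`author`/`year` schema is an illustrative assumption.

```python
import json

REQUIRED_KEYS = {"title", "author", "year"}  # illustrative book schema

def parse_book_json(raw: str) -> dict:
    # Parse a model reply into a dict, tolerating ```json fences around
    # the payload, then check that the expected keys are present.
    text = raw.strip()
    if text.startswith("```"):
        # Drop the opening fence (with optional language tag) and the closing fence.
        text = text.split("\n", 1)[1].rsplit("```", 1)[0]
    data = json.loads(text)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```

In the activity itself, a Pydantic schema plus `JsonOutputParser` replaces this hand-rolled version; the manual sketch mainly shows why the parser and the prompt's format instructions must agree.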

Activity 4: Extending the ReAct agent

Enhance the ReAct agent chatbot by adding custom tools and testing multi-step reasoning.

Duration: 45-60 minutes

Skills practiced:

  • Creating LangChain tools with the @tool decorator

  • Understanding agent decision-making and tool selection

  • Debugging tool execution and agent behavior

  • Multi-step problem solving with tool chaining

Prerequisites:

  • Completed Activities 1-3

  • Python programming experience

  • Understanding of Lesson 49 (LangChain advanced - agents)

  • Familiarity with the ReAct agent demo

What you’ll do:

  • Study existing tool implementations

  • Create a new tool (temperature converter, text analyzer, or custom)

  • Register your tool with the agent

  • Test single-tool and multi-tool reasoning

  • Debug and improve your implementation

Location: activities/activity_4_react_agent_tools.md
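
The registration pattern behind the `@tool` decorator can be sketched without LangChain. LangChain's real decorator wraps a function into a tool object carrying a name, description, and argument schema; the deliberately minimal stand-in below only registers the function by name, which is enough to show how an agent maps a chosen tool name to a callable. The temperature-converter tool matches one of the suggested options above.

```python
TOOLS: dict[str, object] = {}

def tool(fn):
    # Minimal stand-in for LangChain's @tool decorator: register the
    # function under its own name; the docstring serves as the tool
    # description an agent would read when deciding which tool to call.
    TOOLS[fn.__name__] = fn
    return fn

@tool
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a temperature from Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32

# An agent picks a tool by name and calls it with arguments parsed
# from the model's output:
result = TOOLS["celsius_to_fahrenheit"](100.0)
```

A precise docstring matters more than the implementation: the agent selects tools based on their descriptions, so vague docstrings are a common cause of wrong tool choices.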

Activity 5: Extending the RAG knowledge system

Add a new document source to the RAG demo by implementing the BaseIngestor interface.

Duration: 45-60 minutes

Skills practiced:

  • Implementing a Python interface (BaseIngestor)

  • Using LangChain document loaders (WebBaseLoader)

  • Chunking documents with RecursiveCharacterTextSplitter

  • Registering new components in a running Gradio application

Prerequisites:

  • Completed Activities 1-4

  • Python programming experience

  • Understanding of Lesson 49 (LangChain advanced - RAG pipeline)

  • Familiarity with the RAG system demo

What you’ll do:

  • Explore the existing WikipediaIngestor and BaseIngestor interface

  • Implement URLIngestor using WebBaseLoader

  • Register it in the demo and verify it appears in the UI

Location: activities/activity_5_rag_sources.md
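
The interface-implementation pattern can be sketched as below. The method names (`source_name`, `ingest`) and the injected `fetch` callable are assumptions for illustration; the demo's actual `BaseIngestor` signature may differ, and the real `URLIngestor` would wrap `WebBaseLoader` and `RecursiveCharacterTextSplitter` rather than the crude slicing used here.

```python
from abc import ABC, abstractmethod

class BaseIngestor(ABC):
    # Sketch of an ingestor interface: every source must name itself
    # and turn a query (here, a URL) into text chunks for embedding.

    @abstractmethod
    def source_name(self) -> str: ...

    @abstractmethod
    def ingest(self, query: str) -> list[str]: ...

class URLIngestor(BaseIngestor):
    def __init__(self, fetch=None):
        # A fetch callable is injected so the sketch stays offline-testable;
        # the activity uses WebBaseLoader to do the actual fetching.
        self._fetch = fetch or (lambda url: "")

    def source_name(self) -> str:
        return "URL"

    def ingest(self, url: str) -> list[str]:
        text = self._fetch(url)
        # Crude fixed-size split standing in for RecursiveCharacterTextSplitter.
        size = 500
        return [text[i:i + size] for i in range(0, len(text), size)]
```

Because the Gradio app discovers sources through the interface, registering the new ingestor should be the only change needed for it to appear in the UI.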

Activity 6: LoRA fine-tuning experiment

Fine-tune a small language model on a custom instruction dataset using LoRA, then compare its behavior against the original base model.

Duration: 60-90 minutes

Skills practiced:

  • Loading and prompting base models directly with transformers

  • Building a small SFT dataset formatted as ChatML

  • Configuring and attaching a LoRA adapter with peft

  • Training with SFTTrainer from trl

  • Comparing base vs. fine-tuned outputs and measuring adapter size

Prerequisites:

  • Completed Activities 1-5

  • Python programming experience

  • Understanding of Lesson 50 (Fine-tuning, RLHF, and model alignment)

  • Familiarity with the fine-tuning demo (demos/finetuning/finetuning_demo.py)

What you’ll do:

  • Run test prompts through the base model to record a baseline

  • Write 10-20 (instruction, response) pairs in a consistent style of your choosing

  • Format the dataset as ChatML and load it with HuggingFace datasets

  • Attach a LoRA adapter (r=8, targeting q_proj/v_proj) and train for ~5 epochs

  • Compare fine-tuned responses to the baseline using the same test prompts

  • Save the adapter, measure its size, and reload it via PeftModel.from_pretrained()

Extension challenges: rank sweep (r=8/16/32), overfitting demonstration (20 epochs), expanding target modules, and merge_and_unload() to export a standalone checkpoint.

Location: activities/activity_6_finetuning.md
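
The ChatML formatting step can be sketched in plain Python. The `<|im_start|>`/`<|im_end|>` markers are the standard ChatML delimiters, but the exact template (e.g., whether a system turn is included) is model-specific, so treat this as an assumed layout to adapt to the demo's tokenizer.

```python
def to_chatml(instruction: str, response: str) -> str:
    # Format one (instruction, response) pair as a ChatML training example.
    return (
        "<|im_start|>user\n"
        f"{instruction}<|im_end|>\n"
        "<|im_start|>assistant\n"
        f"{response}<|im_end|>\n"
    )

# Illustrative pair; the activity asks for 10-20 in a consistent style.
pairs = [
    ("What is LoRA?",
     "LoRA freezes the base weights and trains small low-rank adapter "
     "matrices added to selected layers."),
]

# Records shaped like this load directly with datasets.Dataset.from_list,
# matching the {"text": ...} format SFTTrainer consumes.
dataset = [{"text": to_chatml(i, r)} for i, r in pairs]
```

Keeping every pair in exactly the same format matters: with only 10-20 examples, the model learns the template at least as strongly as the content, and inconsistencies show up directly in the fine-tuned outputs.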

Activity 7: Evaluating LLM outputs

Measure LLM output quality using automated text metrics and an LLM-as-judge rubric, then interpret the results critically.

Duration: 45-60 minutes

Skills practiced:

  • Computing ROUGE-1/2/L, BLEU, and BERTScore with HuggingFace evaluate

  • Interpreting metric scores and understanding their limitations

  • Designing a threshold-based quality gate

  • Writing a structured LLM-as-judge rubric prompt

  • Parsing structured JSON output from an LLM

  • Identifying verbosity bias and prompt sensitivity in judge systems

Prerequisites:

  • Completed Activities 1-6

  • Python programming experience

  • Understanding of Lesson 51 (Benchmarking and evaluating LLMs)

  • Familiarity with the evaluation demo (demos/evaluation/evaluation_demo.py)

What you’ll do:

  • Compute metrics on three text pairs (exact match, paraphrase, factual error) and compare results

  • Build a quality gate function using ROUGE and BERTScore thresholds

  • Implement an LLM-as-judge function that returns rubric scores as parsed JSON

  • Evaluate three candidate answers across factual accuracy, relevance, and completeness

  • Detect verbosity bias by comparing concise vs. padded answers

Location: activities/activity_7_evaluation.md
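
The metric-plus-threshold idea can be sketched with a simplified unigram-overlap score. This is not the ROUGE-1 implementation from HuggingFace `evaluate` (which counts n-gram matches with multiplicities); it is a set-based approximation that is close enough to illustrate the quality-gate pattern, with an assumed threshold of 0.5.

```python
def rouge1_f1(candidate: str, reference: str) -> float:
    # Simplified unigram-overlap F1: precision and recall over the sets
    # of lowercase tokens, combined as a harmonic mean.
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    overlap = len(set(cand) & set(ref))
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def quality_gate(candidate: str, reference: str, threshold: float = 0.5) -> bool:
    # Pass only when the overlap score clears the threshold. Note what
    # this misses: a fluent paraphrase can score low, and a factual
    # error can score high -- exactly the limitation the activity probes.
    return rouge1_f1(candidate, reference) >= threshold
```

Running this on the activity's three text pairs (exact match, paraphrase, factual error) makes the limitation concrete: only the exact match is guaranteed a high score, which is why the activity pairs lexical metrics with BERTScore and an LLM-as-judge rubric.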