# Libraries

## HuggingFace Transformers

Direct model loading and inference in Python, without an external server.
### Key classes

| Class | Purpose |
|---|---|
| `AutoModelForCausalLM` | Loads a causal language model (next-token prediction) |
| `AutoTokenizer` | Loads the tokenizer for a model |
| `from_pretrained()` | Downloads and caches a model/tokenizer from the HuggingFace Hub |
| `apply_chat_template()` | Formats a list of messages into the model's expected prompt format |
| `generate()` | Performs autoregressive token generation |
### Environment variables

| Variable | Purpose |
|---|---|
| `HF_HOME` | Base directory for HuggingFace files (default: `~/.cache/huggingface`) |
| `HF_TOKEN` | HuggingFace API token (required for gated models) |
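These can also be set from Python, as long as it happens before `transformers`/`huggingface_hub` are imported. A minimal sketch; the cache path and token below are illustrative placeholders:

```python
import os

# Set before importing transformers/huggingface_hub:
# HF_HOME is read when the library is first imported.
os.environ['HF_HOME'] = '/data/hf-cache'  # illustrative path
os.environ['HF_TOKEN'] = 'hf_...'         # your HuggingFace token (placeholder)
```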
### Example usage

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)

# Load model and tokenizer
# (an instruct model whose chat template accepts a system role;
# some templates, e.g. Gemma's, reject 'system' messages)
model_name = 'Qwen/Qwen2.5-0.5B-Instruct'
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map='auto',
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create a chat prompt
messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': 'Hello!'},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Generate a response
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,   # required for temperature to take effect
    temperature=0.7,
)
# Slice off the prompt tokens so only the new text is decoded
new_tokens = outputs[0][inputs['input_ids'].shape[1]:]
response = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(response)
```
## LangChain

High-level framework for building LLM applications with chains, agents, and RAG pipelines.
### Key classes

| Class | Purpose |
|---|---|
| `ChatOllama` | LLM client that connects to a local Ollama server |
| `SystemMessage` | Sets the model's behavior/persona |
| `HumanMessage` | A user message |
| `AIMessage` | A model response |
### Example usage

```python
from langchain_ollama import ChatOllama

# Create a client
llm = ChatOllama(
    model='qwen2.5:3b',
    base_url='http://localhost:11434',
    temperature=0.7,
)

# Single request
response = llm.invoke('Tell me a joke')
print(response.content)

# Streaming
for chunk in llm.stream('Count to 10'):
    print(chunk.content, end='', flush=True)
```
### Message types

LangChain uses typed message objects for structured conversations:

```python
from langchain_core.messages import (
    SystemMessage,
    HumanMessage,
    AIMessage,
)

messages = [
    SystemMessage(content='You are a helpful assistant.'),
    HumanMessage(content='Hello!'),
    AIMessage(content='Hi there! How can I help?'),
    HumanMessage(content='What is 2+2?'),
]
response = llm.invoke(messages)
```
## Gradio

Build web UIs for LLM chat apps with minimal code.
### ChatInterface parameters

| Parameter | Purpose |
|---|---|
| `fn` | Chat function that takes `(message, history)` and returns a response |
| `type` | Set to `'messages'` to use structured role/content chat history |
| `title` | Display title |
| `description` | Optional subtitle |
| `examples` | List of example prompts shown as buttons |
| `additional_inputs` | Extra UI components (textboxes, sliders, etc.) |
### Example usage

```python
import gradio as gr

def chat(message, history):
    # Your LLM call here
    return f'Echo: {message}'

demo = gr.ChatInterface(
    fn=chat,
    title='My Chatbot',
    type='messages',  # Enable structured history
)
demo.launch()
```
## PEFT

Parameter-efficient fine-tuning methods that adapt a pre-trained model by training only a small number of additional parameters, leaving the original weights frozen.
### Key classes

| Class | Purpose |
|---|---|
| `LoraConfig` | Defines LoRA hyperparameters: rank, alpha, target modules |
| `get_peft_model()` | Wraps a base model with a PEFT adapter |
| `PeftModel.from_pretrained()` | Loads a saved adapter on top of a base model |
| `print_trainable_parameters()` | Shows how many parameters are trainable vs. frozen |
| `save_pretrained()` | Saves only the adapter weights (a few MB, not the full model) |
| `merge_and_unload()` | Merges the adapter into the base weights and returns a plain model |
### Example usage

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, PeftModel, TaskType, get_peft_model

# Load the base model
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B", dtype=torch.float16)

# Configure LoRA
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,             # rank
    lora_alpha=16,   # scaling factor
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
)

# Attach the adapter - base model weights are frozen automatically
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()
# trainable params: 786,432 || all params: 494,476,288 || trainable%: 0.1590

# Save only the small adapter
peft_model.save_pretrained("./my_adapter")

# Later: load the adapter on top of the base model
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B", dtype=torch.float16)
loaded = PeftModel.from_pretrained(base, "./my_adapter")
```
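What `merge_and_unload()` does numerically: LoRA learns a low-rank update `ΔW = (alpha/r) · B @ A`, and merging folds that update into the frozen weight so inference needs no extra matmul. A minimal sketch with toy tensors (shapes are illustrative, not tied to any real model):

```python
import torch

out_dim, in_dim, r, alpha = 6, 4, 2, 16

W = torch.randn(out_dim, in_dim)  # frozen base weight
A = torch.randn(r, in_dim)        # LoRA "down" projection
B = torch.zeros(out_dim, r)       # LoRA "up" projection (zero-initialised)
B[0, 0] = 1.0                     # pretend training moved B off zero

scaling = alpha / r
x = torch.randn(in_dim)

# Adapter-style forward pass: base path plus low-rank path
y_adapter = W @ x + scaling * (B @ (A @ x))

# Merged weight, as merging computes it
W_merged = W + scaling * (B @ A)
y_merged = W_merged @ x

print(torch.allclose(y_adapter, y_merged, atol=1e-5))  # True
```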
## TRL

Transformer Reinforcement Learning: a library built on top of HuggingFace Transformers and PEFT that provides trainers for SFT, DPO, PPO, and other alignment methods.
### Key classes

| Class | Purpose |
|---|---|
| `SFTConfig` | Training hyperparameters for supervised fine-tuning |
| `SFTTrainer` | Trainer subclass for SFT; handles chat template formatting, packing, and PEFT integration |
### Example usage

```python
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Dataset must have a `text` field with fully-formatted conversations
examples = [
    {"text": "<|im_start|>system\nYou are helpful.<|im_end|>\n<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\nHi!<|im_end|>\n"},
    # ... more examples
]
dataset = Dataset.from_list(examples)

training_args = SFTConfig(
    output_dir="./output",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    fp16=True,
    max_seq_length=256,
    report_to="none",
)
trainer = SFTTrainer(
    model=peft_model,  # a PEFT-wrapped model from get_peft_model()
    train_dataset=dataset,
    args=training_args,
)
trainer.train()
```
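The `text` field above is in ChatML format, used by Qwen-family models. In practice you would produce it with `tokenizer.apply_chat_template(messages, tokenize=False)`; as a minimal hand-rolled sketch of what that formatting looks like for a ChatML-style template:

```python
def to_chatml(messages):
    """Format role/content dicts as a ChatML string (as Qwen-style templates do)."""
    return ''.join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    )

messages = [
    {'role': 'system', 'content': 'You are helpful.'},
    {'role': 'user', 'content': 'Hello'},
    {'role': 'assistant', 'content': 'Hi!'},
]
print(to_chatml(messages))
```

This reproduces the `text` string in the dataset example above; for real training data, prefer the tokenizer's own chat template so formatting always matches the model.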
## HuggingFace evaluate

Unified API for computing text-generation metrics (ROUGE, BLEU, BERTScore, and many others) with a consistent interface across metric families.
### Key functions

| Function | Purpose |
|---|---|
| `evaluate.load()` | Loads a metric by name (downloads and caches it on first use) |
| `.compute()` | Computes the metric for a batch of predictions |
### Example usage

```python
import evaluate

# ROUGE (summarization / text generation)
rouge = evaluate.load("rouge")
result = rouge.compute(
    predictions=["The cat sat on the mat."],
    references=["A cat was sitting on a mat."],
    use_stemmer=True,
)
# result: {'rouge1': 0.727, 'rouge2': 0.444, 'rougeL': 0.727, 'rougeLsum': 0.727}

# BLEU (translation / n-gram precision)
bleu = evaluate.load("bleu")
result = bleu.compute(
    predictions=["the cat sat on mat"],          # plain strings; tokenised internally
    references=[["a cat was sitting on mat"]],   # one list of references per prediction
)
# result: {'bleu': ..., 'precisions': [...], ...}

# BERTScore (semantic similarity via contextual embeddings)
bertscore = evaluate.load("bertscore")
result = bertscore.compute(
    predictions=["The cat sat on the mat."],
    references=["A cat was sitting on a mat."],
    lang="en",
    model_type="distilbert-base-uncased",
)
# result: {'precision': [0.96], 'recall': [0.95], 'f1': [0.95], ...}
```
## bert-score

Underlying library for BERTScore, a contextual-embedding similarity metric based on BERT-family models. Used automatically via the evaluate `"bertscore"` metric, but it can also be called directly.
### When to use BERTScore

BERTScore complements ROUGE and BLEU by measuring semantic similarity rather than surface-form overlap. It excels at:

- Detecting paraphrases that share meaning but few exact words
- Penalising responses that change key entities (e.g. dates, names), at least to a degree
- Evaluating open-ended generation where many valid wordings exist

It does not reliably detect subtle factual errors (e.g. swapping "1887" for "1899").
```python
# Direct usage (also available via evaluate.load("bertscore"))
from bert_score import score

P, R, F1 = score(
    cands=["Paris's iconic iron tower was built for a World's Fair."],
    refs=["The Eiffel Tower was built for the 1889 World's Fair in Paris."],
    lang="en",
    model_type="distilbert-base-uncased",
    verbose=True,
)
print(f"BERTScore F1: {F1.mean():.3f}")  # ~0.91 for this paraphrase
```