Libraries

HuggingFace Transformers

Direct model loading and inference in Python without an external server.

Key classes

| Class | Purpose |
| --- | --- |
| AutoModelForCausalLM | Loads a causal language model (next-token prediction) |
| AutoTokenizer | Loads the tokenizer for a model |
| .from_pretrained() | Downloads and caches a model/tokenizer from the HuggingFace Hub |
| .apply_chat_template() | Formats a list of messages into the model’s expected prompt format |
| .generate() | Performs autoregressive token generation |

Environment variables

| Variable | Purpose |
| --- | --- |
| HF_HOME | Base directory for HuggingFace files (default ~/.cache/huggingface) |
| HF_TOKEN | HuggingFace API token (required for gated models) |
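HF_HOME is read when the HuggingFace caching layer initialises, so it must be set before the first `transformers`/`huggingface_hub` import. A minimal sketch (the cache path and token are placeholders, not real values):

```python
import os

# Set BEFORE importing transformers/huggingface_hub, or the default cache is used
os.environ.setdefault("HF_HOME", os.path.expanduser("~/hf-cache"))
os.environ.setdefault("HF_TOKEN", "hf_...")  # placeholder; use a real token for gated models

print(os.environ["HF_HOME"])
```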

Example usage

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)

# Load model and tokenizer
model_name = 'google/gemma-2-2b-it'
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map='auto',
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create a chat prompt
messages = [
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': 'Hello!'},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Generate response
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,    # sampling must be enabled for temperature to take effect
    temperature=0.7,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
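Under the hood, .generate() repeatedly picks a next token and appends it to the context until an EOS token or the length limit is reached. A toy greedy-decoding sketch; the stand-in "model" below is hypothetical, not a real network:

```python
def toy_next_token(tokens):
    # Stand-in for the model: a real LM would take argmax over next-token logits
    return sum(tokens) % 5

def greedy_generate(prompt_tokens, max_new_tokens, eos_id=0):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_id = toy_next_token(tokens)
        tokens.append(next_id)
        if next_id == eos_id:  # stop early on end-of-sequence
            break
    return tokens

print(greedy_generate([1, 2], max_new_tokens=4))  # [1, 2, 3, 1, 2, 4]
```

Sampling strategies (temperature, top-p, etc.) only change how the next token is picked; the loop structure is the same.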

LangChain

High-level framework for building LLM applications with chains, agents, and RAG pipelines.

Key classes

| Class | Purpose |
| --- | --- |
| ChatOllama | LLM client that connects to a running Ollama server |
| SystemMessage | Sets the model’s behavior/persona |
| HumanMessage | A user message |
| AIMessage | A model response |

Example usage

from langchain_ollama import ChatOllama

# Create a client
llm = ChatOllama(
    model='qwen2.5:3b',
    base_url='http://localhost:11434',
    temperature=0.7,
)

# Single request
response = llm.invoke('Tell me a joke')
print(response.content)

# Streaming
for chunk in llm.stream('Count to 10'):
    print(chunk.content, end='', flush=True)

Message types

LangChain uses typed message objects for structured conversations:

from langchain_core.messages import (
    SystemMessage,
    HumanMessage,
    AIMessage,
)

messages = [
    SystemMessage(content='You are a helpful assistant.'),
    HumanMessage(content='Hello!'),
    AIMessage(content='Hi there! How can I help?'),
    HumanMessage(content='What is 2+2?'),
]

response = llm.invoke(messages)

Gradio

Build web UIs for LLM chat apps with minimal code.

ChatInterface parameters

| Parameter | Purpose |
| --- | --- |
| fn | Function that takes (message, history) and returns a response |
| type | 'messages' = structured history with roles; 'tuples' = simple pairs |
| title | Display title |
| description | Optional subtitle |
| examples | List of example prompts shown as buttons |
| additional_inputs | Extra UI components (textboxes, sliders, etc.) |

Example usage

import gradio as gr

def chat(message, history):
    # Your LLM call here
    return f'Echo: {message}'

demo = gr.ChatInterface(
    fn=chat,
    title='My Chatbot',
    type='messages',  # Enable structured history
)

demo.launch()
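With type='messages', the history passed to fn is a list of {'role': ..., 'content': ...} dicts (OpenAI-style). A self-contained sketch of a handler inspecting it; the handler logic is illustrative and no server is launched:

```python
def chat(message, history):
    # history holds prior turns: [{'role': 'user'|'assistant', 'content': str}, ...]
    n_user = sum(1 for turn in history if turn['role'] == 'user')
    return f'Message #{n_user + 1} received: {message}'

history = [
    {'role': 'user', 'content': 'Hi'},
    {'role': 'assistant', 'content': 'Hello!'},
]
print(chat('What is 2+2?', history))  # Message #2 received: What is 2+2?
```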

PEFT

Parameter-efficient fine-tuning methods that adapt a pre-trained model by training only a small number of additional parameters, leaving the original weights frozen.

Key classes

| Class | Purpose |
| --- | --- |
| LoraConfig | Defines LoRA hyperparameters: rank, alpha, target modules |
| get_peft_model() | Wraps a base model with a PEFT adapter |
| PeftModel.from_pretrained() | Loads a saved adapter on top of a base model |
| .print_trainable_parameters() | Shows how many parameters are trainable vs. frozen |
| .save_pretrained() | Saves only the adapter weights (a few MB, not the full model) |
| .merge_and_unload() | Merges the adapter into the base weights and returns a plain model |

Example usage

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType, PeftModel

# Load the base model
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B", torch_dtype=torch.float16)

# Configure LoRA
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                              # rank
    lora_alpha=16,                    # scaling factor
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
)

# Attach the adapter - base model weights are frozen automatically
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()
# trainable params: 786,432 || all params: 494,476,288 || trainable%: 0.1590

# Save only the small adapter
peft_model.save_pretrained("./my_adapter")

# Later: load adapter on top of the base model
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B", torch_dtype=torch.float16)
loaded = PeftModel.from_pretrained(base, "./my_adapter")
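The parameter savings follow directly from the low-rank factorisation: instead of training a full d_out x d_in matrix, LoRA trains B (d_out x r) and A (r x d_in) and adds (alpha/r) * B @ A to the frozen weight. A numpy sketch with illustrative dimensions (not the actual Qwen2.5 shapes):

```python
import numpy as np

d_in, d_out, r, alpha = 896, 896, 8, 16   # illustrative dimensions
W = np.random.randn(d_out, d_in)          # frozen base weight
A = np.random.randn(r, d_in) * 0.01       # trainable, small random init
B = np.zeros((d_out, r))                  # trainable, zero init => update starts at 0

x = np.random.randn(d_in)
y = W @ x + (alpha / r) * (B @ (A @ x))   # LoRA forward: frozen path + low-rank path

print(r * (d_in + d_out))                 # trainable params per adapted matrix: 14336
print(d_in * d_out)                       # frozen params it adapts: 802816
```

Because B starts at zero, the adapter contributes nothing at initialisation, so training begins from the base model's behavior.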

TRL

Transformer Reinforcement Learning: a library built on top of HuggingFace Transformers and PEFT that provides trainers for SFT, DPO, PPO, and other alignment methods.

Key classes

| Class | Purpose |
| --- | --- |
| SFTConfig | Training hyperparameters for supervised fine-tuning |
| SFTTrainer | Trainer subclass for SFT; handles chat template formatting, packing, and PEFT integration |

Example usage

from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Dataset must have a `text` field with fully-formatted conversations
examples = [
    {"text": "<|im_start|>system\nYou are helpful.<|im_end|>\n<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\nHi!<|im_end|>\n"},
    # ... more examples
]
dataset = Dataset.from_list(examples)

training_args = SFTConfig(
    output_dir="./output",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    fp16=True,
    max_seq_length=256,
    report_to="none",
)

trainer = SFTTrainer(
    model=peft_model,        # a PEFT-wrapped model from get_peft_model()
    train_dataset=dataset,
    args=training_args,
)

trainer.train()
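The ChatML strings in the dataset above need not be hand-written; in practice you would build them with tokenizer.apply_chat_template(messages, tokenize=False). A dependency-free sketch of the same formatting (to_chatml is a hypothetical helper mirroring Qwen-style templates):

```python
def to_chatml(messages):
    # Mirrors the ChatML layout produced by Qwen-style chat templates
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi!"},
]
text = to_chatml(messages)
print(text)
```

Using the tokenizer's own template is safer for real training, since each model family uses different control tokens.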

HuggingFace evaluate

Unified API for computing text generation metrics (ROUGE, BLEU, BERTScore, and many others) with a consistent interface across all metric families.

Key functions

| Function | Purpose |
| --- | --- |
| evaluate.load(name) | Loads a metric by name (downloads and caches on first use) |
| .compute(predictions, references) | Computes the metric for a batch of predictions |

Example usage

import evaluate

# ROUGE (summarization / text generation)
rouge = evaluate.load("rouge")
result = rouge.compute(
    predictions=["The cat sat on the mat."],
    references=["A cat was sitting on a mat."],
    use_stemmer=True,
)
# result: {'rouge1': 0.727, 'rouge2': 0.444, 'rougeL': 0.727, 'rougeLsum': 0.727}

# BLEU (translation / n-gram precision)
bleu = evaluate.load("bleu")
result = bleu.compute(
    predictions=["the cat sat on the mat"],          # plain strings; tokenised internally
    references=[["a cat was sitting on the mat"]],   # one or more references per prediction
)
# result: {'bleu': ..., 'precisions': [...], 'brevity_penalty': ..., ...}

# BERTScore (semantic similarity via contextual embeddings)
bertscore = evaluate.load("bertscore")
result = bertscore.compute(
    predictions=["The cat sat on the mat."],
    references=["A cat was sitting on a mat."],
    lang="en",
    model_type="distilbert-base-uncased",
)
# result: {'precision': [0.96], 'recall': [0.95], 'f1': [0.95], ...}
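ROUGE-1 is essentially a unigram-overlap F1. A simplified re-implementation without stemming (so the score differs from the stemmed library result above); the tokenisation here is a crude whitespace split, not the library's:

```python
from collections import Counter

def rouge1_f1(prediction, reference):
    pred = Counter(prediction.lower().replace(".", "").split())
    ref = Counter(reference.lower().replace(".", "").split())
    overlap = sum((pred & ref).values())   # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(round(rouge1_f1("The cat sat on the mat.", "A cat was sitting on a mat."), 3))  # 0.462
```

Here only "cat", "on", and "mat" overlap (3 of 6 predicted and 3 of 7 reference unigrams); stemming raises the library's score by also matching inflected forms.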

bert-score

Underlying library for BERTScore — contextual embedding similarity metric based on BERT-family models. Used automatically via the evaluate "bertscore" metric but can also be called directly.

When to use BERTScore

BERTScore complements ROUGE and BLEU by measuring semantic similarity rather than surface-form overlap. It excels at:

  • Detecting paraphrases that share meaning but few exact words

  • Partially penalising responses that change key entities (e.g. dates, names)

  • Evaluating open-ended generation where many valid wordings exist

It does not reliably detect subtle factual errors (e.g. swapping “1887” for “1899”).

# Direct usage (also available via evaluate.load("bertscore"))
from bert_score import score

P, R, F1 = score(
    cands=["Paris's iconic iron tower was built for a World's Fair."],
    refs=["The Eiffel Tower was built for the 1889 World's Fair in Paris."],
    lang="en",
    model_type="distilbert-base-uncased",
    verbose=True,
)
print(f"BERTScore F1: {F1.mean():.3f}")  # ~0.91 for this paraphrase