# Libraries

## HuggingFace Transformers

Direct model loading and inference in Python without an external server.

**Links:**

- [Documentation](https://huggingface.co/docs/transformers)
- [Model Hub](https://huggingface.co/models)
- [GitHub](https://github.com/huggingface/transformers)

### Key classes

| Class | Purpose |
|-------|---------|
| `AutoModelForCausalLM` | Loads a causal language model (next-token prediction) |
| `AutoTokenizer` | Loads the tokenizer for a model |
| `.from_pretrained()` | Downloads and caches a model/tokenizer from the HuggingFace Hub |
| `.apply_chat_template()` | Formats a list of messages into the model's expected prompt format |
| `.generate()` | Performs autoregressive token generation |

### Environment variables

| Variable | Purpose |
|----------|---------|
| `HF_HOME` | Base directory for HuggingFace files (default `~/.cache/huggingface`) |
| `HF_TOKEN` | HuggingFace API token (required for gated models) |

### Example usage

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)

# Load model and tokenizer
model_name = 'google/gemma-2-2b-it'
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map='auto',
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create a chat prompt.
# Note: Gemma's chat template rejects a separate 'system' role, so the
# instruction is folded into the user turn.
messages = [
    {'role': 'user', 'content': 'You are a helpful assistant. Hello!'},
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

# Generate response. The chat template already inserts <bos>, so skip
# adding special tokens again when re-tokenizing.
inputs = tokenizer(
    prompt,
    return_tensors='pt',
    add_special_tokens=False,
).to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,   # temperature is ignored without sampling enabled
    temperature=0.7,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

---

## LangChain

High-level framework for building LLM applications with chains, agents, and RAG pipelines.
**Links:**

- [Documentation](https://python.langchain.com/docs)
- [Ollama integration](https://python.langchain.com/docs/integrations/chat/ollama)
- [Other chat integrations](https://docs.langchain.com/oss/python/integrations/chat)

### Key classes

| Class | Purpose |
|-------|---------|
| `ChatOllama` | LLM client that connects to Ollama |
| `SystemMessage` | Sets the model's behavior/persona |
| `HumanMessage` | A user message |
| `AIMessage` | A model response |

### Example usage

```python
from langchain_ollama import ChatOllama

# Create a client
llm = ChatOllama(
    model='qwen2.5:3b',
    base_url='http://localhost:11434',
    temperature=0.7,
)

# Single request
response = llm.invoke('Tell me a joke')
print(response.content)

# Streaming
for chunk in llm.stream('Count to 10'):
    print(chunk.content, end='', flush=True)
```

### Message types

LangChain uses typed message objects for structured conversations:

```python
from langchain_core.messages import (
    SystemMessage,
    HumanMessage,
    AIMessage,
)

messages = [
    SystemMessage(content='You are a helpful assistant.'),
    HumanMessage(content='Hello!'),
    AIMessage(content='Hi there! How can I help?'),
    HumanMessage(content='What is 2+2?'),
]
response = llm.invoke(messages)
```

---

## Gradio

Build web UIs for LLM chat apps with minimal code.

**Links:**

- [Documentation](https://www.gradio.app/docs)
- [GitHub](https://github.com/gradio-app/gradio)
- [ChatInterface guide](https://www.gradio.app/guides/creating-a-chatbot-fast)

### ChatInterface parameters

| Parameter | Purpose |
|-----------|---------|
| `fn` | Function that takes `(message, history)` and returns a response |
| `type` | `'messages'` = structured history with roles, `'tuples'` = simple pairs |
| `title` | Display title |
| `description` | Optional subtitle |
| `examples` | List of example prompts as buttons |
| `additional_inputs` | Extra UI components (textboxes, sliders, etc.) |

### Example usage

```python
import gradio as gr

def chat(message, history):
    # Your LLM call here
    return f'Echo: {message}'

demo = gr.ChatInterface(
    fn=chat,
    title='My Chatbot',
    type='messages',  # Enable structured history
)
demo.launch()
```

---

## PEFT

Parameter-efficient fine-tuning methods that adapt a pre-trained model by training only a small number of additional parameters, leaving the original weights frozen.

**Links:**

- [Documentation](https://huggingface.co/docs/peft)
- [GitHub](https://github.com/huggingface/peft)

### Key classes

| Class | Purpose |
|-------|---------|
| `LoraConfig` | Defines LoRA hyperparameters: rank, alpha, target modules |
| `get_peft_model()` | Wraps a base model with a PEFT adapter |
| `PeftModel.from_pretrained()` | Loads a saved adapter on top of a base model |
| `.print_trainable_parameters()` | Shows how many parameters are trainable vs. frozen |
| `.save_pretrained()` | Saves only the adapter weights (a few MB, not the full model) |
| `.merge_and_unload()` | Merges the adapter into the base weights and returns a plain model |

### Example usage

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType, PeftModel

# Load the base model
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B",
    torch_dtype=torch.float16,
)

# Configure LoRA
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,               # rank
    lora_alpha=16,     # scaling factor
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
)

# Attach the adapter - base model weights are frozen automatically
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()
# trainable params: 786,432 || all params: 494,476,288 || trainable%: 0.1590

# Save only the small adapter
peft_model.save_pretrained("./my_adapter")

# Later: load the adapter on top of the base model
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B",
    torch_dtype=torch.float16,
)
loaded = PeftModel.from_pretrained(base, "./my_adapter")
```

---

## TRL

Transformer Reinforcement Learning - a library built on top of HuggingFace Transformers and PEFT that provides trainers for SFT, DPO, PPO, and other alignment methods.

**Links:**

- [Documentation](https://huggingface.co/docs/trl)
- [GitHub](https://github.com/huggingface/trl)

### Key classes

| Class | Purpose |
|-------|---------|
| `SFTConfig` | Training hyperparameters for supervised fine-tuning |
| `SFTTrainer` | Trainer subclass for SFT; handles chat template formatting, packing, and PEFT integration |

### Example usage

```python
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# Dataset must have a `text` field with fully-formatted conversations
examples = [
    {"text": "<|im_start|>system\nYou are helpful.<|im_end|>\n<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\nHi!<|im_end|>\n"},
    # ... more examples
]
dataset = Dataset.from_list(examples)

training_args = SFTConfig(
    output_dir="./output",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    fp16=True,
    max_seq_length=256,
    report_to="none",
)

trainer = SFTTrainer(
    model=peft_model,  # a PEFT-wrapped model from get_peft_model()
    train_dataset=dataset,
    args=training_args,
)
trainer.train()
```

---

## HuggingFace evaluate

Unified API for computing text generation metrics (ROUGE, BLEU, BERTScore, and many others) with a consistent interface across all metric families.
**Links:**

- [Documentation](https://huggingface.co/docs/evaluate)
- [GitHub](https://github.com/huggingface/evaluate)
- [Available metrics](https://huggingface.co/evaluate-metric)

### Key functions

| Function | Purpose |
|----------|---------|
| `evaluate.load(name)` | Load a metric by name (downloads and caches on first use) |
| `.compute(predictions, references)` | Compute the metric for a batch of predictions |

### Example usage

```python
import evaluate

# ROUGE (summarization / text generation)
rouge = evaluate.load("rouge")
result = rouge.compute(
    predictions=["The cat sat on the mat."],
    references=["A cat was sitting on a mat."],
    use_stemmer=True,
)
# result: dict of F1 scores keyed by 'rouge1', 'rouge2', 'rougeL', 'rougeLsum'

# BLEU (translation / n-gram precision)
bleu = evaluate.load("bleu")
result = bleu.compute(
    predictions=["the cat sat on the mat"],       # plain strings; tokenised internally
    references=[["a cat was sitting on a mat"]],  # one list of references per prediction
)
# result: {'bleu': 0.0, ...} - no shared bigrams, so BLEU collapses to zero

# BERTScore (semantic similarity via contextual embeddings)
bertscore = evaluate.load("bertscore")
result = bertscore.compute(
    predictions=["The cat sat on the mat."],
    references=["A cat was sitting on a mat."],
    lang="en",
    model_type="distilbert-base-uncased",
)
# result: {'precision': [...], 'recall': [...], 'f1': [...], ...} - one score per pair
```

---

## bert-score

Underlying library for BERTScore — a contextual embedding similarity metric based on BERT-family models. Used automatically via the `evaluate` `"bertscore"` metric but can also be called directly.

**Links:**

- [Paper](https://arxiv.org/abs/1904.09675)
- [GitHub](https://github.com/Tiiiger/bert_score)

### When to use BERTScore

BERTScore complements ROUGE and BLEU by measuring *semantic similarity* rather than surface-form overlap. It excels at:

- Detecting paraphrases that share meaning but few exact words
- Penalising responses that change key entities (e.g. dates, names) — to a degree
- Evaluating open-ended generation where many valid wordings exist

It does **not** reliably detect subtle factual errors (e.g. swapping "1887" for "1899").

```python
# Direct usage (also available via evaluate.load("bertscore"))
from bert_score import score

P, R, F1 = score(
    cands=["Paris's iconic iron tower was built for a World's Fair."],
    refs=["The Eiffel Tower was built for the 1889 World's Fair in Paris."],
    lang="en",
    model_type="distilbert-base-uncased",
    verbose=True,
)
print(f"BERTScore F1: {F1.mean():.3f}")  # ~0.91 for this paraphrase
```