Training

Training functions for models.

image_classification_tools.pytorch.training.train_one_epoch(model, data_loader, criterion, optimizer, device, lazy_loading=False, cyclic_scheduler=None, history=None)

Run one training epoch, tracking metrics per batch.

Parameters:
  • model – PyTorch model to train

  • data_loader – Training data loader

  • criterion – Loss function

  • optimizer – Optimizer

  • device – Device for training (e.g., ‘cuda’ or ‘cpu’)

  • lazy_loading – If True, move each batch to the device during training; if False, assume the data is already on the device (default: False)

  • cyclic_scheduler – Optional CyclicLR scheduler to step after each batch

  • history – Optional history dictionary to record batch-level metrics

Returns:

Tuple of (average_loss, accuracy_percentage) for the epoch
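
Calling train_one_epoch directly in a custom loop; a minimal sketch, assuming train_loader and the create_model() placeholder from the examples below, and assuming that a plain dict passed as history is populated by the function itself:

import torch
from image_classification_tools.pytorch.training import train_one_epoch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = create_model().to(device)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

history = {}  # assumed: batch-level metrics are recorded here when provided
for epoch in range(10):
    avg_loss, accuracy = train_one_epoch(
        model, train_loader, criterion, optimizer, device,
        lazy_loading=True,  # batches live on the CPU; move each one as it is used
        history=history,
    )
    print(f"Epoch {epoch + 1}: loss={avg_loss:.4f}, accuracy={accuracy:.2f}%")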

image_classification_tools.pytorch.training.evaluate(model, data_loader, criterion, device, lazy_loading=False)

Evaluate model on a dataset.

Parameters:
  • model – PyTorch model to evaluate

  • data_loader – Data loader (validation or test set)

  • criterion – Loss function

  • device – Device for evaluation (e.g., ‘cuda’ or ‘cpu’)

  • lazy_loading – If True, move each batch to the device during evaluation; if False, assume the data is already on the device (default: False)

Returns:

Tuple of (average_loss, accuracy_percentage) for the dataset
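
Evaluating a trained model on a held-out test set; a minimal sketch, assuming test_loader and the model, criterion, and device from the surrounding examples:

from image_classification_tools.pytorch.training import evaluate

test_loss, test_accuracy = evaluate(
    model, test_loader, criterion, device,
    lazy_loading=True,  # move each test batch to the device on the fly
)
print(f"Test loss: {test_loss:.4f}, test accuracy: {test_accuracy:.2f}%")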

image_classification_tools.pytorch.training.train_model(model, train_loader, val_loader=None, criterion=None, optimizer=None, device=None, lazy_loading=False, cyclic_scheduler=None, epoch_scheduler=None, lr_schedule=None, epochs=10, enable_early_stopping=False, early_stopping_patience=10, print_every=1)

Training loop with optional validation and early stopping.

Tracks metrics at both batch and epoch levels.

Parameters:
  • model (Module) – PyTorch model to train

  • train_loader (DataLoader) – Training data loader

  • val_loader (DataLoader) – Validation data loader (pass None to train without validation)

  • criterion (Module) – Loss function

  • optimizer (Optimizer) – Optimizer

  • device (device) – Device for training (e.g., ‘cuda’ or ‘cpu’)

  • lazy_loading (bool) – If True, move each batch to the device during training; if False, assume the data is already on the device (default: False)

  • cyclic_scheduler – CyclicLR scheduler (steps per batch)

  • epoch_scheduler – Epoch-based scheduler like ReduceLROnPlateau (steps per epoch)

  • lr_schedule (dict) – Optional dict with scheduled LR bounds reduction: {'initial_base_lr', 'initial_max_lr', 'final_base_lr', 'final_max_lr', 'schedule_epochs'} (see the sketch below)

  • epochs (int) – Maximum number of epochs

  • enable_early_stopping (bool) – Whether to enable early stopping (default: False)

  • early_stopping_patience (int) – Stop if val_loss doesn’t improve for this many epochs (only used if enable_early_stopping=True and val_loader is not None)

  • print_every (int) – Print progress every N epochs

Return type:

dict[str, list[float]]

Returns:

Dictionary containing training history with epoch and batch-level metrics
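
The lr_schedule parameter is a plain dict with the keys listed above; a minimal sketch with illustrative values (how the bounds are interpolated between the initial and final values over schedule_epochs is left to the function and not assumed here):

lr_schedule = {
    'initial_base_lr': 1e-3,   # illustrative: CyclicLR lower bound at the start
    'initial_max_lr': 1e-2,    # illustrative: CyclicLR upper bound at the start
    'final_base_lr': 1e-5,     # illustrative: lower bound after the schedule ends
    'final_max_lr': 1e-4,      # illustrative: upper bound after the schedule ends
    'schedule_epochs': 50,     # illustrative: epochs over which the bounds shrink
}

Since the keys mirror CyclicLR's base_lr/max_lr bounds, it is presumably passed alongside a cyclic scheduler, e.g. train_model(..., cyclic_scheduler=cyclic_scheduler, lr_schedule=lr_schedule).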

Overview

The training module provides utilities for model training with:

  • Progress tracking and reporting

  • Training/validation loss and accuracy logging at epoch and batch levels

  • Configurable print frequency

  • Device specification with lazy loading or pre-loaded data support

  • Optional validation (can train without validation data)

  • Optional early stopping with model checkpoint restoration (disabled by default)

  • Learning rate scheduler support (cyclic and epoch-based like ReduceLROnPlateau)

  • Training history returned as a dictionary

Example usage

Basic training:

import torch
from image_classification_tools.pytorch.training import train_model

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = create_model().to(device)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

history = train_model(
    model=model,
    train_loader=train_loader,
    val_loader=val_loader,
    criterion=criterion,
    optimizer=optimizer,
    device=device,
    lazy_loading=False,  # Data already on device
    epochs=50,
    print_every=5
)

# Or enable early stopping
history = train_model(
    model=model,
    train_loader=train_loader,
    val_loader=val_loader,
    criterion=criterion,
    optimizer=optimizer,
    device=device,
    lazy_loading=False,
    enable_early_stopping=True,
    early_stopping_patience=10,
    epochs=50,
    print_every=5
)

# Access training history
print(f"Final train accuracy: {history['train_accuracy'][-1]:.2f}%")
print(f"Final val accuracy: {history['val_accuracy'][-1]:.2f}%")

With learning rate schedulers and early stopping:

device = torch.device('cuda')
model = create_model().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = torch.nn.CrossEntropyLoss()

# Cyclic learning rate (steps per batch)
cyclic_scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=0.001, max_lr=0.01, step_size_up=500
)

# Or use epoch-based scheduler like ReduceLROnPlateau
epoch_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=5
)

history = train_model(
    model=model,
    train_loader=train_loader,
    val_loader=val_loader,
    criterion=criterion,
    optimizer=optimizer,
    device=device,
    lazy_loading=False,
    cyclic_scheduler=cyclic_scheduler,  # Or pass epoch_scheduler=epoch_scheduler instead
    enable_early_stopping=True,  # Enable early stopping (disabled by default)
    early_stopping_patience=10,
    epochs=100,
    print_every=10
)

# Access batch-level metrics
print(f"Total training batches: {len(history['batch_train_loss'])}")
print(f"Learning rate progression: {history['batch_learning_rates'][:10]}")