Training
Training functions for models.
- image_classification_tools.pytorch.training.train_one_epoch(model, data_loader, criterion, optimizer, device, lazy_loading=False, cyclic_scheduler=None, history=None)[source]
Run one training epoch, tracking metrics per batch.
- Parameters:
model – PyTorch model to train
data_loader – Training data loader
criterion – Loss function
optimizer – Optimizer
device – Device for training (e.g., ‘cuda’ or ‘cpu’)
lazy_loading – If True, move batches to device during training. If False, assumes data is already on device (default: False)
cyclic_scheduler – Optional CyclicLR scheduler to step after each batch
history – Optional history dictionary to record batch-level metrics
- Returns:
Tuple of (average_loss, accuracy_percentage) for the epoch
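A minimal usage sketch of train_one_epoch in a hand-written epoch loop (model, train_loader, criterion, optimizer, and device are assumed to be set up already, as in the examples further below):
from image_classification_tools.pytorch.training import train_one_epoch
# model, train_loader, criterion, optimizer and device assumed defined
for epoch in range(5):
    # Returns (average_loss, accuracy_percentage) for this epoch
    avg_loss, accuracy = train_one_epoch(
        model, train_loader, criterion, optimizer, device,
        lazy_loading=True  # move each batch to the device during the epoch
    )
    print(f"Epoch {epoch}: loss={avg_loss:.4f}, acc={accuracy:.2f}%")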
- image_classification_tools.pytorch.training.evaluate(model, data_loader, criterion, device, lazy_loading=False)[source]
Evaluate model on a dataset.
- Parameters:
model – PyTorch model to evaluate
data_loader – Data loader (validation or test set)
criterion – Loss function
device – Device for evaluation (e.g., ‘cuda’ or ‘cpu’)
lazy_loading – If True, move batches to device during evaluation. If False, assumes data is already on device (default: False)
- Returns:
Tuple of (average_loss, accuracy_percentage) for the dataset
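A minimal sketch of evaluating on a held-out split (assumes model, criterion, and device as above; test_loader is a hypothetical test-set loader):
from image_classification_tools.pytorch.training import evaluate
# Returns (average_loss, accuracy_percentage) over the whole loader
test_loss, test_accuracy = evaluate(
    model, test_loader, criterion, device,
    lazy_loading=True  # move each batch to the device during evaluation
)
print(f"Test loss: {test_loss:.4f}, test accuracy: {test_accuracy:.2f}%")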
- image_classification_tools.pytorch.training.train_model(model, train_loader, val_loader=None, criterion=None, optimizer=None, device=None, lazy_loading=False, cyclic_scheduler=None, epoch_scheduler=None, lr_schedule=None, epochs=10, enable_early_stopping=False, early_stopping_patience=10, print_every=1)[source]
Training loop with optional validation and early stopping.
Tracks metrics at both batch and epoch levels.
- Parameters:
model (Module) – PyTorch model to train
train_loader (DataLoader) – Training data loader
val_loader (DataLoader) – Validation data loader (None for training without validation)
criterion (Module) – Loss function
optimizer (Optimizer) – Optimizer
device (device) – Device for training (e.g., ‘cuda’ or ‘cpu’)
lazy_loading (bool) – If True, move batches to device during training. If False, assumes data is already on device (default: False)
cyclic_scheduler – CyclicLR scheduler (steps per batch)
epoch_scheduler – Epoch-based scheduler like ReduceLROnPlateau (steps per epoch)
lr_schedule (dict) – Optional dict with scheduled LR bounds reduction: {‘initial_base_lr’, ‘initial_max_lr’, ‘final_base_lr’, ‘final_max_lr’, ‘schedule_epochs’}
epochs (int) – Maximum number of epochs
enable_early_stopping (bool) – Whether to enable early stopping (default: False)
early_stopping_patience (int) – Stop if val_loss doesn’t improve for this many epochs (only used if enable_early_stopping=True and val_loader is not None)
print_every (int) – Print progress every N epochs
- Return type:
dict
- Returns:
Dictionary containing training history with epoch and batch-level metrics
Overview
The training module provides utilities for model training with:
Progress tracking and reporting
Training/validation loss and accuracy logging at epoch and batch levels
Configurable print frequency
Device specification with lazy loading or pre-loaded data support
Optional validation (can train without validation data)
Optional early stopping with model checkpoint restoration (disabled by default)
Learning rate scheduler support (cyclic and epoch-based like ReduceLROnPlateau)
Training history returned as a dictionary
Example usage
Basic training:
import torch
from image_classification_tools.pytorch.training import train_model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = create_model().to(device)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
history = train_model(
model=model,
train_loader=train_loader,
val_loader=val_loader,
criterion=criterion,
optimizer=optimizer,
device=device,
lazy_loading=False, # Data already on device
epochs=50,
print_every=5
)
# Or enable early stopping
history = train_model(
model=model,
train_loader=train_loader,
val_loader=val_loader,
criterion=criterion,
optimizer=optimizer,
device=device,
lazy_loading=False,
enable_early_stopping=True,
early_stopping_patience=10,
epochs=50,
print_every=5
)
# Access training history
print(f"Final train accuracy: {history['train_accuracy'][-1]:.2f}%")
print(f"Final val accuracy: {history['val_accuracy'][-1]:.2f}%")
With learning rate schedulers and early stopping:
device = torch.device('cuda')
model = create_model().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# Cyclic learning rate (steps per batch)
cyclic_scheduler = torch.optim.lr_scheduler.CyclicLR(
optimizer, base_lr=0.001, max_lr=0.01, step_size_up=500
)
# Or use epoch-based scheduler like ReduceLROnPlateau
epoch_scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
optimizer, mode='min', factor=0.5, patience=5
)
history = train_model(
model=model,
train_loader=train_loader,
val_loader=val_loader,
criterion=criterion,
optimizer=optimizer,
device=device,
lazy_loading=False,
cyclic_scheduler=cyclic_scheduler, # Or use epoch_scheduler
enable_early_stopping=True, # Enable early stopping (disabled by default)
early_stopping_patience=10,
epochs=100,
print_every=10
)
# Access batch-level metrics
print(f"Total training batches: {len(history['batch_train_loss'])}")
print(f"Learning rate progression: {history['batch_learning_rates'][:10]}")