# Hyperparameter optimization

## Overview
The hyperparameter optimization module provides Optuna integration for automated CNN architecture search and hyperparameter tuning. It includes:
- Dynamic CNN architecture with configurable depth, width, and components
- Flexible search spaces defined via dictionaries
- Automatic trial pruning for faster optimization
- Error handling for OOM and architecture mismatches
## Key components

### `create_cnn`
Creates a CNN with dynamic architecture based on hyperparameters:
- Variable number of convolutional blocks (1-10 for 32x32 inputs)
- Filters double every 2 blocks (slower growth for deeper networks)
- Conditional max pooling every 2 blocks (enables deeper architectures)
- Dynamic fully-connected layers with halving sizes
- Separate dropout for conv and FC layers
- Batch normalization after each conv layer
- Adaptive average pooling before the classifier
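As a hedged illustration of the filter-growth and pooling rules above (this is not the module's code; `conv_schedule` is a made-up helper), the per-block filter count and spatial size for a 32x32 input can be tabulated like this:

```python
# Illustrative sketch of the schedule create_cnn is described as following:
# filters double every 2 blocks, and a 2x2 max pool is applied every 2 blocks.

def conv_schedule(n_conv_blocks, initial_filters, input_size=32):
    """Return (out_filters, spatial_size, pooled) for each conv block."""
    schedule = []
    size = input_size
    for block in range(n_conv_blocks):
        # Filters double every 2 blocks: blocks 0-1 -> f, blocks 2-3 -> 2f, ...
        filters = initial_filters * (2 ** (block // 2))
        # Pool after every second block (blocks 1, 3, 5, ...)
        pooled = (block % 2 == 1)
        if pooled:
            size //= 2
        schedule.append((filters, size, pooled))
    return schedule

for filters, size, pooled in conv_schedule(6, 32):
    print(filters, size, pooled)
```

With 6 blocks and 32 initial filters this yields 32, 64, then 128 filters, while the 32x32 input shrinks only to 4x4, leaving room for deeper stacks.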
### `create_objective`
Factory function that creates an Optuna objective for hyperparameter search:
- Accepts a configurable search space dictionary
- Creates data loaders per trial with the suggested batch size
- Handles dimension collapse errors gracefully (returns a score of 0.0)
- Handles CUDA out-of-memory errors gracefully (returns a score of 0.0)
- Supports MedianPruner for early stopping
- Records failure reasons as trial attributes for debugging
### `train_trial`
Trains a model for a single Optuna trial with pruning support.
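A minimal sketch of the pruning cooperation described above, assuming a generic per-epoch train/evaluate pair (`train_one_epoch` and `evaluate` are hypothetical helpers); the real `train_trial` signature may differ:

```python
# Report the intermediate score each epoch and stop early when the pruner
# decides the trial is unpromising.
try:
    import optuna
    PrunedError = optuna.TrialPruned
except ImportError:  # keep the sketch importable without Optuna installed
    class PrunedError(Exception):
        pass

def train_trial(trial, model, train_one_epoch, evaluate, n_epochs):
    best_accuracy = 0.0
    for epoch in range(n_epochs):
        train_one_epoch(model)
        accuracy = evaluate(model)
        best_accuracy = max(best_accuracy, accuracy)
        trial.report(accuracy, step=epoch)  # intermediate value for the pruner
        if trial.should_prune():
            raise PrunedError()  # optuna.TrialPruned when Optuna is available
    return best_accuracy
```

Raising `TrialPruned` (rather than returning early) is how Optuna marks the trial as pruned instead of completed.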
## Example usage
Basic hyperparameter optimization for MNIST:
```python
import optuna
import torch
from torchvision import datasets, transforms

from image_classification_tools.pytorch.hyperparameter_optimization import create_objective

# Define transforms
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Define search space
search_space = {
    'batch_size': [64, 128, 256],
    'n_conv_blocks': (1, 3),  # 1-3 blocks for MNIST, use 1-10 for CIFAR-10
    'initial_filters': [16, 32],
    'n_fc_layers': (1, 3),
    'conv_dropout_rate': (0.1, 0.5),
    'fc_dropout_rate': (0.3, 0.7),
    'learning_rate': (1e-4, 1e-2, 'log'),
    'optimizer': ['Adam', 'SGD'],
    'sgd_momentum': (0.8, 0.99),
    'weight_decay': (1e-6, 1e-3, 'log')
}

# Create objective
objective = create_objective(
    data_dir='./data',
    transform=transform,
    n_epochs=20,
    device=torch.device('cuda'),
    num_classes=10,
    in_channels=1,
    search_space=search_space
)

# Run optimization
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50)

# Get best parameters
print(f"Best accuracy: {study.best_trial.value:.2f}%")
print("Best hyperparameters:", study.best_trial.params)
```
With persistent storage:
```python
from pathlib import Path

# SQLite storage for resumable studies
storage_path = Path('./optimization.db')
storage_url = f'sqlite:///{storage_path}'

study = optuna.create_study(
    study_name='cnn_optimization',
    direction='maximize',
    storage=storage_url,
    load_if_exists=True,  # Resume if interrupted
    pruner=optuna.pruners.MedianPruner(n_warmup_steps=5)
)
study.optimize(objective, n_trials=200)
```
Creating the final model:
```python
import torch

from image_classification_tools.pytorch.hyperparameter_optimization import create_cnn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

best_params = study.best_trial.params
model = create_cnn(
    n_conv_blocks=best_params['n_conv_blocks'],
    initial_filters=best_params['initial_filters'],
    n_fc_layers=best_params['n_fc_layers'],
    conv_dropout_rate=best_params['conv_dropout_rate'],
    fc_dropout_rate=best_params['fc_dropout_rate'],
    num_classes=10,
    in_channels=3
).to(device)
```
## Search space format
The search space dictionary supports three formats:
- **List**: categorical choices, e.g. `[64, 128, 256]`
- **Tuple (2 elements)**: continuous range, e.g. `(0.0, 0.5)` for float, `(1, 8)` for int
- **Tuple (3 elements)**: range with scale, e.g. `(1e-5, 1e-1, 'log')` for a log-scaled float
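As a hedged sketch of how these three formats could map onto Optuna's suggest API (the module's actual internals may differ; `suggest_from_spec` is a made-up helper):

```python
# Interpret one search-space entry and ask the trial for a value.

def suggest_from_spec(trial, name, spec):
    if isinstance(spec, list):                       # categorical choices
        return trial.suggest_categorical(name, spec)
    if len(spec) == 2:                               # (low, high) range
        low, high = spec
        if isinstance(low, int) and isinstance(high, int):
            return trial.suggest_int(name, low, high)
        return trial.suggest_float(name, low, high)
    low, high, scale = spec                          # (low, high, 'log')
    return trial.suggest_float(name, low, high, log=(scale == 'log'))
```

For example, `('batch_size', [64, 128, 256])` becomes a `suggest_categorical` call, while `('learning_rate', (1e-4, 1e-2, 'log'))` becomes a log-scaled `suggest_float`.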
Example search space for CIFAR-10:
- Batch size: `[64, 128, 256, 512]`
- Conv blocks: 1-10 (pools every 2 blocks)
- Initial filters: `[16, 32, 64, 128]`
- FC layers: 1-5
- Conv dropout: 0.1-0.5
- FC dropout: 0.3-0.7
- Learning rate: 1e-5 to 1e-2 (log scale)
- Optimizer: `['Adam', 'SGD', 'RMSprop']` (optional)
- SGD momentum: 0.8-0.99 (optional)
- Weight decay: 1e-6 to 1e-3 (log scale, optional)
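Written as a search-space dictionary in the same format (and with the same keys) as the MNIST example above, that CIFAR-10 space would look like:

```python
# CIFAR-10 search space using the list / 2-tuple / 3-tuple conventions.
cifar10_search_space = {
    'batch_size': [64, 128, 256, 512],
    'n_conv_blocks': (1, 10),
    'initial_filters': [16, 32, 64, 128],
    'n_fc_layers': (1, 5),
    'conv_dropout_rate': (0.1, 0.5),
    'fc_dropout_rate': (0.3, 0.7),
    'learning_rate': (1e-5, 1e-2, 'log'),
    'optimizer': ['Adam', 'SGD', 'RMSprop'],  # optional
    'sgd_momentum': (0.8, 0.99),              # optional
    'weight_decay': (1e-6, 1e-3, 'log'),      # optional
}
```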
## Notes
Architectural constraints:
- For 32x32 images (CIFAR-10), up to 10 conv blocks are supported thanks to conditional pooling
- Max pooling occurs every 2 blocks, allowing deeper networks without dimension collapse
- Filters double every 2 blocks instead of every block for parameter efficiency
- Adaptive average pooling handles variable spatial dimensions before the classifier
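A quick back-of-the-envelope check of the pooling constraint (hypothetical helper, not part of the module): with a 2x2 max pool only every 2 blocks, a 32x32 input survives all 10 blocks, whereas pooling after every block would collapse the spatial size to zero.

```python
# Track the spatial size through n_blocks conv blocks, halving it every
# pool_every blocks (integer division, as a stride-2 2x2 max pool would).

def spatial_size_after(n_blocks, input_size=32, pool_every=2):
    size = input_size
    for block in range(n_blocks):
        if (block + 1) % pool_every == 0:
            size //= 2
    return size

print(spatial_size_after(10))               # -> 1 (5 pools: 32 -> 16 -> 8 -> 4 -> 2 -> 1)
print(spatial_size_after(10, pool_every=1)) # -> 0 (dimension collapse)
```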
Error handling:
- Dimension collapse errors (e.g., spatial size becoming 0) return a score of 0.0
- CUDA out-of-memory errors are caught and return a score of 0.0
- Failure reasons are stored as trial attributes: `trial.user_attrs['failure_reason']`
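As a hedged example of how those attributes might be used for debugging (the `failed_trials` helper is made up; only the `failure_reason` key comes from the module):

```python
# Collect (trial_number, reason) pairs for trials that recorded a failure.

def failed_trials(study):
    """Return the trials whose objective stored a 'failure_reason' attribute."""
    return [(t.number, t.user_attrs['failure_reason'])
            for t in study.trials
            if 'failure_reason' in t.user_attrs]
```

With persistent storage, the study can be reopened later via `optuna.load_study(study_name=..., storage=...)` and passed to a helper like this to see which configurations hit OOM or dimension collapse.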