Hyperparameter optimization

Overview

The hyperparameter optimization module provides Optuna integration for automated CNN architecture search and hyperparameter tuning. It includes:

  • Dynamic CNN architecture with configurable depth, width, and components

  • Flexible search spaces defined via dictionaries

  • Automatic trial pruning for faster optimization

  • Error handling for out-of-memory (OOM) errors and architecture mismatches

Key components

create_cnn

Creates a CNN with dynamic architecture based on hyperparameters (the sizing pattern is sketched after this list):

  • Variable number of convolutional blocks (1-5)

  • Doubling filter sizes per block

  • Configurable kernel sizes (decreasing pattern)

  • Dynamic fully-connected layers with halving sizes

  • Separate dropout for conv and FC layers

  • Choice of max or average pooling

  • Optional batch normalization
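
The helper below sketches how the doubling/halving sizing pattern plays out. The function name and the starting fully-connected width are illustrative assumptions, not part of the module; it only demonstrates the progression of layer sizes.

def sketch_architecture(n_conv_blocks=3, initial_filters=32,
                        n_fc_layers=2, first_fc_units=256):
    # Filters double with each conv block; FC widths halve with each layer.
    conv_filters = [initial_filters * (2 ** i) for i in range(n_conv_blocks)]
    fc_units = [first_fc_units // (2 ** i) for i in range(n_fc_layers)]
    return conv_filters, fc_units

print(sketch_architecture())  # ([32, 64, 128], [256, 128])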

create_objective

Factory function that creates an Optuna objective for hyperparameter search:

  • Accepts configurable search space dictionary

  • Creates data loaders per trial with suggested batch size

  • Handles architecture errors gracefully

  • Supports MedianPruner for early stopping

train_trial

Trains a model for a single Optuna trial with pruning support.
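
The pruning support follows Optuna's standard report-and-prune pattern. The loop below is a minimal sketch of that idea; train_one_epoch and evaluate are hypothetical stand-ins, and the real train_trial signature may differ.

import optuna

def train_trial_sketch(trial, model, train_one_epoch, evaluate, n_epochs):
    best_accuracy = 0.0
    for epoch in range(n_epochs):
        train_one_epoch(model)            # one pass over the training data
        accuracy = evaluate(model)        # validation accuracy for this epoch
        best_accuracy = max(best_accuracy, accuracy)
        trial.report(accuracy, epoch)     # report intermediate value to the pruner
        if trial.should_prune():          # e.g. MedianPruner compares against other trials
            raise optuna.TrialPruned()
    return best_accuracy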

Example usage

Basic hyperparameter optimization for MNIST:

import optuna
import torch
from torchvision import datasets, transforms
from image_classification_tools.pytorch.hyperparameter_optimization import create_objective

# Define transforms
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Define search space
search_space = {
    'batch_size': [64, 128, 256],
    'n_conv_blocks': (1, 3),
    'initial_filters': [16, 32],
    'n_fc_layers': (1, 3),
    'conv_dropout_rate': (0.1, 0.5),
    'fc_dropout_rate': (0.3, 0.7),
    'learning_rate': (1e-4, 1e-2, 'log'),
    'optimizer': ['Adam', 'SGD'],
    'sgd_momentum': (0.8, 0.99),
    'weight_decay': (1e-6, 1e-3, 'log')
}

# Create objective
objective = create_objective(
    data_dir='./data',
    transform=transform,
    n_epochs=20,
    device=torch.device('cuda'),
    num_classes=10,
    in_channels=1,
    search_space=search_space
)

# Run optimization
study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50)

# Get best parameters
print(f"Best accuracy: {study.best_trial.value:.2f}%")
print("Best hyperparameters:", study.best_trial.params)

With persistent storage:

from pathlib import Path

# SQLite storage for resumable studies
storage_path = Path('./optimization.db')
storage_url = f'sqlite:///{storage_path}'

study = optuna.create_study(
    study_name='cnn_optimization',
    direction='maximize',
    storage=storage_url,
    load_if_exists=True,  # Resume if interrupted
    pruner=optuna.pruners.MedianPruner(n_warmup_steps=5)
)

study.optimize(objective, n_trials=200)

Creating the final model:

import torch

from image_classification_tools.pytorch.hyperparameter_optimization import create_cnn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
best_params = study.best_trial.params

model = create_cnn(
    n_conv_blocks=best_params['n_conv_blocks'],
    initial_filters=best_params['initial_filters'],
    n_fc_layers=best_params['n_fc_layers'],
    conv_dropout_rate=best_params['conv_dropout_rate'],
    fc_dropout_rate=best_params['fc_dropout_rate'],
    num_classes=10,
    in_channels=1
).to(device)

Search space format

The search space dictionary supports three formats (see the sketch after this list):

  • List: Categorical choices - [64, 128, 256]

  • Tuple (2 elements): Numeric range - (0.0, 0.5) for a float, (1, 8) for an int

  • Tuple (3 elements): Range with scale - (1e-5, 1e-1, 'log') for log-scaled float
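
As an illustration of how these formats would typically map onto Optuna's suggestion API, the helper below dispatches on the entry type. The function is an assumption shown for illustration only, not the module's actual dispatch code.

def suggest_from_space(trial, name, spec):
    if isinstance(spec, list):                                   # categorical choices
        return trial.suggest_categorical(name, spec)
    if len(spec) == 3 and spec[2] == 'log':                      # log-scaled float range
        return trial.suggest_float(name, spec[0], spec[1], log=True)
    if isinstance(spec[0], int) and isinstance(spec[1], int):    # integer range
        return trial.suggest_int(name, spec[0], spec[1])
    return trial.suggest_float(name, spec[0], spec[1])           # uniform float range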

Default search space includes (shown as a dictionary after this list):

  • Batch size: [64, 128, 256, 512]

  • Conv blocks: 1-5

  • Initial filters: [16, 32, 64, 128]

  • FC layers: 1-4

  • Conv dropout: 0.1-0.5

  • FC dropout: 0.3-0.7

  • Learning rate: 1e-5 to 1e-2 (log scale)

  • Optimizer: ['Adam', 'SGD', 'RMSprop']

  • SGD momentum: 0.8-0.99

  • Weight decay: 1e-6 to 1e-3 (log scale)
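
For reference, the defaults above expressed as a search space dictionary (key names are assumed to match those used in the example; check the module source for the exact defaults):

default_search_space = {
    'batch_size': [64, 128, 256, 512],
    'n_conv_blocks': (1, 5),
    'initial_filters': [16, 32, 64, 128],
    'n_fc_layers': (1, 4),
    'conv_dropout_rate': (0.1, 0.5),
    'fc_dropout_rate': (0.3, 0.7),
    'learning_rate': (1e-5, 1e-2, 'log'),
    'optimizer': ['Adam', 'SGD', 'RMSprop'],
    'sgd_momentum': (0.8, 0.99),
    'weight_decay': (1e-6, 1e-3, 'log')
}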