Hyperparameter optimization
============================

.. automodule:: image_classification_tools.pytorch.hyperparameter_optimization
   :members:
   :undoc-members:
   :show-inheritance:

Overview
--------

The hyperparameter optimization module provides Optuna integration for
automated CNN architecture search and hyperparameter tuning. It includes:

* **Dynamic CNN architecture** with configurable depth, width, and components
* **Flexible search spaces** defined via dictionaries
* **Automatic trial pruning** for faster optimization
* **Error handling** for out-of-memory (OOM) errors and architecture mismatches

Key components
--------------

create_cnn
~~~~~~~~~~

Creates a CNN whose architecture is determined by the supplied hyperparameters:

* Variable number of convolutional blocks (1-5)
* Filter counts that double with each block
* Configurable kernel sizes (decreasing pattern)
* Fully connected layers whose sizes halve from layer to layer
* Separate dropout rates for convolutional and fully connected layers
* Choice of max or average pooling
* Optional batch normalization

create_objective
~~~~~~~~~~~~~~~~

Factory function that creates an Optuna objective for hyperparameter search:

* Accepts a configurable search-space dictionary
* Creates data loaders per trial using the suggested batch size
* Handles architecture errors gracefully
* Supports ``MedianPruner`` for early stopping

train_trial
~~~~~~~~~~~

Trains a model for a single Optuna trial with pruning support, reporting
intermediate results so that unpromising trials can be stopped early (a sketch
of this pattern appears under Example usage below).

Example usage
-------------

Basic hyperparameter optimization for MNIST:

.. code-block:: python

   import optuna
   import torch
   from torchvision import transforms

   from image_classification_tools.pytorch.hyperparameter_optimization import create_objective

   # Define transforms
   transform = transforms.Compose([
       transforms.ToTensor(),
       transforms.Normalize((0.5,), (0.5,))
   ])

   # Define search space
   search_space = {
       'batch_size': [64, 128, 256],
       'n_conv_blocks': (1, 3),
       'initial_filters': [16, 32],
       'n_fc_layers': (1, 3),
       'conv_dropout_rate': (0.1, 0.5),
       'fc_dropout_rate': (0.3, 0.7),
       'learning_rate': (1e-4, 1e-2, 'log'),
       'optimizer': ['Adam', 'SGD'],
       'sgd_momentum': (0.8, 0.99),
       'weight_decay': (1e-6, 1e-3, 'log')
   }

   # Create objective
   objective = create_objective(
       data_dir='./data',
       transform=transform,
       n_epochs=20,
       device=torch.device('cuda'),
       num_classes=10,
       in_channels=1,
       search_space=search_space
   )

   # Run optimization
   study = optuna.create_study(direction='maximize')
   study.optimize(objective, n_trials=50)

   # Get best parameters
   print(f"Best accuracy: {study.best_trial.value:.2f}%")
   print("Best hyperparameters:", study.best_trial.params)

With persistent storage:

.. code-block:: python

   from pathlib import Path

   # SQLite storage for resumable studies
   storage_path = Path('./optimization.db')
   storage_url = f'sqlite:///{storage_path}'

   study = optuna.create_study(
       study_name='cnn_optimization',
       direction='maximize',
       storage=storage_url,
       load_if_exists=True,  # Resume if interrupted
       pruner=optuna.pruners.MedianPruner(n_warmup_steps=5)
   )
   study.optimize(objective, n_trials=200)
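The pruner only helps if trials report intermediate results. ``train_trial``
does this by reporting a validation score after each epoch and raising
``optuna.TrialPruned`` when Optuna decides the trial is unpromising. The
following is a minimal sketch of that pattern, using generic ``model`` and
data-loader placeholders rather than the module's actual signature:

.. code-block:: python

   import optuna
   import torch


   def run_pruned_trial(trial, model, train_loader, val_loader, n_epochs, device):
       """Hypothetical pruning-aware training loop (not the package's train_trial)."""
       optimizer = torch.optim.Adam(model.parameters())
       criterion = torch.nn.CrossEntropyLoss()

       for epoch in range(n_epochs):
           # One training epoch
           model.train()
           for images, labels in train_loader:
               images, labels = images.to(device), labels.to(device)
               optimizer.zero_grad()
               loss = criterion(model(images), labels)
               loss.backward()
               optimizer.step()

           # Validation accuracy for this epoch
           model.eval()
           correct = total = 0
           with torch.no_grad():
               for images, labels in val_loader:
                   images, labels = images.to(device), labels.to(device)
                   preds = model(images).argmax(dim=1)
                   correct += (preds == labels).sum().item()
                   total += labels.size(0)
           accuracy = 100.0 * correct / total

           # Report to Optuna so MedianPruner can compare against other trials
           trial.report(accuracy, epoch)
           if trial.should_prune():
               raise optuna.TrialPruned()

       return accuracy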
Creating the final model:

.. code-block:: python

   import torch

   from image_classification_tools.pytorch.hyperparameter_optimization import create_cnn

   device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
   best_params = study.best_trial.params

   # Rebuild the best architecture found by the study
   model = create_cnn(
       n_conv_blocks=best_params['n_conv_blocks'],
       initial_filters=best_params['initial_filters'],
       n_fc_layers=best_params['n_fc_layers'],
       conv_dropout_rate=best_params['conv_dropout_rate'],
       fc_dropout_rate=best_params['fc_dropout_rate'],
       num_classes=10,
       in_channels=1
   ).to(device)

Search space format
-------------------

The search space dictionary supports three formats:

* **List**: Categorical choices - ``[64, 128, 256]``
* **Tuple (2 elements)**: Numeric range - ``(0.0, 0.5)`` for a float range, ``(1, 8)`` for an integer range
* **Tuple (3 elements)**: Range with scale - ``(1e-5, 1e-1, 'log')`` for a log-scaled float range

A sketch of how these formats map onto Optuna's suggestion API is shown at the
end of this section.

The default search space includes:

* Batch size: [64, 128, 256, 512]
* Conv blocks: 1-5
* Initial filters: [16, 32, 64, 128]
* FC layers: 1-4
* Conv dropout: 0.1-0.5
* FC dropout: 0.3-0.7
* Learning rate: 1e-5 to 1e-2 (log scale)
* Optimizer: ['Adam', 'SGD', 'RMSprop']
* SGD momentum: 0.8-0.99
* Weight decay: 1e-6 to 1e-3 (log scale)
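For illustration, the translation from these entry formats to Optuna's
suggestion API could look roughly like the sketch below. The helper name
``suggest_from_space`` is hypothetical, and the module's own translation logic
may differ:

.. code-block:: python

   import optuna


   def suggest_from_space(trial: optuna.Trial, name: str, spec):
       """Hypothetical translation of one search-space entry into a suggestion."""
       if isinstance(spec, list):
           # List -> categorical choice
           return trial.suggest_categorical(name, spec)
       if isinstance(spec, tuple) and len(spec) == 2:
           low, high = spec
           if isinstance(low, int) and isinstance(high, int):
               # (int, int) -> integer range
               return trial.suggest_int(name, low, high)
           # (float, float) -> uniform float range
           return trial.suggest_float(name, low, high)
       if isinstance(spec, tuple) and len(spec) == 3 and spec[2] == 'log':
           # (low, high, 'log') -> log-scaled float range
           return trial.suggest_float(name, spec[0], spec[1], log=True)
       raise ValueError(f"Unsupported search space entry for {name!r}: {spec!r}")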