User guide
This guide explains the key concepts and parameters of Hill Climber.
Data format and terminology
Hill Climber works with tabular data:
- Data shape
Input data has shape (N, M), where:
N = number of samples (rows/data points)
M = number of features (columns)
- Accepted formats
NumPy arrays: np.ndarray with shape (N, M)
Pandas DataFrames: pd.DataFrame with M columns
- Objective function signature
Your objective function receives M separate 1D arrays (one per column):
For M=2: objective_func(x, y)
For M=3: objective_func(x, y, z)
For M=4: objective_func(w, x, y, z)
Note
The term “2D data” in this documentation refers to data with 2 features (M=2),
not the numpy array dimensionality. All input data are 2D numpy arrays with
shape (N, M).
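For example, a dataset with N=100 samples and M=3 features can be prepared in either accepted format (the column names below are purely illustrative):

import numpy as np
import pandas as pd

# NumPy array with shape (N, M): 100 samples, 3 features
data_array = np.random.rand(100, 3)

# Equivalent DataFrame with M=3 columns
data_frame = pd.DataFrame(data_array, columns=['x', 'y', 'z'])

# Either format works; the objective function receives each column
# as a separate 1D array, e.g. objective_func(x, y, z) for M=3.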
Optimization modes
Hill Climber supports three modes:
- Maximize mode (mode='maximize')
Searches for solutions that maximize the objective function value. Use this when higher objective values are better.
- Minimize mode (mode='minimize')
Searches for solutions that minimize the objective function value. Use this when lower objective values are better.
- Target mode (mode='target')
Searches for solutions that approach a specific target value. Requires setting the target_value parameter. The objective function should return the distance from the target (minimized internally).
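For illustration, the mode is chosen when constructing the optimizer. The snippet below is only a sketch: data and my_objective are placeholders, and only the mode and target_value parameters are taken from the descriptions above.

from hill_climber import HillClimber

# Maximize the objective (higher is better)
climber = HillClimber(data=data, objective_func=my_objective, mode='maximize')

# Approach a specific target value
climber = HillClimber(
    data=data,
    objective_func=my_objective,
    mode='target',
    target_value=42.0,  # required in target mode
)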
Objective functions
An objective function takes the data columns as arguments and returns two values, in this order:
A dictionary of metrics to track
A single objective value to optimize
Your objective function should accept as many arguments as you have columns in your input data.
Examples:
import numpy as np
from scipy.stats import pearsonr

# For 2-column data (M=2)
def objective_2col(x, y):
    # Calculate metrics
    mean_x = np.mean(x)
    mean_y = np.mean(y)

    # Calculate objective (e.g., minimize difference)
    objective = -abs(mean_x - mean_y)

    # Return metrics and objective
    metrics = {
        'Mean X': mean_x,
        'Mean Y': mean_y,
        'Difference': abs(mean_x - mean_y)
    }
    return metrics, objective

# For 3-column data (M=3)
def objective_3col(x, y, z):
    # Calculate metrics
    correlation_xy = pearsonr(x, y)[0]
    correlation_xz = pearsonr(x, z)[0]

    # Calculate objective (e.g., maximize sum of correlations)
    objective = correlation_xy + correlation_xz

    # Return metrics and objective
    metrics = {
        'Corr XY': correlation_xy,
        'Corr XZ': correlation_xz
    }
    return metrics, objective
Hyperparameters
- n_replicas (default: 4)
Number of replicas for parallel tempering (replica exchange). More replicas provide better exploration but use more memory. Each replica runs at a different temperature from the temperature ladder. Set to 1 for simulated annealing without replica exchange.
- T_min (default: 0.0001)
Minimum temperature for the coldest replica in the temperature ladder. Also used as the base temperature for simulated annealing. Higher temperatures allow more exploration of suboptimal solutions.
- T_max (default: 100 * T_min)
Maximum temperature for the hottest replica in the temperature ladder. Should be significantly higher than T_min for effective replica exchange. Defaults to 100 times T_min if not specified.
- temperature_scheme (default: ‘geometric’)
How to space temperatures in the ladder: ‘geometric’ or ‘linear’. Geometric spacing typically provides better exchange acceptance rates.
- exchange_interval (default: 100)
Number of optimization steps between replica exchange attempts. Smaller values attempt exchanges more frequently but increase overhead.
- exchange_strategy (default: ‘even_odd’)
Strategy for selecting replica pairs for exchange:
‘even_odd’: Alternates between even and odd neighboring pairs
‘random’: Random pair selection
‘all_neighbors’: All neighboring pairs
- initial_step_spread (default: 0.25)
Initial perturbation spread as a fraction of each feature's data range (0.25 = 25% of range). Controls the magnitude of changes relative to your data scale. The actual perturbation standard deviation is calculated per-feature as initial_step_spread * feature_range, so each feature uses its own range for appropriately scaled perturbations across different scales. Larger values create more dramatic perturbations; smaller values make more subtle adjustments. (See the configuration sketch at the end of this list.)
- final_step_spread (default: None)
Final perturbation spread as a fraction of each feature's data range. If specified, the step spread decreases linearly from initial_step_spread to final_step_spread over the course of max_time, enabling time-based cooling for more refined optimization near the end of the run. Leave as None to maintain a constant step spread throughout.
- perturb_fraction (default: 0.001)
Fraction of data points to modify in each iteration (0.0 to 1.0). Higher values create more dramatic changes per step.
- cooling_rate (default: 1e-10)
Amount subtracted from 1 to get the multiplicative cooling factor. The temperature is multiplied by (1 - cooling_rate) each iteration. Smaller values result in slower cooling and longer exploration. For example, 1e-10 means temp *= 0.9999999999 each step.
- max_time (default: 10)
Maximum optimization time in minutes.
- checkpoint_file (default: None)
Path to save checkpoints. If specified, the optimizer saves its state after each batch, allowing resumption if interrupted.
- checkpoint_interval (default: 1)
Number of batches between checkpoint saves. Default is 1 (save every batch). Set higher to reduce I/O overhead.
- db_enabled (default: True)
Enable database logging for real-time dashboard monitoring. Requires installing with dashboard extras.
- db_path (default: ‘../data/hill_climb.db’)
Path to SQLite database file for dashboard data.
- db_step_interval (default: tiered based on exchange_interval)
Sample perturbations every Nth evaluation for database logging. Uses tiered sampling:
exchange_interval < 10: sample every step (db_step_interval = 1)
exchange_interval 10-99: sample every 10 steps (db_step_interval = 10)
exchange_interval 100-999: sample every 100 steps (db_step_interval = 100)
exchange_interval >= 1000: sample every 1000 steps (db_step_interval = 1000)
This creates a sampled view of all perturbations in the database while keeping database size manageable.
Note
All accepted steps and improvements are recorded regardless of this setting. Only the sampled perturbation records are affected by db_step_interval.
- verbose (default: False)
Print progress messages during optimization.
- n_workers (default: n_replicas)
Number of worker processes for parallel execution. Defaults to number of replicas.
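As a rough configuration sketch (parameter names as listed above; data and my_objective are placeholders, and the chosen values are arbitrary), the step-spread settings translate into a per-feature perturbation scale as shown in the closing comments:

from hill_climber import HillClimber

climber = HillClimber(
    data=data,
    objective_func=my_objective,
    n_replicas=4,
    T_min=0.0001,
    T_max=0.01,
    temperature_scheme='geometric',
    exchange_interval=100,     # batch size; tiered sampling gives db_step_interval=100
    initial_step_spread=0.25,  # 25% of each feature's range
    final_step_spread=0.05,    # anneal the step spread down over max_time
    perturb_fraction=0.001,
    max_time=30,               # minutes
)

# Per-feature perturbation scale:
#   sigma_j = initial_step_spread * (max(feature_j) - min(feature_j))
# e.g. a feature spanning 0..40 with initial_step_spread=0.25 gives sigma_j = 10.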
Checkpoints
If you specify a checkpoint_file path, the optimizer saves its state periodically,
allowing you to resume from the most recent state if interrupted or continue optimization
after the run ends.
Note
Checkpoints store the entire optimizer state, including current solutions, best solutions, temperatures, and history. This allows seamless resumption.
- Batch size
The batch size is determined by exchange_interval (default: 100 steps). After each batch, the optimizer:
Attempts replica exchanges
Saves a checkpoint (if checkpoint_file is specified and the checkpoint_interval condition is met)
Updates the progress dashboard database (if db_enabled is True)
- Checkpoint frequency
The actual frequency of checkpoints is controlled by checkpoint_interval. By default, a checkpoint is saved after every batch (i.e., every exchange_interval steps). You can save checkpoints less frequently by setting checkpoint_interval to a higher value to reduce I/O.
Boundary handling
Hill Climber uses reflection to keep perturbed values within the original data bounds:
When a perturbation would push a value beyond the minimum bound, it reflects back into the valid range
Same for maximum bounds
This prevents artificial accumulation of points at boundaries
Example: If minimum is 5 and a perturbation creates 4.5, it reflects to 5.5.
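The reflection rule can be sketched as a standalone function (illustrative only, not the library's internal code; it assumes the perturbation is smaller than the feature's range):

def reflect(value, low, high):
    # Reflect a perturbed value back inside [low, high].
    if value < low:
        value = low + (low - value)    # e.g. low=5, value=4.5 -> 5.5
    elif value > high:
        value = high - (value - high)  # mirror the excess back below the maximum
    return value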
Replica exchange (parallel tempering)
Hill Climber 2.0 uses replica exchange to improve global optimization. Multiple replicas run simultaneously at different temperatures:
How it works
Each replica has its own temperature from a ladder (e.g., 1000, 2154, 4641, 10000)
All replicas perform optimization steps independently
Periodically, replicas attempt to exchange configurations
Exchanges use Metropolis criterion: better solutions move to cooler temperatures
The coldest replica typically finds the best solution
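The exchange decision follows the standard parallel-tempering Metropolis rule. A sketch for a maximization run (illustrative, not the library's internal code):

import numpy as np

def accept_exchange(obj_cold, obj_hot, T_cold, T_hot, rng=None):
    # Metropolis swap test between a cold and a hot replica (maximization).
    # If the hotter replica holds the better objective, delta >= 0 and the
    # swap is always accepted, so good solutions migrate to cooler replicas.
    if rng is None:
        rng = np.random.default_rng()
    delta = (1.0 / T_cold - 1.0 / T_hot) * (obj_hot - obj_cold)
    return delta >= 0 or rng.random() < np.exp(delta)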
Temperature ladder
from hill_climber import TemperatureLadder
# Geometric spacing (default, recommended)
ladder = TemperatureLadder.geometric(n_replicas=4, T_min=0.0001, T_max=0.01)
print(ladder.temperatures) # [0.0001, 0.000464, 0.00215, 0.01]
# Linear spacing
ladder = TemperatureLadder.linear(n_replicas=4, T_min=0.0001, T_max=0.01)
print(ladder.temperatures) # [0.0001, 0.0034, 0.0067, 0.01]
Benefits
Better global optimization compared to single-temperature annealing
Hotter replicas explore broadly, cooler replicas exploit locally
Exchanges allow good solutions to refine at low temperatures
More robust than independent parallel runs
Checkpointing
For long optimizations, save intermediate progress:
from hill_climber import HillClimber

climber = HillClimber(
data=data,
objective_func=my_objective,
max_time=60,
checkpoint_file='optimization.pkl',
)
result = climber.climb()
Resume from a checkpoint:
# Continue with saved temperatures (default)
resumed = HillClimber.load_checkpoint(
filepath='optimization.pkl',
objective_func=my_objective
)
# Or reset temperatures to original ladder values
resumed = HillClimber.load_checkpoint(
filepath='optimization.pkl',
objective_func=my_objective,
reset_temperatures=True
)
# Continue optimizing
best_data = resumed.climb()
Note
It is also possible to resume a run while it is still in memory by simply calling
climb() again on the existing HillClimber instance.
Results structure
The climb() method returns the best data found:
best_data = climber.climb()
Where:
best_data: Optimized data (DataFrame or numpy array, same format as input) from the best-performing replica
After optimization, you can access replica state:
# Get best replica
best_replica = max(climber.replicas, key=lambda r: r['best_objective'])
# Access state
print(f"Perturbation number: {best_replica['perturbation_num']}")
print(f"Accepted steps: {best_replica['num_accepted']}")
print(f"Improvements found: {best_replica['num_improvements']}")
print(f"Best objective: {best_replica['best_objective']}")
print(f"Best metrics: {best_replica['best_metrics']}")
State tracking
Hill Climber tracks three types of events:
- Perturbations (all evaluations)
Every perturbation is evaluated. A sampled view is recorded to the database every db_step_interval evaluations for monitoring without overwhelming storage.
- Accepted steps (SA acceptances)
When simulated annealing accepts a move (even if worse), it’s recorded with full user-defined metrics. This shows the complete SA exploration path.
- Improvements (new best found)
When a new best solution is found, it’s recorded with full metrics. This provides a monotonic view of progress toward the optimal solution.
All events are indexed by perturbation_num, a monotonically increasing counter
that never resets. This provides:
Single source of truth: Database contains complete history
Clear semantics: No confusion between different counters
Three views: Sample, complete SA path, improvements only
Easy analysis: Query any view independently
Database schema
When db_enabled=True, Hill Climber creates:
- perturbations table
Sampled view of all evaluations (every db_step_interval)
perturbation_num, objective, is_accepted, is_improvement, temperature
- accepted_steps table
Complete record of all SA-accepted moves
perturbation_num, objective, temperature
- step_metrics table
User-defined metrics for each accepted step
perturbation_num, metric_name, value
- improvements table
Complete record of all improvements found
perturbation_num, best_objective, temperature
- improvement_metrics table
User-defined metrics at each improvement
perturbation_num, metric_name, value
- replica_status table
Current snapshot of each replica (updated after each batch)
current_perturbation_num, num_accepted, num_improvements, best_objective
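Assuming the default db_path and the table/column names listed above (the actual schema may contain additional columns), the improvement history can be read back with standard sqlite3:

import sqlite3

conn = sqlite3.connect('../data/hill_climb.db')  # default db_path
rows = conn.execute(
    'SELECT perturbation_num, best_objective, temperature '
    'FROM improvements ORDER BY perturbation_num'
).fetchall()
conn.close()

# Print the five most recent improvements
for perturbation_num, best_objective, temperature in rows[-5:]:
    print(perturbation_num, best_objective, temperature)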
Internal Architecture
Hill Climber uses a ReplicaState dataclass to manage the state of each replica
during optimization. Key state attributes:
perturbation_num: Global counter (increments with every evaluation)
num_accepted: Count of SA-accepted moves
num_improvements: Count of improvements found
current_data: Current solution being explored
best_data: Best solution found so far
best_objective: Best objective value found
best_metrics: User-defined metrics at best solution
temperature: Current temperature
This provides:
Clean separation: Hyperparameters in HillClimber, runtime state in ReplicaState
Easy checkpointing: State serializes as a unit
Single counter: No confusion between different metrics
Type safety: Dataclass provides clear typing
You don’t need to interact with ReplicaState directly - it’s used internally
by the HillClimber class to manage each replica’s optimization state.
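For orientation only, the state listed above can be pictured roughly as a dataclass like the one below. The field names follow the attribute list; the actual ReplicaState class in the library may differ in names, defaults, and typing.

from dataclasses import dataclass, field
from typing import Any, Dict, Optional
import numpy as np

@dataclass
class ReplicaStateSketch:
    # Illustrative approximation of the per-replica runtime state.
    temperature: float
    current_data: Optional[np.ndarray] = None   # current solution being explored
    best_data: Optional[np.ndarray] = None      # best solution found so far
    best_objective: float = float('-inf')       # best objective value found
    best_metrics: Dict[str, Any] = field(default_factory=dict)
    perturbation_num: int = 0                   # global evaluation counter, never resets
    num_accepted: int = 0                       # SA-accepted moves
    num_improvements: int = 0                   # new-best events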