Performance Tips

Optimize your distributed GAN training setup using built-in configuration options.

Configuration-Based Optimizations

These performance improvements require no code changes; they come from adjusting config.yaml.

Batch Size Tuning

Workers can adjust batch size independently based on their hardware:

GPU workers:

worker:
  batch_size: 64  # Increase for better GPU utilization (watch for OOM)

Check GPU utilization in a separate terminal:

watch -n 1 nvidia-smi
# GPU usage should be >80%
# Memory should be well utilized

CPU workers:

worker:
  batch_size: 16  # Reduce to avoid memory issues
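
If batch_size is set too high, the worker will eventually hit a CUDA out-of-memory error mid-run. A minimal defensive sketch, assuming a recent PyTorch (>= 1.13 for torch.cuda.OutOfMemoryError) and a hypothetical train_step function in your worker code:

import torch

def train_batch_safely(train_step, batch):
    # Run one training step; on GPU out-of-memory, free cached memory and
    # fail with a hint to lower worker.batch_size in config.yaml.
    try:
        return train_step(batch)
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()
        raise RuntimeError("CUDA out of memory: reduce worker.batch_size in config.yaml")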

DataLoader Workers

Adjust parallel data loading:

data:
  num_workers_dataloader: 4  # Default: 4, adjust based on CPU cores

Guidelines:

  • GPU: 4-8 workers (4 is a good default, increase if you have cores available)

  • CPU: 0-2 workers (to avoid overhead)
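
Under the hood these settings presumably end up as arguments to PyTorch's DataLoader. A minimal sketch, assuming the worker is PyTorch-based and config.yaml is readable with PyYAML (the real project may wire this up differently):

import yaml
import torch
from torch.utils.data import DataLoader

def build_loader(dataset):
    # Build the worker's DataLoader from the config values discussed above.
    with open("config.yaml") as f:
        cfg = yaml.safe_load(f)

    return DataLoader(
        dataset,
        batch_size=cfg["worker"]["batch_size"],             # worker.batch_size
        num_workers=cfg["data"]["num_workers_dataloader"],  # parallel loading processes
        pin_memory=torch.cuda.is_available(),               # faster host-to-GPU copies
        shuffle=True,
    )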

Work Unit Configuration

Balance database overhead vs. processing efficiency:

training:
  images_per_work_unit: 320  # Images assigned per work unit
  num_workunits_per_update: 3  # How many work unit gradients to accumulate before each model update

Trade-offs:

  • Larger images_per_work_unit (500-1000):

    • Less database overhead

    • Fewer work units to manage

    • Longer to process each work unit

    • Slower feedback if workers disconnect

  • Smaller images_per_work_unit (100-200):

    • Faster work unit completion

    • Better for unstable workers

    • More database operations

    • Higher coordination overhead

For num_workunits_per_update (see the aggregation sketch after this list):

  • Set based on your expected number of workers

  • Too low (1-2): Noisy gradients, potentially wasted work units

  • Too high (>50% of workers): Slower updates, better gradient quality

  • Sweet spot: ~30-50% of your total workers
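
To make the trade-off concrete, here is a minimal sketch of what the update step conceptually does: average the gradients from num_workunits_per_update completed work units, then apply a single optimizer step. Names and the gradient storage format are hypothetical; the actual server code may differ.

import torch

def apply_aggregated_update(model, optimizer, workunit_gradients):
    # workunit_gradients: one dict per completed work unit, mapping
    # parameter name -> gradient tensor (len == num_workunits_per_update).
    optimizer.zero_grad()
    for name, param in model.named_parameters():
        # Average this parameter's gradient across all collected work units.
        param.grad = torch.stack([g[name] for g in workunit_gradients]).mean(dim=0)
    optimizer.step()

Averaging over more work units smooths the gradient but delays each update, which is exactly the trade-off described above.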

Worker Polling

Reduce unnecessary database checks:

worker:
  poll_interval: 5   # Seconds between work unit checks (increase if many workers)
  heartbeat_interval: 30  # Seconds between heartbeat updates

When to increase poll_interval:

  • Many workers (>10): Set to 8-10 seconds

  • Slow network: Set to 10-15 seconds

  • Fast training iterations: Keep at 3-5 seconds
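
Conceptually, the two intervals control a loop like the following sketch (db, claim_work_unit, and send_heartbeat are hypothetical stand-ins for the worker's real database client):

import time

def worker_loop(db, worker_id, process_fn, poll_interval=5, heartbeat_interval=30):
    last_heartbeat = 0.0
    while True:
        now = time.monotonic()
        if now - last_heartbeat >= heartbeat_interval:
            db.send_heartbeat(worker_id)           # tell the server this worker is alive
            last_heartbeat = now

        work_unit = db.claim_work_unit(worker_id)  # one database check per poll
        if work_unit is not None:
            process_fn(work_unit)                  # train on the assigned images
        else:
            time.sleep(poll_interval)              # nothing to do; wait before re-polling

Each idle worker issues roughly one claim query per poll_interval seconds, which is why larger classes benefit from a longer interval.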

Monitoring performance

Check GPU Utilization

# Real-time GPU monitoring
nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv -l 1

# Target: >80% GPU utilization

Worker Throughput

Monitor worker output for:

  • Images/second: Should be >100 for GPU, >10 for CPU

  • Work unit completion time: Should be <60 seconds for typical settings

  • Gradient upload time: Should be <5 seconds
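
If your worker does not already print an images/second figure, you can estimate it with a thin wrapper around the training loop (loader and train_step stand in for the worker's own objects):

import time

def measure_throughput(loader, train_step, batch_size):
    # Print a rolling images/second estimate; batch_size should match
    # worker.batch_size in config.yaml.
    start = time.monotonic()
    images = 0
    for batch in loader:
        train_step(batch)
        images += batch_size
        print(f"throughput: {images / (time.monotonic() - start):.1f} images/s", end="\r")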

Database Performance

If work units take too long to claim or complete:

  • Check network latency to database server

  • Verify database server isn’t overloaded

  • Consider reducing poll_interval so workers detect new work units sooner
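
A quick way to separate network latency from server load is to time a bare TCP connection to the database host and port from config.yaml (this measures only the network round trip, not query time):

import socket
import time

def check_db_latency(host, port, attempts=5):
    for _ in range(attempts):
        start = time.monotonic()
        with socket.create_connection((host, port), timeout=5):
            pass
        print(f"connect latency: {(time.monotonic() - start) * 1000:.1f} ms")

If the connection alone already approaches the ~100ms query-time target listed below, the network rather than the database is the likely bottleneck.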

Best practices

  1. Start conservative - Begin with default settings

  2. Monitor first - Watch GPU/CPU usage before optimizing

  3. Change one thing at a time - Easier to identify impact

  4. Match batch size to hardware - Max out GPU memory without OOM errors

  5. Tune for your class size - Set num_workunits_per_update based on worker count

Performance targets

Good performance indicators:

  • GPU utilization: >80%

  • Work unit processing: <30 seconds (for default config)

  • Worker throughput: >100 images/second (GPU), >10 images/second (CPU)

  • Database query time: <100ms

If below targets:

  • Increase batch size (GPU workers)

  • Increase num_workers_dataloader (if CPU available)

  • Check network connection to database

  • See Troubleshooting

Example configurations

Note: Default settings in config.yaml.template:

  • images_per_work_unit: 320

  • num_workunits_per_update: 3

  • batch_size: 32 (in worker section)

  • num_workers_dataloader: 4

  • poll_interval: 5

These examples show how to adjust for different class sizes:

Small class (3-5 workers)

training:
  images_per_work_unit: 320
  num_workunits_per_update: 2

worker:
  batch_size: 64  # Workers tune based on their GPU
  poll_interval: 5

Medium class (10-20 workers)

training:
  images_per_work_unit: 480
  num_workunits_per_update: 8

worker:
  batch_size: 64  # Workers tune based on their GPU
  poll_interval: 8

Large class (30+ workers)

training:
  images_per_work_unit: 640
  num_workunits_per_update: 15

worker:
  batch_size: 64  # Workers tune based on their GPU
  poll_interval: 10

Next steps