Quick start

Get up and running with distributed GAN training in minutes.

For students (workers)

Step 1: Choose your path

Pick the installation method that works best for you.

Step 2: Get database credentials

Contact your instructor to receive:

  • Database host address

  • Database name

  • Username

  • Password

Step 3: Configure and run

All paths follow the same basic pattern:

  1. Clone or fork the repository

  2. Install dependencies (varies by path)

  3. Configure database connection in config.yaml

  4. Start the worker: python src/worker.py
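For reference, the database section of config.yaml holds the credentials from Step 2. The exact key names depend on config.yaml.template; the ones below are an assumed sketch, not the definitive layout:

```yaml
database:
  host: your-db-host.example.com   # from your instructor
  name: distributed_gan
  user: your_username
  password: your_password
```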

The CelebA dataset will be automatically downloaded from Hugging Face on first run.

Your GPU (or CPU) is now part of the training cluster!
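Under the hood, a worker repeatedly claims a work unit, computes gradients on its shard, and reports the result back. The toy sketch below shows that claim-compute-report loop with the database replaced by an in-memory queue; all names are illustrative and are not the actual src/worker.py API:

```python
from collections import deque

def run_worker(work_queue, results, steps=3):
    """Toy worker loop: claim a unit, 'compute' a gradient, report it."""
    for _ in range(steps):
        if not work_queue:            # no work units available yet
            break
        unit = work_queue.popleft()   # claim the next work unit
        # Stand-in for a real gradient computation on the unit's data shard
        gradient = sum(unit["data"]) / len(unit["data"])
        results.append({"unit_id": unit["id"], "gradient": gradient})

work_queue = deque({"id": i, "data": [i, i + 1]} for i in range(3))
results = []
run_worker(work_queue, results)
print(results)
```

In the real system the queue and results table live in PostgreSQL, which is why every worker needs the database credentials from Step 2.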

For instructors (coordinator)

Step 1: Set up database

Deploy a publicly accessible PostgreSQL database:

# Example using PostgreSQL
createdb distributed_gan
psql distributed_gan < src/database/schema.sql

Or use a cloud provider:

  • AWS RDS

  • Google Cloud SQL

  • Azure Database for PostgreSQL

  • ElephantSQL (free tier available)

Step 2: Initialize the training system

# Install dependencies
pip install -r requirements.txt

# Configure database in config.yaml
cp config.yaml.template config.yaml
# Edit config.yaml with your database details

# Initialize database schema
python src/database/init_db.py

Step 3: Start coordinator

python src/main.py --epochs 50 --sample-interval 1

The coordinator will:

  • Create work units for workers to claim

  • Aggregate gradients from completed work

  • Update model weights

  • Generate sample images periodically

  • Save checkpoints
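Conceptually, the aggregation step averages the gradients reported by workers for an iteration and applies the result to the model weights. A minimal plain-Python sketch of that idea (illustrative names only, not the coordinator's actual API):

```python
def aggregate(gradients):
    """Element-wise average of per-worker gradient vectors."""
    n = len(gradients)
    return [sum(g[i] for g in gradients) / n for i in range(len(gradients[0]))]

def apply_update(weights, avg_grad, lr=0.1):
    """Simple SGD-style weight update."""
    return [w - lr * g for w, g in zip(weights, avg_grad)]

weights = [1.0, -2.0]
worker_grads = [[0.2, 0.4], [0.6, 0.0]]   # gradients from two workers
avg = aggregate(worker_grads)              # roughly [0.4, 0.2]
weights = apply_update(weights, avg)
print(weights)
```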

Step 4: Monitor progress

Watch the output for:

  • Number of active workers

  • Training iteration progress

  • Loss values

  • Sample generation

Check generated samples in data/outputs/samples/.

Optional: Hugging Face integration

To push checkpoints to your own Hugging Face repository:

  1. Create a repo at huggingface.co/new

  2. Get a write token from huggingface.co/settings/tokens

  3. Update config.yaml:

huggingface:
  enabled: true
  repo_id: your-username/your-repo-name
  token: your_hf_write_token
  push_interval: 5
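The push_interval setting controls how often checkpoints are uploaded; assuming it counts epochs, a value of 5 means every fifth epoch. The gating logic is roughly this (an illustrative sketch, with the actual upload call left out):

```python
def should_push(epoch, push_interval=5):
    """Push a checkpoint on every push_interval-th epoch (never on epoch 0)."""
    return epoch > 0 and epoch % push_interval == 0

# Epochs at which a checkpoint would be uploaded during a 20-epoch run
pushed = [e for e in range(1, 21) if should_push(e)]
print(pushed)
```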

Students can then view live results without running the training themselves.

Viewing results

After training starts, use the demo notebook:

jupyter notebook notebooks/demo_trained_model.ipynb

The notebook will:

  • Download the latest model from Hugging Face (or use local checkpoint)

  • Generate new celebrity faces

  • Show training progress

Troubleshooting

No work units available:

  • Wait for the coordinator to start and create work units

  • Check database connection

Worker crashes:

  • Reduce batch size in config.yaml

  • Check GPU memory with nvidia-smi

  • Try CPU-only mode
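If a worker keeps running out of GPU memory, halving the batch size until the step fits is a common tactic. A toy sketch of that retry pattern; train_step here simulates an out-of-memory error for large batches and is not the project's API:

```python
def train_step(batch_size, memory_limit=32):
    """Stand-in for a training step: 'runs out of memory' on large batches."""
    if batch_size > memory_limit:
        raise MemoryError("out of GPU memory")
    return f"trained with batch_size={batch_size}"

def train_with_backoff(batch_size=128):
    """Halve the batch size on OOM until the step succeeds (give up at 0)."""
    while batch_size >= 1:
        try:
            return train_step(batch_size), batch_size
        except MemoryError:
            batch_size //= 2
    raise RuntimeError("could not fit even batch_size=1")

result, final = train_with_backoff()
print(final)
```

In practice you would set the working batch size in config.yaml rather than probing on every run.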

Slow training:

  • Add more workers to the cluster

  • Check network connection to database

  • Verify workers are actively completing work units

See the troubleshooting guide for more help.

Next steps