Quick start

Get up and running with distributed GAN training in minutes.

For students (workers)

Step 1: Choose your path

Pick the installation method that works best for you.

Step 2: Get database credentials

Contact your instructor to receive:

  • Database host address

  • Database name

  • Username

  • Password

Step 3: Configure and run

All paths follow the same basic pattern:

  1. Clone or fork the repository

  2. Install dependencies (varies by path)

  3. Configure database connection in config.yaml

  4. Start the worker: python src/worker.py
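For reference, the database section of config.yaml holds the credentials from Step 2. The exact key names depend on config.yaml.template; the ones below are an assumed sketch, not the definitive layout:

```yaml
database:
  host: your-db-host.example.com   # from your instructor
  name: distributed_gan
  user: your_username
  password: your_password
```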

The CelebA dataset will be automatically downloaded from Hugging Face on first run.

Your GPU (or CPU) is now part of the training cluster!
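Under the hood, a worker repeatedly claims a work unit, computes gradients on its shard, and reports the result back. The toy sketch below shows that claim-compute-report loop with the database replaced by an in-memory queue; all names are illustrative and are not the actual src/worker.py API:

```python
from collections import deque

def run_worker(work_queue, results, steps=3):
    """Toy worker loop: claim a unit, 'compute' a gradient, report it."""
    for _ in range(steps):
        if not work_queue:            # no work units available yet
            break
        unit = work_queue.popleft()   # claim the next work unit
        # Stand-in for a real gradient computation on the unit's data shard
        gradient = sum(unit["data"]) / len(unit["data"])
        results.append({"unit_id": unit["id"], "gradient": gradient})

work_queue = deque({"id": i, "data": [i, i + 1]} for i in range(3))
results = []
run_worker(work_queue, results)
print(results)
```

In the real system the queue and results table live in PostgreSQL, which is why every worker needs the database credentials from Step 2.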

For instructors (coordinator)

Step 1: Set up database

Deploy a publicly accessible PostgreSQL database:

# Example using PostgreSQL
createdb distributed_gan
psql distributed_gan < src/database/schema.sql

Or use a cloud provider:

  • AWS RDS

  • Google Cloud SQL

  • Azure Database for PostgreSQL

  • ElephantSQL (free tier available)

Step 2: Initialize the training system

# Install dependencies
pip install -r requirements.txt

# Configure database in config.yaml
cp config.yaml.template config.yaml
# Edit config.yaml with your database details

# Initialize database schema
python src/database/init_db.py

Step 3: Start coordinator

python src/main.py --epochs 50 --sample-interval 1

The coordinator will:

  • Create work units for workers to claim

  • Aggregate gradients from completed work

  • Update model weights

  • Generate sample images periodically

  • Save checkpoints
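Conceptually, the aggregation step averages the gradients reported by workers for an iteration and applies the result to the model weights. A minimal plain-Python sketch of that idea (illustrative names only, not the coordinator's actual API):

```python
def aggregate(gradients):
    """Element-wise average of per-worker gradient vectors."""
    n = len(gradients)
    return [sum(g[i] for g in gradients) / n for i in range(len(gradients[0]))]

def apply_update(weights, avg_grad, lr=0.1):
    """Simple SGD-style weight update."""
    return [w - lr * g for w, g in zip(weights, avg_grad)]

weights = [1.0, -2.0]
worker_grads = [[0.2, 0.4], [0.6, 0.0]]   # gradients from two workers
avg = aggregate(worker_grads)              # roughly [0.4, 0.2]
weights = apply_update(weights, avg)
print(weights)
```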

Step 4: Monitor progress

Watch the output for:

  • Number of active workers

  • Training iteration progress

  • Loss values

  • Sample generation

Check generated samples in data/outputs/samples/.

Optional: Hugging Face integration

To push checkpoints to your own Hugging Face repository:

  1. Create a repo at huggingface.co/new

  2. Get a write token from huggingface.co/settings/tokens

  3. Update config.yaml:

huggingface:
  enabled: true
  repo_id: your-username/your-repo-name
  token: your_hf_write_token
  push_interval: 5
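The push_interval setting controls how often checkpoints are uploaded; assuming it counts epochs, a value of 5 means every fifth epoch. The gating logic is roughly this (an illustrative sketch, with the actual upload call left out):

```python
def should_push(epoch, push_interval=5):
    """Push a checkpoint on every push_interval-th epoch (never on epoch 0)."""
    return epoch > 0 and epoch % push_interval == 0

# Epochs at which a checkpoint would be uploaded during a 20-epoch run
pushed = [e for e in range(1, 21) if should_push(e)]
print(pushed)
```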

Students can then view live results without running the training themselves.

Viewing results

After training starts, use the demo notebook:

jupyter notebook notebooks/demo_trained_model.ipynb

The notebook will:

  • Download the latest model from Hugging Face (or use local checkpoint)

  • Generate new celebrity faces

  • Show training progress

Troubleshooting

No work units available:

  • Wait for the coordinator to start and create work units

  • Check database connection

Worker crashes:

  • Reduce batch size in config.yaml

  • Check GPU memory with nvidia-smi

  • Try CPU-only mode
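If a worker keeps running out of GPU memory, halving the batch size until the step fits is a common tactic. A toy sketch of that retry pattern; train_step here simulates an out-of-memory error for large batches and is not the project's API:

```python
def train_step(batch_size, memory_limit=32):
    """Stand-in for a training step: 'runs out of memory' on large batches."""
    if batch_size > memory_limit:
        raise MemoryError("out of GPU memory")
    return f"trained with batch_size={batch_size}"

def train_with_backoff(batch_size=128):
    """Halve the batch size on OOM until the step succeeds (give up at 0)."""
    while batch_size >= 1:
        try:
            return train_step(batch_size), batch_size
        except MemoryError:
            batch_size //= 2
    raise RuntimeError("could not fit even batch_size=1")

result, final = train_with_backoff()
print(final)
```

In practice you would set the working batch size in config.yaml rather than probing on every run.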

Slow training:

  • Add more workers to the cluster

  • Check network connection to database

  • Verify workers are actively completing work units

See the troubleshooting guide for more help.

Next steps