Quick start
Get up and running with distributed GAN training in minutes.
For students (workers)
Step 1: Choose your path
Pick the installation method that works best for you:
Easiest: Google Colab - No installation needed
Full features: Dev container - Complete development environment
Direct: Native Python - Install directly on your system
Conda users: Conda environment - Use conda package manager
Step 2: Get database credentials
Contact your instructor to receive:
Database host address
Database name
Username
Password
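These credentials go into the database section of config.yaml. A sketch of what that section might look like (the field names here are illustrative; follow the repository's config.yaml.template for the exact keys):

```yaml
# Hypothetical database section of config.yaml -- field names are
# illustrative; copy config.yaml.template and match its actual keys.
database:
  host: db.example.edu
  port: 5432
  name: distributed_gan
  user: student_worker
  password: your_password_here
```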
Step 3: Configure and run
All paths follow the same basic pattern:
Clone or fork the repository
Install dependencies (varies by path)
Configure database connection in config.yaml
Start the worker:
python src/worker.py
The CelebA dataset will be automatically downloaded from Hugging Face on first run.
Your GPU (or CPU) is now part of the training cluster!
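If the worker cannot connect, it can help to assemble the connection URL by hand and check it for typos. A minimal sketch (the build_dsn helper and example values are illustrative, not part of the repository):

```python
# Illustrative helper: build a PostgreSQL connection URL from the kind of
# values your instructor provides. Not part of the repository's code.
from urllib.parse import quote

def build_dsn(host: str, name: str, user: str, password: str, port: int = 5432) -> str:
    """Assemble a postgresql:// URL, percent-encoding the password."""
    return f"postgresql://{user}:{quote(password, safe='')}@{host}:{port}/{name}"

print(build_dsn("db.example.edu", "distributed_gan", "alice", "p@ss/word"))
# A '@' or '/' in the password must be percent-encoded, or the URL is ambiguous.
```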
For instructors (coordinator)
Step 1: Set up database
Deploy a publicly accessible PostgreSQL database:
# Example using PostgreSQL
createdb distributed_gan
psql distributed_gan < src/database/schema.sql
Or use a cloud provider:
AWS RDS
Google Cloud SQL
Azure Database for PostgreSQL
ElephantSQL (free tier available)
Step 2: Initialize the training system
# Install dependencies
pip install -r requirements.txt
# Configure database in config.yaml
cp config.yaml.template config.yaml
# Edit config.yaml with your database details
# Initialize database schema
python src/database/init_db.py
Step 3: Start coordinator
python src/main.py --epochs 50 --sample-interval 1
The coordinator will:
Create work units for workers to claim
Aggregate gradients from completed work
Update model weights
Generate sample images periodically
Save checkpoints
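The aggregation step above amounts to averaging the gradients returned by workers before applying a weight update. A toy sketch of that idea (plain Python lists stand in for real tensors; the function name is illustrative):

```python
# Toy sketch of gradient aggregation: element-wise mean of per-parameter
# gradients submitted by several workers. Real code would use tensors.
def aggregate_gradients(worker_grads: list[list[float]]) -> list[float]:
    """Average gradients from completed work units, element by element."""
    n = len(worker_grads)
    return [sum(g[i] for g in worker_grads) / n
            for i in range(len(worker_grads[0]))]

grads = [[0.2, -0.4], [0.4, 0.0], [0.0, 0.1]]
print(aggregate_gradients(grads))  # roughly [0.2, -0.1]
```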
Step 4: Monitor progress
Watch the output for:
Number of active workers
Training iteration progress
Loss values
Sample generation
Check generated samples in data/outputs/samples/.
Optional: Hugging Face integration
To push checkpoints to your own Hugging Face repository:
Create a repo at huggingface.co/new
Get a write token from huggingface.co/settings/tokens
Update config.yaml:
huggingface:
  enabled: true
  repo_id: your-username/your-repo-name
  token: your_hf_write_token
  push_interval: 5
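One plausible reading of push_interval (this helper is purely illustrative, not the repository's logic): with push_interval: 5, a checkpoint is pushed on every fifth epoch.

```python
# Illustrative interpretation of push_interval: push a checkpoint to the
# Hugging Face repo every N-th epoch. Not the repository's actual logic.
def should_push(epoch: int, push_interval: int, enabled: bool = True) -> bool:
    return enabled and epoch > 0 and epoch % push_interval == 0

pushed = [e for e in range(1, 13) if should_push(e, push_interval=5)]
print(pushed)  # -> [5, 10]
```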
Students can then view live results without running the training themselves.
Viewing results
After training starts, use the demo notebook:
jupyter notebook notebooks/demo_trained_model.ipynb
The notebook will:
Download the latest model from Hugging Face (or use local checkpoint)
Generate new celebrity faces
Show training progress
Troubleshooting
No work units available:
Wait for the coordinator to start and create work units
Check database connection
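Workers typically handle the no-work-units case by re-polling with a backoff rather than exiting. A sketch of such a retry schedule (the helper and its defaults are illustrative; the actual worker may behave differently):

```python
# Illustrative exponential-backoff schedule for re-polling the database
# when no work units are available. Capped to avoid very long stalls.
def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   cap: float = 30.0, tries: int = 6):
    """Yield wait times (seconds) between successive polls."""
    delay = base
    for _ in range(tries):
        yield min(delay, cap)
        delay *= factor

print(list(backoff_delays()))  # -> [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```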
Worker crashes:
Reduce batch size in config.yaml
Check GPU memory with nvidia-smi
Try CPU-only mode
Slow training:
Recruit more workers; throughput scales with the number of active workers
Check network connection to database
Verify workers are actively completing work units
See the troubleshooting guide for more help.
Next steps
Student guide - Detailed student workflow
Instructor guide - Coordinator management
Configuration reference - All config options