Contributing
Thank you for your interest in contributing to GANNs with friends!
Ways to contribute
Report bugs and issues
Improve documentation
Add new features
Optimize performance
Create tutorials and examples
Getting started
Fork and clone
git clone https://github.com/YOUR_USERNAME/GANNs-with-friends.git
cd GANNs-with-friends
git checkout -b feature/my-new-feature
Make your changes
Edit the code
Test manually by running the affected scripts
Update documentation if needed
Commit with a clear message
Push and create a pull request
Pull request guidelines
Test your changes manually
Update documentation if needed
Write clear commit messages
Describe what changed and why in the PR description
Reporting bugs
Include in your bug report:
Clear title and description
Steps to reproduce
Expected vs actual behavior
System info (OS, Python version, GPU)
Error messages and logs
Development philosophy
This project uses AI-assisted development with human oversight. For details about the development approach, collaborative workflow, and concrete examples, see the Development approach section.
Advanced performance optimization ideas
These advanced code modifications could significantly improve system performance. They require a deeper understanding of the codebase and make excellent contributions for experienced developers or students learning about distributed systems optimization.
Database Optimizations
Add indexes for faster queries
-- Speed up work unit queries
CREATE INDEX idx_work_units_status_iteration
ON work_units(status, iteration) WHERE status = 'pending';
CREATE INDEX idx_work_units_claimed
ON work_units(claimed_at) WHERE status = 'in_progress';
-- Speed up worker queries
CREATE INDEX idx_workers_heartbeat
ON workers(last_heartbeat);
-- Speed up gradient lookups
CREATE INDEX idx_gradients_work_unit
ON gradients(work_unit_id);
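One way to confirm that a partial index is actually picked up is to inspect the query plan. The sketch below uses Python's built-in sqlite3 module as a stand-in for the project database, an assumption made only so the example runs without a server. The simplified `work_units` schema is hypothetical as well. On PostgreSQL, the equivalent check would be running `EXPLAIN` on the same query.

```python
import sqlite3

# In-memory SQLite database, used purely as a stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE work_units ("
    "id INTEGER PRIMARY KEY, status TEXT, iteration INTEGER)"
)
conn.execute(
    "CREATE INDEX idx_work_units_status_iteration "
    "ON work_units(status, iteration) WHERE status = 'pending'"
)

# Ask the planner how it would run a typical "find pending work" query.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM work_units "
    "WHERE status = 'pending' AND iteration = 3"
).fetchall()

# The last column of each row is a human-readable description of the step.
plan_text = " ".join(row[-1] for row in plan_rows)
index_used = "idx_work_units_status_iteration" in plan_text
print(plan_text)
conn.close()
```

If the index name does not appear in the plan, the query's WHERE clause probably does not match the index predicate.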
Connection pooling
Currently each database operation creates a new connection. Implementing connection pooling could reduce overhead:
from psycopg2 import pool


class DatabaseManager:
    def __init__(self, db_config):
        # db_config: dict of psycopg2 connection keyword arguments
        self.connection_pool = pool.SimpleConnectionPool(
            minconn=1,
            maxconn=10,
            **db_config
        )

    def get_connection(self):
        return self.connection_pool.getconn()

    def release_connection(self, conn):
        # Always return the connection, or the pool will run dry
        self.connection_pool.putconn(conn)
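A pool only helps if every borrowed connection is returned, including when a query raises. The following is a minimal sketch of a context manager that guarantees this. The name `pooled_connection` is hypothetical, and it assumes only that the pool object exposes `getconn()` and `putconn()`, as the psycopg2 pool classes do.

```python
from contextlib import contextmanager


@contextmanager
def pooled_connection(connection_pool):
    """Borrow a connection and always hand it back to the pool.

    `connection_pool` is assumed to expose getconn()/putconn(), as the
    psycopg2 pool classes do.
    """
    conn = connection_pool.getconn()
    try:
        yield conn
        conn.commit()      # commit only if the block finished cleanly
    except Exception:
        conn.rollback()    # leave the connection clean for the next user
        raise
    finally:
        connection_pool.putconn(conn)
```

Usage would look like `with pooled_connection(manager.connection_pool) as conn: ...`. If several threads share one pool, psycopg2's `ThreadedConnectionPool` is the appropriate class, because `SimpleConnectionPool` is not meant to be shared across threads.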
Regular maintenance
Add automated database cleanup:
-- Delete old gradients after aggregation
DELETE FROM gradients
WHERE work_unit_id IN (
    SELECT id FROM work_units
    WHERE iteration < (SELECT current_iteration FROM training_state) - 1
);
-- Archive old work units (create the archive table once, then append)
CREATE TABLE IF NOT EXISTS work_units_archive
    (LIKE work_units INCLUDING ALL);
INSERT INTO work_units_archive
SELECT * FROM work_units
WHERE iteration < (SELECT current_iteration FROM training_state) - 10;
DELETE FROM work_units
WHERE iteration < (SELECT current_iteration FROM training_state) - 10;
Training Optimizations
Mixed precision training
Use automatic mixed precision for faster training on modern GPUs:
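from torch.cuda.amp import autocast, GradScaler
# Note: recent PyTorch releases deprecate torch.cuda.amp in favor of
# torch.amp (e.g. torch.amp.autocast("cuda")); check your installed version.


class Worker:
    def __init__(self, config_path):
        # ... existing init code ...
        self.scaler = GradScaler()

    def compute_gradients(self, real_batch):
        # Run the forward pass in reduced precision where it is safe
        with autocast():
            fake_images = self.generator(self.noise)
            fake_output = self.discriminator(fake_images)
            loss_g = self.criterion(fake_output, self.real_labels)
        # Scale the loss so small float16 gradients do not underflow
        self.scaler.scale(loss_g).backward()
        # Gradients are still multiplied by the loss scale at this point;
        # unscale them (for example, divide by self.scaler.get_scale())
        # before uploading.
        # ... rest of gradient computation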
Gradient accumulation
Simulate larger batch sizes without increasing memory:
accumulation_steps = 4
for i, batch in enumerate(batches):
    loss = compute_loss(batch)
    # Scale so the accumulated gradient matches one large-batch average
    loss = loss / accumulation_steps
    loss.backward()
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
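To see why the loss is divided by `accumulation_steps`, the following toy example (pure Python, no PyTorch) compares the gradient of a mean-squared-error loss over one batch of 8 samples with the sum of scaled gradients over 4 micro-batches of 2. The model `w * x`, the data, and the helper name `mse_gradient` are all made up for illustration.

```python
def mse_gradient(w, batch):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


w = 0.5
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2),
        (5.0, 9.8), (6.0, 12.1), (7.0, 14.0), (8.0, 15.9)]

# Gradient of the loss averaged over the full batch of 8 samples.
full_batch_grad = mse_gradient(w, data)

# Same data split into 4 micro-batches of 2 samples each.
accumulation_steps = 4
micro_batches = [data[i:i + 2] for i in range(0, len(data), 2)]

accumulated_grad = 0.0
for batch in micro_batches:
    # Dividing by accumulation_steps mirrors `loss / accumulation_steps`.
    accumulated_grad += mse_gradient(w, batch) / accumulation_steps

print(full_batch_grad, accumulated_grad)
```

The two values agree up to floating-point rounding, but only because every micro-batch has the same size. Note also that a loop of this shape never calls `optimizer.step()` for a trailing group smaller than `accumulation_steps`, so leftover batches need explicit handling.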
Network Optimizations
Gradient compression
Reduce network traffic by compressing gradients:
def compress_gradients(gradients):
    """Compress gradients before upload to database."""
    # Quantize to float16
    compressed = {
        k: v.half() for k, v in gradients.items()
    }
    # Could also use: top-k sparsification, quantization, etc.
    return compressed


def decompress_gradients(compressed):
    """Decompress after download from database."""
    return {
        k: v.float() for k, v in compressed.items()
    }
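The top-k sparsification mentioned in the comment above can be sketched as follows. This version works on plain Python lists so it runs without PyTorch. The function names `topk_sparsify` and `densify` are hypothetical, and a real implementation would operate on flattened tensors instead.

```python
def topk_sparsify(values, k):
    """Keep the k largest-magnitude entries; return (indices, kept, size)."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    indices = sorted(order[:k])
    return indices, [values[i] for i in indices], len(values)


def densify(indices, kept, size):
    """Rebuild a dense list, filling dropped positions with zeros."""
    dense = [0.0] * size
    for i, v in zip(indices, kept):
        dense[i] = v
    return dense


grad = [0.01, -0.9, 0.05, 0.4, -0.02, 0.0]
indices, kept, size = topk_sparsify(grad, k=2)
restored = densify(indices, kept, size)
print(indices, kept, restored)
```

Only the indices and kept values need to be uploaded. The scheme is lossy, so its effect on training quality should be measured. One common mitigation is to keep the dropped residual locally and add it to the next gradient.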
Batched uploads
Upload multiple work unit results together:
class Worker:
    def __init__(self):
        self.results_buffer = []
        self.buffer_size = 5

    def process_work_unit(self, work_unit):
        gradients = self.compute_gradients(work_unit)
        self.results_buffer.append((work_unit.id, gradients))
        if len(self.results_buffer) >= self.buffer_size:
            self.upload_batch(self.results_buffer)
            self.results_buffer.clear()
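A buffer like this can strand results: if the worker stops with fewer than `buffer_size` items queued, they are never uploaded. The sketch below adds an explicit `flush()` for that case. `ResultBuffer` and its `upload_batch` callable are hypothetical names, with the upload function standing in for whatever the worker uses to write to the database.

```python
class ResultBuffer:
    """Collects results and uploads them in batches via `upload_batch`."""

    def __init__(self, upload_batch, buffer_size=5):
        self.upload_batch = upload_batch  # callable taking a list of results
        self.buffer_size = buffer_size
        self.items = []

    def add(self, work_unit_id, gradients):
        self.items.append((work_unit_id, gradients))
        if len(self.items) >= self.buffer_size:
            self.flush()

    def flush(self):
        """Upload whatever is buffered; safe to call when empty."""
        if self.items:
            # Pass a copy so clearing the buffer cannot affect the uploader.
            self.upload_batch(list(self.items))
            self.items.clear()


uploaded = []
buffer = ResultBuffer(uploaded.append, buffer_size=2)
for unit_id in range(5):
    buffer.add(unit_id, {"w": unit_id})
buffer.flush()  # e.g. on shutdown: sends the fifth, leftover result
```

Batching also delays results, so buffered work units stay claimed for longer. If the system reclaims work units after a timeout, the buffer size should be chosen with that timeout in mind.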
Local weight caching
Only download weights when they change:
class Worker:
    def __init__(self):
        self.cached_iteration = -1
        self.cached_weights = None

    def get_weights(self):
        current_iteration = self.db.get_current_iteration()
        if current_iteration != self.cached_iteration:
            self.cached_weights = self.db.get_weights()
            self.cached_iteration = current_iteration
        return self.cached_weights
Monitoring Optimizations
Async heartbeats
Send heartbeats in a background thread to avoid blocking:
import threading
import time


class Worker:
    def start_heartbeat(self):
        def heartbeat_loop():
            while self.running:
                self.db.update_heartbeat(self.worker_id)
                time.sleep(30)

        self.heartbeat_thread = threading.Thread(
            target=heartbeat_loop,
            daemon=True
        )
        self.heartbeat_thread.start()
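A sleep-based loop can take up to a full interval to notice that the worker is shutting down, and an exception in the heartbeat call ends the thread silently. A possible variant, sketched below, uses `threading.Event` so that `stop()` returns promptly and failures are reported. The `Heartbeat` class and its `send` callable are hypothetical, and in the worker `send` would wrap the heartbeat update call.

```python
import threading


class Heartbeat:
    """Calls `send` every `interval` seconds until stop() is called."""

    def __init__(self, send, interval=30.0):
        self.send = send  # e.g. lambda: db.update_heartbeat(worker_id)
        self.interval = interval
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop_event.is_set():
            try:
                self.send()
            except Exception as exc:
                # A failed heartbeat should not kill the thread.
                print(f"heartbeat failed: {exc}")
            # wait() returns early when stop() sets the event.
            self._stop_event.wait(self.interval)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop_event.set()
        self._thread.join()
```

If the heartbeat thread shares a database connection or pool with the main thread, that resource must be safe for concurrent use.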
Reduced logging overhead
Log less frequently in tight loops:
# Instead of logging every iteration
if iteration % 10 == 0:
    logger.info(f"Progress: {iteration}")
Profiling Tools
Measure performance
Add benchmarking code to find bottlenecks:
import time


class PerformanceTimer:
    def __init__(self, name):
        self.name = name
        self.start_time = None

    def __enter__(self):
        # perf_counter() is monotonic and better suited to timing than time()
        self.start_time = time.perf_counter()
        return self

    def __exit__(self, *args):
        # For GPU code, call torch.cuda.synchronize() before reading the
        # clock, since CUDA kernels run asynchronously
        elapsed = time.perf_counter() - self.start_time
        print(f"{self.name}: {elapsed:.3f}s")


# Usage:
with PerformanceTimer("Gradient computation"):
    gradients = compute_gradients(batch)
with PerformanceTimer("Database upload"):
    upload_gradients(gradients)
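Printing on every call gets noisy inside a training loop. As one possible alternative, the sketch below accumulates call counts and total time per section and prints a single summary. The names `timed`, `timings`, and `report` are hypothetical, and the toy workload only stands in for real work.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# name -> [call count, total seconds]
timings = defaultdict(lambda: [0, 0.0])


@contextmanager
def timed(name):
    """Accumulate elapsed time under `name` instead of printing each call."""
    start = time.perf_counter()
    try:
        yield
    finally:
        entry = timings[name]
        entry[0] += 1
        entry[1] += time.perf_counter() - start


def report():
    """Print one summary line per timed section, slowest first."""
    for name, (count, total) in sorted(
        timings.items(), key=lambda item: item[1][1], reverse=True
    ):
        print(f"{name}: {count} calls, {total:.3f}s total, {total / count:.4f}s avg")


for _ in range(3):
    with timed("toy workload"):
        sum(range(10_000))
report()
```

Sorting by total time puts the largest bottleneck at the top of the report.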
Code profiling
Use cProfile to find slow functions:
import cProfile
import pstats
profiler = cProfile.Profile()
profiler.enable()
# Code to profile
process_work_unit()
profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(20) # Show top 20 functions
Resource Management
CPU affinity
Pin workers to specific CPU cores (Linux only):
import os


def set_cpu_affinity(core_ids):
    """Pin process to specific CPU cores."""
    # os.sched_setaffinity is not available on macOS or Windows
    os.sched_setaffinity(0, set(core_ids))


# Example: use cores 0-3 for this worker
set_cpu_affinity([0, 1, 2, 3])
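Because `os.sched_setaffinity` exists only on some platforms, and containers often expose a subset of cores, a more defensive variant may be useful. The sketch below is one way to write it. The name `pin_to_cores` is hypothetical, and returning `None` on unsupported platforms is a design choice, not project behavior.

```python
import os


def pin_to_cores(core_ids):
    """Pin this process to `core_ids` where supported.

    Returns the resulting set of allowed cores, or None when the platform
    has no os.sched_setaffinity (for example macOS or Windows).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)
    # Only request cores this process may actually use (matters in containers).
    requested = set(core_ids) & allowed
    if requested:
        try:
            os.sched_setaffinity(0, requested)
        except OSError as exc:
            # Leave the affinity unchanged if the system refuses the request.
            print(f"could not set CPU affinity: {exc}")
    return os.sched_getaffinity(0)


result = pin_to_cores([0, 1, 2, 3])
print(result)
```

When none of the requested cores are available, the function leaves the affinity unchanged and returns the current set.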
Multiple workers per GPU
Run multiple worker processes on one GPU:
# Worker 1 (the trailing & runs it in the background)
CUDA_VISIBLE_DEVICES=0 python src/worker.py --config config.yaml &
# Worker 2, sharing the same GPU
CUDA_VISIBLE_DEVICES=0 python src/worker.py --config config.yaml &
Implementation Notes
Before implementing:
Profile to confirm the optimization is needed
Understand the trade-offs (complexity vs. performance gain)
Test manually to verify functionality
Document the changes thoroughly
Consider backward compatibility
Testing performance improvements:
Measure before and after with realistic workloads
Test with multiple workers, not just one
Check for edge cases and failure modes
Verify results are numerically equivalent
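For the last check, bit-for-bit equality is usually too strict, since mixed precision or compression changes low-order digits. A tolerance-based comparison is one option. The sketch below works on plain Python lists. The helper name `gradients_close` and the tolerances are assumptions, and for PyTorch tensors `torch.allclose` serves the same purpose.

```python
import math


def gradients_close(reference, candidate, rel_tol=1e-5, abs_tol=1e-8):
    """Compare two {name: list of floats} gradient dicts within a tolerance."""
    if reference.keys() != candidate.keys():
        return False
    for name, ref_values in reference.items():
        cand_values = candidate[name]
        if len(ref_values) != len(cand_values):
            return False
        for r, c in zip(ref_values, cand_values):
            if not math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol):
                return False
    return True


baseline = {"layer.weight": [0.25, -1.5, 3.0]}
optimized = {"layer.weight": [0.25, -1.5000001, 3.0]}  # tiny rounding drift
changed = {"layer.weight": [0.25, -1.4, 3.0]}          # a real difference

print(gradients_close(baseline, optimized))
print(gradients_close(baseline, changed))
```

Appropriate tolerances depend on the optimization: float16 paths need looser bounds than a pure refactoring, which should match almost exactly.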
Good first optimizations:
Database indexes (easy, high impact)
Local weight caching (medium difficulty, good gains)
Reduced logging (easy, modest gains)
Connection pooling (medium difficulty, good for many workers)
Advanced optimizations:
Mixed precision training (requires careful testing)
Gradient compression (complex, measure quality impact)
Async operations (increases code complexity)
License
By contributing, you agree that your contributions will be licensed under the project’s MIT License.
Questions?
Open an issue or contact the project maintainers.