TL;DR
- Context: Undergraduate research project implementing GAN-based single-image super-resolution for model zoo repository
- Problem: Conventional MSE-optimized super-resolution produces blurry, over-smoothed results; needed perceptually realistic detail generation
- Intervention: Implemented SRGAN with perceptual loss (VGG features) and adversarial training for photo-realism
- Impact: Achieved significant sharpness improvement over bicubic upscaling with learned texture synthesis
Intro
This project was completed during undergraduate studies as part of an open-source model zoo repository containing implementations of deep learning research models. The goal was to implement a Generative Adversarial Network for single-image super-resolution in TensorFlow, following the SRGAN architecture. The challenge was to generate high-resolution images from low-resolution inputs with perceptually realistic detail rather than pixel-perfect but blurry reconstructions.
Problem
- Deep learning super-resolution traditionally uses mean squared error, producing overly smooth results
- Pixel-wise accuracy and perceptual quality are misaligned objectives
- Raw GAN training on images is notoriously unstable, and super-resolution adds further complexity
- Limited labeled training data (LR-HR pairs) for specific domains
Intervention
- Implemented SRGAN architecture with residual blocks in generator for upscaling
- Designed combined loss function balancing pixel loss, perceptual loss (VGG network features), and adversarial loss
- Built data preprocessing pipeline generating LR-HR pairs with augmentations
- Modularized code supporting dataset replacement and transfer learning
- Tuned loss weights to balance visual fidelity vs statistical similarity
Impact
- Improved over bicubic upscaling on PSNR/SSIM metrics, with markedly sharper visual results
- Generator learned fine textures and structures beyond pixel-wise reconstruction
- Delivered complete end-to-end pipeline from data loading to model export
- Created reusable implementation suitable for custom datasets and domains
Why This Matters
Understanding the gap between mathematical optimization (MSE) and human perception is critical for ML systems serving real users. This project demonstrates balancing multiple competing objectives through careful loss engineering—a skill essential when standard metrics don't align with product goals.
Technical Deep Dive (Optional)
This section expands on the architecture, loss engineering, and training strategy for readers who want technical depth.
SRGAN Architecture
Generator: Convolutional Neural Network with Residual Blocks
- Designed to upscale LR images (e.g., 64×64) to HR (e.g., 256×256)
- Multiple residual blocks for deep feature extraction
- Sub-pixel convolution (pixel shuffle) for efficient upsampling
- Skip connections for gradient flow
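The sub-pixel upsampling step is easy to illustrate framework-agnostically: a convolution produces r² channel groups, and pixel shuffle rearranges them into an r×-larger spatial grid. A minimal NumPy sketch (the `pixel_shuffle` helper here is illustrative; TensorFlow provides this operation as `tf.nn.depth_to_space`):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (H, W, C*r^2) feature map into (H*r, W*r, C)."""
    h, w, crr = x.shape
    c = crr // (r * r)
    # Split channels into (r, r, c), then interleave the r-blocks spatially.
    x = x.reshape(h, w, r, r, c)
    x = x.transpose(0, 2, 1, 3, 4)      # (h, r, w, r, c)
    return x.reshape(h * r, w * r, c)

x = np.arange(16, dtype=float).reshape(2, 2, 4)
y = pixel_shuffle(x, 2)
print(y.shape)  # (4, 4, 1)
```

Because the rearrangement is a pure reshape, learning happens entirely in the preceding convolution, which is what makes sub-pixel upsampling cheaper than transposed convolution at the same output size.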
Discriminator: CNN Classifier
- Distinguishes generated HR images from real HR samples
- VGG-style architecture with strided convolutions
- LeakyReLU activations
- Binary real/fake output
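The strided stages progressively shrink the feature map toward the final real/fake decision; the arithmetic can be checked with a one-line helper (the four-stage count below is illustrative, not necessarily the repository's exact configuration):

```python
def conv_out(n, k=3, s=2, p=1):
    """Spatial size after one k x k convolution with stride s and padding p."""
    return (n + 2 * p - k) // s + 1

size = 256
for _ in range(4):          # four stride-2 stages, each halving H and W
    size = conv_out(size)
print(size)  # 16: a 256x256 input reaches 16x16 before the dense real/fake head
```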
Loss Function Engineering
The key innovation in SRGAN is the combined loss function addressing multiple objectives:
Content Loss: Pixel + Perceptual
Pixel Loss (MSE)
- Ensures basic structural similarity
- Prevents complete hallucination
- Limitation: Alone produces blurry results
Perceptual Loss
- Uses features from pre-trained VGG network
- Computes MSE on high-level feature representations (e.g., conv4_3 layer)
- Why this matters: Captures perceptual similarity beyond pixel-wise differences
- Aligns with human visual perception
Adversarial Loss
- Standard GAN discriminator loss
- Encourages generator to produce realistic images that fool the discriminator
- Purpose: Adds fine texture details for photo-realism
Total Generator Loss: L_total = L_pixel + λ_perceptual × L_perceptual + λ_adversarial × L_adversarial
Loss Balancing: The λ weights set the critical trade-off between sharp, realistic textures and smooth, pixel-accurate reconstruction
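Under these definitions the generator objective can be sketched in a few lines of NumPy. This is a minimal sketch: the λ values are placeholders rather than the tuned weights, and the VGG feature maps and discriminator score are stand-in arrays (real code would run both images through a frozen pre-trained VGG and the discriminator):

```python
import numpy as np

rng = np.random.default_rng(0)

def mse(a, b):
    return float(np.mean((a - b) ** 2))

# Stand-ins for SR/HR images and their VGG feature maps.
sr, hr = rng.random((64, 64, 3)), rng.random((64, 64, 3))
feat_sr, feat_hr = rng.random((8, 8, 512)), rng.random((8, 8, 512))
d_sr = 0.3                                # discriminator's P(real) for the SR image

l_pixel = mse(sr, hr)                     # structural anchor, prevents hallucination
l_perc = mse(feat_sr, feat_hr)            # perceptual similarity in feature space
l_adv = -np.log(d_sr)                     # non-saturating adversarial term

lam_perc, lam_adv = 1.0, 1e-3             # placeholder weights, not tuned values
l_total = l_pixel + lam_perc * l_perc + lam_adv * l_adv
```

Raising `lam_adv` pushes the output toward sharper but riskier textures; lowering it pulls back toward the smooth MSE optimum.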
Training Strategy
Workflow
- Data preprocessing → LR-HR dataset pairs
- Model training (iterative adversarial updates)
- Inference / evaluation on unseen images
Data Preprocessing
- Generate LR-HR pairs via downsampling
- Apply augmentations (rotation, flipping, cropping) to increase diversity
- Normalize pixel values for stable training
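The pairing step above can be sketched in NumPy. Two assumptions for brevity: block-averaging stands in for the bicubic downsampling typically used, and normalization to [-1, 1] assumes a tanh generator output:

```python
import numpy as np

def make_pair(hr, scale=4):
    """Downsample an HR image to produce its LR training counterpart."""
    h, w, c = hr.shape
    h, w = h - h % scale, w - w % scale
    hr = hr[:h, :w]                       # crop so dimensions divide evenly
    lr = hr.reshape(h // scale, scale, w // scale, scale, c).mean(axis=(1, 3))
    return lr, hr

def normalize(img):
    """Map uint8 pixels to [-1, 1] for stable training."""
    return img.astype(np.float32) / 127.5 - 1.0

hr = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
lr, hr = make_pair(normalize(hr))
print(lr.shape, hr.shape)  # (64, 64, 3) (256, 256, 3)
```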
Training Loop
- Train discriminator on real HR and generated HR images
- Train generator with combined loss
- Alternate updates with careful ratio management
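The alternation itself is independent of the framework; a sketch with stub update functions (the `d_step`/`g_step` names, the dummy losses, and the 1:1 default ratio are illustrative placeholders for the real optimizer steps):

```python
import numpy as np

rng = np.random.default_rng(0)

def d_step(real_batch, fake_batch):
    """Placeholder for one discriminator update; returns a loss value."""
    return float(rng.random())

def g_step(lr_batch):
    """Placeholder for one generator update under the combined loss."""
    return float(rng.random())

def train_epoch(batches, d_updates_per_g=1):
    d_losses, g_losses = [], []
    for lr_batch, hr_batch in batches:
        for _ in range(d_updates_per_g):      # discriminator update(s) first ...
            fake = lr_batch                   # stand-in for generator(lr_batch)
            d_losses.append(d_step(hr_batch, fake))
        g_losses.append(g_step(lr_batch))     # ... then one generator step
    return float(np.mean(d_losses)), float(np.mean(g_losses))

batches = [(np.zeros((4, 64, 64, 3)), np.zeros((4, 256, 256, 3)))] * 8
d_loss, g_loss = train_epoch(batches)
```

Skewing `d_updates_per_g` is one lever for the ratio management noted above: a discriminator that wins too easily starves the generator of gradient signal.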
Stabilization Techniques
- Learning rate scheduling
- Gradient clipping
- Optional: Discriminator pretraining
Evaluation Methodology
Quantitative Metrics
- Peak Signal-to-Noise Ratio (PSNR): Measures reconstruction accuracy
- Structural Similarity Index (SSIM): Assesses perceptual similarity
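Both metrics are simple to compute directly. A NumPy sketch, with one simplification: SSIM here uses whole-image statistics, whereas practical implementations slide a local (e.g., 11×11 Gaussian) window:

```python
import numpy as np

def psnr(a, b, max_val=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((a - b) ** 2)
    return 10 * np.log10(max_val ** 2 / mse)

def ssim_global(a, b, max_val=1.0):
    """SSIM over global image statistics (windowed in practice)."""
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (va + vb + c2))

a = np.zeros((8, 8))
print(psnr(a, a + 0.1))   # about 20 dB: MSE of 0.01 against a max value of 1.0
```

Note that a uniform +0.1 shift costs 20 dB of PSNR yet leaves structure untouched, which previews why these metrics alone can mislead.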
Qualitative Assessment
- Visual comparison against bicubic baseline
- Texture detail and edge sharpness inspection
- Artifact presence check (ringing, aliasing)
Key Insight: PSNR/SSIM don't fully capture perceptual quality, so visual assessment remains essential
Implementation Details
Language/Framework: Python, TensorFlow 1.x
Key Modules
- Data loader and augmentations (LR/HR pairing)
- Generator and discriminator model definitions
- Training loops with alternating optimization
- Logging & checkpoints for reproducibility
Reusability: Modularized code supporting:
- Replacement of datasets
- Transfer learning with custom image sources
- Exporting trained models for deployment
Performance Results
Metrics
- Significant improvement in sharpness over bicubic upscaling
- PSNR and SSIM scores indicate better structural preservation
- Qualitative: Sharper edges, finer textures, learned texture synthesis
Trade-offs
- Some artifacts in challenging regions (inherent GAN trade-off)
- Balance between perceptual quality and pixel accuracy
Challenges & Solutions
| Challenge | Approach |
|-----------|----------|
| GAN training instability | Careful tuning of learning rates, perceptual loss balancing |
| Limited labeled training images | Used standard datasets with augmentation |
| Evaluating perceptual quality | Combined PSNR/SSIM with qualitative visual results |
Potential Production Extensions
If revisited for deployment:
- TensorFlow 2.x / Keras refactor with modern API patterns
- ESRGAN for enhanced visual quality
- Export as TF-Lite or TensorFlow Serving for production deployment
- API or demo interface for client proof-of-concept
Repository: pclubiitk/model-zoo → super_resolution/SRGAN_TensorFlow on GitHub.