TL;DR
- Context: Undergraduate research project implementing GAN-based single-image super-resolution for model zoo repository
- Problem: Conventional MSE-optimized super-resolution produces blurry, over-smoothed results; needed perceptually realistic detail generation
- Intervention: Implemented SRGAN with perceptual loss (VGG features) and adversarial training for photo-realism
- Impact: Achieved significant sharpness improvement over bicubic upscaling with learned texture synthesis
Intro
This project was completed during undergraduate studies as part of an open-source model zoo repository containing implementations of deep learning research models. The goal was to implement a Generative Adversarial Network for single-image super-resolution in TensorFlow, following the SRGAN architecture. The challenge was to generate high-resolution images from low-resolution inputs with perceptually realistic detail rather than pixel-perfect but blurry reconstructions.
Problem
- Deep learning super-resolution traditionally uses mean squared error, producing overly smooth results
- Pixel-wise accuracy and perceptual quality are misaligned objectives
- Raw GAN training on images is notoriously unstable, and super-resolution adds further complexity
- Limited labeled training data (LR-HR pairs) for specific domains
Intervention
- Implemented SRGAN architecture with residual blocks in generator for upscaling
- Designed combined loss function balancing pixel loss, perceptual loss (VGG network features), and adversarial loss
- Built data preprocessing pipeline generating LR-HR pairs with augmentations
- Modularized code supporting dataset replacement and transfer learning
- Tuned loss weights to balance visual fidelity vs statistical similarity
Impact
- Improved over bicubic upscaling on PSNR/SSIM metrics, with markedly sharper visual results
- Generator learned fine textures and structures beyond pixel-wise reconstruction
- Delivered complete end-to-end pipeline from data loading to model export
- Created reusable implementation suitable for custom datasets and domains
Why This Matters
Understanding the gap between mathematical optimization (MSE) and human perception is critical for ML systems serving real users. This project demonstrates balancing multiple competing objectives through careful loss engineering—a skill essential when standard metrics don't align with product goals.
Technical Deep Dive (Optional)
This section expands on the architecture, loss engineering, and training strategy for readers who want technical depth.
SRGAN Architecture
Generator: Convolutional Neural Network with Residual Blocks
- Designed to upscale LR images (e.g., 64×64) to HR (e.g., 256×256)
- Multiple residual blocks for deep feature extraction
- Sub-pixel convolution (pixel shuffle) for efficient upsampling
- Skip connections for gradient flow
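The sub-pixel upsampling step is easy to illustrate framework-agnostically: a convolution produces r² channel groups, and pixel shuffle rearranges them into an r×-larger spatial grid. A minimal NumPy sketch (the `pixel_shuffle` helper here is illustrative; TensorFlow provides this operation as `tf.nn.depth_to_space`):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (H, W, C*r^2) feature map into (H*r, W*r, C)."""
    h, w, crr = x.shape
    c = crr // (r * r)
    # Split channels into (r, r, c), then interleave the r-blocks spatially.
    x = x.reshape(h, w, r, r, c)
    x = x.transpose(0, 2, 1, 3, 4)      # (h, r, w, r, c)
    return x.reshape(h * r, w * r, c)

x = np.arange(16, dtype=float).reshape(2, 2, 4)
y = pixel_shuffle(x, 2)
print(y.shape)  # (4, 4, 1)
```

Because the rearrangement is a pure reshape, learning happens entirely in the preceding convolution, which is what makes sub-pixel upsampling cheaper than transposed convolution at the same output size.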
Discriminator: CNN Classifier
- Distinguishes generated HR images from real HR samples
- VGG-style architecture with strided convolutions
- LeakyReLU activations
- Binary real/fake output
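The strided stages progressively shrink the feature map toward the final real/fake decision; the arithmetic can be checked with a one-line helper (the four-stage count below is illustrative, not necessarily the repository's exact configuration):

```python
def conv_out(n, k=3, s=2, p=1):
    """Spatial size after one k x k convolution with stride s and padding p."""
    return (n + 2 * p - k) // s + 1

size = 256
for _ in range(4):          # four stride-2 stages, each halving H and W
    size = conv_out(size)
print(size)  # 16: a 256x256 input reaches 16x16 before the dense real/fake head
```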
Loss Function Engineering
The key innovation in SRGAN is the combined loss function addressing multiple objectives:
Content Loss: Pixel + Perceptual
Pixel Loss (MSE)
- Ensures basic structural similarity
- Prevents complete hallucination
- Limitation: Alone produces blurry results
Perceptual Loss
- Uses features from pre-trained VGG network
- Computes MSE on high-level feature representations (e.g., conv4_3 layer)
- Why this matters: Captures perceptual similarity beyond pixel-wise differences
- Aligns with human visual perception
Adversarial Loss
- Standard GAN discriminator loss
- Encourages generator to produce realistic images that fool the discriminator
- Purpose: Adds fine texture details for photo-realism
Total Generator Loss: L_total = L_pixel + λ_perceptual × L_perceptual + λ_adversarial × L_adversarial
Loss Balancing: The λ weights set the critical trade-off between sharp, realistic textures and smooth, pixel-accurate reconstruction
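Under these definitions the generator objective can be sketched in a few lines of NumPy. This is a minimal sketch: the λ values are placeholders rather than the tuned weights, and the VGG feature maps and discriminator score are stand-in arrays (real code would run both images through a frozen pre-trained VGG and the discriminator):

```python
import numpy as np

rng = np.random.default_rng(0)

def mse(a, b):
    return float(np.mean((a - b) ** 2))

# Stand-ins for SR/HR images and their VGG feature maps.
sr, hr = rng.random((64, 64, 3)), rng.random((64, 64, 3))
feat_sr, feat_hr = rng.random((8, 8, 512)), rng.random((8, 8, 512))
d_sr = 0.3                                # discriminator's P(real) for the SR image

l_pixel = mse(sr, hr)                     # structural anchor, prevents hallucination
l_perc = mse(feat_sr, feat_hr)            # perceptual similarity in feature space
l_adv = -np.log(d_sr)                     # non-saturating adversarial term

lam_perc, lam_adv = 1.0, 1e-3             # placeholder weights, not tuned values
l_total = l_pixel + lam_perc * l_perc + lam_adv * l_adv
```

Raising `lam_adv` pushes the output toward sharper but riskier textures; lowering it pulls back toward the smooth MSE optimum.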
Training Strategy
Workflow
- Data preprocessing → LR-HR dataset pairs
- Model training (iterative adversarial updates)
- Inference / evaluation on unseen images
Data Preprocessing
- Generate LR-HR pairs via downsampling
- Apply augmentations (rotation, flipping, cropping) to increase diversity
- Normalize pixel values for stable training
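The pairing step above can be sketched in NumPy. Two assumptions for brevity: block-averaging stands in for the bicubic downsampling typically used, and normalization to [-1, 1] assumes a tanh generator output:

```python
import numpy as np

def make_pair(hr, scale=4):
    """Downsample an HR image to produce its LR training counterpart."""
    h, w, c = hr.shape
    h, w = h - h % scale, w - w % scale
    hr = hr[:h, :w]                       # crop so dimensions divide evenly
    lr = hr.reshape(h // scale, scale, w // scale, scale, c).mean(axis=(1, 3))
    return lr, hr

def normalize(img):
    """Map uint8 pixels to [-1, 1] for stable training."""
    return img.astype(np.float32) / 127.5 - 1.0

hr = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
lr, hr = make_pair(normalize(hr))
print(lr.shape, hr.shape)  # (64, 64, 3) (256, 256, 3)
```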
Training Loop
- Train discriminator on real HR and generated HR images
- Train generator with combined loss
- Alternate updates with careful ratio management
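The alternation itself is independent of the framework; a sketch with stub update functions (the `d_step`/`g_step` names, the dummy losses, and the 1:1 default ratio are illustrative placeholders for the real optimizer steps):

```python
import numpy as np

rng = np.random.default_rng(0)

def d_step(real_batch, fake_batch):
    """Placeholder for one discriminator update; returns a loss value."""
    return float(rng.random())

def g_step(lr_batch):
    """Placeholder for one generator update under the combined loss."""
    return float(rng.random())

def train_epoch(batches, d_updates_per_g=1):
    d_losses, g_losses = [], []
    for lr_batch, hr_batch in batches:
        for _ in range(d_updates_per_g):      # discriminator update(s) first ...
            fake = lr_batch                   # stand-in for generator(lr_batch)
            d_losses.append(d_step(hr_batch, fake))
        g_losses.append(g_step(lr_batch))     # ... then one generator step
    return float(np.mean(d_losses)), float(np.mean(g_losses))

batches = [(np.zeros((4, 64, 64, 3)), np.zeros((4, 256, 256, 3)))] * 8
d_loss, g_loss = train_epoch(batches)
```

Skewing `d_updates_per_g` is one lever for the ratio management noted above: a discriminator that wins too easily starves the generator of gradient signal.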
Stabilization Techniques
- Learning rate scheduling
- Gradient clipping
- Optional: Discriminator pretraining
Evaluation Methodology
Quantitative Metrics
- Peak Signal-to-Noise Ratio (PSNR): Measures reconstruction accuracy
- Structural Similarity Index (SSIM): Assesses perceptual similarity
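Both metrics are simple to compute directly. A NumPy sketch, with one simplification: SSIM here uses whole-image statistics, whereas practical implementations slide a local (e.g., 11×11 Gaussian) window:

```python
import numpy as np

def psnr(a, b, max_val=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((a - b) ** 2)
    return 10 * np.log10(max_val ** 2 / mse)

def ssim_global(a, b, max_val=1.0):
    """SSIM over global image statistics (windowed in practice)."""
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (va + vb + c2))

a = np.zeros((8, 8))
print(psnr(a, a + 0.1))   # about 20 dB: MSE of 0.01 against a max value of 1.0
```

Note that a uniform +0.1 shift costs 20 dB of PSNR yet leaves structure untouched, which previews why these metrics alone can mislead.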
Qualitative Assessment
- Visual comparison against bicubic baseline
- Texture detail and edge sharpness inspection
- Artifact presence check (ringing, aliasing)
Key Insight: PSNR/SSIM don't fully capture perceptual quality, so visual assessment remains essential
Implementation Details
Language/Framework: Python, TensorFlow 1.x
Key Modules
- Data loader and augmentations (LR/HR pairing)
- Generator and discriminator model definitions
- Training loops with alternating optimization
- Logging & checkpoints for reproducibility
Reusability: Modularized code supporting:
- Replacement of datasets
- Transfer learning with custom image sources
- Exporting trained models for deployment
Performance Results
Metrics
- Significant improvement in sharpness over bicubic upscaling
- PSNR and SSIM scores indicate better structural preservation
- Qualitative: Sharper edges, finer textures, learned texture synthesis
Trade-offs
- Some artifacts in challenging regions (inherent GAN trade-off)
- Balance between perceptual quality and pixel accuracy
Challenges & Solutions
| Challenge | Approach |
|-----------|----------|
| GAN training instability | Careful tuning of learning rates, perceptual loss balancing |
| Limited labeled training images | Used standard datasets with augmentation |
| Evaluating perceptual quality | Combined PSNR/SSIM with qualitative visual results |
Potential Production Extensions
If revisited for deployment:
- TensorFlow 2.x / Keras refactor with modern API patterns
- ESRGAN for enhanced visual quality
- Export as TF-Lite or TensorFlow Serving for production deployment
- API or demo interface for client proof-of-concept
Repository: pclubiitk/model-zoo → super_resolution/SRGAN_TensorFlow on GitHub.