AI/ML · Infrastructure · Edge AI · Automation

Scaling Edge ML Infrastructure Through Automated Quantization

AI Intern at EdgeNeural

Built an automated quantization-aware training pipeline that reduced model deployment time from days to hours

TL;DR

  • Context: AI intern at edge ML company deploying CNNs on resource-constrained IoT devices
  • Problem: Manual quantization-aware training was bottlenecking deployment and limiting product iteration
  • Intervention: Built automated QAT pipeline with observability layer for distributed training on AWS
  • Impact: Reduced per-model deployment time from days to hours, enabled self-service ML workflows

Intro

This work was done during my internship at EdgeNeural, a company focused on deploying deep learning models on edge and IoT hardware. The core challenge was enabling real-time inference on resource-constrained devices without sacrificing model accuracy. Quantization-aware training was critical, but treating it as a manual, per-model process was preventing the team from scaling to multiple customer deployments simultaneously.


Problem

  • Manual QAT required deep model-specific expertise, preventing non-ML team members from deploying models
  • No standardized training workflow meant duplicated effort across architectures (ResNet, MobileNet, EfficientNet)
  • Training ran in opaque Docker containers on EC2 with zero visibility into progress or failures
  • Lack of observability meant ops and product teams couldn't coordinate effectively

Intervention

  • Designed architecture-agnostic QAT pipeline using PyTorch and MMClassification framework
  • Standardized training workflow across all supported CNN architectures (ResNet, EfficientNet, MobileNet, ShuffleNet)
  • Implemented custom training hooks to expose metrics, dataset stats, and artifacts via REST APIs
  • Built observability layer connecting distributed training to frontend dashboard for real-time visibility
  • Integrated artifact storage with S3 for reliable model persistence and deployment pipelines

Impact

  • Reduced marginal cost of adding QAT support for new architectures from days to hours
  • Enabled product and ops teams to track training progress without ML expertise
  • Established repeatable, production-grade workflow instead of ad-hoc experimentation
  • Created foundation reusable across multiple customer use cases and datasets

Why This Matters

Early-stage ML companies often treat training as "research work" rather than product infrastructure. Building observable, reproducible training systems early prevents the transition from prototype to production from becoming a complete rewrite.


Technical Deep Dive (Optional)

This section expands on the pipeline architecture, framework choices, and observability implementation for readers who want technical depth.


Architecture Decision: MMClassification as Foundation

Why MMClassification

  • Provided standardized config-driven training for multiple architectures
  • Reduced duplicated training logic across model families
  • Enabled faster experimentation through consistent interfaces

Integration with QAT

  • Wrapped PyTorch's quantization APIs into MMClassification training loops
  • Ensured quantized models remained exportable to edge runtimes
  • Maintained accuracy parity with full-precision baselines
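The core operation QAT inserts into the forward pass is a quantize-dequantize round trip ("fake quantization"). A minimal scalar sketch in plain Python (PyTorch's observers additionally learn scale and zero-point from activation statistics, and gradients flow through via a straight-through estimator):

```python
def fake_quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Quantize-dequantize round trip (int8 range by default).

    Running this in the forward pass during training teaches the
    network weights that survive rounding and clamping, so accuracy
    holds up after the model is converted to a true integer model.
    """
    q = round(x / scale) + zero_point   # map to the integer grid
    q = max(qmin, min(qmax, q))         # clamp to the int8 range
    return (q - zero_point) * scale     # map back to float

# An in-range value survives the round trip almost unchanged:
print(fake_quantize(0.5, scale=0.1, zero_point=0))    # 0.5
# An out-of-range value is clamped (127 * 0.1), which the model
# learns to tolerate during training:
print(fake_quantize(100.0, scale=0.1, zero_point=0))
```

This is why the quantized models could stay near accuracy parity with full-precision baselines: the rounding error is present at training time, not discovered at deployment.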

Observability Layer Design

Problem: Training jobs ran inside Docker containers on remote EC2 instances with no external visibility.

Solution

  • Implemented custom PyTorch hooks/callbacks to capture:
    • Training metrics (loss, accuracy, learning rate)
    • Dataset statistics (class distribution, sample counts)
    • Training status (epoch progress, ETA, failures)
    • Artifact metadata (checkpoint locations, model configs)
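A stripped-down version of such a hook, as a framework-agnostic sketch (the real implementation plugged into MMClassification's hook interface; all names here are illustrative):

```python
import json
import time

class MetricsHook:
    """Collects per-iteration training metrics in memory and
    serializes a snapshot for an external REST layer to serve."""

    def __init__(self):
        self.records = []

    def after_train_iter(self, epoch, iteration, loss, lr):
        # Called by the training loop after every iteration.
        self.records.append({
            "ts": time.time(),
            "epoch": epoch,
            "iter": iteration,
            "loss": loss,
            "lr": lr,
        })

    def snapshot(self):
        # JSON payload the dashboard polls: latest metrics plus progress.
        latest = self.records[-1] if self.records else None
        return json.dumps({"total_iters": len(self.records), "latest": latest})
```

The same pattern extends to the other capture targets above: dataset statistics are recorded once at startup, and artifact metadata is appended whenever a checkpoint is written.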

Integration

  • Exposed metrics via REST APIs consumed by frontend dashboard
  • Enabled real-time monitoring without SSH access or container inspection
  • Provided coordination interface for cross-functional teams
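A minimal stdlib-only sketch of the polling endpoint (the production version sat behind the company's existing API layer; the path and payload shape here are hypothetical):

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for the state a metrics hook would keep updated.
METRICS = {"epoch": 3, "loss": 0.42, "status": "running"}

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            body = json.dumps(METRICS).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep the training logs clean

# Bind an ephemeral port and serve in the background, so the
# training process itself is never blocked by dashboard polling.
server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
```

Because the dashboard polls over HTTP, nobody needs SSH keys or `docker exec` access to see how a run is going.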

Standardized Training Workflow

Before

  • Each architecture required custom training scripts
  • Configuration inconsistencies across experiments
  • No version control for training parameters

After

  • Config-driven training using MMClassification
  • Version-controlled experiment definitions
  • Automated checkpoint management and artifact tracking
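For a sense of what "config-driven" means here, an experiment definition in MMClassification's dict-based config style (all values hypothetical, not the production configs):

```python
# Illustrative MMClassification-style experiment config.
# Each experiment is a version-controlled Python file like this one.
model = dict(
    type="ImageClassifier",
    backbone=dict(type="ResNet", depth=50),
    head=dict(type="LinearClsHead", num_classes=10),
)
optimizer = dict(type="SGD", lr=0.01, momentum=0.9, weight_decay=1e-4)
lr_config = dict(policy="step", step=[30, 60, 90])

# Swapping architectures is a one-line change, e.g.:
# backbone=dict(type="MobileNetV2", widen_factor=1.0)
```

Since the config is just a file in version control, every training run has a diffable, reviewable definition, which is what makes the audit trail below possible.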

Measured Impact

  • ~70% reduction in setup time for new models
  • Clear audit trail for all training runs

Deployment Integration

Artifact Pipeline

  1. Training completes → checkpoint saved locally
  2. Hook uploads to S3 with metadata
  3. Deployment system polls S3 for new artifacts
  4. Automated testing on target edge hardware
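Step 2 can be sketched as a small uploader that follows boto3's S3 client interface (the bucket/key layout and names are hypothetical; injecting the client keeps the sketch testable without AWS credentials):

```python
import json

class ArtifactUploader:
    """Pushes a finished checkpoint plus a metadata record to S3.

    `client` is anything exposing boto3's S3 client methods
    (upload_file, put_object); the key layout is illustrative.
    """

    def __init__(self, client, bucket):
        self.client = client
        self.bucket = bucket

    def upload(self, run_id, checkpoint_path, metadata):
        key = f"runs/{run_id}/model.pth"
        # 1. Ship the checkpoint itself.
        self.client.upload_file(checkpoint_path, self.bucket, key)
        # 2. Ship metadata next to it, so the deployment poller can
        #    discover new artifacts without parsing checkpoints.
        self.client.put_object(
            Bucket=self.bucket,
            Key=f"runs/{run_id}/metadata.json",
            Body=json.dumps(metadata).encode(),
        )
        return key
```

The metadata object is what the deployment system actually polls for; the checkpoint is only fetched once a new metadata record appears.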

Result: Seamless handoff from training to deployment without manual intervention.


System Architecture Summary

Training Container (EC2)
  ↓
PyTorch + MMClassification + QAT
  ↓
Custom Hooks (metrics, artifacts)
  ↓
REST API Layer
  ↓
Frontend Dashboard + S3 Storage
  ↓
Deployment Pipeline


Key Trade-offs

Chose MMClassification over custom training loops

  • Pro: Faster iteration, better reproducibility
  • Con: Less flexibility for exotic architectures
  • Decision: Standardization mattered more than edge cases

Chose REST APIs over message queues for observability

  • Pro: Simpler integration with existing frontend
  • Con: No built-in retry or buffering
  • Decision: Training jobs were long-running enough that transient failures were acceptable

Completed during a 6-month internship at EdgeNeural focused on production ML systems for edge deployments.
