Architecture

Mend is a distributed media processing engine built with Go, designed for high-throughput asynchronous processing of images, videos, and audio files stored in S3-compatible storage.

System Overview

┌─────────────────────────────────────────────────────────────────┐
│                         Client Layer                             │
│  (HTTP Clients, Web Apps, Mobile Apps, CLI Tools)               │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             │ HTTP/REST
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                      API Server (Gin)                            │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐          │
│  │   Routes     │  │  Handlers    │  │  Middleware  │          │
│  │  /api/v1/*   │  │  - Jobs      │  │  - Auth      │          │
│  │  /health     │  │  - Health    │  │  - Logger    │          │
│  │  /metrics    │  │  - Batch     │  │  - CORS      │          │
│  └──────────────┘  └──────────────┘  └──────────────┘          │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             │ Enqueue Jobs
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Redis (Asynq Queue)                           │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐          │
│  │ image queue  │  │ video queue  │  │ audio queue  │          │
│  │  Priority: 5 │  │  Priority: 3 │  │  Priority: 2 │          │
│  └──────────────┘  └──────────────┘  └──────────────┘          │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             │ Dequeue Jobs
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Worker Pool (Asynq)                           │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │  Worker 1  │  Worker 2  │  Worker 3  │  ...  │  Worker N  │ │
│  └────────────────────────────────────────────────────────────┘ │
│                                                                   │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐          │
│  │   Image      │  │    Video     │  │    Audio     │          │
│  │  Processor   │  │  Processor   │  │  Processor   │          │
│  │  - Resize    │  │  - Thumbnail │  │  - Convert   │          │
│  │  - Optimize  │  │  - Extract   │  │  - Compress  │          │
│  │  - Convert   │  │  (FFmpeg)    │  │  (FFmpeg)    │          │
│  └──────────────┘  └──────────────┘  └──────────────┘          │
└────────────────────────────┬────────────────────────────────────┘
                             │
                             │ Upload/Download
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                   S3 Storage (AWS S3 / MinIO)                    │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐          │
│  │   Images     │  │    Videos    │  │    Audio     │          │
│  │   Bucket     │  │    Bucket    │  │    Bucket    │          │
│  └──────────────┘  └──────────────┘  └──────────────┘          │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                    Observability Layer                           │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐          │
│  │  Prometheus  │  │  Structured  │  │    Health    │          │
│  │   Metrics    │  │     Logs     │  │    Checks    │          │
│  │  /metrics    │  │  (Zap/JSON)  │  │   /health    │          │
│  └──────────────┘  └──────────────┘  └──────────────┘          │
└─────────────────────────────────────────────────────────────────┘

Component Details

API Server

Gin Web Framework

Responsibilities:

Accept HTTP requests for media processing jobs
Validate request parameters
Enqueue jobs to Redis
Return job IDs to clients
Serve health checks and metrics
Provide Swagger documentation

Key Endpoints:

Job Creation

POST /api/v1/jobs/image/*
POST /api/v1/jobs/video/*
POST /api/v1/jobs/audio/*
POST /api/v1/jobs/batch

Job Management

GET /api/v1/jobs/:id
GET /api/v1/jobs/:id/stream
GET /api/v1/jobs

Monitoring

GET /health
GET /metrics
GET /api/v1/queue/stats

Scaling: Horizontal (multiple instances behind a load balancer)

Redis Queue

Redis + Asynq

Responsibilities:

Store pending jobs
Manage job priorities
Handle job retries
Track job status
Provide job scheduling

Queue Configuration:

Queue	Priority	Use Case
`image`	5	Image processing (highest)
`video`	3	Video processing (medium)
`audio`	2	Audio processing (low)
`default`	1	Other tasks (lowest)
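
As a rough illustration of what these priorities mean in practice, the sketch below computes each queue's approximate share of dequeue attempts when every queue has work waiting. It assumes Asynq's weighted (non-strict) priority mode, which this document does not state explicitly. The names `queueWeights` and `share` are illustrative only.

```go
package main

import "fmt"

// queueWeights mirrors the priority table above.
var queueWeights = map[string]int{"image": 5, "video": 3, "audio": 2, "default": 1}

// share returns the approximate fraction of dequeue attempts a queue receives
// when all queues are non-empty. Assumption: priorities act as relative
// weights rather than strict ordering.
func share(weights map[string]int, queue string) float64 {
	total := 0
	for _, w := range weights {
		total += w
	}
	if total == 0 {
		return 0
	}
	return float64(weights[queue]) / float64(total)
}

func main() {
	for _, q := range []string{"image", "video", "audio", "default"} {
		fmt.Printf("%-8s %.1f%%\n", q, 100*share(queueWeights, q))
	}
}
```

Under that assumption the image queue receives 5 of every 11 dequeue attempts, so lower-priority queues still make progress while images are pending.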

Features:

Automatic retries (max 3 attempts)
Job timeout (30 minutes default)
Dead letter queue for failed jobs
Job deduplication
Scheduled/delayed jobs

Worker Pool

Asynq Workers + Goroutines

Responsibilities:

Dequeue jobs from Redis
Download media from S3
Process media (resize, convert, etc.)
Upload results to S3
Update job status
Clean up temporary files
Send webhook notifications

Concurrency: Configurable (default: 10 concurrent jobs per worker instance)

Processing Flow:

1. Dequeue job from Redis
2. Create temporary directory
3. Download source file from S3
4. Process file (image/video/audio)
5. Upload result to S3
6. Update job status
7. Send webhooks (if configured)
8. Clean up temp files

Worker Types:

Image Processor

Technology: Go image libraries + ImageMagick

Operations:

Resize and crop
Format conversion (JPEG, PNG, WebP, AVIF)
Quality optimization
Watermarking
Color extraction
Metadata manipulation

S3 Storage

AWS S3 / MinIO / R2

Responsibilities:

Store source media files
Store processed output files
Provide presigned URLs for access
Handle large file uploads/downloads

Supported Providers:

AWS S3 - Production-grade cloud storage
MinIO - Self-hosted S3-compatible storage
Cloudflare R2 - Cost-effective alternative
DigitalOcean Spaces - Simple cloud storage
Any S3-compatible service

Configuration:

s3:
  region: us-east-1
  access_key_id: "${AWS_ACCESS_KEY_ID}"
  secret_access_key: "${AWS_SECRET_ACCESS_KEY}"
  # Optional: for non-AWS providers
  endpoint: "https://s3.example.com"
  use_path_style: true

Design Patterns

Async Processing

Benefits:

Non-blocking API responses
Better resource utilization
Horizontal scalability
Fault tolerance

Flow:

Client → API (202 Accepted + Job ID)
         ↓
       Redis Queue
         ↓
       Worker (processes job)
         ↓
       Update status in Redis
         ↓
       Send webhook notification

Job Lifecycle

1. Pending

Job is queued, waiting for a worker

Duration: Seconds to minutes

2. Processing

Worker is actively processing

Duration: Seconds to minutes

3. Completed

Job finished successfully

Result: Output file in S3

4. Failed

Job encountered an error

Retry: Up to 3 attempts
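
The four states above can be modelled as a small state machine. The sketch below assumes that a failed job returns to Pending while attempts remain and stays in Failed once the limit of 3 is reached. The document does not spell out that transition, and the `Job` type and its field names are hypothetical.

```go
package main

import "fmt"

// Status is one of the four lifecycle states.
type Status string

const (
	Pending    Status = "pending"
	Processing Status = "processing"
	Completed  Status = "completed"
	Failed     Status = "failed"
)

const maxAttempts = 3 // matches "Up to 3 attempts"

// Job tracks the current state and how many times processing has started.
type Job struct {
	ID       string
	Status   Status
	Attempts int
}

// Transition moves the job to next if the lifecycle allows it.
func (j *Job) Transition(next Status) error {
	allowed := false
	switch j.Status {
	case Pending:
		allowed = next == Processing
	case Processing:
		allowed = next == Completed || next == Failed
	case Failed:
		// Assumption: a retry re-queues the job while attempts remain.
		allowed = next == Pending && j.Attempts < maxAttempts
	}
	if !allowed {
		return fmt.Errorf("invalid transition %s -> %s (attempts=%d)", j.Status, next, j.Attempts)
	}
	if next == Processing {
		j.Attempts++
	}
	j.Status = next
	return nil
}

func main() {
	job := &Job{ID: "550e8400", Status: Pending}
	for _, next := range []Status{Processing, Failed, Pending, Processing, Completed} {
		if err := job.Transition(next); err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("-> %s (attempts=%d)\n", job.Status, job.Attempts)
	}
}
```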

Error Handling

Retry Strategy:

Attempt 1: Immediate
Attempt 2: After 30 seconds
Attempt 3: After 5 minutes

Failure Scenarios:

Source file not found → No retry
S3 access denied → No retry
Processing error → Retry with backoff
Temporary network issue → Retry
Worker crash → Job requeued automatically

Observability

Structured Logging:

{
  "level": "info",
  "timestamp": "2025-10-18T22:00:00Z",
  "job_id": "550e8400",
  "type": "image_resize",
  "duration_ms": 1234,
  "message": "Job completed successfully"
}

Metrics:

Job throughput (jobs/second)
Processing duration (histograms)
Queue depth (gauge)
Success/failure rates (counters)
Worker utilization (gauge)

Health Checks:

Liveness: Is the service running?
Readiness: Can it accept traffic?
Dependency checks: Redis, S3 connectivity

Scalability

Horizontal Scaling

API Servers

Stateless design lets API instances be added without coordination

Add more instances behind a load balancer
No shared state between instances
Session-less architecture

Workers

Independent workers process jobs in parallel

Scale workers based on queue depth
Each worker handles multiple jobs concurrently
No coordination required between workers
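
Scaling on queue depth needs some rule that maps backlog to a worker count. The function below is one simple possibility, shown only to make the idea concrete. The target of jobs per worker and the minimum and maximum bounds are hypothetical tuning knobs, not values from Mend.

```go
package main

import "fmt"

// desiredWorkers returns how many worker instances to run for a given queue
// depth, aiming for at most jobsPerWorker queued jobs per instance and
// clamping the result to [minWorkers, maxWorkers].
func desiredWorkers(queueDepth, jobsPerWorker, minWorkers, maxWorkers int) int {
	if jobsPerWorker < 1 {
		jobsPerWorker = 1
	}
	n := (queueDepth + jobsPerWorker - 1) / jobsPerWorker // ceiling division
	if n < minWorkers {
		n = minWorkers
	}
	if n > maxWorkers {
		n = maxWorkers
	}
	return n
}

func main() {
	for _, depth := range []int{0, 95, 1000} {
		fmt.Printf("queue depth %4d -> %d workers\n", depth, desiredWorkers(depth, 10, 2, 20))
	}
}
```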

Redis

Replication and clustering for high availability and capacity

Master-replica replication
Sentinel for automatic failover
Redis Cluster for sharding

Performance Characteristics

Component	Throughput	Latency	Bottleneck
API	10k+ req/s	< 10ms	Network
Redis	100k+ ops/s	< 1ms	Memory
Workers	Varies	1-60s	CPU/FFmpeg
S3	Unlimited	50-200ms	Network

Capacity Planning

Small Deployment (< 1000 jobs/day):

1 API instance
2-3 workers
Single Redis instance
Standard S3

Medium Deployment (1k-100k jobs/day):

2-3 API instances
5-10 workers
Redis with replication
S3 with Transfer Acceleration

Large Deployment (> 100k jobs/day):

5+ API instances (auto-scaled)
20+ workers (auto-scaled)
Redis Cluster
S3 + CloudFront CDN

Security

Authentication

API Key

API keys stored as environment variables
HMAC-SHA256 for webhook signatures
Optional: JWT tokens for user sessions
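
Webhook signing with HMAC-SHA256 can be sketched with the standard library as follows. Hex encoding of the signature and the function names `sign` and `verify` are assumptions. The document does not specify the encoding or the header that carries the signature.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns the hex-encoded HMAC-SHA256 of body under secret.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the MAC and compares it in constant time.
func verify(secret, body []byte, signature string) bool {
	got, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(mac.Sum(nil), got)
}

func main() {
	secret := []byte("example-webhook-secret") // demo value only
	body := []byte(`{"job_id":"550e8400","status":"completed"}`)

	sig := sign(secret, body)
	fmt.Println("valid:", verify(secret, body, sig))
	fmt.Println("tampered:", verify(secret, []byte(`{"job_id":"other"}`), sig))
}
```

Receivers should compare signatures with `hmac.Equal` rather than `==` to avoid leaking timing information.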

Authorization

Role-based access control (planned)
Tenant isolation via API keys
S3 bucket policies

Data Protection

TLS/HTTPS for all API traffic
Encrypted S3 buckets (SSE-S3 or SSE-KMS)
Temporary files cleaned up immediately after each job
No persistent storage of media on workers

Technology Stack

Backend

Go 1.21+

High performance
Excellent concurrency
Strong typing
Fast compilation

Web Framework

Gin

Fast HTTP router
Middleware support
JSON validation
Great documentation

Queue

Asynq

Redis-backed
Reliable delivery
Priority queues
Scheduled jobs

Media Processing

FFmpeg

Industry standard
Format support
Hardware acceleration
Battle-tested

Next Steps

Deploy

Follow the Deployment Guide

Monitor

Set up Metrics & Monitoring

Optimize

Learn about performance tuning