XML Sitemap Generator API

A high-performance Go-based API for generating XML sitemaps with real-time progress tracking via Server-Sent Events (SSE).

Features

  • Concurrent Web Crawling - Fast sitemap generation using goroutines
  • Real-time Progress - SSE streaming for live updates
  • Multi-user Support - Handle multiple simultaneous crawls
  • Client Metadata Tracking - IP, browser, OS, session data stored in SQLite
  • Clean REST API - Simple endpoints for generate, stream, and download
  • Web UI - Built-in browser frontend for starting crawls and downloading results

Architecture

sitemap-api/
├── main.go              # Entry point & HTTP server
├── handlers/
│   └── handler.go       # HTTP handlers & SSE streaming
├── crawler/
│   └── crawler.go       # Concurrent web crawler
├── database/
│   └── db.go            # SQLite operations
├── models/
│   └── site.go          # Data structures
└── static/
    └── index.html       # Frontend UI

API Endpoints

POST /generate-sitemap-xml

Start sitemap generation (backend generates UUID)

Request:

{
  "url": "https://example.com",
  "max_depth": 3
}

Response:

{
  "uuid": "550e8400-e29b-41d4-a716-446655440000",
  "site_id": 123,
  "status": "processing",
  "stream_url": "/stream/550e8400-...",
  "message": "Sitemap generation started"
}

GET /stream/{uuid}

Server-Sent Events stream for real-time progress

Events: connected, started, progress, complete, error

GET /download/{uuid}

Download generated sitemap XML

GET /sites

List all generated sitemaps

GET /sites/{id}

Get specific site details

DELETE /sites/{id}

Delete a sitemap

GET /health

Health check endpoint

Installation

Prerequisites

  • Go 1.21+
  • SQLite3

Setup

# Clone/navigate to directory
cd sitemap-api

# Install dependencies
go mod download

# Build
go build -o sitemap-api

# Run
./sitemap-api

Server starts on http://localhost:8080

Or run directly:

go run main.go

Usage

  1. Open http://localhost:8080 in your browser
  2. Enter a website URL
  3. Set crawl depth (1-5)
  4. Click "Generate Sitemap"
  5. Watch real-time progress
  6. Download XML when complete

Database Schema

SQLite database (sitemap.db) stores:

  • sites - Crawl sessions with client metadata
  • pages - Discovered URLs with priority/frequency
  • sessions - User session tracking

Environment Variables

  • PORT - Server port (default: 8080)

Example:

PORT=3000 ./sitemap-api

How It Works

  1. Frontend sends POST to /generate-sitemap-xml
  2. Backend generates UUID, saves metadata, returns UUID
  3. Frontend connects to /stream/{uuid} for SSE updates
  4. Crawler runs in goroutine, sends events via channel
  5. Handler streams events to frontend in real-time
  6. On completion, sitemap available at /download/{uuid}

Multi-User Concurrency

The StreamManager handles concurrent users:

  • Each UUID maps to a Go channel
  • Concurrent map with mutex for thread safety
  • Automatic cleanup after crawl completion
  • No fixed cap on simultaneous crawls (bounded only by server resources)

Client Metadata Captured

  • IP Address (with X-Forwarded-For support)
  • User-Agent
  • Browser name & version
  • Operating System
  • Device Type (Desktop/Mobile/Tablet)
  • Session ID (cookie-based)
  • All cookies (JSON)
  • Referrer

Performance

  • Concurrent crawling with goroutines
  • Configurable concurrency limit (default: 5 parallel requests)
  • Depth-limited to prevent infinite crawls
  • Same-domain restriction
  • Duplicate URL prevention
  • 10-second HTTP timeout per request

Customization

Adjust Concurrency

Edit crawler/crawler.go:

semaphore := make(chan struct{}, 10) // Increase to 10 concurrent

Change Priority Calculation

Modify calculatePriority() in crawler/crawler.go

Add Custom Metadata

Extend models.Site struct and database schema

Production Deployment

Recommendations:

  1. Use reverse proxy (nginx/caddy)
  2. Enable HTTPS
  3. Add rate limiting
  4. Configure CORS properly
  5. Use PostgreSQL for production (replace SQLite)
  6. Add authentication
  7. Implement cleanup jobs for old sitemaps

Example nginx config:

location / {
    proxy_pass http://localhost:8080;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection 'upgrade';
    proxy_set_header Host $host;
    proxy_cache_bypass $http_upgrade;
    
    # SSE support
    proxy_buffering off;
    proxy_cache off;
}

License

MIT

Support

For issues or questions, please open a GitHub issue.