# XML Sitemap Generator API
A high-performance Go-based API for generating XML sitemaps with real-time progress tracking via Server-Sent Events (SSE).
## Features
- ✅ **Concurrent Web Crawling** - fast sitemap generation using goroutines
- ✅ **Real-time Progress** - SSE streaming for live updates
- ✅ **Multi-user Support** - handles multiple simultaneous crawls
- ✅ **Client Metadata Tracking** - IP, browser, OS, and session data stored in SQLite
- ✅ **Clean REST API** - simple endpoints for generate, stream, and download
- ✅ **Professional UI** - built-in web interface
## Architecture

```
sitemap-api/
├── main.go              # Entry point & HTTP server
├── handlers/
│   └── handler.go       # HTTP handlers & SSE streaming
├── crawler/
│   └── crawler.go       # Concurrent web crawler
├── database/
│   └── db.go            # SQLite operations
├── models/
│   └── site.go          # Data structures
└── static/
    └── index.html       # Frontend UI
```
## API Endpoints

### POST /generate-sitemap-xml

Starts sitemap generation (the backend generates the UUID).

Request:

```json
{
  "url": "https://example.com",
  "max_depth": 3
}
```

Response:

```json
{
  "uuid": "550e8400-e29b-41d4-a716-446655440000",
  "site_id": 123,
  "status": "processing",
  "stream_url": "/stream/550e8400-...",
  "message": "Sitemap generation started"
}
```
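The flow behind this endpoint can be sketched as a small handler. This is illustrative only: `handleGenerate`, `newUUID`, and the struct fields are assumptions, and the real handler would also persist client metadata, start the crawler goroutine, and return `site_id`.

```go
package main

import (
	"crypto/rand"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// generateRequest / generateResponse mirror the JSON shown above
// (site_id omitted here since this sketch has no database).
type generateRequest struct {
	URL      string `json:"url"`
	MaxDepth int    `json:"max_depth"`
}

type generateResponse struct {
	UUID      string `json:"uuid"`
	Status    string `json:"status"`
	StreamURL string `json:"stream_url"`
	Message   string `json:"message"`
}

// newUUID returns a random RFC 4122 version-4 UUID.
func newUUID() string {
	b := make([]byte, 16)
	rand.Read(b)
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // variant 10
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
}

// handleGenerate decodes the request, mints a UUID, and replies immediately;
// the crawl itself would run asynchronously in a goroutine.
func handleGenerate(w http.ResponseWriter, r *http.Request) {
	var req generateRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil || req.URL == "" {
		http.Error(w, "invalid request", http.StatusBadRequest)
		return
	}
	id := newUUID()
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(generateResponse{
		UUID:      id,
		Status:    "processing",
		StreamURL: "/stream/" + id,
		Message:   "Sitemap generation started",
	})
}

func main() {
	srv := httptest.NewServer(http.HandlerFunc(handleGenerate))
	defer srv.Close()
	resp, _ := http.Post(srv.URL, "application/json",
		strings.NewReader(`{"url":"https://example.com","max_depth":3}`))
	var out generateResponse
	json.NewDecoder(resp.Body).Decode(&out)
	fmt.Println(out.Status, out.StreamURL != "")
}
```

Responding before the crawl finishes is what lets the client immediately attach to the SSE stream for progress.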
### GET /stream/{uuid}

Server-Sent Events stream for real-time progress.

Events: `connected`, `started`, `progress`, `complete`, `error`

### GET /download/{uuid}

Downloads the generated sitemap XML.

### GET /sites

Lists all generated sitemaps.

### GET /sites/{id}

Gets details for a specific site.

### DELETE /sites/{id}

Deletes a sitemap.

### GET /health

Health check endpoint.
## Installation

### Prerequisites

- Go 1.21+
- SQLite3

### Setup

```sh
# Clone/navigate to the directory
cd sitemap-api

# Install dependencies
go mod download

# Build
go build -o sitemap-api

# Run
./sitemap-api
```

The server starts on http://localhost:8080.

Or run directly:

```sh
go run main.go
```
## Usage

1. Open http://localhost:8080 in your browser
2. Enter a website URL
3. Set the crawl depth (1-5)
4. Click "Generate Sitemap"
5. Watch the real-time progress
6. Download the XML when complete
## Database Schema

The SQLite database (`sitemap.db`) stores three tables:

- `sites` - crawl sessions with client metadata
- `pages` - discovered URLs with priority/frequency
- `sessions` - user session tracking
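As a rough orientation, the DDL might look like the sketch below. The column names are illustrative assumptions, not copied from `database/db.go`; consult that file for the authoritative schema.

```sql
CREATE TABLE sites (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    uuid       TEXT UNIQUE NOT NULL,
    url        TEXT NOT NULL,
    max_depth  INTEGER,
    status     TEXT,
    ip_address TEXT,
    user_agent TEXT,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE pages (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    site_id     INTEGER REFERENCES sites(id),
    url         TEXT NOT NULL,
    priority    REAL,
    change_freq TEXT
);

CREATE TABLE sessions (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT UNIQUE NOT NULL,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
```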
## Environment Variables

- `PORT` - server port (default: 8080)

Example:

```sh
PORT=3000 ./sitemap-api
```
## How It Works

1. The frontend sends a POST to `/generate-sitemap-xml`
2. The backend generates a UUID, saves the metadata, and returns the UUID
3. The frontend connects to `/stream/{uuid}` for SSE updates
4. The crawler runs in a goroutine and sends events via a channel
5. The handler streams events to the frontend in real time
6. On completion, the sitemap is available at `/download/{uuid}`
## Multi-User Concurrency

The StreamManager handles concurrent users:

- Each UUID maps to a Go channel
- A mutex-guarded map provides thread safety
- Channels are cleaned up automatically after crawl completion
- No hard cap on simultaneous crawls (bounded only by server resources)
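The pattern described above can be sketched as follows; the method names (`Register`, `Get`, `Cleanup`) are illustrative, not necessarily those in `handlers/handler.go`:

```go
package main

import (
	"fmt"
	"sync"
)

// StreamManager maps crawl UUIDs to event channels so each SSE client
// reads only its own crawl's events. A mutex guards the map because
// HTTP handlers and crawler goroutines access it concurrently.
type StreamManager struct {
	mu      sync.Mutex
	streams map[string]chan string
}

func NewStreamManager() *StreamManager {
	return &StreamManager{streams: make(map[string]chan string)}
}

// Register creates a buffered event channel for a new crawl.
func (m *StreamManager) Register(uuid string) chan string {
	m.mu.Lock()
	defer m.mu.Unlock()
	ch := make(chan string, 64)
	m.streams[uuid] = ch
	return ch
}

// Get returns the channel for a UUID, if the crawl is still live.
func (m *StreamManager) Get(uuid string) (chan string, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	ch, ok := m.streams[uuid]
	return ch, ok
}

// Cleanup closes and forgets the channel once the crawl finishes,
// which also ends the SSE handler ranging over it.
func (m *StreamManager) Cleanup(uuid string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if ch, ok := m.streams[uuid]; ok {
		close(ch)
		delete(m.streams, uuid)
	}
}

func main() {
	m := NewStreamManager()
	ch := m.Register("abc")
	ch <- "started"
	fmt.Println(<-ch) // started
	m.Cleanup("abc")
	_, ok := m.Get("abc")
	fmt.Println(ok) // false
}
```

Buffering the channel lets the crawler keep working even if the SSE client is briefly slow to read.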
## Client Metadata Captured
- IP Address (with X-Forwarded-For support)
- User-Agent
- Browser name & version
- Operating System
- Device Type (Desktop/Mobile/Tablet)
- Session ID (cookie-based)
- All cookies (JSON)
- Referrer
## Performance
- Concurrent crawling with goroutines
- Configurable concurrency limit (default: 5 parallel requests)
- Depth-limited to prevent infinite crawls
- Same-domain restriction
- Duplicate URL prevention
- 10-second HTTP timeout per request
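The semaphore limit, same-domain restriction, and duplicate prevention combine into a pattern like the sketch below; `crawlBatch` and its structure are illustrative assumptions, not the actual `crawler/crawler.go` code.

```go
package main

import (
	"fmt"
	"net/url"
	"sync"
)

// crawlBatch visits each URL at most once, keeps to the start domain, and
// caps in-flight fetches with a semaphore channel (5 slots, the default).
func crawlBatch(startHost string, urls []string, fetch func(string)) int {
	sem := make(chan struct{}, 5) // concurrency limit
	visited := make(map[string]bool)
	var mu sync.Mutex
	var wg sync.WaitGroup
	crawled := 0

	for _, raw := range urls {
		u, err := url.Parse(raw)
		if err != nil || u.Host != startHost {
			continue // same-domain restriction
		}
		mu.Lock()
		if visited[raw] {
			mu.Unlock()
			continue // duplicate URL prevention
		}
		visited[raw] = true
		crawled++
		mu.Unlock()

		wg.Add(1)
		go func(link string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it
			fetch(link)
		}(raw)
	}
	wg.Wait()
	return crawled
}

func main() {
	urls := []string{
		"https://example.com/a",
		"https://example.com/a", // duplicate, skipped
		"https://example.com/b",
		"https://other.com/x", // off-domain, skipped
	}
	n := crawlBatch("example.com", urls, func(string) {})
	fmt.Println(n) // 2
}
```

The buffered channel as a semaphore is the idiomatic Go way to bound goroutine fan-out without a worker pool.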
## Customization

### Adjust Concurrency

Edit `crawler/crawler.go`:

```go
semaphore := make(chan struct{}, 10) // increase to 10 concurrent requests
```

### Change Priority Calculation

Modify `calculatePriority()` in `crawler/crawler.go`.

### Add Custom Metadata

Extend the `models.Site` struct and the database schema.
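As a starting point for customizing the priority calculation, here is one plausible depth-based scheme (an assumption for illustration, not necessarily what `crawler.go` implements): the root page gets 1.0 and each level deeper loses 0.2, floored at 0.1.

```go
package main

import "fmt"

// calculatePriority assigns sitemap priority by crawl depth:
// 1.0 at the root, minus 0.2 per level, never below 0.1.
func calculatePriority(depth int) float64 {
	p := 1.0 - 0.2*float64(depth)
	if p < 0.1 {
		return 0.1
	}
	return p
}

func main() {
	for d := 0; d <= 5; d++ {
		fmt.Printf("depth %d -> priority %.1f\n", d, calculatePriority(d))
	}
}
```

Any replacement should stay within the sitemap protocol's 0.0-1.0 priority range.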
## Production Deployment

Recommendations:

- Use a reverse proxy (nginx/Caddy)
- Enable HTTPS
- Add rate limiting
- Configure CORS properly
- Use PostgreSQL instead of SQLite in production
- Add authentication
- Implement cleanup jobs for old sitemaps
Example nginx config:

```nginx
location / {
    proxy_pass http://localhost:8080;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection 'upgrade';
    proxy_set_header Host $host;
    proxy_cache_bypass $http_upgrade;

    # SSE support
    proxy_buffering off;
    proxy_cache off;
}
```
## License
MIT
## Support
For issues or questions, please open a GitHub issue.