# XML Sitemap Generator API
A high-performance Go-based API for generating XML sitemaps with real-time progress tracking via Server-Sent Events (SSE).
## Features
- ✅ **Concurrent Web Crawling** - fast sitemap generation using goroutines
- ✅ **Real-time Progress** - SSE streaming for live updates
- ✅ **Multi-user Support** - handles multiple simultaneous crawls
- ✅ **Client Metadata Tracking** - IP, browser, OS, and session data stored in SQLite
- ✅ **Clean REST API** - simple endpoints for generate, stream, and download
- ✅ **Professional UI** - built-in web interface
## Architecture

```
sitemap-api/
├── main.go              # Entry point & HTTP server
├── handlers/
│   └── handler.go       # HTTP handlers & SSE streaming
├── crawler/
│   └── crawler.go       # Concurrent web crawler
├── database/
│   └── db.go            # SQLite operations
├── models/
│   └── site.go          # Data structures
└── static/
    └── index.html       # Frontend UI
```
## API Endpoints

### POST /generate-sitemap-xml

Starts sitemap generation (the backend generates the UUID).

Request:

```json
{
  "url": "https://example.com",
  "max_depth": 3
}
```

Response:

```json
{
  "uuid": "550e8400-e29b-41d4-a716-446655440000",
  "site_id": 123,
  "status": "processing",
  "stream_url": "/stream/550e8400-...",
  "message": "Sitemap generation started"
}
```
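The flow behind this endpoint can be sketched as a small handler. This is illustrative only: `handleGenerate`, `newUUID`, and the struct fields are assumptions, and the real handler would also persist client metadata, start the crawler goroutine, and return `site_id`.

```go
package main

import (
	"crypto/rand"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// generateRequest / generateResponse mirror the JSON shown above
// (site_id omitted here since this sketch has no database).
type generateRequest struct {
	URL      string `json:"url"`
	MaxDepth int    `json:"max_depth"`
}

type generateResponse struct {
	UUID      string `json:"uuid"`
	Status    string `json:"status"`
	StreamURL string `json:"stream_url"`
	Message   string `json:"message"`
}

// newUUID returns a random RFC 4122 version-4 UUID.
func newUUID() string {
	b := make([]byte, 16)
	rand.Read(b)
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // variant 10
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
}

// handleGenerate decodes the request, mints a UUID, and replies immediately;
// the crawl itself would run asynchronously in a goroutine.
func handleGenerate(w http.ResponseWriter, r *http.Request) {
	var req generateRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil || req.URL == "" {
		http.Error(w, "invalid request", http.StatusBadRequest)
		return
	}
	id := newUUID()
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(generateResponse{
		UUID:      id,
		Status:    "processing",
		StreamURL: "/stream/" + id,
		Message:   "Sitemap generation started",
	})
}

func main() {
	srv := httptest.NewServer(http.HandlerFunc(handleGenerate))
	defer srv.Close()
	resp, _ := http.Post(srv.URL, "application/json",
		strings.NewReader(`{"url":"https://example.com","max_depth":3}`))
	var out generateResponse
	json.NewDecoder(resp.Body).Decode(&out)
	fmt.Println(out.Status, out.StreamURL != "")
}
```

Responding before the crawl finishes is what lets the client immediately attach to the SSE stream for progress.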
### GET /stream/{uuid}

Server-Sent Events stream for real-time progress.

Events: `connected`, `started`, `progress`, `complete`, `error`

### GET /download/{uuid}

Downloads the generated sitemap XML.

### GET /sites

Lists all generated sitemaps.

### GET /sites/{id}

Gets details for a specific site.

### DELETE /sites/{id}

Deletes a sitemap.

### GET /health

Health check endpoint.
## Installation

### Prerequisites

- Go 1.21+
- SQLite3

### Setup

```sh
# Clone/navigate to the directory
cd sitemap-api

# Install dependencies
go mod download

# Build
go build -o sitemap-api

# Run
./sitemap-api
```

The server starts on http://localhost:8080.

Or run directly:

```sh
go run main.go
```
## Usage

1. Open http://localhost:8080 in your browser
2. Enter a website URL
3. Set the crawl depth (1-5)
4. Click "Generate Sitemap"
5. Watch the real-time progress
6. Download the XML when complete
## Database Schema

The SQLite database (`sitemap.db`) stores three tables:

- `sites` - crawl sessions with client metadata
- `pages` - discovered URLs with priority/frequency
- `sessions` - user session tracking
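As a rough orientation, the DDL might look like the sketch below. The column names are illustrative assumptions, not copied from `database/db.go`; consult that file for the authoritative schema.

```sql
CREATE TABLE sites (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    uuid       TEXT UNIQUE NOT NULL,
    url        TEXT NOT NULL,
    max_depth  INTEGER,
    status     TEXT,
    ip_address TEXT,
    user_agent TEXT,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE pages (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    site_id     INTEGER REFERENCES sites(id),
    url         TEXT NOT NULL,
    priority    REAL,
    change_freq TEXT
);

CREATE TABLE sessions (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT UNIQUE NOT NULL,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
```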
## Environment Variables

- `PORT` - server port (default: 8080)

Example:

```sh
PORT=3000 ./sitemap-api
```
## How It Works

1. The frontend sends a POST to `/generate-sitemap-xml`
2. The backend generates a UUID, saves the metadata, and returns the UUID
3. The frontend connects to `/stream/{uuid}` for SSE updates
4. The crawler runs in a goroutine and sends events via a channel
5. The handler streams events to the frontend in real time
6. On completion, the sitemap is available at `/download/{uuid}`
## Multi-User Concurrency

The StreamManager handles concurrent users:

- Each UUID maps to a Go channel
- A mutex-guarded map provides thread safety
- Channels are cleaned up automatically after crawl completion
- No hard cap on simultaneous crawls (bounded only by server resources)
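The pattern described above can be sketched as follows; the method names (`Register`, `Get`, `Cleanup`) are illustrative, not necessarily those in `handlers/handler.go`:

```go
package main

import (
	"fmt"
	"sync"
)

// StreamManager maps crawl UUIDs to event channels so each SSE client
// reads only its own crawl's events. A mutex guards the map because
// HTTP handlers and crawler goroutines access it concurrently.
type StreamManager struct {
	mu      sync.Mutex
	streams map[string]chan string
}

func NewStreamManager() *StreamManager {
	return &StreamManager{streams: make(map[string]chan string)}
}

// Register creates a buffered event channel for a new crawl.
func (m *StreamManager) Register(uuid string) chan string {
	m.mu.Lock()
	defer m.mu.Unlock()
	ch := make(chan string, 64)
	m.streams[uuid] = ch
	return ch
}

// Get returns the channel for a UUID, if the crawl is still live.
func (m *StreamManager) Get(uuid string) (chan string, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	ch, ok := m.streams[uuid]
	return ch, ok
}

// Cleanup closes and forgets the channel once the crawl finishes,
// which also ends the SSE handler ranging over it.
func (m *StreamManager) Cleanup(uuid string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if ch, ok := m.streams[uuid]; ok {
		close(ch)
		delete(m.streams, uuid)
	}
}

func main() {
	m := NewStreamManager()
	ch := m.Register("abc")
	ch <- "started"
	fmt.Println(<-ch) // started
	m.Cleanup("abc")
	_, ok := m.Get("abc")
	fmt.Println(ok) // false
}
```

Buffering the channel lets the crawler keep working even if the SSE client is briefly slow to read.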
## Client Metadata Captured
- IP Address (with X-Forwarded-For support)
- User-Agent
- Browser name & version
- Operating System
- Device Type (Desktop/Mobile/Tablet)
- Session ID (cookie-based)
- All cookies (JSON)
- Referrer
## Performance
- Concurrent crawling with goroutines
- Configurable concurrency limit (default: 5 parallel requests)
- Depth-limited to prevent infinite crawls
- Same-domain restriction
- Duplicate URL prevention
- 10-second HTTP timeout per request
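The semaphore limit, same-domain restriction, and duplicate prevention combine into a pattern like the sketch below; `crawlBatch` and its structure are illustrative assumptions, not the actual `crawler/crawler.go` code.

```go
package main

import (
	"fmt"
	"net/url"
	"sync"
)

// crawlBatch visits each URL at most once, keeps to the start domain, and
// caps in-flight fetches with a semaphore channel (5 slots, the default).
func crawlBatch(startHost string, urls []string, fetch func(string)) int {
	sem := make(chan struct{}, 5) // concurrency limit
	visited := make(map[string]bool)
	var mu sync.Mutex
	var wg sync.WaitGroup
	crawled := 0

	for _, raw := range urls {
		u, err := url.Parse(raw)
		if err != nil || u.Host != startHost {
			continue // same-domain restriction
		}
		mu.Lock()
		if visited[raw] {
			mu.Unlock()
			continue // duplicate URL prevention
		}
		visited[raw] = true
		crawled++
		mu.Unlock()

		wg.Add(1)
		go func(link string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it
			fetch(link)
		}(raw)
	}
	wg.Wait()
	return crawled
}

func main() {
	urls := []string{
		"https://example.com/a",
		"https://example.com/a", // duplicate, skipped
		"https://example.com/b",
		"https://other.com/x", // off-domain, skipped
	}
	n := crawlBatch("example.com", urls, func(string) {})
	fmt.Println(n) // 2
}
```

The buffered channel as a semaphore is the idiomatic Go way to bound goroutine fan-out without a worker pool.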
## Customization

### Adjust Concurrency

Edit `crawler/crawler.go`:

```go
semaphore := make(chan struct{}, 10) // increase to 10 concurrent requests
```

### Change Priority Calculation

Modify `calculatePriority()` in `crawler/crawler.go`.

### Add Custom Metadata

Extend the `models.Site` struct and the database schema.
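As a starting point for customizing the priority calculation, here is one plausible depth-based scheme (an assumption for illustration, not necessarily what `crawler.go` implements): the root page gets 1.0 and each level deeper loses 0.2, floored at 0.1.

```go
package main

import "fmt"

// calculatePriority assigns sitemap priority by crawl depth:
// 1.0 at the root, minus 0.2 per level, never below 0.1.
func calculatePriority(depth int) float64 {
	p := 1.0 - 0.2*float64(depth)
	if p < 0.1 {
		return 0.1
	}
	return p
}

func main() {
	for d := 0; d <= 5; d++ {
		fmt.Printf("depth %d -> priority %.1f\n", d, calculatePriority(d))
	}
}
```

Any replacement should stay within the sitemap protocol's 0.0-1.0 priority range.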
## Production Deployment

Recommendations:

- Use a reverse proxy (nginx/Caddy)
- Enable HTTPS
- Add rate limiting
- Configure CORS properly
- Use PostgreSQL instead of SQLite in production
- Add authentication
- Implement cleanup jobs for old sitemaps
Example nginx config:

```nginx
location / {
    proxy_pass http://localhost:8080;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection 'upgrade';
    proxy_set_header Host $host;
    proxy_cache_bypass $http_upgrade;

    # SSE support
    proxy_buffering off;
    proxy_cache off;
}
```
## License
MIT
## Support
For issues or questions, please open a GitHub issue.