# XML Sitemap Generator API

A high-performance Go-based API for generating XML sitemaps with real-time progress tracking via Server-Sent Events (SSE).

## Features

- ✅ **Concurrent Web Crawling** - Fast sitemap generation using goroutines
- ✅ **Real-time Progress** - SSE streaming for live updates
- ✅ **Multi-user Support** - Handle multiple simultaneous crawls
- ✅ **Client Metadata Tracking** - IP, browser, OS, and session data stored in SQLite
- ✅ **Clean REST API** - Simple endpoints for generate, stream, and download
- ✅ **Professional UI** - Beautiful web interface included

## Architecture

```
sitemap-api/
├── main.go              # Entry point & HTTP server
├── handlers/
│   └── handler.go       # HTTP handlers & SSE streaming
├── crawler/
│   └── crawler.go       # Concurrent web crawler
├── database/
│   └── db.go            # SQLite operations
├── models/
│   └── site.go          # Data structures
└── static/
    └── index.html       # Frontend UI
```

## API Endpoints

### `POST /generate-sitemap-xml`

Start sitemap generation (the backend generates the UUID).

**Request:**

```json
{
  "url": "https://example.com",
  "max_depth": 3
}
```

**Response:**

```json
{
  "uuid": "550e8400-e29b-41d4-a716-446655440000",
  "site_id": 123,
  "status": "processing",
  "stream_url": "/stream/550e8400-...",
  "message": "Sitemap generation started"
}
```

### `GET /stream/{uuid}`

Server-Sent Events stream for real-time progress.

**Events:** `connected`, `started`, `progress`, `complete`, `error`

### `GET /download/{uuid}`

Download the generated sitemap XML.

### `GET /sites`

List all generated sitemaps.

### `GET /sites/{id}`

Get details for a specific site.

### `DELETE /sites/{id}`

Delete a sitemap.

### `GET /health`

Health check endpoint.

## Installation

### Prerequisites

- Go 1.21+
- SQLite3

### Setup

```bash
# Clone/navigate to directory
cd sitemap-api

# Install dependencies
go mod download

# Build
go build -o sitemap-api

# Run
./sitemap-api
```

The server starts on **http://localhost:8080**.

### Or run directly:

```bash
go run main.go
```

## Usage

1. Open http://localhost:8080 in your browser
2. Enter a website URL
3. Set the crawl depth (1-5)
4. Click "Generate Sitemap"
5. Watch real-time progress
6. Download the XML when complete

## Database Schema

The SQLite database (`sitemap.db`) stores:

- **sites** - Crawl sessions with client metadata
- **pages** - Discovered URLs with priority/frequency
- **sessions** - User session tracking

## Environment Variables

- `PORT` - Server port (default: 8080)

Example:

```bash
PORT=3000 ./sitemap-api
```

## How It Works

1. **Frontend** sends a POST to `/generate-sitemap-xml`
2. **Backend** generates a UUID, saves metadata, and returns the UUID
3. **Frontend** connects to `/stream/{uuid}` for SSE updates
4. **Crawler** runs in a goroutine and sends events via a channel
5. **Handler** streams events to the frontend in real time
6. **On completion**, the sitemap is available at `/download/{uuid}`

## Multi-User Concurrency

The `StreamManager` handles concurrent users:

- Each UUID maps to a Go channel
- A concurrent map guarded by a mutex provides thread safety
- Streams are cleaned up automatically after crawl completion
- No hard-coded limit on simultaneous crawls (bounded only by system resources)

## Client Metadata Captured

- IP address (with X-Forwarded-For support)
- User-Agent
- Browser name & version
- Operating system
- Device type (Desktop/Mobile/Tablet)
- Session ID (cookie-based)
- All cookies (JSON)
- Referrer

## Performance

- Concurrent crawling with goroutines
- Configurable concurrency limit (default: 5 parallel requests)
- Depth-limited crawls to prevent infinite traversal
- Same-domain restriction
- Duplicate URL prevention
- 10-second HTTP timeout per request

## Customization

### Adjust Concurrency

Edit `crawler/crawler.go`:

```go
semaphore := make(chan struct{}, 10) // increase to 10 concurrent requests
```

### Change Priority Calculation

Modify `calculatePriority()` in `crawler/crawler.go`.

### Add Custom Metadata

Extend the `models.Site` struct and the database schema.

## Production Deployment

### Recommendations:

1. Use a reverse proxy (nginx/Caddy)
2. Enable HTTPS
3. Add rate limiting
4. Configure CORS properly
5. Use PostgreSQL instead of SQLite in production
6. Add authentication
7. Implement cleanup jobs for old sitemaps

### Example nginx config:

```nginx
location / {
    proxy_pass http://localhost:8080;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection 'upgrade';
    proxy_set_header Host $host;
    proxy_cache_bypass $http_upgrade;

    # SSE support
    proxy_buffering off;
    proxy_cache off;
}
```

## License

MIT

## Support

For issues or questions, please open a GitHub issue.