Kar c0c3c7405d Update public/app.js 2025-06-05 12:39:02 +00:00
.claude init 2025-06-05 14:31:34 +05:30
public Update public/app.js 2025-06-05 12:39:02 +00:00
.dockerignore init 2025-06-05 14:31:34 +05:30
.gitignore init 2025-06-05 14:31:34 +05:30
.yarnrc init 2025-06-05 14:31:34 +05:30
CLAUDE.md init 2025-06-05 14:31:34 +05:30
Dockerfile Update Dockerfile 2025-06-05 09:13:28 +00:00
README.md init 2025-06-05 14:31:34 +05:30
client-example.js init 2025-06-05 14:31:34 +05:30
docker-compose.yml 5080 2025-06-05 09:14:02 +00:00
package.json init 2025-06-05 14:31:34 +05:30
requirements.md init 2025-06-05 14:31:34 +05:30
requirements.txt init 2025-06-05 14:31:34 +05:30
server.js Update server.js 2025-06-05 09:12:29 +00:00
speech_processor.py init 2025-06-05 14:31:34 +05:30

README.md

Speech-to-Text POC

A speech-to-text proof of concept that processes audio locally using Vosk without requiring cloud APIs. The system exposes a WebSocket API that any client can connect to for real-time speech recognition.

Features

  • Local Processing: Uses Vosk for offline speech recognition
  • WebSocket API: Server exposes ws://localhost:3000 for any client to connect
  • Web Interface: Browser-based demo for testing
  • Docker Support: Complete containerized solution
  • No Cloud Dependencies: Everything runs locally

Quick Start

  1. Download Vosk model:

    curl -L -o vosk-model-small-en-us-0.15.zip https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
    unzip vosk-model-small-en-us-0.15.zip
    mv vosk-model-small-en-us-0.15 vosk-model
    
  2. Start with Docker:

    docker-compose up --build
    
  3. Test the web interface:

    • Open http://localhost:3000 in your browser
    • Click "Start Recording" and speak
    • Watch transcriptions appear in real time

WebSocket API Usage

The server exposes a WebSocket endpoint at ws://localhost:3000 that accepts:

  • Input: Raw WAV audio data (16kHz, 16-bit, mono)
  • Output: JSON messages with transcriptions
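
Before sending a file, a client may want to confirm that it matches the expected format. The following is a minimal sketch, not part of this repository: describeWav and isExpectedFormat are hypothetical helper names, and the code assumes a canonical PCM WAV header in which the "fmt " chunk directly follows the 12-byte RIFF header.

```javascript
// Hypothetical helper: reads the format fields of a canonical PCM WAV header.
// Assumes the "fmt " chunk directly follows the 12-byte RIFF header, which
// holds for simple WAV files but not for every file in the wild.
function describeWav(buf) {
    if (buf.length < 44) return null;
    if (buf.toString('ascii', 0, 4) !== 'RIFF') return null;
    if (buf.toString('ascii', 8, 12) !== 'WAVE') return null;
    if (buf.toString('ascii', 12, 16) !== 'fmt ') return null;
    return {
        audioFormat: buf.readUInt16LE(20),    // 1 = uncompressed PCM
        channels: buf.readUInt16LE(22),
        sampleRate: buf.readUInt32LE(24),
        bitsPerSample: buf.readUInt16LE(34),
    };
}

// True only for 16 kHz, 16-bit, mono PCM, the format listed above.
function isExpectedFormat(buf) {
    const info = describeWav(buf);
    return info !== null &&
        info.audioFormat === 1 &&
        info.channels === 1 &&
        info.sampleRate === 16000 &&
        info.bitsPerSample === 16;
}
```

A client could call isExpectedFormat(fs.readFileSync('audio.wav')) and skip or convert the file when it returns false.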

Example Client Usage

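const WebSocket = require('ws');
const fs = require('fs');

const ws = new WebSocket('ws://localhost:3000');

ws.on('open', () => {
    // Send the whole WAV file (16 kHz, 16-bit, mono) as one binary message
    const audioData = fs.readFileSync('audio.wav');
    ws.send(audioData);
});

ws.on('message', (data) => {
    // data may be a Buffer or a string depending on the ws version;
    // toString() handles both before parsing
    const message = JSON.parse(data.toString());
    if (message.type === 'transcription') {
        console.log('Text:', message.text);
    }
});

// Report connection failures, e.g. when the server is not running
ws.on('error', (err) => {
    console.error('WebSocket error:', err.message);
});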

See client-example.js for a complete Node.js client implementation.

Local Development Setup

Prerequisites

  • Node.js 14+
  • Python 3.8+
  • Vosk model (downloaded as above)

Installation

  1. Install Node.js dependencies:

    yarn install
    
  2. Install Python dependencies:

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install -r requirements.txt
    
  3. Start the server:

    yarn start
    

Architecture

  • Backend: Node.js Express server with WebSocket support
  • Speech Processing: Python subprocess using Vosk library
  • Frontend: HTML5 + JavaScript with AudioWorklet for microphone capture
  • Communication: WebSocket for bidirectional real-time communication
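
This README does not document how the Node.js server and the Python subprocess exchange data. One common pattern for such a bridge is newline-delimited JSON on the child's stdout. The sketch below illustrates that pattern only; the framing and the createLineParser name are assumptions, not a description of server.js or speech_processor.py.

```javascript
const { StringDecoder } = require('string_decoder');

// Hypothetical: collects stdout chunks from a child process and calls
// onMessage once per complete line of JSON. The newline-delimited JSON
// framing is an assumption, not something this README specifies.
function createLineParser(onMessage) {
    const decoder = new StringDecoder('utf8'); // safe across split multibyte chars
    let pending = '';
    return function feed(chunk) {
        pending += decoder.write(chunk);
        const lines = pending.split('\n');
        pending = lines.pop(); // keep the trailing partial line for the next chunk
        for (const line of lines) {
            if (line.trim() === '') continue;
            try {
                onMessage(JSON.parse(line));
            } catch (err) {
                // Skip lines that are not valid JSON (e.g. plain log output).
            }
        }
    };
}
```

With a child process created by child_process.spawn, the parser could be attached as child.stdout.on('data', createLineParser(handler)), where handler forwards each parsed message to the connected WebSocket client.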

Supported Audio Formats

  • Input: WAV files (16kHz, 16-bit, mono preferred)
  • Browser: Automatic conversion from microphone input
  • API: Raw audio buffers or WAV format
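
Web Audio delivers microphone samples as 32-bit floats in the range [-1, 1], so some conversion to 16-bit PCM has to happen before the audio reaches Vosk. The sketch below shows one common way to do that step; floatTo16BitPCM is a hypothetical name and this is not necessarily how public/app.js does it. Resampling to 16 kHz is a separate step and is not shown.

```javascript
// Hypothetical helper: converts Float32 samples in [-1, 1] to 16-bit signed
// PCM. Out-of-range values are clamped rather than allowed to wrap around.
function floatTo16BitPCM(samples) {
    const out = new Int16Array(samples.length);
    for (let i = 0; i < samples.length; i++) {
        const s = Math.max(-1, Math.min(1, samples[i]));
        // Negative values scale by 32768 and positive values by 32767 so the
        // full Int16 range is used without overflow.
        out[i] = s < 0 ? Math.round(s * 32768) : Math.round(s * 32767);
    }
    return out;
}
```

The resulting Int16Array can be sent over the WebSocket through its underlying buffer, for example ws.send(pcm.buffer).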

Performance Notes

  • Model Size: Small model (~39MB) for fast loading
  • Latency: Near real-time; larger audio chunks add more delay before a result is returned
  • Accuracy: Good for clear speech; degrades with background noise
  • Resource Usage: Lightweight, suitable for local deployment
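
To make the relationship between chunk size and latency concrete: at 16 kHz, 16-bit, mono, one second of audio is 16000 × 2 = 32000 bytes, so a 100 ms chunk is 3200 bytes. The sketch below computes chunk sizes and splits a PCM buffer accordingly. The function names are hypothetical, and whether the server handles a stream of such chunks the same way as a single WAV file is an assumption to check against server.js.

```javascript
const SAMPLE_RATE = 16000;   // Hz
const BYTES_PER_SAMPLE = 2;  // 16-bit samples
const CHANNELS = 1;          // mono

// Bytes of PCM audio covering durationMs milliseconds.
function chunkBytes(durationMs) {
    const samples = Math.round(SAMPLE_RATE * durationMs / 1000);
    return samples * BYTES_PER_SAMPLE * CHANNELS;
}

// Splits a PCM buffer (no WAV header) into chunks of durationMs each.
// The last chunk may be shorter than the others.
function splitIntoChunks(pcm, durationMs) {
    const size = chunkBytes(durationMs);
    if (size <= 0) throw new RangeError('durationMs must be positive');
    const chunks = [];
    for (let offset = 0; offset < pcm.length; offset += size) {
        chunks.push(pcm.subarray(offset, offset + size));
    }
    return chunks;
}
```

Smaller chunks mean the recognizer sees audio sooner, at the cost of more messages; a client would send each element of splitIntoChunks(pcm, 100) in order.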

Troubleshooting

Common Issues

  1. Model not found: Ensure the Vosk model is extracted to the ./vosk-model/ directory
  2. Python errors: Check that the virtual environment is activated and the dependencies are installed
  3. WebSocket connection fails: Verify that the server is running on port 3000
  4. No audio: Check browser microphone permissions
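
The first issue can be caught before the server starts. The following is a minimal sketch of such a preflight check; checkModelDir is a hypothetical helper, not something this repository provides, and it only verifies that the directory exists and is non-empty, not that it holds a valid Vosk model.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical preflight check: returns an error message, or null when the
// directory exists and contains at least one entry.
function checkModelDir(dir) {
    const full = path.resolve(dir);
    if (!fs.existsSync(full)) return `Model directory not found: ${full}`;
    if (!fs.statSync(full).isDirectory()) return `Not a directory: ${full}`;
    if (fs.readdirSync(full).length === 0) return `Model directory is empty: ${full}`;
    return null;
}
```

A startup script could call checkModelDir('./vosk-model') and exit with the returned message when it is not null.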

Docker Issues

  • Build failures: Ensure you have enough disk space for the image
  • Model mounting: Verify ./vosk-model/ exists before running docker-compose
  • Permission errors: Check file permissions on the vosk-model directory

Development

  • Server logs: Run docker-compose logs -f to follow logs in real time
  • Rebuild: Run docker-compose up --build after code changes
  • Stop: Run docker-compose down to stop all services

Model Information

  • Current: Vosk Small English US (0.15)
  • Size: ~39MB
  • Languages: English (US)
  • Accuracy: Optimized for speed over accuracy
  • Alternatives: See the Vosk models page (https://alphacephei.com/vosk/models) for other languages and sizes