# Speech-to-Text POC

A speech-to-text proof of concept that processes audio locally using Vosk without requiring cloud APIs. The system exposes a WebSocket API that any client can connect to for real-time speech recognition.

## Features

- **Local Processing**: Uses Vosk for offline speech recognition
- **WebSocket API**: Server exposes `ws://localhost:3000` for any client to connect
- **Web Interface**: Browser-based demo for testing
- **Docker Support**: Complete containerized solution
- **No Cloud Dependencies**: Everything runs locally

## Quick Start

1. **Download Vosk model:**
   ```bash
   curl -L -o vosk-model-small-en-us-0.15.zip https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
   unzip vosk-model-small-en-us-0.15.zip
   mv vosk-model-small-en-us-0.15 vosk-model
   ```

2. **Start with Docker:**
   ```bash
   docker-compose up --build
   ```

3. **Test the web interface:**
   - Open `http://localhost:3000` in your browser
   - Click "Start Recording" and speak
   - See transcriptions appear in real time

## WebSocket API Usage

The server exposes a WebSocket endpoint at `ws://localhost:3000` with the following contract:

- **Input**: Raw WAV audio data (16 kHz, 16-bit, mono)
- **Output**: JSON messages with transcriptions

### Example Client Usage

```javascript
const WebSocket = require('ws');
const fs = require('fs');

const ws = new WebSocket('ws://localhost:3000');

ws.on('open', () => {
  // Send the whole WAV file (16 kHz, 16-bit, mono) as one binary message
  const audioData = fs.readFileSync('audio.wav');
  ws.send(audioData);
});

ws.on('message', (data) => {
  // `ws` delivers messages as Buffers, so convert to a string before parsing
  const message = JSON.parse(data.toString());
  if (message.type === 'transcription') {
    console.log('Text:', message.text);
  }
});

// Report connection problems (for example, the server is not running)
ws.on('error', (err) => {
  console.error('WebSocket error:', err.message);
});
```

See `client-example.js` for a complete Node.js client implementation.
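
A client that runs for a long time may want to tolerate messages it does not understand. The sketch below is one way to do that, not the project's own code: `extractTranscription` is a hypothetical helper name, and the only message shape it relies on is the `type: 'transcription'` / `text` pair shown in the example above.

```javascript
// Hypothetical helper (not part of this repository): turn one incoming
// WebSocket message into transcription text.
// Returns the text for 'transcription' messages and null for anything else.
function extractTranscription(data) {
  let message;
  try {
    // Accepts a Buffer or a string; both are converted to a string first.
    message = JSON.parse(data.toString());
  } catch (err) {
    // Not valid JSON: ignore the message instead of crashing the client.
    return null;
  }
  if (message && message.type === 'transcription' && typeof message.text === 'string') {
    return message.text;
  }
  return null;
}

// Possible usage with the `ws` client from the example above:
// ws.on('message', (data) => {
//   const text = extractTranscription(data);
//   if (text !== null) console.log('Text:', text);
// });
```

Returning `null` for unknown or malformed messages keeps the handler safe if the server ever sends other message types.
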

## Local Development Setup

### Prerequisites
- Node.js 14+
- Python 3.8+
- Vosk model (downloaded as above)

### Installation

1. **Install Node.js dependencies:**
   ```bash
   yarn install
   ```

2. **Install Python dependencies:**
   ```bash
   python3 -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   pip install -r requirements.txt
   ```

3. **Start the server:**
   ```bash
   yarn start
   ```

## Architecture

- **Backend**: Node.js Express server with WebSocket support
- **Speech Processing**: Python subprocess using the Vosk library
- **Frontend**: HTML5 + JavaScript with AudioWorklet for microphone capture
- **Communication**: WebSocket for bidirectional real-time communication
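
This README does not say how the Node.js server and the Python subprocess exchange results. One common pattern is for the subprocess to print one JSON object per line on stdout. The sketch below assumes that pattern purely for illustration; `createLineParser` is a hypothetical name and the wire format is an assumption, not something confirmed by this project.

```javascript
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: collect stdout chunks from a child process and call
// `onMessage` once per complete line that parses as JSON.
// Assumes (unconfirmed) that the subprocess writes newline-delimited JSON.
function createLineParser(onMessage) {
  const decoder = new StringDecoder('utf8'); // handles multi-byte chars split across chunks
  let buffered = '';
  return function handleChunk(chunk) {
    buffered += decoder.write(chunk);
    const lines = buffered.split('\n');
    buffered = lines.pop(); // keep the trailing partial line for the next chunk
    for (const line of lines) {
      if (line.trim() === '') continue;
      try {
        onMessage(JSON.parse(line));
      } catch (err) {
        // Skip lines that are not JSON, such as log output.
      }
    }
  };
}

// Possible usage with Node's child_process module (script name is a placeholder):
// const { spawn } = require('child_process');
// const child = spawn('python3', ['your_recognizer_script.py']);
// child.stdout.on('data', createLineParser((msg) => console.log(msg)));
```

Buffering until a newline matters because stdout chunks can end in the middle of a message.
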

## Supported Audio Formats

- **Input**: WAV files (16 kHz, 16-bit, mono preferred)
- **Browser**: Automatic conversion from microphone input
- **API**: Raw audio buffers or WAV format
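
To make the format requirements above concrete, here is a minimal sketch of two checks a client could run before sending audio. Both function names are hypothetical. `readWavFormat` assumes the canonical WAV layout in which the `fmt ` chunk directly follows the RIFF header; files with extra chunks before it would need a fuller parser. `floatTo16BitPCM` shows the usual way to turn Web Audio samples (floats in the range -1 to 1) into 16-bit PCM; whether the bundled web interface converts audio this way is an assumption.

```javascript
// Hypothetical helper: read the format fields from a canonical WAV header.
// Returns null if the buffer does not look like a canonical RIFF/WAVE file.
function readWavFormat(buffer) {
  if (
    buffer.length < 36 ||
    buffer.toString('ascii', 0, 4) !== 'RIFF' ||
    buffer.toString('ascii', 8, 12) !== 'WAVE' ||
    buffer.toString('ascii', 12, 16) !== 'fmt '
  ) {
    return null;
  }
  return {
    audioFormat: buffer.readUInt16LE(20), // 1 means uncompressed PCM
    channels: buffer.readUInt16LE(22),
    sampleRate: buffer.readUInt32LE(24),
    bitsPerSample: buffer.readUInt16LE(34),
  };
}

// True when the format matches the preferred input: 16 kHz, 16-bit, mono PCM.
function isPreferredFormat(fmt) {
  return (
    fmt !== null &&
    fmt.audioFormat === 1 &&
    fmt.channels === 1 &&
    fmt.sampleRate === 16000 &&
    fmt.bitsPerSample === 16
  );
}

// Convert Float32 samples in [-1, 1] to signed 16-bit PCM, clamping out-of-range values.
function floatTo16BitPCM(samples) {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Possible usage:
// const fmt = readWavFormat(require('fs').readFileSync('audio.wav'));
// if (!isPreferredFormat(fmt)) console.warn('Unexpected audio format:', fmt);
```
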

## Performance Notes

- **Model Size**: Small model (~39 MB) for fast loading
- **Latency**: Near real-time; latency depends on the audio chunk size
- **Accuracy**: Good for clear speech; may vary with background noise
- **Resource Usage**: Lightweight, suitable for local deployment
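
The link between chunk size and latency follows from the audio format: at 16 kHz, 16-bit, mono, one second of audio is 32,000 bytes, and a chunk cannot be recognized before it has been fully captured and sent. The sketch below works through that arithmetic. The function names are illustrative, and the chunk duration is only a lower bound on added delay, not a measured latency for this server.

```javascript
// Format constants taken from the input format described in this README.
const SAMPLE_RATE = 16000;   // samples per second
const BYTES_PER_SAMPLE = 2;  // 16-bit audio
const CHANNELS = 1;          // mono

// Number of bytes needed to hold `ms` milliseconds of audio.
function chunkBytes(ms) {
  return Math.round((SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS * ms) / 1000);
}

// Duration in milliseconds covered by a chunk of `bytes` bytes.
function chunkDurationMs(bytes) {
  return (bytes / (SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS)) * 1000;
}

console.log(chunkBytes(100));       // 3200 bytes for a 100 ms chunk
console.log(chunkDurationMs(8000)); // 250 ms of audio in an 8000-byte chunk
```

Smaller chunks reduce buffering delay but mean more messages per second, so the right size is a trade-off to test for your client.
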

## Troubleshooting

### Common Issues

1. **Model not found**: Ensure the Vosk model is extracted to the `./vosk-model/` directory
2. **Python errors**: Check that the virtual environment is activated and dependencies are installed
3. **WebSocket connection fails**: Verify the server is running on port 3000
4. **No audio**: Check browser microphone permissions
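
Some of the checks above can be automated before starting the server. The following is a minimal sketch of such a preflight script, assuming the `./vosk-model/` path and the Node.js 14+ requirement stated in this README; the function names are hypothetical and this script is not part of the repository.

```javascript
const fs = require('fs');

// Returns null if the model directory looks usable, otherwise a message
// describing the problem.
function checkModelDir(modelPath = './vosk-model') {
  if (!fs.existsSync(modelPath)) {
    return `Model directory not found: ${modelPath}`;
  }
  if (!fs.statSync(modelPath).isDirectory()) {
    return `Model path is not a directory: ${modelPath}`;
  }
  if (fs.readdirSync(modelPath).length === 0) {
    return `Model directory is empty: ${modelPath}`;
  }
  return null;
}

// Returns null if the running Node.js major version is at least `minMajor`.
function checkNodeVersion(minMajor = 14) {
  const major = Number(process.versions.node.split('.')[0]);
  return major >= minMajor ? null : `Node.js ${minMajor}+ required, found ${process.versions.node}`;
}

// Possible usage: print every problem found.
// [checkModelDir(), checkNodeVersion()]
//   .filter((problem) => problem !== null)
//   .forEach((problem) => console.error(problem));
```
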

### Docker Issues

- **Build failures**: Ensure you have enough disk space for the image
- **Model mounting**: Verify `./vosk-model/` exists before running `docker-compose`
- **Permission errors**: Check file permissions on the `vosk-model` directory

## Development

- **Server logs**: `docker-compose logs -f` to see real-time logs
- **Rebuild**: `docker-compose up --build` after code changes
- **Stop**: `docker-compose down` to stop all services

## Model Information

- **Current**: Vosk Small English US (0.15)
- **Size**: ~39 MB
- **Languages**: English (US)
- **Accuracy**: Optimized for speed over accuracy
- **Alternatives**: See [Vosk Models](https://alphacephei.com/vosk/models) for other languages/sizes