# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
This is a speech-to-text proof of concept that runs entirely locally, without third-party APIs. The system captures live microphone audio in the browser, streams it to a backend server, and converts it to text with the open-source Vosk library.
## Architecture
The project consists of two main components:
- **Frontend**: Basic HTML page with JavaScript for microphone capture and audio streaming
- **Backend**: Server that receives the audio stream and performs speech-to-text conversion locally (Node.js handles transport, Python runs Vosk; see Technology Stack below); a minimal recognition sketch follows this list
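
To make the backend's recognition step concrete, here is a minimal, transport-agnostic sketch of local transcription with the Python Vosk library. The file name `sample.wav` and the chunk size are illustrative assumptions; the model path matches the `./vosk-model/` directory described under Setup Requirements, and the input is assumed to be 16-bit mono PCM.

```python
import json
import wave

from vosk import Model, KaldiRecognizer

# "sample.wav" is a hypothetical test recording (16-bit mono PCM).
wf = wave.open("sample.wav", "rb")

model = Model("vosk-model")                      # model directory from Setup Requirements
rec = KaldiRecognizer(model, wf.getframerate())  # recognizer bound to the file's sample rate

while True:
    data = wf.readframes(4000)                   # read in small chunks, as a live stream would arrive
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):                 # True when Vosk closes an utterance
        print(json.loads(rec.Result())["text"])
print(json.loads(rec.FinalResult())["text"])     # flush whatever audio is left
```

The same `KaldiRecognizer` loop applies to live audio; only the source of the chunks changes.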
## Development Environment
- The user works in a fish shell
- All processing must stay local (no cloud services)
- Speech recognition should run on local hardware
## Key Implementation Requirements
- Real-time or near-real-time audio streaming from browser to backend
- Local speech-to-text processing using libraries like Vosk
- Display transcribed text on the frontend UI
- Start/stop recording functionality
- WebSocket or similar real-time communication between frontend and backend (see the streaming sketch after this list)
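
These requirements combine into one loop: receive binary audio chunks over a WebSocket, feed them to Vosk, and push partial and final results back to the client. The sketch below is one possible shape of the Python side, assuming the `websockets` package (10.1 or newer, where handlers take a single connection argument), 16 kHz 16-bit mono PCM input, and a hypothetical internal port 2700; the public endpoint described under Setup Requirements stays at `ws://localhost:3000`.

```python
import asyncio

import websockets                      # assumed transport library, not prescribed by the repo
from vosk import Model, KaldiRecognizer

SAMPLE_RATE = 16000                    # assumption: the frontend sends 16 kHz 16-bit mono PCM
model = Model("vosk-model")            # model directory from Setup Requirements

async def recognize(websocket):
    """Feed incoming binary PCM chunks to Vosk and stream JSON results back."""
    rec = KaldiRecognizer(model, SAMPLE_RATE)
    async for chunk in websocket:
        if not isinstance(chunk, (bytes, bytearray)):
            continue                                       # ignore text frames in this sketch
        if rec.AcceptWaveform(bytes(chunk)):
            await websocket.send(rec.Result())             # final text for a closed utterance
        else:
            await websocket.send(rec.PartialResult())      # interim hypothesis for live display
    await websocket.send(rec.FinalResult())                # flush remaining audio on disconnect

async def main():
    # Port 2700 is a hypothetical internal port for the recognition service.
    async with websockets.serve(recognize, "0.0.0.0", 2700):
        await asyncio.Future()                             # run until cancelled

if __name__ == "__main__":
    asyncio.run(main())
```

Sending `PartialResult()` on every chunk is what makes the UI feel real-time: the frontend can overwrite the partial line and append only final results.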
## Development Commands
### Docker (Recommended)
- `docker-compose up --build` - Build and start the application
- `docker-compose down` - Stop the application
### Local Development
- `yarn install` - Install dependencies (yarn is configured)
- `yarn start` - Start the server
- `yarn dev` - Start with nodemon for development
## Technology Stack
- **Backend**: Node.js with Express and WebSocket server
- **Frontend**: HTML5 + JavaScript with AudioWorklet for audio capture
- **Speech Recognition**: Vosk library (Python) for local processing; Vosk expects 16-bit mono PCM, so raw AudioWorklet samples need conversion (see the sketch after this list)
- **Communication**: WebSocket for real-time audio streaming and transcription
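
One glue detail implied by this stack: AudioWorklet nodes produce Float32 samples in the range [-1, 1], while Vosk consumes 16-bit little-endian PCM. If the frontend ships raw Float32 buffers (an assumption; it could equally convert before sending), the backend needs a conversion like the sketch below, which also assumes NumPy is available and that the sample rate already matches the recognizer.

```python
import numpy as np

def float32_to_pcm16(chunk: bytes) -> bytes:
    """Convert raw Float32 AudioWorklet samples into the 16-bit PCM Vosk expects."""
    samples = np.frombuffer(chunk, dtype=np.float32)   # interpret the bytes as 32-bit floats
    clipped = np.clip(samples, -1.0, 1.0)              # guard against out-of-range samples
    return (clipped * 32767).astype(np.int16).tobytes()
```

Resampling (for example, 48 kHz capture down to the model's rate) is a separate concern not covered here.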
## Setup Requirements
- Download a Vosk model into the `./vosk-model/` directory
- Server runs on http://localhost:3000
- WebSocket API available at `ws://localhost:3000` for external clients (example client sketch below)
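
For a quick end-to-end check against the running server, a client along these lines can stream a local WAV file to `ws://localhost:3000` and print the transcripts. It assumes the server accepts raw 16-bit mono PCM frames and replies with one Vosk-style JSON message per chunk (as in the server sketch above); `sample.wav` is a hypothetical 16 kHz test recording.

```python
import asyncio
import json
import wave

import websockets

async def transcribe(path: str) -> None:
    # Assumes one JSON reply per audio chunk; adjust if the server batches replies differently.
    async with websockets.connect("ws://localhost:3000") as ws:
        with wave.open(path, "rb") as wf:
            while True:
                data = wf.readframes(4000)
                if not data:
                    break
                await ws.send(data)                    # raw PCM frames as binary WebSocket messages
                reply = json.loads(await ws.recv())
                if reply.get("text"):                  # print only finalized segments
                    print(reply["text"])

if __name__ == "__main__":
    asyncio.run(transcribe("sample.wav"))              # hypothetical test file
```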