CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
This is a speech-to-text proof of concept that runs entirely locally without third-party APIs. The system captures live microphone audio from a browser, sends it to a backend server, and converts it to text using open-source libraries like Vosk.
Architecture
The project consists of two main components:
- Frontend: Basic HTML page with JavaScript for microphone capture and audio streaming
- Backend: Server (Node.js or Python) that receives audio streams and performs speech-to-text conversion using local libraries
Development Environment
- User runs the fish shell
- All processing must be local (no cloud services)
- System should utilize local hardware for speech recognition
Key Implementation Requirements
- Real-time or near-real-time audio streaming from browser to backend
- Local speech-to-text processing using libraries like Vosk
- Display transcribed text on the frontend UI
- Start/stop recording functionality
- WebSocket or similar real-time communication between frontend and backend
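One detail the streaming requirement implies: an AudioWorklet delivers samples as 32-bit floats in the range [-1.0, 1.0], while Vosk recognizers consume 16-bit signed mono PCM. The following is a minimal sketch of that conversion on the backend, assuming the frontend sends raw little-endian Float32 frames over the WebSocket. That wire format and the function name `float32_to_pcm16` are assumptions for illustration, not something this repository specifies.

```python
import struct


def float32_to_pcm16(frame: bytes) -> bytes:
    """Convert little-endian Float32 samples to 16-bit signed PCM bytes.

    Hypothetical helper: assumes each WebSocket binary message is a
    sequence of 4-byte floats in the range [-1.0, 1.0].
    """
    if len(frame) % 4 != 0:
        raise ValueError("frame length must be a multiple of 4 bytes")
    count = len(frame) // 4
    samples = struct.unpack(f"<{count}f", frame)
    # Clamp out-of-range samples, then scale to the int16 range.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{count}h", *ints)
```

The conversion could equally be done in the browser before sending, which halves the bandwidth; doing it on the backend keeps the AudioWorklet code minimal.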
Development Commands
Docker (Recommended)
- docker-compose up --build - Build and start the application
- docker-compose down - Stop the application
Local Development
- yarn install - Install dependencies (yarn is configured)
- yarn start - Start the server
- yarn dev - Start with nodemon for development
Technology Stack
- Backend: Node.js with Express and WebSocket server
- Frontend: HTML5 + JavaScript with AudioWorklet for audio capture
- Speech Recognition: Vosk library (Python) for local processing
- Communication: WebSocket for real-time audio streaming and transcription
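As a rough illustration of the recognition step, here is a sketch of a loop that feeds PCM chunks to a Vosk recognizer and yields partial and final transcripts. It assumes the recognition runs in Python, as the stack above suggests. The function name `transcribe_chunks` and the `(kind, text)` event shape are hypothetical. The recognizer methods used (`AcceptWaveform`, `Result`, `PartialResult`, `FinalResult`) are those of Vosk's `KaldiRecognizer`.

```python
import json
from typing import Iterable, Iterator, Tuple


def transcribe_chunks(recognizer, chunks: Iterable[bytes]) -> Iterator[Tuple[str, str]]:
    """Feed 16-bit PCM chunks to a Vosk-style recognizer.

    Yields ("partial", text) while an utterance is in progress and
    ("final", text) when the recognizer detects the end of an utterance.
    """
    for chunk in chunks:
        if recognizer.AcceptWaveform(chunk):
            # Utterance complete: Result() returns JSON with a "text" field.
            text = json.loads(recognizer.Result()).get("text", "")
            if text:
                yield ("final", text)
        else:
            # Still mid-utterance: PartialResult() has a "partial" field.
            partial = json.loads(recognizer.PartialResult()).get("partial", "")
            if partial:
                yield ("partial", partial)
    # Flush any audio still buffered when the stream ends.
    text = json.loads(recognizer.FinalResult()).get("text", "")
    if text:
        yield ("final", text)


# With the real library (assumed 16 kHz mono input), a recognizer would be
# created roughly like this:
#   from vosk import Model, KaldiRecognizer
#   recognizer = KaldiRecognizer(Model("vosk-model"), 16000)
```

Each yielded event could then be serialized and sent back over the WebSocket so the frontend can show partial text as the user speaks and replace it with the final text.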
Setup Requirements
- Download Vosk model to ./vosk-model/ directory
- Server runs on http://localhost:3000
- WebSocket API available at ws://localhost:3000 for external clients
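Because a missing model is an easy setup mistake, a startup check along these lines may help fail early with a clear message. This is only a sketch: the helper name `check_model_dir` is hypothetical, and it verifies only that the directory exists and is non-empty, not that its contents are a valid Vosk model.

```python
from pathlib import Path


def check_model_dir(path: str = "./vosk-model") -> Path:
    """Return the resolved model directory, or raise if missing or empty."""
    model_dir = Path(path)
    if not model_dir.is_dir():
        raise FileNotFoundError(f"Vosk model directory not found: {model_dir}")
    if not any(model_dir.iterdir()):
        raise FileNotFoundError(f"Vosk model directory is empty: {model_dir}")
    return model_dir.resolve()
```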