CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

This is a speech-to-text proof of concept that runs entirely locally without third-party APIs. The system captures live microphone audio from a browser, sends it to a backend server, and converts it to text using open-source libraries like Vosk.

Architecture

The project consists of two main components:

  • Frontend: Basic HTML page with JavaScript for microphone capture and audio streaming
  • Backend: Server (Node.js or Python) that receives audio streams and performs speech-to-text conversion using local libraries

Development Environment

  • The user runs the fish shell, so shell commands should be fish-compatible
  • All processing must be local (no cloud services)
  • System should utilize local hardware for speech recognition

Key Implementation Requirements

  • Real-time or near-real-time audio streaming from browser to backend
  • Local speech-to-text processing using libraries like Vosk
  • Display transcribed text on the frontend UI
  • Start/stop recording functionality
  • WebSocket or similar real-time communication between frontend and backend
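
The streaming and transcription requirements above can be sketched as a small loop. This is a minimal sketch, not the repository's implementation: it assumes a recognizer that behaves like vosk.KaldiRecognizer (AcceptWaveform, Result, PartialResult, and FinalResult, with the result methods returning JSON strings). The function name transcribe_chunks and the {"type", "text"} message shape are hypothetical and chosen only for illustration.

```python
import json
from typing import Iterable, Iterator


def transcribe_chunks(recognizer, chunks: Iterable[bytes]) -> Iterator[dict]:
    """Feed PCM chunks to a Vosk-style recognizer and yield UI messages.

    `recognizer` is assumed to behave like vosk.KaldiRecognizer:
    AcceptWaveform(bytes) is truthy once an utterance is complete, and
    Result()/PartialResult()/FinalResult() return JSON strings.
    The {"type", "text"} message shape is a hypothetical convention.
    """
    for chunk in chunks:
        if recognizer.AcceptWaveform(chunk):
            # A full utterance was recognized; its text is under "text".
            text = json.loads(recognizer.Result()).get("text", "")
            if text:
                yield {"type": "final", "text": text}
        else:
            # Still mid-utterance; the running hypothesis is under "partial".
            partial = json.loads(recognizer.PartialResult()).get("partial", "")
            if partial:
                yield {"type": "partial", "text": partial}
    # Flush whatever is still buffered when recording stops.
    text = json.loads(recognizer.FinalResult()).get("text", "")
    if text:
        yield {"type": "final", "text": text}
```

With the real library, the recognizer would typically be created as vosk.KaldiRecognizer(vosk.Model("./vosk-model"), 16000). The 16000 Hz sample rate is an assumption and must match the audio the browser actually sends. Each yielded message could then be serialized and sent back over the WebSocket for display.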

Development Commands

  • docker-compose up --build - Build and start the application
  • docker-compose down - Stop the application

Local Development

  • yarn install - Install dependencies (the project is set up for yarn)
  • yarn start - Start the server
  • yarn dev - Start with nodemon for development

Technology Stack

  • Backend: Node.js with Express and WebSocket server
  • Frontend: HTML5 + JavaScript with AudioWorklet for audio capture
  • Speech Recognition: Vosk library (Python) for local processing
  • Communication: WebSocket for real-time audio streaming and transcription
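
One detail the stack above implies: the Web Audio API delivers samples to an AudioWorklet as 32-bit floats in the range [-1.0, 1.0], while Vosk recognizers consume 16-bit mono PCM. The sketch below shows that conversion in Python for illustration. Where the conversion happens in this project (browser or server) is not stated here, and the function name float32_to_pcm16 is hypothetical.

```python
import struct
from typing import Sequence


def float32_to_pcm16(samples: Sequence[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to 16-bit little-endian PCM."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clamp out-of-range samples
        ints.append(int(round(s * 32767)))  # scale to the int16 range
    # "<" selects little-endian, "h" is a signed 16-bit integer.
    return struct.pack(f"<{len(ints)}h", *ints)
```

Clamping before scaling prevents overflow when a sample slightly exceeds the nominal range, which would otherwise make struct.pack raise an error.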

Setup Requirements

  • Download a Vosk model into the ./vosk-model/ directory
  • Server runs on http://localhost:3000
  • WebSocket API available at ws://localhost:3000 for external clients
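
Because recognition cannot start without the model, a startup check for the ./vosk-model/ directory listed above can fail fast with a clear message. This is a minimal sketch under that assumption. The helper name check_vosk_model is hypothetical, and it only verifies that the directory exists and is not empty, not that its contents form a valid model.

```python
from pathlib import Path


def check_vosk_model(model_dir: str = "./vosk-model") -> Path:
    """Return the model path, or raise if it is missing or empty."""
    path = Path(model_dir)
    if not path.is_dir():
        raise FileNotFoundError(f"Vosk model directory not found: {path}")
    if not any(path.iterdir()):
        raise FileNotFoundError(f"Vosk model directory is empty: {path}")
    return path
```

Calling check_vosk_model() before starting the server turns a missing download into an immediate, readable error instead of a failure on the first audio chunk.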