stt-vosk-py-node/requirements.md

🧩 Requirement: Speech-to-Text POC (No 3rd-Party APIs)

Goal

Build a simple proof of concept (POC) that captures live microphone audio in the browser, streams it to a backend server, converts it to text with a local, open-source library, and displays the text in the UI.

Key Points

  • A basic index.html page to:

    • Start/stop microphone recording.
    • Stream audio to the backend.
    • Display the transcribed text in real time or after processing.
  • A backend server (e.g., Node.js or Python) that:

    • Receives the audio stream.
    • Uses a local speech-to-text library (e.g., Vosk), with no external APIs.
    • Sends the transcribed text back to the frontend.
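
As a rough illustration of the backend half of these points, here is a minimal Python sketch of how each incoming audio chunk could be turned into a reply for the UI. It assumes the Vosk route: `recognizer` is taken to be a `vosk.KaldiRecognizer` created elsewhere, and the browser is assumed to deliver raw 16-bit mono PCM at the recognizer's sample rate. The `{"type", "text"}` message shape and the function names are hypothetical, chosen only for this example; the transport (e.g., a WebSocket) is left out.

```python
import json


def handle_chunk(recognizer, chunk: bytes) -> dict:
    """Feed one chunk of raw audio to the recognizer and build a reply message.

    `recognizer` is assumed to be a vosk.KaldiRecognizer, or any object with
    the same AcceptWaveform / Result / PartialResult methods.
    """
    if recognizer.AcceptWaveform(chunk):
        # End of an utterance: Result() returns a JSON string with a "text" field.
        text = json.loads(recognizer.Result()).get("text", "")
        return {"type": "final", "text": text}
    # Utterance still in progress: PartialResult() carries a "partial" field.
    text = json.loads(recognizer.PartialResult()).get("partial", "")
    return {"type": "partial", "text": text}


def finish(recognizer) -> dict:
    """Flush any buffered audio when the user stops recording."""
    text = json.loads(recognizer.FinalResult()).get("text", "")
    return {"type": "final", "text": text}
```

With this split, the UI could overwrite the current line on each "partial" message and append a new line on each "final" one, which covers both the real-time and the after-processing display options.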

Note

  • I am using the fish shell, so any setup commands should work in fish.
  • The solution should run locally and utilize system hardware.
  • Avoid any third-party cloud services.
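
One way to honor the local-only constraint on the Python side is to load the speech model strictly from a directory that is already on disk and to fail early if it is missing. The sketch below assumes Vosk is the chosen library. `load_local_recognizer`, its parameters, and the 16 kHz default are hypothetical choices for illustration; the model directory is whatever you have downloaded and unpacked beforehand.

```python
from pathlib import Path


def load_local_recognizer(model_dir: str, sample_rate: int = 16000):
    """Build a Vosk recognizer from a model directory already on disk."""
    path = Path(model_dir)
    if not path.is_dir():
        # Fail early so the server never starts without a local model.
        raise FileNotFoundError(f"Vosk model directory not found: {path}")
    # Imported lazily so this module can be imported without Vosk installed.
    from vosk import KaldiRecognizer, Model

    return KaldiRecognizer(Model(str(path)), sample_rate)
```

The sample rate passed here has to match the audio the browser actually sends, so it is worth keeping it as an explicit parameter rather than a hidden constant.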