🧩 Requirement: Speech-to-Text POC (No 3rd-Party APIs)
Goal
Build a simple proof of concept (POC) that captures live microphone audio from the browser, sends it to a backend server, converts the audio to text using an open-source/local library, and displays the text on the UI.
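One detail of the "browser to backend" hop is the sample format. This is a minimal sketch, not a required design. It assumes the page sends raw little-endian Float32 samples (the format the Web Audio API exposes) and that the backend recognizer wants 16-bit mono PCM, which is what Vosk accepts. The function name `float32_bytes_to_pcm16` is hypothetical.

```python
import struct


def float32_bytes_to_pcm16(payload: bytes) -> bytes:
    """Convert little-endian Float32 samples to little-endian 16-bit PCM.

    Assumption: `payload` is the raw buffer of a browser Float32Array,
    mono, with samples nominally in the range [-1.0, 1.0].
    """
    count = len(payload) // 4  # 4 bytes per float32 sample
    samples = struct.unpack(f"<{count}f", payload[: count * 4])
    # Clip out-of-range samples, then scale to the signed 16-bit range.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{count}h", *ints)
```

If the page uses `MediaRecorder` instead, the audio arrives as a compressed container (typically WebM/Opus) and would need decoding first, so this helper would not apply as written.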
Key Points
- A basic `index.html` page to:
  - Start/stop microphone recording.
  - Stream audio to the backend.
  - Display the transcribed text in real-time or after processing.
- A backend server (e.g., Node.js or Python) that:
  - Receives audio stream.
  - Uses a local speech-to-text library (e.g., Vosk), with no external APIs.
  - Sends back the transcribed text to the frontend.
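The backend's recognition step could look roughly like the following. It is a sketch under stated assumptions: it picks Python and Vosk from the options listed, and assumes the audio arrives as chunks of 16-bit mono PCM at 16 kHz. The helper names `load_recognizer` and `transcribe_stream` are hypothetical. The Vosk calls used (`Model`, `KaldiRecognizer`, `AcceptWaveform`, `Result`, `PartialResult`, `FinalResult`) are part of its Python bindings. The transport (for example a WebSocket) is left out.

```python
import json


def load_recognizer(model_path: str, sample_rate: int = 16000):
    """Build a Vosk recognizer from a model directory on local disk."""
    # Imported lazily so the rest of the module loads without Vosk installed.
    from vosk import Model, KaldiRecognizer

    return KaldiRecognizer(Model(model_path), sample_rate)


def transcribe_stream(recognizer, chunks):
    """Yield ("partial" | "final", text) pairs as PCM chunks are consumed.

    `chunks` is any iterable of bytes objects holding 16-bit mono PCM.
    """
    for chunk in chunks:
        if recognizer.AcceptWaveform(chunk):
            # The recognizer detected the end of an utterance.
            text = json.loads(recognizer.Result()).get("text", "")
            if text:
                yield "final", text
        else:
            # Still mid-utterance: report the running hypothesis.
            partial = json.loads(recognizer.PartialResult()).get("partial", "")
            if partial:
                yield "partial", partial
    # Flush whatever is still buffered once the stream ends.
    tail = json.loads(recognizer.FinalResult()).get("text", "")
    if tail:
        yield "final", tail
```

A server handler would call `load_recognizer` once per connection with the path to a downloaded model, then forward each yielded pair to the page. "partial" results suit the real-time display, and "final" results suit the after-processing display.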
Note
- I am using the fish shell.
- The solution should run entirely on the local machine and use its hardware.
- Avoid any third-party cloud services.