M4v3R 3 days ago

We were investigating speech-to-speech for a project a while back and estimated that building an end-to-end solution with the previous approach would take us weeks at best just for the MVP (because the pipeline was basically: speech -> Whisper STT model -> text, retrieval, API calls, etc. -> prompt -> LLM -> text -> TTS model -> speech). If this works as advertised it could cut the amount of work required quite significantly. Excited to try it out (when it’s available in Europe, that is…).
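For concreteness, the cascaded pipeline described above can be sketched as three chained stages. The `transcribe`/`respond`/`synthesize` functions here are hypothetical stubs standing in for real STT/LLM/TTS calls (e.g. Whisper, a chat model, a TTS engine); only the composition is the point.

```python
def transcribe(audio: bytes) -> str:
    # Placeholder STT stage: a real version would run a Whisper-style model.
    return "what's the weather"

def respond(prompt: str) -> str:
    # Placeholder LLM stage: retrieval and API calls would happen around here.
    return f"You asked: {prompt}"

def synthesize(text: str) -> bytes:
    # Placeholder TTS stage: a real version would return audio samples.
    return text.encode("utf-8")

def speech_to_speech(audio_in: bytes) -> bytes:
    # Each hop adds latency and glue code, which is why a native
    # speech-to-speech model is attractive versus chaining three models.
    text = transcribe(audio_in)
    reply = respond(text)
    return synthesize(reply)
```

Every arrow in the pipeline is a model boundary you have to serialize across, which is where most of the "weeks of MVP work" tends to go.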

  • joshstrange 2 days ago

    It’s not a production-type thing, but Home Assistant has this pipeline built in, and you can swap out any of the three steps:

    * STT

    * LLM

    * TTS

    It’s pretty cool to be able to replace one of the parts, do some tests, then change another part.

    Again, it’s nothing you would use directly in a product, but it’s fairly easy to test your pipeline by swapping in different components. (Also, HA provides each component out of the box if you want it to handle STT/TTS while you just test your LLM.)
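    The swap-one-stage-at-a-time idea can be sketched with three small interfaces. These Protocols and stub engines are hypothetical, not Home Assistant's actual API; they only illustrate replacing one part while keeping the others fixed.

```python
from typing import Protocol

class STT(Protocol):
    def transcribe(self, audio: bytes) -> str: ...

class LLM(Protocol):
    def complete(self, prompt: str) -> str: ...

class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...

class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")  # stub: treat the audio bytes as UTF-8 text

class UpperLLM:
    def complete(self, prompt: str) -> str:
        return prompt.upper()  # stub "model" so the pipeline stays testable

class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")  # stub: encode text back to bytes

class VoicePipeline:
    def __init__(self, stt: STT, llm: LLM, tts: TTS):
        self.stt, self.llm, self.tts = stt, llm, tts

    def run(self, audio: bytes) -> bytes:
        # Any one stage can be swapped without touching the other two.
        return self.tts.synthesize(self.llm.complete(self.stt.transcribe(audio)))

pipeline = VoicePipeline(EchoSTT(), UpperLLM(), BytesTTS())
```

    Replacing, say, `UpperLLM` with a different model implementation leaves the STT and TTS stages untouched, which is what makes the A/B testing described above cheap.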

    • BrutalCoding 2 days ago

      Add VAD (voice activity detection) to this list and it’s basically the same stack that I am running on mobile phones (on-device). It doesn’t beat OpenAI’s voice chat in terms of speed and intelligence, but it’s fun.

      The LLM part isn’t great, of course, due to the small model size. Still experimenting with different models/tweaks until I’m satisfied enough with the overall result on a recent-ish iPhone/Pixel.

  • michaelanckaert 2 days ago

    For what it's worth, I created an MVP solution using that pipeline in about 3 days, using the Azure AI Speech service and SDK. It worked pretty well despite the obviously long pipeline you described.

serf 2 days ago

it's frustrating that things like this get released by oAI while one still cannot use voice on the web app, nor any of the advanced voice model stuff, without essentially emulating a phone.

it's hard to know who oAI is working for -- is it a developer resource group or an actual customer-facing business? it feels like they don't know, either.