Mistral AI has launched Voxtral Transcribe 2, a new on-device speech-to-text model family featuring real-time transcription, ...
Too many GPUs makes you lazy,” says the French startup’s vice president of science operations, as the company carves out a ...
Pocket TTS delivers high-quality text-to-speech on standard CPUs. No GPU, no cloud APIs. It is the first local TTS with voice ...
Abstract: Although recent neural text-to-speech (TTS) systems have achieved high-quality speech synthesis, there are cases where a TTS system generates low-quality speech, mainly caused by limited ...
Small and fast: only 123M parameters. High-quality voice cloning: state-of-the-art performance in speaker similarity, intelligibility, and naturalness. Multi-lingual: support Chinese and English.
Abstract: We propose a lightweight end-to-end text-to-speech model using multi-band generation and inverse short-time Fourier transform. Our model is based on VITS, a high-quality end-to-end ...
VoiceCraft is a token infilling neural codec language model, that achieves state-of-the-art performance on both speech editing and zero-shot text-to-speech (TTS) on in-the-wild data including ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results