Skynet Report

Generative AI News

Published on

3 April 2024

Universal-1

speech

Universal-1 achieves industry-leading performance in multilingual speech-to-text, with 10% or greater accuracy improvement over the next-best system in English, Spanish, and German.

It reduces the hallucination rate by 30% on speech data and by 90% on ambient noise, compared to a widely used open-source model.

Universal-1 also exhibits the ability to code-switch, transcribing multiple languages within a single audio file.

Additionally, it improves word-level timestamp accuracy by 25.5% relative to a popular open-source model and enables 5x faster parallel inference.

These advancements are the result of leveraging state-of-the-art ASR research and a robust system design.

https://www.assemblyai.com/discover/research/universal-1

Published on

30 November 2023

Seamless Communication by Meta

speech

Meta has launched Seamless, a new system for preserving expression and improving real-time translation using AI.
The system includes two new models. The first is SeamlessExpressive, which preserves expression in speech-to-speech translation, and the second is SeamlessStreaming, which delivers “state-of-the-art results with around two seconds of latency”.
The models are based on the latest version of the company’s foundational model, SeamlessM4T, and are designed to improve automatic speech recognition, speech-to-speech, speech-to-text and text-to-speech capabilities.
Alongside the models, Meta is releasing metadata, data, and data alignment tools to help the research community build on the work.

https://ai.meta.com/blog/seamless-communication/
