Article overview
Streaming Parrotron for on-device speech-to-speech conversion
Authors: Oleg Rybakov; Fadi Biadsy; Xia Zhang; Liyang Jiang; Phoenix Meadowlark; Shivani Agrawal
Date: 25 Oct 2022

Abstract: We present a fully on-device, streaming speech-to-speech (STS) conversion
model that normalizes input speech directly into synthesized output speech
(a.k.a. Parrotron). Deploying such an end-to-end model locally on mobile devices
poses significant challenges in memory footprint and computation. In this
paper, we present a streaming-based approach that achieves an acceptable delay
with minimal loss in speech conversion quality compared to a non-streaming,
server-based approach. Our approach first streams the encoder in real time
while the speaker is speaking. Then, as soon as the speaker stops speaking, we
run the spectrogram decoder in streaming mode alongside a streaming vocoder to
generate output speech in real time. To achieve an acceptable delay-quality
trade-off, we study a novel hybrid approach to encoder look-ahead that combines
a look-ahead feature stacker with look-ahead self-attention. We also compare the
model under int4 quantization-aware training and int8 post-training
quantization, and show that our streaming approach runs 2x faster than real
time on the Pixel 4 CPU.

Source: arXiv, 2210.13761
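The two-phase schedule described in the abstract (encode while the user is speaking, then decode and vocode frame by frame once speech ends) can be sketched as a Python generator. This is a minimal illustration under stated assumptions, not the authors' implementation: `two_phase_stream`, `encode_chunk`, `decode_step` and `vocode_frame` are hypothetical names, and the real encoder, spectrogram decoder and vocoder are replaced by caller-supplied callables.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Sequence

Frame = List[float]  # one feature or spectrogram frame (toy representation)


def two_phase_stream(
    mic_chunks: Iterable[Sequence[Frame]],
    encode_chunk: Callable[[Sequence[Frame]], List[Frame]],
    decode_step: Callable[[List[Frame], int], Optional[Frame]],
    vocode_frame: Callable[[Frame], List[float]],
) -> Iterator[List[float]]:
    """Hypothetical two-phase streaming schedule (not the paper's code).

    Phase 1: encode each microphone chunk as soon as the iterable yields it.
    Phase 2: once the input is exhausted (speaker stopped), decode one
    spectrogram frame at a time and vocode it immediately, yielding audio
    incrementally instead of waiting for the full output spectrogram.
    """
    encoded: List[Frame] = []
    for chunk in mic_chunks:  # Phase 1: consumes chunks as they arrive
        encoded.extend(encode_chunk(chunk))

    step = 0
    while True:  # Phase 2: streaming decoder feeding a streaming vocoder
        spec_frame = decode_step(encoded, step)
        if spec_frame is None:  # hypothetical end-of-output signal
            break
        yield vocode_frame(spec_frame)
        step += 1


if __name__ == "__main__":
    # Toy stand-ins: identity "encoder", a "decoder" that replays encoder
    # frames, and a "vocoder" that scales each frame by 2.
    chunks = [[[1.0], [2.0]], [[3.0]]]
    audio = two_phase_stream(
        chunks,
        encode_chunk=lambda c: [list(f) for f in c],
        decode_step=lambda enc, s: enc[s] if s < len(enc) else None,
        vocode_frame=lambda f: [2.0 * x for x in f],
    )
    for piece in audio:
        print(piece)
```

Because the function is a generator, the caller should start iterating right away so that phase 1 encodes chunks as the microphone produces them; a production system would more likely run capture, encoding and playback on separate threads.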
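The abstract does not give the look-ahead sizes, so the sketch below of the hybrid look-ahead uses hypothetical parameters. It assumes that the feature stacker concatenates each frame with the next `stack_lookahead` frames, that every self-attention layer uses the same mask letting frame i attend to all past frames plus at most `attn_lookahead` future frames, and that no frame subsampling happens between the stacker and the attention layers. Under those assumptions the future context grows linearly with depth, so the total look-ahead is `stack_lookahead + num_layers * attn_lookahead` frames.

```python
from typing import List, Sequence


def stack_with_lookahead(frames: Sequence[Sequence[float]], stack_lookahead: int) -> List[List[float]]:
    """Concatenate frame t with frames t+1 .. t+stack_lookahead (zero-padded at the end)."""
    dim = len(frames[0])
    padding = [[0.0] * dim for _ in range(stack_lookahead)]
    padded = [list(f) for f in frames] + padding
    stacked: List[List[float]] = []
    for t in range(len(frames)):
        row: List[float] = []
        for k in range(stack_lookahead + 1):
            row.extend(padded[t + k])
        stacked.append(row)
    return stacked


def lookahead_attention_mask(num_frames: int, attn_lookahead: int) -> List[List[bool]]:
    """mask[i][j] is True if query frame i may attend to key frame j:
    every past frame, the current frame, and at most attn_lookahead future frames."""
    return [[j <= i + attn_lookahead for j in range(num_frames)] for i in range(num_frames)]


def total_lookahead_frames(stack_lookahead: int, attn_lookahead: int, num_layers: int) -> int:
    """Future context seen by the top layer, assuming every layer uses the same
    mask and there is no subsampling between the stacker and the attention layers."""
    return stack_lookahead + num_layers * attn_lookahead
```

For example, with hypothetical settings of a 2-frame stacker look-ahead, 1 frame of attention look-ahead and 4 layers, `total_lookahead_frames(2, 1, 4)` returns 6, i.e. 60 ms at an assumed 10 ms frame shift. The sketch shows why the two mechanisms trade off differently: stacker look-ahead is paid once, while attention look-ahead is multiplied by the number of layers.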
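The two quantization settings in the abstract differ in when quantization is applied. Post-training quantization rounds the trained float weights once, after training. Quantization-aware training inserts the same quantize-then-dequantize round trip ("fake quantization") into the forward pass during training so that the model learns to tolerate it. The sketch below assumes a simple symmetric, per-tensor scheme, which is a common choice but not necessarily the one used in the paper; `quantize`, `dequantize` and `fake_quantize` are hypothetical helpers written with the standard library only. Real QAT also passes gradients through the rounding, typically with a straight-through estimator, which this forward-only sketch omits.

```python
from typing import List, Sequence, Tuple


def quantize(weights: Sequence[float], bits: int) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to signed `bits`-bit integers (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1  # 127 for int8, 7 for int4
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid division by zero for all-zero tensors
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Map integer codes back to approximate float weights."""
    return [v * scale for v in q]


def fake_quantize(weights: Sequence[float], bits: int) -> List[float]:
    """Quantize-then-dequantize round trip, as used in the QAT forward pass."""
    q, scale = quantize(weights, bits)
    return dequantize(q, scale)


if __name__ == "__main__":
    w = [0.8, -0.31, 0.05, -1.2, 0.6]  # toy weights, not from the paper
    for bits in (8, 4):
        approx = fake_quantize(w, bits)
        err = max(abs(a - b) for a, b in zip(w, approx))
        print(f"int{bits}: max abs error = {err:.4f}")
```

With this scheme the int4 grid is 127/7, roughly 18 times coarser than the int8 grid, which is why int4 is usually paired with quantization-aware training rather than applied only after training.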