microsoft/SpeechT5 issues and pull requests

#96 - WavLLM use case or example

Issue - State: open - Opened by sfofgalaxy about 2 months ago - 1 comment

#95 - Can't Adding Indonesian.

Issue - State: open - Opened by SyamsQ 2 months ago

#94 - Can u opensource SpeechT5 base weights to huggingface ?

Issue - State: open - Opened by MonolithFoundation 2 months ago

#93 - Are there any performance optimization for inference available for this model or faster inference versions or streaming I only find this example?

Issue - State: open - Opened by lukaLLM 4 months ago

#92 - Speech conversion: process whole input without stopping

Issue - State: closed - Opened by torlarse 5 months ago - 1 comment

#91 - gaokao_audio can not be download? something error

Issue - State: open - Opened by liyunlongaaa 5 months ago - 1 comment

#90 - Why can WavLLM understand audio sounds as well?

Issue - State: open - Opened by BenoitWang 5 months ago - 1 comment

#89 - Setup Error about WavLLM

Issue - State: open - Opened by StupidDebugger 6 months ago - 4 comments

#88 - Request for Assistance with VATLM Implementation: Accessing Wave2Vec Model File

Issue - State: open - Opened by yeonju7kim 6 months ago

#87 - I found minor typo in Readme

Issue - State: open - Opened by yeonju7kim 6 months ago

#86 - Please fix the broken download link!!! So many models can't be used without checkpoint.

Issue - State: open - Opened by world1tree 6 months ago

#85 - How to fine-tune SpeechT5 HifiGAN vocoder?

Issue - State: closed - Opened by yukiarimo 6 months ago - 2 comments

#84 - soundfile.LibsndfileError: <exception str() failed>

Issue - State: closed - Opened by ciwei6107563 7 months ago - 1 comment

#83 - Unable to Download wavLLM Due to Error

Issue - State: open - Opened by minkyu119 7 months ago - 1 comment

#82 - What languages are supported? How to specify a language?

Issue - State: open - Opened by secsilm 8 months ago

#81 - SpeechUT does not have a link for download

Issue - State: closed - Opened by world1tree 9 months ago - 4 comments

#80 - What's the model_path and data_name on inference code?

Issue - State: open - Opened by YepJin 9 months ago - 3 comments

#79 - Confusion/Question about SpeechT5SpeechDecoderPostnet output

Issue - State: open - Opened by Student204161 10 months ago

#78 - Error in loading WavLLM model

Issue - State: open - Opened by rishabh004-ai 10 months ago - 9 comments

#77 - Single Task Training

Issue - State: closed - Opened by yangjiabupt 10 months ago - 1 comment

#76 - WavLLM checkpoint

Issue - State: open - Opened by ming024 10 months ago - 5 comments

#75 - ASR fine-tuning loss goes to zero after several epochs

Issue - State: closed - Opened by yunigma 10 months ago - 2 comments

#74 - extract transformer layer feature

Issue - State: open - Opened by zbpjlc 12 months ago - 2 comments

#73 - Does the pre-trained model for hidden unit tokenizer use speaker embeddings?

Issue - State: open - Opened by Kodhandarama 12 months ago

#72 - What is the time taken to converge for the hidden unit tokenizer?

Issue - State: open - Opened by Kodhandarama about 1 year ago

#71 - Link to train_960.tsv is broken

Issue - State: open - Opened by Kodhandarama about 1 year ago

#70 - "SpeechT5" on Android OS

Issue - State: open - Opened by taeyeonlee about 1 year ago

#69 - British English TTS model

Issue - State: closed - Opened by omega3 about 1 year ago - 1 comment

#68 - Text feature extraction using SpeechLM

Issue - State: open - Opened by wonjune-kang about 1 year ago

#67 - Baseline implementation

Issue - State: open - Opened by ussenuk about 1 year ago - 1 comment

#66 - How to setting language when do S2T

Issue - State: open - Opened by nhha1602 about 1 year ago - 1 comment

#65 - Is Chinese text-to-speech supported?

Issue - State: open - Opened by xxm1668 over 1 year ago - 4 comments

#64 - The size of tensor a (674) must match the size of tensor b (600) at non-singleton dimension 1

Issue - State: open - Opened by poojitharamachandra over 1 year ago - 1 comment

#63 - SpeechT5 - TTS - Tokenizer adding `▁` token between newly added Vietnamese characters

Issue - State: closed - Opened by GinUTE over 1 year ago - 2 comments

#62 - ASR SpeechT5 training - model predicts same output for different inputs

Issue - State: open - Opened by L7uan over 1 year ago - 1 comment

#61 - Is end-to-end S2ST possible with Speecht5?

Issue - State: open - Opened by elia-ashraf over 1 year ago

#60 - Generate the N-best (top few) hypotheses

Issue - State: open - Opened by cyfer0618 over 1 year ago

#59 - Reproduce ASR experiment results in Hugging Face

Issue - State: closed - Opened by jjyaoao over 1 year ago

#58 - Voice Conversion - Error with Some Mono, 16kHz, 16bit Audio

Issue - State: open - Opened by fabiocat93 over 1 year ago - 3 comments

#57 - Getting TTS output voice close to the training data - Finetuning on different language

Issue - State: open - Opened by Srija616 over 1 year ago - 2 comments

#56 - pretrain loss

Issue - State: open - Opened by MarsMeng1994 over 1 year ago - 4 comments

#55 - Bump scipy from 1.5.4 to 1.10.0 in /VATLM/vat_hubert

Pull Request - State: open - Opened by dependabot[bot] over 1 year ago
Labels: dependencies

#54 - VATLM: Error when loading finetuned checkpoints for infer_s2s

Issue - State: open - Opened by naraysa over 1 year ago

#53 - Pretraining SpeechT5, meet problems about batch_sampler in multitask_dataset. Should I get idx and bin files of data one by one (wav) or get all of them in only two file(idx and bin each have one)

Issue - State: open - Opened by Lemonaddeee over 1 year ago

#52 - SpeechUT inference error in en_fr checkpoint

Issue - State: open - Opened by ytf-philp over 1 year ago - 2 comments

#51 - Using SpeechT5 Large for TTS

Issue - State: open - Opened by imranmaj almost 2 years ago

#50 - SpeechT5: extracting Chinese speaker embedding

Issue - State: open - Opened by QQ-777777 almost 2 years ago - 6 comments

#49 - SpeechT5-tts fine-tuned on Chinese

Issue - State: open - Opened by qlmbeck almost 2 years ago - 4 comments

#48 - add link to Hugging Face fine-tuning example

Pull Request - State: closed - Opened by hollance almost 2 years ago - 1 comment

#47 - The link for Prosody-SpeechT5 in the Readme is dead/404

Issue - State: closed - Opened by svantana almost 2 years ago - 2 comments

#46 - SpeechLM

Issue - State: closed - Opened by blueblue-bubble almost 2 years ago - 2 comments

#45 - SpeechT5: how much epoch is set

Issue - State: closed - Opened by QQ-777777 almost 2 years ago - 5 comments

#43 - how to pause between two words ?

Issue - State: open - Opened by hulk10425 almost 2 years ago - 2 comments

#42 - how to fine tune sid on pretrained model?

Issue - State: closed - Opened by haha010508 almost 2 years ago - 11 comments

#41 - hydra fine-tunning for speechT5?

Issue - State: open - Opened by ramonsanabria almost 2 years ago

#40 - [SpeechLM] About phoneme tokenizer in detail?

Issue - State: closed - Opened by yuseungwoo almost 2 years ago - 1 comment

#39 - reproduction steps for inference

Issue - State: open - Opened by ghost almost 2 years ago - 2 comments

#38 - Pretrain SpeechT5 on my own dataset

Issue - State: closed - Opened by hungker almost 2 years ago - 3 comments

#37 - Missing speecht5 task

Issue - State: closed - Opened by maximerenou almost 2 years ago - 1 comment

#36 - SpeechT5 Speech Enhancement

Issue - State: open - Opened by avramandrei almost 2 years ago - 2 comments

#35 - Fine-tunning on Hugging Face

Issue - State: open - Opened by ramonsanabria almost 2 years ago - 1 comment

#34 - SpeechUT inference and fine-tune problem

Issue - State: closed - Opened by ytf-philp about 2 years ago - 3 comments

#33 - add Hugging Face links

Pull Request - State: closed - Opened by hollance about 2 years ago - 2 comments

#32 - add SID in SpeechT5

Pull Request - State: closed - Opened by mechanicalsea about 2 years ago - 1 comment

#31 - SpeechT5: Finetuned SID model

Issue - State: closed - Opened by entn-at about 2 years ago - 2 comments

#30 - SpeechT5 pretrain

Issue - State: open - Opened by benyang0506 about 2 years ago - 5 comments

#29 - About the SpeechT5 pre-training curve

Issue - State: closed - Opened by benyang0506 about 2 years ago - 4 comments

#28 - SpeechT5 Pretrain ERROR

Issue - State: closed - Opened by benyang0506 about 2 years ago - 1 comment

#27 - Whether fp16 is enabled in VATLM during pre-training

Issue - State: closed - Opened by xiabingquan about 2 years ago - 2 comments

#26 - SpeechLM: KeyError: 'text_transformer' while initing the SpeechLMConfig

Issue - State: closed - Opened by JunZhan2000 about 2 years ago - 2 comments

#25 - VATLM: ModuleNotFoundError: No module named 'fairseq.data.audio.multi_corpus_dataset_audio'

Issue - State: closed - Opened by xiabingquan about 2 years ago - 6 comments

#24 - Same benchmark, same architecture, but the WER is different, why?

Issue - State: closed - Opened by splinter21 about 2 years ago - 2 comments

#23 - SpeechLM: How to train 'Phone-unit tokenizer for speech' using kaldi?

Issue - State: closed - Opened by YWMditto over 2 years ago - 7 comments

#22 - Speech2C "Inf detected in output" while training

Issue - State: closed - Opened by Sreyan88 over 2 years ago - 4 comments

#21 - Speech2C training error

Issue - State: closed - Opened by Sreyan88 over 2 years ago - 6 comments

#20 - Missing SPM and Vocabulary files

Issue - State: closed - Opened by sumanthd17 over 2 years ago - 2 comments

#19 - Port to Huggingface

Issue - State: closed - Opened by StephennFernandes over 2 years ago - 1 comment

#18 - SpeechLM: How to resample phonemes' frame rate from 30ms to 20ms?

Issue - State: closed - Opened by Arrivederci over 2 years ago - 3 comments

#17 - SpeechLM: how to prepare phoneme sequence for T2U generator

Issue - State: closed - Opened by cwang621 over 2 years ago - 5 comments

#16 - SpeechT5: How to get speaker embeddings ?

Issue - State: closed - Opened by Arrivederci over 2 years ago - 12 comments

#15 - Example values for finetuning asr

Issue - State: closed - Opened by YWMditto over 2 years ago - 18 comments

#14 - Sample Rates are different between speech pre-training dataset and tts dataset

Issue - State: closed - Opened by Maggione over 2 years ago - 1 comment

#13 - Combining speech and text in the encoder

Issue - State: closed - Opened by jacqle over 2 years ago - 1 comment

#12 - Can you provide a voice conversion finetune recipe?

Issue - State: closed - Opened by hpjang over 2 years ago - 2 comments

#11 - This repo is missing important files

Issue - State: closed - Opened by microsoft-github-policy-service[bot] over 2 years ago

#10 - Adding Microsoft SECURITY.MD

Pull Request - State: closed - Opened by microsoft-github-policy-service[bot] over 2 years ago

#9 - Text data preparation

Issue - State: closed - Opened by tskim9439 over 2 years ago - 3 comments

#8 - No code for Speech Synthesis

Issue - State: closed - Opened by petervickers over 2 years ago - 4 comments

#7 - ArgumentError in SpeechT5Task.add_args() when running fairseq-generate

Issue - State: closed - Opened by busukxuan over 2 years ago - 1 comment

#6 - Does the quantizer is used when fine-tune the pretrained backbone for the downstream task ?

Issue - State: closed - Opened by zhhao1 over 2 years ago - 2 comments

#5 - Difficulties loading pre-trained weights!

Issue - State: closed - Opened by sanchit-gandhi over 2 years ago - 2 comments

#4 - Missing text_to_speech_dataset.py in speecht5/data

Issue - State: closed - Opened by ayushtues over 2 years ago - 1 comment

#3 - How to load the pretrained models in pytorch

Issue - State: closed - Opened by ayushtues over 2 years ago - 5 comments

#2 - Are you planning to open source the configuration of the baselines and downstream tasks?

Issue - State: closed - Opened by Maggione over 2 years ago - 1 comment

#1 - how to pre-train on a custom dataset ?

Issue - State: closed - Opened by StephennFernandes over 2 years ago - 16 comments
