PKU-YuanGroup/MoE-LLaVA issues and pull requests

#92 - [Question] Train scripts and eval scripts for the non-MoE LLaVA-phi in Table 7 of the paper

Issue - State: open - Opened by sharkdrop 5 days ago

#91 - [Question] Is there any Moe checkpoint of Qwen1.5 or Qwen2 released?

Issue - State: open - Opened by double-fire-0 24 days ago

#90 - [Question] How to eval textqa

Issue - State: open - Opened by fanminshi about 1 month ago - 1 comment

#89 - [Question] Step 3 loss curve

Issue - State: open - Opened by fanminshi about 1 month ago

#88 - [Question] Question about the tokenizer of required pretrained model stabilityai/stablelm-2-1_6

Issue - State: open - Opened by Taylorfire about 2 months ago - 1 comment

#87 - [Question] In paper Table 6, why variant (d) is better than variant (c)?

Issue - State: open - Opened by pkumc about 2 months ago

#86 - [Feature request] Will you train more advanced models?

Issue - State: open - Opened by gesen2egee 2 months ago

#85 - Training of Stage 3: the actual training parameters in the code do not match the paper

Issue - State: open - Opened by Wuyingwen 2 months ago - 1 comment

#84 - [Question] What exactly does the language model mean?

Issue - State: open - Opened by dana-niu 3 months ago

#83 - [Discussion] What is the expert relationship between different layers with the same index? If not, what is the role of figures 4, 5 and 6 in the paper?

Issue - State: open - Opened by meteorlium 4 months ago

#82 - [Question] ValueError: Unknown image tower: /hy-tmp/LLaVA/clip-vit-large-patch14-336

Issue - State: open - Opened by FanshuoZeng 4 months ago - 5 comments

#81 - Can the confidence coefficient of an answer be obtained?

Issue - State: open - Opened by IsabelJimenez99 4 months ago

#80 - [Question] Inconsistency on MoE Layer Number in paper and model config

Issue - State: open - Opened by QAQdev 4 months ago

#79 - [Usage] Add Windows support for more exposure

Issue - State: open - Opened by mr-lab 4 months ago

#78 - Can you please give me a Python script to use the API with your demo?

Issue - State: open - Opened by gamesubzero 5 months ago

#77 - Moe finetuning error

Issue - State: open - Opened by sahilqure 5 months ago

#76 - [Question] Multi-image collate_fn

Issue - State: open - Opened by PangziZhang523 5 months ago

#75 - Minor fix and tips update for README

Pull Request - State: closed - Opened by QAQdev 5 months ago

#74 - [Question] Could you explain the class LlavaQWenMetaForCausalLM(LlavaMetaForCausalLM) in llava_arch?

Issue - State: open - Opened by 20191864218 5 months ago

#73 - [Question] Pretrain step

Issue - State: closed - Opened by rlagustmd82 5 months ago

#72 - [Question] CUDA OOM when finetune phi2-clipL336 at stage 2 with 8-A100-40G

Issue - State: closed - Opened by terry-for-github 5 months ago - 1 comment

#71 - [Feature request] Support Llama3

Issue - State: open - Opened by xiweideng 5 months ago

#70 - [Question] About parameter ep_size

Issue - State: open - Opened by puppy2000 5 months ago

#69 - [Usage] tokenizer.pad_token_id == None?

Issue - State: open - Opened by sjtu-cz 5 months ago - 1 comment

#68 - [Question] The error that occurred while running cli.py for inference, using Qwen-7B-base as the LLM.

Issue - State: closed - Opened by 20191864218 6 months ago - 1 comment

#67 - [Question] Discussion of the parameters in the paper

Issue - State: open - Opened by bufanx 6 months ago - 1 comment

#66 - [Question] About the stage 3 training loss

Issue - State: open - Opened by rangmiao 6 months ago

#65 - [Usage] Deepspeed MoE hangs when EP_SIZE > 1

Issue - State: closed - Opened by Wadaxiwan 6 months ago - 1 comment

#64 - DeepSpeed MoE issue

Issue - State: open - Opened by BlackBearBiscuit 6 months ago

#63 - RuntimeError: mat1 and mat2 must have the same dtype

Issue - State: open - Opened by Crystalxd 6 months ago

#62 - How to fine-tune MoE-LLaVA on my own dataset

Issue - State: open - Opened by Tunanzzz 6 months ago - 4 comments

#61 - [Question] Can't find the "mm_projecotr.bin" in the model_path

Issue - State: closed - Opened by sdlyzhq 6 months ago

#60 - [Question] The evaluation results vary every time.

Issue - State: open - Opened by koda-11 6 months ago

#59 - [Question] The evaluation results vary every time.

Issue - State: closed - Opened by koda-11 6 months ago

#58 - [Question] Adding to the dataset.

Issue - State: open - Opened by arthurwolf 7 months ago

#57 - [Question] Will the stage 2 fine-tuned model be open-sourced?

Issue - State: open - Opened by murray-z 7 months ago - 1 comment

#56 - [Question] How can I further fine-tune the MoE model on my own data?

Issue - State: open - Opened by murray-z 7 months ago - 3 comments

#55 - [Question] how to visualize routing distribution ?

Issue - State: closed - Opened by koda-11 7 months ago - 1 comment

#54 - [Question] Model and Dataset Size

Issue - State: open - Opened by adrielkuek 7 months ago

#53 - [Question] How did you use 768x768 resolution?

Issue - State: open - Opened by lucasjinreal 7 months ago

#52 - [Question] How to finetune the moe-llava model on customized data?

Issue - State: open - Opened by RayshenSL 7 months ago

#51 - [Usage] ValueError: Unknown image tower: /data1/ljq/Moellava/MoE-LLaVA-Qwen-1.8B-4e/clip-vit-large-patch14-336

Issue - State: open - Opened by xiangchihuoguo 7 months ago - 3 comments

#50 - [Question] Scale down further to support IoT use cases?

Issue - State: open - Opened by kinchahoy 7 months ago - 1 comment

#49 - [Question] About nlp_tune data.

Issue - State: open - Opened by Lucky-Lance 7 months ago - 2 comments

#48 - Error during training on custom dataset

Issue - State: open - Opened by saeedkhaki92 7 months ago - 1 comment

#47 - Question about the inference efficiency comparison

Issue - State: open - Opened by aprilehannibal 7 months ago - 1 comment

#46 - [Discussion] How to improve the model's understanding of high-resolution images?

Issue - State: open - Opened by whalefa1I 7 months ago - 1 comment

#45 - [Question] how to check activate parameters of MoE models?

Issue - State: closed - Opened by koda-11 7 months ago - 2 comments

#44 - > Hi, everyone. Sorry for that, we updated the new running command to fix it. Check it out [here](https://github.com/PKU-YuanGroup/MoE-LLaVA/blob/main/scripts/v1/qwen/finetune_moe.sh)

Issue - State: closed - Opened by hxhcreate 7 months ago - 2 comments

#43 - [Question] Image patch representation in this work

Issue - State: closed - Opened by cydiachen 7 months ago - 1 comment

#42 - Swapping in Qwen2 with the official LLaVA scripts and training with the mpt template gives loss 0

Issue - State: open - Opened by lucasjinreal 7 months ago - 26 comments

#41 - [Usage] The training always stuck after formatting inputs

Issue - State: closed - Opened by detectRecog 7 months ago - 8 comments

#40 - Inference without Deepspeed

Issue - State: open - Opened by aaronnat23 7 months ago - 1 comment

#39 - [Discussion] Implementation of Qwen1.5 for the project

Issue - State: closed - Opened by cydiachen 7 months ago - 19 comments

#38 - inference error in llavamistral

Issue - State: open - Opened by saeedkhaki92 7 months ago - 2 comments

#37 - Support cuda 12

Issue - State: open - Opened by fishfree 7 months ago - 1 comment

#36 - Eval on MMVET

Issue - State: open - Opened by BeachWang 7 months ago - 1 comment

#35 - Wrong dependencies: why a deepspeed dependency for inference? Better transformers integration

Issue - State: open - Opened by sujitvasanth 7 months ago - 3 comments

#34 - Memory usage is too high during the finetune stage

Issue - State: open - Opened by awzhgw 7 months ago - 2 comments

#33 - Can I use this to detect events in the video???

Issue - State: open - Opened by Shekharmeena28 8 months ago - 1 comment

#32 - License Questions

Issue - State: open - Opened by kbrostrom 8 months ago - 1 comment

#31 - In stage 2, what loss value is reasonable to reach?

Issue - State: open - Opened by awzhgw 8 months ago - 1 comment

#30 - MoE-LLaVA-StableLM for 4-bits and 8-bit

Issue - State: open - Opened by NikiBase 8 months ago - 1 comment

#29 - panic on finetune

Issue - State: closed - Opened by awzhgw 8 months ago - 3 comments

#28 - Training MOE-LLAVA on my own data: the loss drops very quickly in the pretrain stage

Issue - State: closed - Opened by awzhgw 8 months ago - 4 comments

#27 - Reproducing the stage1 and stage2 Model problem on L40s

Issue - State: closed - Opened by cydiachen 8 months ago - 14 comments

#26 - Training dataset?

Issue - State: closed - Opened by luohao123 8 months ago - 14 comments

#25 - Is specifying use_flash_attion_2 at launch supported??

Issue - State: closed - Opened by awzhgw 8 months ago - 2 comments

#24 - Is the stage-2 pre-trained model (llavaphi-2.7b-finetune) released?

Issue - State: open - Opened by yucheng-zyc 8 months ago - 4 comments

#23 - Wrong cuda allocation

Issue - State: open - Opened by paulgavrikov 8 months ago - 5 comments

#22 - Metrics in the paper and README are inconsistent

Issue - State: closed - Opened by sxu1997 8 months ago - 2 comments

#21 - languageBindVideo model may hang?

Issue - State: open - Opened by awzhgw 8 months ago - 8 comments

#20 - Something strange happened when I integrated mixtral 7BX8 using the moe-llava architecture

Issue - State: closed - Opened by awzhgw 8 months ago - 4 comments

#19 - ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects

Issue - State: open - Opened by andysingal 8 months ago - 2 comments

#18 - Error in predict.py

Issue - State: closed - Opened by KuofengGao 8 months ago - 1 comment

#17 - Moe finetune error

Issue - State: closed - Opened by AdonLee072348 8 months ago - 12 comments

#16 - Do you replicate the weights of the FFNs from stage 1 or stage 2?

Issue - State: closed - Opened by simon-lund 8 months ago - 4 comments

#15 - Is video supported?

Issue - State: open - Opened by awzhgw 8 months ago - 3 comments

#14 - Hello, I want to know how much GPU memory is needed to run this model.

Issue - State: closed - Opened by dforel 8 months ago - 2 comments

#13 - Openchat, quantisation, multiimage

Issue - State: closed - Opened by sujitvasanth 8 months ago - 5 comments

#12 - /deepspeed/comm/comm.py", line 341, in all_to_all_single return cdb.all_to_all_single(output=output, AttributeError: 'NoneType' object has no attribute 'all_to_all_single'

Issue - State: open - Opened by lucasjinreal 8 months ago - 13 comments

#11 - image not processed

Issue - State: closed - Opened by leosongwei 8 months ago - 1 comment

#10 - Allow custom storage path of the google/siglip-so400m-patch14-384

Issue - State: closed - Opened by leosongwei 8 months ago - 1 comment

#9 - Is llavallama moe supported?

Issue - State: open - Opened by DietDietDiet 8 months ago - 14 comments

#8 - Can it support the mixtral 7BX8 model?

Issue - State: closed - Opened by awzhgw 8 months ago - 1 comment

#7 - Update qwen_generation_utils.py

Pull Request - State: open - Opened by eltociear 8 months ago

#6 - Very bad at language ability

Issue - State: closed - Opened by lucasjinreal 8 months ago - 1 comment

#5 - Support for Chinese or multiple images?

Issue - State: closed - Opened by BaoyanWang 8 months ago - 3 comments

#4 - Images for training

Issue - State: closed - Opened by phellonchen 8 months ago - 2 comments

#3 - Method to Replicate Results from Huggingface Spaces

Issue - State: closed - Opened by hiroalchem 8 months ago - 5 comments

#2 - Can the author elaborate a bit more about how the stage 3 was achieved?

Issue - State: closed - Opened by CanyonWind 8 months ago - 2 comments

#1 - {'loss': 0.0, 'learning_rate': 1.6877637130801689e-07, 'epoch': 0.0}

Issue - State: closed - Opened by whalefa1I 8 months ago - 14 comments
