InternLM/xtuner issues and pull requests

#957 - command error: ''Adafactor is already registered in optimizer at torch.optim''

Issue - State: open - Opened by monteir03 11 days ago - 1 comment

#956 - torch compilation error

Issue - State: open - Opened by tcxia 15 days ago

#955 - Can the xtuner framework be deployed offline on a server for fine-tuning?

Issue - State: open - Opened by Fanxhion 16 days ago

#954 - Added support for Minicpm3

Pull Request - State: closed - Opened by LDLINGLINGLING 18 days ago

#953 - How to set the random seed to control randomness when training DPO

Issue - State: open - Opened by pingbowen23 19 days ago

#952 - Error when fine-tuning internLM2.5 with xtuner

Issue - State: open - Opened by sakura073 21 days ago - 6 comments

#951 - fix qwen2 tokenizer name

Pull Request - State: open - Opened by amulil 23 days ago

#950 - Error: [WinError 2] The system cannot find the file specified.

Issue - State: open - Opened by renlingjie 23 days ago

#949 - How can I inspect the data during fine-tuning?

Issue - State: open - Opened by liguoyu666 28 days ago

#948 - Model is not saved when by_epoch is enabled in CheckpointHook

Issue - State: open - Opened by aizhweiwei 28 days ago - 1 comment

#947 - How to plot the loss separately for data from different sources during STF?

Issue - State: open - Opened by Abigail61 29 days ago

#946 - Add functionality to download models from sources other than HuggingFace

Pull Request - State: open - Opened by starmountain1997 29 days ago

#945 - How to resume pre-training from checkpoints in xtuner

Issue - State: open - Opened by aizhweiwei about 1 month ago - 1 comment

#944 - xtuner runs normally on datasets with more than 10000 samples but fails on a dataset with 1000 samples

Issue - State: open - Opened by tiang2002 about 1 month ago

#943 - Does xtuner support DPO for InternVL?

Issue - State: open - Opened by fabriceyhc about 1 month ago - 1 comment

#942 - loss becomes nan while fine-tuning llava-llama3-8b

Issue - State: open - Opened by liboaccn about 1 month ago - 1 comment

#941 - Error when fine-tuning a model based on InternLM2-7B: TypeError: Linear4bit.forward() takes 2 positional arguments but 3 were given

Issue - State: open - Opened by AFObject about 1 month ago

#940 - Is fine-tuning Flux.1 dev supported?

Issue - State: open - Opened by TongrongHuang about 1 month ago

#939 - support for training Llama 3.2 - vision

Issue - State: open - Opened by JAVerma about 1 month ago

#938 - When seq_parallel_world_size is set to a value greater than 1, should use_varlen_attn not be set to true?

Issue - State: open - Opened by Fovercon about 1 month ago

#937 - Error when fine-tuning with xtuner in docker; what is causing it?

Issue - State: open - Opened by 159357hou about 1 month ago - 2 comments

#936 - Is qwen2 currently supported?

Issue - State: open - Opened by Zheng-Jay about 2 months ago - 3 comments

#935 - AttributeError: 'Qwen2FlashAttention2' object has no attribute '_flash_attention_forward'

Issue - State: open - Opened by zhangyuqi-1 about 2 months ago - 2 comments

#934 - Training hangs when four GPUs are selected

Issue - State: open - Opened by AlittlePIE about 2 months ago - 1 comment

#933 - Vocabulary size is inconsistent after fine-tuning intern2.5-20B

Issue - State: open - Opened by topology1 about 2 months ago

#932 - Training hangs after replacing the original sampler with lengthgroupedsampler

Issue - State: open - Opened by xcy9614 about 2 months ago

#931 - [Fix] Fix OOM when qlora converting

Pull Request - State: open - Opened by fanqiNO1 about 2 months ago

#930 - [Bugs] fix qlora convert bugs

Pull Request - State: closed - Opened by HIT-cwh about 2 months ago

#929 - How to run val and test?

Issue - State: open - Opened by Diyigelieren about 2 months ago

#928 - version `GLIBCXX_3.4.29' not found

Issue - State: open - Opened by amannier about 2 months ago

#927 - Failed to inference single image using xtuner chat with llava-llama3-8b model

Issue - State: closed - Opened by J0eky about 2 months ago - 1 comment

#926 - Reward model question

Issue - State: open - Opened by Eren139 about 2 months ago - 1 comment

#925 - Error when training qwen2 with transformers == 4.44.2 and xtuner == 0.1.23

Issue - State: open - Opened by thomZ1 2 months ago - 2 comments

#924 - Multi-node multi-GPU training fails with: ss1.ss_family == ss2.ss_family. 2 vs 10

Issue - State: open - Opened by sph116 2 months ago

#923 - What were the exact experimental conditions for the training TGS comparison with llamaFactory?

Issue - State: open - Opened by shihanmax 2 months ago

#922 - Error: Cannot find reference 'VarlenAttnArgsToMessageHubHook' in '__init__.py'

Issue - State: open - Opened by hutiehua-1 2 months ago - 1 comment

#921 - Question: the loss is not ultimately computed on reward_token_id, so why can inference rely on reward_token_id?

Issue - State: open - Opened by woshixiaobai2019 2 months ago - 6 comments

#920 - QwenVL support

Issue - State: open - Opened by liyan1997 2 months ago

#919 - Integrate Liger Kernel: the most efficient Triton Training Kernels

Issue - State: open - Opened by ByronHsu 2 months ago

#918 - Some questions about step counting

Issue - State: open - Opened by young-chao 2 months ago

#917 - add rescale sp loss

Pull Request - State: open - Opened by HIT-cwh 2 months ago

#916 - How to run prediction after training a reward model?

Issue - State: open - Opened by tcxia 2 months ago - 1 comment

#915 - Do qlora fine-tuned models not support resuming training after an interruption?

Issue - State: open - Opened by deep-practice 2 months ago - 3 comments

#914 - map error with the sharegpt4v dataset

Issue - State: closed - Opened by bjzhb666 2 months ago - 1 comment

#913 - When building single-image multi-turn conversation data for InternVL, does every turn need the <image> tag?

Issue - State: open - Opened by deep-practice 2 months ago - 1 comment

#912 - [Which environment variable should be changed?] Setting ds_accelerator to cuda (auto detect) df: "/home/guochenchen/.triton/autotune": No such file or directory

Issue - State: open - Opened by gwoksansan 3 months ago

#911 - How to change the master port

Issue - State: open - Opened by AislantVentus 3 months ago - 1 comment

#910 - train time decrease from 13 hours to 9

Issue - State: open - Opened by mylesgoose 3 months ago

#909 - LLaVa phi-3 sft fails with ConnectionResetError: [Errno 104] Connection reset by peer

Issue - State: open - Opened by Yu-Yang-Li 3 months ago

#908 - Fine-tuning dataset strategy (dataset make confuse)

Issue - State: open - Opened by EasonQYS 3 months ago

#907 - How should the config be written for internvl fine-tuning when a dataset entry has multiple jsonl files and multiple images?

Issue - State: open - Opened by mspythontu 3 months ago

#906 - [Feature] Support balanced dataset to speed-up VL training

Pull Request - State: open - Opened by yqyao 3 months ago - 2 comments

#904 - How to modify the vision encoder of llava-llama3-8b?

Issue - State: open - Opened by Jason8Kang 3 months ago

#903 - fine-tuning codegeex4

Issue - State: open - Opened by sgjohnson1981 3 months ago

#902 - Training camp 3 XTuner: error when running xtuner train ./internlm2_chat_1_8b_qlora_alpaca_e3_copy.py

Issue - State: open - Opened by Viki-researcher 3 months ago - 2 comments

#901 - Load failure with the converted finetune InternVL2-2B model

Issue - State: closed - Opened by leagend 3 months ago - 1 comment

#900 - xtuner convert pth_to_hf loads the model twice and runs out of GPU memory; how to fix this?

Issue - State: open - Opened by c-x-l-w 3 months ago

#899 - Error when doing sft training according to `https://xtuner.readthedocs.io/en/latest/get_started/quickstart.html#`

Issue - State: open - Opened by YanShuang17 3 months ago - 1 comment

#898 - How to store a single model jointly across multiple GPUs

Issue - State: open - Opened by RyanOvO 3 months ago - 1 comment

#897 - I misunderstood earlier; please delete

Issue - State: closed - Opened by Hellcat1005 3 months ago

#896 - Packer does not seem to split attention_mask into blocks

Issue - State: closed - Opened by WallE-Chang 3 months ago

#895 - there is no script for gpt finetune

Issue - State: open - Opened by zhenghuawang6 3 months ago

#894 - Fine-tuning on long conversations

Issue - State: open - Opened by RyanOvO 3 months ago

#893 - How to manually specify fp16, fp32, or bf16?

Issue - State: closed - Opened by bjzhb666 3 months ago - 1 comment

#892 - Does accumulative_counts take effect?

Issue - State: closed - Opened by YixinSong-e 3 months ago - 3 comments

#891 - Dataset issue in multi-node training

Issue - State: closed - Opened by YixinSong-e 3 months ago

#890 - internlm2.py Boolean value of Tensor with more than one value is ambiguous

Issue - State: open - Opened by doudoudiule 3 months ago - 2 comments

#889 - Adjust the order of InternVL dataset log printing

Pull Request - State: open - Opened by KooSung 3 months ago - 1 comment

#888 - fix

Pull Request - State: open - Opened by ArtificialZeng 3 months ago

#887 - please fix llava 70B config

Issue - State: open - Opened by ds-kczerski 3 months ago

#886 - Modified config file is not recognized

Issue - State: open - Opened by happyrenxiaozhao 3 months ago

#885 - [Bug] fix openai_map_fn bugs

Pull Request - State: closed - Opened by HIT-cwh 3 months ago

#884 - [Fix] Fix output_with_loss in openai_map_fn

Pull Request - State: closed - Opened by fanqiNO1 3 months ago

#883 - Training hangs at "Checkpoints will be saved to..."

Issue - State: open - Opened by No360201 3 months ago - 1 comment

#882 - qwen1.5-32b config support

Issue - State: open - Opened by WallE-Chang 3 months ago

#880 - Model is not saved when by_epoch is enabled in CheckpointHook

Issue - State: open - Opened by SingL3 3 months ago - 2 comments

#879 - deepseek v2: training in shard mode reports missing keys such as w1w3 when loading weights

Issue - State: open - Opened by FlyCarrot 3 months ago

#878 - support transformers >= 4.43

Pull Request - State: closed - Opened by HIT-cwh 3 months ago

#877 - [Feature] Pipeline Parallelization of Different Stages in RLHF

Pull Request - State: open - Opened by llkn-2 3 months ago - 4 comments

#876 - The _flash_attention_forward method no longer exists after transformers 4.42.4

Issue - State: open - Opened by Snowdar 3 months ago - 2 comments

#875 - Reason for the effective GPU memory savings

Issue - State: open - Opened by 2020zyc 3 months ago

#874 - llama3-8b training with an expanded vocabulary: RuntimeError: CUDA error: device-side assert triggered

Issue - State: open - Opened by silvercherry 3 months ago - 1 comment

#873 - [Bug] fix dsv2 attn dispatch (softmax_scale)

Pull Request - State: closed - Opened by HIT-cwh 3 months ago

#872 - Add internlm2 5 cfgs

Pull Request - State: closed - Opened by HIT-cwh 3 months ago

#871 - Can xtuner support full-parameter training of a 7b model on v100?

Issue - State: open - Opened by zhou888888 3 months ago - 2 comments

#870 - Json Error in vis_data

Issue - State: open - Opened by kleinzcy 3 months ago

#869 - Added MiniCPM support to the readme

Pull Request - State: closed - Opened by LDLINGLINGLING 3 months ago

#868 - print_on_rank0() raises an error when converting the deepseek v2 lite model

Issue - State: open - Opened by FlyCarrot 3 months ago

#867 - llama3.1 support

Issue - State: open - Opened by yinjun622 4 months ago - 3 comments

#866 - internvl_v2_internlm2_26b_qlora_finetune error: libtorch_cuda_cpp.so: cannot open shared object file: No such file or directory

Issue - State: open - Opened by DEVOE-YUN 4 months ago

#865 - custom system/prompt template support?

Issue - State: open - Opened by HeegonJin 4 months ago - 3 comments

#862 - bump version to 0.1.23

Pull Request - State: closed - Opened by HIT-cwh 4 months ago

#857 - Question about chat template usage during dpo training

Issue - State: open - Opened by pokerc 4 months ago - 3 comments

#839 - Qwen2-72B, 16K long context: OOM when converting to an HF model

Issue - State: open - Opened by daiyafei2013 4 months ago - 2 comments

#837 - [Feature] Add LLaST(WIP)

Pull Request - State: open - Opened by ChenX17 4 months ago

#836 - How to launch multi-node multi-GPU training

Issue - State: open - Opened by poisonwine 4 months ago - 3 comments

#835 - How to convert the fine-tuned llama3_8b_instruct_clip_vit_large_p14_336 model to HuggingFace format?

Issue - State: open - Opened by chalesguo 4 months ago - 8 comments

#834 - llava pre-training fails with RuntimeError: The size of tensor a (0) must match the size of tensor b (592) at non-singleton dimension 2

Issue - State: open - Opened by rfvscj 4 months ago - 3 comments

#833 - RuntimeError: The size of tensor a (0) must match the size of tensor b (592) at non-singleton dimension 0

Issue - State: open - Opened by CYing18 4 months ago - 4 comments

#815 - Some questions about training

Issue - State: open - Opened by Zheng-Jay 4 months ago - 3 comments
