alibaba/Pai-Megatron-Patch issues and pull requests

#132 - Enhance MoE Upcycled and Fix qwen finetune issues

Pull Request - State: closed - Opened by jerryli1981 9 months ago - 1 comment

#131 - RuntimeError: The size of tensor a (120) must match the size of tensor b (119) at non-singleton dimension 2

Issue - State: open - Opened by wangxiang2713 10 months ago - 1 comment

#130 - Fix traning resume issue

Pull Request - State: closed - Opened by jerryli1981 10 months ago - 1 comment

#129 - Enhance MoE Upcycled and Fix Qwen hf & megatron alignment issues

Pull Request - State: closed - Opened by jerryli1981 10 months ago - 1 comment

#128 - [fixed] fix unmatched shape when PP_size > 1

Pull Request - State: closed - Opened by Dylancer1998 10 months ago - 2 comments

#127 - [fixed] unexpected eos_token concatenation

Pull Request - State: closed - Opened by Dylancer1998 10 months ago

#126 - [feat]: support safetensors format in llama converter

Pull Request - State: closed - Opened by Dylancer1998 10 months ago - 1 comment

#125 - [Fix] support mixtral_8x7b grouped_gemm load state_dict

Pull Request - State: closed - Opened by lxg2015 10 months ago - 1 comment

#124 - No such file or directory: '/mtn/workplace/qwen-ckpts/qwen-14b-hf-to-megatron-tp2-pp1/release/mp_rank_00/model_optim_rng.pt

Issue - State: open - Opened by jamestch 10 months ago - 1 comment

#123 - Update ReadMe

Pull Request - State: closed - Opened by jerryli1981 10 months ago - 1 comment

#122 - Update MoE with Megatron Core

Pull Request - State: closed - Opened by jerryli1981 10 months ago - 1 comment

#121 - qwen 7B continued pretraining: after the model finishes loading, it hangs in the dataloader, seq_length=0

Issue - State: open - Opened by songyingxin 10 months ago

#120 - Replace break to continue when process layer

Pull Request - State: closed - Opened by jinzhuer 11 months ago - 1 comment

#119 - Got a bug during the pretrain of chatglm.

Issue - State: closed - Opened by FeixLiu 11 months ago - 1 comment

#118 - Inconsistency between megatron and huggingface in Qwen 72B

Issue - State: open - Opened by chaochen99 11 months ago

#117 - Add mixtral mcore implementation

Pull Request - State: closed - Opened by jerryli1981 11 months ago - 1 comment

#116 - Mixtral model convert no "convert_checkpoint_from_megatron_to_transformers" function

Issue - State: closed - Opened by cdj0311 11 months ago - 1 comment

#115 - Update git submodule for megatron version control

Pull Request - State: closed - Opened by jerryli1981 11 months ago - 1 comment

#114 - update llama2 ds train

Pull Request - State: closed - Opened by MengLeebin 11 months ago - 1 comment

#113 - Fix save moe checkpint

Pull Request - State: closed - Opened by jerryli1981 11 months ago - 1 comment

#112 - How do I resume training?

Issue - State: open - Opened by LittleWhite0208 11 months ago

#111 - Fix save moe checkpint

Pull Request - State: closed - Opened by jerryli1981 11 months ago - 1 comment

#110 - Add hf2mcore convertor of mixtral model

Pull Request - State: closed - Opened by jerryli1981 11 months ago - 1 comment

#109 - Support expert tensor parallelism

Pull Request - State: closed - Opened by jerryli1981 11 months ago - 1 comment

#108 - fp8-precision weight conversion

Issue - State: open - Opened by liuxm117 11 months ago

#107 - update readme

Pull Request - State: closed - Opened by jerryli1981 11 months ago - 1 comment

#106 - What is the corpus format for fine-tuning on multi-turn dialogue?

Issue - State: open - Opened by zeq263 11 months ago

#105 - Add Mixtral MoE and Qwen-vl

Pull Request - State: closed - Opened by jerryli1981 11 months ago - 1 comment

#104 - TransformerLayer.__init__() got an unexpected keyword argument 'apply_query_key_layer_scaling'

Issue - State: closed - Opened by AGI-player 11 months ago - 2 comments

#103 - Pretrain megatron qwen-7b-tp4-pp1 reports error: 151851 is not divisible by 4

Issue - State: closed - Opened by KannbaraQRS 11 months ago - 2 comments

#102 - Training baichuan2 13b raises KeyError: 'instruction'

Issue - State: closed - Opened by joymcg 12 months ago - 4 comments

#101 - either train-iters or train-samples should be provided

Issue - State: closed - Opened by liuxm117 12 months ago - 1 comment

#100 - ModuleNotFoundError: No module named 'megatron.data.gpt_dataset'

Issue - State: closed - Opened by liuxm117 12 months ago - 6 comments

#99 - add cvcuda_image_processing

Pull Request - State: closed - Opened by jerryli1981 12 months ago - 1 comment

#98 - Fix zero shot evaluate megatron issue

Pull Request - State: closed - Opened by jerryli1981 12 months ago - 1 comment

#97 - Fix zero shot evaluate megatron issue

Pull Request - State: closed - Opened by jerryli1981 12 months ago - 1 comment

#96 - Fix zero shot evaluate megatron issue

Pull Request - State: closed - Opened by jerryli1981 12 months ago - 1 comment

#95 - Finetune Qwen-72B

Issue - State: closed - Opened by LittleWhite0208 12 months ago - 1 comment

#94 - Support pipeline evaluation for deepseek

Pull Request - State: closed - Opened by jerryli1981 12 months ago - 1 comment

#93 - Support pipeline evaluation for baichuan2, llama2, mistral and qwen

Pull Request - State: closed - Opened by jerryli1981 12 months ago - 1 comment

#92 - Support pipeline evaluation for baichuan2, llama2, mistral and qwen

Pull Request - State: closed - Opened by jerryli1981 12 months ago - 1 comment

#91 - Fix Yi evaluation issue

Pull Request - State: closed - Opened by jerryli1981 12 months ago - 1 comment

#90 - Fix llama2 finetune issue

Pull Request - State: closed - Opened by jerryli1981 about 1 year ago - 1 comment

#89 - RLHF is not supported by the Patch in Megatron-LLM but is adapted to deepspeed lib

Issue - State: open - Opened by zhangzhenyu13 about 1 year ago

#88 - fix qwen-finetune-withqa when tensor parallel

Pull Request - State: closed - Opened by jerryli1981 about 1 year ago - 1 comment

#87 - fix qwen-finetuen-withga bugs

Pull Request - State: closed - Opened by jerryli1981 about 1 year ago - 1 comment

#86 - fix llava and qwen finetune with ga bugs

Pull Request - State: closed - Opened by jerryli1981 about 1 year ago - 1 comment

#85 - fix llava, qwen-finetuen-withga bugs

Pull Request - State: closed - Opened by jerryli1981 about 1 year ago - 1 comment

#84 - fix finetune with GA

Pull Request - State: closed - Opened by lwmlyy about 1 year ago

#83 - Add Freeze for LLava

Pull Request - State: closed - Opened by jerryli1981 about 1 year ago - 1 comment

#82 - fix finetunewGA

Pull Request - State: closed - Opened by lwmlyy about 1 year ago

#81 - fix bugs for qwen and llama2

Pull Request - State: closed - Opened by lwmlyy about 1 year ago

#80 - Qwen rope

Pull Request - State: closed - Opened by lwmlyy about 1 year ago

#79 - fix loading model & optimizer

Pull Request - State: closed - Opened by Renaissance25 about 1 year ago - 1 comment

#78 - add GA finetune and deepseek&codellama rope

Pull Request - State: closed - Opened by lwmlyy about 1 year ago

#77 - fix the name of Qwen model in readme

Pull Request - State: closed - Opened by jhuang1207 about 1 year ago - 1 comment

#76 - Add Qwen 72b finetune demo

Pull Request - State: closed - Opened by jerryli1981 about 1 year ago - 1 comment

#75 - remove idx=0 in data module

Pull Request - State: closed - Opened by jerryli1981 about 1 year ago - 1 comment

#74 - Fix data module remove idx=0

Pull Request - State: closed - Opened by jerryli1981 about 1 year ago - 1 comment

#73 - Add Baichuan2 for 2304

Pull Request - State: closed - Opened by jerryli1981 about 1 year ago - 1 comment

#72 - Add Yi Model

Pull Request - State: closed - Opened by jerryli1981 about 1 year ago - 1 comment

#71 - bugfix_llava_import

Pull Request - State: closed - Opened by tuofeilunhifi about 1 year ago

#70 - Fix dataset type with Pretrain-IdxMap

Pull Request - State: closed - Opened by jerryli1981 about 1 year ago - 1 comment

#69 - Fix dataset type with Pretrain-IdxMap

Pull Request - State: closed - Opened by jerryli1981 about 1 year ago - 1 comment

#68 - Fix dataset type with Pretrain-IdxMap

Pull Request - State: closed - Opened by jerryli1981 about 1 year ago - 1 comment

#67 - update codellama

Pull Request - State: closed - Opened by lwmlyy about 1 year ago
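Issue #103 reports that a qwen vocabulary of 151851 entries is not divisible by a tensor-parallel size of 4 (151851 = 4 × 37962 + 3). The listing does not say how that issue was resolved. For context, upstream Megatron-LM pads the tokenizer vocabulary up to a multiple of `make_vocab_size_divisible_by * tensor_model_parallel_size` so that every tensor-parallel rank holds an equal shard of the embedding table. The Python below is a minimal sketch of that arithmetic, assuming Megatron-style padding; the helper names `padded_vocab_size` and `vocab_shard_size` are hypothetical, and the default of 128 mirrors the upstream `--make-vocab-size-divisible-by` default rather than anything confirmed for this repository.

```python
def padded_vocab_size(orig_vocab_size: int, tp_size: int, divisible_by: int = 128) -> int:
    """Hypothetical helper: round a vocabulary size up so it splits evenly
    across tensor-parallel ranks. Assumes Megatron-style padding to a multiple
    of divisible_by * tp_size (128 mirrors upstream --make-vocab-size-divisible-by)."""
    multiple = divisible_by * tp_size
    # Ceiling division via negated floor division, then scale back to the multiple.
    return -(-orig_vocab_size // multiple) * multiple


def vocab_shard_size(orig_vocab_size: int, tp_size: int, divisible_by: int = 128) -> int:
    """Hypothetical helper: embedding rows held by each tensor-parallel rank after padding."""
    return padded_vocab_size(orig_vocab_size, tp_size, divisible_by) // tp_size


if __name__ == "__main__":
    vocab, tp = 151851, 4
    print(vocab % tp)                    # 3, so an unpadded 4-way split is uneven
    print(padded_vocab_size(vocab, tp))  # 152064 (297 * 512)
    print(vocab_shard_size(vocab, tp))   # 38016 rows per rank
```

Padding only adds unused rows to the embedding and output layers, which is why it is a common workaround; whether a given converter or training script in this repository applies it before sharding is not stated in the listing above.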
