Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / InternLM/InternEvo issues and pull requests
#374 - fix(linear.py): linear module uneven split is forbidden
Pull Request -
State: open - Opened by huangting4201 6 days ago
#373 - fix(monitor): send exception when feishu alert is enable && remove light monitoring address
Pull Request -
State: open - Opened by JiaoPL 7 days ago
#372 - [QA] Does internEvo support loongtrain selective checkpoint++?
Issue -
State: open - Opened by wplf 7 days ago
- 1 comment
Labels: question
#371 - fix(gmm): change communicator.grad_hook to async
Pull Request -
State: open - Opened by blankde 7 days ago
#370 - fix(mha.py): fix evaluation argu key err
Pull Request -
State: closed - Opened by huangting4201 9 days ago
#369 - feat(fp8): [Work In Progress] enable FP8 training
Pull Request -
State: open - Opened by zigzagcai 21 days ago
- 1 comment
#368 - remove unused moe changes , modify _q_kv_without_cu_seqlens and _SplitForwardGatherBackward
Pull Request -
State: closed - Opened by KkHu-Kistch 26 days ago
#367 - Add hetero feat
Pull Request -
State: closed - Opened by fumihwh 26 days ago
#366 - fix(isp.py): fix isp overlap backward allgather twice when activation ckpt 0.x
Pull Request -
State: open - Opened by huangting4201 27 days ago
#365 - Add z loss to PipelineSchedule
Pull Request -
State: closed - Opened by zhhsplendid 28 days ago
#364 - fix lumina model and add lumina ckpt support
Pull Request -
State: closed - Opened by SHshenhao 28 days ago
#363 - fix lumina model and add lumina ckpt support
Pull Request -
State: closed - Opened by SHshenhao 28 days ago
#362 - fix lumina model and add lumina ckpt support
Pull Request -
State: closed - Opened by SHshenhao 28 days ago
#361 - A PR Provides Multi Machine MPI scripts
Pull Request -
State: closed - Opened by zhhsplendid 28 days ago
#360 - fix(mlp.py): fix mlp w1w2w3 init order to w1w3w2
Pull Request -
State: open - Opened by huangting4201 29 days ago
#359 - fix llava model device bugs
Pull Request -
State: open - Opened by hellozmz 30 days ago
#358 - Feat/refactor process group
Pull Request -
State: open - Opened by mwiacx 30 days ago
#357 - feat(pipeline): Zero Bubble V Shape Memory Efficient Edition
Pull Request -
State: closed - Opened by li126com about 1 month ago
#356 - Tmp fix QK norm bug
Pull Request -
State: closed - Opened by zhhsplendid about 1 month ago
#355 - Feat/heterogeneous x pu training
Pull Request -
State: closed - Opened by KkHu-Kistch about 1 month ago
#354 - [QA] How do I fine-tune on a single card, and which settings need to be adjusted?
Issue -
State: open - Opened by OkGuai about 1 month ago
Labels: question
#353 - [Feature] Add Lumina Model to InternEvo. Tested on MUXI single card
Pull Request -
State: closed - Opened by zhhsplendid about 1 month ago
#352 - feat(moe): add gshard token rearrange optim
Pull Request -
State: open - Opened by blankde about 1 month ago
#351 - fix(checkpoint/components.py): fix lr scheduler resume step count
Pull Request -
State: closed - Opened by huangting4201 about 1 month ago
#350 - feat(moe): support moe zero1 setting
Pull Request -
State: open - Opened by blankde about 1 month ago
#349 - feat(model): support kv head copy
Pull Request -
State: closed - Opened by yingtongxiong about 2 months ago
#348 - fix(moe): dropless moe loss
Pull Request -
State: closed - Opened by blankde about 2 months ago
#347 - doc(2d): docs for 2d-attention
Pull Request -
State: closed - Opened by yingtongxiong about 2 months ago
#346 - [QA] Does loong train support packed_sample_into_one=false?
Issue -
State: open - Opened by Lzhang-hub 2 months ago
- 1 comment
Labels: question
#345 - feat(moe): support group mlp for moe
Pull Request -
State: closed - Opened by blankde 2 months ago
#344 - feat(dataloader): refine implementation of mocked and megatron dataloader
Pull Request -
State: open - Opened by zigzagcai 2 months ago
#343 - feat(zero bubble): update zbh1
Pull Request -
State: open - Opened by li126com 2 months ago
#342 - [Bug] There will be timeout in some cases.
Issue -
State: closed - Opened by kkscilife 2 months ago
- 1 comment
Labels: bug
#341 - fix inject model and add multimodal dataloader
Pull Request -
State: closed - Opened by sallyjunjun 2 months ago
#340 - fix(enable_qkv_fusion): minor fix for qkv fusion
Pull Request -
State: closed - Opened by zigzagcai 2 months ago
#339 - fix dispatch model
Pull Request -
State: closed - Opened by sallyjunjun 2 months ago
#338 - fix(enable_qkv_fusion): refine wqkv fusion
Pull Request -
State: closed - Opened by zigzagcai 2 months ago
#337 - fix wqkv fusion
Pull Request -
State: closed - Opened by zigzagcai 2 months ago
#336 - fix wqkv fusion
Pull Request -
State: closed - Opened by zigzagcai 2 months ago
#335 - fix wqkv dim when enable qkv fusion
Pull Request -
State: closed - Opened by sallyjunjun 2 months ago
#334 - fix(pipeline): fix zero bubble pipeline parallelism
Pull Request -
State: closed - Opened by li126com 2 months ago
#333 - Feat(adam): support apex FusedAdam
Pull Request -
State: closed - Opened by li126com 2 months ago
#332 - feat(moe): add moe async param handler
Pull Request -
State: open - Opened by blankde 2 months ago
#331 - feat(usability): Refine model inject helper to support huggingface models
Pull Request -
State: closed - Opened by zigzagcai 2 months ago
#330 - remove isp memory pool
Pull Request -
State: closed - Opened by mwiacx 2 months ago
#329 - update test loss
Pull Request -
State: open - Opened by li126com 2 months ago
#328 - fix(isp): fix unnecessary module gather for isp
Pull Request -
State: closed - Opened by blankde 2 months ago
- 2 comments
#327 - add qwen2moe and mixtral
Pull Request -
State: closed - Opened by sallyjunjun 3 months ago
- 1 comment
#326 - feat(model: impl gpt 567 b
Pull Request -
State: closed - Opened by blankde 3 months ago
#325 - [Feature] Decouple the zero and parallelism settings of dense layers and expert layers in MoE models
Issue -
State: open - Opened by sunpengsdu 3 months ago
Labels: enhancement
#324 - [Feature] Do not use the memory pool
Issue -
State: open - Opened by sunpengsdu 3 months ago
- 1 comment
Labels: enhancement
#323 - feat(dataloader): Implement megatron dataloader and mocked dataloader
Pull Request -
State: closed - Opened by zigzagcai 3 months ago
- 1 comment
#322 - feat(moe): support moe isp and no tp
Pull Request -
State: closed - Opened by blankde 3 months ago
#321 - feat(moe): support moe no tp
Pull Request -
State: closed - Opened by blankde 3 months ago
#320 - feat(moe): support dropless layer
Pull Request -
State: closed - Opened by blankde 3 months ago
- 3 comments
#319 - fix(ci): fix weekly ci
Pull Request -
State: closed - Opened by zigzagcai 3 months ago
- 1 comment
#318 - [Bug] There is an error in training : built-in model should inherited from BaseModel
Issue -
State: closed - Opened by kkscilife 3 months ago
- 1 comment
Labels: bug
#317 - fix(cross_entropy.py): replace the fa loss with apex loss
Pull Request -
State: closed - Opened by yingtongxiong 3 months ago
#316 - fix(shard.py): fix isp unpack data indexes err in rotary emb
Pull Request -
State: closed - Opened by huangting4201 3 months ago
#315 - add vocab parallel embedding
Pull Request -
State: closed - Opened by mwiacx 3 months ago
- 1 comment
#314 - fix(ci): fix error in train_CI
Pull Request -
State: closed - Opened by zigzagcai 3 months ago
#313 - fix(model): fix bugs of batch generation & support min_new_tokens for inference
Pull Request -
State: closed - Opened by x54-729 3 months ago
Labels: bug
#312 - Add new models
Pull Request -
State: closed - Opened by sallyjunjun 3 months ago
#311 - fix(embedding): fix incorrect computing of indexes in _update_cos_sin_cache
Pull Request -
State: closed - Opened by li126com 3 months ago
#310 - improve documentation
Pull Request -
State: closed - Opened by sallyjunjun 3 months ago
#309 - fix(910B): fix bugs in 910B for varlen and fixlen FA
Pull Request -
State: closed - Opened by li126com 3 months ago
- 2 comments
#308 - fix(isp): fix dist-attn infer
Pull Request -
State: closed - Opened by KimmiShi 3 months ago
- 1 comment
#307 - [Bug] Known 910B bugs and their resolution status
Issue -
State: closed - Opened by li126com 3 months ago
Labels: bug
#306 - [Feature] Optimize ce_loss computation
Issue -
State: closed - Opened by zigzagcai 3 months ago
Labels: enhancement
#305 - add data flow doc
Pull Request -
State: closed - Opened by sallyjunjun 3 months ago
#304 - feat(usability): Attempt for easier usability
Pull Request -
State: closed - Opened by zigzagcai 3 months ago
- 1 comment
#303 - Attempt for easier usability
Pull Request -
State: closed - Opened by zigzagcai 3 months ago
#302 - [Bug] Import Error: Import "deeplink_ext.internlm_ops" could not be resolved
Issue -
State: closed - Opened by kkscilife 3 months ago
- 1 comment
Labels: bug
#301 - support pip install on npu environment
Pull Request -
State: closed - Opened by sallyjunjun 3 months ago
#300 - [QA] check import system var at the start of training
Issue -
State: open - Opened by sunpengsdu 3 months ago
Labels: question
#299 - Zmz/qwen2
Pull Request -
State: closed - Opened by hellozmz 3 months ago
#298 - [Bug] Installing the internLM environment on Ascend 910 fails with an error that nvcc is required
Issue -
State: closed - Opened by tungsten106 3 months ago
- 2 comments
Labels: bug
#297 - fix(launch): remove use_paked_data=use_flash_atten assert
Pull Request -
State: closed - Opened by yingtongxiong 4 months ago
#296 - fix(npu): fix npu dim incorrect squeeze when head num=1
Pull Request -
State: closed - Opened by SolenoidWGT 4 months ago
#295 - fix hf internlm nan bug
Pull Request -
State: closed - Opened by sallyjunjun 4 months ago
#294 - feat(modeling): support qwen2
Pull Request -
State: closed - Opened by SolenoidWGT 4 months ago
#293 - feat(trainer_builder): refactor trainer_builder and preserve optional callable for custom model dispatch function in isp mode
Pull Request -
State: closed - Opened by zigzagcai 4 months ago
- 5 comments
#292 - [QA] Replace string comparisons in the code with enum-type comparisons
Issue -
State: closed - Opened by sallyjunjun 4 months ago
Labels: question
#291 - [QA] Review the code logic related to load_hf_llama_pretrained_weights and remove unused code
Issue -
State: closed - Opened by sallyjunjun 4 months ago
Labels: question
#290 - fix(data): fix the unpack data
Pull Request -
State: closed - Opened by yingtongxiong 4 months ago
#289 - fix(moe): change moe norm reduced group
Pull Request -
State: closed - Opened by blankde 4 months ago
- 1 comment
#288 - Feat(*):loong train
Pull Request -
State: closed - Opened by huangting4201 4 months ago
#287 - add isp support of huggingface model
Pull Request -
State: closed - Opened by sallyjunjun 4 months ago
#286 - [Feature] how to finetuning lora
Issue -
State: open - Opened by wen020 4 months ago
- 1 comment
Labels: enhancement
#285 - [Bug] RuntimeError: [3] is setting up NCCL communicator and retrieving ncclUniqueId from [0] via c10d key-value store by key '0', but store->get('0') got error: Socket Timeout
Issue -
State: open - Opened by kkscilife 4 months ago
Labels: bug
#284 - Hf isp support
Pull Request -
State: closed - Opened by sallyjunjun 4 months ago
#283 - feat(varlen): support varlen training for huggingface models
Pull Request -
State: closed - Opened by zigzagcai 4 months ago
- 5 comments
#282 - feat(pipeline parallel): add zero bubble pipeline parallelism (ZB-H1)
Pull Request -
State: closed - Opened by li126com 4 months ago
#281 - feat(setup and docs): add one-click setup and refine docs
Pull Request -
State: closed - Opened by zigzagcai 4 months ago
#280 - fix: support newest internevo with deeplink
Pull Request -
State: closed - Opened by POI-WX 4 months ago
#279 - [Bug] Only GShard-mode MoE models can be converted to huggingface
Issue -
State: open - Opened by Cerberous 4 months ago
Labels: bug
#278 - [Bug] NaN appears when training in bf16 and inferring in fp16
Issue -
State: open - Opened by Cerberous 4 months ago
Labels: bug
#277 - fix(huggingface): fix huggingface dataloader when using some huggingface third-party tokenizers
Pull Request -
State: closed - Opened by zigzagcai 4 months ago
- 1 comment
#276 - Fix(ckpt): fix llama2 loading function
Pull Request -
State: closed - Opened by zigzagcai 5 months ago
#275 - feat(checkpoint): TP recomputation communication optimization
Pull Request -
State: open - Opened by li126com 5 months ago